Codewashing

I have little understanding for people using large language models to generate slop: words and images that nobody asked for.

I have more understanding for people using large language models to generate code. Code isn’t the thing in the same way that words or images are; code is the thing that gets you to the thing.

And if a large language model hallucinates some code, you’ll find out soon enough:

With code you get a powerful form of fact checking for free. Run the code, see if it works.

But I want to push back on one justification I see repeatedly about using large language models to write code. Here’s Craig:

There are many moral and ethical issues with using LLMs, but building software feels like one of the few truly ethically “clean”(er) uses (trained on open source code, etc.)

That’s not how this works. Yes, the large language models are trained on lots of code (most of it open source), but they’re not only trained on that. That’s on top of everything else: all the stolen books, all the unpaid creative work of others.

Even Robin Sloan, who first says:

I think the case of code is especially clear, and, for me, basically settled.

…goes on to acknowledge:

But, again, it’s important to say: the code only works because of Everything. Take that data away, train a model using GitHub alone, and you’ll get a far less useful tool.

When large language models are trained on domain-specific data, it’s always in addition to the mahoosive amount of content they’ve already stolen. It’s that mahoosive amount of content, not the domain-specific data, that enables them to parse your instructions.

(Note that I’m being very deliberate in saying “parse”, not “understand.” Though make no mistake, I’m astonished at how good these tools are at parsing instructions. I say that as someone who tried to write natural language parsers for text-only adventure games back in the 1980s.)

So, sure, go ahead and use large language models to write code. But don’t fool yourself into thinking that it’s somehow ethical.

What I said here applies to code too: