Codewashing

I have little understanding for people using large language models to generate slop: words and images that nobody asked for.

I have more understanding for people using large language models to generate code. Code isn't the thing in the same way that words or images are; code is the thing that gets you to the thing.

And if a large language model hallucinates some code, you'll find out soon enough:

With code you get a powerful form of fact checking for free. Run the code, see if it works.


But I want to push back on one justification I see repeatedly about using large language models to write code. Here's Craig:

There are many moral and ethical issues with using LLMs, but building software feels like one of the few truly ethically "clean"(er) uses (trained on open source code, etc.)


That's not how this works. Yes, the large language models are trained on lots of code (most of it open source), but they're not only trained on that. That's on top of everything else: all the stolen books, all the unpaid creative work of others.

Even Robin Sloan, who first says:

I think the case of code is especially clear, and, for me, basically settled.


...goes on to acknowledge:

But, again, it's important to say: the code only works because of Everything. Take that data away, train a model using GitHub alone, and you'll get a far less useful tool.


When large language models are trained on domain-specific data, it's always in addition to the mahoosive amount of content they've already stolen. It's that mahoosive amount of content, not the domain-specific data, that enables them to parse your instructions.

(Note that I'm being very deliberate in saying "parse", not "understand." Though make no mistake, I'm astonished at how good these tools are at parsing instructions. I say that as someone who tried to write natural language parsers for text-only adventure games back in the 1980s.)

So, sure, go ahead and use large language models to write code. But don't fool yourself into thinking that it's somehow ethical.

What I said here applies to code too:

If you're going to use generative tools powered by large language models, don't pretend you don't know how your sausage is made.


Published on April 30, 2025 08:50


Jeremy Keith's Blog

Jeremy Keith