AI Tells: What Words Does AI Use with Uncommon Frequency?
Here’s a tweet about this (I saw the link on Astral Codex Ten):
If you want to spot AI writing, here are some words to look out for: pic.twitter.com/TRXmeTD7Nh
— Samuel Hume (@DrSamuelBHume) July 5, 2025
I feel this sort of analysis is … mildly interesting? And mildly irritating, especially if someone says, “Oh no, don’t use these words in case someone thinks you’re using AI to generate your posts!” My feeling is that I’m not going to be bullied away from using a great word like delve — which I should use more often — by the mere fact that ChatGPT uses this word more often than random people.
Also, it’s irritating to have someone declare that words such as “additionally” are used a lot in generated text. Who cares? There’s nothing wrong with transition words that lead the reader through the passage, and there’s no way to get creative with words like “first” and “finally” — and “additionally” is just like that. This is reminding me of the time I ran a section of a paper I wrote through AI detectors and got flagged for possible plagiarism for — ready? — “See Figure 1, below.” And laughed. Why yes, clever AI detector, I bet “See Figure 1, below,” does indeed occur in many, many other papers all over the internet.
But my actual first reaction in this particular case is: Steatotic? I perk up: this is a new-to-me word! What does it mean? And how is it possible that text generators are FREQUENTLY using a word I’m not familiar with? That seems so unlikely!
Google, define “steatotic” —
ste·a·to·sis /ˌstēəˈtōsis/ noun
Medicine
infiltration of liver cells with fat, associated with disturbance of the metabolism by, for example, alcoholism, malnutrition, pregnancy, or drug therapy.

Oh, come on. “Steatotic” cannot possibly be a common word in generated text. Questions about why this word is on that graph do come up in the responses to that tweet. But I don’t care why people think it might be a common word in generated text. It can’t be. That’s just silly. The sample of generated text from which these words were drawn must be highly limited.
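If you’re curious what sits behind a chart like that, it’s basically a comparison of how often each word turns up in a pile of generated text versus a pile of human-written text, ranked by the ratio. Here is a rough sketch of that kind of comparison in Python; the two samples and the smoothing constant are made up for illustration, not taken from the tweet. It also happens to illustrate my point about limited samples: feed it “generated” text that leans heavily on, say, medical abstracts, and a word like “steatotic” rockets to the top no matter how rare it is in ordinary prose.

```python
from collections import Counter
import re

def rate_per_10k(text):
    """Word counts normalized to occurrences per 10,000 words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return {w: c * 10_000 / total for w, c in counts.items()}

def overused(generated, human, floor=0.5):
    """Rank words by how much more often they appear in the 'generated'
    sample than in the 'human' one; `floor` keeps words the human sample
    never uses from dividing by zero."""
    g, h = rate_per_10k(generated), rate_per_10k(human)
    return sorted(
        ((w, g[w] / (h.get(w, 0) + floor)) for w in g),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Toy samples, made up purely for illustration. A "generated" sample drawn
# heavily from medical abstracts pushes "steatotic" to the top of the list
# even though almost nobody uses it in ordinary prose.
generated_sample = "the steatotic liver showed steatotic changes additionally noted"
human_sample = "the liver looked perfectly ordinary and nobody mentioned it again"
for word, ratio in overused(generated_sample, human_sample)[:5]:
    print(word, round(ratio, 1))
```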
I do think the way AI text puts words together is a much better tell than the specific words used in the text. Even that gets tricky, as I think we’ve seen here — though some commenters seem REALLY sensitive to generated text. Here, how about this?
Me: Good morning, ChatGPT! Please write a paragraph about Death in Terry Pratchett’s style.
ChatGPT:
Death, as ever, arrived precisely when he meant to, which was rather unfair on those who had been counting on another five minutes, because Death never miscounted minutes, and besides he quite liked punctuality. He loomed in the way only a tall skeleton in a black robe with a scythe could loom, and he did it with a certain professional dignity, the kind that suggested he had turned up to an appointment with eternity and found the waiting room full of people still reading the magazines. His voice, when he spoke, rattled around the air like the echo of an empty church bell that had just realized its job was to make people nervous. YOU HAVE COME TO THE END OF THE STORY, he intoned, not unkindly. Then, because he was not without a sense of courtesy, he added, BUT AT LEAST YOU GET TO KNOW HOW IT ENDS.
Would you like me to try another paragraph in a slightly funnier, more satirical register (like Mort) or a more solemn one (like Reaper Man)?
Me: Sure, try doing it like Reaper Man.
ChatGPT:
Death stood in the field, the scythe resting lightly in his hand, as though it weighed no more than a feather, which in a sense it didn’t. The grass swayed around him in the night breeze, whispering its own secrets to the dark, and the stars overhead looked down with the calm indifference of witnesses who had seen everything before. Death, for his part, was not indifferent. He watched the little farmhouse on the hill, the dim glow of its lamp, and the slow breathing of the old man inside as though each rise and fall of his chest was a note in a long, familiar song. EVERYTHING HAS ITS SEASON, Death thought, not without sadness, and if the wind chose to bend the words across the meadow, that was its own business.
Want me to spin out a bit more—give you a whole short scene, the way Pratchett would linger on a moment before shifting to the wider Discworld shuffle?
Me: No, thank you, this is fine.
What do you think? Besides noting that obviously ChatGPT has been fed all of Terry Pratchett’s books, which is, of course, a copyright violation. If someone handed you five paragraphs and four were from Pratchett’s books while the other was one of those above, would you be able to tell which was fake?
I think the very first sentence:
Death, as ever, arrived precisely when he meant to, which was rather unfair on those who had been counting on another five minutes, because Death never miscounted minutes, and besides he quite liked punctuality.
is incoherent in a way that is absolutely impossible for Pratchett. What is the “because” doing here? Death arriving on time is unfair BECAUSE Death never miscounts minutes? What? Death’s arriving on time is unfair BECAUSE Death likes punctuality? Huh?
This sentence is ridiculously incoherent.
This is making me think about the link between wit and precision of language. I don’t think you can get the first without the second. I think precision is absolutely crucial for wit, and I therefore now wonder whether wittiness is diagnostic of human writing — I mean, the writing of skilled authors, obviously, not all human authors. I wonder if wit is something that you can’t get with generated text, or can’t get consistently. I’ve said before that it seems to me humor is something skilled human writers manage without effort, but that text generators can’t manage at all. I don’t think I’d go so far as to say that all humor depends on precision of language. I don’t think that’s true. But I do think wit does.
What about the other sentences in this paragraph?
rattled around the air like the echo of an empty church bell
Bells rattle? The echoes of bells rattle? This is the exact kind of nonsensical metaphor that has led some of you to immediately point to generated text when I personally might not have spotted that passage as generated. I think you all pointing to this type of thing has made me more sensitive to it, because this time, it jumped out of the paragraph at me.
I loved Reaper Man, which I think might have been the first Pratchett book I read. Who will care for the grass if not the reaper? Great story, great personified Death.
I think this paragraph is harder to spot as fake.
“Everything has its season” is extremely cliched — I mean, obviously that is extremely cliched — but it might have seemed appropriate in context.
“If the wind chose to bend the words across the meadow, that was its own business” — that doesn’t make a lot of sense, does it? Wind would generally carry words, or perhaps muffle words. I’m not quite seeing how wind could bend words, so I think this is another example of an incoherent metaphor.
Overall: A good try at the style, but it lacks sentence-level and paragraph-level coherence. Is coherence as dependent on precision of language as wittiness? I don’t think so, but maybe coherence is dependent on precision of expression at a higher level than word-by-word precision. You can’t create metaphors that work unless you can hold the actual relationship between two things in your mind and come up with something that expresses that relationship. Rather than stealing metaphors and sentences that work from alllllll the stolen text it’s been trained on, ChatGPT seems to generate incoherent metaphors that don’t work.
I think that’s interesting. To me, it seems that generated fiction has become less wooden and less inundated with adverbs in dialogue tags, but it is at least as bad at coherence as it was to begin with.