What elements give away generated prose?
When I started doing posts comparing excerpts from real novels with ChatGPT's output, such as this one and this one, and, a while ago, this one, and more recently this one and this one, I noticed the following:
A) Bad dialogue tags.
By “bad,” I mean first, too many adverbs, and adverbs used in clumsy and silly ways. I also mean cliched dialogue tags. His heart pounds, her eyes narrow: this sort of thing can be fine if used with exceptional smoothness, but these phrases leap out at the reader as complete cliches if the smoothness is less than exceptional. ChatGPT gravitates to the most cliched of all cliched dialogue tags, and it does this so often and so consistently that it’s a major tell.
I also mean a category of dialogue tags that I’ve started to call “trying too hard, getting silly.” I mean dialogue like this, from an example of supposedly good dialogue that is in fact bad:
“John, are you listening?” Mary fidgeted, her heart aching at the way he was ignoring her.
“What? Oh, sure. Why are you wondering if I love you? Of course I do—how could you think such a thing?” John went back to reading his book, his brows furrowed in concentration. Mary waited for more, but he said nothing else. Then his face brightened. “Hey, what’s for dessert?”
“Chocolate cake—your favorite.” She played with her apron strings, then, with clenched teeth, she threw the apron to the ground. “I’ll go get you a piece.”
These tags are overdone and therefore obtrusive and silly. And the thing that really struck me at the time was how much this looks like ChatGPT dialogue. Here, for comparison, is a sample of ChatGPT dialogue. I modified it a tiny bit to smooth it out because I want to look at just this specific problem, not other problems.
Alex gently brushes a strand of hair behind Jamie’s ear, their eyes locking. “You know, even on the toughest days, just seeing you makes everything feel right.”
Jamie smiles, her heart racing as she leans closer. “I feel the same. It’s like you’re my safe place in a world full of chaos.”
Alex takes Jamie’s hands, fingers intertwining, and whispers, “I never want to leave that safe place, not now, not ever.”
Here is another example, pulled from one of the earlier posts linked above. In this case, I had ChatGPT continue a story by feeding in a few paragraphs and saying, “Continue the story.” Then, in my blog post, I asked, “Can you tell where ChatGPT took over, and if you could, what gave it away?” And what gave it away to me was ChatGPT “trying too hard, getting silly” with dialogue tags. Here’s the example that showcases this the best. The first paragraph here is real, then the next two are ChatGPT, and this is OBVIOUS:
“Sure, but a teenager who never got over his son, and was still in love with his wife.” Theo turned his head over his shoulder, looking back, but not at anything in the room. “When Dave missed a few days at work – well, you can guess the rest. He’d collapsed in his kitchen. One of his neighbors called me with the bad news.”
Greta reached across the table, placing a comforting hand on Theo’s. The weight of Dave’s story lingered in the air, a somber reminder of life’s unpredictable turns.
“I’m so sorry, Theo,” Greta said softly, her eyes reflecting empathy. “Losing a friend, especially one who carried such heavy grief, is never easy.”
The shift from real to generated is instant and obvious. All the tags are overdone, cliched, obtrusive, and silly. These tags call attention to themselves — that’s what I mean by obtrusive. It’s hard to read past or through them without noticing them. It’s hard to read them without SEEING the author trying too hard. Any reader even moderately aware of writing craft is probably going to think, and ought to think, “Oh, this is someone who was told to show, not tell, and so here she is, trying super, super hard to show everything. She’s trying too hard and it’s getting silly.”
So this is a tell for generated fiction OR for bad writing, not just a tell for generated dialogue, which is what I initially thought. The overuse of silly adverbs in dialogue might be more of a tell, but it’s famously a feature of bad writing as well, so it might not be a way to distinguish between bad (but real) fiction and generated fiction.
I realize you all already know that I don’t have a problem with adverbs personally. Since I’ve said that emphatically a lot of times, I thought I’d look for something from another author on the topic, not the sort of author who pounds the pulpit and shouts NEVER USE ADVERBS, but someone who is substantially more in agreement with me about this. Even though I haven’t actually liked the books I’ve tried by Jeff VanderMeer, I’ve seen him on Twitter or Facebook or somewhere declaring forcefully that adverbs are fine, so I went and looked for a quote of his about this, and here it is:
One reason that a beginning writer should be wary of adverbs and heroic speeches, as emblematic of all kinds of other elements in fiction, is that the beginning writer is more likely to botch the uses of both—or to misunderstand the context in which they work. For beginning writers, although I try not to be absolute myself, saying “I’m going to cut a third of all adverbs from my stories” is a useful mechanical way of learning how they work—a banishment that suggests an eventual return from exile at the right time. The advanced writer isn’t as likely to fall for advice that’s absolute, and also knows better than to make general pronouncements about anything—and, hopefully, has enough mastery to make things like heroic speeches work, in part because you’re hopefully always thinking “yes, but” when considering any admonition about any aspect of writing.
I rather like this, because I do think that suggestions such as “cut a third of all your adverbs” can work the same way as suggestions like “cut one sentence per page.” These aren’t suggestions to follow. They’re ways of building awareness of what adverbs are for, or of what the words are doing on the page and which words seem dispensable and why. Of course I also like the implicit statement that absolute advice is bad advice, and that general pronouncements are likely to be wrong.
But back to the actual topic. Besides overdone dialogue tags with or without overused adverbs, what else gives away generated text?
B) Super cliched, overdone reactions.
This is often seen in combination with dialogue tags, but it can occur separately, during exposition.
My jaw dropped as the radiant silhouette of a man steadily gained definition and solidity.
Too stunned to make a sound, I remained motionless by the pool.
I could only gape at the spot in the treeline where he had vanished, my mind reeling.
Lena’s eyes softened with gratitude.
On and on, there are zillions of lines like this in a lot of generated fiction. On the other hand, if you specifically ask ChatGPT to generate something with a literary style, you can get this:
The morning after the funeral, I awoke with that peculiar weight behind the eyes that follows too much thinking and not enough sleep. Outside, the Georgia sky had turned the color of old tin, low and expressionless. The air was thick with the early onset of June heat, and somewhere beyond the back porch, a mourning dove called out like it had something important to say and no one left to say it to.
I moved through the house barefoot, the pine floors cool against my soles. Mama’s house—mine now, I supposed, though the thought still landed strange in my mouth—was quiet except for the rhythmic tick of the old Seth Thomas clock on the mantel. It ticked the same way it had my entire childhood, steady as judgment. The smell of coffee had not yet touched the air. That was telling.
It had been three days since they pulled Robert’s body from the Oconee, his skin gone blue as river stone, and still no one had come forward with anything worth believing. Sheriff Cade had asked his questions, clipped and perfunctory, like he already knew the answers and didn’t care for the details. I knew better. Cade never did like details; they had a way of unraveling his version of things.
I stood at the kitchen sink and looked out over the back field, now nothing but scorched weeds and bent fence posts. In the distance, a figure moved—a slow gait, almost hesitant, like he was waiting for someone to wave him off. I squinted against the light and caught the glint of a badge.
And nothing here has that unbearably cliched feel to it. Do you wonder whether ChatGPT searched for text tagged with “fiction” and “mystery” and “georgia” and “literary” and then stole whole sentences from somewhere? Because I certainly do wonder that.
Someone (SarahZ) in the comments to one of the recent posts about voice linked above said, The plagiarism of the training materials bothers me way more, meaning way more than the prospect that authors writing terrible fiction might have trouble competing with generated stories. I totally agree. I was very much not okay with finding that Meta stole all my books, and I mean all of them; traditionally published or self-published, they’re all in that stolen database. At the link, an author can click through to a search tool to find out whether Meta stole their books, and to a cease-and-desist letter to send to Meta. That will of course do nothing in the short term, but I certainly wish the Authors Guild the best of luck with their various lawsuits against the theft of intellectual property for AI training. Especially since I joined the Authors Guild last year, so more power to them. But my point is, “Cade never did like details; they had a way of unraveling his version of things” looks a lot like it might have been lifted whole from some real work. Don’t you think so? Then the selection gets incoherent about characterization, because “almost hesitant, like he was waiting for someone to wave him off” doesn’t fit the character suggested by the previous sentences.
C) Weird analogies and figures of speech.
Outside, the Georgia sky had turned the color of old tin, low and expressionless.
The thought still landed strange in my mouth.
The hum of the station was constant now—like a heartbeat no one wanted to acknowledge.
Europa’s ice stretched in every direction like a frozen ocean caught mid-roar.
D) Never being creative with punctuation or grammar.
I mean almost never rather than actually never, because I have in fact seen (very occasionally) the use of nonstandard punctuation or grammar. In particular, current text generators seem able to use occasional fragments, particularly if they’re asked to write “in a literary style.” But that’s relatively rare. Text generators almost always prefer the most ordinary grammar and punctuation: sentences of average length rather than unusually short or long, ordinary syntax and word choice. This is one of the characteristics that makes generated text “seem” or “feel” artificial.
***
None of this has to do with the overall impossibility (and I do think it is probably impossible) of a text generator sticking to the story. I mean creating a coherent plot, keeping characters consistent, keeping details consistent, infusing a theme into a novel-length work. I can’t see how a text generator could do any of that, considering it has no actual thought process or ability to judge anything except the probability that one word will follow another.
***
Recently, a student handed me a paper to review and I said, as gently as possible: “If I were your instructor, I would think this was generated and I would first run it through every AI detector I could find and then I would look at the history of the document, and if your instructor is requiring you to enable track changes, that’s a giveaway. Here, I’m highlighting all the sentences that strongly suggest this is generated text.”
Then I highlighted a lot of the text of the paper according to a single different criterion that has nothing to do with anything above:
E) Implausibly professional for the student in question.
Let’s look at an example of generated text for a typical student paper topic:
ChatGPT, please write two paragraphs about the impact of modern technology on creating opportunities for musicians.
ChatGPT:
Modern technology has dramatically transformed the music industry, creating unprecedented opportunities for musicians worldwide. Digital platforms like Spotify, YouTube, and TikTok allow artists to distribute their music instantly to global audiences, bypassing traditional gatekeepers such as record labels and radio stations. This democratization of music distribution means that artists from any background can gain exposure and build fanbases independently. Additionally, social media enables direct engagement with listeners, allowing musicians to cultivate loyal communities and receive immediate feedback on their work.
Beyond distribution and promotion, technology has also revolutionized music creation itself. Affordable home studio equipment and powerful digital audio workstations (DAWs) have made high-quality production accessible to virtually anyone with a computer. Collaboration tools and cloud-based software enable artists to work together remotely across continents. These advancements not only reduce the cost of music production but also encourage innovation by giving musicians access to a vast array of virtual instruments, effects, and sound libraries. As a result, modern technology continues to level the playing field, empowering more people to turn their musical passions into viable careers.
Everything boldfaced is text I would say is unusual to extremely unusual in student writing. Almost every sentence, plus the overall coherence, is unusual. Students really struggle with transitions. This two-paragraph mini-essay uses an echo transition, which is all very well (I specifically teach students to use transitions of this exact type), but until I explain how to use echo transitions, students almost never use them. The transitions within the paragraphs (beyond, not only but also, as a result) are rare in student papers. (Don’t get me started on how dismal the writing ability of students is today.)
Fundamentally, none of the text generated by ChatGPT sounds like student writing. It sounds like something you’d see in an editorial or a pretty well-written Substack essay. It’s shallower than a good Substack essay, not as focused in topic as a good editorial; it tries to introduce and conclude the topic too fast; there’s nothing personal in it, as might be expected in a student paper; and overall it just does not sound like a student wrote it. I’ve seen a (small) number of students who could write well, and this doesn’t sound like their essays either, though it’s closer.
As a side note: When this specific student dropped his paper into Turnitin, as required by the class instructor, the sentences I highlighted all turned up as “plagiarized.” AND, the paper was on the topic above, and it was basically identical to what ChatGPT just handed me. Some of the sentences are so similar, they’re almost word-for-word: “Digital platforms like Spotify, YouTube, and TikTok allow artists to distribute their music instantly to global audiences, bypassing traditional gatekeepers such as record labels and radio stations.” The chance that a student will use the word “bypass” this way is low. If the student is not great at writing, the chance is zero.
***
I haven’t a great eye for detecting generated text “in a literary style,” but people more familiar with literary fiction seem to have much more facility with this. Remember this post?
The morning after the funeral, I awoke with that peculiar weight behind the eyes that follows too much thinking and not enough sleep. Outside, the Georgia sky had turned the color of old tin, low and expressionless. The air was thick with the early onset of June heat, and somewhere beyond the back porch, a mourning dove called out like it had something important to say and no one left to say it to.
I moved through the house barefoot, the pine floors cool against my soles. Mama’s house—mine now, I supposed, though the thought still landed strange in my mouth—was quiet except for the rhythmic tick of the old Seth Thomas clock on the mantel. It ticked the same way it had my entire childhood, steady as judgment. The smell of coffee had not yet touched the air. That was telling.
It had been three days since they pulled Robert’s body from the Oconee, his skin gone blue as river stone, and still no one had come forward with anything worth believing. Sheriff Cade had asked his questions, clipped and perfunctory, like he already knew the answers and didn’t care for the details. I knew better. Cade never did like details; they had a way of unraveling his version of things.
Some of you caught that this was generated and lots of you didn’t. I’m almost sure I wouldn’t have. But someone I know who has a lot more experience with literary fiction than I do pointed to it immediately, despite the (many) selections from pretty bad novels that were offered as distractors.
This does remind me of the AI Art Turing Test offered by Astral Codex Ten, and in particular how Elaine T’s Teen immediately perceived fakeness in a way I just could not until specific fake elements were pointed out.
The tables are uniformly lit as if from above, not by the warm light from the cafe windows or from the lamp and sunrise. The condiment pots on them have highlights but cast no shadows. The shadows of the tables are cast by an overhead source of some kind. Chairs don’t all have shadows. The foreground lamp post doesn’t have any shadow at all, not from any of the assorted potential sources; it ought to be falling either foreground left or foreground right. And so on: a detailed explanation from the Teen, and none of this was obvious to me until it was all painstakingly pointed out.
This, I think, is something like looking at generated text. A student generates it and thinks it sounds great. I look at it and think, “Definitely not student writing and it looks generated.” Except I’m still learning how to pick out the elements that make me say “looks generated,” while I’m pretty secure with being able to say, “Doesn’t look like student writing.”
The post What elements give away generated prose? appeared first on Rachel Neumeier.