AI Flash Fiction Turing Tests
Two years ago, Marc Lawrence did his first AI Flash Fiction Turing Test.
This is flash fiction because it was so obvious that AI couldn’t possibly fool anybody at longer lengths. You can click through and see what you think. Remember this is flash fiction, so it’s emphasizing the sort of thing AI might in theory be able to generate. All the real flash fiction was written by experienced authors.
You can read each one and immediately vote on how good it is and whether it was generated or real. Marc provides a graphic showing that lots of people have trouble telling whether these particular flash fiction entries are real or not. I'm reasonably capable of understanding graphs, but I studied this one for a bit and couldn't quite figure it out. However, I did correctly identify all but one as human vs AI, and I didn't think most entries were especially hard …
… but here is where I should add that I’m not sure what would have happened if I hadn’t been doing various posts with 4 real novel opening paragraphs and one generated, and another such post here, or the beginning of chapter 3 for eight real novels and one fake, and so on — probably a dozen or so posts like this altogether, each with thoughts about what gives away the generated one (if anything). I think this has taught me to look for various tells, plus I know that when presented with a “can you tell?” post of this kind, I am biased to think certain kinds of bad human writing might be generated.
Things that give away real vs generated entries: obvious grammatical errors or punctuation errors make an entry look like a person wrote it, though not necessarily in a good way. Forgetting closing quotes is a human thing, not a thing you see in generated text. Ditto for comma splices, usually. Weird metaphors are, in my opinion, so far the quickest and most reliable way to spot generated fiction. The one I got wrong was human-written but included a metaphor I thought might look weird. I’d tell you which one, but I don’t want to bias your own responses. I found myself reading each entry just until I hit a metaphor that looked generated and then voting and moving on. For most of these entries, I read the whole thing only for the human-written ones.
I only thought one of the entries was actually good, and that one was human-written by an author I've heard of, though I've never read anything of hers (T. Frohock); I thought her entry was obviously good and obviously human-written rather than generated. All the other entries, I thought, were passable at best. I'm biased, though, because I don't much like most short fiction, and of course flash fiction is as short as it gets.
***
Then Marc did the same thing just a few weeks ago, and here is that link.
Once again, I missed one; once again, I thought one that was human-written looked generated. These were all generated by ChatGPT-5 and I do think there were fewer obviously nutty metaphors — but there still were some, and they honestly stop the eye once you realize you should be thinking, “Does that make sense?” every time you see the word “like.”
Other tells: cliched description and cliched or overdone dialogue tags and reactions. The human-written entry I thought might be AI generated had those exact issues, in my opinion, and sorry if the author is well known; that's still what I think.
The graphic is easier to understand this time, so that’s good. I see the one I was wrong about has opinions that divide right down the middle. I wonder what other people thought suggested that one was AI generated.
Marc Lawrence’s own story is OBVIOUSLY human generated, and you know what, I wonder what would happen if you asked authors not to write flash fiction, but to write flash fiction that is OBVIOUSLY human generated. I wonder if the results would be different if the authors thought consciously, “What can AI not do?” and tried to do that.
I would suggest: Wit, precision of language, humor, cleverness, and unexpectedness. Those are things that the AI entries just do not have. And neither do some of the human-written stories. And that’s one reason it can be hard to tell.
Marc says he’s somewhat disheartened that so many people couldn’t tell which were which and that the highest-rated stories were generated. He acknowledges that flash fiction is not nearly the same thing as a novel, but even so, I think there’s less reason for him to feel that way than he may think. I suspect that if all the people who read these flash fiction pieces and voted on whether they were generated had read my various blog posts contrasting generated paragraphs with real paragraphs and discussing the tells, then that alone would constitute enough practice to greatly improve everyone’s ability to tell the difference.
I bet if you all — those of you who have been interested enough to read some of those posts — click through and vote, you’ll do much better than his average respondents. You can try it and see; that’s why I avoided any spoilers.
The post AI Flash Fiction Turing Tests appeared first on Rachel Neumeier.