Leonard Richardson's Blog, page 17
February 19, 2014
Constellation Games Bonus Story Ebooks
February 6, 2014
Writing Aliens

The two site-specific installations that I hinted at earlier were custom scripts displaying variants on Ebooks Brilhantes and Hapax Hegemon. The text corpus comes from a scrape of everything linked to from Free Speculative Fiction Online. The software is a version of Bruce, heavily modified a) to stream data from a flat text file and create the slides on the fly, instead of trying to load 20,000 slides into memory at once; and b) when restarted after a crash/shutdown, to skip the appropriate number of slides and pick up where it would have been if it had been running continually.
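If you're curious, the resume-after-crash trick is just arithmetic on elapsed time. Here's a minimal sketch of the idea (not the actual Bruce patch; the file name, slide duration, and start time are stand-ins):

    import time

    def stream_slides(path):
        """Yield one slide per line from a flat text file, looping forever."""
        while True:
            with open(path) as corpus:
                for line in corpus:
                    yield line.strip()

    def slides_to_skip(start_time, seconds_per_slide, total_slides):
        """How far a restarted display should jump so it lands where it would
        have been if it had never stopped."""
        elapsed = time.time() - start_time
        return int(elapsed // seconds_per_slide) % total_slides

    # Hypothetical usage: skip ahead, then start displaying.
    # slides = stream_slides("foolscap_corpus.txt")
    # for _ in range(slides_to_skip(INSTALL_TIME, 30, 20000)):
    #     next(slides)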
Unfortunately I never got a picture of both displays running side-by-side; if you have such a picture, I'd really appreciate it if you could send it to me.
Just after I set up the ebooks display, I met Greg Bear, who was at Foolscap running a writing workshop. We walked over to the screen and I explained the project to him. He said "I'd better not be in there." AT THAT MOMENT the screen was showing the quote "We zoomed down eleven" from this free sample of Blood Music. It was pretty awkward.
February 4, 2014
January Film Roundup
The Hobbit: The Desolation of Smaug (2013): Or as my ticket stub calls it, HOBBIT 2. I love my now-tradition of watching the Hobbit movies with my sister Susanna, but I'm a little disappointed in this one. The thing I loved most about the first movie (dramatization of the totally canonical gaiden in which Gandalf hunts down the Necromancer) was combined with the thing I disliked most (the elevation of a throwaway character to Big Bad status, in a story that already features a frickin' dragon plus the Middle-Earth equivalent of the Crimean War). This made me suspect that the details of the Gandalf B-plot were left vague in the book for a reason.
Plus, terrible confusing action sequences all the time. The one at the end made me think that not only has Peter Jackson been playing too much Minecraft, he's the guy who wants minecarts to work like boats in lava. It was also unnecessary, since the plot of the book at that point would work just fine as the end of the second movie in a trilogy. I can only blame Hollywood meddling and hope for the best.
The good news is that we have now stretched out the story enough that the third film contains all of The Hobbit's canonical action set-pieces. But that's really an argument for making two movies, not three. Or four, as I over-enthusiastically suggested last time.
Smaug was great. I don't see a lot of movies with dragons, and I suspect such movies' dragon effects are generally lacking, because lots of people are really going ape about Smaug whereas I was thinking "yes, good, solid talking dragon implementation." The same thing happened with Gollum in the LotR movies. I guess I don't care enough about dragons in general. They're like dinosaurs... that don't exist!
Insta-update: After writing that, I listened to the episode of "The Dork Forest" with Tolkien expert Corey Olsen. It didn't change my mind on anything, but it did remind me of all the changes the filmmakers made that improved on the book, or at least made a better movie than a straight adaptation of the book would have. Especially the love triangle, the splitting up of the party to establish a POV in Laketown, the early introduction of the arrow on the mantelpiece, and all the work done to differentiate between twelve characters who are nearly identical in the book.
Yeah, only one film! Because I was travelling all month. I couldn't even count Future Love Drug, a short film made by my fellow Foolscap GoH Brooks Peck, because I came in late and only saw the last minute of the film.
I don't know if the film roundups will continue in 2014. On the one hand, I'm going to try to see, or at least review, fewer films in 2014 so I can do more reading. On the other hand, I love taking fiction apart to see how it works, and reviewing books the way I've been reviewing movies is a good way to make professional enemies. Whereas nobody cares what I say about film. So who knows?
January 27, 2014
The Crummy.com Review of Things 2013
The big one was RESTful Web APIs, a radical reimplementation of RESTful Web Services that takes the lessons of the last seven years into account. My accompanying talk is the time-travel extravaganza, "LCODC$SSU and the coming automated web" (see commentary from outside the framing device). And after the book came out we released the predecessor book under CC-BY-NC-ND.
I didn't finish writing Situation Normal but I got pretty close; I'll finish it this year and hopefully sell it.
Autonomous agent mania! I achieved a measure of fame (for Rob) with Real Human Praise, the bot whose 20,000 remaining followers prove that most people don't use Twitter the way I do. (Here's a behind-the-scenes.)
But I'm most proud of Ebooks Brilhantes, the bot that proves there's a better way to make *_ebooks bots: by reverse-engineering the actual @horse_ebooks algorithm instead of being lazy and using Markov chains.
Honorable mentions to the lovely Smooth Unicode and the ribald Dada Limericks. In non-bots, there's Apo11o ll and In Dialogue. And my explanation of comedy ethics for computer programmers, "Bots Should Punch Up".
The big NYCB posts of 2013 were my film roundups, which I really like as writing (I mean, check out the review of Norman Mailer v Fun City, USA), but which are ultimately not standalone pieces of prose. They're my impressions of the films, impressions I will be condensing into the "Film" section below.
Here's the best of the remainder:
I remembered my friend and occasional crummy.com liveblogger Aaron Swartz.
I solved the mystery of "Spacewar!"'s title.
I exposed Billy Collins's poetry hacks.
I feel like no one appreciates my Loaded Dice updates, but I like them so I'll at least keep collecting the data.
Now let's take a brief look at contributions from the not-me community:
Literature: The category that suffered the most from 2013's focus on film. I didn't read that much, and my writing is slowing down because of it. This is a strange alchemy that I can't explain, but I'm pretty sure other writers will recognize it. Anyway, I've got some new books I'm excited about so I'll get back on this in 2014.
For 2013 I'll give the nod to Marty Goldberg and Curt Vendel's Atari Inc.: Business is Fun, a book that... well... this review is pretty accurate, but the book has a lot of good technical and business information, plus many unverifiable anecdotes. It seems I read nothing in 2013 that I can wholeheartedly recommend without reservation... except Tina Fey's Bossypants, I guess... yes! In a late-paragraph update, Bossypants has taken the award! Wait, what's this? In a shocking upset, the ant has taken it from Bossypants! Yes, the ant is back, and out for blood!
Games: 2013 was the year I finally learned the mechanical skill of shuffling cards. Maybe this doesn't seem like a big deal to you, but I've been trying to figure this out for most of my life.
The crummy.com Board Game of the Year is "Snake Oil", a game about fulfilling user stories with lies and shoddy products. The Video Game of the Year? Man, I dunno. I'm playing computer games a little more than in 2013, but still not that many. "Starbound" is really cool, and is probably the closest I'll get to being able to play "Terraria" on Linux.
Audio: As I mentioned, I'm travelling, and away from the big XML file that contains my podcast subscriptions, so I'll fill this in later, but there's not a lot new here. But I can tell you the Crummy.com Podcast of the Year: Mike "History of Rome" Duncan's new podcast, Revolutions. The first season, covering the English Revolution, just wrapped up, so it's a good time to get into the podcast.
Hat tip to Jackie Kashian's The Dork Forest. Probably not going to have to update this one, actually.
Film: Ah, here's the big one. As I mentioned earlier, I saw 85 feature films in 2013. By amount of money I spent, the best film of the year was Gravity, which I dropped about $40 on. But by any other criteria, it wasn't even close! Well, it was close enough to get Gravity onto my top twelve, which I present now. I consider all of these absolute must-watches.
The General (1926)
Nashville (1975)
Ishtar (1987)
Ball of Fire (1941)
Calculated Movements (1985)
The World's End (2013)
No No Nooky TV (1987)
Gravity (2013)
The Godfather (1972)
Cotton Comes to Harlem (1970)
Gentlemen Prefer Blondes (1953)
No (2012)
As you can tell, only films I saw for the first time in 2013 are eligible; we call this the "The Big Lebowski rule".
There was no movie that really changed my aesthetic sense this year, the way Celine and Julie Go Boating did last year, but Nashville gave me insight into managing a large ensemble cast. Hat tip to Fahrenheit 451 for getting me to understand why I keep lining up for French New Wave films even though they keep pulling the football away from me.
I still don't feel like I know that much about film. I treat films like they're books. I'm not that interested in what people do with the cameras. I have no idea what the names of actors are. I find the prospect of making a film quite tedious. They're fun to watch though.
For the record, here's my must-see list from 2012, which I didn't spell out last time:
Celine and Julie Go Boating (1974)
Brazil (1985)
A New Leaf (1971)
All About Eve (1950)
The Whole Town's Talking (1953)
Shadow of a Doubt (1943)
Paper Moon (1973)
Marathon Man (1976)
Okay, I think that's enough. Nobody reads these things until the centennial anyway.
January 24, 2014
One week to Foolscap!
Also featured at the con will be (I think I've mentioned this before) two continuous SF/F text installations I've created to astound you. This exhibit WILL NOT BE REPEATED, unless someone asks for it at another con. So if you're in the Seattle area, sign up or just show up the day of, and you'll get to hang out with me, and the other honored guest, museum curator/SyFy monster movie screenwriter Brooks Peck.
January 7, 2014
The Bots of 2014
But for now, I have two new bots to entertain you, the general public. The Hapax Hegemon (@HapaxHegemon) posts words that occur only once in the Project Gutenberg corpus I've been getting so much mileage out of. So far it's emitted such gems as "zoy", "stupidlike", and "beer-swipers". And like so many of my recent bots, it won't stop until we're all dead.
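Finding hapax legomena is a one-liner over a word count. A minimal sketch (assuming you already have the corpus as a flat list of tokens, which glosses over all the real preprocessing work):

    from collections import Counter

    def hapax_legomena(tokens):
        """Return the words that occur exactly once in the corpus."""
        counts = Counter(tokens)
        return sorted(word for word, count in counts.items() if count == 1)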
My second new bot is the Serial Entrepreneur (@ItCantFail), which posts inventions. It's basically playing Snake Oil (spoiler: Crummy.com 2013 Board Game of the Year) with a much larger corpus, derived from the Corpus of Historical American English and the Scribblenauts word list.
So far my favorite @ItCantFail inventions are the delicious Fox Syrup, the liberal-friendly Left Drone, and the self-explanatory Riot College. Write in with your own wacky inventions! I won't use them, because that's not how this bot works, but it seems like a fun way to kill some time.
More bots are on the way! But not for a while, because I gotta do novel work and get the Foolscap-exclusive bots in shape.
January 2, 2014
December Film Roundup
I'll tackle the "best of" topic in a general 2013 wrap-up later on. For now, here's a look at December's cinematic adventures:
The Kids Are All Right (2010): This was a fun family dramedy that never went for the cop-out solution. I liked that it presented sexual orientation as a spectrum rather than a binary. Also, Mark Ruffalo looks just like Rob Dubbin. Someone should look into this.
The Marriage of Maria Braun (1979): Pretty exciting tale of a dame who sets out to be the brassiest of any dame in postwar Germany. There was a murder that I found pretty distressing, and the ending was a huge cop-out, but in the category of "random foreign film seen at the museum" I'd say it was above average.
The American soldiers in this film are clearly played by German actors. One of them speaks British English with a fake American accent. It was really, really weird.
The Big Combo (1955): How can I not love a noir in which the detective is named "Leonard Diamond"? I don't know how, but I don't love this. Richard Conte is excellent as the crime boss Mr. Brown, and there are a couple great bits involving the chief henchman's hearing aid. Also Lee Van Cleef as half of a gay henchman couple. But overall this was just a noir popcorn movie for me--good, but nothing special.
Down By Law (1986): I went into this not knowing what to expect. I'd never seen a Jim Jarmusch film before [checks IMDB to avoid repeat of "Robert Altman" fiasco], and at first I was unimpressed by the way this movie dripped with sleaze and stereotypes and shiftless losers. I mean, I like Tom Waits songs, but you won't see me standing in line to see "Tom Waits Song: The Movie."
But then the shiftless losers get thrown in jail, and the movie a) radically changes direction and b) really takes off. The tight confines of the jail cell are the crucible that forges Down By Law into a tight ball of character humor and callback-based jokes. It becomes a Marx Brothers movie written by Samuel Beckett, in which Groucho and Zeppo vie endlessly, pointlessly for supremacy, spurred onward by a combination Harpo/Chico. I can't recommend the second act of this movie enough. The third act is not quite as good, but what the hell, I'm feeling generous.
Manos: The Hands of Felt (2013): I saw this at a party. I guess it counts as a movie? It was a filmed play, but a lot of early films were effectively filmed plays.
This is a puppet adaptation of Manos using Avenue Q-style Muppets (i.e. the puppeteers are not hidden and the puppets are not the official licensed Muppets). It was all right. They added a meta-narrative that recontextualized Manos as a found-footage movie depicting the process of its own filming. Which I don't like conceptually but it kept it from getting boring, as a completely faithful adaptation would have been.
The film was edited the same way as the original Manos, with the same abrupt transitions. (Okay, yeah, it's definitely a movie, not just a filmed play.) It was hard to resist the temptation to riff Felt using the original Manos MST3K riffs.
The puppet design was very good! I want to mention two things I thought were really clever. The teenage couple who make out in their car during the entirety of Manos are depicted by a joined Bert-and-Ernie puppet with two operators. (You can see a photo here.) And in the middle of the film, the "dancing wives of Manos" scene was performed as a The Muppet Show-style "At The Dance" sketch.
Beyond Expectations (2013): Sorry, I've got to backfill this one because watching Manos reminded me of this other Kickstarter-funded film Sumana and I watched back in October. This is a documentary on The Phantom Tollbooth, a book that Sumana and I both adore. I want to say this film was "for hard-core fans only", but we're hard-core fans and we were a bit disappointed. We wanted more details about the creation of the book, and we felt this (very short) film focused too much on trying to sell the book's cultural importance to the unconverted. Interviewees rambled on about irrelevant topics and the editor didn't cut away to something more interesting.
Admittedly, the two main interviewees were Norton Juster and Jules Feiffer, and hearing them ramble on about irrelevant topics appealed greatly to us. It's a delicate balance, and I'm not saying I could have edited the film any better, but I don't think it did justice to the source material. Great animated sequences, though.
Children of Men (2006): Super good. It has all the same problems as Gravity (highly driven by coincidence, very predictable action-movie pacing) but also a ton of spectacle. And this movie has a plot. Yeah, I don't really have much to say about this one. It's great. The exposition could be done better.
Lola (1961): At this point I know how it goes with 1960s French films, and I wasn't expecting anything from Lola except some nice visuals, which it delivered. But it also delivered some fun farce and a brief moment of excitement when it seemed like it was going to turn into a crime movie. (It doesn't.)
Unlike the American soldiers in The Marriage of Maria Braun, the American sailor in Lola is actually played by an American, Alan Scott. It's weird, though: his French sounds just like an American speaking French, but his English sounds more like a French person faking an American accent.
Funniest line: "Learn your geography! There are no sailors in Chicago! Only gangsters!"
The Bletchley Circle (2013): British TV series. A genius premise (bored, oppressed women in postwar London use their wartime codebreaking training to hunt down a serial killer) is ill-served by the plot, in which the killer is continually revealed to be more and more clever. He has to be; otherwise he'd be no match for the sleuths, and the series, already short even by British TV standards, would be over. To the point where in the final episode he's got out-and-out superpowers, like the once-mythical Mallory. Well, maybe they got it out of their system; I'll watch the second series when it comes out.
The Godfather (1972): According to IMDB this is the second-greatest film of all time. Do I dare to be so conventional as to agree? I don't know, but I will say this is a hell of a movie. It flawlessly pulls off nearly everything it tries to do. (Notably, it does not try to have any female characters.) It's almost 3 hours long and I was only bored for a couple minutes total.
I know less about film criticism than I do about film, so I don't know how deeply this aspect of The Godfather has been explored, but the character progression was really the thing that caught my attention. The movie starts with a milquetoast undertaker asking Vito Corleone for a favor. He's terrified, because Vito Corleone is terrifying and ruthless. Everyone's afraid of him. The fact that he's polite and soft-spoken just adds to the terror. By contrast, Michael is the good guy, the "civilian", the son whose hands are clean.
Then Vito gets shot, and Sonny takes over. Sonny is a psychopath, and he's dumb, and the combination makes for a terrible crime boss. Sonny makes a lot of bad decisions and ends up getting himself killed. And then comes the turn. Vito Corleone calls in the favor he granted the milquetoast undertaker in the very first scene.
Because I was born after The Godfather came out, I came in to this movie aware of the general character of the titular Godfather. As such, this is the scene I've been dreading. How is this poor guy going to be compromised? But I'd read Vito Corleone all wrong. He doesn't compromise people for fun. He's a professional. And right now he really needs an undertaker. He needs his machine-gunned son to look presentable at the funeral. That's the favor.
And then Michael takes over the family, and it turns out that Michael isn't the good guy at all. Michael actually is the man I'd been assuming his father was. It's the "eaten by a bigger fish" trick I mentioned in my Constellation Games commentary, and I love it.
Interesting fact I'm not sure what to do with: The Godfather, The Marriage of Maria Braun, and The Bletchley Circle all cover the same time period.
Emmet Otter's Jug-Band Christmas (1979): If you believe IMDB ratings, this film is almost in the same league as Fanny and Alexander, the made-for-TV Christmas movie the museum showed last year. I disagree! This is dull. I only liked a couple of the songs. The plot is the plot of an above-average children's book. Most Muppet stuff aimed at kids has something for the adults as well, but this did not. It doesn't help that this is the thing I saw after The Godfather.
The heavy use of water and Muppet-sized "outdoor" sets was very impressive technically. I liked the fish Muppet who had to be dragged around everywhere in a tank of water. I also enjoyed the outtakes they showed after the feature, including an interminable series of takes in which an attempt to film the behavior of a chaotically moving object goes endlessly awry. I laughed harder at that than I did at anything in the film.
I'm planning on seeing a lot of movies in 2014, but I don't know if I'm going to write these detailed reviews of each one. It takes a long time to get my thoughts in order and write it down, and, as you'll see when I write the year-end roundup, it really eats into the time I spend enjoying other media. So until next time, I'll see you at the movies! (If you are Sumana, Hal, or Babs.)
December 16, 2013
Markov vs. Queneau: Sentence Assembly Smackdown
Markov wins when the structure is complex
I got the original idea for this post when generating the fake ads for @pony_strategies. My corpus is the titles of about 50,000 spammy-sounding ebooks, and this was the first time I did a head-to-head Markov/Queneau comparison. Here are ten of Markov's entries, using the Markov chain implementation I ended up adding to olipy:
At Gas Pump!
The Guy's Guide To The Atkins Diet
Home Internet Business In The World.
101 Ways to Sharpen Your Memory
SEO Relationship Building for Beginners
Gary Secrets - Project Management Made Easy!
Weight Success
How get HER - Even If It's Just Money, So Easy and Effective Treatment Options
Sams Yourself
Define, With, Defeat! How To Get Traffic To Your Health
The Markov entries can get a little wacky ("Define, With, Defeat!"), which is good. But about half could be real titles without seeming weird at all, which is also good.
By contrast, here are ten of Queneau's entries:
Adsense I Collection Profits: The bottom Guide Income!
Reliable Your Earning Estate Develop Home And to life Fly Using Don't Your Partnership to Death
Help the Your Causes, Successfully Business Vegetarian
Connect New New Cooking
1 Tips, Me Life Starting to Simple Ultimate On Wills How Years Online With Living
How Practice Health Best w/ Beauty
Amazing Future & Codes Astrology to Definitive Green Carbs, Children Methods JV Engine Dollars And Effective Beginning Minutes NEW!
I and - Gems Secrets Making Life Today!
Succeeding For Inspiring Life
Fast Survival Baby (Health Loss) Really How other of Look Symptoms, Your Business Encouragement: drive Health to Get with Easy Guide
At their very best ("Succeeding For Inspiring Life", "How Practice Health Best w/ Beauty"), these read like the work of a non-native English speaker. But most of them are way out there. They make no sense at all or they sound like a space alien wrote them to deal with space alien concerns. Sometimes this is what you want in your generated text! But usually not.
A Queneau assembler assumes that every string in its corpus follows the same grammar, with different tokens filling the same slots. This isn't really true for spammy ebook titles, and it certainly isn't true for English sentences in general. A sentence is made up of words, sure, but there's nothing special about the fourth word in a sentence, the way there is about the fourth line of a limerick.
A Markov chain assumes nothing about higher-level grammar. Instead, it assumes that surprises are rare, that the last few tokens are a good predictor of the next token. This is true for English sentences, and it's especially true for spammy ebook titles.
Markov chains don't need to bother with the overall structure of a sentence. They focus on the transitions between words, which can be modelled probabilistically. (And the good ones do treat the first and last tokens specially.)
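To make the comparison concrete, here's a minimal sketch of both techniques. This is not the olipy implementation, just the core ideas: a word-level Queneau assembler, and a word-level Markov chain that uses sentinel tokens to give the first and last words special treatment.

    import random
    from collections import defaultdict

    def queneau_assemble(titles):
        """Queneau assembly: the Nth word of the output is drawn from the
        pool of Nth words across the whole corpus."""
        tokenized = [title.split() for title in titles]
        length = len(random.choice(tokenized))  # borrow one title's length
        output = []
        for i in range(length):
            candidates = [tokens[i] for tokens in tokenized if len(tokens) > i]
            output.append(random.choice(candidates))
        return " ".join(output)

    def markov_generate(titles):
        """Markov chain: each word is drawn from the words that followed the
        previous word somewhere in the corpus."""
        START, END = object(), object()
        followers = defaultdict(list)
        for title in titles:
            tokens = [START] + title.split() + [END]
            for a, b in zip(tokens, tokens[1:]):
                followers[a].append(b)
        output, current = [], START
        while True:
            current = random.choice(followers[current])
            if current is END:
                return " ".join(output)
            output.append(current)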
Markov wins when the corpus is large, Queneau when the corpus is tiny
Consider what happens to the two algorithms as the corpus grows in size. Markov chains get more believable, because the second word in a title is almost always a word commonly associated with the first word in the title. Queneau assemblies get wackier, because the second word in a title can be anything that was the second word in any title.
I have a corpus of 50,000 spammy titles. What if I chose a random sample of ten titles, and used those ten titles to construct a new title via Queneau assembly? This would make it more likely that the title's structure would hint at the structure of one or two of the source titles.
This is what I did in Board Game Dadaist, one of my first Queneau experiments. I pick a small number of board games and generate everything from that limited subset, increasing the odds that the result will make some kind of twisted sense.
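In terms of the queneau_assemble sketch above, the trick is just this (all_titles is a hypothetical list of the full corpus):

    import random

    # Queneau assembly over a tiny random sample, Board Game Dadaist-style:
    # the output's structure can only echo one of ten titles, not 50,000.
    sample = random.sample(all_titles, 10)
    print(queneau_assemble(sample))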
If you run a Markov chain on a very small corpus, you'll probably just reproduce one of your input strings. But Queneau assembly works fine on a tiny corpus. I ran Queneau assembly ten times on ten samples from the spammy ebook titles, and here are the results:
Beekeeping by Keep Grants
Lose to Audience Business to to Your Backlink Physicists Environment
HOT of Recruit Internet Because Financial the Memories
Senior Guide Way! Business Way!
Discover Can Power Successful Life How Steps
Metal Lazy, Advice
Insiders Came Warts Weapons Revealed
101 Secrets & THE Joint Health Than of Using Marketing! Using Using More Imagine
Top **How Own 101**
Multiple Spiritual Dynamite to Body - To Days
These are still really wacky, but they're better than when Queneau was choosing from 50,000 titles each time. For the @pony_strategies project, I still prefer the Markov chains.
Queneau wins when the outputs are short
Let's put spammy ebook titles to the side and move on to board game titles, a field where I think Queneau assembly is the clear winner. My corpus here is about 65,000 board game titles, gathered from BoardGameGeek. The key to what you're about to see is that the median length of a board game title is three words, versus nine words for a spammy ebook title.
Here are some of Markov's board game titles:
Pointe Hoc
Thieves the Pacific
Illuminati Set 3
Amazing Trivia Game
Mini Game
Meet Presidents
Regatta: Game that the Government Played
King the Rock
Round 3-D Stand Up Game
Cat Mice or Holes and Traps
A lot of these sound like real board games, but that's no longer a good thing. These are generic and boring. There are no surprises because the whole premise of Markov chains is that surprises are rare.
Here's Queneau:
The Gravitas
Risk: Tiles
SESSION Pigs
Yengo Edition Deadly Mat
Ubongo: Fulda-Spiel
Shantu Game Weltwunder Right
Black Polsce Stars: Nostrum
Peanut Basketball
The Tactics: Reh
Velvet Dos Centauri
Most of these are great! Board game names need to be catchy, so you want surprises. And short strings have highly ambiguous grammar anyway, so you don't get the "written by an alien" effect.
Conclusion
You know that I've been down on Markov chains for years, and you also know why: they rely on, and magnify, the predictability of their input. Markov chains turn creative prose into duckspeak. Whereas Queneau assembly simulates (or at least stimulates) creativity by manufacturing absurd juxtapositions.
The downside of Queneau is that if you can't model the underlying structure with code, the juxtapositions tend to be too absurd to use. And it's really difficult to model natural-language prose with code.
So here's my three-step meta-algorithm for deciding what to do with a corpus:
If the items in your corpus follow a simple structure, code up that structure and go with Queneau.
If the structure is too complex to be represented by a simple program (probably because it involves natural-language grammar), and you really need the output to be grammatical, go with Markov.
Otherwise, write up a crude approximation of the complex structure, and go with Queneau.
December 4, 2013
Secrets of (people's responses to) @horse_ebooks—revealed!
This let me prove one of my hypotheses about the secret to *_ebooks-style comedy gold. I also disproved one of my hypotheses re: comedy gold, and came up with an improved hypothesis that works much better. Using these as heuristics I was able to make @pony_strategies come up with more of what humans consider the good stuff.
Timing
The timing of @horse_ebooks posts formed a normal distribution with mean of 3 hours and a standard deviation of 1 hour. Looking at ads alone, the situation was similar: a normal distribution with mean of 15 hours and standard deviation of 2 hours. This is pretty impressive consistency since Jacob Bakkila says he was posting @horse_ebooks tweets by hand. (No wonder he wanted to stop it!)
My setup is much different: I wrote a cheap scheduler that approximates a normal distribution and runs every fifteen minutes to see if it's time to post something.
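Roughly like this, as a sketch; the numbers mirror the quote cadence measured above, and the cron-style check is an assumption, not my actual code:

    import random
    import time

    MEAN_GAP = 3 * 60 * 60      # 3 hours, like @horse_ebooks' quotes
    STDDEV_GAP = 1 * 60 * 60    # 1 hour

    def next_post_time(now=None):
        """Sample a roughly normally distributed gap until the next post."""
        now = time.time() if now is None else now
        return now + max(0, random.gauss(MEAN_GAP, STDDEV_GAP))

    # Run every fifteen minutes (e.g. from cron):
    # if time.time() >= saved_next_post_time:
    #     post_something()
    #     saved_next_post_time = next_post_time()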
Beyond this point, my analysis excludes the ads and focuses exclusively on the quotes. Nobody actually liked the ads.
Length
The median length of a @horse_ebooks quote is 50 characters. Quotes shorter than the median were significantly more popular, but very long quotes were also more popular than quotes in the middle of the distribution.
Capitalization
I think that title case quotes (e.g. "Demand Furniture") are funnier than others. Does the public agree? For each quote, I checked whether the last word of the quote was capitalized.
43% of @horse_ebooks quotes end with a capitalized word. The median number of retweets for those quotes was 310, versus 235 for quotes with an uncapitalized last word. The public agrees with me. Title-case tweets are a little less common, but significantly more popular.
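The check itself is trivial; a sketch:

    def ends_with_capitalized_word(quote):
        """True if the quote's last word starts with a capital letter."""
        words = quote.split()
        return bool(words) and words[-1][:1].isupper()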
The punchword
Since the last word of a joke is the most important, I decided to take a more detailed look at each quote's last word. My favorite @horse_ebooks tweets are the ones that cut off in the middle of a sentence, so I anticipated that I would see a lot of quotes that ended with boring words like "the".
I applied part-of-speech tagging to the last word of each quote and grouped the quotes together by tag. Nouns were the most common by far, followed by verbs of various kinds, determiners ("the", "this", "neither"), adjectives and adverbs.
I then sorted the list of parts of speech by the median number of retweets a @horse_ebooks quote got if it ended with that part of speech. Nouns and verbs were not only the most common, they were the most popular. (Median retweets for any kind of noun was over 300; verbs ranged from 191 retweets to 295, depending on the tense of the verb.) Adjectives underperformed relative to their frequency, except for comparative adjectives like "more", which overperformed.
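In code, the grouping looks something like the sketch below. I'm not saying which tagger I actually used; this assumes NLTK's off-the-shelf part-of-speech tagger and a list of (quote, retweet_count) pairs:

    import statistics
    from collections import defaultdict

    import nltk  # assumes the default POS tagger data is installed

    def punchword_stats(quotes):
        """Group quotes by the part of speech of their last word and report
        the median retweet count per tag, most popular first."""
        by_tag = defaultdict(list)
        for text, retweets in quotes:
            words = text.split()
            if not words:
                continue
            last_word_tag = nltk.pos_tag(words)[-1][1]  # tag the last word in context
            by_tag[last_word_tag].append(retweets)
        return sorted(
            ((tag, statistics.median(counts)) for tag, counts in by_tag.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )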
I was right in thinking that quotes ending with a determiner or other boring word were very common, but they were also incredibly unpopular. The most popular among these were quotes that repeated gibberish over and over, e.g. "ORONGLY DGAGREE DISAGREE NO G G NO G G G G G G NO G G NEIEHER AGREE NOR DGAGREE O O O no O O no O O no O O no neither neither neither". A quote like "of events get you the" did very poorly. (By late-era @horse_ebooks standards, anyway.)
It's funny when you interrupt a noun
I pondered the mystery of the unpopular quotes and came up with a new hypothesis. People don't like interrupted sentences per se; they like interrupted noun phrases. Specifically, they like it when a noun phrase is truncated to a normal noun. Here are a few @horse_ebooks quotes that were extremely popular:
Don t worry if you are not computer
Don t feel stupid and doomed forever just because you failed on a science
You constantly misplace your house
I have completely eliminated your meal
Clearly "computer", "science", "house", "and "meal" were originally modifying some other noun, but when the sentence was truncated they became standalone nouns. Therefore, humor.
How can I test my hypothesis without access to the original texts from which @horse_ebooks takes its quotes? I don't have any automatic way to distinguish a truncated noun phrase from an ordinary noun. But I can see how many of the @horse_ebooks quotes end with a complete noun phrase. Then I can compare how well a quote does if it ends with a noun phrase, versus a noun that's not part of a noun phrase.
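Here's the flavor of check I mean, as a sketch. The chunk grammar is a textbook noun-phrase pattern fed to NLTK's RegexpParser, which is an assumption on my part, not a record of what I actually ran:

    import nltk

    # Crude noun-phrase pattern: optional determiner, adjectives, then nouns.
    CHUNKER = nltk.RegexpParser(r"NP: {<DT>?<JJ.*>*<NN.*>+}")

    def ending_category(quote):
        """Classify a quote as ending in a multi-word noun phrase, a
        standalone noun, or something else."""
        tagged = nltk.pos_tag(quote.split())
        if not tagged:
            return "other"
        last_chunk = CHUNKER.parse(tagged)[-1]
        if isinstance(last_chunk, nltk.Tree) and last_chunk.label() == "NP":
            return "noun phrase" if len(last_chunk) > 1 else "standalone noun"
        return "other"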
About 4.5% of the total @horse_ebooks quotes end in complete noun phrases. This is comparable to what I saw in the data I generated for @pony_strategies. I compared the popularity of quotes that ended in complete noun phrases, versus quotes that ended in standalone nouns.
Quote ends in        Median number of retweets
Standalone noun      330
Noun phrase          260
Other                216
So a standalone noun does better than a noun phrase, which does better than a non-noun. This confirms my hypothesis that truncating a noun phrase makes a quote funnier, so long as what's left after the truncation is itself a noun. But a quote that ends in a complete noun phrase will still be more popular than one that ends with anything other than a noun.
Conclusion
At the time I did this research, I had about 2.5 million potential quotes taken from the Project Gutenberg DVD. I was looking for ways to rank these quotes and whittle them down to, say, the top ten percent. I used the techniques that I mentioned in my previous post for this, but I also used quote length, capitalization, and punchword part-of-speech to rank the quotes. I also looked for quotes that ended in complete noun phrases, and if truncating the noun phrase left me with a noun, most of the time I would go ahead and truncate the phrase. (For variety's sake, I didn't do this all the time.)
This stuff is currently not in olipy; I ran my filters and raters on the much smaller dataset I'd acquired from the DVD. There's no reason why these things couldn't go into olipy as part of the ebooks.py module, but it's going to be a while. I shouldn't be making bots at all; I have to finish Situation Normal.
@pony_strategies
Unlike @horse_ebooks, @pony_strategies will not abruptly stop publishing fun stuff, or turn out to be a cheesy tie-in trying to get you interested in some other project. It is a cheesy tie-in to some other project (Constellation Games), but you go into the relationship knowing this fact, and the connection is very subtle.
When explaining this project to people as I worked on it, I was astounded that many of them didn't know what @horse_ebooks was. But that just proves I inhabit a bubble in which fakey software has outsized significance. So a brief introduction:
@horse_ebooks was a spambot created by a Russian named Alexei Kouznetsov. It posted Twitter ads for crappy ebooks, some of which (but not all, or even most) were about horses. Its major innovative feature was its text generation algorithm for the things it would say between ads.
Are you ready? The amazing algorithm was this: @horse_ebooks ripped strings more or less randomly from the crappy ebooks it was selling and presented them with absolutely no context.
Trust me, this is groundbreaking. I'm sure this technique had been tried before, but @horse_ebooks was the first to make it popular. And it's great! Truncating a sentence in the right place generates some pretty funny stuff. Here are four consecutive @horse_ebooks tweets:
Not only that, but whether you believe it (or want to believe it) the car salesmen will continue to laugh
Demand Furniture
Including simplified four part arrangements for the novice student and
Just look at everything that I am going
There was a tribute comic and everything.
I say @horse_ebooks "was" a spambot because in 2011 the Twitter account was acquired by two Americans, Jacob Bakkila and Thomas Bender, who took it over and started running it not to sell crappy ebooks, but to promote their Alternate Reality Game. This fact was revealed back in September 2013, and once the men behind the mask were revealed, @horse_ebooks stopped posting.
The whole conceit of @horse_ebooks was that there was no active creative process, just a dumb algorithm. But in reality Bakkila was "impersonating" the original algorithm—most likely curating its output so that you only saw the good stuff. No one likes to be played for a sucker, and when the true purpose of @horse_ebooks was revealed, folks felt betrayed.
As it happens, the question of whether it's artistically valid to curate the output of an algorithm is a major bone of contention in the ongoing Vorticism/Futurism-esque feud between Adam Parrish and myself. He is dead set against it; I think it makes sense if you are using an algorithm as the input into another creative process, or if your sole object is to entertain. We both agree that it's a little sketchy if you have 200,000 fans whose fandom is predicated on the belief that they're reading the raw output of an algorithm. On the other hand, if you follow an ebook spammer on Twitter, you get up with fleas. I think that's how the saying goes.
In any event, the fan comics ceased when @horse_ebooks did. There was a lot of chin-stroking and art-denial and in general the reaction was strongly negative. But that's not the end of the story.
You see, the death of @horse_ebooks led to an outpouring of imitation *_ebooks bots on various topics. (This had been happening before, actually.) As these bots were announced, I swore silent vengeance on each and every one of them. Why? Because those bots didn't use the awesome @horse_ebooks algorithm! Most of them used Markov chains, that most hated technique, to generate their text. It was as if the @horse_ebooks algorithm itself had been discredited by the revelation that two guys from New York were manually curating its output. (Confused reports that those guys had "written" the @horse_ebooks tweets didn't help matters--they implied that there was no algorithm at all and that the text was original.)
But there was hope. A single bot escaped my pronouncements of vengeance: Adam's excellent @zzt_ebooks. That is a great bot which you should follow, and it uses an approximation of the real @horse_ebooks algorithm:
The corpus is word-wrapped at 35 characters per line.
Pick a line to use as the first part of a tweet.
If (random), append the next line onto the current line.
Repeat until (random) is false or the line is as large as a tweet can get.
And here are four consecutive quotes from @zzt_ebooks:
SHAPIRO: Ouch! SHAPIRO: Shapiro cares not! SHAPIRO: Hooray!
things, but I saw some originality in it. The art was very simple, but it was good
You're tackled by the opponent!
Gender: Male Height: 5'9" Pilot? Yes Ph.D.? Yes
Works great.
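As a rough sketch of those four steps (the 35-character wrap comes from the description above; the coin-flip probability and the length cap are my assumptions):

    import random
    import textwrap

    def ebooks_quote(text, width=35, max_length=140, keep_going=0.5):
        """Word-wrap the corpus, pick a line, and keep appending the following
        lines while a coin flip says yes and the result still fits in a tweet."""
        lines = textwrap.wrap(text, width)
        i = random.randrange(len(lines))
        quote = lines[i]
        while i + 1 < len(lines) and random.random() < keep_going:
            candidate = quote + " " + lines[i + 1]
            if len(candidate) > max_length:
                break
            quote, i = candidate, i + 1
        return quote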
The ultimate genesis of @pony_strategies was this conversation I had with Adam about @zzt_ebooks. Recently my anger with *_ebooks bots reached the point where I decided to add a real *_ebooks algorithm to olipy to encourage people to use it. Of course I'd need a demo bot to show off the algorithm...
The @pony_strategies bot has sixty years worth of content loaded into it. I extracted the content from the same Project Gutenberg DVD I used to revive @everybrendan. There's a lot more where that came from--I ended up choosing about 0.0001% of the possibilities found in the DVD.
I have not manually curated the PG quotes and I have no idea what the bot is about to post. But the dataset is the result of a lot of algorithmic curation. I focused on technical books, science books and cookbooks--the closest PG equivalents to the crap that @horse_ebooks was selling. I applied a language filter to get rid of old-timey racial slurs. I privileged lines that were the beginnings of sentences over lines that were the middle of sentences. I eliminated lines that were boring (e.g. composed entirely of super-common English words).
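The filters themselves are nothing exotic. Here's a sketch of the flavor of thing I mean; the stopword list, the scoring, and the thresholds are stand-ins rather than my actual rules:

    SUPER_COMMON = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
                    "you", "that", "was", "for", "on", "with"}

    def keep_line(line):
        """Drop lines that are boring: empty, or nothing but super-common words."""
        words = [word.strip(".,;:!?").lower() for word in line.split()]
        return bool(words) and not all(word in SUPER_COMMON for word in words)

    def score_line(line):
        """Privilege lines that look like the beginning of a sentence."""
        return 1 if line[:1].isupper() else 0

    # Hypothetical usage:
    # candidates = [line for line in raw_lines if keep_line(line)]
    # candidates.sort(key=score_line, reverse=True)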
I also did some research into what distinguished funny, popular @horse_ebooks tweets from tweets that were not funny and less popular. Instead of trying to precisely reverse-engineer an algorithm that had a human at one end, I tried to figure out which outputs of the process gave results people liked, and focused my algorithm on delivering more of those. I'll post my findings in a separate post because this is getting way too long. Suffice to say that I'll pit the output of my program against the curated @horse_ebooks feed any day. Such as today, and every day for the next sixty years.
Like its counterpart in our universe, @pony_strategies doesn't just post quotes: it also posts ads for ebooks. Some of these books are strategy guides for the "Pôneis Brilhantes" series described in Constellation Games, but the others have randomly generated titles. Funny story: they're generated using Markov chains! Yes, when you have a corpus of really generic-sounding stuff and you want to make fun of how generic it sounds by generating more generic-sounding stuff, Markov chains give the best result. But do you really want to have that on your resume, Markov chains? "Successfully posed as unimaginative writer." Way to go, man.
Anyway, @pony_strategies. It's funny quotes, it's fake ads, it's an algorithm you can use in your own projects. Use it!