Leonard Richardson's Blog, page 7
January 11, 2020
Leonard's Excursions 2019
Early in the year I took my first trip to Chicago, for DPLAFest. I stayed with Beth and we did some fun tourist things, like the Chicago Architecture Center boat tour! Accept no substitutes! Or do, it's probably okay. But the CAC tour was great.
We also hit the Art Institute of Chicago, which was a real highlight, since Beth is a fine artist who went there all the time as a kid and talked about her favorite pieces. Here are a few of my favorites, which I'll share with you via the medium of website links rather than my own awkward photos.
The Artist Looks at Nature by Charles Sheeler. This is the piece hung directly to the right of American Gothic. Beat the lines!
Cow Relieving Itself by Nicolaes Berchem the Elder. Nuff said.
Robot by Alexandra Exter. Absolutely incredible, especially considering it's from 1926!
i\��.. by Jacqueline Humphries. The Smooth Unicode of fine art!
Eviscerated Corpse by Mike Kelley, the work that made my 14-year-old mind stop in its tracks at LACMA and understand contemporary art.
They've also got the old floor of the Chicago Stock Exchange off in a corner! A corner I guess they use for events, since I don't think the Chicago Stock Exchange originally had a grand piano on the floor. Some live music would have really classed up the joint, though, I tell you what.
In May, my sisters came to New York and surprised me with a weekend of tourist activities and a fancy dinner!
For my birthday we planned a getaway in upstate New York at a rented house with a few friends. Allison and I did some stargazing and saw a little bit of a meteor shower. Shout out to Rodgers Book Barn, the perfect mix of "peaceful rural atmosphere" and "huge used bookstore". Thanks to Zack and Pam for driving.
Allison and I went to a Manfred Mohr retrospective at a gallery. Never heard of him before but it was definitely art the two of us can agree on. I really liked his plotter-esque pictures from the 70s and 80s, such as P2400-297d_5225__black. The names of the artworks feel like program filenames; I was expecting a bunch of _final_FINAL.
PS: in June, Sumana and I randomly ate dinner at Copinette, a French restaurant on the former site of Copain, the much fancier French restaurant that Gene Hackman stakes out in The French Connection. You live in New York for a while and these odd coincidences become smaller and less common, but they still happen!
The Crummy.com Review Of Things 2019, Part Two: Film
Most of the movies in this year's top ten come from the 1980s, due in large part to Bill Forsyth's dominance of the scoreboard. Sorry to be the person in the YouTube comments on a rock video saying "Wish I had a time machine! I'd go back to the 80s and relive the same ten-year span over and over until I died! Who's with me? haha!"
Wings of Desire (1987)
Knives Out (2019)
Breaking In (1989)
Comfort and Joy (1984)
Face/Off (1997)
Gregory's Girl (1980)
Working Girl (1988)
Puppy Love (1985)
Booksmart (2019)
Sweet Charity (1969)
On a meta level, I love how almost every year my top film of the year has been one I went into without any particular expectations. Keep the surprises coming, I say.
If you only care about recent movies, here's my top list from 2019:
Knives Out (2019)
Apollo 11 (2019)
Booksmart (2019)
Recorder: The Marion Stokes Project (2019)
Born Bone Born (2018)
I snuck Apollo 11 in there even though I saw it on January 5th, because it's just that good. As always, I've updated Film Roundup Roundup to include about thirty recommended films that I either first saw or first reviewed in 2019.
January 10, 2020
The Crummy.com Review Of Things 2019, Part One
Here's our Christmas card photo. I impulsively volunteered to wear the Patience suit for an NYPL photo shoot that I don't think ended up being used for anything? I would not repeat this experience, but I'm glad I did it: I got a taste of what it's like to be the weirdo in Times Square everyone has decided to ignore. So let's start this Review of Things off right, with:
Books
The Crummy.com Books of the Year are the Steerswoman series by Rosemary Kirstein. I can't say enough good about these books: how they're fantasy and science fiction at the same time; how tight the integration is between worldbuilding, character development, and plot; and how varied the pacing is. I'm so glad that the Internet has let the books come out of midlist purgatory, find their audience, and give Kirstein a way to finish the series.
Some other notable books I read in 2019:
Lifelode by Jo Walton (Sumana's recommendation for a Steerswoman readalike)
The Pigeon Tunnel by John le Carre
Minitel: Welcome to the Internet by Julien Mailland and Kevin Driscoll
Sea People: The Puzzle of Polynesia by Christina Thompson
Elements of Surprise: Our Mental Limits and the Satisfactions of Plot by Vera Tobin (Of huge interest to writers but not, according to reactions when I talk about it, to anyone else)
I finished volume 3 of Mark Twain's autobiography, as promised. He's the Twainiest! Also, I recently learned about the incredibly sleazy tactic UC Berkeley used to keep copyright on this book until 2047, when it would have otherwise expired in 2003. The best I can say is that, judging from the contents of the autobiography, Twain himself would have approved.
I've been reading Bleak House for most of the year; it's slow going! But not for the reason I expected: there's a whole other subplot in here that I don't find super engaging.
Games
The Crummy.com Game of the Year is "Cataclysm: Dark Days Ahead", an open world zombie survival game that's also run as a modern open-source project, with pull requests and code review. Not only is this great for keeping gameplay fresh in these kitchen-sink roguelikes where a wealth of detail is really important, it's really good to see on its own. This could be the gaming gateway that gets The Kids interested in software development best practices!
Other fabulous 2019 games I played include "Baba Is You", "Untitled Goose Game", "Dicey Dungeons", and "Super Mario Maker 2".
Writing
I wrote four short stories in 2019: "Meat", "Mandatory Arbitration", "User Error", and "The Scene of the Crime". Three of those stories feature a character who in one of my luckier future timelines becomes my Sherlock Holmes, a character who is remembered long after I and all of my other work have been forgotten. Very positive about this character, is what I'm saying. Really fun to write.
I assembled a NaNoGenMo novel: Linked by Love.
I'm getting much more aggressive this year about placing my fiction, so hopefully we'll see some sales. In terms of novels, there's good news and bad news and for now I'm gonna have to go with a big NO COMMENT.
Bots
I created only one bot this year, Secretly Public Domain, and I made it for a specific activist purpose which is more or less seeing results. As per NYCB passim I had some additional bot ideas, did the fun part of the work, and let the code sit in the programming/2019 folder of my archive.
I decided not to keep Almanac for New Yorkers going in 2020. There's one more year of life in the project, thanks to 1939, and the 1938 almanac for San Francisco, but the project wasn't super popular and 2020 isn't the year. Maybe later.
I do have two "just for fun" bot ideas that I'm gradually seeing through to completion. One of them is going to have to wait until I'm sick or otherwise mentally impaired and have nothing better to do than go through a huge amount of text, but you're gonna love it. And by "you" I mean "Allison".
December 30, 2019
December 23, 2019
Cassini metadata
All the photos taken by the Cassini probe are online, and each photo has associated metadata. By looking for photos taken by the same instrument at evenly spaced times, you can find frames that would look good as an animation. Here's a nice example: a "movie" of Saturn's moon Dione taken on November 1, 2011:
It was really fun to make these animations that probably only a few people before me had seen. And the fun doesn't need to stop at Saturn: the SETI Project's OPUS3 has aggregated imagery from across NASA's outer planet missions.
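A minimal Python sketch of the frame-finding idea: group frames by instrument, sort each group by time, and keep runs where the gap between consecutive frames stays roughly constant. The field names `instrument` and `time` are assumptions for illustration, not the real metadata schema, and the tolerance and minimum run length are arbitrary.

```python
from collections import defaultdict
from datetime import timedelta


def evenly_spaced_runs(frames, tolerance=timedelta(seconds=1), min_length=5):
    """Return runs of frames from one instrument taken at a steady cadence.

    Each frame is a dict with hypothetical keys "instrument" and "time";
    "time" can be a datetime or timedelta, since only differences are used.
    """
    by_instrument = defaultdict(list)
    for frame in frames:
        by_instrument[frame["instrument"]].append(frame)

    runs = []
    for instrument_frames in by_instrument.values():
        instrument_frames.sort(key=lambda f: f["time"])
        run = instrument_frames[:1]
        step = None  # the cadence of the current run, set by its first gap
        for prev, cur in zip(instrument_frames, instrument_frames[1:]):
            gap = cur["time"] - prev["time"]
            if step is None or abs(gap - step) <= tolerance:
                # Same cadence (or the second frame of a run): extend it.
                if step is None:
                    step = gap
                run.append(cur)
            else:
                # Cadence changed: keep the old run if long enough,
                # then start a new run from the last two frames.
                if len(run) >= min_length:
                    runs.append(run)
                run = [prev, cur]
                step = gap
        if len(run) >= min_length:
            runs.append(run)
    return runs
```

Because a new run starts from the frame that ended the previous one, two adjacent runs can share an endpoint.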
The bad news for anyone else who wants to try this out is that the Cassini data is huge. You can download individual frames pretty easily, but the metadata was kept in enormous tarballs and stored in an ad hoc 1990s file format. To provide a booster seat for the future, I converted the metadata into NDJSON format and put it up as cassini-metadata. Here's one more GIF to whet your interest: Saturn's rings on July 15, 2014:
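NDJSON puts one JSON object on each line, so a big file can be processed one record at a time instead of being loaded all at once. A sketch in Python, assuming a hypothetical `instrument` field (check the actual keys in cassini-metadata before relying on it):

```python
import json


def iter_ndjson(lines):
    """Yield one parsed record per non-blank line of NDJSON.

    Works on any iterable of lines, including an open text file, and reads
    lazily so a multi-gigabyte file never has to fit in memory.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as e:
            raise ValueError(f"bad JSON on line {line_number}: {e}") from e


def records_for_instrument(lines, instrument):
    # "instrument" is a hypothetical field name used for illustration.
    return [r for r in iter_ndjson(lines) if r.get("instrument") == instrument]
```

Passing an open file object (opened with `encoding="utf-8"`) to either function streams it line by line.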
December 20, 2019
Openly Public Domain
This is probably the visible result of the work described in "It's No Secret - Millions of Books Are Openly in the Public Domain", the first known blog post to cast shade on one of my bots with its title. I knew Hathi had done a few books as a test, and now it's really ramped up!
Now we're at the point where thirteen of the last twenty books my bot posted are already "Full view". And notably, the computer-history book I mentioned in the Vice story, The compatible time-sharing system: a programmer's guide, is also "Full view"! Way to go.
December 18, 2019
Only g62 Kids Will Remember These Five Moments
November 18, 2019
NaNoGenMo 2019: "Linked by Love"
August 9, 2019
Secretly Public Domain: Update
Topline number is 73%
My original estimate was that 80% of pre-1963 books were not renewed. This was based on a couple of inaccurate assumptions, the big one being that I was counting works originally published in a foreign country. Those works might have lapsed into the public domain at some point, but the US copyright has since been restored by treaty. So their renewal status isn't really relevant.
Of the books where renewal status is relevant, here are the most recent statistics:
73% have no renewal record at all.
19% have a renewal record that's an excellent match.
8% are in a grey area. They have one or more renewal records, but none of them are an excellent match. One of them might be legit, or they might all be renewals for totally different books. They need to be checked manually.
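Here's a toy Python sketch of how registrations could be sorted into those three buckets and tallied, assuming each registration comes with a list of 0-to-1 match scores for its candidate renewals. The 0.9 cutoff for "excellent" is a made-up number, not the one the real library uses.

```python
from collections import Counter


def classify(match_scores, excellent=0.9):
    """Bucket one registration by the scores of its candidate renewals.

    match_scores: floats in [0, 1], one per candidate renewal record;
    an empty list means no renewal record was found at all.
    """
    if not match_scores:
        return "no renewal"
    if max(match_scores) >= excellent:
        return "excellent match"
    return "grey area"  # some candidates, but none good enough to trust


def summarize(registrations):
    """Return the whole-number percentage of registrations in each bucket."""
    counts = Counter(classify(scores) for scores in registrations)
    total = sum(counts.values())
    return {bucket: round(100 * n / total) for bucket, n in counts.items()}
```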
Credits
The "Secretly Public Domain" bot was a publicity stunt to draw attention to the machine-readable registration records. It worked great, but it also drew attention to me, the person doing the publicity stunt, even though I had basically nothing to do with the original work. For the record, here are the people who actually did the work. The project inside NYPL was run by Sean Redmond, Greg Cram, and Josh Hadro (now of IIIF). The work of making the copyright records machine-readable was done by Data Conversion Laboratory.
Buried treasure
Most of the books whose copyright wasn't renewed are really obscure titles, but without looking very hard I found a very well-known science fiction novel that has no renewal record. I'm not mentioning the name, as an incentive to get people to look at the data themselves. It's probably not the only well-known work whose copyright wasn't renewed.
How to make your own list
My original estimate of 80% was based on the quick and dirty script I used to write the Mastodon bot. To fix the "foreign works" problem and to produce a dataset that would stand up to scrutiny, I published a Python library specifically for handling this data. It's got business logic for making determinations like "was this book published in a foreign country" and "how well does this renewal record match this registration record". You run the scripts and at the end you have a bunch of JSON files with consolidated data. If you think there are bad assumptions, you can change the business logic and run the scripts again.
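To give a flavor of what "how well does this renewal record match this registration record" logic can look like, here's a hypothetical Python scoring function. It is not the library's actual code: it compares normalized titles and authors with the standard library's `difflib.SequenceMatcher`, and the 0.7/0.3 weights and the `title`/`author` field names are assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_quality(registration, renewal, title_weight=0.7, author_weight=0.3):
    """Score from 0 to 1; the weights are arbitrary choices for this sketch."""
    return (title_weight * similarity(registration["title"], renewal["title"])
            + author_weight * similarity(registration["author"], renewal["author"]))
```

Changing a weight or the normalization rule and rerunning is the kind of "change the business logic" experiment the scripts are meant to allow.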
How to see the data
There were a number of requests for this data in a tabular form. I totally understand where this is coming from, and it's certainly the easiest way to get into the data, but it's tricky, because converting the JSON to tabular data destroys information that would be useful for taking the next step (see below).
So, I've done the best I can. I added a script to the end of my Python workflow which generates three huge tab-separated files, and I put those files in the cce-spreadsheets project. This should be good for getting an overview of which books were renewed, which weren't, and which are foreign publications.
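A small Python sketch of why flattening loses information: a registration can have several candidate renewals, but a flat row only has room for one. The field names here (`registration_id`, `renewals`, `score`) are hypothetical and don't reflect the actual cce-spreadsheets columns.

```python
import csv
import io


def to_tsv(records):
    """Flatten consolidated records into tab-separated text.

    Only the best-scoring renewal's ID and the number of candidates
    survive; the details of every other candidate are dropped.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["registration_id", "title", "best_renewal_id", "renewal_candidates"])
    for rec in records:
        renewals = rec.get("renewals", [])
        best = max(renewals, key=lambda r: r["score"], default=None)
        writer.writerow([
            rec["registration_id"],
            rec["title"],
            best["id"] if best else "",
            len(renewals),
        ])
    return out.getvalue()
```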
What's next?
Discovering that a book published in 1950 is in the public domain doesn't make a free digitized version of that book automatically appear. Somebody has to do the work. At this point we go from fast data processing to really slow research and digitization work. You or I can now make a near-complete list of unrenewed books in a few minutes, but that list just represents an enormous to-do list for someone.
There are basically three "someones" who might step up here: Project Gutenberg, Hathi Trust, and Internet Archive.
Project Gutenberg
As I mentioned earlier, Project Gutenberg digitized the copyright renewal records some time ago, and they use them all the time. They have a section of their Copyright How-To explaining how to check whether a particular title was renewed, and whether the renewal matters. There are other steps to clear a pre-1963 work: you have to verify that the author lived in the US at the time, stuff like that. The newly digitized registration records can help with some of this, and my data processing script that combines registration and renewal can help with more of it, but there's still some manual work you have to do for each book.
Once that work is done, Project Gutenberg volunteers will locate a copy of the book, scan it, and OCR it (assuming there's no existing scan). Then they'll proofread it and put out HTML and plain-text editions. As you can imagine, this process takes a really long time, but the result is a clean, accurate copy of the book that can be read on its own or reused in other projects. The catch is that somebody has to care enough about a specific book to go through all this trouble.
Hathi Trust
Hathi Trust already has scans of a lot of these 1924-1963 books. They just don't make these scans available to the public, because as far as they know, all these books are still under copyright. If they were convinced otherwise, they'd open up the scans—they opened up almost all of their 1923 stuff this January when the 95-year copyright term finally expired. So we have to make a case for opening up these books.
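The arithmetic behind that January release, as a tiny Python helper: a 95-year term runs through the end of the calendar year in which it expires, so the work enters the public domain on January 1 of the publication year plus 96. This only applies to works whose copyright was kept alive; an unrenewed 1924-1963 book lapsed long ago.

```python
def first_public_domain_year(publication_year, term_years=95):
    """Year on whose January 1 a renewed work's copyright has expired.

    The term covers publication_year + term_years inclusive (through
    December 31), so the work is free starting the following year.
    """
    return publication_year + term_years + 1
```

For a 1923 book this gives 2019, matching the January release described above.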
Earlier, NYPL took the highest-circulating 1924-1963 books in our research collection and checked to see which ones lacked a renewal record. We sent the list to Hathi Trust, and they did their own verification and opened up some of the books: The Americans in Santo Domingo from 1928 is an example. Once Hathi opens up a scan, it's available to the public. It also becomes possible for Gutenberg et al. to turn the raw scan into something more readable.
In the near future, people at NYPL (not me) will be talking to people at Hathi Trust about what kind of evidence is necessary, in general, to convince them that the copyright on a 1924-1963 book has lapsed. Then we'll be able to give them a list of all the books where we can find that kind of evidence. There'll still be a verification process on the Hathi Trust side -- at the very least, they have to go through the book and make sure it doesn't contain unauthorized reprints from other books -- but it should streamline things quite a bit.
Internet Archive
Internet Archive is a wild card here. They scan a lot of books, and I could see them treating the "unrenewed" list as a big list of additional books to scan, but it would be a new undertaking. Making unrenewed works available is something Project Gutenberg volunteers do already, and it's something that Hathi Trust could do relatively easily, but for Internet Archive it's only the sort of thing they might do.
Data problems
That 8% of grey area, where it's not clear whether or not a book was renewed, points to the general difficulty of meshing together two sets of public records published across half a century and digitized by different people. The grey area represents a lot of manual work that has to be done, and of course there's always the fear that a book that seems to be free and clear actually isn't: the title page says "printed in Canada", or the smoking-gun copyright renewal didn't show up because its ID number was typed wrong.
There's going to be a lot of manual work in the process of clearing these books, but there's no reason to wait until everything's perfect to get started. My preference is to cast a very wide net, try to find any renewal that might possibly be related to a registration, and make the grey area as big as possible. We know that a majority of 1924-1963 books will always come up "no renewal", because there are way more registrations than renewals. We can deal with those and then take a closer look at the grey area.
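Casting a wide net can be sketched as an inverted index from significant title words to renewal IDs: any renewal sharing even one word with a registration's title becomes a candidate. This is an illustrative Python sketch with a made-up stopword list and hypothetical `id`/`title` fields, not the approach the actual scripts use.

```python
import re
from collections import defaultdict

# A deliberately tiny, made-up stopword list for illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to"}


def title_words(title):
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def build_index(renewals):
    """Map each significant title word to the IDs of renewals containing it."""
    index = defaultdict(set)
    for ren in renewals:
        for word in title_words(ren["title"]):
            index[word].add(ren["id"])
    return index


def candidate_renewals(registration, index, min_shared_words=1):
    """Renewal IDs sharing at least min_shared_words title words.

    A low threshold widens the net: more false candidates to check by
    hand, but fewer real renewals slip through unnoticed.
    """
    shared = defaultdict(int)
    for word in title_words(registration["title"]):
        for ren_id in index.get(word, ()):
            shared[ren_id] += 1
    return {ren_id for ren_id, n in shared.items() if n >= min_shared_words}
```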
July 22, 2019
Secretly Public Domain
This is how Project Gutenberg is able to publish all these science fiction stories from the 50s and 60s. Those stories were published in issues of magazines that didn't send in the renewal form. But up til now this hasn't been a big factor, because 1) the big publishers generally made sure to send in their renewals, and 2) it's been a big pain to check the renewal for any particular book.
Up through the 1970s, the Library of Congress published a huge series of books listing all the registrations and the renewals. All these tomes have been scanned (Internet Archive has the registration books), but only the renewal information was machine-readable. Checking renewal status for a given book was a tedious job, involving flipping back and forth between a bunch of books in a federal depository library or, more recently, a bunch of browser tabs.
But! A recent NYPL project has paid for the already-digitized registration records to be marked up as XML. (I was not involved, BTW, apart from saying "yes, this would work" four years ago.) Now for anything that's unambiguously a "book", we have a parseable record of its pre-1964 interactions with the Copyright Office: the initial registration and any potential renewal.
The two datasets are in different formats, but a little elbow grease will mesh them up. It turns out that eighty percent of 1924-1963 books never had their copyright renewed. More importantly, with a couple caveats about foreign publication and such, we now know which 80%.
This was announced back in May, but I don't think it got the attention it deserved. This is a really big deal, so I had no choice but to create a bot. Here's Secretly Public Domain, which highlights unrenewed works that have already been scanned for Hathi Trust. This only represents 10% of the 80%, but it's the ten percent most likely to be interesting, and these books have the easiest path towards being available online.
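The "little elbow grease" for meshing the two datasets might look like this in Python, assuming (hypothetically) that each renewal record cites the original registration number in a field called `original_reg_id` and each registration has a `reg_id`. Normalizing the IDs before joining papers over spacing and case differences, though a mistyped digit will still defeat an exact join.

```python
import re


def normalize_reg_id(reg_id):
    """Canonicalize a registration number, e.g. 'a 12345' -> 'A12345'."""
    return re.sub(r"[\s-]", "", reg_id).upper()


def mesh(registrations, renewals):
    """Map each registration's ID to the renewal records that cite it.

    Registrations with no matching renewal map to an empty list.
    """
    renewals_by_reg = {}
    for ren in renewals:
        key = normalize_reg_id(ren["original_reg_id"])
        renewals_by_reg.setdefault(key, []).append(ren)
    return {
        reg["reg_id"]: renewals_by_reg.get(normalize_reg_id(reg["reg_id"]), [])
        for reg in registrations
    }
```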
