Goodreads Librarians Group discussion

Authors with no books, etc. script

message 1: by rivka, Librarian Moderator (new)

rivka | 42039 comments Mod
Ran for the first time yesterday and deleted ~1 million "authors" with no books, quotes, fans, interviews, videos, blogs, etc.

The current plan is to run it again monthly.


message 2: by Paula (new)

Paula (paulaan) | 7027 comments Fab!


message 3: by Soul (new)

Soul (soulkeeper720) | 36 comments ;) Junk data! great.


message 4: by Vicky (new)

Vicky (librovert) | 2459 comments I already <3 this script! haha


message 5: by Keith (new)

Keith (kgf0) | 306 comments Ah, that might explain what happened to some author entries I worked very hard to correct which are now gone (along with any record of my previous edits in my edit history, which is now showing 19K edits where last week it showed 22K, which probably means about 50-100 hours of work lost). Hopefully, this will be a one-time problem resulting from the combination of losing Amazon, importing Ingram & Worldcat, then running the cleanup script.

For example, I suspect that the former entry for Tibetan author Jamyang Khentse Chokyi Lodro disappeared through a combination of events: the Amazon data was removed, then imports overwrote the author name with every conceivable variant transliteration (all of which I had previously spent several days merging under the formal disambiguated name). That left nothing in this record but the detailed author description, which the script then ignored, deleting the record.

So I have two requests:

1) We've been repeatedly assured that import data will not overwrite librarian edits. This is not true; many of my author name edits, especially among Tibetan authors, have been overwritten, often by very corrupt data (though mostly just adding honorifics like Chogyal, Rinpoche, Reverend, etc. back into the name field). If it is possible to tighten this up, I would really appreciate it—repeatedly researching the same authors in a niche field is beginning to get tedious and annoying, especially for the obscure ones that the publishers don't know how to treat either.

2) In addition to no books, quotes, etc., could we have the script ignore authors that have descriptions? If that's too tight, maybe we can differentiate real from bogus/pointless descriptions by setting a minimum length of, say, 256 characters.
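
A minimal sketch of the rule proposed in request 2, for illustration only. It is not the actual cleanup script; the `Author` fields, the `is_deletable` name, and the choice to count only books, quotes, and fans are all assumptions made for the example. Only the 256-character threshold comes from the request itself.

```python
from dataclasses import dataclass

# Threshold suggested in request 2; everything else here is hypothetical.
MIN_DESCRIPTION_LENGTH = 256


@dataclass
class Author:
    """Hypothetical stand-in for an author record."""
    name: str
    book_count: int = 0
    quote_count: int = 0
    fan_count: int = 0
    description: str = ""


def is_deletable(author: Author,
                 min_description_length: int = MIN_DESCRIPTION_LENGTH) -> bool:
    """Return True only if the author has no activity AND no substantial description."""
    has_activity = (
        author.book_count > 0
        or author.quote_count > 0
        or author.fan_count > 0
    )
    # Ignore surrounding whitespace so a padded, near-empty description
    # does not count as a real one.
    has_real_description = len(author.description.strip()) >= min_description_length
    return not has_activity and not has_real_description
```

Under this sketch, a record with no books but a long description would be kept, while a record with only a few words of description would still be removed.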

Regardless of the outcome, thank you for considering these requests.


message 6: by rivka, Librarian Moderator (new)

rivka | 42039 comments Mod
1) I think the problem here has to do (among other things) with how we log author name changes. They are often not reflected on the BOOK's log, just the AUTHOR's log. Thus they don't appear to be librarian-supplied data on the book. (This may or may not be fixable; I'm discussing it with others.)

2) That seems reasonable. I'll ask for that to be added to the script.


message 7: by Sandra (new)

Sandra | 23216 comments Keith, if you look you'll find that all librarians lost edits, most were the inflated edits from when the script to save books was run, so not really any overall loss in numbers.

But I agree that the librarian edits have been overridden when they shouldn't have been.


message 8: by rivka, Librarian Moderator (new)

rivka | 42039 comments Mod
Sandra wrote: "Keith, if you look you'll find that all librarians lost edits, most were the inflated edits from when the script to save books was run, so not really any overall loss in numbers."

Oh, right. Forgot about that -- Brian ran a script to get rid of all those "blank" edits that reflected no actual changes. Later undoing an edit wouldn't actually affect a librarian's edit count, I don't think.


message 9: by Keith (new)

Keith (kgf0) | 306 comments rivka wrote: "2) That seems reasonable. I'll ask for that to be added to the script."

Thanks.

Sandra wrote: "Keith, if you look you'll find that all librarians lost edits, most were the inflated edits from when the script to save books was run, so not really any overall loss in numbers."

Yeah, I figured I probably wasn't the only one; just thought this might be the reason, but Rivka seems to have better information, unsurprisingly.


message 10: by rivka, Librarian Moderator (new)

rivka | 42039 comments Mod
#1 is probably not fixable. If the system can't tell that a piece of information came from a user, it won't prevent an import from overriding it. There are a few cases where that is going to happen, unfortunately. (We're looking at them some more, and I don't understand the technical details, but we're probably going to have to live with some of these.)
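
A rough sketch of the behavior described here and in message 6, for illustration only. It is not how the real import works; `BookRecord`, `librarian_edited`, and `apply_import` are hypothetical names. The assumption is that an import skips a field only when the book's own log shows a librarian edit for it, so an author-name change recorded only on the author's log is not protected.

```python
from dataclasses import dataclass, field


@dataclass
class BookRecord:
    """Hypothetical book record."""
    fields: dict
    # Field names that the BOOK's log attributes to a librarian (assumed model).
    librarian_edited: set = field(default_factory=set)


def apply_import(book: BookRecord, incoming: dict) -> list:
    """Apply imported values, skipping fields known to be librarian-edited.

    Returns the names of the fields that were overwritten.
    """
    overwritten = []
    for name, value in incoming.items():
        if name in book.librarian_edited:
            continue  # protected: the book's log shows a librarian edit
        if book.fields.get(name) != value:
            book.fields[name] = value
            overwritten.append(name)
    return overwritten


# The title edit is on the book's log; the author-name edit is not.
book = BookRecord(
    fields={"title": "Corrected Title", "author_name": "Clean Name"},
    librarian_edited={"title"},
)
changed = apply_import(
    book, {"title": "Raw Title", "author_name": "Rev. Clean Name"}
)
```

In this model the title survives the import, while the author name is overwritten because nothing on the book's log marks it as librarian-supplied.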

#2 is under discussion but I think we'll likely get the change (or something similar) made before the next run.


message 11: by Keith (new)

Keith (kgf0) | 306 comments I think if we get #2, then #1 will be easier to live without. Please let us know, if and when it is known, what edits the system cannot tie to a user, so I/we can ensure that we work around the limitation when possible. (E.g., if I know that an author-merge is going to leave vulnerable data, I might be more inclined to edit every edition of every book tied to the "wrong" author individually before merging, or at least one edition of each work to ensure something remains correct just in case. That's a lot of work, but it's less work than finding and remerging them all every month.)

And thanks again for keeping the ball rolling.

