Goodreads Librarians Group discussion
Book & Author Page Issues
Warning: GoodReads scripts messing with authors

Michael, I already told Otis about this issue, but I hadn't realized the degree, so I'll update him.

Hard to know how serious it is, but seeing two examples of the same thing 10 minutes apart, without even looking for it, makes me think it might be pretty widespread.
I didn't really think about it much when I saw the odd error on the book with my name on it, but when I saw ♥Eva♥'s post bells went off. I'm currently looking at the global librarian change log. Almost every single edit on the first page is the addition of authors to a book by "GoodReads" and just skimming through the list I can see:
* "Eve Berman Do" added as second author to book by "Eve Berman"
* "Mike Bales" added as second author to book by "Michael C. Bales"
* "Johnson Wales University Staf" added as second author to book by "Johnson & Wales University"
* "Maureen E. Reed" added as second author to book by "Maureen Reed"
* "Bakvis, Herman / Skogstad, Grace Bakvis, Herman / Skogstad, Grace" added as second author to book by "Grace Skogstad"
And those are just from the first 20 or so entries...
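
For what it's worth, the pattern in those examples looks checkable before an author gets added. Here is a rough Python sketch, not anything GoodReads actually runs: the function names `name_tokens` and `is_probable_variant` and both rules are my own assumptions. It flags a candidate second author when one name's words are contained in the other's, or when the two names share a last word and a first initial.

```python
import re


def name_tokens(name: str) -> list[str]:
    """Lowercase a name and split it into letter-only words.

    Punctuation, '&', '/', and repeated spaces are all dropped, so a name
    disambiguated with extra spaces yields the same tokens as the plain one.
    """
    return re.findall(r"[^\W\d_]+", name.lower())


def is_probable_variant(existing: str, candidate: str) -> bool:
    """Heuristic (assumed, not the site's logic): is candidate a variant of existing?

    Rule 1: every word of one name appears in the other.
    Rule 2: same last word and same first initial.
    """
    a, b = name_tokens(existing), name_tokens(candidate)
    if not a or not b:
        return False  # nothing to compare
    set_a, set_b = set(a), set(b)
    if set_a <= set_b or set_b <= set_a:
        return True
    return a[-1] == b[-1] and a[0][0] == b[0][0]


# Pairs taken from the change log entries listed above: (first author, added author)
pairs = [
    ("Eve Berman", "Eve Berman Do"),
    ("Michael C. Bales", "Mike Bales"),
    ("Johnson & Wales University", "Johnson Wales University Staf"),
    ("Maureen Reed", "Maureen E. Reed"),
]
for existing, candidate in pairs:
    print(candidate, "->", is_probable_variant(existing, candidate))
```

A check like this over-flags on purpose (it would also catch two different people who share a surname and first initial), so it only makes sense as a "hold for librarian review" filter, never as an automatic merge.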


http://www.goodreads.com/author/show/...
There are entries both with and without the punctuation after the middle initial, listed on the same book. I just started noticing this this morning.

11 hours later, the script is still chugging along. This should be fun.

Here's another thing - variations of the spelling of the last name.

At the risk of sounding like a b____, once again a lot of work is being made for (Volunteer) librarians.
(And I'm not one, so it isn't personal.)
(Perhaps there is an "undo" option and this can be repaired.)
Testing is a good idea.
This isn't a situation where the good outweighs the bad, is it?
Good Reads is wonderful, but it rides on the backs of volunteers who must eventually get tired of undoing things (or, I hope they will) that could have been prevented!
I know I am making myself unpopular with the Powers that Be, but someone has to say it.
But I'll export my booklist now just the same....
OK, now rant back at me :)

It might be worthwhile for Otis to start asking the librarian group about massive auto-script updating before implementation. Despite some of the errors it introduced, the auto-combine script from last week seems to have been fairly useful (it did have some bugs in it, which is a different issue). The more I look at it, the more this author script is turning into a nightmare and I think a lot of the problems could have been predicted.
In some sense, this has the potential to partly undo every author correction ever made. Every disambiguation may have the single space version added as a co-author (it's not, thankfully, replacing any authors, just adding), every name variant correction will be re-added to books, etc. It's going to mean merging hundreds to thousands of additional authors (variant names) as well as deleting thousands to (don't want to think about it) secondary authors from books.
And why do these changes always get implemented right before the weekend?

I really hope there is an undo feature that can be implemented, or failing that, a backup made before this potential disaster of an improvement was implemented.
As to weekend running, do you see anyone else either commenting or attempting to fix it :) ? Turn it on and run...... :)
As I said, I am not a librarian, and this is one of the reasons why I have never applied. I truly feel that GR (I will blame Otis, but just because he's handy) is taking advantage of a free labor force.
It is not impossible to A: TEST on a small data set. This kind of mess would show up on even a data set of 1000. Why not try that?
B. As you said Michael, ASKING the people affected who do the work is a concept too.
Sorry to sound so irritable, but this kind of thing really pi...es me off.
I realize that GR is free to you and me and so to a point complaining is rather churlish, but taking advantage of people's goodwill and free work for profit (and I don't believe for a minute there isn't one, and there should be) is unethical and frankly, not so bright. Sooner or later volunteers will get fed up.

As for the situation of taking advantage of librarians, I think that's unfounded. It's a volunteer position, after all. Anyone who doesn't want to be one doesn't have to. And even those with librarian status are under no obligation to do anything. Some people have librarian status and rarely use it. We're grateful for those who do, but I suspect they do it out of a desire to improve the site, not out of a deep sense of obligation.
As for checking with librarians before running scripts like this, I'll let Otis speak to that, as that's really his territory.

I understand that it's a volunteer position with no obligation imposed at all.
My point is that at some point even volunteers get P/O'd when changes are made that mess up work done without any apparent test/consideration whatever.
For example, when you revert to a saved database, whatever work done (correctly) since then will be lost. Maybe not much, maybe more, who knows? But that's very frustrating for someone who has done it.
I know there are volunteers on this site (and God Bless them) whose life seems to heavily revolve around being librarians, super librarians, whatever.
But I wonder at what point even those dedicated souls will get annoyed. And sorry, but I do think it's taking advantage of everyone's current good nature.
No one is forced, everyone wants a good database.
And they get a sense of satisfaction out of the job I imagine, or a sense of empowerment, or something.
BUT, if GR didn't have volunteers and had to pay someone to do this, what would it cost? Paid employees can't complain much if they wish to remain such, but volunteers can go elsewhere. Most won't. But some will.

Apologies for the errors. Please don't assume that we do these things on purpose or that we'll rely on librarians alone to clean it up. We're all in this together! :)

I understand that it's a volunteer position with no obligation imposed at all.
My point is that at some point even volunteers ..."
Oh, it's certainly annoying. I didn't mean to imply otherwise. It's obviously a deeply frustrating experience to change something and see that work undone by an automated process. But to suggest that there is exploitation happening is incorrect. People are librarians because they want to be. If they no longer want to be librarians, as you point out, they will choose that. But the choice is, as always, theirs. We try to make the site the best it can be. Sometimes those efforts backfire, but there's nothing malicious about that.

Good thing is we have logs of all of it, so we should be able to fix easily.
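
To make the idea concrete, here is a minimal Python sketch of planning a fix from the logs. Everything about the log shape is an assumption for illustration (the `LogEntry` fields, the action names `author_added` and `author_removed`, and the `plan_undo` function are hypothetical, not the real GoodReads schema). The point is that only additions made by the script account get queued for removal, and anything a librarian has already removed by hand is skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One row of a hypothetical edit log (field names are assumptions)."""
    entry_id: int   # increasing log number
    editor: str     # who made the edit, e.g. "GoodReads" for the script
    book_id: int
    action: str     # "author_added" or "author_removed" in this sketch
    author: str


def plan_undo(log, script_editor="GoodReads", since_entry_id=0):
    """Return sorted (book_id, author) pairs that the script added and that
    are still in place, considering script edits from since_entry_id onward.
    Edits by any other editor are never queued for removal."""
    pending = {}
    for entry in sorted(log, key=lambda e: e.entry_id):
        key = (entry.book_id, entry.author)
        if (entry.action == "author_added"
                and entry.editor == script_editor
                and entry.entry_id >= since_entry_id):
            pending[key] = entry.entry_id
        elif entry.action == "author_removed":
            # Someone already cleaned this one up; nothing left to undo.
            pending.pop(key, None)
    return sorted(pending)


# Made-up example rows, using names mentioned earlier in the thread.
log = [
    LogEntry(1, "GoodReads", 10, "author_added", "Mike Bales"),
    LogEntry(2, "Michael", 11, "author_added", "A Real Coauthor"),
    LogEntry(3, "GoodReads", 12, "author_added", "Maureen E. Reed"),
    LogEntry(4, "Michael", 12, "author_removed", "Maureen E. Reed"),
]
print(plan_undo(log))
```

Working from the log this way, rather than restoring a database backup, would leave manual corrections made in the meantime untouched.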

I'm playing high/low with the librarian logs trying to see when it started. To my surprise, some of the GoodReads changes seem to be much older than I expected. I don't know if every occurrence of
"GoodReads updated the book XXXX" by XXXX
additional author added: xxxx
is part of the problem or not. I don't recall ever seeing these in the past, but it's been a long time since I've scanned through the global librarian edit logs.
Based on that sort of log entry, the very earliest of these I can find (17000+ log pages back) is from March 18, 10:47am (#2732905) (this is approximately 1.7 million logged edits (from all librarians) ago). (Interestingly, the second earliest of all of these log entries that I found is an error where a variant of the author's name was added as second author). Prior to that entry, I don't see any GoodReads edits that involve adding additional authors to an existing book. I could easily have missed an earlier one, but they pick up pretty regularly in the log at that point, so I doubt I missed too many. From that point onward they start to dominate most log pages (except for the period where Otis was running his combine script, which overwhelmed most other edits at that point).
Wow, looks like this has been going on for a while, but we've only noticed the problems it was causing very recently. I don't know if it's sped up in the last week or two or it just had to reach a saturation point until we could realize the problem was systemic (probably the latter).
When GoodReads is automatically updating a book, where is it pulling the data from? Amazon, B&N or both? What triggers an auto-update? (as opposed to a manual update). Could Amazon or B&N have reset their DB in some way, causing GR to automatically try to update everything?

Implications of maliciousness were not my intent. I NEVER supposed you/whoever sat there with a gleam in your eye and thought, "now what can I do to mess it up." :)
AND, am glad to know it's being attended to.
So sorry Patrick and Otis; malicious I do not think you are.
Overworked, yes.
After all, this IS the weekend.
And Bunwat, excellent points. I am not converted completely, but I see your points.
Jess

Have faith. There is always SOMEONE willing to screw up data, either deliberately or accidentally or stupidly. Like the helpful soul who deleted a popular Sony e-reader from the e-reader page.

If we are reverting, I'll sit tight.
http://www.goodreads.com/book/show/24...
Just to make this more exciting, here's an author who has now been added automatically: Arthur M. Schlesinger, without the Jr., and both the father and the son are authors. This will not be a simple combine to fix manually. I'm guessing that there are many more examples as well.
Think I'll sit tight and wait until Monday

If GR has to revert, all work would be lost.
Although where to post it best, I have no idea.

Thanks. I try to be.
AND, I didn't call you any pejorative names (nor you me). What a concept. :)
Otis wrote: "We will fix - but as it's the weekend we may not get to it until next week :)"
Really, some people. ;)
I suspect that waiting until the fix is implemented is the best course, Jessie.

Pretty sure just correcting these issues is fine, as I believe Otis's implication is that fixing from logs should affect just the incorrectly added authors.
Anyway, that's what I'm doing. I just made several other edits.
In the last day or so, "GoodReads" (according to the change log) has been updating authors on a number of books. I'm not sure what the motivation is (my guess is there is a script trying to update missing info; I don't know if this is related to the combine script or something else), but unfortunately it is starting to introduce errors into books.
For example, if the first author has been disambiguated by adding a single space, it's adding the non-spaced author as a second author (this happened to me when I discovered a new book on my home page by another author with the same name...a book which had already been disambiguated and still had the other disambiguated author as first author). It's also adding name variants (another librarian thread noted that GoodReads added Angela Hunter to a book by Angela M. Hunter) as secondary authors.
I don't know how widespread this is, but you may want to keep your eyes open for similar sorts of errors.