Goodreads Librarians Group

message 1: by Cait (new)

Apr 01, 2013 12:11PM

As I'm sure everybody's heard by now, Goodreads is back together with Amazon. The data feeds from Amazon that were taken away last year are going to be resumed sometime soon, I'd imagine.

Theoretically, this should be able to pick up just as it was when it was turned off.

In practice, I'm sure here in the librarians group we've all seen the little things that go wrong with a new data feed.

So, can we put together a list of technical "gotchas" that we, as experienced librarians, want to remind the dev team of before they happen?

(I'd like to keep this as a strictly technical thread for the ease of delivery to the dev team. If there are any policy questions which come up from this discussion, can we open new threads for them?)

message 2: by Cait (new)

Apr 01, 2013 12:13PM

Here's one I remember from previous Amazon data feeds: used books with ASIN numbers. As I recall, the filter eventually applied was that only Kindle editions and audiobook editions were imported if they had ASINs but no ISBNs. (This picked up a handful of used audiobooks, but that was negligible. I don't think that there was any attempt to filter the audiobooks further by publisher, which is what would be needed to get the ones for which the ASIN is in fact the primary id.)
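
A minimal sketch of the filter described above, not Goodreads' actual import code. The `FeedRecord` fields and the format labels are hypothetical, and the handling of records with an ISBN (always import) or with no identifier at all (skip) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical format labels; the real feed's values are not known here.
ASIN_ONLY_FORMATS = {"kindle", "audiobook"}


@dataclass
class FeedRecord:
    asin: Optional[str]
    isbn: Optional[str]
    format: str


def should_import(rec: FeedRecord) -> bool:
    """Decide whether a feed record is imported.

    Assumption: anything with an ISBN is imported. ASIN-only records are
    imported only for Kindle and audiobook editions, which keeps out used
    print copies listed under an ASIN.
    """
    if rec.isbn:
        return True
    if rec.asin:
        return rec.format.lower() in ASIN_ONLY_FORMATS
    return False  # no identifier at all: skipped in this sketch
```

As noted above, a format check alone would still let a handful of used audiobooks through; filtering those further would need publisher data.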

message 3: by Banjomike (new)

Apr 01, 2013 12:32PM

Well, I was going to say that the most obvious point, to me, is to tell us BEFORE the feeds are turned back on. Let us know when they will start. Don't start them on a Friday... But if this is only for technical stuff I won't mention that.

First:
The non-Amazon feeds have been improved a lot since January 2012 and are much better at not dumping dubious info all over the handcrafted records that we have tweaked. I hope that the new Amazon feeds will leave any data that has been manually edited.

Second:
There was a discussion on the subject of ASINs on Audible books on the Audible site and the different ASIN that the same Audible book would have on an Amazon site. Do we need to standardise which of those ASINs we want? If Goodreads already has an Audible book with the Audible ASIN we don't want the Amazon import adding the Amazon version of the identical book. Or do we?

This thread: www.goodreads.com/topic/show/1181279-...

message 4: by Cait (new)

Apr 01, 2013 12:49PM

This one probably doesn't apply to Amazon data, but I want to mention it just in case because it's caused huge headaches from the Worldcat data import: the title import absolutely must include subtitles and volume information, including volume information in parentheses. (This actually may be relevant to the amazon.co.jp import in particular, since the format "[title] ([number])" is not uncommon for manga and the format "[title] ([下|上])" is not uncommon for novels split into parts. I know the autocombiner is going to scoop these up together, but it's not a big deal to split them out as long as the numbers are still there.)
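
To illustrate why the parenthesised volume marker has to survive the import, here is a rough sketch that reads it back out of a title so combined editions can be split again. The function name is hypothetical, and accepting full-width parentheses is an assumption about how amazon.co.jp titles arrive.

```python
import re
from typing import Optional, Tuple

# Matches "[title] ([number])" and "[title] ([上|下])".
# Both ASCII and full-width parentheses are accepted (assumption).
_VOLUME = re.compile(r"^(?P<title>.+?)\s*[((](?P<vol>[0-9]+|[上下])[))]\s*$")


def split_volume(title: str) -> Tuple[str, Optional[str]]:
    """Return (base title, volume marker), or (title, None) if no marker."""
    match = _VOLUME.match(title)
    if match is None:
        return title, None
    return match.group("title"), match.group("vol")
```

If the import drops the marker, every volume ends up with the same title and there is nothing left to split on.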

message 5: by Moloch (new)

Apr 01, 2013 12:54PM

My request is, as Banjomike said, to NOT overwrite data manually entered by librarians.

Also, I would like to know if the various "Unknown Books" and "Unknown Authors" will be restored; I'm especially concerned about the "Unknown Authors": "Unknown Author 1384662627" can now include many books by many different authors, not just one.

message 6: by Monique (new)

Apr 01, 2013 01:04PM

Moloch wrote: "My request is, as Banjomike said, to NOT overwrite data manually entered by librarians.

Also, I would like to know if the various "Unknown Books" and "Unknown Authors" will be restored; I'm especi..."

That's a really good point. It might help to know from the PTB if they are planning on deleting those prior to the new Amazon feed or if they think that it will be resolved via the feed.

message 7: by Banjomike (new)

Apr 01, 2013 01:04PM

Moloch wrote: "so, I would like to know if the various "Unknown Books" and "Unknown Authors" will be restored; I'm especially concerned about the "Unknown Authors": "Unknown Author 1384662627" can now include many books by many different authors, not just one. "

Good point.

I have always assumed that when Goodreads purged the Amazon data they kept a little bit of it buried in a box in a cupboard that would link the vanished ASIN with the Unknown Book title, just in case. I hope they can link the two together or we are going to lose a lot more than just the ex-Amazon books when the Unknowns are eventually removed.

message 8: by Cait (new)

Apr 01, 2013 01:06PM

Moloch wrote: "My request is, as Banjomike said, to NOT overwrite data manually entered by librarians."

The only way this should happen is if the data is not identified as from a librarian source. This is a known issue with default descriptions, but that's the only specific data type I'm aware of which can't be sourced correctly to prevent overwriting. Do you know of other data types which are not sourced to librarians when librarians edit them?
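
A sketch of what source-based protection could look like, under stated assumptions: every field value carries a source tag, `"librarian"` is a hypothetical tag name, and a feed may overwrite anything that is not librarian-sourced. The real import rules are not documented in this thread.

```python
from dataclasses import dataclass
from typing import Dict

LIBRARIAN = "librarian"  # hypothetical source tag


@dataclass
class FieldValue:
    value: str
    source: str


def apply_feed(
    record: Dict[str, FieldValue], updates: Dict[str, str], feed: str
) -> Dict[str, FieldValue]:
    """Return a copy of the record with feed updates applied.

    Fields whose current value is sourced to a librarian are left alone;
    everything else (including missing fields) takes the feed's value.
    """
    merged = dict(record)
    for name, new_value in updates.items():
        current = merged.get(name)
        if current is not None and current.source == LIBRARIAN:
            continue  # never overwrite librarian edits
        merged[name] = FieldValue(new_value, feed)
    return merged
```

The default-description problem mentioned above corresponds to a librarian edit being stored without the librarian tag, so a check like this one cannot see it.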

message 9: by Debbie's Spurts (D.A.) (new)

Apr 01, 2013 01:15PM

I'm with Moloch and Banjomike on overwriting data.

I'm not completely sure how the data feeds all work, but (based on other threads) if the info in some fields came from a data feed, for example Ingram ONIX, that same feed can change those fields, such as the default description, over and over until they are manually edited. Unless Amazon is changing policies on whether book covers can be overwritten, I foresee an issue with covers being overwritten every blasted time an indie changes a cover, reviews appearing in default descriptions, etc. (I'm not sure the Amazon data feed will work well at all with the current alternate cover editions.)

Several authors have been irate recently (and I've seen this in personal experience) because Amazon is not doing well at combining books or editions. As a result, not all of an author's books are on their author page, and (worse for my own personal reading, since I use the Kindle editions a lot) their Kindle and other editions are not shown correctly combined with paperbacks, hardbacks, audiobooks, etc. when you go to a book page on Amazon.

If editions are not correctly combined on the Amazon site, or books are not correctly assigned to authors, I don't see how the data feed will do any better. I guess it's just something to watch out for; possibly staff can post a more official reminder in the author and regular feedback groups, alerting authors that if it ain't right on the Amazon site, they should watch for the same error to be imported to Goodreads.

message 10: by Vicky (new)

Apr 01, 2013 05:24PM

I'm definitely hopeful that Goodreads learned from the last big data move and will be better prepared to ensure the least amount of data that librarians have fixed is overwritten.

Banjomike wrote: "Second:
There was a discussion on the subject of ASINs on Audible books on the Audible site and the different ASIN that the same Audible book would have on an Amazon site. Do we need to standardise which of those ASINs we want? If Goodreads already has an Audible book with the Audible ASIN we don't want the Amazon import adding the Amazon version of the identical book. Or do we?

This thread: www.goodreads.com/topic/show/1181279-..."

Definitely agree this should be addressed. Personally, I feel as though the Audible ASIN should be the one we keep (although this likely won't be the one that comes in through the feed.) On the other hand though, the audible "get a copy" link appears to search by title and not the identifier associated with a book, so maybe it doesn't really matter?

----

I would like to see image imports be turned off completely unless there is NO image. When the newer imports started running they often replaced perfectly good images with an image from a newer edition that should have been manually made an alternate cover edition.

----

On the topic of Unknowns, if the data comes back to all the Unknown Author/Book entries, there has to be some way to label them so we can merge them back with entries that may have been added in the meantime without overwriting that information.

Let's say I wrote a book that ended up in the Unknown Author/Book pile. There wasn't enough information for anyone to rescue it, so it's just hanging out there in the abyss. It has a few shelves/ratings but no meaty reviews. Since the book was lost to the unknown, I've re-added it to my page with the ASIN number that was lost. Since we can only use an identifier once, the old edition will (I suspect) not be able to have its ASIN reverted. There should be some way to then label those books so librarians can find them and merge them.

Perhaps even going so far as to import all data except for the author. With all the other identifying data (and presumably the ability to again use Amazon as a source) it should be easy to find the author information - and keeping the Unknown Author # profiles will make it easy for librarians to find books that need to be merged.

message 11: by ❂ Murder by Death (new)

Apr 01, 2013 05:35PM

Vicky wrote: "I would like to see image imports be turned off completely unless there is NO image. When the newer imports started running they often replaced perfectly good images with an image from a newer edition that should have been manually made an alternate cover edition. "

^ THIS please....

message 12: by Banjomike (new)

Apr 01, 2013 05:46PM

Vicky wrote: "On the other hand though, the audible "get a copy" link appears to search by title and not the identifier associated with a book, so maybe it doesn't really matter? "

It does matter on Audible. If you search for "day of the triffids" on Audible you get three results: one drama and two readings. If someone clicks on a buy link they want to see the one they clicked on, not a further selection. Many books on Audible have UK and US versions.

UNABRIDGED
http://www.audible.co.uk/pd/ref=sr_1_...
and ABRIDGED
http://www.audible.co.uk/pd/ref=sr_1_...

Sometimes it is even more annoying. These two Audible products appear to be identical, have identical titles "Kiss Kiss", but have different ASINs.

http://www.audible.co.uk/pd/ref=sr_1_...
http://www.audible.co.uk/pd/ref=sr_1_...

Not all of the Unknown Books and Unknown Authors were caused by the Amazon purge. If there is truly no way to link the wrecked record back to the proper book then perhaps it should simply be deleted (if no reviews etc).

message 13: by Lobstergirl (new)

Apr 01, 2013 05:53PM

Block imports for things like clip strips, display copies, multiple copies (e.g. 12copy, 12-cpy, 24copy, 24-cpy), and other things on the NAB list.

Strip away the "PB" or "Pb" at the end of a title that sometimes appears to indicate paperback.

Perhaps there's a way to insert a period when one is lacking, e.g. an author being imported as "Joan J Mitchell".
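
The three requests above could be sketched roughly as follows. The function names and regular expressions are hypothetical, only the patterns quoted in this message are covered, and each rule can misfire on legitimate data (a title that really ends in "Pb", or a book about "12 copy editing tips").

```python
import re

# Multi-copy packs such as "12copy", "12-cpy", "24copy", "24-cpy".
_MULTICOPY = re.compile(r"\b\d+\s*-?\s*(?:copy|cpy)\b", re.IGNORECASE)
# A trailing "PB" or "Pb" used to mark a paperback.
_PB_SUFFIX = re.compile(r"\s+P[Bb]$")
# A single capital letter standing alone as a word, followed by whitespace.
_BARE_INITIAL = re.compile(r"\b([A-Z])(?=\s)")


def looks_like_multicopy(title: str) -> bool:
    """True if the title carries a multi-copy marker like '12copy'."""
    return _MULTICOPY.search(title) is not None


def strip_pb_suffix(title: str) -> str:
    """Remove a trailing 'PB' / 'Pb' from a title."""
    return _PB_SUFFIX.sub("", title)


def add_initial_periods(name: str) -> str:
    """Turn 'Joan J Mitchell' into 'Joan J. Mitchell'."""
    return _BARE_INITIAL.sub(r"\1.", name)
```

Flagging matches for librarian review instead of rewriting them automatically would be the more cautious use of checks like these.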

message 14: by Vicky (new)

Apr 01, 2013 05:59PM

Banjomike wrote: "It does matter on Audible. If you search for "day of the triffids" on Audible you get three results. One drama and two readings. If someone clicks on a buy link they want to see the one they clicked on, not a further selection. many books on Audible have UK and US versions."

True. And like I said, I would definitely prefer the ASIN to be the one on the Audible site.

And now that I'm looking at the link, it could easily be changed to search by the ASIN. I just removed the title from the link Goodreads created and inserted the ASIN and got the correct results. I wonder if there's a reason it searches by title?

Banjomike wrote: "Not all of the Unknown Books and Unknown Authors were caused by the Amazon purge. If there is truly no way to link the wrecked record back to the proper book then perhaps it should simply be deleted (if no reviews etc)."

What else caused the Unknown Books/Authors? Not that I don't believe you, I just don't recall them existing before the purge.

Totally agree about deleting Unknowns that haven't been shelved though. Seems like it would be a clean solution for dealing with unshelved books that had been added manually after the data was lost and it would drastically reduce the number of unknowns that might need to be fixed.

message 15: by Emy (new)

Apr 01, 2013 06:07PM

Can we stop ALL CAPS importing on the feeds? (Assuming we don't already!) Just a thought since I regularly come across that on Kindle editions.
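
One cautious way to approach this is to detect all-caps titles and flag them, not to re-case them automatically. This is only a sketch: the function name and the `min_letters` threshold (meant to spare short titles that are legitimately upper case) are assumptions.

```python
def is_all_caps(title: str, min_letters: int = 4) -> bool:
    """True if every letter in the title is upper case.

    Titles with fewer than `min_letters` letters are never flagged, so
    short acronym-style titles and purely numeric titles pass through.
    """
    letters = [ch for ch in title if ch.isalpha()]
    return len(letters) >= min_letters and all(ch.isupper() for ch in letters)
```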

message 16: by Banjomike (new)

Apr 02, 2013 02:27AM

Vicky wrote: "True. And like I said, I would definitely prefer the ASIN to be the one on the Audible site. "

Except that the Audible site uses multiple ASINs for the same product as per "Kiss Kiss" in my previous post.

What else caused the Unknown Books/Authors?
PM sent.

message 17: by Moloch (new)

Apr 02, 2013 02:45AM

In non-English editions, I think Amazon has the habit of adding, after the title, "(Spanish Edition)", "(Italian Edition)", etc. I'd like those not to be imported (I always delete them; they're not part of the title and we have the language field for that). I don't know if this will be possible, though.

Often after the title there's also, in brackets, the collection; I'd like that not to be imported too (again, if possible).
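
A rough sketch of stripping the language suffix, assuming an explicit list of language names so that something like "(Special Edition)" is left alone. Only the two languages named above are listed, and the function name is hypothetical. The bracketed collection name is not handled, since there is no fixed pattern to match it on.

```python
import re

# Only the languages mentioned in this thread; a real list would be longer.
LANGUAGE_NAMES = ("Spanish", "Italian")

_LANGUAGE_SUFFIX = re.compile(
    r"\s*\((?:" + "|".join(map(re.escape, LANGUAGE_NAMES)) + r") Edition\)\s*$"
)


def strip_language_suffix(title: str) -> str:
    """Remove a trailing '(<Language> Edition)' marker from a title."""
    return _LANGUAGE_SUFFIX.sub("", title)
```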

message 18: by rivka, Former Moderator (new)

Apr 03, 2013 01:25PM

Mod

Vicky wrote: "I would like to see image imports be turned off completely unless there is NO image."

This is problematic with many big-name-author books, as well as certain publishers (for all authors), as they provide initial "cover coming soon" images, and then "preview cover" images, and finally the real thing.

Lobstergirl wrote: "Block imports for things like clip strips, display copies, multiple copies (e.g. 12copy, 12-cpy, 24copy, 24-cpy), and other things on the NAB list."

We already do this, but it will never catch everything. Any script that prevents all such items also has an unacceptable level of false positives, and blocks legit items.

Emy wrote: "Can we stop ALL CAPS importing on the feeds?"
Moloch wrote: "In non English editions, I think Amazon has the habit of adding, after the title, "(Spanish Edition)", "(Italian Edition)", etc. I'd like those to not be imported"
It's very difficult to create scripts that strip out such data but do not also strip out wanted data. But I'll pass these requests along.

message 19: by Sandra (new)

Apr 04, 2013 12:51AM

Just came across one of my books whose image was changed by Onix Ingram.

http://www.goodreads.com/book/edits/3...

My copy is dated September 2000 and the cover should look like

After Dark (Harmony, #1) by Jayne Castle

If we can't get that right, how can we trust the Amazon feed?

message 20: by rivka, Former Moderator (new)

Apr 04, 2013 09:26AM

Mod

The image was changed by Ingram because they were also the source of the previous image. As I explained above, that behavior is deliberately allowed. (Although it's pretty easy for a librarian to undo such changes on an individual basis, as needed.)

message 21: by Debbie's Spurts (D.A.) (last edited Apr 04, 2013 11:43AM) (new)

Apr 04, 2013 11:42AM

If librarian feedback matters, I think there's a whole can of worms with ebooks and cover changes that may need revisiting when sale is final or data feeds from amazon restored.

Absolutely with messages 10 & 11 about protecting covers. (I know, lots up in the air because possible amazon PTB may just say "xxyyzzqq is now the cover policy"). Just wanting to put it out there.

message 22: by Sandra (new)

Apr 04, 2013 03:49PM

rivka wrote: "The image was changed by Ingram because they were also the source of the previous image. As I explained above, that behavior is deliberately allowed. (Although it's pretty easy for a librarian to u..."

Oh, I can create an alternative cover edition, no probs; I just think that allowing someone to update covers automatically sort of defeats the purpose of alternatives in the first place.

message 23: by Sandi (new)

Apr 04, 2013 04:13PM

German amazon has the annoying habit of putting periods at the end of book titles. When we still had data from amazon, almost all German language books came with a period at the end and needed to be fixed manually. Maybe book titles that end in a period could somehow be imported without it? At least from amazon.de ...

message 24: by Carolyn (new)

Apr 04, 2013 07:28PM

Renewing the feed from amazon means a whole lotta author names that came into GR wrong, have been fixed, and the bad/incorrect name was merged with the right name, are all going to be reimported again, yes?
If there is any way to at least trim off the Mr./Mr/Mrs./Mrs/Ms./Ms at the beginning of author names, and the alphabet soup at the end of names, that would be most appreciated. It is disheartening to think that the literally hundreds of hours I've put into cleaning up author records is going to be wiped out. :)
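
A sketch of the kind of trimming asked for here. The prefixes are the ones listed above; the trailing credentials ("PhD", "MD" and their dotted forms) are only an assumed sample of the "alphabet soup", and the function name is hypothetical.

```python
import re

# Leading Mr./Mr/Mrs./Mrs/Ms./Ms followed by whitespace.
_PREFIX = re.compile(r"^(?:Mrs|Mr|Ms)\.?\s+")
# Trailing credentials; this short list is an assumption for illustration.
_SUFFIX = re.compile(r",?\s+(?:Ph\.?D\.?|M\.?D\.?)$")


def clean_author_name(name: str) -> str:
    """Strip a leading honorific and a trailing credential from a name."""
    name = _PREFIX.sub("", name.strip())
    return _SUFFIX.sub("", name)
```

A rule like this would also shorten a name that genuinely starts with "Ms", so it is safer on new imports than on names librarians have already cleaned up.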

message 25: by Aaron (new)

Apr 04, 2013 08:30PM

Yes, it would be nice to have our own covers retained unless the image field is blank.

Having scanned a number of my old books, and having spent hours airbrushing out the cracks, and then foolishly deleting the image from my hard-drive after uploading it here, I feel a little emotional about my work.

message 26: by rivka, Former Moderator (new)

Apr 04, 2013 09:22PM

Mod

Carolyn wrote: "Renewing the feed from amazon means a whole lotta author names that came into GR wrong, have been fixed, and the bad/incorrect name was merged with the right name, are all going to be reimported ag..."

Quite possibly not, actually. I'm not sure if Amazon feed(s) will be handled differently than what we do now. But assuming they are handled the same as what we do now, we stopped changing primary authors on existing book records about 6-12 months (I don't recall exactly) back. So if the problem-name author belongs to an Unknown Book that was never fixed, or a new book record not yet on GR, then it probably will import. Otherwise, I think it probably won't.

[That change, of no longer allowing imports to update/correct/etc. primary authors on existing book records, was due primarily to feedback in this group. It was one of several changes to how imports from Ingram and other sources are handled, based on feedback here and elsewhere.]

message 27: by rivka, Former Moderator (new)

Apr 04, 2013 09:25PM

Mod

Aaron wrote: "Having scanned a number of my old books, and having spent hours airbrushing out the cracks, and then foolishly deleting the image from my hard-drive after uploading it here, I feel a little emotional about my work."

Cover images uploaded by users will still have higher priority than any import. Otis said so.
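
Putting the cover rules from this thread together (user uploads win, and a feed may replace an image it supplied itself), a sketch might look like this. The `"user"` tag and the function name are hypothetical, and two behaviours are assumptions: an empty image slot accepts any import, and a feed may not replace another feed's image.

```python
from typing import Optional

USER_UPLOAD = "user"  # hypothetical tag for images uploaded by users


def may_replace_cover(current_source: Optional[str], incoming_source: str) -> bool:
    """Decide whether an imported cover may replace the current one."""
    if current_source is None:
        return True  # no image yet: accept the import (assumption)
    if current_source == USER_UPLOAD:
        return False  # user uploads outrank any import
    # A feed may update its own earlier image, e.g. a "cover coming soon"
    # placeholder replaced by the final cover. Other feeds are refused
    # in this sketch (assumption).
    return current_source == incoming_source
```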

message 28: by Aaron (new)

Apr 04, 2013 09:35PM

Thank you rivka. Very comforting to know. I shall sleep tonight.

message 29: by Debbie's Spurts (D.A.) (last edited Apr 07, 2013 08:50PM) (new)

Apr 07, 2013 08:51PM

Sandi wrote: "German amazon has the annoying habit of putting periods at the end of book titles. When we still had data from amazon, almost all German language books came with a period at the end and needed to b..."

If it's a German language book, I actually think the period is correct.

For a U.S. English or U.K. English language edition, the period at the end of a title is wrong.

message 30: by Lobstergirl (new)

Apr 12, 2013 04:12PM

Let's make sure that authors are not imported as "manufacturers." I just came across several books where Winston Churchill, the author, was listed as "Manufacturer."

The place I've seen this most often is Hal Leonard Publishing Corporation, which is the default author on many musical scores/compilations. When I see it I delete it, because I don't see any point in an author, regardless of whether it be human or corporate, described in such a way.

message 31: by Helmut (new)

Sep 30, 2014 11:56AM

Regarding "manufacturers": Is this additional author within policies? I'd remove it, but to be sure, I'd like to ask you...

https://www.goodreads.com/book/show/1...

message 32: by rivka, Former Moderator (new)

Sep 30, 2014 11:58AM

Mod

Removed.

Also, rather than bumping an old and barely-related thread, please start a new one.
