Goodreads Librarians Group discussion

note: This topic has been closed to new comments.
Policies & Practices > Re-integration of Amazon data


message 1: by Cait (new)

Cait (tigercait) | 4988 comments As I'm sure everybody's heard by now, Goodreads is back together with Amazon. The data feeds from Amazon that were taken away last year are going to be resumed sometime soon, I'd imagine.

Theoretically, this should be able to pick up just as it was when it was turned off.

In practice, I'm sure here in the librarians group we've all seen the little things that go wrong with a new data feed.

So, can we put together a list of technical "gotchas" that we, as experienced librarians, want to remind the dev team of before they happen?

(I'd like to keep this as a strictly technical thread for the ease of delivery to the dev team. If there are any policy questions which come up from this discussion, can we open new threads for them?)


message 2: by Cait (new)

Cait (tigercait) | 4988 comments Here's one I remember from previous Amazon data feeds: used books with ASIN numbers. As I recall, the filter eventually applied was that only Kindle editions and audiobook editions were imported if they had ASINs but no ISBNs. (This picked up a handful of used audiobooks, but that was negligible. I don't think that there was any attempt to filter the audiobooks further by publisher, which is what would be needed to get the ones for which the ASIN is in fact the primary id.)
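
The filter described above could be sketched roughly like this. It is only an illustration: the `FeedRecord` fields and the binding labels are assumptions, not the actual feed schema or the exact rule Goodreads applied.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical binding labels; the real feed's values are not given in this thread.
ASIN_ONLY_FORMATS = {"kindle edition", "audiobook"}


@dataclass
class FeedRecord:
    title: str
    binding: str
    asin: Optional[str] = None
    isbn: Optional[str] = None


def should_import(record: FeedRecord) -> bool:
    """Import anything with an ISBN; import ASIN-only records only for allowed formats."""
    if record.isbn:
        return True
    if record.asin:
        return record.binding.strip().lower() in ASIN_ONLY_FORMATS
    # No identifier at all: nothing to key the record on.
    return False
```

A used paperback listed with an ASIN and no ISBN would be skipped, while a Kindle edition with only an ASIN would be imported. Filtering audiobooks further by publisher is not attempted here.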


message 3: by Banjomike (new)

Banjomike | 5166 comments Well, I was going to say that the most obvious point, to me, is to tell us BEFORE the feeds are turned back on. Let us know when they will start. Don't start them on a Friday... But if this is only for technical stuff I won't mention that.

First:
The non-Amazon feeds have been improved a lot since January 2012 and are much better at not dumping dubious info all over the handcrafted records that we have tweaked. I hope that the new Amazon feeds will leave any data that has been manually edited.

Second:
There was a discussion on the subject of ASINs on Audible books on the Audible site and the different ASIN that the same Audible book would have on an Amazon site. Do we need to standardise which of those ASINs we want? If Goodreads already has an Audible book with the Audible ASIN we don't want the Amazon import adding the Amazon version of the identical book. Or do we?

This thread: www.goodreads.com/topic/show/1181279-...


message 4: by Cait (new)

Cait (tigercait) | 4988 comments This one probably doesn't apply to Amazon data, but I want to mention it just in case because it's caused huge headaches from the Worldcat data import: the title import absolutely must include subtitles and volume information, including volume information in parentheses. (This actually may be relevant to the amazon.co.jp import in particular, since the format "[title] ([number])" is not uncommon for manga and the format "[title] ([下|上])" is not uncommon for novels split into parts. I know the autocombiner is going to scoop these up together, but it's not a big deal to split them out as long as the numbers are still there.)
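
As a rough illustration of why the parenthesised volume marker matters, here is a minimal sketch that splits a trailing "([number])", "(上)" or "(下)" off a title so that volumes can be told apart after combining. The function name and the set of markers covered are assumptions for this example only.

```python
import re
from typing import Optional, Tuple

# Trailing "(12)", "(上)" or "(下)"; other markers are deliberately not covered.
VOLUME_SUFFIX = re.compile(r"\s*\((\d+|[上下])\)\s*$")


def split_volume(title: str) -> Tuple[str, Optional[str]]:
    """Return (base title, volume marker); the marker is None when absent."""
    match = VOLUME_SUFFIX.search(title)
    if match is None:
        return title.strip(), None
    return title[:match.start()].strip(), match.group(1)
```

If the import drops the marker, every volume collapses to the same base title and there is nothing left to split on.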


message 5: by Moloch (new)

Moloch | 3975 comments My request is, as Banjomike said, to NOT overwrite data manually entered by librarians.

Also, I would like to know if the various "Unknown Books" and "Unknown Authors" will be restored; I'm especially concerned about the "Unknown Authors": "Unknown Author 1384662627" can now include many books by many different authors, not just one.


message 6: by Monique (new)

Monique (kadiya) | 1097 comments Moloch wrote: "My request is, as Banjomike said, to NOT overwrite data manually entered by librarians.

Also, I would like to know if the various "Unknown Books" and "Unknown Authors" will be restored; I'm especi..."


That's a really good point. It might help to know from the PTB if they are planning on deleting those prior to the new Amazon feed or if they think that it will be resolved via the feed.


message 7: by Banjomike (new)

Banjomike | 5166 comments Moloch wrote: "so, I would like to know if the various "Unknown Books" and "Unknown Authors" will be restored; I'm especially concerned about the "Unknown Authors": "Unknown Author 1384662627" can now include many books by many different authors, not just one. "

Good point.

I have always assumed that when Goodreads purged the Amazon data they kept a little bit of it buried in a box in a cupboard that would link the vanished ASIN with the Unknown Book title, just in case. I hope they can link the two together or we are going to lose a lot more than just the ex-Amazon books when the Unknowns are eventually removed.


message 8: by Cait (new)

Cait (tigercait) | 4988 comments Moloch wrote: "My request is, as Banjomike said, to NOT overwrite data manually entered by librarians."

The only way this should happen is if the data is not identified as from a librarian source. This is a known issue with default descriptions, but that's the only specific data type I'm aware of which can't be sourced correctly to prevent overwriting. Do you know of other data types which are not sourced to librarians when librarians edit them?
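
A minimal sketch of the sourcing idea, assuming each field carries a label for whoever last set it. The `sources` mapping, the `"librarian"` label, and the function name are hypothetical; nothing in this thread describes how Goodreads actually stores this.

```python
LIBRARIAN = "librarian"  # hypothetical source label for manual edits


def apply_feed_update(record, sources, feed_name, incoming):
    """Return updated copies of record and sources.

    Fields whose last source is a librarian are left alone, and empty
    incoming values never overwrite existing data.
    """
    new_record = dict(record)
    new_sources = dict(sources)
    for field, value in incoming.items():
        if sources.get(field) == LIBRARIAN:
            continue  # manually edited: the feed may not touch it
        if value in (None, ""):
            continue  # nothing useful to import
        new_record[field] = value
        new_sources[field] = feed_name
    return new_record, new_sources
```

The default-description problem mentioned above would correspond to a field whose librarian edit never gets recorded in `sources`, so the feed keeps treating it as its own.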


message 9: by Debbie's Spurts (D.A.) (new)

Debbie's Spurts (D.A.) | 6325 comments I'm with Moloch and Banjomike on overwriting data.

I'm not completely sure how data feeds all work, but on some fields (based on other threads), if info came from a data feed, for example Ingram ONIX, that same data feed can also change fields such as the default description over and over until manually edited. Unless Amazon is changing policies so that book covers can be overwritten, I foresee an issue with covers being overwritten every blasted time an indie changes a cover. Reviews appearing in default descriptions, etc. (I'm not sure the Amazon data feed will work well at all with the current alternate cover editions.)

Several authors have been irate recently (and I've seen this in personal experience) because Amazon is not doing well at combining books or editions, so not all of an author's books are on their author page, and (worse for my own personal reading, since I use the Kindle editions a lot) their Kindle and other editions are not showing correctly combined with paperbacks, hardbacks, audiobooks, etc. when you go to a book page on Amazon.

If editions are not correctly combined on the Amazon site, or books are not correctly assigned to authors, I don't see how the data feed will do any better. I guess it's just something to watch out for; possibly staff can post a more official reminder in the author and regular feedback groups alerting authors that if it ain't right on the Amazon site, they should watch for the same error to be imported to Goodreads.


message 10: by Vicky (new)

Vicky (librovert) | 2462 comments I'm definitely hopeful that Goodreads learned from the last big data move and will be better prepared to ensure the least amount of data that librarians have fixed is overwritten.

Banjomike wrote: "Second:
There was a discussion on the subject of ASINs on Audible books on the Audible site and the different ASIN that the same Audible book would have on an Amazon site. Do we need to standardise which of those ASINs we want? If Goodreads already has an Audible book with the Audible ASIN we don't want the Amazon import adding the Amazon version of the identical book. Or do we?

This thread: www.goodreads.com/topic/show/1181279-..."


Definitely agree this should be addressed. Personally, I feel as though the Audible ASIN should be the one we keep (although this likely won't be the one that comes in through the feed.) On the other hand though, the audible "get a copy" link appears to search by title and not the identifier associated with a book, so maybe it doesn't really matter?

----

I would like to see image imports be turned off completely unless there is NO image. When the newer imports started running they often replaced perfectly good images with an image from a newer edition that should have been manually made an alternate cover edition.

----

On the topic of Unknowns, if the data comes back to all the Unknown Author/Book entries, there has to be some way to label them so we can merge them back with entries that may have been added in the meantime without overwriting that information.

Let's say I wrote a book that ended up in the Unknown Author/Book pile. There wasn't enough information for anyone to rescue it, so it's just hanging out there in the abyss. It has a few shelves/ratings but no meaty reviews. Since the book was lost to the unknown, I've re-added it to my page with the ASIN number that was lost. Since we can only use an identifier once, the old edition will (I suspect) not be able to have its ASIN reverted. There should be some way to then label those books so librarians can find them and merge them.

Perhaps even going so far as to import all data except for the author. With all the other identifying data (and presumably the ability to again use Amazon as a source) it should be easy to find the author information - and keeping the Unknown Author # profiles will make it easy for librarians to find books that need to be merged.


message 11: by ❂ Murder by Death (new)

❂ Murder by Death (murderbydeath) Vicky wrote: "I would like to see image imports be turned off completely unless there is NO image. When the newer imports started running they often replaced perfectly good images with an image from a newer edition that should have been manually made an alternate cover edition. "

^ THIS please....


message 12: by Banjomike (new)

Banjomike | 5166 comments Vicky wrote: "On the other hand though, the audible "get a copy" link appears to search by title and not the identifier associated with a book, so maybe it doesn't really matter? "

It does matter on Audible. If you search for "day of the triffids" on Audible you get three results. One drama and two readings. If someone clicks on a buy link they want to see the one they clicked on, not a further selection. Many books on Audible have UK and US versions.

UNABRIDGED
http://www.audible.co.uk/pd/ref=sr_1_...
and ABRIDGED
http://www.audible.co.uk/pd/ref=sr_1_...

Sometimes it is even more annoying. These two Audible products appear to be identical, have identical titles "Kiss Kiss", but have different ASINs.

http://www.audible.co.uk/pd/ref=sr_1_...
http://www.audible.co.uk/pd/ref=sr_1_...


Not all of the Unknown Books and Unknown Authors were caused by the Amazon purge. If there is truly no way to link the wrecked record back to the proper book then perhaps it should simply be deleted (if no reviews etc).


message 13: by Lobstergirl (new)

Lobstergirl Block imports for things like clip strips, display copies, multiple copies (e.g. 12copy, 12-cpy, 24copy, 24-cpy), and other things on the NAB list.

Strip away the "PB" or "Pb" at the end of a title that sometimes appears to indicate paperback.

Perhaps there's a way to insert a period when one is lacking, e.g. an author being imported as "Joan J Mitchell".
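
The three requests above could be sketched as simple pattern rules. These are assumptions for illustration only, not the scripts Goodreads runs, and the patterns are intentionally narrow.

```python
import re

# "12copy", "12-cpy", "24copy", "24-cpy" and similar multi-copy markers.
MULTI_COPY = re.compile(r"\b\d+-?(?:copy|cpy)\b", re.IGNORECASE)
# A trailing " PB" or " Pb" at the very end of a title.
TRAILING_PB = re.compile(r"\s+P[Bb]$")
# A single capital letter standing alone and not already followed by a period.
BARE_INITIAL = re.compile(r"\b([A-Z])\b(?!\.)")


def looks_like_multi_copy(title: str) -> bool:
    return MULTI_COPY.search(title) is not None


def strip_pb_suffix(title: str) -> str:
    return TRAILING_PB.sub("", title)


def dot_initials(name: str) -> str:
    return BARE_INITIAL.sub(r"\1.", name)
```

With these rules, `dot_initials("Joan J Mitchell")` gives "Joan J. Mitchell", but `dot_initials("Malcolm X")` gives "Malcolm X.", which shows how easily this kind of rule damages legitimate data.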


message 14: by Vicky (new)

Vicky (librovert) | 2462 comments Banjomike wrote: "It does matter on Audible. If you search for "day of the triffids" on Audible you get three results. One drama and two readings. If someone clicks on a buy link they want to see the one they clicked on, not a further selection. many books on Audible have UK and US versions."

True. And like I said, I would definitely prefer the ASIN to be the one on the Audible site.

And now that I'm looking at the link, it could easily be changed to search by the ASIN. I just removed the title from the link Goodreads created and inserted the ASIN and got the correct results. I wonder if there's a reason it searches by title?

Banjomike wrote: "Not all of the Unknown Books and Unknown Authors were caused by the Amazon purge. If there is truly no way to link the wrecked record back to the proper book then perhaps it should simply be deleted (if no reviews etc)."

What else caused the Unknown Books/Authors? Not that I don't believe you, I just don't recall them existing before the purge.

Totally agree about deleting Unknowns that haven't been shelved though. Seems like it would be a clean solution for dealing with unshelved books that had been added manually after the data was lost and it would drastically reduce the number of unknowns that might need to be fixed.


message 15: by Emy (new)

Emy (emypt) | 5037 comments Can we stop ALL CAPS importing on the feeds? (Assuming we don't already!) Just a thought since I regularly come across that on Kindle editions.
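
One conservative way to handle this, sketched under the assumption that only titles written entirely in capitals should be touched (the function name is hypothetical):

```python
import string


def normalize_all_caps(title: str) -> str:
    """Re-case a title only when every cased character in it is uppercase."""
    if title.isupper():
        # capwords capitalises each whitespace-separated word and also
        # collapses runs of whitespace into single spaces.
        return string.capwords(title)
    return title
```

Mixed-case titles pass through unchanged, but a title that is legitimately all capitals, such as a bare acronym, would be re-cased too ("NASA" becomes "Nasa").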


message 16: by Banjomike (new)

Banjomike | 5166 comments Vicky wrote: "True. And like I said, I would definitely prefer the ASIN to be the one on the Audible site. "

Except that the Audible site uses multiple ASINs for the same product as per "Kiss Kiss" in my previous post.

What else caused the Unknown Books/Authors?
PM sent.


message 17: by Moloch (new)

Moloch | 3975 comments In non-English editions, I think Amazon has the habit of adding, after the title, "(Spanish Edition)", "(Italian Edition)", etc. I'd like those to not be imported (I always delete them, they're not part of the title and we have the language field for that). I don't know if this will be possible, though.

Often after the title there's also, in brackets, the collection; I'd like that not to be imported too (again, if possible).
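
A minimal sketch of the language-suffix cleanup, assuming an explicit list of language names. The list here is a hypothetical sample, not whatever Amazon actually emits.

```python
import re

# Hypothetical sample; a real list would need every language the feed uses.
LANGUAGES = ("Spanish", "Italian", "French", "German")
LANGUAGE_EDITION = re.compile(
    r"\s*\((?:" + "|".join(LANGUAGES) + r") Edition\)\s*$"
)


def strip_language_edition(title: str) -> str:
    """Remove a trailing "(<Language> Edition)" and nothing else."""
    return LANGUAGE_EDITION.sub("", title)
```

Matching only known language names leaves things like "(Deluxe Edition)" or a volume number in parentheses alone. The bracketed collection name is not handled here, since it cannot be told apart from wanted parenthetical data by pattern alone.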


message 18: by rivka, Former Moderator (new)

rivka | 45177 comments Mod
Vicky wrote: "I would like to see image imports be turned off completely unless there is NO image."

This is problematic with many big-name-author books, as well as certain publishers (for all authors), as they provide initial "cover coming soon" images, and then "preview cover" images, and finally the real thing.

Lobstergirl wrote: "Block imports for things like clip strips, display copies, multiple copies (e.g. 12copy, 12-cpy, 24copy, 24-cpy), and other things on the NAB list."

We already do this, but it will never catch everything. Any script that prevents all such items also has an unacceptable level of false positives, and blocks legit items.

Emy wrote: "Can we stop ALL CAPS importing on the feeds?"
Moloch wrote: "In non English editions, I think Amazon has the habit of adding, after the title, "(Spanish Edition)", "(Italian Edition)", etc. I'd like those to not be imported"
It's very difficult to create scripts that strip out such data but do not also strip out wanted data. But I'll pass these requests along.


message 19: by Sandra (new)

Sandra | 31413 comments Just came across one of my books whose image was changed by Onix Ingram.

http://www.goodreads.com/book/edits/3...

My copy is dated September 2000 and the cover should look like After Dark (Harmony, #1) by Jayne Castle

If we can't get that right, how can we trust the Amazon feed?


message 20: by rivka, Former Moderator (new)

rivka | 45177 comments Mod
The image was changed by Ingram because they were also the source of the previous image. As I explained above, that behavior is deliberately allowed. (Although it's pretty easy for a librarian to undo such changes on an individual basis, as needed.)
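
The behavior described here could be sketched as a single rule. The source labels and the function name are assumptions for illustration, not the actual import logic.

```python
from typing import Optional

USER_UPLOAD = "user"  # hypothetical label for a cover uploaded by hand


def may_replace_cover(current_source: Optional[str], incoming_source: str) -> bool:
    """Decide whether a feed may replace the existing cover image."""
    if current_source is None:
        return True  # no image yet: any feed may supply one
    if current_source == USER_UPLOAD:
        return False  # hand-uploaded covers are never replaced by a feed
    # A feed may update only an image that it supplied itself.
    return current_source == incoming_source
```

This lets a publisher's feed move from a "cover coming soon" placeholder to the real cover, while keeping one feed from replacing another feed's image or a librarian's upload.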


message 21: by Debbie's Spurts (D.A.) (last edited Apr 04, 2013 11:43AM) (new)

Debbie's Spurts (D.A.) | 6325 comments If librarian feedback matters, I think there's a whole can of worms with ebooks and cover changes that may need revisiting when sale is final or data feeds from amazon restored.

Absolutely with messages 10 & 11 about protecting covers. (I know, lots up in the air because possible amazon PTB may just say "xxyyzzqq is now the cover policy"). Just wanting to put it out there.


message 22: by Sandra (new)

Sandra | 31413 comments rivka wrote: "The image was changed by Ingram because they were also the source of the previous image. As I explained above, that behavior is deliberately allowed. (Although it's pretty easy for a librarian to u..."

Oh, I can create an alternative cover edition, no probs, just that I think allowing someone to just update covers automatically, sort of defeats the purpose of alternatives in the first place.


message 23: by Sandi (new)

Sandi German amazon has the annoying habit of putting periods at the end of book titles. When we still had data from amazon, almost all German language books came with a period at the end and needed to be fixed manually. Maybe book titles that end in a period could somehow be imported without it? At least from amazon.de ...
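
A rough sketch of that idea, assuming the importer knows which site a title came from. The `source` value and the function name are hypothetical.

```python
def strip_trailing_period(title: str, source: str) -> str:
    """Drop one trailing period from amazon.de titles, leaving ellipses alone."""
    if source == "amazon.de" and title.endswith(".") and not title.endswith(".."):
        return title[:-1].rstrip()
    return title
```

A title that genuinely ends in an abbreviation would lose its period too, so this is best treated as a starting point rather than a safe rule.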


message 24: by Carolyn (new)

Carolyn (seeford) | 573 comments Renewing the feed from amazon means a whole lotta author names that came into GR wrong, have been fixed, and the bad/incorrect name was merged with the right name, are all going to be reimported again, yes?
If there is any way to at least trim off the Mr./Mr/Mrs./Mrs/Ms./Ms at the beginning of author names, and the alphabet soup at the end of names, that would be most appreciated. It is disheartening to think that the literally hundreds of hours I've put into cleaning up author records is going to be wiped out. :)
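
The trimming could look something like this sketch. The honorifics are the ones listed above; the credential list is a tiny hypothetical sample of the "alphabet soup", and the function name is an assumption.

```python
import re

# Mr./Mr/Mrs./Mrs/Ms./Ms at the start of the name.
HONORIFIC_PREFIX = re.compile(r"^(?:Mrs|Mr|Ms)\.?\s+")
# Hypothetical sample of trailing credentials; a real list would be much longer.
CREDENTIAL_SUFFIX = re.compile(r"(?:,?\s+(?:Ph\.?D\.?|M\.?D\.?))+$")


def clean_author_name(name: str) -> str:
    """Strip a leading honorific and any trailing credentials from a name."""
    name = HONORIFIC_PREFIX.sub("", name.strip())
    return CREDENTIAL_SUFFIX.sub("", name)
```

For example, "Mrs. Jane Doe, Ph.D." and "Mr John Smith MD" come out as "Jane Doe" and "John Smith".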


message 25: by Aaron (new)

Aaron Carson | 55 comments Yes, it would be nice to have our own covers retained unless the image field is blank.

Having scanned a number of my old books, and having spent hours airbrushing out the cracks, and then foolishly deleting the image from my hard-drive after uploading it here, I feel a little emotional about my work.


message 26: by rivka, Former Moderator (new)

rivka | 45177 comments Mod
Carolyn wrote: "Renewing the feed from amazon means a whole lotta author names that came into GR wrong, have been fixed, and the bad/incorrect name was merged with the right name, are all going to be reimported ag..."

Quite possibly not, actually. I'm not sure if Amazon feed(s) will be handled differently than what we do now. But assuming they are handled the same as what we do now, we stopped changing primary authors on existing book records about 6-12 months (I don't recall exactly) back. So if the problem-name author belongs to an Unknown Book that was never fixed, or a new book record not yet on GR, then it probably will import. Otherwise, I think it probably won't.

[That change, of no longer allowing imports to update/correct/etc. primary authors on existing book records, was due primarily to feedback in this group. It was one of several changes to how imports from Ingram and other sources are handled, based on feedback here and elsewhere.]


message 27: by rivka, Former Moderator (new)

rivka | 45177 comments Mod
Aaron wrote: "Having scanned a number of my old books, and having spent hours airbrushing out the cracks, and then foolishly deleting the image from my hard-drive after uploading it here, I feel a little emotional about my work."

Cover images uploaded by users will still have higher priority than any import. Otis said so.


message 28: by Aaron (new)

Aaron Carson | 55 comments Thank you rivka. Very comforting to know. I shall sleep tonight.


message 29: by Debbie's Spurts (D.A.) (last edited Apr 07, 2013 08:50PM) (new)

Debbie's Spurts (D.A.) | 6325 comments Sandi wrote: "German amazon has the annoying habit of putting periods at the end of book titles. When we still had data from amazon, almost all German language books came with a period at the end and needed to b..."

If it's a German language book, I actually think the period is correct.

For a U.S. English or U.K. English language edition, the period at end of a title is wrong.


message 30: by Lobstergirl (new)

Lobstergirl Let's make sure that authors are not imported as "manufacturers." I just came across several books where Winston Churchill, the author, was listed as "Manufacturer."

The place I've seen this most often is Hal Leonard Publishing Corporation, which is the default author on many musical scores/compilations. When I see it I delete it, because I don't see any point in an author, regardless of whether it be human or corporate, described in such a way.


message 31: by Helmut (new)

Helmut (schlimmerdurst) | 43 comments Regarding "manufacturers": Is this additional author within policies? I'd remove it, but to be sure, I'd like to ask you...

https://www.goodreads.com/book/show/1...


message 32: by rivka, Former Moderator (new)

rivka | 45177 comments Mod
Removed.

Also, rather than bumping an old and barely-related thread, please start a new one.

