Goodreads Librarians Group discussion
note: This topic has been closed to new comments.
Policies & Practices
Re-integration of Amazon data



First:
The non-Amazon feeds have improved a lot since January 2012 and are much better at not dumping dubious info all over the handcrafted records we have tweaked. I hope the new Amazon feeds will leave alone any data that has been manually edited.
Second:
There was a discussion on the subject of ASINs on Audible books on the Audible site and the different ASIN that the same Audible book would have on an Amazon site. Do we need to standardise which of those ASINs we want? If Goodreads already has an Audible book with the Audible ASIN we don't want the Amazon import adding the Amazon version of the identical book. Or do we?
This thread: www.goodreads.com/topic/show/1181279-...


Also, I would like to know if the various "Unknown Books" and "Unknown Authors" will be restored; I'm especially concerned about the "Unknown Authors": "Unknown Author 1384662627" can now include many books by many different authors, not just one.

"Also, I would like to know if the various "Unknown Books" and "Unknown Authors" will be restored; I'm especi..."
That's a really good point. It might help to know from the PTB if they are planning on deleting those prior to the new Amazon feed or if they think that it will be resolved via the feed.

Good point.
I have always assumed that when Goodreads purged the Amazon data they kept a little bit of it buried in a box in a cupboard that would link the vanished ASIN with the Unknown Book title, just in case. I hope they can link the two together or we are going to lose a lot more than just the ex-Amazon books when the Unknowns are eventually removed.

The only way this should happen is if the data is not identified as from a librarian source. This is a known issue with default descriptions, but that's the only specific data type I'm aware of which can't be sourced correctly to prevent overwriting. Do you know of other data types which are not sourced to librarians when librarians edit them?
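A minimal sketch of the overwrite rule described here, assuming each field records which source last set it. The `SourcedField` type and the source labels are hypothetical, not Goodreads' actual schema. A feed may fill an empty field or update a value it supplied itself, but never a librarian-sourced value; a non-empty value with no recorded source is treated as protected, which is the conservative choice for cases like default descriptions.

```python
from dataclasses import dataclass
from typing import Optional

LIBRARIAN = "librarian"  # hypothetical source label for manual edits


@dataclass
class SourcedField:
    value: Optional[str]
    source: Optional[str]  # who last set the value, e.g. "librarian", "ingram", "amazon"


def may_overwrite(field: SourcedField, feed: str) -> bool:
    """Return True if an import from `feed` may replace this field's value."""
    if not field.value:
        return True  # empty field: any feed may fill it
    if field.source == LIBRARIAN:
        return False  # manual edits always win over feeds
    # A feed may only update a value it supplied itself. A non-empty value with
    # no recorded source (None) is treated as protected.
    return field.source == feed


def apply_import(field: SourcedField, new_value: str, feed: str) -> SourcedField:
    """Return the field after an import attempt; unchanged if the import is refused."""
    if may_overwrite(field, feed):
        return SourcedField(new_value, feed)
    return field
```

The "same source may update its own data" branch is what lets a feed replace its own earlier cover or description, which is also what makes repeated overwrites possible until a librarian edits the field.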

I'm not completely sure how all the data feeds work, but (based on other threads) on some fields, if the info came from a data feed such as Ingram ONIX, that same feed can keep changing fields like the default description over and over until someone edits them manually. Unless the policies change so that book covers can't be overwritten, I foresee covers being overwritten every blasted time an indie author changes a cover, reviews appearing in default descriptions, etc. (I'm also not sure the Amazon data feed will work well at all with the current alternate cover editions.)
Several authors have been irate recently (and I've seen it in personal experience) because Amazon is not doing well at combining books or editions. As a result, not all of an author's books appear on their author page, and (worse for my own reading, since I use the Kindle editions a lot) their Kindle and other editions are not correctly combined with the paperbacks, hardbacks, audiobooks, etc. when you go to a book page on Amazon.
If editions aren't correctly combined on the Amazon site, or books aren't correctly assigned to authors, I don't see how the data feed will do any better. I guess it's just something to watch out for; possibly staff can post a more official reminder in the author and regular feedback groups, alerting authors that if it ain't right on the Amazon site, they should watch for the same error to be imported to Goodreads.

Banjomike wrote: "Second:
There was a discussion on the subject of ASINs on Audible books on the Audible site and the different ASIN that the same Audible book would have on an Amazon site. Do we need to standardise which of those ASINs we want? If Goodreads already has an Audible book with the Audible ASIN we don't want the Amazon import adding the Amazon version of the identical book. Or do we?
This thread: www.goodreads.com/topic/show/1181279-..."
Definitely agree this should be addressed. Personally, I feel the Audible ASIN should be the one we keep (although it likely won't be the one that comes in through the feed). On the other hand, the Audible "get a copy" link appears to search by title and not by the identifier associated with a book, so maybe it doesn't really matter?
----
I would like to see image imports be turned off completely unless there is NO image. When the newer imports started running they often replaced perfectly good images with an image from a newer edition that should have been manually made an alternate cover edition.
----
On the topic of Unknowns, if the data comes back to all the Unknown Author/Book entries, there has to be some way to label them so we can merge them with entries that may have been added in the meantime, without overwriting that information.
Let's say I wrote a book that ended up in the Unknown Author/Book pile. There wasn't enough information for anyone to rescue it, so it's just hanging out there in the abyss. It has a few shelvings/ratings but no meaty reviews. Since the book was lost to the unknown, I've re-added it to my page with the ASIN that was lost. Since we can only use an identifier once, the old edition will (I suspect) not be able to have its ASIN reverted. There should be some way to label those books so librarians can find them and merge them.
Perhaps even going so far as to import all data except for the author. With all the other identifying data (and presumably the ability to again use Amazon as a source) it should be easy to find the author information - and keeping the Unknown Author # profiles will make it easy for librarians to find books that need to be merged.

^ THIS please....
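The labeling idea above could work roughly like this, assuming the purged ASINs were kept somewhere and that each identifier may belong to only one record. This is a hypothetical sketch: `Book`, `restore_asin` and the `needs-librarian-merge` label are illustrative names, not anything Goodreads has described.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NEEDS_MERGE = "needs-librarian-merge"  # hypothetical review label


@dataclass
class Book:
    book_id: int
    title: str
    asin: Optional[str] = None
    tags: List[str] = field(default_factory=list)


def restore_asin(book: Book, asin: str, asin_index: Dict[str, int]) -> bool:
    """Give a purged ASIN back to an Unknown record, or flag the record for merging.

    `asin_index` maps every ASIN already in use to the book_id holding it.
    Returns True if the ASIN was restored, False if the record was flagged instead.
    """
    owner = asin_index.get(asin)
    if owner is None or owner == book.book_id:
        book.asin = asin
        asin_index[asin] = book.book_id
        return True
    # Identifiers are unique, so don't take it from the re-added edition;
    # label the old record with the id of the record it probably duplicates.
    book.tags.append(f"{NEEDS_MERGE}:{owner}")
    return False
```

Recording the current owner's id in the label gives librarians both halves of the merge in one search.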

It does matter on Audible. If you search for "day of the triffids" on Audible you get three results: one drama and two readings. If someone clicks on a buy link, they want to see the one they clicked on, not a further selection. Many books on Audible have UK and US versions.
UNABRIDGED
http://www.audible.co.uk/pd/ref=sr_1_...
and ABRIDGED
http://www.audible.co.uk/pd/ref=sr_1_...
Sometimes it is even more annoying: these two Audible products appear to be identical and have identical titles ("Kiss Kiss"), but have different ASINs.
http://www.audible.co.uk/pd/ref=sr_1_...
http://www.audible.co.uk/pd/ref=sr_1_...
Not all of the Unknown Books and Unknown Authors were caused by the Amazon purge. If there is truly no way to link the wrecked record back to the proper book then perhaps it should simply be deleted (if no reviews etc).

Strip away the "PB" or "Pb" that sometimes appears at the end of a title to indicate paperback.
Perhaps there's a way to insert a period when one is lacking, e.g. an author being imported as "Joan J Mitchell".
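Both requests can be sketched as narrow regular expressions. The patterns below are assumptions, not the actual import rules: the suffix rule only fires on a whitespace-separated "PB"/"Pb" at the very end of a title, and the initial rule only touches a lone capital letter followed by a space. A genuine single-letter name part that isn't an initial would still gain a period, so output may need a librarian's check.

```python
import re

# " PB", " Pb", " - PB" or ", Pb" at the very end of a title (paperback marker).
PB_SUFFIX = re.compile(r"\s*[-,]?\s+P[Bb]\s*$")
# A lone capital letter preceded by start-of-string or whitespace and followed by
# whitespace, e.g. the "J" in "Joan J Mitchell".
BARE_INITIAL = re.compile(r"(?<!\S)([A-Z])(?=\s)")


def strip_pb_suffix(title: str) -> str:
    """Remove a trailing paperback marker; everything else is left untouched."""
    return PB_SUFFIX.sub("", title)


def add_initial_periods(name: str) -> str:
    """Add the missing period after bare initials in an author name."""
    return BARE_INITIAL.sub(r"\1.", name)
```

For example, `add_initial_periods("Joan J Mitchell")` gives `"Joan J. Mitchell"`, while `"Malcolm X"` is left alone because the final letter isn't followed by a space.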

True. And like I said, I would definitely prefer the ASIN to be the one on the Audible site.
And now that I'm looking at the link, it could easily be changed to search by the ASIN. I just removed the title from the link Goodreads created and inserted the ASIN and got the correct results. I wonder if there's a reason it searches by title?
Banjomike wrote: "Not all of the Unknown Books and Unknown Authors were caused by the Amazon purge. If there is truly no way to link the wrecked record back to the proper book then perhaps it should simply be deleted (if no reviews etc)."
What else caused the Unknown Books/Authors? Not that I don't believe you, I just don't recall them existing before the purge.
Totally agree about deleting Unknowns that haven't been shelved though. Seems like it would be a clean solution for dealing with unshelved books that had been added manually after the data was lost and it would drastically reduce the number of unknowns that might need to be fixed.


Except that the Audible site uses multiple ASINs for the same product as per "Kiss Kiss" in my previous post.
What else caused the Unknown Books/Authors?
PM sent.

Often after the title there's also, in brackets, the collection; I'd like that not to be imported too (again, if possible).
Vicky wrote: "I would like to see image imports be turned off completely unless there is NO image."
This is problematic with many big-name-author books, as well as certain publishers (for all authors), as they provide initial "cover coming soon" images, and then "preview cover" images, and finally the real thing.
Lobstergirl wrote: "Block imports for things like clip strips, display copies, multiple copies (e.g. 12copy, 12-cpy, 24copy, 24-cpy), and other things on the NAB list."
We already do this, but it will never catch everything. Any script that blocks all such items also has an unacceptable level of false positives: it blocks legit items.
Emy wrote: "Can we stop ALL CAPS importing on the feeds?"
Moloch wrote: "In non English editions, I think Amazon has the habit of adding, after the title, "(Spanish Edition)", "(Italian Edition)", etc. I'd like those to not be imported"
It's very difficult to create scripts that strip out such data but do not also strip out wanted data. But I'll pass these requests along.
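The constraint in that reply is that broad stripping scripts also remove wanted data. One hedged way to respect it is to keep each rule narrow: strip only an exact trailing "(<Language> Edition)" from an explicit list, match only tight multi-copy patterns, and flag ALL CAPS titles for a librarian instead of rewriting them. This is a hypothetical sketch; the language list and patterns are assumptions, and the thread doesn't describe how the real import scripts work. Bracketed collection names are deliberately not stripped, since brackets also hold wanted data.

```python
import re

# Illustrative allowlist; only these exact suffixes are stripped, so other
# parenthetical text such as "(Penguin Classics Edition)" survives.
EDITION_LANGUAGES = ["Spanish", "Italian", "French", "German", "Portuguese"]
EDITION_SUFFIX = re.compile(
    r"\s*\((?:" + "|".join(EDITION_LANGUAGES) + r") Edition\)\s*$"
)
# Multi-copy packs written like "12copy", "12-cpy", "24copy", "24-cpy".
MULTI_COPY = re.compile(r"\b\d+-?(?:copy|cpy)\b", re.IGNORECASE)


def strip_language_edition(title: str) -> str:
    """Remove a trailing "(<Language> Edition)" for the listed languages only."""
    return EDITION_SUFFIX.sub("", title)


def looks_like_multi_copy(title: str) -> bool:
    """True if the title carries a multi-copy pack marker; block or hold for review."""
    return MULTI_COPY.search(title) is not None


def is_all_caps(title: str) -> bool:
    """Flag multi-word titles whose letters are all upper case.

    Single words like "NASA" are not flagged, and nothing is rewritten here,
    because acronyms make automatic case conversion unreliable.
    """
    letters = [c for c in title if c.isalpha()]
    return " " in title.strip() and bool(letters) and all(c.isupper() for c in letters)
```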

http://www.goodreads.com/book/edits/3...
My copy is dated September 2000 and the cover should look like

If we can't get that right, how can we trust the Amazon feed?
The image was changed by Ingram because they were also the source of the previous image. As I explained above, that behavior is deliberately allowed. (Although it's pretty easy for a librarian to undo such changes on an individual basis, as needed.)

Absolutely agree with messages 10 & 11 about protecting covers. (I know a lot is up in the air, since the Amazon PTB may just say "xxyyzzqq is now the cover policy".) Just wanting to put it out there.

Oh, I can create an alternate cover edition, no probs; it's just that I think allowing someone to update covers automatically sort of defeats the purpose of alternates in the first place.


If there is any way to at least trim off the Mr./Mr/Mrs./Mrs/Ms./Ms at the beginning of author names, and the alphabet soup at the end of names, that would be most appreciated. It is disheartening to think that the literally hundreds of hours I've put into cleaning up author records is going to be wiped out. :)
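A sketch of that trim, assuming the honorifics listed in the request plus an illustrative (not exhaustive) list of degree letters. Matching is case-sensitive so that a surname like "Ma" isn't mistaken for "MA", and generational suffixes such as "Jr." are kept because they are part of the name.

```python
import re

# Leading honorifics from the request: Mr./Mr/Mrs./Mrs/Ms./Ms
HONORIFIC = re.compile(r"^(?:Mrs|Mr|Ms)\.?\s+")
# Illustrative list of trailing degree letters ("alphabet soup"); case-sensitive
# on purpose. Suffixes such as "Jr." are not in the list and are kept.
POST_NOMINALS = re.compile(
    r"(?:,?\s+(?:Ph\.D\.?|PhD|M\.D\.|MD|MBA|MA|MS|BA|BS|RN))+\s*$"
)


def clean_author_name(name: str) -> str:
    """Strip a leading honorific and any run of trailing degree letters."""
    name = HONORIFIC.sub("", name.strip())
    name = POST_NOMINALS.sub("", name)
    return name.strip()
```

For instance, `"Mrs. Jane Smith, PhD, MBA"` becomes `"Jane Smith"`, while `"Yo-Yo Ma"` and `"Martin Luther King Jr."` pass through unchanged.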

Having scanned a number of my old books, and having spent hours airbrushing out the cracks, and then foolishly deleting the image from my hard-drive after uploading it here, I feel a little emotional about my work.
Carolyn wrote: "Renewing the feed from amazon means a whole lotta author names that came into GR wrong, have been fixed, and the bad/incorrect name was merged with the right name, are all going to be reimported ag..."
Quite possibly not, actually. I'm not sure whether the Amazon feed(s) will be handled differently from what we do now. But assuming they are handled the same way, we stopped changing primary authors on existing book records about 6-12 months ago (I don't recall exactly). So if the problem-name author belongs to an Unknown Book that was never fixed, or to a new book record not yet on GR, then it probably will import. Otherwise, I think it probably won't.
[That change, of no longer allowing imports to update/correct/etc. primary authors on existing book records, was due primarily to feedback in this group. It was one of several changes to how imports from Ingram and other sources are handled, based on feedback here and elsewhere.]
Aaron wrote: "Having scanned a number of my old books, and having spent hours airbrushing out the cracks, and then foolishly deleting the image from my hard-drive after uploading it here, I feel a little emotional about my work."
Cover images uploaded by users will still have higher priority than any import. Otis said so.

If it's a German language book, I actually think the period is correct.
For a U.S. English or U.K. English language edition, the period at the end of a title is wrong.
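A language-aware version of that rule might look like the sketch below. The language tags and the abbreviation list are assumptions, and the German exception follows the comment above rather than any stated policy. It removes one trailing period only for English-language editions and leaves ellipses, initialisms like "U.S.A." and common abbreviations alone.

```python
ENGLISH_TAGS = {"en", "en-US", "en-GB", "eng"}  # hypothetical language tags
KEEP_ABBREVIATIONS = {"Jr", "Sr", "Inc", "Ltd", "Co", "St", "Vol"}  # illustrative


def strip_trailing_period(title: str, language: str) -> str:
    """Drop one trailing period from English-language titles only."""
    if language not in ENGLISH_TAGS:
        return title  # e.g. German editions, where the period may be correct
    if not title.endswith(".") or title.endswith(".."):
        return title  # no period, or an ellipsis
    last_word = title[:-1].rsplit(" ", 1)[-1]
    if "." in last_word or last_word in KEEP_ABBREVIATIONS:
        return title  # initialisms like "U.S.A." or abbreviations like "Jr."
    return title[:-1]
```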

The place I've seen this most often is Hal Leonard Publishing Corporation, which is the default author on many musical scores/compilations. When I see it I delete it, because I don't see any point in an author, regardless of whether it be human or corporate, described in such a way.

https://www.goodreads.com/book/show/1...
Theoretically, the feed should be able to pick up where it left off when it was turned off.
In practice, I'm sure here in the librarians group we've all seen the little things that go wrong with a new data feed.
So, can we put together a list of technical "gotchas" that we, as experienced librarians, want to remind the dev team of before they happen?
(I'd like to keep this as a strictly technical thread for the ease of delivery to the dev team. If there are any policy questions which come up from this discussion, can we open new threads for them?)