Goodreads Librarians Group discussion
note: This topic has been closed to new comments.
Large Book Data Import
Thank you, Sara, for elaborating on the issue (that is an all-time record for GR staff actually explaining what is going on).
My final question: Is it worthwhile to start repairing imports like
https://www.goodreads.com/book/edits/...
manually (author! etc.), or will this just trigger another import, in which case we should wait?
If there's something pressing that you'd like to update, by all means feel free to update it. If the book you're editing is a Kindle Edition with an ASIN, it's incredibly unlikely that it will be imported by amazon_kcw again (there would have to be some sort of race condition for that to happen - i.e. changes happen on both GR and Amazon simultaneously and the Amazon feed beats the GR edit). If it's a physical edition, there's a slight chance that a duplicate edition might be imported again if you edit or delete a GR book.
That being said, we have a list of cleanup scripts almost ready to run (removing quotation marks from the publisher names, reattributing ASINs to their original GR records, removing duplicate physical editions imported by amazon_sable).
We'll also be trying to merge some of the authors created by amazon_kcw and amazon_sable (with Dr and missing periods after initials) into pre-existing authors. Don't worry, we'll be especially careful to merge the NEW record into the OLD one and not vice versa.
And I'm glad my explanations have been helpful! I know it's a complex tree of ifs and thens - we're trying to pare it down and simplify it. And all of your input has been incredibly helpful in revising our strategies :-)
Sarah wrote: "Although as a separate note - in some cases kcw will create a physical book sans isbn/isbn13 because those isbns already exist in our catalog but have conflicting author/title information."
Are we still reporting such errors here?
https://www.goodreads.com/book/edits/...
Imported by kcw (along with one similar edition) without ISBN/ASIN, but does not appear to be a duplicate unless I missed something, which is entirely possible as my eyes are blurring from sitting too long in front of this screen. ;-)
I'm now seeing quite a few Kindle editions that no longer exist at Amazon, so I can't verify authorship to combine etc. They are coming through as published by "Lonely Planet", so I'm assuming it's part of the data import.
Hey all,
Just wanted to give an update on the cleanup process. We've removed the extra quotes from publisher names for all books imported by amazon_sable. Please let me know if you see any we've missed!
Sam
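For anyone curious what a publisher-quote cleanup might look like, here is a minimal sketch. It is not the script GR staff ran; the function name and the set of quote characters are assumptions made for illustration only.

```python
def strip_publisher_quotes(publisher: str) -> str:
    """Remove quotation marks wrapped around a publisher name.

    Hypothetical helper: only quotes that enclose the whole name are
    removed, so apostrophes inside a name are left alone.
    """
    cleaned = publisher.strip()
    # Opening quote -> expected closing quote (assumed set of quote styles).
    pairs = {'"': '"', "'": "'", "\u201c": "\u201d"}
    # Peel matching pairs repeatedly so doubled quotes are handled too.
    while len(cleaned) >= 2 and pairs.get(cleaned[0]) == cleaned[-1]:
        cleaned = cleaned[1:-1].strip()
    return cleaned
```

A name without enclosing quotes comes back unchanged, so the function could be applied to every imported record without first checking which ones are affected.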
Forgive me if this one has been pointed out, but I ran across an ASIN edition with "[INACTVE]" in the title:
https://www.goodreads.com/book/show/2...
I didn't fix it so as to leave it visible. A quick search for 'inactive' in the title field shows many of these. Should be an easy script to remove these, I should think...
Not sure what's going on with this entry. No ISBN or ASIN, imported not from Amazon but from "Goodreads," Amazon fake cover, and it has no data other than publisher. There's no need for this record to exist as there are many other extant editions of this work that do have data and legit covers.
https://www.goodreads.com/book/show/2...
Not deleting because I thought it should be seen. Imported 1/17/14.
The imports have created a new author for Voltaire. Now all these editions of books listed for François Voltaire are not combined with the proper editions under just Voltaire.
Is there any ETA on a cleanup on the "([language] Edition)" thing for the amazon_kcw imports? I was doing some cleanup on 菊地 秀行 and there are ~100 new Kindle editions from amazon.co.jp (yay!) which all have "(Japanese Edition)" in the title and English as the language (boo!). (I mean, eventually I (or some librarian) will have to go in and fix all of these records anyway to list Hideyuki Kikuchi as the primary author and 菊地 秀行 as secondary since we still have no aka feature, but it's a lot easier to cut'n'paste author names if I don't have to also fix title field and language.)
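As a rough illustration of the "([language] Edition)" cleanup being asked about, the sketch below strips the marker from the title and returns the language it names, so the language field could be corrected at the same time. The function name and the list of recognised languages are assumptions; the real import data may use other markers.

```python
import re
from typing import Optional, Tuple

# Assumed allowlist, so that markers like "(Deluxe Edition)" are not
# mistaken for a language. Extend as needed.
KNOWN_LANGUAGES = {"Japanese", "Italian", "German", "French", "Spanish"}

# Matches a trailing "(<Word> Edition)" marker and any whitespace before it.
EDITION_SUFFIX = re.compile(r"\s*\((?P<language>[A-Za-z]+) Edition\)\s*$")


def split_language_edition(title: str) -> Tuple[str, Optional[str]]:
    """Return (clean_title, language), or (title, None) if no marker is found."""
    match = EDITION_SUFFIX.search(title)
    if match is None or match.group("language") not in KNOWN_LANGUAGES:
        return title, None
    return title[: match.start()], match.group("language")
```

Returning the language instead of discarding it means one pass could fix both the title field and the language field on a record.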
I found another sort of error in the import of data from amazon_kcw.
This book had Sourcebooks Casablanca, the publisher, listed as the author for the Kindle edition:
https://www.goodreads.com/book/show/2...
I changed it but you can still see it in the change logs.
And here's another one where one author's name
a) was turned around and separated by a comma, and
b) still has the "M.A." addition.
https://www.goodreads.com/book/show/1...
This book https://www.goodreads.com/book/show/1... apparently came from an Amazon import, but it seems to be a vendor listing of volumes 1 (https://www.goodreads.com/book/show/1...) & 2 (https://www.goodreads.com/book/show/1...) together (the ISBN is fake)... should it be merged into one of the volumes or should it remain as a separate work?
Edited to add: I think it's similar to the situation described here: https://www.goodreads.com/topic/show/...
Also, I'm still running into many Kindle editions with the ISBN included as well as the ASIN. Will those be fixed on their own, or should we try to correct them?
thanks!
Thanks everyone for your continued input! Just checking in to let you know that last week we had to delay some of our cleanup work, but many of the scripts are ready to go as soon as we get the go-ahead to run them. We'll try our best to keep you up to date as we move forward.
I'm hoping to run the (Language Edition) cleanup script this week as well as to continue the removal of duplicate physical editions created during the import - that'll help remove a bunch of the books imported with neither asin nor isbn/isbn13.
Something strange that I'm finding is that amazon_kcw imports authors with 2 spaces between first and last name: I've just fixed this one https://www.goodreads.com/author/show... imported on this record https://www.goodreads.com/book/show/1... as Mineko^^Iwasaki for some reason.
Since on Goodreads that's the way to disambiguate authors with the same name, this is potentially dangerous because, while in other cases you can immediately see there's something wrong (last name first name, for example), this looks identical to the correct spelling.
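Because extra spaces are also the legitimate way to disambiguate same-named authors, a blanket "collapse all double spaces" script would be risky. A safer sketch, under the assumption that a list of existing author names is available, is to only flag imported names that become identical to an existing author once the spaces are collapsed. All function names here are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def has_hidden_extra_spaces(name: str) -> bool:
    """True if the name contains a run of two or more whitespace characters."""
    return re.search(r"\s{2,}", name) is not None


def collapse_spaces(name: str) -> str:
    """Replace every run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", name).strip()


def find_lookalikes(
    imported: Iterable[str], existing: Iterable[str]
) -> List[Tuple[str, str]]:
    """Pair each suspicious imported name with the existing author it mimics."""
    existing_set = set(existing)
    return [
        (name, collapse_spaces(name))
        for name in imported
        if has_hidden_extra_spaces(name) and collapse_spaces(name) in existing_set
    ]
```

The output is a review list, not an automatic merge, so a librarian can still decide whether the extra space was an intentional disambiguation.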
Another example of a description that needs to be somehow coded to not import:
Napoleon's Pyramids
Harper/Collins 2007 First Edition, first printing, measures 6 1/4" by 9 1/4" by 1", with 376 deckle edged pages and larger than average print. "Action, adventure...passion, a real page-turner in historical fiction"( Booklist).
-----------------------------------------------
Or this one that again overwrote a description:
The Rosetta Key
The Rosetta Key: A Novel, by William Dietrich (Author of Napoleon's Pyramids)
Hardcover book published by HarperCollins, First Edition, 1st printing, 2008
Lobstergirl wrote: "Lame title...https://www.goodreads.com/book/show/1..."
You don't refer to all of your books by ASIN? ;-)
Yikes. Hopefully that isn't a common pattern... I can't imagine it is. Though my imagination has broadened in regards to styles of formatting titles and authors...
This one is from last month, so maybe this is not important any more. But on this edition the import changed a valid ISBN13 https://www.goodreads.com/book/show/1...
(Someone has already created a new edition with the valid ISBN13, so those two need to be merged, but I didn't want to do that before reporting this here.)
FYI - there are many Amazon Sable imports, most with no IDs but some with ASINs, that have the following wrong author name format:
Jr.^^First^LastName
.Leading Jr. is wrong - should be after the last name
.Two spaces after Jr. ???
I already fixed the Sr. names, but I bet there are II, III, IV names with the same problem. And probably others not found yet.
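The "Jr.^^First^LastName" pattern could be repaired along these lines. This is only a sketch: the suffix list (Jr., Sr., II, III, IV) comes from the post above, while the function name and the exact output format (suffix after the last name, no comma) are assumptions.

```python
import re

# A generational suffix at the very start of the name, followed by
# whitespace and then the rest of the name. "III" is listed before "II"
# so the longer suffix is tried first.
LEADING_SUFFIX = re.compile(r"^(?P<suffix>Jr\.|Sr\.|III|II|IV)\s+(?P<rest>\S.*)$")


def move_leading_suffix(name: str) -> str:
    """Move a leading suffix to the end and collapse repeated spaces."""
    match = LEADING_SUFFIX.match(name.strip())
    if match is None:
        return name  # nothing to fix; leave the record untouched
    rest = re.sub(r"\s+", " ", match.group("rest")).strip()
    return f"{rest} {match.group('suffix')}"
```

Requiring whitespace right after the suffix keeps ordinary names that merely start with similar letters from being rewritten.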
Here's one, from 5 December:
https://www.goodreads.com/book/show/1...
One thing I've noticed about foreign-language editions of Kindle books is that they sometimes have (to use this one as an example) "Italian Edition" in the title with "English" listed in the language field (I have checked Amazon.com and Italian is listed in the language field there). This was another one where the ISBN was shown on the front, but auto-updated as soon as I went to edit.
I'm not sure if this is a "bad import" issue with Amazon, but it seems too coincidental. Several of my book cover images have changed over the past month. Here are a few of many. If I need to address this in a new thread let me know. Thanks.
https://www.goodreads.com/book/show/4...
https://www.goodreads.com/book/show/4...
https://www.goodreads.com/book/show/3...
TW wrote: "I'm not sure if this is a "bad import" issue with Amazon, but it seems too conincidental. Several of my book cover images have changed over the past month. Here is a few of many. If I need to addre..."
Yep, Amazon has been changing images left and right in the past month. I keep going through my books and finding new ones. I reverted those for you.
I think Goodreads should create a static list of all image reversions for the month for librarians to check through. Between this Amazon data feed and librarians "innocently" helping authors overwrite book covers even though it vandalizes reader bookshelves, it's getting to be a mess. I know the new librarians are responsible for reading the manual, but possibly the acceptance email should include some FAQs and policy reminders of common issues they'll face, reinforcing that Goodreads keeps all editions, merge versus delete, etc. Not a long list of items, because that's what the manual is for, but a few brief reminders.
Here is another description that is from a third party.
Signed with a drawing by Peter Sis on the half-title page. Free tracking.
The Book of Imaginary Beings
One thing that this import is clearly showing is how useful the AKA feature for authors would be.
This way a record could be imported with author (example) "Benedetto XVI" and a librarian wouldn't have to go and manually merge the profile with "Pope Benedict XVI", because the 2 would already be grouped together.
Since Sarah has been very responsive in this thread, I take the opportunity to ask her to take this new feature into consideration! :-)
Moloch wrote: "One thing that this import is clearly showing is how much the AKA feature for authors would be useful.This way a record could be imported with author (example) "Benedetto XVI" but a librarian shou..."
Already on our to-do list :-) Not sure when it will happen, but we're totally with you on this. I'll let you know!
Also, just a quick update - I'm running one of the scripts to remove duplicate books and books with no isbns or asins.
Sarah wrote: "Also, just a quick update - I'm running one of the scripts to remove duplicate books and books with no isbns or asins."
Will this affect ACEs in any way?
"Librarian" deleted the book with a cover and left the one without a cover.
https://www.goodreads.com/book/show/1...
Is it possible for the "image-not-available" image from Amazon to be excluded when creating entries? Example: https://www.goodreads.com/book/edits/...
Probably a pattern in the image name can be used to filter it out.
no-img-sm._V192198896_BO1,204,203,200_.gif
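The filename-pattern idea could be sketched as below. The only evidence for the pattern is the single filename quoted above, so the "no-img" prefix is an assumption, as are the function name and the example URLs in the usage note.

```python
import posixpath
from urllib.parse import urlparse

# Assumed prefix, taken from the one example filename in this thread.
PLACEHOLDER_PREFIXES = ("no-img",)


def is_placeholder_cover(image_url: str) -> bool:
    """True if the image's filename looks like a 'no image available' placeholder."""
    # Works for full URLs and bare filenames alike: only the last path
    # segment is inspected.
    filename = posixpath.basename(urlparse(image_url).path)
    return filename.lower().startswith(PLACEHOLDER_PREFIXES)
```

For example, `is_placeholder_cover("https://images.example.com/x/no-img-sm._V1_.gif")` is true, while a URL ending in `12345.jpg` is not flagged, so an importer could skip setting a cover whenever the check passes.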
Not sure how you would edit for these images - https://www.goodreads.com/book/show/2...
https://www.goodreads.com/book/show/2...
Denim wrote: "#238 is fixed."
Just FYI this is not a request topic for librarian edits. But thank you.
Amazon is importing individual Maxim issues.
https://www.goodreads.com/book/show/2...
https://www.goodreads.com/book/show/2...
Ellie [The Empress] wrote: "I didn't know barnes noble imports books as well: https://www.goodreads.com/book/edits/..."
There was a contract for it a couple of years ago, but it's not active anymore.
Lobstergirl wrote: "Amazon is importing individual Maxim issues.https://www.goodreads.com/book/show/2...
https://www.goodreads.com/book/show/2......"
There are some other non-book items being imported periodically as well (calendars, cards, etc) - we're working on filtering those out and will do a cleanup once we have a better filtering system in place.
Sarah wrote: "There are some other non-book items being imported periodically as well (calendars, cards, etc) - we're working on filtering those out and will do a cleanup once we have a better filtering system in place."
Good--I've had to NAB a number of church service supplies while editing books with the word "bread" in their title. Now what to do about the box of communion wafers someone has added to his/her "Read" list?
I like the author's name Artistic Churchware. Someone had even added the "book". Curiously enough this was not imported by Amazon but by Ingram. 0.0
Lobstergirl wrote: "Oh now I want to shelve the communion wafers, dammit."
It would be funny if they get a genre box soon. :D
That would be fine if we could differentiate between changes that correct what had previously been an erroneous mapping and changes that should maintain the established mapping, but simply clean up the GR book record. For instance, suppose the hardcover of a book on GR had mistakenly been assigned the isbn13 for the paperback edition. If someone corrects the isbn13 to match the hardcover edition we would want to unmap the book from its previous Amazon book and remap it. On the other hand, if a Librarian updates a title to more closely match the best practices for GR, we may want to maintain its mapping to its Amazon book even if the Amazon book's title remains as it was.
Right now there's no way to tell the difference between these two types of changes when a Librarian updates a book record. That's something we're working on.
Also, keep in mind that title and author matching are done using relatively loose matching criteria - they just have to be close enough to count as a match. We're working to improve that logic on the GR side further.
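To make "close enough to count as a match" concrete, here is a minimal sketch of loose title/author matching using Python's standard difflib. It is not the matching logic GR actually uses; the normalisation, the 0.8 threshold, and the function names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 after normalisation."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_loose_match(
    gr_title: str,
    gr_author: str,
    amazon_title: str,
    amazon_author: str,
    threshold: float = 0.8,  # assumed cut-off, not a documented GR value
) -> bool:
    """Both title and author must clear the threshold to count as a match."""
    return (
        similarity(gr_title, amazon_title) >= threshold
        and similarity(gr_author, amazon_author) >= threshold
    )
```

With this particular threshold, "Voltaire" and "François Voltaire" would not count as the same author, which is the kind of split reported earlier in the thread; lowering the threshold trades that problem for more false merges.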