Goodreads Librarians Group discussion
note: This topic has been closed to new comments.
Large Book Data Import

I think they excluded covers, but it seems to be overwriting descriptions and was stealing numbers.

General terms of what evidence seems to point to."
Absolutely confirmed. amazon_kcw happily overwrites/overwrote(?) human-generated content. It will be/was(?) doing so for titles/authors/descriptions/language/covers. No idea for page nums/publishers.
Happy editing, fellow co-librarians, anyway!


I could have accepted new ones w/o any emotional involvement - just some more junk, so what? But that it would disrespectfully dare touch MY books...
No, not that bad - but almost :-)

You posted an example earlier, but that was actually onix_ingram overwriting a description -- which is a whole new problem! Do you have any example of the amazon imports doing this?

Yes, I have. I posted this, but here is the link: https://www.goodreads.com/book/edits/...
amazon_kcw overwrote the onix hachette description.

That's supposed to happen, or it was at the time: higher-priority imports overwrite lower-priority imports, and amazon_kcw's descriptions were at the time a higher priority (they've since been dropped below the onix imports' priority). User-entered data should always be the highest priority and therefore never overwritten by an import which is by definition a lower priority.
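The priority rule described above can be sketched roughly as follows. This is only an illustration of the idea, not Goodreads' actual code: the source names come from this thread, while the numeric priorities, the field layout, and the function names are all assumptions.

```python
# Hypothetical priority table: the larger number wins. The source names appear
# in the thread; the numeric values are invented for illustration only.
SOURCE_PRIORITY = {
    "user": 100,       # librarian / user-entered data, always highest
    "onix": 50,        # publisher ONIX feeds
    "amazon_kcw": 40,  # after being dropped below the ONIX feeds
}


def may_overwrite(incoming_source, current_source):
    """Return True if data from incoming_source may replace the current data."""
    if current_source is None:
        # Empty field: any source may fill it.
        return True
    # Equal priority is allowed so that a user can re-edit their own text.
    return SOURCE_PRIORITY[incoming_source] >= SOURCE_PRIORITY[current_source]


def apply_update(field, incoming_value, incoming_source):
    """Update a field dict ('value', 'source') in place; return True if changed."""
    if may_overwrite(incoming_source, field.get("source")):
        field["value"] = incoming_value
        field["source"] = incoming_source
        return True
    return False
```

Under these assumptions, an amazon_kcw import would be refused when the current description is user-entered, but an onix import could still replace an amazon_kcw description.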

I'm having a problem with this description. I keep removing it and it keeps coming back. I checked the other editions and their descriptions are different.
https://www.goodreads.com/book/edit/2...

Yes. Onix overwrote me and was in turn overwritten by amazon_kcw. I could not revert it, so I changed it back manually. See https://www.goodreads.com/book/edits/...
I had other examples as well, but because of repair jobs by co-librarians (mostly splits I assume), they look different (innocently isolated) now...

Pork and Sons
The original description, which says the most:
"Pork & Sons is an authentic and intensely personal cookbook, presenting the reader with a multitude of ideas on how to cook fine and succulent pork, whilst giving a rare glimpse into a day in the life of a small family business in rural France. The recipes are wholesome and rustic, encapsulating the flavours and taste of a region."
For some reason this was replaced by:
"Reynaud has written an authentic and intensely personal cookbook, presenting the reader with a multitude of ideas on how to cook fine and succulent pork. The 150 recipes are wholesome and rustic, encapsulating the flavors and taste of rural France."
That was in turn replaced by this amazon_kcw garbage:
"As new but for brief gift note to front pastedown, no jacket as issued."
I've reinstated #1.

author's correct name: Stéphane Reynaud
as imported by amazon: Stxe9phane Reynaud
https://www.goodreads.com/author/show...
https://www.goodreads.com/author/show...
Is "xe9" code for "é" ?

Yes. It is shorthand for E9 (hexadecimal) which is 233 (decimal) which is the character "é" of the Latin-1 supplement in Unicode encoding (and in several other encodings).
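For anyone who wants to check the arithmetic, here is a small Python sketch. The last step is only a guess at how the import might have produced "Stxe9phane" (an escaped form of the name losing its backslash); nothing in this thread confirms that this is what actually happened.

```python
# Hex E9 is decimal 233, which is the code point of "é" (U+00E9).
code_point = int("e9", 16)
letter = chr(code_point)

# The single byte 0xE9 also decodes to "é" in Latin-1 (ISO-8859-1).
from_latin1 = b"\xe9".decode("latin-1")

# Assumption: the name passed through an escaped text form ("St\xe9phane" with
# a literal backslash) and the backslash was then dropped somewhere on the way.
escaped = "St\\xe9phane"
garbled = escaped.replace("\\", "")

print(code_point, letter, from_latin1, garbled)
```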

Sarah,
there are two entries in the log because the first one logs the deletion of the former cover and the second one logs the upload of the new cover. Why those functionally totally different events are logged using the same wording is beyond me.
This should be reported as erroneous (but I have given up posting in the feedback group for lack of response, so maybe you will find more support internally)

I've complained about this before but nothing has changed...
Many things in the librarian log are misleading. Oftentimes you'll make one change and the log will indicate that you made several changes, or it will attribute a description to you that you never added, etc.

Both of those things are logged as image_uploaded_at, but they are definitely different in the log:
* New covers show image_uploaded_at: '' to 'date' (nothing to something)
* Deleted covers show image_uploaded_at: 'date' to '' (something to nothing)
* Covers updated in place show image_uploaded_at: 'date' to 'date' (something to something else)
(We can't update a cover in place through the front end; only imports can do this. We can revert an existing cover to a previous cover, though, which I recall looks something like that plus the reversion wording.)
(Which is to say, though, that an import should not even have a reason to update via delete-and-insert; it should just update. So if the amazon import is deleting-and-inserting, that should be replaced by updating anyway.)
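To make the three cases above easier to tell apart, here is a tiny sketch that classifies an image_uploaded_at log entry by its old and new values. The function name and the returned labels are made up for illustration; only the field name and the three patterns come from the log itself.

```python
def classify_cover_change(old, new):
    """Classify an image_uploaded_at change from its old and new values.

    An empty string (or None) means "no cover date"; anything else is a date.
    """
    if old == new:
        return "no change"
    if not old and new:
        return "new cover"               # '' to 'date'
    if old and not new:
        return "cover deleted"           # 'date' to ''
    return "cover updated in place"      # 'date' to a different 'date'
```

A delete-and-insert by an import would then show up as a "cover deleted" entry followed by a "new cover" entry, rather than as a single "cover updated in place" entry.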

Here are two editions where I assume that amazon_sable created the first record under Jeff S. Smith ..."
I think you're right about an error here - if we find a book/work match by author name, we should ensure that the imported book is updated with the correct author id. I'll add a ticket.

https://www.goodreads.com/book/show/1...
https://www.goodreads.com/book/show/1...
These appear to have been created ~2 seconds apart on Dec 12 2013 b..."
We're talking to the kcw team about the duplicates we're seeing. I'll let you all know if a fix goes live.

I think what's happening here is that the default description (i.e. the description of the work, not the book) is wrong and needs to be changed. I'll try that right now; let me know if the problem clears up.

Re: your description getting overwritten: It looks like onix firebrand "replaced" your description with identical text. I wonder if there's a bug that allows feeds to take responsibility for descriptions that are identical to user-entered text. The issue of kcw then overwriting the onix feed should be taken care of now, since we've reduced the kcw and sable priorities to be below that of our general data feeds.
Quick note so you don't drive yourself nuts with editing language tags unnecessarily. We are moving toward a locale-style language code (e.g. en-US, en-UK, or just en) rather than the three letter codes we were using previously. At the moment, both should function in the same way, but we'll ultimately be adopting the locale style universally.
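As a rough illustration of the two styles living side by side, the sketch below maps a few three-letter codes to their two-letter equivalents and tidies the case of locale-style codes. The table is deliberately tiny, and the function is hypothetical: it says nothing about how Goodreads stores or converts language codes internally.

```python
# A few ISO 639-2 three-letter codes and their ISO 639-1 two-letter forms.
# Deliberately incomplete; extend as needed.
THREE_LETTER_TO_TWO = {
    "eng": "en",
    "fre": "fr",
    "fra": "fr",
    "ger": "de",
    "deu": "de",
    "spa": "es",
}


def normalize_language(code):
    """Return a locale-style code for a three-letter or locale-style input."""
    code = code.strip()
    if "-" in code:
        # Locale style such as en-US: lowercase language, uppercase region.
        language, region = code.split("-", 1)
        return f"{language.lower()}-{region.upper()}"
    lowered = code.lower()
    # Unknown codes are passed through unchanged (lowercased).
    return THREE_LETTER_TO_TWO.get(lowered, lowered)
```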

Pork and Sons
The original description, which says the most:
"Pork & Sons is an authentic and intensely personal cookbook, presenting the reader with ..."
Hrrrrrm..... I'm wondering if this is an issue of not being able to tell that it is the default description being overwritten (i.e. the book's description is blank and is currently using the work's default description)
I'll make a ticket to clarify the language in that case and to monitor whether that's the problem we're seeing.

author's correct name: Stéphane Reynaud
as imported by amazon: Stxe9phane Reynaud
https://www.goodreads.com/author/show...
https://www.good..."
We'll be running a cleanup today or tomorrow that will hopefully take care of some of these, though likely the first pass won't fix everything.

Ah good to know. Thanks!!

I've complained about this before but nothing has changed...
Many things in the l..."
We're hoping to clean up some of the change log functionality soonish :-)

Bogdan, this is unrelated to the topic of this thread. Please start a new thread for a new request.

How was amazon_kcw able to import this without ISBNs?
Is this even a legit entry?

How was amazon_kcw able to import this without ISBNs?
Is this even a legit entry?"
Hmmm, that looks like a case of physical book (or set of books in this case) with an ASIN, but no isbns... My inclination would be to say that we shouldn't be importing physical books without isbns, but let me look into it some more. Looking at the Amazon record, it appears that Amazon itself is not selling this set (it's several merchants), and it isn't clear to me that this is in any way an 'official' set.


That will become an endless cycle: amazon_kcw imports shitty data, librarian corrects & updates shitty data, amazon_kcw comes back, does not find matching shitty data and imports happily again... and so forth and so forth...

...and librarians learn this is a thankless task and find better ways to spend their time, and so forth and so forth...



Once a GR book has been mapped to a book in Amazon, it won't be imported by Amazon again, so that should stop cyclical importing for the majority of books.
An ASIN entered in GR for a Kindle Edition will *always* map successfully to the corresponding ASIN in the Amazon catalog just as long as the ASIN actually exists. This means that even mismatches between our data and theirs shouldn't result in eternal import updates.
The real danger of cyclical updates is with physical editions, where we currently create a new book sans isbn/isbn13 if we see conflicting title/author data. If a Librarian deletes this book (seeing that it's a duplicate), we don't currently have a way to stop it from being imported again if another Kindle user tries to import it. We're working on some strategies to solve this problem, though - and will let you know what we come up with!


Sorry - you're right, the book doesn't need to be deleted. A GR book could get unmapped from an Amazon book by edits to any of the following:
Title, Author, isbn, isbn13, or ASIN
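In other words, an edit is only a candidate for unmapping if it touches one of those five fields. A minimal sketch of that check, assuming a book record is a plain dict (the record layout and function names are invented for illustration):

```python
# The fields named above whose edits could unmap a GR book from an Amazon book.
MAPPING_FIELDS = ("title", "author", "isbn", "isbn13", "asin")


def changed_mapping_fields(before, after):
    """Return the mapping-relevant fields whose values differ between records."""
    return [name for name in MAPPING_FIELDS if before.get(name) != after.get(name)]


def edit_may_unmap(before, after):
    """True if the edit touched any field that could trigger an unmapping."""
    return bool(changed_mapping_fields(before, after))
```

Under this reading, changing only a description or a cover would never affect the mapping, while correcting a title or author could.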

Once a GR book has been mapped to a book in Amazon..."
How is this mapping stored? Would it be possible to transfer that onto a different edition when records are merged like the way that reviews are transferred to the kept edition?
"The real danger of cyclical updates is with physical editions, where we currently create a new book sans isbn/isbn13 if we see conflicting title/author data."
If a way could be worked out to internally store an isbn for records which are duplicates but not exact matches for other records with that isbn, that could solve the problem of matching alternate cover editions. (And possibly lead the way toward better handling of alternate cover editions entirely, although that's a little off topic.)

The mapping is stored on the Amazon side, but automatically updates when someone modifies a book on the GR side. When records are merged, the mapping for the deleted book will be removed and the mapping for the kept book will be updated if necessary. (This all happens automatically)
"If a way could be worked out to internally store an isbn for records which are duplicates but not exact matches for other records with that isbn, that could solve the problem of matching alternate cover editions. (And possibly lead the way toward better handling of alternate cover editions entirely, although that's a little off topic.)"
It's come up in discussion for sure - but I'll let you know when we have a better sense of our game plan here.

Me too, but I don't always see the "update details" link. Amazon looks different for different books. I don't get it! I try sometimes to use the picture feedback option, but they don't read the text, so it's pointless.

But with this book, Hitler's Children: The Story of the Baader-Meinhof Terrorist Gang, it appears that 4 have come in from imports and set the default description.
The description starts out fine, but then moves into a series of promo quotes.

* New covers show image_uploaded_at: '' to 'date' (nothing to something)
* Deleted covers show image_uploaded_at: 'date' to '' (something to nothing)"
Thank you, Cait. This subtlety escaped me. 53+ and rapidly approaching dementia... :-)

If the mapping is stored on the Amazon side (and if you GR people have some control over it), then the process could be made to remember the GR bookid which it spawned with its export (from Amazon)/import (into GR).
Next time, when it is tempted to retry again for the lack of a match in GR (because some nice librarian has already cleaned up the mess it has left behind after its initial import into GR) then it should look for the existence of that GR bookid. If it exists, it should please abstain from creating another copy. If it does not, then let us have another wonderful version (and have it remember the new GR bookid again).
This algorithm would cycle only for deleted (and possibly also for NABbed) books, but not for updated/corrected ones, even when a librarian's correction unmakes any match criteria.
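To spell out the proposal, here is a minimal sketch. Everything in it is hypothetical (the dict standing in for the Amazon-side record, the set of existing GR book ids, the function names); it only illustrates the idea of keying the retry on the remembered GR bookid instead of on title/author matching.

```python
def should_import(asin, spawned_ids, existing_book_ids):
    """Decide whether the importer may create a new GR book for this ASIN.

    spawned_ids: ASIN -> GR bookid created by an earlier import (hypothetical).
    existing_book_ids: set of GR bookids that currently exist.
    """
    remembered = spawned_ids.get(asin)
    if remembered is not None and remembered in existing_book_ids:
        # The book spawned earlier still exists, however much it was edited.
        return False
    return True


def run_import(asin, spawned_ids, existing_book_ids, new_book_id):
    """Create a book for the ASIN if allowed; return its id, or None if skipped."""
    if not should_import(asin, spawned_ids, existing_book_ids):
        return None
    existing_book_ids.add(new_book_id)
    spawned_ids[asin] = new_book_id  # remember the new GR bookid
    return new_book_id
```

With this rule, a librarian's corrections to title or author never cause a re-import; only deleting the spawned book does.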


I've noticed that for Kindle books, the "update info" link isn't there. Last time I just contacted customer support and they passed it on from there, explaining that the update link wasn't available on Kindle books. Hopefully they'll update that eventually, being as they double check all update info anyway.

That's approximately how the system already works - it doesn't attempt to generate a new book if the GR book id already exists in an ASIN's record. The second attempt will only be made if someone removes or alters the matching GR book in such a way that triggers an unmapping (i.e. the removal of the GR book id from the ASIN's record). Which is what I believe you're saying in your last paragraph.
Emy wrote: "AFAIK, NABs stay in the system, just hidden."
Correct - all NABs are still stored in our system. If any of our feeds attempts to import an NAB again, it should fail to import it. This is true for both Amazon and non-Amazon feeds.

Not knowing what REALLY goes on: If an "unmapping" happens just because a librarian has modified formerly matching criteria to match no longer (e.g. corrected title & author), then we are looking forward to interesting times. If an "unmapping" happens only when the GR BookID is no longer available (because it was deleted), then we are fine.
The process has to keep up and cling tight to a once established ASIN-GR BookID relation, regardless of whether the original matching criteria still hold or not. It has to "remember" the GR BookID...
* not always so, at least: see posts below