Goodreads Librarians Group discussion

note: This topic has been closed to new comments.
2617 views
[Closed] Added Books/Editions > Large Book Data Import

Comments showing 151-200 of 472

message 151: by Moloch (last edited Jan 10, 2014 02:12PM)

Moloch | 3975 comments For those worried that bad imports are going to overwrite librarian edits, it was reassuring for me to see that it isn't so*: here https://www.goodreads.com/book/show/1... amazon_kcw didn't overwrite the (previous) cover uploaded by a user... ironically enough, the user-uploaded cover was wrong! :-) (it's been fixed)

* not always so, at least: see posts below


message 152: by Plethora

Plethora (bookworm_r) | 359 comments Moloch wrote: "For those worried that bad imports are going to overwrite librarian edits, it was reassuring for me to see that it isn't so: here https://www.goodreads.com/book/show/1......"

I think they excluded covers, but it seems to be overwriting descriptions and was stealing numbers.


message 153: by Moloch

Moloch | 3975 comments In my example? Or are you speaking in general?


message 154: by Plethora

Plethora (bookworm_r) | 359 comments Moloch wrote: "In my example? Or are you speaking in general?"

General terms of what evidence seems to point to.


message 155: by Michael (last edited Jan 11, 2014 11:12AM)

Michael (mwelser) | 217 comments Bookworm R wrote: "Moloch wrote: "In my example? Or are you speaking in general?"

General terms of what evidence seems to point to."


Absolutely confirmed. amazon_kcw happily overwrites/overwrote(?) human-generated content. It will be/was(?) doing so for titles/authors/descriptions/language/covers. No idea for page nums/publishers.

Happy editing, fellow co-librarians, anyway!


message 156: by Moloch

Moloch | 3975 comments Oh well. I was too optimistic, then. I had found many new records imported with errors, true, but not overwritten.


message 157: by Michael

Michael (mwelser) | 217 comments Moloch wrote: "Oh well. I was too optimistic, then. I had found many new records imported with errors, true, but not overwritten"

I could have accepted new ones w/o any emotional involvement - just some more junk, so what? But that it would disrespectfully dare touch MY books...

No, not that bad - but almost :-)


message 158: by Cait

Cait (tigercait) | 4988 comments Michael wrote: "Absolutely confirmed. amazon_kcw happily overwrites/overwrote(?) human-generated content. It will be/was(?) doing so for titles/authors/descriptions/language/covers."

You posted an example earlier, but that was actually onix_ingram overwriting a description -- which is a whole new problem! Do you have any example of the amazon imports doing this?


message 159: by Empress

Empress (the_empress) Cait wrote: "You posted an example earlier, but that was actually onix_ingram overwriting a description -- which is a whole new problem! Do you have any example of the amazon imports doing this? "


Yes, I have. I posted this, but here is the link: https://www.goodreads.com/book/edits/...

amazon_kcw overwrote the onix hachette description.


message 160: by Cait

Cait (tigercait) | 4988 comments Ellie [The Empress] wrote: "amazon_kcw overwrote the onix hachette description."

That's supposed to happen, or it was at the time: higher-priority imports overwrite lower-priority imports, and amazon_kcw's descriptions were at the time a higher priority (they've since been dropped below the onix imports' priority). User-entered data should always be the highest priority and therefore never overwritten by an import which is by definition a lower priority.
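The priority rule Cait describes can be sketched as a simple comparison. This is a minimal illustration, not Goodreads' actual code: the `SOURCE_PRIORITY` numbers, the `FieldValue` type and `apply_import` are hypothetical, and only the relative ordering (user edits above every feed, onix feeds above amazon_kcw after the change) comes from this thread. Whether two sources of equal priority replace each other is not stated here; the sketch keeps the existing value.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical priorities: only the relative order matters.
# Ordering as described in this thread after the change:
# user edits > onix feeds > amazon_kcw / amazon_sable.
SOURCE_PRIORITY = {
    "user": 100,
    "onix_hachette": 50,
    "onix_ingram": 50,
    "amazon_kcw": 10,
    "amazon_sable": 10,
}


@dataclass
class FieldValue:
    value: str
    source: str


def apply_import(current: Optional[FieldValue], incoming: FieldValue) -> FieldValue:
    """Return the value to store after an import (hypothetical helper).

    An import replaces the current value only if its source has a strictly
    higher priority, so an import can never overwrite user-entered data.
    """
    if current is None:
        return incoming
    if SOURCE_PRIORITY[incoming.source] > SOURCE_PRIORITY[current.source]:
        return incoming
    return current
```

Under this ordering, an onix description replaces an amazon_kcw one, but neither replaces a description a librarian typed in.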


message 161: by Empress

Empress (the_empress) Oh, sorry Cait. I guess it was supposed to happen.

I'm having a problem with this description. I keep removing it and it keeps coming back. I checked the other editions and their descriptions are different.
https://www.goodreads.com/book/edit/2...


message 162: by Michael (last edited Jan 12, 2014 11:51PM)

Michael (mwelser) | 217 comments Cait wrote: "You posted an example earlier, but that was actually onix_ingram overwriting a description -- which is a whole new problem! Do you have any example of the amazon imports doing this?"


Yes. Onix overwrote me and was in turn overwritten by amazon_kcw. I could not revert it, so I changed it back manually. See https://www.goodreads.com/book/edits/...

I had other examples as well, but because of repair jobs by co-librarians (mostly splits I assume), they look different (innocently isolated) now...


message 163: by Lobstergirl (last edited Jan 12, 2014 03:35PM)

Lobstergirl A tale of three descriptions.

Pork and Sons

The original description, which says the most:

"Pork & Sons is an authentic and intensely personal cookbook, presenting the reader with a multitude of ideas on how to cook fine and succulent pork, whilst giving a rare glimpse into a day in the life of a small family business in rural France. The recipes are wholesome and rustic, encapsulating the flavours and taste of a region."

For some reason this was replaced by:

"Reynaud has written an authentic and intensely personal cookbook, presenting the reader with a multitude of ideas on how to cook fine and succulent pork. The 150 recipes are wholesome and rustic, encapsulating the flavors and taste of rural France."

Replaced by this Amazon-kcw garbage:

"As new but for brief gift note to front pastedown, no jacket as issued."

I've reinstated #1.


message 164: by Lobstergirl

Lobstergirl Shit data from amazon_sable:

author's correct name: Stéphane Reynaud

as imported by amazon: Stxe9phane Reynaud

https://www.goodreads.com/author/show...

https://www.goodreads.com/author/show...

Is "xe9" code for "é" ?


message 165: by Michael (last edited Jan 12, 2014 11:44PM)

Michael (mwelser) | 217 comments Lobstergirl wrote: "Is "xe9" code for "é" ?"

Yes. It is shorthand for E9 (hexadecimal), which is 233 (decimal), which is the code point of the character "é" (U+00E9) in the Latin-1 Supplement block of Unicode (and the same value in several other encodings, such as ISO-8859-1).
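The arithmetic can be checked in Python, where `"\xe9"` is the escape for code point 0xE9. The last lines show one assumed way to end up with "Stxe9phane": escape the non-ASCII character, then lose the backslash somewhere downstream. Nothing in the thread confirms this is what the amazon_sable import actually did.

```python
# "\xe9" is a Python escape for code point 0xE9 (decimal 233), i.e. "é".
e_acute = "\xe9"
assert e_acute == "é"
assert ord(e_acute) == 0xE9 == 233

# The same character is one byte in Latin-1 but two bytes in UTF-8.
assert e_acute.encode("latin-1") == b"\xe9"
assert e_acute.encode("utf-8") == b"\xc3\xa9"

# Assumed failure mode: escape non-ASCII characters with the
# "backslashreplace" error handler, then drop the backslash later on.
name = "Stéphane Reynaud"
escaped = name.encode("ascii", errors="backslashreplace").decode("ascii")
garbled = escaped.replace("\\", "")
print(escaped)  # St\xe9phane Reynaud
print(garbled)  # Stxe9phane Reynaud
```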


message 166: by Michael

Michael (mwelser) | 217 comments Sarah wrote (#102): "One weird behavior I see is the cover image being uploaded twice when it's uploaded by amazon_kcw. I'll look into that."

Sarah,

there are two entries in the log because the first one logs the deletion of the former cover and the second one logs the upload of the new cover. Why those functionally totally different events are logged using the same wording is beyond me.

This should be reported as erroneous (but I have given up posting in the feedback group for lack of response, so maybe you will find more support internally).


message 167: by Lobstergirl

Lobstergirl Michael wrote: "Why those functionally totally different events are logged using the same wording is beyond me. "

I've complained about this before but nothing has changed...

Many things in the librarian log are misleading. Oftentimes you'll make one change and the log will indicate that you made several changes...it will attribute to you a description that you never added....etc.


message 168: by Cait (last edited Jan 13, 2014 07:29AM)

Cait (tigercait) | 4988 comments Michael wrote: "there are two entries in the log because the first one logs the deletion of the former cover and the second one logs the upload of the new cover. Why those functionally totally different events are logged using the same wording is beyond me."

Both of those things are logged as image_uploaded_at, but they are definitely different in the log:
* New covers show image_uploaded_at: '' to 'date' (nothing to something)
* Deleted covers show image_uploaded_at: 'date' to '' (something to nothing)
* Covers updated in place show image_uploaded_at: 'date' to 'date' (something to something else)
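Cait's three rules amount to comparing the old and new values of `image_uploaded_at`. A minimal sketch, assuming each log entry exposes those two values as strings, with an empty string meaning "no cover"; `classify_cover_change` is a hypothetical helper and the dates in the test are made up.

```python
from typing import Optional


def classify_cover_change(old: Optional[str], new: Optional[str]) -> str:
    """Classify an image_uploaded_at log entry (hypothetical helper).

    An empty string or None is treated as "no cover".
    """
    had_cover = bool(old)
    has_cover = bool(new)
    if not had_cover and has_cover:
        return "new cover"               # '' to 'date'
    if had_cover and not has_cover:
        return "deleted cover"           # 'date' to ''
    if had_cover and has_cover and old != new:
        return "cover updated in place"  # 'date' to another 'date'
    return "no change"
```

Under these rules, an import that deletes and then re-uploads a cover leaves two distinguishable entries, "deleted cover" followed by "new cover", rather than two identical ones.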

(We can't update a cover in place through the front end; only imports can do this. We can revert an existing cover to a previous cover, though, which I recall looks something like that plus the reversion wording.)

(Which is to say, though, that an import should not even have a reason to update via delete-and-insert; it should just update. So if the amazon import is deleting-and-inserting, that should be replaced by updating anyway.)


message 169: by Sarah

Sarah M (sarahsomeone) | 85 comments Cait wrote: "I've come across an interesting twist on the edition combining where it matches on author name.

Here are two editions where I assume that amazon_sable created the first record under Jeff S. Smith ..."


I think you're right about an error here - if we find a book/work match by author name, we should ensure that the imported book is updated with the correct author id. I'll add a ticket.


message 170: by Sarah

Sarah M (sarahsomeone) | 85 comments Cait wrote: "Possible double import by amazon_kcw:

https://www.goodreads.com/book/show/1...
https://www.goodreads.com/book/show/1...

These appear to have been created ~2 seconds apart on Dec 12 2013 b..."


We're talking to the kcw team about the duplicates we're seeing. I'll let you all know if a fix goes live.


message 171: by Sarah

Sarah M (sarahsomeone) | 85 comments Ellie [The Empress] wrote: "I'm having a problem with this description. I keep removing it and it keeps coming back. I checked the other editions and their descriptions are d..."

I think what's happening here is that the default description (i.e. the description of the work, not the book) is wrong and needs to be changed. I'll try that right now; let me know if the problem clears up.


message 172: by Sarah

Sarah M (sarahsomeone) | 85 comments Michael wrote: "Yes. Onix overwrote me and was in turn overwritten by amazon_kcw. I could not revert it, so I changed it back manually. See https://www.goodreads.com/book/edits/..."

Re: your description getting overwritten: it looks like onix firebrand "replaced" your description with identical text. I wonder if there's a bug that allows feeds to take credit for descriptions identical to user-entered text. The issue of kcw then overwriting the onix feed should be taken care of now, since we've reduced the kcw and sable priorities to below that of our general data feeds.

Quick note so you don't drive yourself nuts with editing language tags unnecessarily. We are moving toward a locale-style language code (e.g. en-US, en-UK, or just en) rather than the three letter codes we were using previously. At the moment, both should function in the same way, but we'll ultimately be adopting the locale style universally.
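Assuming the older three-letter codes are ISO 639-2 codes, the move Sarah describes amounts to mapping them onto two-letter (ISO 639-1) codes, optionally followed by a region. The table and `normalize_language` below are hypothetical, cover only a handful of languages, and handle only simple language-region tags (not script subtags); they are not Goodreads' actual conversion.

```python
# Hypothetical, deliberately tiny table from ISO 639-2 codes to
# ISO 639-1 two-letter codes (the "just en" form).
ISO639_2_TO_1 = {
    "eng": "en",
    "fre": "fr", "fra": "fr",  # bibliographic / terminologic variants
    "ger": "de", "deu": "de",
    "spa": "es",
}


def normalize_language(code: str) -> str:
    """Return a locale-style tag for a three-letter or locale-style code."""
    code = code.strip()
    if "-" in code:
        # Already locale style, e.g. "en-US": lowercase language, uppercase region.
        lang, region = code.split("-", 1)
        return f"{lang.lower()}-{region.upper()}"
    # Three-letter code: look it up; otherwise assume it is already two-letter.
    return ISO639_2_TO_1.get(code.lower(), code.lower())
```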


message 173: by Sarah

Sarah M (sarahsomeone) | 85 comments Lobstergirl wrote: "A tale of three descriptions.

Pork and Sons

The original description, which says the most:

"Pork & Sons is an authentic and intensely personal cookbook, presenting the reader with ..."


Hrrrrrm..... I'm wondering if this is an issue of not being able to tell that it is the default description being overwritten (i.e. the book's description is blank and is currently using the work's default description)

I'll make a ticket to clarify the language in that case and to monitor whether that's the problem we're seeing.


message 174: by Sarah

Sarah M (sarahsomeone) | 85 comments Lobstergirl wrote: "Shit data from amazon_sable:

author's correct name: Stéphane Reynaud

as imported by amazon: Stxe9phane Reynaud

https://www.goodreads.com/author/show...

https://www.good..."


We'll be running a cleanup today or tomorrow that will hopefully take care of some of these, though likely the first pass won't fix everything.


message 175: by Sarah

Sarah M (sarahsomeone) | 85 comments Michael wrote: "there are two entries in the log because the first one logs the deletion of the former cover and the second one logs the upload of the new cover. Why those functionally totally different events are logged using the same wording is beyond me. "

Ah good to know. Thanks!!


message 176: by Sarah

Sarah M (sarahsomeone) | 85 comments Lobstergirl wrote: "Michael wrote: "Why those functionally totally different events are logged using the same wording is beyond me. "

I've complained about this before but nothing has changed...

Many things in the l..."


We're hoping to clean up some of the change log functionality soonish :-)


message 177: by Bogdan

Bogdan D (bogdand) | 1 comments How can I post a picture of an author?


message 178: by Cait

Cait (tigercait) | 4988 comments Bogdan wrote: "How can I post a picture of an author?"

Bogdan, this is unrelated to the topic of this thread. Please start a new thread for a new request.


message 179: by Lobstergirl

Lobstergirl https://www.goodreads.com/book/show/2...

How was amazon_kcw able to import this without ISBNs?

Is this even a legit entry?


message 180: by Empress

Empress (the_empress) That is why I created this topic: https://www.goodreads.com/topic/show/...


message 181: by Sarah

Sarah M (sarahsomeone) | 85 comments Lobstergirl wrote: "https://www.goodreads.com/book/show/2...

How was amazon_kcw able to import this without ISBNs?

Is this even a legit entry?"


Hmmm, that looks like a case of a physical book (or set of books in this case) with an ASIN, but no isbns... My inclination would be to say that we shouldn't be importing physical books without isbns, but let me look into it some more. Looking at the Amazon record, it appears that Amazon itself is not selling this set (it's several merchants), and it isn't clear to me that this is in any way an 'official' set.


message 182: by Sarah

Sarah M (sarahsomeone) | 85 comments Although as a separate note - in some cases kcw will create a physical book sans isbn/isbn13 because those isbns already exist in our catalog but have conflicting author/title information.


message 183: by Michael (last edited Jan 14, 2014 09:01AM)

Michael (mwelser) | 217 comments Sarah wrote: "Although as a separate note - in some cases kcw will create a physical book sans isbn/isbn13 because those isbns already exist in our catalog but have conflicting author/title information."

That will become an endless cycle: amazon_kcw imports shitty data, librarian corrects & updates shitty data, amazon_kcw comes back, does not find matching shitty data and imports happily again... and so forth and so forth...


message 184: by Jim

Jim | 2 comments Michael wrote: "That will become an endless cycle: amazon_kcw imports shitty data, librarian corrects & updates shitty data, amazon_kcw comes back, does not find matching shitty data and imports happily again... and so forth and so forth..."

...and librarians learn this is a thankless task and find better ways to spend their time, and so forth and so forth...


message 185: by Sarah

Sarah M (sarahsomeone) | 85 comments Heads up: I'm running a script right now to return ASINs stolen by amazon_sable back to their original books. The script should skip any ASINs that have already been cleaned up by librarians.


message 186: by Susie

Susie (dragonsusie) | 2469 comments If I notice one with bad import data, I try to look at Amazon to see what information was incorrect there, then send in a report to Amazon to correct their data. It seems the only way to avoid the never-ending cycle of bad imports.


message 187: by Sarah (last edited Jan 14, 2014 12:57PM)

Sarah M (sarahsomeone) | 85 comments Re: a cycle of bad imports

Once a GR book has been mapped to a book in Amazon, it won't be imported by Amazon again, so that should stop cyclical importing for the majority of books.

An ASIN entered in GR for a Kindle Edition will *always* map successfully to the corresponding ASIN in the Amazon catalog as long as the ASIN actually exists. This means that even mismatches between our data and theirs shouldn't result in eternal import updates.

The real danger of cyclical updates is with physical editions, where we currently create a new book sans isbn/isbn13 if we see conflicting title/author data. If a Librarian deletes this book (seeing that it's a duplicate), we don't currently have a way to stop it from being imported again if another Kindle user tries to import it. We're working on some strategies to solve this problem, though - and will let you know what we come up with!


message 188: by Michael

Michael (mwelser) | 217 comments No need for a librarian to delete it. Some major repair work, e.g. authors, would be sufficient to re-trigger an import.


message 189: by Sarah

Sarah M (sarahsomeone) | 85 comments Michael wrote: "No need for a librarian to delete it. Some major repair work, e.g. authors, would be sufficient to re-trigger an import."

Sorry - you're right, the book doesn't need to be deleted. A GR book could get unmapped from an Amazon book by edits to any of the following:

Title, Author, isbn, isbn13, or ASIN
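Sarah's list of fields can be read as a simple check on an edit. A sketch, assuming a book record is a plain dict; the field names and `edit_may_unmap` are hypothetical. Since an edit to one of these fields only *could* unmap a book, the check flags candidates rather than predicting the outcome.

```python
from typing import Any, Dict

# Fields whose edits, per the list above, can unmap a GR book from its
# Amazon record. The key names are hypothetical.
UNMAPPING_FIELDS = ("title", "author", "isbn", "isbn13", "asin")


def edit_may_unmap(before: Dict[str, Any], after: Dict[str, Any]) -> bool:
    """True if any field that can trigger an unmapping changed in this edit."""
    return any(before.get(f) != after.get(f) for f in UNMAPPING_FIELDS)
```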


message 190: by Cait

Cait (tigercait) | 4988 comments Sarah wrote: "Re: a cycle of bad imports

Once a GR book has been mapped to a book in Amazon..."


How is this mapping stored? Would it be possible to transfer that onto a different edition when records are merged like the way that reviews are transferred to the kept edition?

"The real danger of cyclical updates is with physical editions, where we currently create a new book sans isbn/isbn13 if we see conflicting title/author data."

If a way could be worked out to internally store an isbn for records which are duplicates but not exact matches for other records with that isbn, that could solve the problem of matching alternate cover editions. (And possibly lead the way toward better handling of alternate cover editions entirely, although that's a little off topic.)


message 191: by Sarah (last edited Jan 14, 2014 03:02PM)

Sarah M (sarahsomeone) | 85 comments Cait wrote: "How is this mapping stored? Would it be possible to transfer that onto a different edition when records are merged like the way that reviews are transferred to the kept edition?

The mapping is stored on the Amazon side, but automatically updates when someone modifies a book on the GR side. When records are merged, the mapping for the deleted book will be removed and the mapping for the kept book will be updated if necessary. (This all happens automatically)

"If a way could be worked out to internally store an isbn for records which are duplicates but not exact matches for other records with that isbn, that could solve the problem of matching alternate cover editions. (And possibly lead the way toward better handling of alternate cover editions entirely, although that's a little off topic.)"

It's come up in discussion for sure - but I'll let you know when we have a better sense of our game plan here.


message 192: by Empress

Empress (the_empress) Susie wrote: "If I notice one with bad import data, I try to look at Amazon to see what information was incorrect there, then send in a report to Amazon to correct their data. It seems the only way to avoid the ..."


Me too, but I don't always see the "update details" link. Amazon looks different for different books. I don't get it! I try sometimes to use the picture feedback option, but they don't read the text, so it's pointless.


message 193: by Cait

Cait (tigercait) | 4988 comments Interesting! Thanks, Sarah. :)


message 194: by Plethora

Plethora (bookworm_r) | 359 comments Not sure how to keep promo quotes out of imported descriptions.

But for this book, Hitler's Children: The Story of the Baader-Meinhof Terrorist Gang, it appears that 4 have come in from imports and set the default description.

The description starts out fine, but then moves into a series of promo quotes.


message 195: by Michael

Michael (mwelser) | 217 comments Cait wrote: "
* New covers show image_uploaded_at: '' to 'date' (nothing to something)
* Deleted covers show image_uploaded_at: 'date' to '' (something to nothing)
"


Thank you, Cait. This subtlety escaped me. 53+ and rapidly approaching dementia... :-)


message 196: by Michael (last edited Jan 15, 2014 04:29AM)

Michael (mwelser) | 217 comments Sarah wrote: "The mapping is stored on the Amazon side, but automatically updates when someone modifies a book on the GR side. When records are merged, the mapping for the deleted book will be removed and the mapping for the kept book will be updated if necessary."

If the mapping is stored on the Amazon side (and if you GR people have some control over it), then the process could be made to remember the GR bookid which it spawned with its export (from Amazon)/import (into GR).

Next time, when it is tempted to retry again for the lack of a match in GR (because some nice librarian has already cleaned up the mess it has left behind after its initial import into GR) then it should look for the existence of that GR bookid. If it exists, it should please abstain from creating another copy. If it does not, then let us have another wonderful version (and have it remember the new GR bookid again).

This algorithm would cycle only for deleted (and possibly also for NABbed) books, but not for updated/corrected ones, even when a librarian's correction breaks the original match criteria.
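Michael's proposal can be sketched as a lookup on a remembered ASIN-to-book-id mapping. Everything here is hypothetical: `import_asin`, the dict standing in for the mapping Sarah says is stored on the Amazon side, and the `book_exists`/`create_book` callbacks standing in for the GR catalog. The point is that the existence check uses the remembered book id rather than title/author matching, so librarian corrections do not trigger a re-import; only a deleted book does.

```python
from typing import Callable, Dict


def import_asin(
    asin: str,
    asin_to_book_id: Dict[str, int],
    book_exists: Callable[[int], bool],
    create_book: Callable[[str], int],
) -> int:
    """Return the GR book id for an ASIN, creating a book only if needed.

    Once an ASIN has spawned a GR book, reuse that book id for as long as
    the book still exists, regardless of later edits to title or author.
    """
    book_id = asin_to_book_id.get(asin)
    if book_id is not None and book_exists(book_id):
        return book_id              # remembered book still there: no duplicate
    new_id = create_book(asin)      # first import, or the remembered book is gone
    asin_to_book_id[asin] = new_id  # remember the new id for next time
    return new_id
```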


message 197: by Emy

Emy (emypt) | 5037 comments AFAIK, NABs stay in the system, just hidden. I presume that they work the same way as the system at my work when I hide a record from indexing because, say, I've withdrawn all the copies... I may be completely off track, though!


message 198: by Susie

Susie (dragonsusie) | 2469 comments Ellie [The Empress] wrote: "Me too, but I don't always see the "update details" link. Amazon looks different for different books. I don't get it! I try sometimes to use the picture feedback option, but they don't read the text, so it's pointless."

I've noticed that for Kindle books, the "update info" link isn't there. Last time I just contacted customer support and they passed it on from there, explaining that the update link wasn't available on Kindle books. Hopefully they'll update that eventually, since they double-check all update info anyway.


message 199: by Sarah

Sarah M (sarahsomeone) | 85 comments Michael wrote: "Next time, when it is tempted to retry again for the lack of a match in GR (because some nice librarian has already cleaned up the mess it has left behind after its initial import into GR) then it should look for the existence of that GR bookid. If it exists, it should please abstain from creating another copy. If it does not, then let us have another wonderful version (and have it remember the new GR bookid again)."

That's approximately how the system already works - it doesn't attempt to generate a new book if the GR book id already exists in an ASIN's record. The second attempt will only be made if someone removes or alters the matching GR book in such a way that triggers an unmapping (i.e. the removal of the GR book id from the ASIN's record). Which is what I believe you're saying in your last paragraph.

Emy wrote: "AFAIK, NABs stay in the system, just hidden."

Correct - all NABs are still stored in our system. If any of our feeds attempts to import an NAB again, the import should fail. This is true for both Amazon and non-Amazon feeds.


message 200: by Michael

Michael (mwelser) | 217 comments Sarah wrote: "That's approximately how the system already works - it doesn't attempt to generate a new book if the GR book id already exists in an ASIN's record. The second attempt will only be made if someone removes or alters the matching GR book in such a way that triggers an unmapping (i.e. the removal of the GR book id from the ASIN's record). Which is what I believe you're saying in your last paragraph."

Not knowing what REALLY goes on: If an "unmapping" happens just because a librarian has modified formerly matching criteria to match no longer (e.g. corrected title & author), then we are looking forward to interesting times. If an "unmapping" happens only when the GR BookID is no longer available (because it was deleted), then we are fine.

The process has to keep up and cling tight to a once-established ASIN-GR BookID relation, regardless of whether the original matching criteria still hold. It has to "remember" the GR BookID...


This topic has been frozen by the moderator. No new comments can be posted.