Goodreads Librarians Group discussion

note: This topic has been closed to new comments.
2618 views
[Closed] Added Books/Editions > Large Book Data Import

Comments Showing 101-150 of 472

message 101: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Bookworm R wrote: "Sarah wrote: "Do you have some examples? We can take a look to make sure it's not a systematic issue. ..."

I believe he pointed out one in Msg 77."


Woops - thanks for pointing me there. Missed in my first glance through the page. Taking a look.


message 102: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments We had a known issue with cover images being overwritten early in December that has been resolved - are there newer cases of cover images getting overwritten?

One weird behavior I see is the cover image being uploaded twice when it's uploaded by amazon_kcw. I'll look into that.

Also, re: descriptions like "Great story! Book is new and unread." - we'll look into whether there's a way to disregard descriptions entered by individual amazon merchants.


message 103: by Julie (last edited Jan 08, 2014 09:32AM) (new)

Julie (readerjules) | 36 comments Sarah wrote: "Also, re: descriptions like "Great story! Book is new and unread." - we'll look into whether there's a way to disregard descriptions entered by individual amazon merchants. ..."

Thank you. I have fixed 4 of these already that I found and I wasn't even looking for them.


message 104: by Emy (new)

Emy (emypt) | 5037 comments Sable is importing publishers with "" around them, e.g. as "Random House" rather than Random House. Is the import not noticing phrase delimiters, maybe?

KCW seems to be bringing authors and a lot of GR authors without their . between initials - anything there?


message 105: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Emy wrote: "Sable is importing publishers with "" around them, e.g. as "Random House" rather than Random House. Is the import not noticing phrase delimiters, maybe?

KCW seems to be bringing authors and a lot of GR authors without their ...."


The publishers with quotes is definitely on our list - essentially the data feed we got had some publishers with two sets of quotes around their names and we didn't successfully strip both sets. That'll get fixed.
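
A minimal sketch of the cleanup described above, assuming the feed delivers publisher names as plain strings that may be wrapped in one or more sets of double quotes. The function name `strip_wrapping_quotes` is hypothetical; this is not the actual import code.

```python
def strip_wrapping_quotes(name: str) -> str:
    """Remove every layer of double quotes wrapped around a name."""
    cleaned = name.strip()
    # Keep peeling while the value is still wrapped in a pair of quotes,
    # so that two (or more) sets are removed, not just the outer one.
    while len(cleaned) >= 2 and cleaned[0] == '"' and cleaned[-1] == '"':
        cleaned = cleaned[1:-1].strip()
    return cleaned


print(strip_wrapping_quotes('""Random House""'))  # Random House
```

Looping until no wrapping pair remains is what avoids the "stripped one set but not both" outcome.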

As far as author names - there are several issues where the formatting of an author's name in the feed is preventing it from matching to a GR author. I'll add the missing periods between initials to the list if it isn't already there. Do you have a quick example?


message 106: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Question regarding bad descriptions: Has anyone seen any Kindle Editions with bad book descriptions or only physical editions?


message 107: by Plethora (new)

Plethora (bookworm_r) | 359 comments Here is another lovely physical edition: the import changed the description to "Book" ... so helpful, I would hope it is a book.

https://www.goodreads.com/book/show/2...


message 108: by Plethora (last edited Jan 08, 2014 10:54AM) (new)

Plethora (bookworm_r) | 359 comments Sarah wrote: "Question regarding bad descriptions: Has anyone seen any Kindle Editions with bad book descriptions or only physical editions?"

I know I have seen cases of public domain versions that will say something along the lines of this is an OCR version and nothing else, no book description. I don't think it is a new issue, but one that could be used to remove that type of wording and maybe just use the default description, hoping the default is at least about the book.


message 109: by Plethora (new)

Plethora (bookworm_r) | 359 comments Here are a few samples of descriptions that I think should be excluded from import and at least default to default book description.

===================================================
This book was converted from its physical edition to the digital format by a community of volunteers. You may find it for free on the web. Purchase of the Kindle edition includes wireless delivery.

I realize this particular example came from a user, but it is what you will find on Amazon, so I would expect it to come in the form of import feeds as well.

https://www.goodreads.com/book/show/1...
===================================================
More information to be announced soon on this forthcoming title from Penguin USA

This example did come from an import
https://www.goodreads.com/book/show/1...

===================================================

No Description Available

This example was from an import as well

https://www.goodreads.com/book/show/1...
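
A rough sketch of how an import could fall back to the default description when it sees placeholder text like the samples reported in this thread. The phrase lists and the names `is_placeholder` and `choose_description` are assumptions for illustration, not an existing Goodreads filter.

```python
# Exact placeholder texts and opening phrases taken from the samples in this thread.
PLACEHOLDER_TEXTS = {"book", "no description available"}
PLACEHOLDER_PREFIXES = (
    "more information to be announced soon",
    "this book was converted from its physical edition",
)


def is_placeholder(description: str) -> bool:
    """Return True when the text looks like filler rather than a summary."""
    text = " ".join(description.split()).strip(" .").casefold()
    return (
        not text
        or text in PLACEHOLDER_TEXTS
        or text.startswith(PLACEHOLDER_PREFIXES)
    )


def choose_description(imported: str, default: str) -> str:
    """Prefer the imported text unless it is a known placeholder."""
    return default if is_placeholder(imported) else imported


print(choose_description("No Description Available", "A memoir of recovery."))
```

A fixed phrase list only catches wording that has already been reported, so it would need to grow as new patterns turn up.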


message 110: by Banjomike (new)

Banjomike | 5166 comments Bookworm R wrote: "Here are a few samples of descriptions that I think should be excluded from import and at least default to default book description.

===================================================
This book w..."


I think that those types of blurbs are actually quite useful. They say to me "Don't buy me, you can get me for free elsewhere". And probably better formatted.


message 111: by Plethora (last edited Jan 08, 2014 12:00PM) (new)

Plethora (bookworm_r) | 359 comments Banjomike wrote: "Bookworm R wrote: "Here are a few samples of descriptions that I think should be excluded from import and at least default to default book description.

============================================..."


Well, I suppose but these are free editions typically to begin with. But I prefer to purchase a Penguin, Oxford etc edition instead of going with a free public domain.

I do find it rather annoying, though, when I come across a book with this type of description and want to know what the book is about. I will research the edition, translations etc. once I figure out whether I want to read the darn thing, which I can't do from such a description.

I also don't feel they fit guidelines.


message 112: by Michael (new)

Michael (mwelser) | 217 comments Sarah wrote: "Over the coming year we're hoping to expand Goodreads into other countries and make it more accessible to users that speak languages other than English. While we have a large and incredible army of librarians, that army won't necessarily scale to meet the needs of users all over the world as we begin to fully expand our library to encompass book editions for more and more countries."

This will not scale indeed. Considering the various issues with data imports, multiple language support, multiple character set support, stray editions that need to be combined, duplicated authors, multiple ISBNs without formalized storage thereof, and a data model in need of some revision - to just pump more data into this system will leave it beyond repair, and, I agree, no amount of volunteers will be able to save it, unless you tackle the architectural/software issues first.

Good Luck.


message 113: by Michael (last edited Jan 08, 2014 02:33PM) (new)

Michael (mwelser) | 217 comments Sarah wrote: "Do you have some examples? We can take a look to make sure it's not a systematic issue."

For amazon_kcw, take http://www.goodreads.com/book/edits/1...

It lists the authors in wrong order, producing an incorrect principal author.
It clutters up the title with additional info (in brackets).
It has no clue of UTF-8 encoding in the description.
It omits the ISBN10 which can be automatically derived from the ISBN13 (well, GR could do that as well but doesn't).
It sets an incorrect *and* invalid language code.
It will leave the combination with the existent work to "volunteers".

Executive Summary: None of the relevant textual information elements provided can be used "as is".
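
On the ISBN10 point: the conversion is a standard one. Only ISBN-13s with the 978 prefix have an ISBN-10 equivalent; the nine digits after the prefix are kept and a new mod-11 check digit is computed. A minimal sketch (the function name is hypothetical, and the ISBN-13's own check digit is not validated here):

```python
from typing import Optional


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    """Derive the ISBN-10 from a 978-prefixed ISBN-13, or return None."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        return None  # 979-prefixed ISBNs have no ISBN-10 form
    core = digits[3:12]  # the nine digits shared by both forms
    # ISBN-10 check digit: weights 10 down to 2, modulo 11, with 10 written as X.
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("978-0-306-40615-7"))  # 0306406152
```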


message 114: by Michael (last edited Jan 08, 2014 02:37PM) (new)

Michael (mwelser) | 217 comments Sarah wrote: "Do you have some examples? We can take a look to make sure it's not a systematic issue."

For amazon_sable, take http://www.goodreads.com/book/show/19...

This is not the book. It has just 13 pages. And it is not available from Amazon.
The book is http://www.goodreads.com/book/show/15....


message 115: by Michael (last edited Jan 08, 2014 02:46PM) (new)

Michael (mwelser) | 217 comments Sarah wrote: "As far as author names - there are several issues where the formatting of an author's name in the feed is preventing it from matching to a GR author. I'll add the missing periods between initials to the list if it isn't already there."

How would you match a given author from Amazon, even with correct "spelling", to multiple existent authors in GR, differentiated only by the varying number of blanks in their names?

E.g. ([] signifies blank)

Amazon: "J.R.[]Smith"
GR: "J.R.[]Smith", "J.R.[][]Smith", "J.R.[][][]Smith"

Which one will you choose?
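
To make the ambiguity concrete, here is a small sketch assuming a matcher that collapses runs of blanks before comparing names (the helper names are hypothetical). Under that assumption all three GR authors are equally good candidates, so the name alone cannot decide.

```python
import re


def collapse_spaces(name: str) -> str:
    """Replace every run of whitespace with a single blank."""
    return re.sub(r"\s+", " ", name).strip()


def candidate_authors(feed_name: str, existing_names: list) -> list:
    """Return every existing author whose collapsed name equals the feed name."""
    key = collapse_spaces(feed_name)
    return [name for name in existing_names if collapse_spaces(name) == key]


# Three distinct GR authors, separated only by 1, 2 and 3 blanks.
gr_authors = ["J.R." + " " * n + "Smith" for n in (1, 2, 3)]
matches = candidate_authors("J.R. Smith", gr_authors)
print(len(matches))  # 3
```

Picking between the candidates would need something beyond the name, for example another edition of the same work that is already attached to one of them.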


message 116: by Lobstergirl (new)

Lobstergirl Deon wrote: "https://www.goodreads.com/book/show/1...
another weird ISBN"


That's showing up as an "EAN" on some other websites.


message 117: by Lobstergirl (new)

Lobstergirl Sarah wrote: "Lobstergirl wrote: "I don't know if this is the right thread for this, but I just noticed Amazon had uploaded a cover a few hours ago - one of those green and white generic covers with a generic font slapped on and it..."

Ah cover images. We had a few of Amazon's generic 'no cover' images blacklisted so they wouldn't be imported, but since images are often uploaded by merchants, we couldn't screen for all of them.

If you see more of these, can you let me know their GR book IDs/ASINs? We might be able to expand the blacklist to get rid of useless non covers. "


https://www.goodreads.com/book/show/1...


message 118: by Plethora (last edited Jan 08, 2014 11:05PM) (new)

Plethora (bookworm_r) | 359 comments Another example of description overwritten: https://www.goodreads.com/book/show/6...

I tried to undo the change, but it didn't seem to do anything.

Also has the " around Publisher.


message 119: by Michael (new)

Michael (mwelser) | 217 comments Bookworm R wrote: "I tried to undo the change, but it didn't seem to do anything.
"


The revert function for descriptions does not work. You have to do it manually.


message 120: by Plethora (new)

Plethora (bookworm_r) | 359 comments Michael wrote: "Bookworm R wrote: "I tried to undo the change, but it didn't seem to do anything.
"

The revert function for descriptions does not work. You have to do it manually."


Thank you for the pointer. I'll do it when I'm back at a desktop, unless someone else gets to it first. It was a long one that probably has formatting that is a PIA to deal with on a mobile device.


message 121: by Sarah (last edited Jan 09, 2014 08:31AM) (new)

Sarah M (sarahsomeone) | 85 comments I think there are a few things going on with descriptions (and we're working on a few potential solutions)

Problem 1: Bad Amazon descriptions overwriting good onix feed descriptions
Solution 1: We're lowering the priority of the sable and kcw feeds to be below our more trusted onix feed sources. Let me know if you folks have any reservations about this - from what I've seen our usual data feeds are more reliable than data coming from amazon merchants.

Problem 2: Even if we enact Solution 1, bad descriptions will surface when no description exists for a book/work
Solution 2: We're looking into where the bad data is coming from. A lot of it seems to be coming from non-Amazon-affiliated merchants, who tend to enter information about their particular product quality, shipping policies, etc. We're going to see if there's some way to whitelist only descriptions from trusted merchants or for books sold directly by Amazon itself.

Problem 3: The revert function isn't working for descriptions
Solution 3: I'll look into this or file a ticket to be looked at soon. Does anyone know if this might be happening when a book's description is changed from using the work's default description? Or does it fail to work even when the change is from one non-default description to another non-default description?

(Note - I'm not implying these are all the issues, just trying to keep you all up to date on some of the solutions we're working on!)
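
Solution 1 amounts to a source-priority rule. A minimal sketch, assuming each stored description remembers which source wrote it; the numeric ranks, the "librarian" entry and the function name are illustrative assumptions rather than the real configuration.

```python
from typing import Optional

# Higher number wins. The values are assumptions; only the ordering matters here.
SOURCE_PRIORITY = {
    "librarian": 3,     # manual edits should never lose to a feed
    "onix": 2,          # the more trusted publisher feeds
    "amazon_sable": 1,
    "amazon_kcw": 1,
}


def should_overwrite(current_source: Optional[str], incoming_source: str) -> bool:
    """Allow an overwrite only from a strictly more trusted source."""
    if current_source is None:
        return True  # nothing stored yet, so any description is accepted
    incoming = SOURCE_PRIORITY.get(incoming_source, 0)
    current = SOURCE_PRIORITY.get(current_source, 0)
    return incoming > current


print(should_overwrite("onix", "amazon_kcw"))  # False
```

The `None` branch is exactly the case Problem 2 describes: with no existing description, a bad merchant description still gets through, which is why a separate filter is needed.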


message 122: by Emy (new)

Emy (emypt) | 5037 comments Sarah wrote: "Emy wrote: "Sable is importing publishers with "" around them, e.g. as "Random House" rather than Random House. Is the import not noticing phrase delimiters, maybe?

KCW seems to be bringing authors and a lot of GR authors wit..."


Three quick examples should all be in the change list for Rivka at the moment, unless she's got to them already.


message 123: by Keith (last edited Jan 09, 2014 05:31PM) (new)

Keith (kgf0) | 377 comments Bookworm R wrote: "I know I have seen cases of public domain versions that will say something along the lines of this is an OCR version and nothing else, no book description. I don't think it is a new issue, but one that could be used to remove that type of wording and maybe just use the default description, hoping the default is at least about the book. "

Personally, I find that particular descriptor useful: it tells me that this is the worst available edition of the book, riddled with errors, possibly missing entire sections of text, and I almost certainly don't want to buy it if there is a different real edition available.

When hand editing, I will sometimes supplement these non-descriptions with the default description or something more directly relevant, but I almost always retain some indication that it's a machine/OCR edition because that is relevant data. I wouldn't want to see that relevant data stripped by another machine.


message 124: by Plethora (last edited Jan 09, 2014 05:43PM) (new)

Plethora (bookworm_r) | 359 comments I understand that it describes that edition's production. But I see that more along the lines of "sales info" and against GR guidelines, as opposed to the actual book blurb. IMO, for what my two cents are worth: that type of information should be in the reviews. Otherwise we begin to enter very grey, murky water.

I don't research my editions in that way on GR, but I also don't use public domain books; I will purchase the Penguin, Oxford etc. book or borrow from the library.

Per manual: The description field is for entering a summary of the work.


message 125: by Keith (new)

Keith (kgf0) | 377 comments On the matter of kcw import errors, this may be of interest:

https://www.goodreads.com/book/edits/...

The author names were all rather heavily appended with academic credentials. I fixed the primary, but left the secondaries as evidence for now.


message 126: by Moloch (last edited Jan 10, 2014 02:54AM) (new)

Moloch | 3975 comments Another problem with the imports is this: very often secondary authors are with initials only. Like the book (example) "My Life" by John Doe, edited by "J. Smith" (John Smith). It's likely that this "John Smith" already has an author page in Goodreads: librarians have to merge "J. Smith" into "John Smith" to avoid having a duplicate.

So, my question is: shall we not import secondary authors? Or is it better to have a duplicate than to have no data at all? (I don't know the answer, I'm just reporting a problem: of course it's a radical solution not importing secondary authors at all, but having multiple entries for the same author is a problem too)


message 127: by Michael (last edited Jan 10, 2014 03:31AM) (new)

Michael (mwelser) | 217 comments Sarah wrote: "Problems/Solutions..."(#121)


Solution 1: Yes.
Solution 2: Yes.
Solution 3: Yes. Note: It failed for me either way. Furthermore, there seems to be something fishy in the logging process of description changes. Sometimes an edition will change the [default] description and you will not be able to find a trace of that change in the logs... Maybe that is linked?


message 128: by Sandra (new)

Sandra | 31405 comments Also noticed that the Amazon imports sometimes have the publisher in the title; a common example is "Mills & Boon: Title of the Book".

Which makes it harder to combine (cause you have to go looking) and always with the " " around the publisher & sometimes the title.

Fixed a whole bunch of those titles yesterday if you want to track some down.


message 129: by Debbie's Spurts (D.A.) (last edited Jan 10, 2014 04:18AM) (new)

Debbie's Spurts (D.A.) | 6325 comments I would guess a great many librarian edits are correcting outright typos or standardizing information to aid in making sure books get onto correct author profiles, editions combined, isbn oddities fixed, etc. For example if there was an author "Vinnie A. Peréz" a librarian would overwrite data feed import authors "Vinnie A. Peréz" "Vinnie A. Per z" "Vinnie A Peréz" "Peréz, Vinnie A. Peréz" or "Dr.Vinnie A. Peréz" with "Vinnie A. Peréz"

I have trouble understanding why amazon wants to override any such librarian edits. As a reader, I really don't like any solution that overrides existing goodreads book data much less a solution overriding the librarian corrections.

Let's face it, unless filling in the blanks or improving image quality, what librarians are doing is correcting bad data or standardizing it to work with the GR database, series, editions, etc. It doesn't make sense that corrections for bad data get overridden by any data feed, including Amazon's.

In terms of benefits to amazon, well, if bookbuyers cannot find kindle or other editions for sale at amazon because the data feed caused bad author, title or series information—surely that's not what amazon or authors want? (Okay, maybe the authors who are determined to treat goodreads book pages as if their product pages on bookseller sites ... but that's another discussion).

It's just really weird to me to think of having any "solutions" that need volunteer librarians correcting data then letting next data feed undo their efforts. Not unusual for a lot of book data on goodreads to come from the publisher and it's weird that amazon data would override those records as well.


message 130: by Cait (new)

Cait (tigercait) | 4988 comments D.A. wrote: "I have trouble understanding why amazon wants to override any such librarian edits."

Amazon should not be overriding librarian edits; we are reporting instances where librarian edits are being overridden as bugs. That's the whole point of this thread, yes?


message 131: by Empress (new)

Empress (the_empress) Descriptions.



1. Is it possible for tags such as <p> and <br> to be replaced or removed?



2. Is it possible for descriptions to be cleared of non-relevant information such as OCL numbers and links?

amazon_kcw updated the book Lucky: A Memoir by Alice Sebold
description: 'In a memoir hailed for its searing candor and wit, Alice Sebold reveals how her life was utterly transformed when, as an eighteen-year-old college freshman, she was brutally raped and beaten in a park near campus. What propels this chronicle of her recovery is Sebold's indomitable spirit-as she struggles for understanding ("After telling the hard facts to anyone, from lover to friend, I have changed in their eyes"); as her dazed family and friends sometimes bungle their efforts to provide comfort and support; and as, ultimately, she triumphs, managing through grit and coincidence to help secure her attacker's arrest and conviction. In a narrative by turns disturbing, thrilling, and inspiring, Alice Sebold illuminates the experience of trauma victims even as she imparts wisdom profoundly hard-won: "You save yourself or you remain unsaved."' to 'The author describes the circumstances of her rape as an eighteen-year-old college freshman, the arrest and trial of her attacker, and her struggle to reclaim her shattered life...Title: .Lucky..Author: .Sebold, Alice..Publisher: .Little Brown & Co..Publication Date: .2002/09/16..Number of Pages: .12..Binding Type: .PAPERBACK..Library of Congress: .<a href=''http://lccn.loc.gov/BL2002011677'' target=''Library of Congress''>BL2002011677</a>

Dec 12, 2013 11:32PM (#60371416)

Edit page: https://www.goodreads.com/book/edits/...
Book page: https://www.goodreads.com/book/show/2...


message 132: by Empress (last edited Jan 10, 2014 07:50AM) (new)

Empress (the_empress) I was just told HERE:

Cait wrote: "Cait (tigercait) | 4747 comments I am pretty sure that the "Goodreads combined" indicates that a new edition was imported and autocombined with an existing edition with the exact same title. It looks like there was an amazon_kcw import of a Kindle edition with the same timestamp. "

Is it possible for that to be stopped, as books with the exact same title do not always have the same content? See HERE


message 133: by Ellie (new)

Ellie Loredan (ellieloredan) | 113 comments I don't know whether this has been addressed, but there's a language glitch concerning kindle editions imported by amazon_kcw:

The 'edition language' is always set to English although it should be German, French, Spanish, Italian, or Portuguese respectively. Instead, the actual language is added in brackets to the title as '(German Edition)'/'(French Edition)'/'(Spanish Edition)'/etc.

Some random examples:

- German
https://www.goodreads.com/book/edits/...
https://www.goodreads.com/book/edits/...
https://www.goodreads.com/book/edits/...

- French
https://www.goodreads.com/book/edits/...
https://www.goodreads.com/book/edits/...
https://www.goodreads.com/book/edits/...

- Spanish
https://www.goodreads.com/book/edits/...
https://www.goodreads.com/book/edits/...
https://www.goodreads.com/book/edits/...

- Other languages
https://www.goodreads.com/book/edits/...
https://www.goodreads.com/book/edits/...
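
A sketch of how the bracketed suffix could be turned back into an edition language, assuming the suffix always has the form "(<Language> Edition)" at the end of the title. The function name and the small language table are illustrative; the codes are ISO 639-1.

```python
import re

# Language names as they appear in the title suffix, mapped to ISO 639-1 codes.
LANGUAGE_CODES = {
    "German": "de",
    "French": "fr",
    "Spanish": "es",
    "Italian": "it",
    "Portuguese": "pt",
}
EDITION_SUFFIX = re.compile(r"\s*\((?P<lang>[A-Za-z]+) Edition\)\s*$")


def split_language_suffix(title: str, feed_language: str) -> tuple:
    """Return (clean_title, language), trusting the suffix over the feed value."""
    match = EDITION_SUFFIX.search(title)
    if not match or match.group("lang") not in LANGUAGE_CODES:
        return title, feed_language  # nothing recognised, keep the feed data
    return title[: match.start()], LANGUAGE_CODES[match.group("lang")]


print(split_language_suffix("Der Prozess (German Edition)", "en"))
```

Suffixes such as "(Kindle Edition)" are left alone because "Kindle" is not in the language table.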


message 134: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Keith wrote: "The author names were all rather heavily appended with academic credentials. I fixed the primary, but left the secondaries as evidence for now. "

There seems to have been some problem with our credential/title/prefix stripping code during the import - and by seems I mean there was definitely a problem ;-)

I'm not sure if the issue came up when a name had more than one title (i.e. MD, PhD) or if the filter just wasn't being applied at the proper time.

There's a ticket in progress to clean up the authors created by this import and to try to rematch them with pre-existing authors.
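
For illustration only, a sketch of a credential filter that copes with more than one title on the same name. The credential and prefix lists are a small assumed sample, and `strip_credentials` is a hypothetical name, not the code used in the import.

```python
import re

# A small, assumed sample of credentials; a real list would be much longer.
_CREDENTIAL = r"(?:M\.?D\.?|Ph\.?D\.?|R\.?N\.?|M\.?B\.?A\.?)"
# One or more trailing credentials, each optionally preceded by a comma.
_TRAILING = re.compile(rf"(?:\s*,?\s+{_CREDENTIAL})+\s*$")
# One or more leading honorifics such as "Dr." or "Prof".
_LEADING = re.compile(r"^(?:(?:Dr|Prof)\.?\s+)+", re.IGNORECASE)


def strip_credentials(name: str) -> str:
    """Remove leading honorifics and any number of trailing credentials."""
    without_suffix = _TRAILING.sub("", name.strip())
    return _LEADING.sub("", without_suffix).strip()


print(strip_credentials("Dr. Jane Doe, M.D., Ph.D."))  # Jane Doe
```

The `+` after the trailing group is the part that handles "MD, PhD" in one pass; a pattern that matches only a single credential would leave "Jane Doe, MD" behind.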


message 135: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Michael wrote: "Solution 3: Yes. Note: It failed for me either way. Furthermore, there seems to be something fishy in the logging process of description changes. Sometimes an edition will change the [default] description and you will not be able to find a trace of that change in the logs... Maybe that is linked? "

Thanks for the feedback - Ticket has been generated and hopefully we can put someone on that soon.


message 136: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Sandra wrote: "Also noticed that the Amazon imports sometimes have the publisher in the title; a common example is "Mills & Boon: Title of the Book".

Which makes it harder to combine (cause you have to go looking) and a..."


Thanks Sandra - We're talking about adding a few filters to book titles when importing (like searching for and removing the Author's name or the Publisher's name). If you see common patterns - like the one you mentioned in your post - definitely report them.
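
One such title filter could look like the sketch below, assuming the publisher appears as a "Publisher: " prefix and that the publisher field may itself still carry stray quotes. `strip_publisher_prefix` is a hypothetical name.

```python
def strip_publisher_prefix(title: str, publisher: str) -> str:
    """Drop a leading 'Publisher:' from a title when it matches the publisher."""
    cleaned_publisher = publisher.strip().strip('"').strip()
    if not cleaned_publisher:
        return title
    prefix = cleaned_publisher + ":"
    # Compare only the leading slice so the cut position stays exact.
    if title[: len(prefix)].casefold() == prefix.casefold():
        return title[len(prefix):].strip()
    return title


print(strip_publisher_prefix("Mills & Boon: Title of the Book", '"Mills & Boon"'))
```

Requiring the prefix to equal the record's own publisher keeps genuine titles that contain a colon untouched.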


message 137: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments D.A. wrote: "For example if there was an author "Vinnie A. Peréz" a librarian would overwrite data feed import authors "Vinnie A. Peréz" "Vinnie A. Per z" "Vinnie A Peréz" "Peréz, Vinnie A. Peréz" or "Dr.Vinnie A. Peréz" with "Vinnie A. Peréz""

We're working on improving our name matching algorithm to include all these possible patterns. Like Cait says in her post below yours, no Librarian edits should be getting overwritten. Overwrites are definitely bugs. And we're working to improve our matching code so that it generates *less* work for you all!
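
As an illustration of what a matching key for the patterns in D.A.'s example could look like, here is a sketch that ignores a leading "Dr."/"Prof.", periods after initials, spacing, case, and Unicode composition. It is an assumption about one possible approach (with a hypothetical `name_key` helper), not the actual matching algorithm.

```python
import re
import unicodedata

# A prefix only counts when followed by a period or a blank, so "Drew" is safe.
_PREFIX = re.compile(r"^\s*(?:dr|prof)(?:\.\s*|\s+)", re.IGNORECASE)


def name_key(name: str) -> str:
    """Build a comparison key; the displayed name itself is left unchanged."""
    text = unicodedata.normalize("NFC", name)  # unify composed/decomposed accents
    text = _PREFIX.sub("", text)
    text = text.replace(".", " ")              # "A." and "A" compare equal
    return " ".join(text.casefold().split())


canonical = name_key("Vinnie A. Per\u00e9z")
print(name_key("Dr.Vinnie A. Per\u00e9z") == canonical)  # True
print(name_key("Vinnie A Per\u00e9z") == canonical)      # True
print(name_key("Vinnie A. Per z") == canonical)          # False
```

The "Per z" variant cannot be rescued by normalization, since the accented character was already lost before the name reached the matcher.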


message 138: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Ellie [The Empress] wrote: "Descriptions.

1. Is it possible for tags to be replaced or removed?


I'll have to check what our standard practice is for feeds - but that's definitely something that's possible to improve.


2. Is it possible for descriptions to be cleared of non-relevant information such as OCL numbers and links?


This is a much trickier problem to do programmatically. Removing content requires us to be able to match it to a specific pattern. We could potentially remove links and if OCL numbers are included in a systematic way we could try to weed them out.
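
A minimal sketch of the pattern-based cleanup, assuming descriptions arrive as strings with simple HTML such as <p>, <br> and <a> tags. It turns paragraph and line breaks into newlines and drops the remaining markup while keeping the visible text; `strip_markup` is a hypothetical helper.

```python
import re
from html import unescape

BREAK_TAGS = re.compile(r"<\s*(?:br\s*/?|/p)\s*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def strip_markup(description: str) -> str:
    """Convert <br> and </p> to newlines, then remove all other tags."""
    text = BREAK_TAGS.sub("\n", description)
    text = ANY_TAG.sub("", text)   # drops <p>, <a href=...>, </a>, ...
    text = unescape(text)          # turn entities back into characters
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


print(strip_markup("<p>First paragraph.</p><p>Second<br>line</p>"))
```

A regular expression is a simplification here: it also removes any literal text that happens to sit between "<" and ">", and it keeps the link text (for example a catalogue number), so removing such numbers would still need a separate, more specific pattern.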


message 139: by Empress (last edited Jan 10, 2014 08:47AM) (new)

Empress (the_empress) Sarah wrote: "This is a much trickier problem to do programmatically. Removing content requires us to be able to match it to a specific pattern. We could potentially remove links and if OCL numbers are included in a systematic way we could try to weed them out."

I thought it might be difficult, but decided to report it anyway. Maybe the script can just leave the description blank if it finds certain strings of data in the description? I've seen descriptions that describe the physical condition of the book, for example. I'm not sure what the better option would be.


message 140: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Ellie [The Empress] wrote: "I was just told HERE:

Cait wrote: "Cait (tigercait) | 4747 comments I am pretty sure that the "Goodreads combined" indicates that a new edition was imported and autocombined with an existing edit..."


This is a difficult problem to solve. You're absolutely correct that the same title + same author on two books does not necessarily indicate that they belong to the same work. However, the two choices here are essentially:

1. Make the assumption that two books by the same author with the same title are the same work - this will require exceptions to the rule to be manually corrected.

2. Never assume that two books are part of the same work - this would require every book to be manually added to the appropriate work.

Number 2 generates way more manual work, so we've opted for solution number 1. That being said, the code tries to make the best guess it can about whether a book belongs to a work or not.
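
Option 1 can be pictured as grouping editions by a normalized (title, author) key. The sketch below is a simplified assumption about that rule, with hypothetical field and function names; the real code reportedly makes a more careful guess.

```python
from collections import defaultdict


def work_key(title: str, author: str) -> tuple:
    """Same title + same author (ignoring case and spacing) means same work."""
    def squash(text: str) -> str:
        return " ".join(text.casefold().split())
    return squash(title), squash(author)


def group_into_works(editions: list) -> dict:
    """Map each (title, author) key to the ids of the editions sharing it."""
    works = defaultdict(list)
    for edition in editions:
        key = work_key(edition["title"], edition["author"])
        works[key].append(edition["id"])
    return dict(works)


editions = [
    {"id": 1, "title": "Lucky", "author": "Alice Sebold"},
    {"id": 2, "title": "LUCKY", "author": "alice sebold"},
    {"id": 3, "title": "Lucky: A Memoir", "author": "Alice Sebold"},
]
print(sorted(group_into_works(editions).values()))  # [[1, 2], [3]]
```

Both failure modes show up even in this toy: editions 1 and 2 combine automatically, while edition 3 stays separate although it may be the same work and would need a manual combine.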


message 141: by Empress (new)

Empress (the_empress) Sarah wrote: "Ellie [The Empress] wrote: "Number 2 generates way more manual work, so we've opted for solution number 1. That being said, the code tries to make the best guess it can about whether a book belongs to a work or not. "

Thank you. Is it possible for a blacklist to be created for titles or authors? I separated the works again and don't want them to be re-combined by the same script.


message 142: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Ellie wrote: "I don't know whether this has been addressed, but there's a language glitch concerning kindle editions imported by amazon_kcw:

The 'edition language' is always set to English although it should b..."


This is a known-ish issue. We were waiting for some changes to our data model to fix it, which I believe happened last night (I'll know more when the developers responsible for the change get into the office).

To avoid making this issue worse, the amazon_sable data import only imported English books - but we're still getting non-English books from amazon_kcw. We'll have to do some cleanup once the code fix goes out.


message 143: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Ellie [The Empress] wrote: "Thank you. Is it possible for a blacklist to be created for titles or authors? I separated the works again and don't want them to be re-combined by the same script.
"


I'll have to look into exactly what the sequence of events was for that particular combination before I can say for sure. The improvements we hope to make for our author and title matching algorithms should help some though.


message 144: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Ellie [The Empress] wrote: "Sarah wrote: "This is a much trickier problem to do programmatically. Removing content requires us to be able to match it to a specific pattern. We could potentially remove links and if OCL numbers ..."

This issue should become less common as of yesterday afternoon - I dropped the priority of amazon_kcw to be below our usual onix feeds, so it should be less likely that an existing description will get overwritten by an Amazon merchant provided description.

The change will only affect new imports, though - we'll still have to deal with ones done previously.


message 145: by Cait (last edited Jan 10, 2014 09:11AM) (new)

Cait (tigercait) | 4988 comments Sarah wrote: "That being said, the code tries to make the best guess it can about whether a book belongs to a work or not."

One way might be to have the matching process check for librarian notes on the existing book and return no match if there is a note. You'd still have a problem where the note wasn't about combining, but often books with notes have librarians patrolling the area who would be more likely to notice that new combines are needed. (Of course, that might still fall on the side of "more work to combine stray editions than to separate incorrect ones", even so. Would it be possible to generate a report showing some recent matching action on books with notes so that a human could evaluate it?)
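
The suggestion could be sketched like this, assuming each existing edition record carries an optional `librarian_note` field and a `work_id` (both field names, and the function name, are hypothetical).

```python
from typing import Optional


def _key(edition: dict) -> tuple:
    """Normalized (title, author) pair used for the comparison."""
    def squash(text: str) -> str:
        return " ".join(text.casefold().split())
    return squash(edition["title"]), squash(edition["author"])


def find_work_for(incoming: dict, existing: list) -> Optional[int]:
    """Return a work id to combine into, or None to leave the edition stray."""
    candidates = [e for e in existing if _key(e) == _key(incoming)]
    if not candidates:
        return None
    # A librarian note signals a record that needs human judgement.
    if any(e.get("librarian_note") for e in candidates):
        return None
    return candidates[0]["work_id"]


existing = [{"title": "Bone", "author": "Jeff Smith", "work_id": 7,
             "librarian_note": "Not the cartoonist; do not combine."}]
print(find_work_for({"title": "Bone", "author": "Jeff Smith"}, existing))  # None
```

Returning no match whenever any candidate has a note is the conservative reading of the suggestion; it trades some extra stray editions for fewer wrong combines.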


message 146: by Sarah (new)

Sarah M (sarahsomeone) | 85 comments Cait wrote: "One way might be to have the matching process check for librarian notes on the existing book and return no match if there is a note. "

I'll put that suggestion into our list of possible solutions :-) We can do some analysis of how many books have librarian notes to get a better sense of how much of the catalog would be affected by that policy.


message 147: by Cait (new)

Cait (tigercait) | 4988 comments I've come across an interesting twist on the edition combining where it matches on author name.

Here are two editions where I assume that amazon_sable created the first record under Jeff S. Smith on Dec 21 2013 and amazon_kcw created the second record under Jeff Smith on Jan 2 2014:
http://www.goodreads.com/work/edition...
Despite having different forms of the primary author name, these editions are combined, so I assume that they matched in some way (does the amazon_kcw feed include a list of other editions of the book?). When the second record matched as an edition of the first, it should have taken the first record's author name.

I might have the sequence of events wrong there, but here are some other examples:

http://www.goodreads.com/work/edition...
The Kindle edition was added by amazon_sable on Dec 21 2013 with a primary author of Jeff-1space-Smith but combined with two existing editions which at that time already had a primary author of Jeff-11space-Smith -- those I can confirm already existed with that disambiguated author name.

http://www.goodreads.com/work/edition...
Again, the Kindle edition was added by amazon_kcw on Jan 2 2014 with an author of Jeff-1space-Smith and matched to existing editions with Jeff-3space-Smith.

http://www.goodreads.com/book/show/19...
This is another Kindle edition which ought to have matched to the previous one but did not, created by amazon_kcw on Nov 29 2013 (was there a change in the matching after that?).

I've left all of these records as-is so that you can see the primary authors -- I'll come back for them in a bit. I consider the many, many Jeffs Smith out there one of my librarian responsibilities, as may be evidenced by the zebra-striping of notes on the Jeff-1space-Smith combine page. :)


message 148: by Cait (new)

Cait (tigercait) | 4988 comments Possible double import by amazon_kcw:

https://www.goodreads.com/book/show/1...
https://www.goodreads.com/book/show/1...

These appear to have been created ~2 seconds apart on Dec 12 2013 by amazon_kcw. They were not combined with each other (or any other editions) despite being, as far as I can see, identical. Perhaps they came in with different id numbers and both failed a title match, prompting separate creation of alternate records -- is that something which can be checked?

(As for the question of the match, I don't think there's much that can be done about that when the record comes in as a library-bound edition with a publisher of "Turtleback" instead of the actual publisher, is there? If there is, there is an edition already existing for "The Dragonslayer (Bone)" which would match to these books' "The Dragonslayer (Turtleback School & Library Binding Edition) (Bone (Prebound))" if all of the extraneous formatting notes were stripped out of the title string.)
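
A sketch of the stripping idea, assuming formatting notes can be recognised by a few keywords inside parentheses. The keyword list and the name `strip_format_notes` are assumptions for illustration. Working on innermost parentheses first lets the nested "(Bone (Prebound))" reduce to "(Bone)".

```python
import re

# Words that mark a parenthetical as a format note rather than a series name.
FORMAT_WORDS = ("binding", "prebound", "edition")
INNERMOST_PARENS = re.compile(r"\s*\([^()]*\)")


def strip_format_notes(title: str) -> str:
    """Remove parentheticals that mention a format keyword, innermost first."""
    def drop_if_format_note(match):
        text = match.group(0).casefold()
        return "" if any(word in text for word in FORMAT_WORDS) else match.group(0)

    previous = None
    while previous != title:  # repeat until nothing more is removed
        previous = title
        title = INNERMOST_PARENS.sub(drop_if_format_note, title)
    return title.strip()


raw = "The Dragonslayer (Turtleback School & Library Binding Edition) (Bone (Prebound))"
print(strip_format_notes(raw))  # The Dragonslayer (Bone)
```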


message 149: by Cait (new)

Cait (tigercait) | 4988 comments Cait wrote: "One way might be to have the matching process check for librarian notes on the existing book and return no match if there is a note."

Sarah wrote: "I'll put that suggestion into our list of possible solutions :-) We can do some analysis of how many books have librarian notes to get a better sense of how much of the catalog would be affected by that policy."


Yay! :)


message 150: by Empress (new)

Empress (the_empress) I've noticed some changes in titles.


isbndb updated the book Out Of It by Stuart Walton
title: 'Out of It: A Cultural History of Intoxication' to 'Out Of It: A Cultural History Of Intoxication'

And Onix has done a similar import:
https://www.goodreads.com/topic/show/...

They are corrected now.
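
A tiny sketch of a guard against this kind of update, assuming the import can compare the stored title with the incoming one before writing. The function name is hypothetical.

```python
def is_case_only_change(old_title: str, new_title: str) -> bool:
    """True when two titles differ only in capitalization."""
    return old_title != new_title and old_title.casefold() == new_title.casefold()


old = "Out of It: A Cultural History of Intoxication"
new = "Out Of It: A Cultural History Of Intoxication"
print(is_case_only_change(old, new))  # True
```

An import that skipped updates where this returns True would leave an already corrected title alone.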

