Goodreads Feedback discussion

Bugs > Is ingram import overwriting with WRONG info?


message 1: by Peter (new)

Peter (pete_C) | 433 comments Looking at this change log: http://www.goodreads.com/book/edits/3... it looks like the book was originally imported from isbndb and then from ingram.

The data from isbndb http://isbndb.com/d/book/singing_sand... has the correct values for the fields filled in by ingram on Feb 07, 2012 05:22pm (#23898656), as verified by WorldCat http://www.worldcat.org/title/singing-sands/oclc/17732098&referer=brief_results.
ingram updated the book The Singing Sands by Josephine Tey
format: '' to 'Paperback' (undo)
num_pages: '' to '240' (undo)
publisher: '' to 'Scribner Book Company' (undo)
language_code: '' to 'eng' (undo)
The correct data is
num_pages: 222
publisher: Collier Books
I have since corrected the record, but thought to report the error here.


message 2: by MissJessie (new)

MissJessie | 1696 comments Yes they are, I just made a report re problems with authors. Sorry, didn't see this one or would have appended here.


message 3: by Lobstergirl (new)

Lobstergirl | 5196 comments It's not so wrong you can't recognize it, but it's still irritating when ingram overwrites the correct title "An Accidental Autobiography" with "Accidental Autobiography Pa." (I changed it.)

An Accidental Autobiography


message 4: by Peter (new)

Peter (pete_C) | 433 comments Why is bad data from Ingram overwriting good data either entered by members or from earlier imports?

For that matter, why is the data from Ingram bad?

Amazon at least had the excuse that the data came from their sellers.


message 5: by Petra X (new)

Petra X (PetraX) | 5206 comments Ingram get their data from the publishers. It's usually very good, my most reliable source of just about everything to do with books, but sometimes it falls down. Doesn't everything?


message 6: by Peter (new)

Peter (pete_C) | 433 comments Petra X wrote: "Ingram get their data from the publishers. It's usually very good, my most reliable source of just about everything to do with books, but sometimes it falls down. Doesn't everything?"

No, everything doesn't fall down, even sometimes. Properly verified data being manipulated by verified correct programming will always produce consistent results, and they will be correct.

It just seems a bit odd that they would get authors and/or titles wrong if they get their data from the publishers. Even worse is reporting the wrong publisher, as in substituting Scribner Book Company for Collier Books. (Come to think of it, where do they get their data for out-of-print books from now-defunct publishers?)

I know that there is a bit of buying and selling of imprints by publishers, and that some publishers have gone out of business and had their imprints bought, but Ingram ought to be able to keep that sorted out. (After all, they do have the publication date to help keep the record straight.)

The only other explanation I can think of is either a bug in Ingram's API, or GR's dev team didn't use the API correctly.

Either way, the import problems we are having with their data seem to indicate an underlying problem that should be investigated. I leave it to wiser heads than mine to say whether further imports from Ingram should continue before the problem is diagnosed and fixed (if it can be).


message 7: by Petra X (new)

Petra X (PetraX) | 5206 comments Peter wrote: "No, everything doesn't fall down, even sometimes"

Ingrambook is the largest supplier of English language books in the world. It is reliant on the publishers of the books it distributes for data. With the small profit margins on books you think they can check physically the data they receive from the publisher with the actual book? Well perhaps they could in a perfect world, but not in the real one.

What huge organisation do you know of that has never fallen down on anything, ever?


message 8: by [deleted user] (new)

I've seen my own name and the name of the book spelled wrong in publishers' materials.


message 9: by Peter (new)

Peter (pete_C) | 433 comments Petra X wrote: "What huge organisation do you know of that has never fallen down on anything, ever?"

According to IBM, IBM.

There are also a few governmental units that make the same claim, not to mention a few dictators.

And, of course, there is the papacy, if you hold to the doctrine of papal infallibility.

Osho wrote: "I've seen my own name and the name of the book spelled wrong in publishers' materials."

I'm not referring to spelling errors, though they should be caught by the publishers' proofreaders. (Though it does say something about the publisher if their own materials aren't proofread.) Have you seen publishers' materials listing books that they did not publish? How about attributing one of their works to the wrong author? These are the kinds of errors we are finding regularly.

Petra X wrote: "With the small profit margins on books you think they can check physically the data they receive from the publisher with the actual book?"

Actually, they should have a quality control checkpoint on receiving books from publishers, that checks the books arriving against the ones ordered. They should also have a quality control checkpoint on order fulfillment, that checks the books being shipped against the ones the customer ordered. The pick list should have the ISBN, author, and title. If the order list at arrival or departure doesn't match, a flag should be raised, resulting in the data getting corrected. (This is the way it works for most large manufacturers these days. Items are checked both going in to inventory and when they come out again. Orders are checked at entry and again at shipping. It has become just too expensive to fix mistakes that get out the door. For every step down the processing line, costs of fixing errors go up by a factor of 2 to 20. Mistakes getting out the door can lose customers.)


message 10: by Nenangs (new)

Nenangs | 116 comments sounds like a QAQC lecture. :)


message 11: by Petra X (new)

Petra X (PetraX) | 5206 comments Maybe you should write to Ingram offering to take one of those low-paid jobs and promising that you will do it perfectly, you will never, ever, ever make an error, just like your examples, IBM, the Pope and ... well, let's leave out the dictators!


message 12: by Peter (new)

Peter (pete_C) | 433 comments Nenangs wrote: "sounds like a QAQC lecture. :)"

Sure did. I've sat through too many, which is why I tried to keep the last parenthetical section short. LOL


message 13: by MissJessie (new)

MissJessie | 1696 comments I agree that the data from Ingram is very poor for cataloging the GR site.

But it probably, bad as it is, suits their needs. I doubt they set it up to serve as a perfect database for the world.

I have collections inventoried for my personal use that certainly would be error-ful for anyone else, but suit me just fine. I think that's the case at Ingram.

That said, it's a right pain in the a.. and I wish there was better available and in use here at GR.


message 14: by Peter (new)

Peter (pete_C) | 433 comments Thanks, @MissJessie. You have put the matter very succinctly. Appreciated.

I still challenge GR staff with this question (originally posted here in Post #4, above): Why is bad data from Ingram overwriting good data either entered by members or from earlier imports?


message 15: by Peter (new)

Peter (pete_C) | 433 comments According to this librarian change log, ingram has overwritten most of the information on this book with bad information. It was/is combined with The Great Train Robbery > Editions by Michael Crichton. The ISBN/ISBN13 (0788740385/9780788740381) belong to the Crichton book per WorldCat.

I have not fixed it so that GR staff can investigate and mitigate the cause. GR staff, please post when the history information is no longer needed so I can fix.


message 16: by Lobstergirl (new)

Lobstergirl | 5196 comments The Riverside Shakespeare (containing his complete works, as far as I know), according to Ingram, is not by Shakespeare, but by G. Blakemore Evans.

http://www.goodreads.com/book/show/14...


message 17: by Alessandra (new)

Alessandra | 183 comments Le sigh. I've randomly run into a few Ingram-derived errors and I wasn't even particularly looking. Been correcting what I spot, of course.


message 18: by Peter (new)

Peter (pete_C) | 433 comments Alessandra wrote: "Le sigh. I've randomly run into a few Ingram-derived errors and I wasn't even particularly looking. Been correcting what I spot, of course."

I really haven't been looking, and it has been pretty obvious to me. I have been fixing the minor ones, of course, but when they are too egregious, I report them to GR first so bugs can get fixed. Once the techs no longer need the error as data to track the error, I fix it.

I hope you haven't run into too many.


message 19: by Douglas (new)

Douglas All the errors that I've seen in the last week or so relating to data quality arise from poor data from Ingram, where bad data overwrote good.
e.g. changes ## 24460531, 25016841, 24975403
I've already corrected these.


message 20: by Susanna - Censored by GoodReads (new)

Susanna - Censored by GoodReads (SusannaG) | 1214 comments Alessandra wrote: "Le sigh. I've randomly run into a few Ingram-derived errors and I wasn't even particularly looking. Been correcting what I spot, of course."

Same here - I think.


message 21: by Peter (new)

Peter (pete_C) | 433 comments Douglas wrote: "All the errors that I've seen in the last week or so relating to data quality arise from poor data from Ingram, where bad data overwrote good."

OK GoodReads staff. What's up with this?
Why is Ingram allowed to overwrite data from other sources?
If their data quality is so bad, why aren't they at the bottom of the totem pole?

@Rivka? @Kara? @Brian? @Otis? Anybody there?


message 22: by rivka, librarian moderator (new)

rivka | 11604 comments Mod
Their data is not "so bad". Like all our data sources, some of their info is great, and some not so great. As for the hierarchy of what overwrites what (or does not), that has a number of elements. Data quality is one.


message 23: by MissJessie (new)

MissJessie | 1696 comments One? What could be more important, as a matter of some interest? Bad data is usually worse than no data, is it not?


message 24: by Velma (new)

Velma (velmalikevelvet) | 107 comments All the erroneous data I've seen this past 2 weeks has been from Ingram as well, & I agree with MissJessie re: no data > bad data.

I would like to see Otis or another staffer comment on this problem. Wasn't there a statement from GR once upon a time to the effect that Ingram data would NOT override user-generated data?


message 25: by rivka, librarian moderator (new)

rivka | 11604 comments Mod
It does not, except in cases where the system cannot tell that the data is user-edited. Brian spent a lot of time running scripts to dig up source info from logs, but in some cases it was unable to determine the source. That usually meant it was ascribed to the original importer as the source, which most often meant Amazon.

Ingram was the source of most of the data that replaced Amazon's.
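
The behaviour described in this message can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SOURCE_RANK` values, the `FieldValue` type, and the `apply_import` function are hypothetical, and the thread does not state how Goodreads actually ranks its sources. The sketch only shows why a librarian edit that gets misattributed to the original importer ends up unprotected.

```python
from dataclasses import dataclass

# Hypothetical precedence ranks (higher wins). The real ordering is not
# given anywhere in this thread.
SOURCE_RANK = {"amazon": 1, "isbndb": 2, "ingram": 3, "librarian": 10}


@dataclass
class FieldValue:
    value: str
    source: str  # the source the system believes last set this field


def apply_import(record: dict, incoming: dict, source: str) -> dict:
    """Return a copy of `record` with `incoming` applied where precedence allows."""
    merged = dict(record)
    for name, new_value in incoming.items():
        if not new_value:
            continue  # an import never blanks out an existing value
        current = merged.get(name)
        if (
            current is None
            or not current.value
            or SOURCE_RANK[source] > SOURCE_RANK[current.source]
        ):
            merged[name] = FieldValue(new_value, source)
    return merged


# A librarian-corrected publisher survives the Ingram import ...
tracked = {"publisher": FieldValue("Collier Books", "librarian")}
print(apply_import(tracked, {"publisher": "Scribner Book Company"}, "ingram"))

# ... but the same correction, wrongly ascribed to the original importer,
# is treated as lower-ranked data and gets overwritten.
misattributed = {"publisher": FieldValue("Collier Books", "amazon")}
print(apply_import(misattributed, {"publisher": "Scribner Book Company"}, "ingram"))
```

Under these assumptions the overwrite is not a bug in the merge rule itself; it follows from the lost source attribution, which matches the explanation given above.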


message 26: by Douglas (new)

Douglas I could well believe that data from Ingram is 99% good; I've no evidence one way or the other.
I think perhaps that we here tend to concentrate on the 1% or 0.1% which is less than accurate; it's something in our nature ...


message 27: by Vicky (last edited Feb 29, 2012 09:16AM) (new)

Vicky (vnorthw) | 803 comments I think it also bears mention that once upon a time there were a lot of AMAZON imports that needed to be updated. The data quality of Goodreads was due in large part to librarians.

For argument's sake, let's pretend that 25% of the books imported from Amazon along the way needed edits, but the other 75% came in perfect.

When Goodreads switched to primarily Ingram imports, it's possible that they have the same percentages (or higher) of correct and incorrect records. EXCEPT that Ingram's 25% overlap directly with the 75% of correct Amazon data which makes it look like Ingram has more incorrect data.
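
The arithmetic behind this argument can be worked through in a few lines. This is only a sketch: the 25% figures are the pretend numbers from this post, the function name `overwrite_outcomes` is made up, and it assumes the two sources' errors are independent of each other, which the thread does not establish.

```python
def overwrite_outcomes(old_error_rate: float, new_error_rate: float) -> dict:
    """Share of records in each outcome when a new source overwrites an old one.

    Assumes errors in the two sources are statistically independent.
    """
    old_ok = 1 - old_error_rate
    new_ok = 1 - new_error_rate
    return {
        "good_overwritten_by_bad": old_ok * new_error_rate,
        "bad_fixed_by_good": old_error_rate * new_ok,
        "good_stays_good": old_ok * new_ok,
        "bad_stays_bad": old_error_rate * new_error_rate,
    }


# Both sources 25% wrong: regressions and fixes are equally common.
print(overwrite_outcomes(0.25, 0.25))

# If librarians had already corrected every old error, every mistake in the
# new source shows up as a regression and nothing shows up as a fix.
print(overwrite_outcomes(0.0, 0.25))
```

With both rates at 25%, 18.75% of records go from good to bad and the same share go from bad to good, so overall quality is unchanged even though the regressions are the part people notice.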

The same can be said about the data from ISBNdb vs. Ingram. In the particular case illustrated in the original post, ISBNdb had the correct data. However, that doesn't mean that ISBNdb has an OVERALL better quality of data. If Ingram has an overall higher quality of data, it should absolutely be the highest priority when it comes to importing data. Some things will be wrong, but more things will be right.

Also going back to the OP, I think the page numbers and publishers will always be a bit wonky regardless of the data source. Page numbers are always a problem - some publishers include all previews in their page count, some count the number of physical pages, etc.

I've seen many strange things with publishers. In this particular case, Collier Books and Scribner were both publishers acquired by Macmillan Publishing which was then bought out by Simon & Schuster. The discrepancy here is likely due to the internal structure of acquisitions and imprints within Simon & Schuster.

Absolutely I think that user input should be ranked higher than any import (which Goodreads has stated many times previously, with the exception from Rivka above). But as far as imports from outside data sources go - I trust that Goodreads researched these things prior to setting up the algorithms that control imports. If they think that Ingram has a higher data quality than ISBNdb, then I trust their judgement.


message 28: by Snail in Danger (Sid) Nicolaides (new)

Snail in Danger (Sid) Nicolaides (upsight) | 210 comments I'm not sure if I should start a new thread for this or not, but it seems like some weird stuff is going on with imports. I happened to sort my all shelf by date of publication, looking to see what was coming up, when I saw a couple things that didn't look right.

The Art of the Sonnet had its information (publication date and description) replaced by those belonging to The Art of Robert Frost. They both start with "The Art of" and concern Robert Frost, but they have different authors and different publishers. I corrected this but the old values should be visible in the librarian changelog, I assume.

Then there's In God's Shadow: Politics In The Hebrew Bible, which I think is actually the hardcover edition of Thinking Politically: Essays in Political Theory, with which it's still combined. There seems to have been some kind of signal-crossing concerning it and In God's Shadow: Politics in the Hebrew Bible. Leaving this one as is in case a programmer wants to take a look at it. (Or in case someone else wants to explain that I've misunderstood something and what it is - which is always a possibility.)


message 29: by Peter (new)

Peter (pete_C) | 433 comments Vicky wrote: "Also going back to the OP, I think the page numbers and publishers will always be a bit wonky regardless of the data source. Page numbers are always a problem - some publishers include all previews in their page count, some count the number of physical pages, etc. "

OK, I can go along with that.
Now, can you explain what happened in post 15 and post 16?


message 30: by Vicky (new)

Vicky (vnorthw) | 803 comments Post 15, I have no idea. Probably just a really bad error on Ingram's part. I did find two sites that refer to An Excellent Mystery with those ISBN numbers, but I wouldn't be surprised if they were sourced from Ingram themselves.

Post 16, according to WorldCat, G. Blakemore Evans is the editor of that particular edition - so he isn't quite as far fetched as one might think.


message 31: by Moloch (new)

Moloch Sometimes, when combining an Italian edition for which I've provided a description with the others, the description is set to the default one.


message 32: by Lobstergirl (new)

Lobstergirl | 5196 comments This is just sad. Rodgers Hamm??

http://www.goodreads.com/book/show/56...


message 33: by Peter (new)

Peter (pete_C) | 433 comments Lobstergirl wrote: "This is just sad. Rodgers Hamm??

http://www.goodreads.com/book/show/56..."


As is this one:
Oklahoma!: E-Z Play Today Volume 78 by Richard Rodgers
Do you think they could have meant Rodgers and Hammerstein?


message 34: by Lobstergirl (new)

Lobstergirl | 5196 comments Yes....someone over there is, how shall we say, extremely lazy?


message 35: by Peter (new)

Peter (pete_C) | 433 comments Lobstergirl wrote: "Yes....someone over there is, how shall we say, extremely lazy?"

No disagreement from me.

If we are going to use them as a "high quality data source", one could wish there was some way to help them fix some of their errors -- one that would help them find and fix the source of their errors.


message 36: by Lee (last edited Mar 15, 2012 11:54PM) (new)

Lee | 21 comments So last week I put an author's books all together under a uniform name, & spent quite a bit of time doing it (no less than four variations in his name). Everything all nice & tidy & in one spot. Onix has undone my work - by putting the author variants back in as an additional author! I hate to think of all the other things that are being overwritten or added back in that shouldn't be. If there is a typo in their database it seems to me that it's going to be added. I guess there is not really much point in trying to fix it again & delete the unnecessary author?


message 37: by Peter (last edited Mar 16, 2012 01:15AM) (new)

Peter (pete_C) | 433 comments Lee wrote: "If there is a typo in their database it seems to me that it's going to be added. I guess there is not really much point in trying to fix it again & delete the unnecessary author? "

I've been doing a lot of the same kind of corrections, too.

Maybe one of the devs can fix the scripts to be able to handle this. (I.e., if a spelling of an author has been removed from an entry by a librarian, then a script should not be allowed to re-add it. This is almost the same as overwriting librarian-entered data.)

This may be a fairly common problem, and a cause of assigning books to wrong authors. Since we use extra spaces to disambiguate authors, and we do have history showing authors imported with extra spaces (not matching our disambiguation scheme), this particular problem may be an area of concern.

As an example, check out A Kiss Before Dying.
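
The rule suggested in this post (a spelling removed by a librarian should not be re-added by a script) could look roughly like the sketch below. The function `merge_authors` and the idea of keeping a per-book set of removed spellings are hypothetical; nothing in the thread says Goodreads stores such a set. Names are compared exactly, without trimming or collapsing spaces, because extra spaces are used here to disambiguate authors.

```python
def merge_authors(current: list, removed_by_librarian: set, imported: list) -> list:
    """Add imported author names, skipping spellings a librarian has removed.

    Comparison is exact on purpose: collapsing whitespace would merge
    authors that are deliberately told apart by extra spaces.
    """
    result = list(current)
    for name in imported:
        if name in removed_by_librarian:
            continue  # a librarian deleted this spelling; do not re-add it
        if name in result:
            continue  # already present, nothing to do
        result.append(name)
    return result


# Placeholder names, not taken from any real record.
authors = merge_authors(
    current=["Jane Q. Author"],
    removed_by_librarian={"Author, Jane Q."},
    imported=["Author, Jane Q.", "Jane Q. Author", "A. N. Other"],
)
print(authors)
```

The design choice is to remember deletions rather than only the current list, since the current list alone cannot tell "never seen" apart from "deliberately removed".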


message 38: by MissJessie (new)

MissJessie | 1696 comments Lobstergirl wrote: "Yes....someone over there is, how shall we say, extremely lazy?"

Repeat of message 13

I agree that the data from Ingram is very poor for cataloging the GR site.

But it probably, bad as it is, suits their needs. I doubt they set it up to serve as a perfect database for the world.


message 39: by Lee (new)

Lee | 21 comments Peter wrote: "Lee wrote: "If there is a typo in their database it seems to me that it's going to be added. I guess there is not really much point in trying to fix it again & delete the unnecessary author? "

I'v..."


Good point. Unfortunately this will likely involve diacritics as well, though I have not noticed that so far.
I do hope it's something GR is looking into though - spending time to fix & combine author variants cleans up the database but if work is going to be undone or needs to be redone because of updates from Onix/Ingram...I might swear.


message 40: by Peter (new)

Peter (pete_C) | 433 comments Lee wrote: "I do hope it's something GR is looking into though - spending time to fix & combine author variants cleans up the database but if work is going to be undone or needs to be redone because of updates from Onix/Ingram...I might swear. "

Swearing is the least of it. What I'm really afraid of is that some of our dedicated Librarians might decide that it isn't worth the Sisyphean battle, and put their energies elsewhere.


message 41: by MissJessie (new)

MissJessie | 1696 comments This one has.


message 42: by Lee (new)

Lee | 21 comments MissJessie wrote: "This one has."

I was thinking the same thing myself, but not in such eloquent terms as Peter stated it.
Perhaps we can now catch up on the reading we've all set aside in order to take care of the database?


message 43: by Lobstergirl (new)

Lobstergirl | 5196 comments I've been seeing a lot of instances where things which had been NAB'd have been unNAB'd by an import override.

Undoing countless hours of work.


message 44: by Elizabeth (Alaska) (new)

Elizabeth (Alaska) | 10372 comments Has the import been completed, and now we're just getting new ISBNs?


message 45: by Peter (last edited Mar 16, 2012 02:44PM) (new)

Peter (pete_C) | 433 comments Elizabeth (Alaska) wrote: "has the import been completed, and now we're just getting new ISBNs?"

I don't think what we are talking about here is just about the big data import. At least, I'm not, and you make a good point.

However, I'm seeing things which I fix being undone. Sometimes, if I go to re-fix it, I see that the same data, from the same source (ingram, onix, etc.), has replaced my correction.

Is anyone else having this happen?


message 46: by Elizabeth (Alaska) (new)

Elizabeth (Alaska) | 10372 comments I asked because I don't see why we're continuing to import data on ISBNs that already exist. I understood in February, but this is mid-March.


message 47: by Lobstergirl (new)

Lobstergirl | 5196 comments Data is being "refreshed." (This is my assessment, not info from management.) I've noticed a ton of Vintage (the imprint) books which I have shelved being updated with new publication dates; e.g., an edition from 1991 will suddenly have a 2012 publication date. It's annoying as #%&*.


message 48: by Elizabeth (Alaska) (new)

Elizabeth (Alaska) | 10372 comments I think I get that - what I don't understand is why and how?


message 49: by Peter (new)

Peter (pete_C) | 433 comments Lobstergirl wrote: "Data is being "refreshed." (This is my assessment, not info from management.)"

This is something that I've complained about several times when it has happened to covers. Publishers seem to want their latest cover associated with the ISBN and the GR scripts just let them update the covers.

If I had my druthers, I would not allow cover images to be changed except by human beings, with an "Are you sure?" double check. Automatic image additions should be limited to the cases where there is no cover already loaded.
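
The cover policy proposed here is simple enough to write down as a sketch. The function `choose_cover` and its `by_human` and `confirmed` flags are invented for illustration and do not describe any existing Goodreads script.

```python
from typing import Optional


def choose_cover(
    existing: Optional[str],
    incoming: Optional[str],
    *,
    by_human: bool = False,
    confirmed: bool = False,
) -> Optional[str]:
    """Decide which cover image to keep under the proposed policy.

    - Automatic imports may only fill in a missing cover.
    - Replacing an existing cover needs a human who answered "Are you sure?".
    """
    if incoming is None:
        return existing  # nothing offered, keep what we have
    if existing is None:
        return incoming  # no cover yet, any source may add one
    if by_human and confirmed:
        return incoming  # deliberate, double-checked replacement
    return existing  # scripts and unconfirmed edits cannot replace


# An automated feed offers a new cover for a book that already has one.
print(choose_cover("cover_1991.jpg", "cover_2012.jpg"))
```

In this sketch a publisher feed that pushes its latest cover for an existing ISBN is simply ignored unless the book has no cover at all.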


message 50: by Lobstergirl (new)

Lobstergirl | 5196 comments Yes, I've complained about that cover "refreshing" as well, in other threads.



