Keith’s Comments (group member since Sep 19, 2008)
Keith’s comments from the Goodreads Librarians Group group.
Showing 301-320 of 377

Oh, well that should make life a little easier. :)

And thanks again for keeping the ball rolling.

I hadn't read her post as suggesting that either at first, but now that you mention it.... Hard to tell who's responding to what without nested threading. Sorry for the misunderstanding.
At any rate, since the database is clearly able to handle the UTF-8 encoding that makes Cyrillic and Persian possible, I would certainly support a move toward proper accuracy. A tool for browsing recently created authors (which are often non-standard, like "Mile Zola") would be quite handy for fixing the mangled imports more quickly. I would also, however, reiterate my call for the Librarian Manual to explicitly state any such policy, lest such corrections be incorrectly undone by well-meaning but uninformed librarians, and add a call for the search engine to better support it than it presently does. (Honestly, I spend more time using Google with a "site:goodreads.com" parameter, but that's always outdated and not a viable long-term solution.)
Finally, if we were to more officially support Paula's suggestion of multiple profiles in different charsets for the same author, I think we would need a more clearly articulated policy about those as well. When I find those, I sometimes try to ensure that the unusual ones are linked to the more canonical profile, but they frequently get merged, and recreated, and deleted, and recreated, and remerged, etc., which is frustrating and not especially useful. Some way at least of indicating which profile is the master would probably be helpful. E.g., should we ensure that Фёдор Достоевский is the main author on all the Fyodor Dostoyevsky books because that's how he spelled it, or vice-versa because the latter is more common here?

Thanks.
Sandra wrote: "Keith, if you look you'll find that all librarians lost edits, most were the inflated edits from when the script to save books was run, so not really any overall loss in numbers."
Yeah, I figured I probably wasn't the only one; just thought this might be the reason, but Rivka seems to have better information, unsurprisingly.

So does that mean, now that we're not importing from Amazon anymore, items like DVD movies with an ISBN also can be deleted rather than NABed? I've tended to stay away from the NAB work because I've never understood the policy or technique as well as others, and with the old imports it seemed kinda fruitless, but maybe it's easier now.

I never suggested that; perhaps you misread what I did suggest about the description field.
As for the issue of "correctness" as rightly brought up by others, I believe there is a balance to be struck between that ideal and utility. Else, instead of Fyodor Dostoyevsky, we should have Фёдор Достоевский, or for that matter Фёдор Михайлович Достоевский which is properly his name, especially given the common variant transliterations of Dostoevsky, Dostoievski, etc. But only Russian editions will import with Cyrillic characters, and ain't nobody gonna search on that.
Whatever eventually gets decided, this should be addressed clearly in the Author section of the Librarian Manual, which is at present entirely silent on the matter.

I made a longer post in this linked thread about the issue of diacritics in the author name field, but the short version is that, in at least the case of authors who predominantly write in the English language, I suggest that we should consider leaving the diacritics out of the name field, and include them in the description instead.
(Of course, this whole issue could be obviated by the long awaited author-alias feature, by which we could link Chögyam Trungpa with Chogyam Trungpa, ཆོས་ རྒྱམ་ དྲུང་པ་, Chos rgyam Drung pa, Dorje Dradül, etc.)


I notice that lots and lots of books from both manual entry and database import come in without diacritic marks. When that happens, if we have the right diacritics in the name field of the author page, the import gets tagged to a new author profile, which it usually takes a while for someone to find and merge into the correct one (or worse, for someone to find and merge the correct one into, losing the bio & links and breaking all the old links). Then more imports are run, making another new profile, lather, rinse, repeat....
For example, we've already got Emile Zola back again, waiting to be merged, and that from users who entered other diacritic marks correctly. Plus Mile Zola as the case where the high-ASCII characters get dropped completely before or during import. And of course our search doesn't handle diacritics well at all.
I wonder if it might not be better to update policy such that, for European-language authors (including English), the name field should (generally) contain only low-ASCII characters; i.e., no diacritics. Then we could have the description field start with the fully qualified name, with any appropriate honorifics, AKAs, etc. This would seem to streamline the maintenance process, without losing data or accuracy. [Note that I am NOT suggesting that such a policy would be applied to authors writing primarily in other charsets; Persian authors published in Persian get their names entered in Persian script, Japanese in Japanese, etc., as is presently the case.]
For an example with no diacritics in the name field, and multi-charset fully qualified names in the description, see Jamgon Kongtrul Lodro Taye (who needs all four names to differentiate him from all of his successors).
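To illustrate the difference between dropping high-ASCII characters (which would account for "Mile Zola", give or take the capital letter) and folding them to low-ASCII (which gives "Emile Zola"), here is a rough Python sketch. The function names are made up for the example, and I have no idea how the real import code works; this only shows what the two behaviours look like for Latin-script names.

```python
import unicodedata


def drop_non_ascii(name: str) -> str:
    """Simulate a lossy import that simply discards non-ASCII characters."""
    return name.encode("ascii", "ignore").decode("ascii")


def fold_diacritics(name: str) -> str:
    """Decompose accented letters (NFKD), then remove the combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Escapes are used so the input is unambiguously the precomposed form.
zola = "\u00c9mile Zola"          # E-acute + "mile Zola"
trungpa = "Ch\u00f6gyam Trungpa"  # o-diaeresis in "Chogyam"

print(drop_non_ascii(zola))      # accented letter vanishes entirely
print(fold_diacritics(zola))     # accented letter becomes plain "E"
print(fold_diacritics(trungpa))  # accented letter becomes plain "o"
```

Comparing names after folding, rather than byte for byte, is one way an import could attach "Emile Zola" to the existing profile instead of creating a new one.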

For example, I suspect that the former entry for Tibetan author Jamyang Khentse Chokyi Lodro disappeared through a combination of two things: the removal of Amazon data, and imports overwriting the author name with every conceivable variant transliteration (all of which I had previously spent several days merging under the formal disambiguated name). That left nothing in this record but the detailed author description, which the script then ignored, deleting the record.
So I have two requests:
1) We've been repeatedly assured that import data will not overwrite librarian edits. This is not true; many of my author name edits, especially among Tibetan authors, have been overwritten, often by very corrupt data (though mostly just adding honorifics like Chogyal, Rinpoche, Reverend, etc. back into the name field). If it is possible to tighten this up, I would really appreciate it—repeatedly researching the same authors in a niche field is beginning to get tedious and annoying, especially for the obscure ones that the publishers don't know how to treat either.
2) In addition to no books, quotes, etc., could we have the script ignore authors that have descriptions? If that's too tight, maybe we can differentiate real from bogus/pointless descriptions by setting a minimum length of, say, 256 characters.
Regardless of the outcome, thank you for considering these requests.
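In case it helps, request 2 could be sketched roughly like this in Python. Everything here is hypothetical: the `Author` record, its field names, and `is_safe_to_delete` are invented for illustration and are not the actual cleanup script or schema; the 256 is just the figure suggested above.

```python
from dataclasses import dataclass

# Threshold floated in the request above; shorter descriptions are treated
# as bogus/pointless for the purposes of this sketch.
MIN_DESCRIPTION_LENGTH = 256


@dataclass
class Author:
    """Hypothetical author record; field names are assumptions."""
    name: str
    book_count: int = 0
    quote_count: int = 0
    description: str = ""


def is_safe_to_delete(author: Author) -> bool:
    """Return True only for authors with no books, no quotes,
    and no description of meaningful length."""
    if author.book_count or author.quote_count:
        return False
    return len(author.description.strip()) < MIN_DESCRIPTION_LENGTH
```

With a check like this, a record holding nothing but a detailed librarian-written description would be skipped rather than deleted.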
Feb 24, 2012 03:29PM
Feb 22, 2012 07:20PM

http://www.goodreads.com/book/show/66...
pub_date is 65535 and orig_date is 32767, clearly indicating that something is being brought in as binary data which should not be. Source for both is listed as Worldcat, which has correct pubdate in [brackets].
I have refrained from fixing any of these so the bug can be researched.
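For anyone wondering why those two numbers look suspicious: they are the largest values that fit in an unsigned and a signed 16-bit integer respectively. A quick Python check follows; the helper name is made up, and how the dates are really stored or imported is an assumption on my part.

```python
import struct

# All bits set, read as an unsigned 16-bit integer (little-endian).
unsigned_max = struct.unpack("<H", b"\xff\xff")[0]

# 0x7FFF, read as a signed 16-bit integer (little-endian).
signed_max = struct.unpack("<h", b"\xff\x7f")[0]

print(unsigned_max, signed_max)


def looks_like_16bit_limit(year: int) -> bool:
    """Hypothetical check: flag a 'year' that equals a 16-bit maximum."""
    return year in (2**15 - 1, 2**16 - 1)
```

A filter like `looks_like_16bit_limit` could be one way to find the other affected records, assuming the bad values are always one of these two.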
Feb 21, 2012 06:16PM
Feb 14, 2012 04:37PM


I note, in case another librarian comes along, that contrary to what I had thought, the Librarian Manual does say:
The number of pages should include the relevant text of the book -- glossaries, appendices, author's notes, etc should be included in the page count if they are labeled with regular numbers, but not if they are labeled by Roman Numerals.
I get that we wouldn't want to insist on consistency, and as noted in many cases it makes little difference, but in many other cases the size of the book is markedly impacted by this exclusion, and in terms of user statistics and progress updates it would leave out about 20% of my own reading, as a ballpark figure.
Feb 14, 2012 03:44PM

http://www.goodreads.com/book/show/23...
Most of them appear to be non-English editions, and the bulk show the edits made as:
url: '' to '' (undo)
author_role: '' to '' (undo)
There was a significant run of them between Feb 12, 2012 10:32pm and Feb 13, 2012 07:54am when I wasn't even online. http://www.goodreads.com/book/show/66... which I am pretty sure I entered entirely by hand once upon a time, shows a bunch of updates from Ingram, but most of the fields show "Chris Feldman" as the source. ("Dri" also incorrectly changed original_title: 'The Golden Bough' to 'The Golden Bough - Illustrated' and I'm going to fix that as soon as I save this reply.)
There also was a run on Jan 24, but I presume that was the "save Amazon" import. I still intend to go back through all of those to see what it goofed.

(Or at least that's what we were doing when I started as a librarian; I haven't checked for updates to this recently.)
This can be particularly significant, as some works have over 100 Roman-numeraled pages.
Feb 14, 2012 02:59PM

I looked for a post from Otis or an employee to see if there was some process executing which I should be aware of, but I didn't see anything, so I figured I'd better just ask what's up. Maybe it's a bug, or maybe it's just a side-effect of a process we weren't warned about, but it would be nice to know which is the case.
