Keith’s Comments (group member since Sep 19, 2008)


Keith’s comments from the Goodreads Librarians Group.

Mar 01, 2012 03:31AM

rivka wrote: "However, for any that have never been shelved, you should see a link on the edit page to mark it Not A Book."

Oh, well that should make life a little easier. :)
Mar 01, 2012 03:29AM

I think if we get #2, then #1 will be easier to live without. Please let us know, if and when it is known, what edits the system cannot tie to a user, so I/we can ensure that we work around the limitation when possible. (E.g., if I know that an author-merge is going to leave vulnerable data, I might be more inclined to edit every edition of every book tied to the "wrong" author individually before merging, or at least one edition of each work to ensure something remains correct just in case. That's a lot of work, but it's less work than finding and remerging them all every month.)

And thanks again for keeping the ball rolling.
Emile Zola (19 new)
Feb 28, 2012 05:20AM

Paula wrote: "That was in response to Cornelia"

I hadn't read her post as suggesting that either at first, but now that you mention it.... Hard to tell who's responding to what without nested threading. Sorry for the misunderstanding.

At any rate, since the database is clearly able to handle the UTF-8 encoding that makes Cyrillic and Persian possible, I would certainly support a move toward proper accuracy. A tool for browsing recently created authors (which are often non-standard, like "Mile Zola") would be quite handy for fixing the mangled imports more quickly. I would also, however, reiterate my call for the Librarian Manual to explicitly state any such policy, lest such corrections be incorrectly undone by well-meaning but uninformed librarians, and add a call for the search engine to better support it than it presently does. (Honestly, I spend more time using Google with a "site:goodreads.com" parameter, but that's always outdated and not a viable long-term solution.)

Finally, if we were to more officially support Paula's suggestion of multiple profiles in different charsets for the same author, I think we would need a more clearly articulated policy about those as well. When I find those, I sometimes try to ensure that the unusual ones are linked to the more canonical profile, but they frequently get merged, and recreated, and deleted, and recreated, and remerged, etc., which is frustrating and not especially useful. Some way at least of indicating which profile is the master would probably be helpful. E.g., should we ensure that Фёдор Достоевский is the main author on all the Fyodor Dostoyevsky books because that's how he spelled it, or vice-versa because the latter is more common here?
Feb 28, 2012 05:00AM

rivka wrote: "2) That seems reasonable. I'll ask for that to be added to the script."

Thanks.

Sandra wrote: "Keith, if you look you'll find that all librarians lost edits, most were the inflated edits from when the script to save books was run, so not really any overall loss in numbers."

Yeah, I figured I probably wasn't the only one; just thought this might be the reason, but Rivka seems to have better information, unsurprisingly.
Feb 28, 2012 04:55AM

rivka wrote: "In any case, there's no reason to put non-ISBN items in Not A Book. We would just delete any that don't belong here."

So does that mean, now that we're not importing from Amazon anymore, items like DVD movies with an ISBN also can be deleted rather than NABed? I've tended to stay away from the NAB work because I've never understood the policy or technique as well as others, and with the old imports it seemed kinda fruitless, but maybe it's easier now.
Emile Zola (19 new)
Feb 25, 2012 02:07PM

Paula wrote: "I do not agree with adding both variants of name to the author name field - that is bad practice and just wrong in database terms."

I never suggested that; perhaps you misread what I did suggest about the description field.

As for the issue of "correctness" as rightly brought up by others, I believe there is a balance to be struck between that ideal and utility. Else, instead of Fyodor Dostoyevsky, we should have Фёдор Достоевский, or for that matter Фёдор Михайлович Достоевский which is properly his name, especially given the common variant transliterations of Dostoevsky, Dostoievski, etc. But only Russian editions will import with Cyrillic characters, and ain't nobody gonna search on that.

Whatever eventually gets decided, this should be addressed clearly in the Author section of the Librarian Manual, which is at present entirely silent on the matter.
Feb 24, 2012 08:11PM

The main problem with diacritics in the name field is that the majority of imports are going to mangle them, thereby continually creating new authors that need to be merged with the "canonically correct" spelling. And most users are going to look or search for the spelling without diacritics simply because those are easier to type. This aside from the fact that, in transliterated languages like the Tibetan which is the source for Chögyam Trungpa, even the version with diacritics isn't really canonically correct.

I made a longer post in this linked thread about the issue of diacritics in the author name field, but the short version is that, in at least the case of authors who predominantly write in the English language, I suggest that we should consider leaving the diacritics out of the name field, and include them in the description instead.

(Of course, this whole issue could be obviated by the long awaited author-alias feature, by which we could link Chögyam Trungpa with Chogyam Trungpa, ཆོས་ རྒྱམ་ དྲུང་པ་, Chos rgyam Drung pa, Dorje Dradül, etc.)
Feb 24, 2012 07:52PM

I would agree with vicky_girl, FWIW. There have even been movies based on books that started as blogs, so the mere fact that it once existed as a blog shouldn't have any bearing on the matter once it becomes a book (or book-on-CD, or ebook, or....)
Emile Zola (19 new)
Feb 24, 2012 07:34PM

Since the recent imports, I've been wondering about the advisability of diacritic marks in the author name field.

I notice that lots and lots of books from both manual entry and database import come in without diacritic marks. When that happens, if we have the right diacritics in the name field of the author page, the import gets tagged to a new author profile, which it usually takes a while for someone to find and merge into the correct one (or worse, for someone to find and merge the correct one into, losing the bio & links and breaking all the old links). Then more imports are run, making another new profile, lather, rinse, repeat....

For example, we've already got Emile Zola back again, waiting to be merged, and that from users who entered other diacritic marks correctly. Plus Mile Zola as the case where the high-ASCII characters get dropped completely before or during import. And of course our search doesn't handle diacritics well at all.
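
To illustrate the two failure modes described above, here is a minimal Python sketch. The function names are hypothetical and this is not how the Goodreads importer works; it only shows how a name with a diacritic can turn into either of the two variants. The first function folds accented letters to their base letters, and the second simulates a lossy import that discards every non-ASCII character.

```python
import unicodedata


def fold_diacritics(name: str) -> str:
    """Decompose accented letters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def drop_non_ascii(name: str) -> str:
    """Simulate a lossy import that throws away every non-ASCII character."""
    return name.encode("ascii", errors="ignore").decode("ascii")


canonical = "\u00c9mile Zola"  # the name with its acute accent

print(fold_diacritics(canonical))  # Emile Zola
print(drop_non_ascii(canonical))   # mile Zola
```

The second result lacks the capital "M" seen on the site; presumably a later step recapitalizes the name, but that is a guess.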

I wonder if it might not be better to update policy such that, for European-language authors (including English), the name field should (generally) contain only low-ASCII characters; i.e., no diacritics. Then we could have the description field start with the fully qualified name, with any appropriate honorifics, AKAs, etc. This would seem to streamline the maintenance process, without losing data or accuracy. [Note that I am NOT suggesting that such a policy would be applied to authors writing primarily in other charsets; Persian authors published in Persian get their names entered in Persian script, Japanese in Japanese, etc., as is presently the case.]

For an example with no diacritics in the name field, and multi-charset fully qualified names in the description, see Jamgon Kongtrul Lodro Taye (who needs all four names to differentiate him from all of his successors).
Feb 24, 2012 06:25PM

Ah, that might explain what happened to some author entries I worked very hard to correct which are now gone (along with any record of my previous edits in my edit history, which is now showing 19K edits where last week it showed 22K, which probably means about 50-100 hours of work lost). Hopefully, this will be a one-time problem resulting from the combination of losing Amazon, importing Ingram & Worldcat, then running the cleanup script.

For example, I suspect that the former entry for Tibetan author Jamyang Khentse Chokyi Lodro disappeared through a combination of the Amazon data being removed and imports overwriting the author name with every conceivable variant transliteration (all of which I had previously spent several days merging under the formal disambiguated name). That left nothing in this record but the detailed author description, which the script then ignored, deleting the record.

So I have two requests:

1) We've been repeatedly assured that import data will not overwrite librarian edits. This is not true; many of my author name edits, especially among Tibetan authors, have been overwritten, often by very corrupt data (though mostly just adding honorifics like Chogyal, Rinpoche, Reverend, etc. back into the name field). If it is possible to tighten this up, I would really appreciate it—repeatedly researching the same authors in a niche field is beginning to get tedious and annoying, especially for the obscure ones that the publishers don't know how to treat either.

2) In addition to no books, quotes, etc., could we have the script ignore authors that have descriptions? If that's too tight, maybe we can differentiate real from bogus/pointless descriptions by setting a minimum length of, say, 256 characters.
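
For what it's worth, request 2 could be expressed as a rule along these lines. This is only a sketch: the `AuthorRecord` fields and the `is_safe_to_delete` name are made up for illustration and say nothing about how the actual cleanup script works. Only the 256-character threshold comes from the request above.

```python
from dataclasses import dataclass

MIN_DESCRIPTION_LENGTH = 256  # threshold suggested in request 2


@dataclass
class AuthorRecord:
    """Hypothetical stand-in for an author profile."""
    name: str
    book_count: int = 0
    quote_count: int = 0
    description: str = ""


def is_safe_to_delete(author: AuthorRecord,
                      min_length: int = MIN_DESCRIPTION_LENGTH) -> bool:
    """Return True only for profiles with no books, no quotes,
    and no description of at least min_length characters."""
    if author.book_count or author.quote_count:
        return False
    return len(author.description.strip()) < min_length


empty = AuthorRecord(name="Mile Zola")
described = AuthorRecord(name="Some Author", description="x" * 300)

print(is_safe_to_delete(empty))      # True
print(is_safe_to_delete(described))  # False
```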

Regardless of the outcome, thank you for considering these requests.
Got it, thanks Rivka.
I think I may have stumbled across a related data import problem. The exemplar is here, though there are many to choose from just within Fyodor Dostoyevsky:
http://www.goodreads.com/book/show/66...

pub_date is 65535 and orig_date is 32767, clearly indicating that something is being brought in as binary data which should not be. Source for both is listed as Worldcat, which has correct pubdate in [brackets].
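
The reason those two numbers stand out: they are the largest values that fit in an unsigned and a signed 16-bit integer, respectively. A short Python check shows this. It says nothing about where in the import the values originate; the helper name is hypothetical.

```python
import struct

UINT16_MAX = 2**16 - 1  # largest unsigned 16-bit value
INT16_MAX = 2**15 - 1   # largest signed 16-bit value

pub_date = 65535   # value reported for pub_date
orig_date = 32767  # value reported for orig_date

print(pub_date == UINT16_MAX)   # True
print(orig_date == INT16_MAX)   # True

# Little-endian byte patterns of the two values
print(struct.pack("<H", pub_date).hex())   # ffff
print(struct.pack("<h", orig_date).hex())  # ff7f


def looks_like_int16_limit(year: int) -> bool:
    """Flag a 'year' that is exactly a 16-bit integer maximum."""
    return year in (UINT16_MAX, INT16_MAX)
```

A check like `looks_like_int16_limit` could be used to find other affected editions, assuming the bad values are always exactly these two.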

I have refrained from fixing any of these so the bug can be researched.
Fixed 8718127 but ran out of time before getting a whole batch of five.
Thanks for the update, Brian; I was hoping it was just a script.
Feb 14, 2012 04:32PM

I believe that was the example given: two books republished as an omnibus, but still numbered separately. But to give another more concrete example (since for the life of me I cannot find the thread I just read this in today), http://www.goodreads.com/book/show/21... numbers each chapter starting at 1. Annoying, even to me, but true, and at 1110 pages simply using the last Arabic number on the last page would be wildly inaccurate, like by an order of magnitude.
Feb 14, 2012 03:56PM

But there was just a recent comment (from Rivka or Vicky, I think) that in a book that is numbered twice (e.g., 1-311 and 1-415) we add them up and mark the total (e.g., 726). What's the difference? I have books where more pages have Roman numerals than Arabic. In an average novel it doesn't make much difference, but outside the realm of popular fiction this can make a huge difference.

I note, in case another librarian comes along, that contrary to what I had thought, the Librarian Manual does say:
The number of pages should include the relevant text of the book -- glossaries, appendices, author's notes, etc should be included in the page count if they are labeled with regular numbers, but not if they are labeled by Roman Numerals.
I get that we wouldn't want to insist on consistency, and as noted in many cases it makes little difference, but in many other cases the size of the book is markedly impacted by this exclusion, and in terms of user statistics and progress updates it would leave out about 20% of my own reading, as a ballpark figure.
The most recent was 4 hours ago:
http://www.goodreads.com/book/show/23...

Most of them appear to be non-English editions, and the bulk show the edits made as:

url: '' to '' (undo)
author_role: '' to '' (undo)


There was a significant run of them between Feb 12, 2012 10:32pm and Feb 13, 2012 07:54am when I wasn't even online. http://www.goodreads.com/book/show/66..., which I am pretty sure I entered entirely by hand once upon a time, shows a bunch of updates from Ingram, but most of the fields show "Chris Feldman" as the source. ("Dri" also incorrectly changed original_title: 'The Golden Bough' to 'The Golden Bough - Illustrated' and I'm going to fix that as soon as I save this reply.)

There also was a run on Jan 24, but I presume that was the "save Amazon" import. I still intend to go back through all of those to see what it goofed.
Feb 14, 2012 03:15PM

One thing to keep in mind when the page number counts seem high: we count the introductory pages in the total page count. So a book that starts with pages i through x and ends on page 350 actually has 360 pages counting the 10 (Roman numeral 'x') introductory pages.

(Or at least that's what we were doing when I started as a librarian; I haven't checked for updates to this recently.)

This can be particularly significant, as some works have over 100 Roman-numeraled pages.
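
To make the arithmetic concrete, here is a small Python sketch of the counting described above: the last Roman-numeraled page is converted to a number and added to the last page of each Arabic-numbered run. The function names are my own, and whether Roman-numeraled pages should be counted at all is a policy question the code does not settle.

```python
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xiv' to an integer (14)."""
    total = 0
    previous = 0
    # Read right to left: a smaller value before a larger one is subtracted.
    for ch in reversed(numeral.lower()):
        value = ROMAN_VALUES[ch]
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total


def total_pages(last_roman_page: str, *last_arabic_pages: int) -> int:
    """Add the front-matter pages to the last page of each numbered run."""
    front_matter = roman_to_int(last_roman_page) if last_roman_page else 0
    return front_matter + sum(last_arabic_pages)


print(total_pages("x", 350))      # 360
print(total_pages("", 311, 415))  # 726
```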
Are any other librarians seeing edits in their streams that they didn't actively make? I'm seeing a lot in mine which look rather like the batch of re-edits that were done during the Amazon conversion, but I thought that was all done. Some of them look like they're associated with imports: a few for which I looked at the librarian change log showed all fields coming from somewhere else aside from one or two from me.

I looked for a post from Otis or an employee to see if there was some process executing which I should be aware of, but I didn't see anything, so I figured I'd better just ask what's up. Maybe it's a bug, or maybe it's just a side-effect of a process we weren't warned about, but which is the case would be nice to know.
Feb 14, 2012 02:50PM

OK, got it. Thanks for the clarification. (Nothing worse than making edits one thinks are right, only to have someone else change it back, only to find one has been doing it wrong all along.)