Goodreads Librarians Group discussion
note: This topic has been closed to new comments.
Additions to Librarian Manual
Added to the Manual: Author Sort fields



First the easy(er) one: given that there are innumerable standards, and non-standard conventions, regarding proper-name alpha sorting across hundreds of languages, and given that this is, fundamentally, a computer program running on English-language servers, might it not be easiest on everyone (especially our over-worked developers, who are probably already deeply sorry that they tried to give us what we so frequently requested with full requirements documentation or user stories) to just have the sort-by fields all run off of ASCII/UTF-8 sort order?
At least that would be a single, canonical, discoverable, standardized rule common to computing for over 60 years, and likely easily implemented. Fundamentally, this is probably a question more for the development team than for us volunteers.

ISO8859-1 (aka latin-1; or ISO8859-15, aka latin-9, but I doubt the euro character will show up in author names, so the difference is moot) is better, but still wildly out of order for... well, everything. Those are charset standards that are more interested in inclusivity and tend to group "lookalike" characters together, having not really been intended for sort order/collation.
There is the Unicode Collation Algorithm, as well as ISO14651, and the EOR European sort ordering rules, all of which are intended to be tailored somewhat in implementation. Any of those would make a decent internationally agreed standard base to work from though for at least the European languages.
http://www.iso.org/iso/home/store/cat...
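As a rough illustration of the objection above (a sketch only, not anything Goodreads is known to run): Python compares strings by code point, which is the same order a byte-wise UTF-8 sort produces. The sample surnames are made up for the demonstration.

```python
import unicodedata

# Hypothetical sample surnames. Normalizing to NFC makes each accented
# letter a single code point, however this text happened to be saved.
names = [
    unicodedata.normalize("NFC", n)
    for n in ["adams", "Čapek", "Zola", "Árnason", "Capek"]
]

# Plain code-point order: uppercase ASCII first, then lowercase ASCII,
# then Latin-1 letters, then letters outside Latin-1.
codepoint_order = sorted(names)

for name in codepoint_order:
    print(name)
```

In this order "Zola" lands before "adams", and both "Árnason" and "Čapek" land after everything unaccented, which is the kind of scatter a collation standard is meant to avoid.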

With the title sort, it has been easy enough to replace "Volume XI" in Title with "Volume 11" in Sort, and even to replace "Volume 9" with "Volume 09" so that the single-digit volumes don't get scattered among the double-digit ones, like:
Volume 89
Volume 9
Volume 90
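A minimal sketch of that zero-padding trick, assuming a plain string sort on the Sort field. The helper name `pad_numbers` and the default width of 2 are my own choices for illustration, not part of any Goodreads tool.

```python
import re


def pad_numbers(title: str, width: int = 2) -> str:
    """Left-pad every run of digits with zeros so string order matches numeric order."""
    return re.sub(r"\d+", lambda match: match.group().zfill(width), title)


titles = ["Volume 9", "Volume 90", "Volume 89"]

# Sorting the titles as written scatters the single-digit volume.
as_written = sorted(titles)

# Sorting on the padded form keeps the volumes in numeric order.
padded = sorted(pad_numbers(t) for t in titles)

print(as_written)
print(padded)
```

The width has to cover the longest number in the series; a run past Volume 99 would need a width of 3.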
Particularly given that Roman numerals, duplicating letters, sort especially badly in name fields—regardless of whether they go "Surname IX, Forename", or "Surname, Forename, IX" as I believe they should—it would seem similarly advantageous to have at least a suggestion in the Manual that Roman numerals be replaced by the corresponding Arabic numbers: "Surname, Forename, 9".
This may seem an esoteric rarity to those who rarely see personal names beyond the fourth generation like Walter Cronkite IV, but once you start getting into monarchs like Louis XIV (duplicated as Louis XIV of France, and Louis XIV Bourbon), Popes, Lamas, and assorted other types of religious and civil nobility, it can start to get a right mess.
Relatedly, I note that Jr. sorts before Sr. alphabetically, and III sorts before them both, which also gets silly.
Sort as written:
Lopez, Anna [lopez, anna]
Lopez, Carlos [lopez, carlos]
Lopez, Carlos X. [lopez, carlos x.]
Lopez III, Carlos [lopez, carlos, iii]
Lopez IX, Carlos [lopez, carlos, ix]
Lopez Jr., Carlos [lopez, carlos, jr.]
Lopez Sr., Carlos [lopez, carlos, sr.]
Lopez V, Carlos [lopez, carlos, v]
Lopez XI, Carlos [lopez, carlos, xi]
Lopez, George
Sort with Arabic numbers:
Lopez, Anna [lopez, anna]
Lopez, Carlos [lopez, carlos]
Lopez, Carlos X. [lopez, carlos x.]
Lopez Sr., Carlos [lopez, carlos, 01]
Lopez Jr., Carlos [lopez, carlos, 02]
Lopez III, Carlos [lopez, carlos, 03]
Lopez V, Carlos [lopez, carlos, 05]
Lopez IX, Carlos [lopez, carlos, 09]
Lopez XI, Carlos [lopez, carlos, 11]
Lopez, George
Finally, I note for everyone who might've overlooked it that the comma before the suffix/number/numeral in the "sort" field is important to distinguish those from middle names/initials.
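The proposal above could be sketched roughly as follows. This is only an illustration of the suggestion in this post, not the rule the Manual adopted; the function names, the Sr./Jr. mapping to 01/02, and the two-digit width are all assumptions taken from the example list.

```python
# Values of the Roman numeral letters, keyed in lowercase.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

# Assumed mapping from the example above: Sr. counts as 1, Jr. as 2.
GENERATION_WORDS = {"sr": 1, "jr": 2}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer (no validation of malformed input)."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (IV, IX, ...).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def suffix_to_number(suffix: str, width: int = 2) -> str:
    """Turn 'IX', 'Jr.' or 'Sr.' into a zero-padded number; leave other suffixes alone."""
    cleaned = suffix.strip().rstrip(".").lower()
    if cleaned in GENERATION_WORDS:
        number = GENERATION_WORDS[cleaned]
    elif cleaned and all(ch in ROMAN_VALUES for ch in cleaned):
        number = roman_to_int(cleaned)
    else:
        return suffix.strip().lower()
    return str(number).zfill(width)


def sort_field(surname: str, forename: str, suffix: str = "") -> str:
    """Build 'surname, forename, NN' with the comma before the suffix."""
    parts = [surname.lower(), forename.lower()]
    if suffix:
        parts.append(suffix_to_number(suffix))
    return ", ".join(parts)


authors = [
    ("Lopez", "Carlos", "XI"), ("Lopez", "Carlos", "Jr."), ("Lopez", "Carlos X.", ""),
    ("Lopez", "Carlos", "III"), ("Lopez", "Carlos", "Sr."), ("Lopez", "Carlos", ""),
]
sorted_fields = sorted(sort_field(*author) for author in authors)
for field in sorted_fields:
    print(field)
```

Note that the middle initial "X." stays in the forename and is never treated as a numeral, which is what the comma convention buys. A suffix made up only of numeral letters (say, "MD") would be misread as a number by this sketch, so a real rule would need an exception list.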

ETA: In other languages, they may not use Sr. and Jr. Portuguese, for example, uses Filho and Neto for Jr. and III. In this case they happen to be alphabetically in the right order, but it becomes much too complicated if we have to sort suffixes according to meaning.
ETA something I thought of after logging off last night: you wouldn't sort Henry VIII before Elizabeth I, would you? And what's the difference? (And how often would one have books by both Sr. and Jr. on their shelves?)

http://www.chicagomanualofstyle.org/q...
http://www.english-for-students.com/L...
http://blog.apastyle.org/apastyle/201...

I'm with you, Elizabeth.
(And I learned an interesting tidbit: 'Charles de Gaulle' is sorted 'de Gaulle, Charles', because Gaulle has only one syllable, contrary to 'Maupassant, Guy de'.)

Additionally, sorting Jr, Sr, III by "meaning" is, again, anglocentric, and would in the first case need additional logic added just to do that, and in the second, would need *mountains* of additional logic to make it non-anglocentric.
Imma stick a bunch of 'why did they DO that' examples in the spoiler, cos I can come up with these all day long, but skip 'em if they bore you.
Basically, getting *alphabetization* right is so insanely hard that trying to do anything past that seems slightly insane to me.

Yes, and it makes no sense. If a father John Smith has a son Jacob Smith, John is not going to be sorted before Jacob either. The only sorting should be alphabetical and numerical (with proper numbers, not words meaning 'the elder' and 'the younger').
And thank goodness Swedish kings weren't in the habit of writing books, as some other royals were.
ROTFL
Willem van Oranje, since Orange isn't his surname either. I guess lethe knows what to do with him though :)
I got 10/10 for sorting back when I studied LIS. ;)
It's been a while, but I still have the book! It does date back to when it was still all about the card catalogue though. Computer catalogues were not really acknowledged yet in those rules.
Kings and queens should be sorted on first name. Willem van Oranje, Henry VIII, Elizabeth I, and the Roman numerals should be sorted like their Arabic counterparts (8, 1), as Keith already said.
I'm supposing Carl XIV Gustaf was originally named Carl Gustaf and got the numeral when he became king? He should be sorted on Carl. Same with Charles. (My book actually gives the example of Willem IV Alexander van Nassau: should be sorted on Willem.)

That's probably worth noting in the manual, since it's not obvious to those of us who didn't study LIS :)

I'm not sure if GR is going to follow that rule though. They already decided to sort on pope (title) instead of first name as it should be, analogous to the king/queen rule.

I think some discussion about some of this is ongoing, so maybe they'll give one last review to their decisions about sorting.
https://www.goodreads.com/topic/show/...
Thanks for all the feedback, suggestions, documentation of various methodologies, and other contributions to this thread.
After much debate, we have revised the new Manual section. It can be seen here: https://www.goodreads.com/help/show/4...

Personally, I don't mind that the suffixes are
Thank you very much for all your hard work!

The sorting part is correct. It's the display part that is wrong, but I'll learn to live with it.

Yes, I keep confusing the two. Shown, not sorted. (Also, I still think that in the sort field the Roman numbers should be replaced by Arabic numbers. Guess I'll have to live with that :-P )
lethe wrote: "Thank you very much for all your hard work!"
I am passing that along. Glad you like at least some of the changes.

It's more than "at least some", really!

Really ... from the developers to the manual writers and all in between, this has been a very successful effort for everyone.

Under 'Special characters', we are told that special characters should be excluded from the sort by field and entered as follows:
capek, karel (display field: Čapek, Karel).
But under 'Diacritics', it says "The first column displays correct entry in the shelf display field, and the second column displays correct entry in the sort by field."
Č - č
Shouldn't that be c in the sort by field? And doesn't that also count for the ç and the ž?


As all of us here know, the disambiguation hack we've been using for authors with the same name has been to include extra spaces before the surname. In the sort field, we have no standard defined for discarding those spaces, or where to put them if they are to be retained (since they cannot be before the surname, and will be truncated if left trailing in their natural place).
Do we care about sorting within same-named authors, so that John Stanley's horror movie books are not scattered among John Stanley's Little Lulu comics? If we do care (and I'm not suggesting that we have to care), I think it would be worthwhile to decide upon and document a consistent means for doing so. That could look like including the number of spaces—only in the sort field—as a number like "Stanley, John 2", or our other hack of john^^stanley, like "stanley, john^^" in the sort field, or...?
FWIW, if we're going to do it at all, I prefer the numbers rather than the carets, but somebody may have a better idea. Clearly my ideas are not always useful (and thanks to Krazykiwi for reminding me why ASCII sort is a dumb idea even if we don't use mixed case in the sort field).
And if we're not going to do anything about these cases, we might want one line in the manual to say "ignore intervening disambiguation spaces" or something like that, just to make it clear that the issue was considered and settled.
Thanks again!
Keith wrote: "if we're not going to do anything about these cases, we might want one line in the manual to say "ignore intervening disambiguation spaces" or something like that, just to make it clear that the issue was considered and settled."
We have decided it is best not to add that additional layer of complexity. Your suggestion to add a line to the Manual entry is a good one though, and we have done so.

Acknowledged; thank you.


Do we sort the letters not in the list analogous to the ones mentioned? F.e., should
Ondřej Mrázek
be sorted
mrazek, ondrej
i.e.,
Ř ř - r
Š š - s
ů - u
etc.?
lethe wrote: "Do we sort the letters not in the list analogous to the ones mentioned? F.e., should
Ondřej Mrázek
be sorted
mrazek, ondrej
i.e.,
Ř ř - r
Š š - s
ů - u
etc.?"
Yes. The list is meant to be examples, not exhaustive.
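For letters that are a base letter plus an accent, the mapping in this exchange can be sketched with Python's standard `unicodedata` module. The function names are hypothetical, and this is one possible way to derive a sort by field, not a description of how Goodreads does it.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining accents: decompose, remove the marks, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return unicodedata.normalize("NFC", without_marks)


def sort_by_field(display_name: str) -> str:
    """Lowercase, accent-free form of a 'Surname, Forename' display name."""
    return strip_diacritics(display_name).lower()


print(sort_by_field("Čapek, Karel"))
print(sort_by_field("Mrázek, Ondřej"))
print(sort_by_field("Morriën, Adriaan"))
```

Letters that are not a base letter plus a combining mark (ø, ł, æ, þ and the like) pass through unchanged, so they would still need a hand-written table.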
Accented letters are only used to stress syllables and difference in pronunciation* (and in French, ^ denotes that the vowel used to be followed by an s).
This category also contains the Dutch ë, which just denotes the start of a new syllable:
Adriaan Morriën (mor-ri-en) versus Morrien (mor-rien)
(If anyone could tell me where the ë in Brontë comes from, I'd be much obliged! I've always been bemused by that one.)
(*I'm conveniently ignoring Icelandic sort order here, it's their own fault for being so difficult :P )