Goodreads Feedback discussion

Bugs > German umlauts after data import


message 1: by Martini (last edited Jan 30, 2012 04:11AM) (new)

Martini (shakenorstirred) | 144 comments I have posted this in the librarians group before, but it rather seems to be a topic for this group:

It seems that during the import of new data a lot of the umlauts ä, ö and ü have been lost and replaced by, e.g., "a" with a separate "_̈". These are therefore displayed separately in book titles, which makes them look quite weird. Example here
I have noticed this especially with worldcat data.
Is this a worldcat problem or does Goodreads have difficulties converting these umlauts when importing the data? If this is a goodreads problem, maybe they could do something about it that would save us all the additional work...
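
The symptom described here (a base letter followed by a separate diaeresis) is consistent with text stored in decomposed Unicode form, where "ä" is the letter "a" plus U+0308 COMBINING DIAERESIS. A minimal sketch of how such text can be recomposed, assuming the imported titles really are decomposed; the helper name `recompose` is hypothetical, and nothing here reflects how Goodreads or WorldCat actually process data:

```python
import unicodedata

def recompose(text: str) -> str:
    """Return text in NFC form, merging base letters with their combining marks."""
    return unicodedata.normalize("NFC", text)

# Illustrative input: "a" followed by U+0308 COMBINING DIAERESIS.
decomposed = "Ma\u0308rchen"
fixed = recompose(decomposed)

# The decomposed string is one code point longer than the recomposed one.
print(len(decomposed), len(fixed))
print(fixed == "M\u00e4rchen")
```

Whether a given title displays badly also depends on the font and browser, so recomposing is only one possible remedy.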


message 2: by Renske (new)

Renske | 125 comments Martini wrote: "Is this a worldcat problem or does Goodreads have difficulties converting these umlauts when importing the data?"
It is a worldcat problem; when you copy a name/word manually, it gives the same result.


message 3: by [deleted user] (new)

Others have mentioned that the capitalization gets screwed up, too.


message 4: by Martini (new)

Martini (shakenorstirred) | 144 comments @Renske: Thanks for the info. Well, I fear then we'll have to correct them one by one :-(

@Jeannette: I've noticed that, too. Every Single Word Has Been Capitalized (at least in the titles). Even more work...


message 5: by rivka, librarian moderator (new)

rivka | 11965 comments Mod
Martini wrote: "Every Single Word Has Been Capitalized (at least in the titles)."

Is that not standard for book titles in German? It is in English.


message 6: by Martini (last edited Jan 30, 2012 08:49AM) (new)

Martini (shakenorstirred) | 144 comments rivka wrote: "Is that not standard for book titles in German? It is in English."

No, unfortunately it's not. Usually only nouns and proper names are capitalized.
For example: this book should be displayed as "Die seltsamen Methoden des Dr. Irabu", as can be seen on the site of the German National Library, here.


message 7: by rivka, librarian moderator (new)

rivka | 11965 comments Mod
It may be necessary just to edit the titles. The reverse happens with English titles too, depending on their source. (WorldCat is quite inconsistent about capitalizing titles, for instance.)


message 8: by Martini (new)

Martini (shakenorstirred) | 144 comments Good to know that we are not alone with this problem! ;-)


message 9: by Lobstergirl (new)

Lobstergirl As I mentioned elsewhere, it's all sorts of diacritical marks that, even when they're over the correct letter, are just a little "off," so that a Worldcat author won't match up with the GR author record. Unfortunately librarians have a lot more work ahead of them.

Also Worldcat titles in English don't capitalize....rather annoying.

Not to even mention that Worldcat doesn't standardize author names. Various searches I've been doing on ISBNs might bring up 4 editions of a book, each one with the author slightly different, e.g. Robert Haft, R. Haft, R Haft, R Wilson Haft, etc.


message 10: by Martini (new)

Martini (shakenorstirred) | 144 comments Besides the already mentioned issues with umlauts and capitalization, WorldCat somehow deletes the space between title and subtitle, which makes them look like this (should be "Schloss Gripsholm. Eine Sommergeschichte" instead) and is very irritating.

Is there any possibility that the German data for books is taken from the German National Library instead? These three kinds of mistakes are getting really annoying.


message 11: by [deleted user] (new)

That's a pain! (and a bug!) Because the entry in worldcat sure doesn't look that way when you search for the book.


message 12: by Cecile (new)

Cecile | 28 comments I've noticed the same problem with accented letters in French titles and authors, and also with the capitalization of titles (usually titles aren't capitalized in French).

The accents issue is the worst; it looks really bad (even if it's better than no title at all).


message 13: by BoekenTrol (new)

BoekenTrol | 64 comments Cecile wrote: "I've noticed the same problem with accented letters in French titles and authors, and also with the capitalization of titles (usually titles aren't capitalized in French).

The accents issue is the..."


I noticed the same. In Dutch we don't have that many accents, but titles aren't capitalized (except when there's a name, city or country in them). I have decapitalized a few titles now, but I sure hope that the data will stay corrected.
Just in case librarians look at that book's data, I added a note saying how Dutch titles are written.


message 14: by vicki_girl (new)

vicki_girl | 196 comments A related issue for Worldcat data is the importing of Japanese titles:

http://www.goodreads.com/book/show/17...

The thing that makes it really bizarre (to me) is that the correct Japanese is listed in the Worldcat entry, followed by the romanized title:

http://www.worldcat.org/title/bishojo...

This has me wondering why it's importing the romanized title when the Japanese one is available and listed first.


message 15: by [deleted user] (new)

vicki_girl wrote: "A related issue for Worldcat data, is the importing of Japanese titles:

http://www.goodreads.com/book/show/17...

The thing that makes it really bizarre (to..."


I've just noticed the same thing and it's puzzling me, too.


図書館屋 Sharon the Librarian (toshokanya) | 1 comments I can answer about Worldcat data, since I have spent the last 20+ years putting the Japanese records into Worldcat. The romanized fields are the prime fields and the Japanese ones are linked to them. The display is done to look right, but if you look at the tagging, the romanized ones are actually in the 100/245 fields and the Japanese ones are in 880 fields.
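
To illustrate the tagging described above: in MARC 21, a romanized 245 field and its 880 counterpart point at each other through subfield $6 (for example "880-01" in the 245 and "245-01" in the 880). A rough sketch of how an importer could prefer the original-script title follows. The record layout (a list of tag/subfield pairs), the function name `preferred_title` and the sample record are all made up for illustration, and this is not a description of what Goodreads does:

```python
from typing import Optional

# Hypothetical, simplified record: a list of (tag, subfields) pairs.
Record = list[tuple[str, dict[str, str]]]

def preferred_title(record: Record) -> Optional[str]:
    """Return the original-script title from a linked 880 field, else the 245 title."""
    romanized = None
    link_no = None
    for tag, sub in record:
        if tag == "245":
            romanized = sub.get("a")
            link = sub.get("6", "")  # e.g. "880-01"
            if link.startswith("880-"):
                link_no = link[4:6]
    if link_no and link_no != "00":
        for tag, sub in record:
            # The linked 880 field points back with e.g. "245-01/$1".
            if tag == "880" and sub.get("6", "").startswith(f"245-{link_no}"):
                return sub.get("a", romanized)
    return romanized

# Made-up sample record, for illustration only.
sample: Record = [
    ("245", {"6": "880-01", "a": "Wagahai wa neko de aru"}),
    ("880", {"6": "245-01/$1", "a": "吾輩は猫である"}),
]
print(preferred_title(sample))
```

An importer that reads only the 245 field would pick up the romanized title, which matches what was observed earlier in this thread.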

The diacritics are also programmed to display but are really disambiguated. This is why we can search without using diacritics in OCLC and our local catalogs. But if we put the diacritics in using our normal keyboards into GoodReads, we also need to search that way.
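
One common way to make a search ignore diacritics is to compare keys from which the combining marks have been stripped. The sketch below shows that general technique only; the function name `search_key` is hypothetical, and it is an assumption that a catalog would do something like this, not a description of how OCLC or Goodreads implement search:

```python
import unicodedata

def search_key(text: str) -> str:
    """Build a diacritic-insensitive, case-insensitive key for matching."""
    # Decompose so that each accent becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining characters, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Precomposed and decomposed spellings produce the same key.
print(search_key("M\u00e4rchen") == search_key("Ma\u0308rchen"))
print(search_key("M\u00e4rchen") == search_key("Marchen"))
```

With keys like these, a title typed with a normal keyboard and a title imported in decomposed form would match each other, at the cost of also matching the unaccented spelling.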

I'll stop here, but I do have questions for those of you working in foreign languages so that I can get my personal library into GR.

