whitespace used in matching for edition combinations
Comments
I believe that on emusic they solve the problem of duplicate names by giving them a number: musician (1), musician (2).
Then you would just have to keep the books assigned to the right author.
At least I think that's how they do it. Or I'm making that up.
bb
Leading whitespace generally causes an error when editing an author's name (either on an individual book or on an author's page); I don't know what happens if it's being input by an API.
There's also an issue where GoodReads is purposefully using internal whitespace as a quick-and-dirty way of separating out authors with the same name. Thus John^Smith and John^^Smith (where ^ marks a space) are two separate people. I'm also pretty sure that trailing whitespace is already automatically trimmed from names, so John^Smith^ ends up being John^Smith. I've never seen leading whitespace, so I don't know if the situation you describe above could be a general issue or was a simple fluke.
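To make that distinction concrete, here is a minimal Python sketch of edge-only trimming. The function names `trim_edges` and `same_author` are made up for illustration, and nothing here is GoodReads' actual code; it only shows that trimming the ends of a name can leave internal runs of spaces, and therefore the same-name separation, untouched.

```python
def trim_edges(name: str) -> str:
    # Hypothetical helper: remove leading and trailing whitespace only.
    # Internal runs of spaces are deliberately left alone.
    return name.strip()


def same_author(a: str, b: str) -> bool:
    # Exact comparison after edge trimming, so names that differ only in
    # internal spacing are still treated as different people.
    return trim_edges(a) == trim_edges(b)


print(same_author("John Smith ", "John Smith"))   # True: trailing space trimmed
print(same_author(" John Smith", "John Smith"))   # True: leading space trimmed
print(same_author("John Smith", "John  Smith"))   # False: internal run kept
```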
"in that unfortunate case, it'd probably be best to force checking a "preserve whitespace" box."
Argh, difficult to apply this to automated feeds, though... :/
Howdy!
Today I was attempting to combine two editions of Arthur Miller's A View From the Bridge. I had difficulty doing so, until I realized one edition had as author:
"Arthur  Miller"
       ||
       |/
-------/
note the two spaces. I reduced this to "Arthur Miller", and successfully merged editions. God bless monospace fonts.
I advise that whitespace be stripped from input prior to insertion into the books database. Doing whitespace-insensitive comparisons would be silly and prone to error. If there are domain-specific reasons why we're not tokenizing on all whitespace prior to insertion (bell hooks requires all lowercase; is there any author so obnoxious as to demand multiple nullglyphs?), the whitespace-insensitive comparison would be meaningless anyway. In that unfortunate case, it'd probably be best to force checking a "preserve whitespace" box.
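A minimal Python sketch of what normalizing on insertion might look like. The name `normalize_author_name` and the `preserve_whitespace` flag are hypothetical, standing in for the checkbox suggested above; this is not how the site actually stores names.

```python
def normalize_author_name(name: str, preserve_whitespace: bool = False) -> str:
    # Hypothetical helper, meant to run once before the name is stored.
    if preserve_whitespace:
        # Opt-out for the rare author who really wants unusual spacing.
        return name
    # str.split() with no arguments splits on any run of whitespace and drops
    # leading/trailing runs; joining with one space collapses everything else.
    return " ".join(name.split())


print(normalize_author_name("Arthur  Miller"))   # Arthur Miller
print(normalize_author_name("  bell hooks "))    # bell hooks
```

Only spacing is touched, so the lowercase in bell hooks survives. Collapsing internal runs like this would also merge any names that are kept apart solely by extra internal spaces, so it is only safe if same-name authors are distinguished some other way.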
Apologies if this has been discussed to death before.