topic: bugs > whitespace used in matching for edition combinations






message 6: by Otis, Chief Architect

We do strip whitespace, so not sure how that got in there. Let us know if you find any more.


message 5: by Billy

I believe that on emusic they solve the problem of duplicate names by giving them a number:

musician (1)
musician (2)

Then you would just have to keep the books assigned to the right author.

At least I think that's how they do it. Or I'm making that up.

bb


message 4: by rivka (last edited Nov 20, 2008 11:04AM)

Leading whitespace generally causes an error when editing an author's name (either on an individual book or on an author's page); I don't know what happens if it's being input by an API.


834216 There's also an issue where GoodReads is purposefully using internal whitespace as a quick-and-dirty way of separating out authors with the same name. Thus John^Smith and John^^Smith are two separate people.

I'm also pretty sure that trailing white space is already automatically trimmed from names, so John^Smith^ ends up being John^Smith. I've never seen leading white space, so I don't know if the situation you describe above could be a general issue or was a simple fluke.
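
To make the point above concrete, here is a minimal Python sketch. It is only an illustration: the function name `trim_ends` is hypothetical and nothing here is known to match Goodreads' actual code. It assumes that only leading and trailing whitespace is trimmed, which keeps the two same-named authors distinct, and shows that collapsing internal runs would merge them.

```python
def trim_ends(name: str) -> str:
    """Remove leading and trailing whitespace; leave internal runs untouched."""
    return name.strip()


one_space = "John Smith"    # John^Smith in the notation above
two_spaces = "John  Smith"  # John^^Smith, treated as a different person

# Trailing whitespace is trimmed, so John^Smith^ ends up as John^Smith.
assert trim_ends("John Smith ") == one_space

# Trimming the ends alone keeps the two authors apart.
assert trim_ends(two_spaces) != trim_ends(one_space)

# Collapsing internal runs as well would merge them into one author.
assert " ".join(two_spaces.split()) == one_space
```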


message 2: by Nick

"in that unfortunate case, it'd probably be best to force checking a "preserve whitespace" box."

argh, difficult to apply this to automated feeds, though.... :/


message 1: by Nick

Howdy!

I was today attempting to combine two editions of Arthur Miller's A View From the Bridge. I had difficulty doing so, until I realized one edition had as author:

"Arthur  Miller"
       ||
       |/
-------/

note the two spaces. I reduced this to "Arthur Miller", and successfully merged editions. God bless monospace fonts.

I advise that whitespace be stripped from input prior to insertion into the books database. Doing whitespace-insensitive comparisons would be silly and prone to error; if there are domain-specific reasons why we're not tokenizing on all whitespace (bell hooks requires all lowercase; is there any author so obnoxious as to demand multiple nullglyphs?) prior to insertion, the whitespace-insensitive comparison would be meaningless anyway -- in that unfortunate case, it'd probably be best to force checking a "preserve whitespace" box.

Apologies if this has been discussed to death before.
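
A minimal sketch of the suggestion in message 1, in Python. It assumes normalization happens once, before insertion, so that later comparisons can stay exact. The function name `normalize_author` and the `preserve_whitespace` flag (standing in for the proposed checkbox) are hypothetical and not part of any real Goodreads API.

```python
import re

# One or more whitespace characters of any kind (spaces, tabs, newlines).
_WS_RUN = re.compile(r"\s+")


def normalize_author(raw: str, preserve_whitespace: bool = False) -> str:
    """Clean an author name before it is stored.

    By default, leading and trailing whitespace is stripped and every
    internal run of whitespace is collapsed to a single space. With
    preserve_whitespace=True the name is stored exactly as given.
    """
    if preserve_whitespace:
        return raw
    return _WS_RUN.sub(" ", raw.strip())


# The two editions from message 1 now carry the same author string,
# so an exact comparison is enough to combine them.
assert normalize_author("Arthur  Miller") == normalize_author("Arthur Miller")

# Opting out keeps the double space intact.
assert normalize_author("Arthur  Miller", preserve_whitespace=True) == "Arthur  Miller"
```

As message 2 points out, an automated feed has nobody to tick the box, so in this sketch a feed would have to pass the flag explicitly for each record.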

