Goodreads Developers discussion

Broken XML in some responses


message 1: by Alexander

Alexander Blom | 4 comments The XML for some books is broken (maybe I'm doing something wrong?).

For example, http://www.goodreads.com/book/isbn?is... gives this error when parsed on Android:

"org.apache.harmony.xml.ExpatParser$ParseException: At line 270, column 210: not well-formed (invalid token)"

Opening the feed in Chrome produces this:

"error on line 275 at column 20: Encoding error"

This corresponds to this line:

<body><![CDATA[Medical examiner Maura Isles and Boston homicide detective Jane Rizzoli are back.
><br/>

The extra > before the <br/> seems to be what's causing it. Any chance you can fix this?

Here are some more books that contain the extra >:

http://www.goodreads.com/book/isbn?is...
http://www.goodreads.com/book/isbn?is...


message 2: by Casper

Casper Gasper (caspergasper) | 32 comments I don't see an issue with the first two (maybe they've been fixed already), but the third has an invalid UTF-8 character in one of the reviews:

xmllint --noout "http://www.goodreads.com/book/isbn?is..."
http://www.goodreads.com/book/isbn?is... parser error : CData section not finished
Wein
^
http://www.goodreads.com/book/isbn?is... parser error : PCDATA invalid Char value 25

I've had to work around this problem for descriptions too. It would be nice if the XML feeds contained only valid UTF-8 characters.

Casper.
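
The workaround Casper mentions is not shown in the thread. As a minimal sketch of one possible approach, assuming the response has already been decoded into a String, the filter below drops every code point outside the XML 1.0 Char range before the text reaches a parser. The value 25 (0x19) reported by xmllint above is one such code point. The class and method names are hypothetical, not part of any Goodreads or Android API.

```java
/** Hypothetical helper: removes characters that XML 1.0 does not allow. */
final class XmlCharFilter {

    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of text with every disallowed code point dropped. */
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read whole code points so surrogate pairs are kept together.
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Silently dropping characters loses data, so this only suits display text such as descriptions and reviews. It also does nothing for markup that is structurally broken, such as an unterminated CDATA section.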


message 3: by Alexander (last edited Jul 19, 2010 02:08AM)

Alexander Blom | 4 comments Casper wrote: "I don't see an issue with the first 2 (maybe they've been fixed already), but the third is an invalid UTF-8 character in one of the reviews --

xmllint --noout "http://www.goodreads.com/book/isbn?i..."


What are you using to parse the response on Android?


message 4: by Casper

Casper Gasper (caspergasper) | 32 comments I'm just using the standard SAX parser, with the XML forced into UTF-8 format:

InputStreamReader reader = new InputStreamReader(is, "UTF-8");
InputSource source = new InputSource(reader);
parser.parse(source, handler);

Casper.
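
For readers who want the snippet above in runnable form, here is a minimal self-contained sketch. It assumes the standard javax.xml.parsers SAX API. The class name, the parseAsUtf8 method, and the TextCollector handler are hypothetical stand-ins for whatever handler an app really uses. Wrapping the stream in a Reader means the parser receives characters that have already been decoded as UTF-8 and does no encoding detection of its own. InputStreamReader also replaces malformed byte sequences with U+FFFD instead of failing, which may be why this approach tolerates the bad bytes.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class Utf8SaxExample {

    private Utf8SaxExample() {}

    /** Hypothetical handler: collects all character data in the document. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the stream as UTF-8 and returns the collected text. */
    static String parseAsUtf8(InputStream is) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        // Decode bytes to characters here, before the parser sees them.
        Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        parser.parse(new InputSource(reader), handler);
        return handler.text.toString();
    }
}
```

StandardCharsets needs Java 7 or, on Android, API level 19. On older platforms the string form "UTF-8" used in the post above does the same job.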


message 5: by Alexander

Alexander Blom | 4 comments Casper wrote: "I'm just using the standard SAX parser, with the XML forced into UTF-8 format:

InputStreamReader reader = new InputStreamReader(is, "UTF-8");
InputSource source = new InputSource(reader);
parser.p..."


Aha, I never forced UTF-8. Will try that when I get home.


message 6: by Alexander

Alexander Blom | 4 comments That totally worked. Thanks!

