Goodreads Developers discussion

Broken XML in some responses

message 1: by Alexander

Jul 18, 2010 11:41AM

The XML for some books is broken (maybe I'm doing something wrong?).

For example, http://www.goodreads.com/book/isbn?is... gives this error:

"org.apache.harmony.xml.ExpatParser$ParseException: At line 270, column 210: not well-formed (invalid token)"

Opening the feed in Chrome makes it spit out this:

"error on line 275 at column 20: Encoding error"

This corresponds to this line:

<body><![CDATA[Medical examiner Maura Isles and Boston homicide detective Jane Rizzoli are back.
><br/>

The extra > before the <br/> seems to be what's causing it. Any chance you can fix this?

Here are some more books that contain the extra >:

http://www.goodreads.com/book/isbn?is...
http://www.goodreads.com/book/isbn?is...

message 2: by Casper

Jul 19, 2010 12:47AM

Casper Gasper (caspergasper) | 32 comments

I don't see an issue with the first 2 (maybe they've been fixed already), but the third has an invalid UTF-8 character in one of the reviews:

xmllint --noout "http://www.goodreads.com/book/isbn?is..."
http://www.goodreads.com/book/isbn?is... parser error : CData section not finished
Wein
^
http://www.goodreads.com/book/isbn?is... parser error : PCDATA invalid Char value 25

I've had to work around this problem too for descriptions. It would be nice if the XML feeds only ever contained valid UTF-8 characters.
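
For anyone hitting the same "invalid Char value" error: the thread does not say what the workaround above actually was, so the following is only one possible approach, sketched under assumptions. It drops every code point that XML 1.0 does not allow (value 25, if xmllint reports it in decimal, is the control character 0x19) from text that has already been read into a String. The class and method names (`XmlCharFilter`, `isLegalXmlChar`, `stripIllegalXmlChars`) are made up for this example.

```java
/**
 * Hypothetical helper (not from the thread): removes code points that the
 * XML 1.0 "Char" production does not allow, such as the control char 0x19.
 */
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with all illegal code points dropped. */
    static String stripIllegalXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // Walk by code point so surrogate pairs stay together;
            // an unpaired surrogate is treated as illegal and dropped.
            int cp = in.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

The cleaned string could then be handed to the parser through a `StringReader`. Note that this silently discards data, so it is best treated as a client-side stopgap until the feed itself is fixed.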

Casper.

message 3: by Alexander (last edited Jul 19, 2010 02:08AM)

Jul 19, 2010 02:08AM

Casper wrote: "I don't see an issue with the first 2 (maybe they've been fixed already), but the third is an invalid UTF-8 character in one of the reviews --

xmllint --noout "http://www.goodreads.com/book/isbn?i..."

What are you using to parse the response? On Android, that is.

message 4: by Casper

Jul 19, 2010 02:22AM

I'm just using the standard SAX parser, with the XML forced into UTF-8 format:

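// Decode the raw stream as UTF-8 ourselves and hand the parser a
// character stream, so it does not have to work out the encoding.
InputStreamReader reader = new InputStreamReader(is, "UTF-8");
InputSource source = new InputSource(reader);
parser.parse(source, handler);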
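
For readers who want to try this outside an existing app, here is a self-contained sketch of the same idea. It assumes the standard `javax.xml.parsers` SAX API; the class, handler, and method names (`Utf8SaxExample`, `TextCollector`, `parseAsUtf8`) are invented for illustration and are not from the snippet above. As far as I know, `InputStreamReader` replaces malformed byte sequences with U+FFFD instead of failing, which is probably why forcing UTF-8 gets past bad bytes in the feed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class Utf8SaxExample {
    private Utf8SaxExample() {}

    /** Hypothetical handler: collects all character data it is given. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the stream as UTF-8 and returns the collected text content. */
    static String parseAsUtf8(InputStream is)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        // With a character stream set, the parser reads the decoded chars
        // directly and ignores any encoding declaration in the document.
        InputStreamReader reader = new InputStreamReader(is, "UTF-8");
        InputSource source = new InputSource(reader);
        parser.parse(source, handler);
        return handler.text.toString();
    }
}
```

This only helps with bytes that are not valid UTF-8. Characters that decode fine but are illegal in XML (such as control characters) would still make the parser fail.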

Casper.

message 5: by Alexander

Jul 19, 2010 02:25AM

Casper wrote: "I'm just using the standard SAX parser, with the XML forced into UTF-8 format:

InputStreamReader reader = new InputStreamReader(is, "UTF-8");
InputSource source = new InputSource(reader);
parser.p..."

Aha, I never forced UTF-8. Will try that when I get home.

message 6: by Alexander

Jul 20, 2010 12:22AM

That totally worked. Thanks!
