Software Engineering discussion


Making Sense of NoSQL > Ch 5: Native XML Databases

message 1: by [deleted user]

Feb 16, 2014 04:44PM

I don't have much background in XML databases, so this was an eye-opening chapter for me. I was impressed by the level of standardization in this area (XQuery and XPath). Dan told me that healthcare.gov used MarkLogic on the back end, so that is the mother of all case studies to dig into sometime. I wonder which other systems we know of use XML stores.

I would like to know more about how vendors get XML databases to scale and perform well, especially the efficiency of incremental updates with so many indexes on tags at play.


message 2: by Dan

Feb 17, 2014 09:10AM

I should add that although the MarkLogic database performed very well, with almost 99.999% uptime, the staff who built the "middle layer" were Java developers with no prior NoSQL experience. That layer was the source of many of the delays.

The chapter on "Big Data" has additional discussion of how XML databases can be designed to scale in ways that RDBMS systems cannot. The trick is to split query processing into two components, much as MapReduce splits a job into map and reduce steps. In MarkLogic's case, "query servers" distribute each query to "data servers". Each data server runs the query against its local data, and the results are then returned to the query server, which "reduces" them into a single result.
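
A minimal Python sketch of that scatter-gather pattern, just to make the two roles concrete. The class names DataServer and QueryServer are hypothetical (this is not MarkLogic's API), each "data server" is simply an in-memory list of XML documents, and the limited XPath subset in Python's ElementTree stands in for a real XQuery engine.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


class DataServer:
    """Hypothetical data server holding one local shard of XML documents."""

    def __init__(self, docs: list[str]):
        self.docs = [ET.fromstring(d) for d in docs]

    def run_query(self, xpath: str) -> list[str]:
        # "Map" step: evaluate the query against local documents only.
        results = []
        for root in self.docs:
            results.extend(el.text for el in root.findall(xpath) if el.text)
        return results


class QueryServer:
    """Hypothetical query server that fans a query out and merges the answers."""

    def __init__(self, data_servers: list[DataServer]):
        self.data_servers = data_servers

    def query(self, xpath: str) -> list[str]:
        # Send the same query to every data server in parallel.
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(lambda ds: ds.run_query(xpath), self.data_servers))
        # "Reduce" step: merge the partial results into one sorted result.
        return sorted(r for part in partials for r in part)
```

For example, with two shards `DataServer(["<patient><name>Ada</name></patient>", "<patient><name>Grace</name></patient>"])` and `DataServer(["<patient><name>Alan</name></patient>"])`, calling `QueryServer([...]).query("./name")` returns `['Ada', 'Alan', 'Grace']`. Only the query text travels to the shards and only matching values come back, which is what lets this design scale out.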

One of the big challenges for all distributed systems is distributing statistics across the cluster so that any node can optimize its queries based on frequency counts. It is a hard problem, and many NoSQL vendors don't support it yet. Some of this work is going on in the Apache Drill project:
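
To illustrate why distributed statistics matter, here is a rough Python sketch under simplifying assumptions: each node counts how many of its local documents contain each term, a coordinator sums those counts into global frequencies, and the optimizer evaluates the rarest (most selective) term first. The function names are hypothetical; real systems keep far richer statistics than plain document counts.

```python
from collections import Counter


def local_stats(docs: list[list[str]]) -> Counter:
    """Count, for each term, how many local documents contain it."""
    counts: Counter = Counter()
    for terms in docs:
        counts.update(set(terms))  # count each term once per document
    return counts


def merge_stats(per_node: list[Counter]) -> Counter:
    """Coordinator step: sum the per-node counts into cluster-wide counts."""
    total: Counter = Counter()
    for c in per_node:
        total.update(c)  # Counter.update adds counts
    return total


def order_predicates(terms: list[str], counts: Counter) -> list[str]:
    """Evaluate the rarest (most selective) term first."""
    return sorted(terms, key=lambda t: counts.get(t, 0))


node1 = local_stats([["xml", "database"], ["xml", "xquery"]])
node2 = local_stats([["database", "sql"], ["database", "xml"], ["database"]])
global_counts = merge_stats([node1, node2])

# Node 1's local view says "database" is rarer; the global view disagrees.
print(order_predicates(["database", "xml"], node1))          # ['database', 'xml']
print(order_predicates(["database", "xml"], global_counts))  # ['xml', 'database']
```

The point of the example is that a node planning only from its local counts can choose a different, and worse, evaluation order than one working from the merged cluster-wide counts.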

https://en.wikipedia.org/wiki/Apache_...
