Software Engineering discussion
Making Sense of NoSQL
Ch 5: Native XML Databases


The chapter on "Big Data" has additional discussion of how XML databases can be designed to scale in ways that relational databases cannot. The trick is to separate the query logic into two components, much like MapReduce separates a job into map and reduce steps. In the case of MarkLogic they have "query servers" that distribute documents to "data servers". Each data server is responsible for running the query on its local data. The results are then returned to the query server, which "reduces" them into a single result.
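To make that split concrete, here is a minimal sketch of the scatter/gather pattern in Python. It is not MarkLogic's API: the `DataServer` and `QueryServer` classes, the in-memory document lists, and the simple word-match "query" are all assumptions made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class DataServer:
    """Hypothetical data server that holds only its own slice of the documents."""

    def __init__(self, documents):
        # documents: list of (doc_id, text) pairs stored locally on this server
        self.documents = documents

    def run_query(self, term):
        # "Map" step: evaluate the query against local data only.
        return [doc_id for doc_id, text in self.documents if term in text.split()]


class QueryServer:
    """Hypothetical query server that fans a query out and merges the answers."""

    def __init__(self, data_servers):
        self.data_servers = data_servers

    def query(self, term):
        # Scatter: every data server runs the same query in parallel.
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(lambda s: s.run_query(term), self.data_servers))
        # "Reduce" step: combine the partial results into a single result.
        return sorted(doc_id for part in partials for doc_id in part)


servers = [
    DataServer([("a.xml", "native xml database"), ("b.xml", "relational database")]),
    DataServer([("c.xml", "xml query language")]),
]
print(QueryServer(servers).query("xml"))  # ['a.xml', 'c.xml']
```

The point of the design is that no data server ever needs to see another server's documents; only the small partial results travel over the network.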
One of the big challenges for all distributed systems is sharing statistics across the cluster so that any node can optimize a query based on frequency counts. This is a hard problem that many NoSQL vendors have not solved yet. It is part of the work going on in the Apache Drill project:
https://en.wikipedia.org/wiki/Apache_...
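As a rough illustration of why cluster-wide frequency counts matter, here is a small Python sketch. It is not how Apache Drill or any vendor does it: the function names `merge_statistics` and `order_terms`, the per-node counts, and the "rarest term first" rule are all assumptions chosen to show the idea.

```python
from collections import Counter


def merge_statistics(per_node_counts):
    """Combine each node's local term counts into cluster-wide counts."""
    total = Counter()
    for counts in per_node_counts:
        total.update(counts)  # adds counts term by term
    return total


def order_terms(terms, global_counts):
    """Evaluate the rarest term first so later steps filter a smaller set."""
    # A term that no node has seen counts as 0; ties are broken alphabetically.
    return sorted(terms, key=lambda t: (global_counts[t], t))


# Hypothetical element-name counts reported by two nodes.
node_a = Counter({"book": 900, "isbn": 40})
node_b = Counter({"book": 1100, "isbn": 5, "errata": 2})

stats = merge_statistics([node_a, node_b])
print(order_terms(["book", "isbn", "errata"], stats))  # ['errata', 'isbn', 'book']
```

The hard part in a real cluster is keeping those merged counts fresh while documents are being inserted and updated on every node, which this sketch ignores.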
I would like to know more about how vendors get XML databases to scale and perform well, especially how efficient incremental updates can be with so many tag indexes in play.