Software Engineering discussion
Making Sense of NoSQL
Ch 5: Native XML Databases
I should add that although the MarkLogic database has performed very well, with almost 99.999% uptime, the staff that built the "middle layer" were Java developers with no prior experience with NoSQL. This layer was the source of many of the delays.

The chapter on "Big Data" has additional discussion of how XML databases can be designed to scale in ways that RDBMS systems cannot. The trick is to split query processing into two components, much like MapReduce splits a job into map and reduce steps. In the case of MarkLogic, "query servers" distribute each query to "data servers". Each data server is responsible for running the query on its local data. The results are then returned to the query server, which "reduces" them into a single result.
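To make the query-server/data-server split concrete, here is a minimal scatter-gather sketch in Python. It is not MarkLogic's implementation or API: the `DataServer` and `QueryServer` classes, the in-memory shards, and the simple "element equals value" query are all hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


class DataServer:
    """Hypothetical data server holding a local shard of XML documents."""

    def __init__(self, docs):
        # docs: mapping of document URI -> XML text (assumed for illustration)
        self.docs = {uri: ET.fromstring(xml) for uri, xml in docs.items()}

    def run_query(self, tag, value):
        # "Map" step: evaluate the query against local documents only.
        return [uri for uri, root in self.docs.items()
                if any(el.text == value for el in root.iter(tag))]


class QueryServer:
    """Hypothetical query server that fans a query out and merges the answers."""

    def __init__(self, data_servers):
        self.data_servers = data_servers

    def query(self, tag, value):
        # Scatter: send the same query to every data server in parallel.
        workers = max(1, len(self.data_servers))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(lambda ds: ds.run_query(tag, value),
                                     self.data_servers))
        # Gather / "reduce": merge the partial results into a single result.
        return sorted(uri for part in partials for uri in part)


servers = [
    DataServer({"/a.xml": "<book><author>Smith</author></book>",
                "/b.xml": "<book><author>Jones</author></book>"}),
    DataServer({"/c.xml": "<book><author>Smith</author></book>"}),
]
qs = QueryServer(servers)
print(qs.query("author", "Smith"))  # ['/a.xml', '/c.xml']
```

The point of the design is that no data server ever needs to see another server's documents; only the small per-server result lists travel back to the query server.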
One of the big challenges for all distributed systems is distributing statistics over a cluster so that any node can optimize a query based on frequency counts. This is a hard problem, and many NoSQL vendors don't support it yet. Some of this work is going on in the Apache Drill project:
https://en.wikipedia.org/wiki/Apache_...
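As a rough illustration of why cluster-wide statistics matter, the Python sketch below has each node count how many of its local documents contain each (element, value) pair, merges those counts into global statistics, and then orders query predicates so the rarest one is evaluated first. The function names and the choice of statistic are hypothetical; this is not how Apache Drill or any particular vendor does it, and a real system would ship the counters over the network and have to deal with them going stale as documents change.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def local_stats(docs):
    """Per-node step: count how many local documents contain each (tag, text) pair."""
    stats = Counter()
    for root in docs:
        # Use a set so each pair counts at most once per document.
        seen = {(el.tag, el.text) for el in root.iter()
                if el.text and el.text.strip()}
        stats.update(seen)
    return stats


def merge_stats(per_node):
    """Coordinator step: sum the per-node counters into global frequency counts."""
    total = Counter()
    for stats in per_node:
        total.update(stats)
    return total


def order_predicates(predicates, global_stats):
    """Evaluate the most selective (least frequent) predicate first."""
    return sorted(predicates, key=lambda p: global_stats[p])


node1 = [ET.fromstring("<book><author>Smith</author><year>2013</year></book>"),
         ET.fromstring("<book><author>Smith</author><year>2014</year></book>")]
node2 = [ET.fromstring("<book><author>Jones</author><year>2013</year></book>")]

global_stats = merge_stats([local_stats(node1), local_stats(node2)])
plan = order_predicates([("year", "2013"), ("author", "Jones")], global_stats)
print(plan)  # [('author', 'Jones'), ('year', '2013')]
```

Without the merge step, node1 alone would see no documents by "Jones" and node2 would see only one "2013" book, so each node would estimate selectivity from an incomplete picture.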


I would like to know more about how vendors get XML databases to scale and perform well, especially how efficient incremental updates are when so many tag indexes are in play.