Perfect. I only wish it saw more frequent updates and more extended commentary.
* JSON is good for sparse data but not for general hierarchical data, so RDBMSs will subsume it as a data type.
* SQL could have been cleaned up, but there was no time for it, so it is the COBOL of 2020. SQL won against natural language.
* ODBC isn't the best interface for embedding queries into programming languages: open db, open cursor, bind query, run fetches, etc. Look at LINQ instead.
* PostgreSQL's open source code educated many people who are influential in new systems.
* Query planning (best-effort): cost estimation (catalog statistics), equivalence rules, and cost-based search (dynamic programming). A toy join-ordering sketch follows this list.
* Concurrency control: serializable, but that is generally more than applications need, so finer-grained locks and weaker isolation are the default today. There is no best scheme; it totally depends on the workload.
* Any performance test without a crossover point is uninteresting (at worst it trades on an unlimited-resources assumption). Resolving conflicts via blocking might make more sense, since every system is limited by nature.
* Durability: ARIES (no force: dirty pages need not be written at commit time; steal: dirty pages can be flushed at any time).
* Distribution brings its own set of problems. 2PC (atomic commit) with presumed commit or presumed abort (sketched after this list). Consensus (Paxos, Raft, etc.) is generally used for replication, where a master executes transactions by itself and a new master is elected on failure.
* New architectures: column stores, main-memory systems (with new concurrency control and recovery), semi-structured data (JSON), and dataflow (Hadoop, Spark, Naiad, etc.).
* Dataflow: started with MapReduce (only two stages). Nowadays higher-level query languages (SQL), general graphs (not only two stages), and indexing (leveraging the structured parts) are supported (Spark, Flink, etc.). The influential points are schema, interface, and architecture flexibility.
* Non-serializable isolation is the default in practice, and the remedies for it are difficult to use. The clear research interest in weak isolation is to find simpler ways to maintain semantics while keeping programming as easy as under serializability.
* Rethinking the query optimizer due to streaming, errors in estimation, data outside the RDBMS, user-defined aggregates, etc. Decouple the optimizer from execution: the planner generates a dataflow that the executor runs later. There are two kinds of adaptivity: inter-operator (exploiting blocking operators such as hash join as re-planning points) and intra-operator (feedback from execution to self-adapt the plan).
* OLAP: sampling (online or materialized - BlinkDB; count-min, HyperLogLog, Bloom filters, etc.), precomputation (all cells or a critical subset, since the cube is a lattice and the rest can be derived), and online aggregation (feedback to the user, who can stop when satisfied). Count-min and online-aggregation sketches follow below.
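To make the cost-based search concrete, here is a minimal Python sketch of Selinger-style dynamic programming over join orders. The table names, cardinalities, selectivities, and the cost model are all invented for illustration; a real optimizer would read statistics from the catalog and also track physical properties such as sort order.

```python
from itertools import combinations

# Hypothetical catalog statistics: base-table cardinalities and
# pairwise join selectivities (missing pairs default to 1.0, i.e.
# a cross product).
card = {"orders": 1_000_000, "customers": 50_000, "items": 200_000}
selectivity = {frozenset(["orders", "customers"]): 0.0001,
               frozenset(["orders", "items"]): 0.00002}

def join_cardinality(left_tables, right_tables, left_card, right_card):
    """Estimate output size: product of inputs times every applicable selectivity."""
    est = left_card * right_card
    for l in left_tables:
        for r in right_tables:
            est *= selectivity.get(frozenset([l, r]), 1.0)
    return est

def best_plan(tables):
    """Selinger-style DP: find the cheapest plan for every subset, bottom-up."""
    # best[subset] = (estimated cost, estimated cardinality, plan tree)
    best = {frozenset([t]): (0.0, card[t], t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            for k in range(1, size):
                for left in map(frozenset, combinations(subset, k)):
                    right = subset - left
                    lcost, lcard, lplan = best[left]
                    rcost, rcard, rplan = best[right]
                    out = join_cardinality(left, right, lcard, rcard)
                    # Toy cost model: pay for reading both inputs plus the output.
                    cost = lcost + rcost + lcard + rcard + out
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, out, (lplan, rplan))
    return best[frozenset(tables)]

cost, rows, plan = best_plan(list(card))
print(f"plan={plan} est_cost={cost:.0f} est_rows={rows:.0f}")
```

The point of the DP is that the cheapest plan for a set of tables can be built from the cheapest plans for its sub-sets, so the search avoids re-enumerating full join trees.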
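And a toy presumed-abort 2PC round, assuming in-memory participants and using a print call as a stand-in for the coordinator's durable log; the class and function names here are hypothetical, not from any real system.

```python
import random

class Participant:
    """A toy resource manager that votes in phase 1 and obeys in phase 2."""
    def __init__(self, name):
        self.name, self.state = name, "active"

    def prepare(self):
        # A real participant force-writes a prepare record before voting yes.
        vote = random.random() > 0.1          # occasionally vote no
        self.state = "prepared" if vote else "aborted"
        return vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants, log=print):
    """Presumed-abort 2PC: the transaction commits only once the coordinator
    durably logs the decision; if that record is absent, abort is assumed,
    so no abort record ever needs to be written."""
    # Phase 1: voting. all() short-circuits on the first "no".
    if all(p.prepare() for p in participants):
        log("COMMIT")                         # the commit point
        for p in participants:                # phase 2: completion
            p.commit()
        return True
    for p in participants:
        p.abort()                             # presumed abort: no log write
    return False

workers = [Participant(f"node{i}") for i in range(3)]
print("outcome:", "commit" if two_phase_commit(workers) else "abort")
```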
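Of the OLAP summaries mentioned, a count-min sketch is small enough to show in full. This is a generic implementation, not BlinkDB's; the width/depth defaults and the choice of blake2b as the hash family are arbitrary.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts in sublinear space; errors are
    one-sided (estimates can only be too high, never too low)."""
    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent hash per row, derived via blake2b personalization.
        for seed in range(self.depth):
            h = hashlib.blake2b(item.encode(), person=bytes([seed]) * 16)
            yield seed, int.from_bytes(h.digest()[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # The true count is at most the smallest counter it touched.
        return min(self.rows[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for word in "to be or not to be".split():
    cms.add(word)
print(cms.estimate("to"), cms.estimate("question"))  # 2 and, very likely, 0
```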
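Finally, online aggregation in miniature: a running mean with a normal-approximation confidence interval, so a user watching the output could stop once the error bar is tight enough. The stream, reporting interval, and z value are made up for the example; a real engine must also randomize the scan order for these statistics to be valid.

```python
import math, random

def online_avg(stream, report_every=10_000, z=1.96):
    """Consume a stream of numbers, periodically yielding the running mean
    and a ~95% confidence half-width (Welford's online variance update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0:
            stderr = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * stderr

# Hypothetical workload: average a noisy column, scanned in random order.
data = (random.gauss(100, 15) for _ in range(100_000))
for n, mean, halfwidth in online_avg(data):
    print(f"after {n:>6} rows: {mean:.2f} ± {halfwidth:.2f}")
```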
A relatively quick read: a collection of commentaries on foundational and state-of-the-art papers about database design and its problems. A mostly easy read for application programmers like myself; those interested in low-level details can go on to the cited sources.
A book that, I believe, is still used at MIT (since 1988) for readings on database-related topics.
It helped me understand the basic architecture of databases, especially units 1: Data Models and DBMS Architecture, 2: Query Processing, and 4: Transaction Management.
The Data Warehousing part was not as interesting as I expected. You can get more out of the book if you complement it with lectures such as: https://archive.org/details/UCBerkele...
A fast and eye-opening read. It's a brief booklet of around 50 pages: about 11 commentary articles on selected important database papers covering major cutting-edge areas. It is for database professionals, not for newbies like me with very limited knowledge of databases beyond the few everyone uses, so I didn't understand most of it. 🤨 But it did add lots of new terms to my vocabulary.
The standard text for "introduction to database systems and theory" classes in an ICS environment. Even though it presumes a lot more technical background than many IS/LIS students have, that too is useful as it provides a great primary text for syntagmatic analysis. "What Goes Around Comes Around" and "Anatomy of a Database System" should be required reading in any Information Studies course focused on databases.