The latest edition of a popular text and reference on database research, with substantial new material and revision; covers classical literature and recent hot topics.
Lessons from database research have been applied in academic fields ranging from bioinformatics to next-generation Internet architecture and in industrial uses including Web-based e-commerce and search engines. The core ideas in the field have become increasingly influential. This text provides both students and professionals with a grounding in database research and a technical context for understanding recent innovations in the field. The readings included treat the most important issues in the database area—the basic material for any DBMS professional. This fourth edition has been substantially updated and revised, with 21 of the 48 papers new to the edition, four of them published for the first time. Many of the sections have been newly organized, and each section includes a new or substantially revised introduction that discusses the context, motivation, and controversies in a particular area, placing it in the broader perspective of database research. Two introductory articles, never before published, provide an organized, current introduction to basic knowledge of the field; one discusses the history of data models and query languages and the other offers an architectural overview of a database system. The remaining articles range from the classical literature on database research to treatments of current hot topics, including a paper on search engine architecture and a paper on application servers, both written expressly for this edition. The result is a collection of papers that are seminal and also accessible to a reader who has a basic familiarity with database systems.
Perfect. I just wish it saw more frequent updates and more extended commentary.
* JSON is good for sparse data but not ideal for general hierarchical data, so RDBMSs will subsume it as a data type.
* SQL could have been cleaned up, but there was no time for it, so it will be the COBOL of 2020. SQL won against natural language.
* ODBC isn't the best interface for embedding queries into programming languages: open the database, open a cursor, bind the query, run fetches, and so on (see the cursor sketch after this list). Worth looking at LINQ.
* PostgreSQL's open source code educated many people who are now influential in new systems.
* Query planning (best-effort): cost estimation (from catalog statistics), plan equivalence, and cost-based search (dynamic programming; see the join-ordering sketch after this list). Concurrency control: serializability is the gold standard but generally more than applications need, so finer-grained locking is the default today. There is no single best scheme; it depends entirely on the workload.
* Any performance test without a crossover point is uninteresting (at worst a trade-off hidden by an unlimited-resources assumption). Resolving conflicts via blocking might make more sense, since every system is limited by nature.
* Durability: ARIES (no-force: dirty pages need not be written at commit time; steal: dirty pages can be flushed at any time).
* Distribution brings its own set of problems. 2PC (atomic commit) with presumed-commit or presumed-abort variants (see the sketch after this list). Consensus (Paxos, Raft, etc.) is generally used for replication, where a master executes transactions by itself and a new master is elected on failure.
* New architectures: column stores, main-memory systems (with their own concurrency control and recovery), semi-structured data (JSON), and dataflow (Hadoop, Spark, Naiad, etc.).
* Dataflow: started with MapReduce (only two stages). Nowadays, higher-level query languages (SQL), general graphs (not just two stages), and indexing (leveraging the structured parts of the data) are supported (Spark, Flink, etc.). The influential points are schema, interface, and architecture flexibility.
* Non-serializable isolation is active by default, and the solutions for it are difficult to use. The clear research interest in weak isolation is to find simpler ways to maintain semantics while keeping the ease of programming that serializability offers.
* Rethinking the query optimizer due to streaming, errors in estimation, data outside the RDBMS, user-defined aggregates, etc. Extract the optimizer from execution: the planner generates a dataflow that the executor runs later. There are two adaptive optimizations: inter-operator (exploiting the blocking nature of operators such as hash join) and intra-operator (feedback from execution to self-adapt the plan).
* OLAP: sampling (online or materialized, as in BlinkDB; sketches such as Count-Min, HyperLogLog, and Bloom filters; see the Count-Min sketch after this list), precomputation (all cuboids or a critical subset, since the cube is a lattice and the rest can be derived from it), and online aggregation (feedback to the user, who can stop when satisfied).
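The ODBC-style interaction the third bullet complains about looks roughly like the following. This is a minimal sketch using Python's built-in DB-API (sqlite3) as a stand-in for ODBC; the `emp` table and its rows are invented for the example:

```python
import sqlite3

# The "cursor dance": open a connection, open a cursor,
# bind the query parameters, then loop over fetch calls.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
cur.executemany("INSERT INTO emp VALUES (?, ?)",
                [("ann", 120), ("bob", 90), ("eve", 150)])

cur.execute("SELECT name FROM emp WHERE salary > ?", (100,))  # bind + run
while True:
    row = cur.fetchone()          # one fetch call per row
    if row is None:
        break
    print(row[0])

conn.close()
```

Language-integrated queries (LINQ and friends) fold this boilerplate into the host language's own expressions, which is the contrast the note is drawing.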
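For the cost-based search bullet, here is a toy sketch of Selinger-style dynamic programming over left-deep join orders. The cardinalities, the flat 0.1 selectivity, and the nested-loop cost model are made-up stand-ins for real catalog statistics:

```python
from itertools import combinations

card = {"R": 1000, "S": 100, "T": 10}   # assumed per-relation row counts
SEL = 0.1                                # assumed flat join selectivity

def join_ordering(rels):
    # best[subset] = (cost, estimated output rows, plan) — the cheapest
    # plan found so far for joining exactly that subset of relations.
    best = {frozenset([r]): (0, card[r], r) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, size)):
            for r in subset:                     # r joins last (left-deep)
                left = subset - {r}
                lcost, lrows, lplan = best[left]
                rows = lrows * card[r] * SEL     # output estimate
                cost = lcost + lrows * card[r]   # nested-loop cost model
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, rows, f"({lplan} JOIN {r})")
    return best[frozenset(rels)]

print(join_ordering(["R", "S", "T"]))  # cheapest order under this model
```

Even this toy version shows why estimation errors matter: the DP is only as good as the cardinality guesses feeding it, which is the motivation behind the adaptive-optimizer bullet further down.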
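For the 2PC bullet, a minimal sketch of the two-phase commit control flow. The `Participant` class is hypothetical; a real coordinator would also log its decision durably and handle timeouts, which is exactly where the presumed-commit and presumed-abort variants come in:

```python
class Participant:
    """Hypothetical resource manager that votes in phase 1."""
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit = name, will_commit

    def prepare(self):            # phase 1: vote yes/no
        return self.will_commit

    def commit(self):             # phase 2: apply the decision
        print(f"{self.name}: committed")

    def abort(self):              # phase 2: roll back
        print(f"{self.name}: aborted")

def two_phase_commit(participants):
    # Phase 1: collect votes (short-circuits on the first "no").
    if all(p.prepare() for p in participants):
        for p in participants:    # phase 2: unanimous yes -> commit
            p.commit()
        return "commit"
    for p in participants:        # any "no" vote aborts everyone
        p.abort()
    return "abort"

print(two_phase_commit([Participant("A"), Participant("B", will_commit=False)]))
```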
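And for the sketches named in the OLAP bullet, a minimal Count-Min sketch, assuming simple salted hashing; by construction it can over-count (hash collisions) but never under-count, and the width/depth parameters here are illustrative rather than tuned:

```python
import random

class CountMin:
    """Count-Min sketch: depth rows of width counters."""
    def __init__(self, width=256, depth=4, seed=42):
        rng = random.Random(seed)
        self.width = width
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One counter per row, chosen by a salted hash of the item.
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The minimum over rows bounds the collision over-count.
        return min(self.table[row][col] for row, col in self._cells(item))

cm = CountMin()
for word in ["a", "b", "a", "c", "a"]:
    cm.add(word)
print(cm.estimate("a"))   # >= the true count of 3
```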
A relatively quick read: a collection of commentaries about foundational and state-of-the-art papers on database design and problems. A mostly easy read for application programmers like myself. Those interested in low-level details can optionally go to the cited sources.
A book that, I believe, is still used at MIT (since 1988) for readings on topics related to databases.
It helped me understand the basic architecture of databases, especially units 1: Data Models and DBMS Architecture, 2: Query Processing, and 4: Transaction Management.
The Data Warehousing part was not as interesting as I expected. You can get more out of the book if you complement it with lectures, for example: https://archive.org/details/UCBerkele...
A fast and eye-opening read. It's a brief booklet of around 50 pages: about 11 commentary articles on selected important database papers covering the major cutting-edge areas. It is for database professionals, not for newbies like me with very limited knowledge of databases beyond the few everyone uses. So I didn't understand most of it. 🤨 But it did add lots of new terms to my vocabulary, at least conceptually.
The standard text for "introduction to database systems and theory" classes in an ICS environment. Even though it presumes a lot more technical background than many IS/LIS students have, that too is useful as it provides a great primary text for syntagmatic analysis. "What Goes Around Comes Around" and "Anatomy of a Database System" should be required reading in any Information Studies course focused on databases.