Kindle Notes & Highlights
Read between February 22 and April 5, 2021
The Priority Queue is similar to the Queue, with the difference that enqueued items must have an assigned priority.
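A minimal sketch of the Priority Queue idea, using Python's `heapq` module: items are enqueued as (priority, value) pairs, and the item with the best (lowest-numbered) priority is always dequeued first. The tasks are hypothetical.

```python
import heapq

pq = []  # the heap that backs our Priority Queue
heapq.heappush(pq, (2, "write report"))
heapq.heappush(pq, (1, "fix outage"))
heapq.heappush(pq, (3, "reply to email"))

# Dequeue always yields the item with the lowest priority number.
priority, task = heapq.heappop(pq)
print(task)  # → fix outage
```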
When storing a bunch of items, you sometimes need more flexibility. For instance, you could want to freely reorder the items; or to access, insert and remove items at any position. In these cases, the List is handy.
The Stack or Queue should be preferred when the flexibility of List isn't needed. Using a simpler ADT ensures data is handled in a strict and robust way (FIFO or LIFO). It also makes the code easier to understand: knowing a variable is a Stack helps to see how data flows in and out.
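A quick sketch of both disciplines in Python: a plain list works as a LIFO Stack, and `collections.deque` as a FIFO Queue.

```python
from collections import deque

stack = []
for item in ["a", "b", "c"]:
    stack.append(item)
last = stack.pop()       # "c" — last in, first out

queue = deque()
for item in ["a", "b", "c"]:
    queue.append(item)
first = queue.popleft()  # "a" — first in, first out
```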
The Sorted List is useful when you need to maintain an always sorted list of items.
The Map (aka Dictionary) is used to store mappings between two objects: a key object and a value object.
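A sketch of the Map ADT using Python's built-in `dict`, which stores exactly these key-to-value mappings (the country/capital data is just an example):

```python
capitals = {"France": "Paris", "Japan": "Tokyo"}
capitals["Brazil"] = "Brasilia"      # insert a new key→value mapping
city = capitals["Japan"]             # look a value up by its key
del capitals["France"]               # remove a mapping
```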
The Set represents unordered groups of unique items, like mathematical sets described in Appendix
The Array is the simplest way to store a bunch of items in computer memory.
Each object in an array occupies the same amount of space in memory.
When performance isn't an issue, we can rely on these generic ADT implementations and not worry about data structures. But when performance must be optimal, or when working with a lower level language that doesn't have such features, you must decide which data structures to use.
A tree with the origins of Indo-European languages.
In the Tree terminology, a cell is called a node, and a pointer from one cell to another is called an edge.
Searching for existing algorithms should always be your first move when solving problems.
How do you find a node in a Graph? If the Graph's structure offers no navigation help, you must visit every node in the graph until you find the one you want. To achieve that, there are two approaches: depth-first and breadth-first.
Figure 5.2 Exploring a graph depth-first versus breadth-first.
Searching a graph via Depth First Search (DFS), we keep following edges, going deeper and deeper into the graph.
If going deep in the graph isn't a good approach, you can try Breadth First Search (BFS). It explores the graph level by level: first the neighbors of your start node, then its neighbors' neighbors, and so on.
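The two approaches can be sketched side by side: DFS drives deep using a stack, while BFS fans out level by level using a queue. The small adjacency-list graph is hypothetical.

```python
from collections import deque

def dfs(graph, start):
    """Depth-first: keep following edges deeper (stack-based)."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            order.append(node)
            stack.extend(reversed(graph[node]))
    return order

def bfs(graph, start):
    """Breadth-first: visit neighbors level by level (queue-based)."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs(g, "A"))  # → ['A', 'B', 'D', 'C']
print(bfs(g, "A"))  # → ['A', 'B', 'C', 'D']
```

Note how BFS must remember every node of the current frontier at once, which is why DFS usually has the smaller memory footprint.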
When you need to explore all the nodes of a graph, it's usually better to stick with DFS for its simple implementation and smaller memory footprint.
Figure 5.3 DFS, courtesy of http://xkcd.com.
For this problem, we're not going to show an algorithm. Use what you've learned so far and try solving it yourself. You can do so at UVA, an online judge that will test your proposed solution.
One famous and very effective way of finding the shortest path is Dijkstra's Algorithm.
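A compact sketch of Dijkstra's Algorithm with a binary heap: it repeatedly settles the closest unsettled node and relaxes its outgoing edges. The weighted graph is hypothetical, and edge weights are assumed non-negative (a requirement of the algorithm).

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps each node to a
    list of (neighbor, edge_weight) pairs with non-negative weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, already found a shorter path
        for neighbor, weight in graph[node]:
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist

g = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
print(dijkstra(g, "A"))  # → {'A': 0, 'B': 1, 'C': 3}
```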
Before founding Google, Sergey Brin and Larry Page were computer science academics at Stanford University, researching graph algorithms. They modeled the Web as a graph: web pages are nodes, and links between web pages are edges. They figured if a web page receives many links from other important pages, then it must be important as well. They created the PageRank Algorithm from that idea. The algorithm runs in rounds. Each web page in the graph starts with an equal number of "points". After each round, each page distributes its points to the pages it links to. The process is repeated until the scores stabilize.
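The rounds described above can be sketched in a few lines. This is a simplified toy model, not Google's actual algorithm: it omits the damping factor, and it averages each round with the previous one (a "lazy" update) so scores settle instead of oscillating on cyclic link patterns. The three-page web is hypothetical.

```python
def pagerank(links, rounds=50):
    pages = list(links)
    points = {p: 1.0 / len(pages) for p in pages}  # equal starting points
    for _ in range(rounds):
        new_points = {p: 0.0 for p in pages}
        for page, outgoing in links.items():
            share = points[page] / len(outgoing)
            for target in outgoing:        # distribute points along links
                new_points[target] += share
        # Lazy update: blend with the previous round so scores converge.
        points = {p: (points[p] + new_points[p]) / 2 for p in pages}
    return points

web = {"home": ["about", "blog"], "about": ["home"], "blog": ["home"]}
scores = pagerank(web)
# "home" receives links from every other page, so it ends up on top.
```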
During World War II, the British Army needed to make the best strategic decisions to optimize the impact of their operations. They created many analytical tools to figure out the best way to coordinate their military operations. That practice was named operations research. It improved the British early warning radar system, and helped the United Kingdom better manage its manpower and resources. During the war, hundreds of Brits were involved in operations research. After the war, the new ideas were applied to optimize processes in businesses and industries. Operations research involves defining an objective and systematically searching for the decisions that best achieve it.
Problems where the objective and constraints can be modeled using linear equations are called linear programming problems.
Simplex solves linear programming problems very efficiently. It has been helping industries solve complex problems since the 1960s.
Finding a new problem that hasn't been explored before is rare. When researchers find a new problem, they write a scientific paper about it.
DataBase Management System (DBMS): a special piece of software for managing databases. The DBMS organizes and stores the data. It mediates accesses and changes to the database.
we can often extract valuable information from data. That's called data mining.
For instance, a big grocery chain analyzed its product-transaction data, and found that its top-spending customers often buy a type of cheese ranked below 200 in sales. Normally, products with such low sales were discontinued. Data mining inspired managers not only to keep that cheese product, but to put it in more visible spots. That pleased their best customers, and made them come back even more.
The emergence of the relational model in the late 1960s was a huge leap for information management. Relational databases make it easy to avoid duplicate information and data inconsistencies. The majority of database systems used today are relational.
The organization of a database table is given by its fields and the restrictions they enforce. This combination of fields and restrictions is called the table's schema.
We use ID values to refer to a specific row within a table.
The ID field of a table is also known as its primary key. A field that records references to other rows' IDs is called a foreign key.
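The ideas of schema, primary key, and foreign key can be made concrete with SQLite (via Python's standard `sqlite3` module); the authors/books tables here are hypothetical:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,   -- the table's primary key
        name TEXT NOT NULL
    );
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER REFERENCES authors(id)  -- a foreign key
    );
""")
db.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada Lovelace')")
db.execute("INSERT INTO books (title, author_id) VALUES ('Notes', 1)")
# Rows are referenced by their ID value.
row = db.execute("SELECT name FROM authors WHERE id = 1").fetchone()
print(row[0])  # → Ada Lovelace
```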
When a database is organized in a way that it is completely free of duplicate information, we say that the database is normalized. The process of transforming a database with replicated data to one without is called normalization.
As an application grows and new features are added, it's unlikely its database structure (the schema of all its tables) remains the same. When we need to change the database structure, we create a schema migration script.
Without creating schema migrations, your "manual" database changes will be hard to revert to a specific working version. It will be hard to guarantee compatibility between the local databases of different software developers. These problems occur frequently in big software projects with careless database practices.
Almost every relational DBMS works with a query language called SQL.
A database manager must always take into account the product of the number of rows of the joined tables. For very large tables, joins become infeasible. The JOIN is the greatest power and, at the same time, the major weakness of relational databases.
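A small JOIN in action with SQLite (hypothetical customers/orders tables): rows from two tables are combined by matching a foreign key against a primary key. Conceptually the DBMS considers every pairing of rows before the ON condition filters them, which is the row-product cost mentioned above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 3.0), (12, 2, 7.25);
""")
rows = db.execute("""
    SELECT customers.name, orders.total
    FROM orders JOIN customers ON orders.customer_id = customers.id
""").fetchall()
# Each order row is paired with its matching customer row.
```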
For a table's primary key to be useful, we must be able to quickly retrieve a data entry when given its ID value. To that end, the DBMS builds an auxiliary index, mapping row IDs to their respective addresses in memory. An index is essentially a self-balancing binary search tree (sec. 4.3). Each row in the table corresponds to a node in the tree.
Node keys are the values in the field we index. To find the register with a given value, we search for the value in the tree. Once we find a node, we get the address it stores, and use it to fetch the register. Searching a binary search tree is O(log n), so finding registers in large tables is fast.
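A toy sketch of that lookup: instead of a self-balancing tree, a sorted list searched with `bisect` gives the same O(log n) behavior. Each entry maps an indexed value to the "address" of its row (the addresses here are made-up strings).

```python
import bisect

index = []  # (key, address) pairs, kept sorted by key

def index_add(key, address):
    bisect.insort(index, (key, address))

def index_find(key):
    """Binary-search the index; return the row's address, or None."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

index_add(42, "row@0x10")
index_add(7, "row@0x20")
index_add(99, "row@0x30")
print(index_find(7))  # → row@0x20
```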
When inserting a new row, the DBMS must search the entire table to make sure no uniqueness constraint is violated. Finding if a value is present in a field without an index means checking all rows in the table. With an index, we can quickly search if the value we're trying to insert is already present. Indexing fields that have a uniqueness constraint therefore keeps insertions fast.
Many DBMSs might even refuse to fulfill queries asking to sort by a non-indexed field when the query involves too many rows.
When sorting by two fields is required, joint indexes are used.
So indexes are awesome: they allow for super fast querying and instant sorted data access. Then why don't we have indexes for all fields in every table? The problem is that when a new register is inserted into or removed from the table, all its indexes must be updated to reflect that. If there are a lot of indexes, updating, inserting or removing rows can become computationally expensive (remember tree balancing). Moreover, indexes occupy disk space, which is not an unlimited resource.
If your queries are wasting too much time sequentially scanning data in a field, add an index for that field and see how that helps.
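In SQLite, adding an index is one statement, and a joint (composite) index covers queries that filter and sort on two fields. `EXPLAIN QUERY PLAN` is SQLite's "explain" tool for checking whether a query actually uses an index. The `logs` table and index names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY,"
           " level TEXT, ts INTEGER)")
db.execute("CREATE INDEX idx_logs_level ON logs (level)")         # one field
db.execute("CREATE INDEX idx_logs_level_ts ON logs (level, ts)")  # joint index

# Check the query plan: the joint index serves both the filter and the sort.
plan = db.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM logs WHERE level = 'error' ORDER BY ts"
).fetchall()
for row in plan:
    print(row)
```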
To adjust a database for higher performance, it's crucial to know which indexes to keep and which to discard. If a database is mostly read and rarely updated, it might make sense to keep more indexes. Poor indexing is a major cause for slowdown in commercial systems. Careless system administrators often won't investigate how common queries are run—they will just index random fields they "feel" will help performance. Don't do this! Use "explain" tools to check your queries and add indexes only when it makes a difference.
A transaction is a list of database operations that must be executed atomically.
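Atomicity can be demonstrated with SQLite: in this hypothetical transfer between two accounts, the connection's context manager opens a transaction, commits it on success, and rolls it back if anything fails partway through.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES ('ann', 100), ('bob', 0)")
db.commit()  # setup is committed before the transaction we want to test

try:
    with db:  # transaction: commit on success, rollback on error
        db.execute("UPDATE accounts SET balance = balance - 30"
                   " WHERE name = 'ann'")
        db.execute("UPDATE accounts SET balance = balance + 30"
                   " WHERE name = 'bob'")
        raise RuntimeError("power failure mid-transfer")
except RuntimeError:
    pass  # the failed transaction was rolled back as a whole

balances = dict(db.execute("SELECT name, balance FROM accounts"))
print(balances)  # → {'ann': 100, 'bob': 0} — neither update survived
```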
The most widely known type of NoSQL database is the document store.
Data in the relational model (top) vs. NoSQL (bottom).
The non-relational model expects us to duplicate information at each relevant place. It's hard to keep duplicated data updated and consistent. In return, by grouping relevant data together, document stores can offer more flexibility: you don't need to join rows, you don't need fixed schemas, and each data entry can have its own configuration of fields. This means there are no "tables" and "rows" in document stores. Instead, a data entry is called a document. Related documents are grouped in a collection.
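The shape of a document store can be modeled with plain Python objects (a toy sketch, not a real document database): a collection is a list of documents, and each document chooses its own fields, with related data embedded rather than joined.

```python
# A hypothetical "books" collection; note each document's fields differ.
books = [
    {"title": "Dune", "author": "Frank Herbert", "year": 1965},
    {"title": "Odyssey", "author": "Homer",
     "translations": ["en", "fr"]},  # no fixed schema required
]

# Queries just inspect documents; there are no rows to join.
with_year = [doc for doc in books if "year" in doc]
```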
The key-value store is the simplest form of organized, persistent data storage. It's mainly used for caching.
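The caching pattern behind a key-value store can be sketched with a dict standing in for the store (real systems would use something like a networked key-value server; the functions here are made up for illustration):

```python
cache = {}  # the key-value store: key → cached value

def expensive_square(n):
    return n * n  # stand-in for a slow computation or database query

def cached_square(n):
    if n not in cache:               # cache miss: compute and store
        cache[n] = expensive_square(n)
    return cache[n]                  # cache hit: instant lookup by key

cached_square(12)
cached_square(12)  # the second call is served straight from the cache
```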

