Emre Can Okten’s Kindle Notes & Highlights for Computer Science Distilled: Learn the Art of Solving Computational Problems (Code is Awesome)

In a graph database, data entries are stored as nodes, and relationships as edges.

This is the most flexible type of database. Letting go of tables and collections, you can store networked data in intuitive ways.
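To make "nodes and edges" concrete, here is a minimal in-memory sketch in Python. It is not a real graph DBMS. The node names, the `FRIEND_OF`/`LIKES` labels and the helper functions are all hypothetical. It stores entries as nodes and relationships as labeled edges, then answers a typical networked question, "friends of friends", by following edges.

```python
from collections import defaultdict

# Hypothetical toy graph store: nodes hold data entries, edges hold relationships.
nodes = {
    "ana": {"kind": "Person"},
    "bob": {"kind": "Person"},
    "cai": {"kind": "Person"},
    "go_lang": {"kind": "Topic"},
}
edges = defaultdict(list)  # source node -> list of (relationship, target node)


def relate(source, relationship, target):
    """Store a relationship as a directed, labeled edge."""
    edges[source].append((relationship, target))


relate("ana", "FRIEND_OF", "bob")
relate("bob", "FRIEND_OF", "cai")
relate("cai", "LIKES", "go_lang")


def neighbors(node, relationship):
    """Follow the edges with a given label that leave one node."""
    return [target for rel, target in edges[node] if rel == relationship]


def friends_of_friends(person):
    """Two hops along FRIEND_OF edges, excluding the starting person."""
    result = set()
    for friend in neighbors(person, "FRIEND_OF"):
        result.update(neighbors(friend, "FRIEND_OF"))
    result.discard(person)
    return result


print(friends_of_friends("ana"))  # {'cai'}
```

In a table-based design, the same question would typically need a self-join on a relationship table. Here it is just a walk along the edges.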

70%

The buzzword Big Data describes data-handling situations that are extremely challenging in terms of Volume, Velocity, or Variety.

71%

Whenever you need a non-standard data management approach because of volume, velocity or variety, you can say it's a "Big Data" application.

71%

SQL vs NoSQL

Relational databases are data-centered: they maximize data structuring and eliminate duplication, regardless of how the data will be needed. Non-relational databases are application-centered: they facilitate access and use according to your needs.

71%

Your non-relational database will be powerful, but you will be responsible for updating the duplicated information across documents and collections.
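A minimal sketch of that responsibility, using plain Python dicts as a stand-in for a document store. The collections and field names are hypothetical. Each order embeds a copy of the customer's name so it can be read without a join. The cost is that a rename must touch every copy.

```python
# Hypothetical document store: orders embed a duplicated copy of the customer name.
customers = {"c1": {"name": "Ana Silva"}}
orders = [
    {"id": 1, "customer_id": "c1", "customer_name": "Ana Silva", "total": 30},
    {"id": 2, "customer_id": "c1", "customer_name": "Ana Silva", "total": 12},
]


def rename_customer(customer_id, new_name):
    """Update the source record AND every duplicated copy.

    Skipping the loop would leave existing orders showing the old name.
    """
    customers[customer_id]["name"] = new_name
    for order in orders:
        if order["customer_id"] == customer_id:
            order["customer_name"] = new_name


rename_customer("c1", "Ana Souza")
print([o["customer_name"] for o in orders])  # ['Ana Souza', 'Ana Souza']
```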

71%

There are several situations in which not one, but several computers must act in coordination to provide a database system:

71%

For these scenarios, there are DBMSs that can run on several coordinated computers, forming a distributed database system.

72%

Sharding

If your database receives many write queries for large amounts of data, it's hard to synchronize the database everywhere in the cluster.

72%

A sharding setup with three replicas per shard.
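The notes don't say how a key is assigned to a shard. Hash-based partitioning is one common scheme, and this Python sketch assumes it. The cluster layout, machine names and function names are hypothetical. Each key maps to exactly one of three shards, and a write only has to reach that shard's three replicas rather than every machine. CRC32 is used because Python's built-in `hash()` on strings changes between processes.

```python
import zlib

NUM_SHARDS = 3
REPLICAS_PER_SHARD = 3

# Hypothetical cluster layout: shard i lives on machines "shard{i}-replica{j}".
cluster = {
    shard: [f"shard{shard}-replica{r}" for r in range(REPLICAS_PER_SHARD)]
    for shard in range(NUM_SHARDS)
}


def shard_for(key: str) -> int:
    """Pick a shard with a stable hash, so every client routes a key the same way."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def machines_for(key: str) -> list[str]:
    """Writes for this key go only to its shard's replicas, not the whole cluster."""
    return cluster[shard_for(key)]


for user in ["alice", "bob", "carol"]:
    print(user, "->", machines_for(user))
```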

72%

Data Consistency

In distributed databases with replication, updates made on one machine don't propagate instantly to all replicas. It takes some time until all machines in the cluster are synchronized, and that can damage the consistency of your data.

72%

If your database queries do not strongly enforce data consistency, they are said to work under eventual consistency.

73%

In many cases, working with eventual consistency won't cause problems.
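A toy model of eventual consistency, as a sketch only. The class and method names are hypothetical, and real replication protocols are far more involved. Writes hit a primary at once, while replicas lag until a sync step delivers the queued updates. A read from a replica in between returns stale data. After the sync, all copies agree.

```python
from collections import deque


class ReplicatedRegister:
    """Hypothetical toy: one primary value plus lagging replicas."""

    def __init__(self, num_replicas: int, initial):
        self.primary = initial
        self.replicas = [initial] * num_replicas
        self.pending = deque()  # updates not yet propagated to replicas

    def write(self, value):
        self.primary = value
        self.pending.append(value)

    def read_from_replica(self, i: int):
        return self.replicas[i]  # may be stale until sync() runs

    def sync(self):
        """Deliver queued updates in order, so the replicas converge."""
        while self.pending:
            value = self.pending.popleft()
            self.replicas = [value] * len(self.replicas)


reg = ReplicatedRegister(num_replicas=2, initial="v1")
reg.write("v2")
print(reg.read_from_replica(0))  # v1  (stale read)
reg.sync()
print(reg.read_from_replica(0))  # v2  (replicas have converged)
```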

73%

These applications ushered in the development of special database systems, known as Geographical Information Systems (GIS). They provide specially designed fields for geographical data: PointField, LineField, PolygonField, and so on.

73%

Many general-use DBMSs provide GIS extensions.
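One typical question a polygon field helps answer is "is this point inside this area?". The notes don't describe how a GIS implements it. Ray casting is a classic algorithm for it, shown here as a Python sketch. It assumes flat x/y coordinates; real GIS engines also deal with map projections and the Earth's curvature. The function name is hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from the
    point crosses. An odd count means the point is inside.

    point: (x, y); polygon: list of (x, y) vertices in order (planar coordinates).
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # the crossing lies to the right of the point
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
```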

73%

GIS applications are often used in day-to-day life, for instance with GPS navigators like Google Maps or Waze.

73%

How can we store data outside of our database, in a format that is interoperable across different systems? For instance, we might want to back up the data, or export it to another system. To do this, the data has to go through a process called serialization, in which it is transformed according to an encoding format.

73%

SQL is the most common format for serializing relational databases. We write a series of SQL commands that replicate the database and all its details.
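Python's standard `sqlite3` module can produce exactly this kind of SQL serialization through `Connection.iterdump()`. The table and rows below are made up for illustration. The sketch dumps a small database as SQL statements, then replays them into a fresh database to restore the same data.

```python
import sqlite3

# A throwaway in-memory database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "Ana"), (2, "Bo")])
conn.commit()

# iterdump() yields the SQL commands that recreate the schema and the rows.
dump = "\n".join(conn.iterdump())
print(dump)

# Replaying the dump into a new database restores the same content.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)
print(restored.execute("SELECT name FROM users ORDER BY id").fetchall())  # [('Ana',), ('Bo',)]
```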

73%

XML is another way to represent structured data, one that doesn't depend on the relational model or on a particular database system implementation.
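A short sketch with Python's standard `xml.etree.ElementTree` shows a record serialized to XML and parsed back, with no table schema or database engine involved. The `user` record is made up.

```python
import xml.etree.ElementTree as ET

# A made-up record built as an XML element tree.
user = ET.Element("user", id="1")
ET.SubElement(user, "name").text = "Ana"
ET.SubElement(user, "email").text = "ana@example.com"

xml_text = ET.tostring(user, encoding="unicode")
print(xml_text)
# <user id="1"><name>Ana</name><email>ana@example.com</email></user>

# Deserialize and read the fields back.
parsed = ET.fromstring(xml_text)
print(parsed.get("id"), parsed.find("name").text)  # 1 Ana
```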

73%

JSON is the serialization format most of the world is converging on. It can represent relational and non-relational data in a way that is intuitive to coders.
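Part of why JSON feels intuitive is that it maps directly onto the nested dicts, lists, strings and numbers coders already use. A minimal round trip with Python's standard `json` module, using a made-up order record:

```python
import json

# A made-up nested, non-relational record.
order = {
    "id": 7,
    "customer": {"name": "Ana", "vip": True},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B9", "qty": 1}],
}

text = json.dumps(order, indent=2)  # serialize to a string
print(text)

copy = json.loads(text)             # deserialize back into Python objects
print(copy == order)                # True
```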

74%

CSV, or Comma-Separated Values, is arguably the simplest format for data exchange. Data is stored in textual form, with one data element per line and its fields separated by commas.
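A sketch with Python's standard `csv` module, using made-up rows and an in-memory buffer in place of a file. It shows the one-element-per-line layout. It also shows a consequence of that simplicity: CSV carries no types, so every value reads back as text.

```python
import csv
import io

rows = [
    {"id": 1, "name": "Ana", "city": "Lisbon"},
    {"id": 2, "name": "Bo", "city": "Oslo"},
]

buffer = io.StringIO()  # stands in for a file on disk
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "city"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
# id,name,city
# 1,Ana,Lisbon
# 2,Bo,Oslo

buffer.seek(0)
print(list(csv.DictReader(buffer))[0])  # {'id': '1', 'name': 'Ana', 'city': 'Lisbon'}
```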

74%

Reference

75%

Almost all computers, including our laptops and phones, work on the same principle as the stored-program computing model that John von Neumann described in 1945.

75%

A computer is a machine that follows instructions to manipulate data. It has two main components: processor and memory.

75%

Since the memory is an electrical component, we transmit cell addresses through wires as binary numbers. Each wire transmits one binary digit. A wire is set at a higher voltage for the "one" signal or a lower voltage for the "zero" signal.
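A small Python sketch of the idea follows. The function name is hypothetical. It shows which signal each of `n` address wires would carry for a given cell address, with 1 for the higher voltage and 0 for the lower one. It also implies that `n` wires can address at most 2ⁿ cells.

```python
def address_to_wires(address: int, num_wires: int) -> list[int]:
    """Return the bit carried by each address wire, most significant wire first.

    1 stands for the higher-voltage "one" signal, 0 for the lower-voltage "zero".
    """
    if not 0 <= address < 2 ** num_wires:
        raise ValueError("address does not fit on this many wires")
    return [(address >> bit) & 1 for bit in reversed(range(num_wires))]


print(address_to_wires(13, 8))  # [0, 0, 0, 0, 1, 1, 0, 1]
```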

75%

There are two things the memory can do with a given cell's address: get its value, or store a new value. The memory has a special input wire for setting its operational mode:

75%

The memory can operate in read or write mode.

76%

Usually, each memory cell stores an 8-digit binary number, which is called a byte.
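The read/write behavior can be modeled in a few lines of Python. This is a sketch only, and the `Memory` class is hypothetical. The mode wire becomes a boolean `write_mode` input, and each cell holds one byte (0 to 255).

```python
class Memory:
    """Hypothetical toy RAM: byte-sized cells plus a read/write mode input."""

    def __init__(self, num_cells: int):
        self.cells = [0] * num_cells

    def access(self, address: int, write_mode: bool, value: int = 0) -> int:
        """In write mode, store `value` first; in either mode, return the cell's content."""
        if write_mode:
            if not 0 <= value <= 0xFF:
                raise ValueError("a byte holds values 0..255")
            self.cells[address] = value
        return self.cells[address]


ram = Memory(16)
ram.access(3, write_mode=True, value=0b10110010)  # store 178 in cell 3
print(ram.access(3, write_mode=False))            # 178
```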

76%

Computer code is essentially a sequence of numbers representing CPU operations.

77%

That's all there is to it. Whether you open a website, play a computer game, or edit a spreadsheet, computations are always the same: a series of simple operations which can only sum, compare, or move data across memory.
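To see code as "a sequence of numbers", here is a tiny interpreter for a hypothetical instruction set. It was invented for this sketch and matches no real CPU. Its only operations move data (`LOAD`, `STORE`), sum (`ADD`), and compare-and-jump (`JLT`). The program multiplies two numbers by repeated addition. For readability, the program and the data sit in separate lists; a real von Neumann machine keeps both in the same memory.

```python
# Hypothetical instruction set: every instruction is four numbers [opcode, a, b, c].
HALT, LOAD, STORE, ADD, JLT = 0, 1, 2, 3, 4


def run(code, memory):
    """Execute numeric machine code against a list of memory cells."""
    reg = [0] * 8  # a few CPU registers
    pc = 0         # program counter: position of the current instruction
    while True:
        op, a, b, c = code[pc:pc + 4]
        pc += 4
        if op == HALT:
            return memory
        elif op == LOAD:    # move: memory cell b -> register a
            reg[a] = memory[b]
        elif op == STORE:   # move: register a -> memory cell b
            memory[b] = reg[a]
        elif op == ADD:     # sum: register a += register b
            reg[a] += reg[b]
        elif op == JLT:     # compare: if register a < register b, jump to instruction c
            if reg[a] < reg[b]:
                pc = c * 4
        else:
            raise ValueError(f"unknown opcode {op}")


# Multiply memory[0] by memory[1] through repeated addition; result goes to memory[3].
code = [
    LOAD, 0, 3, 0,   # 0: r0 = memory[3] (0, the running result)
    LOAD, 1, 3, 0,   # 1: r1 = memory[3] (0, the loop counter)
    LOAD, 2, 1, 0,   # 2: r2 = memory[1] (how many times to add)
    LOAD, 3, 2, 0,   # 3: r3 = memory[2] (the constant 1)
    LOAD, 4, 0, 0,   # 4: r4 = memory[0] (the value to add)
    ADD, 0, 4, 0,    # 5: r0 += r4
    ADD, 1, 3, 0,    # 6: r1 += 1
    JLT, 1, 2, 5,    # 7: if r1 < r2, go back to instruction 5
    STORE, 0, 3, 0,  # 8: memory[3] = r0
    HALT, 0, 0, 0,   # 9: stop
]
print(code[:8])                    # the program itself is just numbers
print(run(code, [4, 3, 1, 0])[3])  # 12
```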

77%

People played it on arcade machines equipped with a 2 MHz CPU. That number indicates the CPU's clock: the number of basic operations it executes per second. With a two-million-hertz (2 MHz) clock, the CPU performs roughly two million basic operations per second.

77%

With modern technological progress, ordinary desktop computers and smartphones typically have 2 GHz CPUs. They can perform billions of machine instructions every second.
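Under the simplification used here, that the clock rate counts basic operations per second, running time is just the operation count divided by the clock frequency. For an arbitrarily chosen job of N = 10⁹ basic operations, the two clock speeds above compare as follows (real CPUs can execute more or fewer than one instruction per tick, so treat this as an estimate):

```latex
t = \frac{N}{f}, \qquad
t_{2\,\mathrm{MHz}} = \frac{10^{9}}{2 \times 10^{6}\,\mathrm{s}^{-1}} = 500\ \mathrm{s}, \qquad
t_{2\,\mathrm{GHz}} = \frac{10^{9}}{2 \times 10^{9}\,\mathrm{s}^{-1}} = 0.5\ \mathrm{s}
```

That is a thousandfold speed-up, matching the ratio of the two clock frequencies.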

78%

CPU Architectures

78%

32-bit vs. 64-bit Architecture

The first commercial microprocessor, the Intel 4004, was built on a 4-bit architecture. This means it could operate on (sum, compare, move) binary numbers of up to 4 digits in a single machine instruction. The 4004 moved both data and addresses over a bus only four wires wide.
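Python integers have no fixed width, so this sketch simulates a register of a given width by masking off every bit beyond it. The function name is hypothetical. It shows a 4-bit sum overflowing and the largest unsigned value that 4, 32 and 64 bits can hold. Real CPUs also report the lost carry through a status flag, which this sketch ignores.

```python
def add_fixed_width(x: int, y: int, bits: int) -> int:
    """Add two unsigned numbers as a `bits`-wide register would:
    anything beyond the register's width is lost (the result wraps around)."""
    mask = (1 << bits) - 1
    return (x + y) & mask


print(add_fixed_width(9, 8, 4))  # 1   (17 needs 5 bits, so the top bit is dropped)

# Largest unsigned values for 4, 32 and 64 bits:
print((1 << 4) - 1, (1 << 32) - 1, (1 << 64) - 1)  # 15 4294967295 18446744073709551615
```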

78%

Big-Endian vs. Little-Endian
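Byte order decides how a number that spans several bytes is laid out across consecutive memory cells. Big-endian stores the most significant byte at the lowest address; little-endian stores the least significant byte there. This sketch uses Python's built-in `int.to_bytes`/`int.from_bytes` with an arbitrary example value. It shows both layouts, and what happens when bytes are read back with the wrong order.

```python
import sys

value = 0x0A0B0C0D  # an arbitrary 32-bit (4-byte) number

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex(" "))     # 0a 0b 0c 0d
print(little.hex(" "))  # 0d 0c 0b 0a

# Interpreting little-endian bytes as big-endian yields a different number.
print(hex(int.from_bytes(little, byteorder="big")))  # 0xd0c0b0a

print(sys.byteorder)  # the byte order of the machine running this code
```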

79%

Emulators

Sometimes, it's useful to run code designed for a different CPU on your own computer. That way, you can test an iPhone app without an iPhone, or play your favorite vintage Super Nintendo game. For these tasks, there are pieces of software called emulators.

79%

But we rarely write our programs directly as CPU instructions. It would be impossible for a human to write a realistic 3D computer game this way. To express our orders in a more "natural" and compact way, we created programming languages. We write our code in these languages. Then, we use a program called a compiler to translate our orders into machine instructions a CPU can run.

79%

The compiler translates complex instructions in a programming language into equivalent CPU instructions.
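To make the compiler's job concrete, this Python sketch translates arithmetic expressions into a flat list of simple instructions for a hypothetical stack machine (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`). Those instruction names are invented here; a real compiler targets a real CPU's instruction set. It reuses Python's standard `ast` parser for the front end, then runs the instructions to check that the translation is equivalent.

```python
import ast

# Hypothetical stack-machine instructions for the supported operators.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source: str) -> list[tuple]:
    """Translate an arithmetic expression into a flat list of simple instructions."""
    tree = ast.parse(source, mode="eval")
    code = []

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            emit(node.left)   # code for the left operand first
            emit(node.right)  # then the right operand
            code.append((OPS[type(node.op)],))
        elif isinstance(node, ast.Constant) and isinstance(node.value, int):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        else:
            raise SyntaxError(f"unsupported construct: {ast.dump(node)}")

    emit(tree.body)
    return code


def execute(code, variables):
    """Run the instructions on a stack to check that they compute the same value."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "LOAD":
            stack.append(variables[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append({"ADD": left + right, "SUB": left - right, "MUL": left * right}[op])
    return stack.pop()


program = compile_expr("a + b * 3")
print(program)  # [('LOAD', 'a'), ('LOAD', 'b'), ('PUSH', 3), ('MUL',), ('ADD',)]
print(execute(program, {"a": 2, "b": 5}))  # 17
```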

80%

Compiled computer programs are essentially sequences of CPU instructions. As we learned, code compiled for a desktop computer won't run on a smartphone, because these machines have CPUs of different architectures. Yet even two computers that share the same CPU architecture may not both be able to run the same compiled program. That's because programs must communicate with the computer's operating system to run.

80%

Besides targeting a specific CPU architecture, compiled code also targets a specific operating system.

81%

Focus on writing clean, self-explanatory code. If you have performance issues, use profiling tools to discover bottlenecks in your code, and try computing these parts in smarter ways.
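As an illustration of that workflow, Python ships a profiler, `cProfile`, with a `pstats` module for reading its report. The two counting functions below are hypothetical. They give the same answer, but one searches a list each time (quadratic) while the other uses a set (roughly linear). Sorting the report by cumulative time should show the slow function among the top entries, pointing at the part worth computing in a smarter way.

```python
import cProfile
import pstats


def slow_unique_count(items):
    """Quadratic: `in` on a list scans the list on every check."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return len(seen)


def fast_unique_count(items):
    """Roughly linear: set membership is a hash lookup."""
    return len(set(items))


def main():
    data = [i % 1000 for i in range(10000)]
    return slow_unique_count(data), fast_unique_count(data)


profiler = cProfile.Profile()
profiler.enable()
slow_result, fast_result = main()
profiler.disable()

# Show the five most expensive entries by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
print(slow_result, fast_result)  # 1000 1000
```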

81%

Some programming languages, called scripting languages, are executed without a direct compilation to machine code. These include JavaScript, Python, and Ruby. Code in these languages is executed not directly by the CPU but by an interpreter, which must be installed on the machine running the code. Since the interpreter translates the code for the machine in real time, the code usually runs much slower than compiled code. On the other hand, the programmer can always run the code immediately, without waiting through the compilation process. When a project is very big, compiling can …
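Python, one of the languages named above, makes the interpreter's role visible. CPython first turns source code into bytecode for its own virtual machine, not into native CPU instructions. Its interpreter loop then executes that bytecode. The standard `dis` module prints it. The exact instruction names differ between Python versions, so the output below is not fixed.

```python
import dis


def add(a, b):
    return a + b


# Show the bytecode the interpreter will execute for `add`.
dis.dis(add)

ops = [instr.opname for instr in dis.get_instructions(add)]
print(ops)  # exact names vary between Python versions
```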

82%

Google engineers had to constantly compile large batches of code. That made coders "lose" (fig. 7.9) a lot of time. Google couldn't switch to scripting languages: they needed the higher performance of compiled binaries. So they developed Go, a language that compiles incredibly fast while still delivering very high performance.

82%

Given a compiled computer program, it's practically impossible to recover its original source code.

82%

Underground hackers often analyze the binary code from licensed programs like Windows, Photoshop, and Grand Theft Auto, in order to determine which part of the code verifies the license. They modify the binary code, placing an instruction to directly JUMP to the part of the code that executes after the license has been validated. When the modified binary is run, it gets to the injected JUMP command before the license is even checked, so people can run these illegal, pirated copies without paying.

82%

The most famous attack of this kind was Stuxnet, a cyberweapon reportedly built by agencies from the United States and Israel. It slowed down Iran's nuclear program by infecting computers that controlled the centrifuges of an underground Iranian uranium enrichment facility.

82%

Without the original source code, you can change the binary a little to hack it in small ways, but it's practically impossible to make any major change to the program, such as adding a new feature. Some people believe it's much better to build code collaboratively, so they started to make their source code open for other people to change. That's the main concept behind open source: software that everyone can use and modify freely. Linux-based operating systems (such as Ubuntu, Fedora, Debian) are open source, whereas Windows and Mac OS are closed source.

82%

With open-source software, there are more eyes on the code, so it's harder for malicious third parties and government agencies to insert surveillance backdoors. When using Mac OS or Windows, you have to trust that Apple or Microsoft aren't compromising your security and are doing their best to prevent any severe security flaw. Open-source systems are open to public scrutiny, so there is less chance that security flaws slip through unnoticed.

83%

If memory access is slow, the CPU has to sit idle, waiting for the RAM to do its work. The time it takes to read and write data in memory is directly reflected in computer performance.

83%

Recent technological developments increased CPU speeds exponentially. Memory speeds also increased, but at a much slower rate. This performance gap between CPU and RAM is known as the Processor-Memory Gap: