Kindle Notes & Highlights
Read between October 21 and November 26, 2024
If an event is lost, or if an event takes effect twice, the integrity of a data system could be violated.
In many business contexts, it is actually acceptable to temporarily violate a constraint and fix it up later by apologizing.
Dataflow systems can maintain integrity guarantees on derived data without atomic commit, linearizability, or synchronous cross-partition coordination.
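A minimal sketch of one way to get this effect, assuming events carry unique IDs so that a retried or redelivered event is suppressed rather than applied twice (all names here are illustrative, not from the book):

```python
# Hypothetical sketch: end-to-end deduplication of events using unique IDs,
# so a redelivered event takes effect exactly once without atomic commit.

processed_ids = set()  # in practice, persisted alongside the derived state
balances = {}          # derived state: account -> balance

def apply_event(event):
    """Apply an event idempotently; duplicates are detected and skipped."""
    if event["id"] in processed_ids:
        return  # already applied; this is a retry or redelivery
    balances[event["account"]] = balances.get(event["account"], 0) + event["amount"]
    processed_ids.add(event["id"])

# The same event delivered twice takes effect only once:
deposit = {"id": "evt-1", "account": "alice", "amount": 100}
apply_event(deposit)
apply_event(deposit)  # duplicate delivery, ignored
assert balances["alice"] == 100
```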
Checking the integrity of data is known as auditing.
If you want to be sure that your data is still there, you have to actually read it and check.
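A minimal sketch of what such a check could look like, assuming each record is stored with a checksum at write time (the store, write, and audit helpers are hypothetical):

```python
import hashlib
import json

# Hypothetical sketch: store a checksum with each record when writing,
# then audit by reading every record back and recomputing the checksum.

def checksum(record):
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

store = {}

def write(key, record):
    store[key] = {"record": record, "checksum": checksum(record)}

def audit(store):
    """Read everything and verify it; return the keys that fail the check."""
    return [k for k, v in store.items() if checksum(v["record"]) != v["checksum"]]

write("user:1", {"name": "alice"})
store["user:1"]["record"]["name"] = "mallory"  # simulate silent corruption
assert audit(store) == ["user:1"]
```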
If a transaction mutates several objects in a database, it is difficult to tell after the fact what that transaction means.
In the event sourcing approach, user input to the system is represented as a single immutable event, and any resulting state updates are derived from that event.
For any derived state, we can rerun the batch and stream processors that derived it from the event log in order to check whether we get the same result, or even run a redundant derivation in parallel.
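A minimal sketch of these two ideas together, assuming a toy account-balance domain (the apply and derive_state names are illustrative): because the derivation is a deterministic function of the immutable event log, replaying the log, or running a redundant derivation in parallel, should always reproduce the same state, and a mismatch signals corruption somewhere.

```python
from functools import reduce

# Hypothetical sketch of event sourcing: the log of immutable events is the
# source of truth, and state is derived by replaying (folding over) the log.

event_log = [
    {"type": "deposit", "account": "alice", "amount": 100},
    {"type": "withdraw", "account": "alice", "amount": 30},
]

def apply(state, event):
    """Pure function: derive the next state from the current state and one event."""
    delta = event["amount"] if event["type"] == "deposit" else -event["amount"]
    new_state = dict(state)
    new_state[event["account"]] = new_state.get(event["account"], 0) + delta
    return new_state

def derive_state(log):
    return reduce(apply, log, {})

# Rerunning the derivation (or a redundant parallel one) must agree with the
# stored derived state; any divergence is detectable after the fact.
assert derive_state(event_log) == derive_state(event_log) == {"alice": 70}
```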
If you are not afraid of making changes, you are much better able to evolve an application to meet changing requirements.
Software development increasingly involves making important ethical choices.
Reasoning about ethics is difficult, but it is too important to ignore.
Decisions made by an algorithm are not necessarily any better or any worse than those made by a human.
In many countries, anti-discrimination laws prohibit treating people differently depending on protected traits such as ethnicity, age, gender, sexuality, disability, or beliefs.
Data and models should be our tools, not our masters.
If a human makes a mistake, they can be held accountable, and the person affected by the decision can appeal.
Much data is statistical in nature, which means that even if the probability distribution on the whole is correct, individual cases may well be wrong.
A blind belief in the supremacy of data for making decisions is not only delusional, it is positively dangerous.
When services become good at predicting what content users want to see, they may end up showing people only opinions they already agree with, leading to echo chambers in which stereotypes, misinformation, and polarization can breed.
The tracking of the user serves not primarily that individual, but rather the needs of the advertisers who are funding the service.
Not all data collection necessarily qualifies as surveillance, but examining it as such can help us understand our relationship with the data collector.
When surveillance is used to determine things that hold sway over important aspects of life, such as insurance coverage or employment, it starts to appear less benign.
Without understanding what happens to their data, users cannot give any meaningful consent.
For a user who does not consent to surveillance, the only real alternative is simply not to use a service.
Privacy settings that allow a user of an online service to control which aspects of their data other users can see are a starting point for handing back some control to users.
Surveillance has always existed, but it used to be expensive and manual, not scalable and automated.
Because the data is valuable, many people want it.
When collecting data, we need to consider not just today’s political environment, but all possible future governments.
Data is the defining feature of the information age.
Undoubtedly the cost of doing business increased when factories were no longer allowed to dump their waste into rivers, sell tainted foods, or exploit workers.
Data is the pollution problem of the information age, and protecting privacy is the environmental challenge.
Companies that collect lots of data about people oppose regulation as being a burden and a hindrance to innovation.
Derived state can be updated by observing changes in the underlying data.
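A minimal sketch of this pattern, assuming a changelog of inserts and deletes on some underlying table (the changelog and on_change names are illustrative): the derived count is maintained incrementally from observed changes rather than recomputed from scratch.

```python
# Hypothetical sketch: maintain derived state (a count per category) by
# consuming a changelog of the underlying data, like a materialized view.

changelog = [
    {"op": "insert", "key": "a", "category": "x"},
    {"op": "insert", "key": "b", "category": "x"},
    {"op": "delete", "key": "a", "category": "x"},
]

counts_by_category = {}  # derived state, kept up to date incrementally

def on_change(change):
    """Update the derived state in response to one observed change."""
    delta = 1 if change["op"] == "insert" else -1
    cat = change["category"]
    counts_by_category[cat] = counts_by_category.get(cat, 0) + delta

for change in changelog:
    on_change(change)

assert counts_by_category == {"x": 1}
```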
Explaining a joke rarely improves it, but I don’t want anyone to feel left out.