A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodge-podge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.
The best book on the topic. Which of course doesn't mean it's perfect.
Very good composition; the content is clear, approachable, and illustrated with enough practical examples. I was a bit surprised to find that the actual Airflow architecture is covered near the end of the book, but in the end that turned out not to be an issue.
The book gives a proper intro to Airflow and describes its conceptual elements (like task groups), integrations, deployment options (all the major cloud ones), security, testing, etc.
What would I improve in this book?
1. I'd add a chapter with architecture advice: how to organize idempotent processing pipelines at scale (especially the data and how it is split; conceptually Airflow doesn't concern itself with those, so some conventions/best practices have to be put in place).
2. Airflow alternatives: where Airflow shines and which scenarios it is not best equipped to address (e.g. a comparison with NiFi or Luigi).
3. The AWS scenario covers a manual deployment, not a managed one.
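To illustrate the kind of convention the first suggestion asks for, here is a minimal sketch of one common approach: each run writes to a location derived from the date it processes and overwrites whatever was there, so rerunning the same date leaves the same result. This is not taken from the book. The function name `write_partition`, the `date=<day>/data.json` layout, and the JSON format are all assumptions made for illustration; in Airflow the date would typically come from the run's logical date (for example the `ds` template variable).

```python
import json
import os
import tempfile
from pathlib import Path


def write_partition(base_dir: Path, run_date: str, rows: list[dict]) -> Path:
    """Write rows for one date, replacing any earlier output for that date.

    Hypothetical helper: the name and the directory layout are illustrative.
    """
    target = base_dir / f"date={run_date}" / "data.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then swap it in,
    # so a rerun overwrites the partition instead of appending to it.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(rows, tmp)
    os.replace(tmp_name, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        base = Path(d)
        rows = [{"id": 1}, {"id": 2}]
        write_partition(base, "2024-01-01", rows)
        # Rerunning the same date replaces the file rather than duplicating rows.
        path = write_partition(base, "2024-01-01", rows)
        print(json.loads(path.read_text()))
```

Because the output path depends only on the date, a retried or backfilled run touches only its own partition and leaves the others alone.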
I used this book to ramp up as a contributor to an existing Airflow implementation. Solid explanations with supporting examples, from beginner topics through advanced setups. In fact, I found myself referencing this book more often than the official documentation, which is often scattered and incomplete. By contrast, this book is well organized and useful both as an introductory text and as a reference. Kudos to the authors (and editors, presumably) for managing to strike that difficult balance!
It is a comprehensive guide to mastering the orchestration of data workflows using Apache Airflow. The book starts with foundational concepts and progresses to advanced techniques, covering Airflow's architecture and how its components interact. It includes practical examples and step-by-step tutorials for installation, configuration, and deployment. The authors provide real-world scenarios, demonstrating how Airflow can be applied to various data engineering tasks. Key topics include best practices for designing and managing data pipelines and for handling dependencies, retries, and errors. The book also covers extending Airflow with custom operators and sensors, as well as performance optimization and scaling.
This book was precisely what I needed to get up to speed with Airflow quickly. It covers core principles, best practices, testing patterns, productionization considerations, cloud deployment patterns, and much more. Furthermore, this book had some of the best example problems I've seen in a technical text: complex enough to be useful, interesting enough to grab your attention, but scoped small enough to be understood.
Their code had a few strange implementation quirks to prove a point or show a use case, but these were few and far between.
The only book about Airflow out there. It's decent and full of useful information, but covers too much ground and has an outdated cloud part.
After reading it I can certainly write that the main goal has been achieved: I learned about Airflow. The book starts with a good introduction to the basics, followed by more advanced topics such as best practices for developing and testing, and the inner workings of Airflow. The style is practical, and there are lots of code examples and drawings that make understanding easier.
Unfortunately, the authors wanted to cover too much. That's why some topics are glossed over while others are superfluous, like explaining LDAP. Other topics are mentioned twice, like SLAs. I suspect the two authors did not coordinate enough on some points. The last part about Airflow in the cloud is largely outdated. While for AWS it does mention MWAA, to my surprise it goes on to describe a Fargate deployment and is thus almost useless.
It's a detailed guide, and if you want a deeper understanding of Apache Airflow, there's probably no competition to it at all. Sometimes it might be too detailed, so I'd advise readers to safely skip or skim through the topics they're not super interested in. This should be more your "companion" while you're ramping up with Airflow, rather than something you try to learn from A to Z before writing your first pipeline.
Extensive and practical Airflow guide, which definitely helps to organise one's knowledge of the topic.
The only downside is that the code provided with the book is not entirely up to date or tested, so you could have some problems playing with some of the example projects.
What a good intro to DE, but before you read and understand it, you should have a foundation in programming (especially Python): working with files, dicts, ... Read it on O'Reilly.
The guide is very sensibly written. The way the material is presented is clear and much more accessible than in Airflow's own manual. I discovered a couple of interesting points that I can use in my work.