Get started with Apache Flink, the open source framework that powers some of the world’s largest stream processing applications. With this practical book, you’ll explore the fundamental concepts of parallel stream processing and discover how this technology differs from traditional batch data processing. Longtime Apache Flink committers Fabian Hueske and Vasia Kalavri show you how to implement scalable streaming applications with Flink’s DataStream API and continuously run and maintain these applications in operational environments. Stream processing is ideal for many use cases, including low-latency ETL, streaming analytics, and real-time dashboards as well as fraud detection, anomaly detection, and alerting. You can process continuous data of any kind, including user interactions, financial transactions, and IoT data, as soon as you generate them.
Probably the best book available on the subject, which is a bit unfortunate. It wasn't awful or anything, but I found myself frequently stopping, reading the docs, and/or Googling in frustration because explanations and warnings simply prompted further questions that seemed obvious to me, but were not adequately explored. Not the easiest ecosystem to break into, admittedly, but I would still say this is probably the best organized intro available.
In particular, one of my largest criticisms (which applies to the JVM ecosystem as a whole *way* more than most, for some reason) is the number of (IMO) unreasonable assumptions made. Terms are frequently thrown around without proper treatment. I actually got through the entire book without feeling like I had a complete understanding of what an operator was. The Flink glossary wasn't all that much more helpful, but at least it had something. Do yourself a favor and read the Google Dataflow Model paper first and you'll get a *much* more thorough introduction to some of the crucial terms.
Finally, through no fault of the authors, this book is old. Flink has evolved significantly since this was written, some of the APIs are deprecated, and some of the other cautions are either inaccurate or difficult to verify (I still can't figure out whether the limitation re: parallelism settings and savepoints remains valid... the book claims it was written for Flink 1.7, but the only limitations I can find in the official docs reference version 1.2).
A very comprehensive book on the ins and outs of Flink, and I read it cover to cover. I have found myself flipping through it as a reference on many occasions when I am curious about some specific implementation detail. I give it 4 stars only because several important releases have been made since its publication in 2019, so it is dated: some of the most important new features are not included.
Just documentation collected into a book, and perhaps partially outdated now. The best parts of the book cover general streaming challenges and trade-offs, watermarks, source/sink design, and state management. The book is interesting even if you develop streaming pipelines using different frameworks (Beam, Kafka Streams), if only to compare the APIs, capabilities, and limitations.
I was really surprised that there is not a single page about automated tests. When I evaluate a new framework, excellent support for automated tests is a must. I don’t understand why such an important aspect was totally ignored.
- An approachable and practical introduction with nice examples throughout the book!
- It first presents the overall architecture and then covers the DataStream API in the later chapters, with each one focusing on one aspect.
- I enjoyed the chapter on integrating Flink with other systems like Kafka, Cassandra, etc., and how event guarantees are affected depending on the source and sink. This gives an overall idea of how such systems are deployed, especially for beginners who might not have a holistic picture of distributed systems.
- One thing that could have made this book awesome: a smallish hands-on project at the end covering many of the concepts presented throughout. I think this in itself is quite a big task and perhaps deserves its own book.