Learning Real-time Processing with Spark Streaming

Building scalable and fault-tolerant streaming applications made easy with Spark Streaming

About This Book

Process live data streams more efficiently, with better fault recovery, using Spark Streaming
Implement and deploy real-time log file analysis
Learn about integration with advanced Spark libraries: GraphX, Spark SQL, and MLlib

Who This Book Is For

This book is intended for big data developers with basic knowledge of Scala but no knowledge of Spark. It will help you grasp the basics of developing real-time applications with Spark and understand efficient programming of core elements and applications.

What You Will Learn

Install and configure Spark and Spark Streaming to execute applications
Explore the architecture and components of Spark and Spark Streaming to use them as a base for other libraries
Process distributed log files in real time to load data from distributed sources
Apply transformation functions to streaming data
Integrate Apache Spark with advanced libraries such as MLlib and GraphX
Apply production deployment scenarios to deploy your application

In Detail

Using practical examples with easy-to-follow steps, this book will teach you how to build real-time applications with Spark Streaming.

Starting with installing and setting up the required environment, you will write and execute your first Spark Streaming program. You will then explore the architecture and components of Spark Streaming, along with an overview of the libraries and functions exposed by Spark. Next, you will learn about the various client APIs for coding in Spark, using distributed log file processing as the use case. You will then apply various functions to transform and enrich streaming data, and learn how to cache and persist datasets. Moving on, you will integrate Spark Streaming with other Spark libraries and components such as MLlib, GraphX, and Spark SQL. Finally, you will learn about deploying your application, covering scenarios that range from standalone mode to distributed mode using Mesos and YARN, in private data centers or on cloud infrastructure.
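To make the log-processing use case more concrete, here is a minimal sketch in plain Python. It is not taken from the book and does not use the Spark API. It only imitates the general idea behind Spark Streaming, where a stream is divided into small batches and windowed operations combine the most recent ones. The log line format, the function names, and the sample data are all hypothetical.

```python
from collections import Counter, deque


def parse_status(line):
    """Return the status code from a simplified, hypothetical log line
    of the form '<ip> <method> <path> <status>'."""
    return line.split()[-1]


def windowed_counts(batches, window_size):
    """Yield status-code counts over a sliding window covering the
    most recent `window_size` micro-batches."""
    window = deque(maxlen=window_size)  # oldest batch drops out automatically
    for batch in batches:
        # Transform step: map each log line to its status code and count.
        window.append(Counter(parse_status(line) for line in batch))
        # Window step: merge the per-batch counts currently in the window.
        total = Counter()
        for counts in window:
            total.update(counts)
        yield dict(total)


# Hypothetical sample data: three micro-batches of log lines.
batches = [
    ["10.0.0.1 GET /index.html 200", "10.0.0.2 GET /missing 404"],
    ["10.0.0.3 GET /index.html 200"],
    ["10.0.0.1 POST /login 500", "10.0.0.4 GET /index.html 200"],
]

results = list(windowed_counts(batches, window_size=2))
for i, counts in enumerate(results, start=1):
    print(f"after batch {i}: {counts}")
```

In a real Spark Streaming application the batching, distribution, and fault recovery are handled by the framework; this sketch only shows the shape of a per-batch transformation followed by a sliding-window aggregation.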

Style and approach

A step-by-step approach to learning Spark Streaming in a structured manner, with detailed explanations of basic and advanced features in an easy-to-follow style. Each topic is explained sequentially and supported with real-world examples and executable code snippets that suit readers with a wide range of experience.

202 pages, Kindle Edition

First published September 28, 2015

12 people want to read

About the author

Sumit Gupta

39 books

Ratings & Reviews

Community Reviews

5 stars: 1 (16%)
4 stars: 0 (0%)
3 stars: 3 (50%)
2 stars: 2 (33%)
1 star: 0 (0%)
Displaying 1 - 3 of 3 reviews
Alex Ott
Author, 3 books, 207 followers
January 18, 2016
Not so useful compared with the official documentation and other books.

As usual for Packt books, a big part of the book describes the installation of different components...
Sujit
4 reviews, 1 follower
January 10, 2016
This book provides a very comprehensive treatment of Spark Streaming. It begins with Spark installation and configuration and setting up your IDE (Eclipse), and goes all the way to packaging your Streaming application, deploying it (standalone / YARN / Mesos), and monitoring it. It uses a Distributed Log Processing example to demonstrate various functionalities as one proceeds through the book. For external input and output, the book provides examples for Flume and Cassandra respectively. It also provides an example of integrating Streaming with GraphX. Examples are provided in both Java and Scala. The book also has quite a bit of theory about Spark to explain the reasoning behind Streaming configuration choices, as well as background on DStreams and windowing.

Overall the book should be a useful guide to anyone building Spark Streaming applications, either in Java or Scala. If you are already familiar with Spark, some of the material may be redundant, but the book is aimed at people who are new to Spark and who want to get into Spark Streaming.

DISCLAIMER: I was one of the reviewers of this book (during publishing). However, I have tried to provide an unbiased review. Hopefully readers of the review will feel the same way.
Greg
29 reviews, 3 followers
September 20, 2016
The authors of this book focus more on broadly introducing Spark Streaming than on providing deep knowledge. The code samples are quite OK, apart from the lengthy comments, which are duplicated most of the time. One thing I still can't understand is why the book contains so much information about configuring and starting Spark, Flume, and others in standalone mode. Rather than describing that, I think it would be better to provide an appendix on it and focus only on logic in a local JVM environment; the book was supposed to teach how to write Spark Streaming based processes. Another thing I really missed was some examples explaining how to deal with stateful processing.