Summary The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.
About the book Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
What's inside
Writing Spark applications in Java
Spark application architecture
Ingestion through files, databases, streaming, and Elasticsearch
Querying distributed datasets with Spark SQL
About the reader This book does not assume previous experience with Spark, Scala, or Hadoop.
About the author Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.
Table of Contents
PART 1 - THE THEORY CRIPPLED BY AWESOME EXAMPLES
1 So, what is Spark, anyway?
2 Architecture and flow
3 The majestic role of the dataframe
4 Fundamentally lazy
5 Building a simple app for deployment
6 Deploying your simple app
PART 2 - INGESTION
7 Ingestion from files
8 Ingestion from databases
9 Advanced ingestion: finding data sources and building your own
10 Ingestion through structured streaming
PART 3 - TRANSFORMING YOUR DATA
11 Working with SQL
12 Transforming your data
13 Transforming entire documents
14 Extending transformations with user-defined functions
15 Aggregating your data
PART 4 - GOING FURTHER
16 Cache and checkpoint: Enhancing Spark’s performances
17 Exporting data and building full data pipelines
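Chapter 4, "Fundamentally lazy," refers to Spark's deferred execution model: transformations only describe a computation plan, and no work happens until an action (such as a count or a write) forces it. Java developers can see the same pattern in the standard Stream API without installing Spark at all. The sketch below is a plain-Java analogy, not Spark code; the class name and counter are illustrative choices, not anything from the book.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyDemo {
    // Counts how many times the map function actually runs.
    static final AtomicInteger evaluations = new AtomicInteger();

    // Builds the pipeline without executing it -- analogous to
    // chaining Spark transformations on a dataframe.
    static Stream<Integer> buildPipeline() {
        return List.of(1, 2, 3, 4, 5).stream()
                .map(n -> { evaluations.incrementAndGet(); return n * n; })
                .filter(n -> n > 5);
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline();
        // Nothing has run yet: the counter is still 0.
        System.out.println("evaluations before action: " + evaluations.get());
        // count() is a terminal operation -- the analogue of a Spark action.
        long matches = pipeline.count();
        System.out.println("matches: " + matches
                + ", evaluations after action: " + evaluations.get());
    }
}
```

In both APIs the payoff is the same: because the plan is known before anything executes, the engine can decide how (and how much) to evaluate only when a result is actually demanded.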
Reviews
I'm not wasting my time reading how to call an API; that I can do myself by reading the API docs. What I would like to see are the design tradeoffs, comparisons between different deployment options, and real in-depth advice. I could probably write a book that covers more than this one.
One of the biggest barriers to using Spark was having to learn Scala. One of the key aspects of this book is that it lets you get started with just prior knowledge of Java. For working professionals, this means you can now easily transition to using Spark; for companies, it means a wider talent pool that can be trained with resources like this. The book has a lot of examples and it's clearly written. Definitely recommend it.
Pros: a good start for beginners; covers basic usage paradigms. Cons: very lengthy and wordy. The content could probably be compressed into 30 kata-style code scenarios. Also, the book mostly just covers basic API usage. It does provide a lot of external links, but it would be good if the book summarized some content on advanced topics itself.
This is not related to the first edition. It avoids Scala, which did make the first edition harder. But Spark is written in Scala, and Scala is not as different from Java as it used to be. This edition tries to be easy, but it covers only basic issues, so you don't get as much Spark.
I've only read the first part, and it looks like a good starting point, especially if you have zero prior knowledge of Spark. It takes a hands-on approach, and you deal with real-life datasets. Examples and labs are in Java, with Scala and Python versions in the GitHub repository.
It's tailored for beginners, and it doesn't really teach you advanced material or how to build an application from zero up to something valuable. It teaches you the Spark API, which can also be learned by reading the documentation.