Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark

Rating: 3.22 (2 reviews)
ISBN: 9781492082385

Mahmoud Parsian

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark. In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You'll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script. With this book, you

435 pages, Paperback

Published May 17, 2022

1 person is currently reading

20 people want to read

About the author

Mahmoud Parsian

7 books · 1 follower

Community Reviews

5 stars: 0 (0%)
4 stars: 3 (33%)
3 stars: 5 (55%)
2 stars: 1 (11%)
1 star: 0 (0%)

Displaying 1 - 2 of 2 reviews

Evren Yortuçboylu

20 reviews

December 3, 2021

I read only the first 5 chapters; the rest was not of much interest to me.
The book starts with a really simple and fundamental problem which is nice. The author provides a couple of different solutions and compares them in terms of performance and also investigates how they actually work under the hood.

But that was all. There are no other problems like that. The rest reads like a walkthrough of the Spark RDD API's functionality. I am a little disappointed by this approach. I was expecting the author to provide new (and even more interesting) problems in each chapter.

That way, the book would be 5/5. Now it's just 3/5.

Benjamin Dean

26 reviews · 5 followers

December 23, 2022

helpful overview, but a little uneven. some kind of questionable python in places. a little confused in how much math to bring to the table—for example, makes a surprising amount of reference to group and category theory, but without a whole lot of discussion for the uninitiated.

spends more time on RDDs than DataFrames, which is a bit unfortunate.

overall, probably worth skimming if you’re new to the space. may be more useful in tackling specific problems…
