Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark. In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You'll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script. With this book, you

435 pages, Paperback

Published May 17, 2022

1 person is currently reading
20 people want to read

About the author

Mahmoud Parsian

7 books, 1 follower

Ratings & Reviews

Community Reviews

5 stars: 0 (0%)
4 stars: 3 (33%)
3 stars: 5 (55%)
2 stars: 1 (11%)
1 star: 0 (0%)
Displaying 1 - 2 of 2 reviews
Evren Yortuçboylu
20 reviews
December 3, 2021
I read only the first 5 chapters; the rest was not of much interest to me.
The book starts with a really simple and fundamental problem, which is nice. The author provides a couple of different solutions, compares them in terms of performance, and investigates how they actually work under the hood.

But that was all. There are no other problems like that. The rest is like being taught the functionality of the Spark RDD API. I am a little disappointed by this approach. I was expecting the author to provide new (and even more interesting) problems in each chapter.

That way, the book would be 5/5. Now it's just 3/5.
Benjamin Dean
26 reviews, 5 followers
December 23, 2022
helpful overview, but a little uneven. some kind of questionable Python in places. a little confused about how much math to bring to the table: for example, it makes a surprising number of references to group and category theory, but without a whole lot of discussion for the uninitiated.

spends more time on RDDs than DataFrames, which is a bit unfortunate.

overall, probably worth skimming if you’re new to the space. may be more useful in tackling specific problems…