Data Algorithms: Recipes for Scaling Up with Hadoop and Spark

Name: Data Algorithms: Recipes for Scaling Up with Hadoop and Spark
Rating: 3.52 (2 reviews)
ISBN: 9781491906132

Mahmoud Parsian

If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You'll learn how to implement the appropriate MapReduce solution with code that you can use in your projects.

Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark.

Topics include:

Market basket analysis for a large set of transactions
Data mining algorithms (K-means, KNN, and Naive Bayes)
Using huge genomic data to sequence DNA and RNA
Naive Bayes theorem and Markov chains for data and market prediction
Recommendation algorithms and pairwise document similarity
Linear regression, Cox regression, and Pearson correlation
Allelic frequency and mining DNA
Social network analysis (recommendation systems, counting triangles, sentiment analysis)

Genres: Computers, Computer Science

778 pages, Kindle Edition

First published August 1, 2014

11 people are currently reading

82 people want to read

About the author

Mahmoud Parsian

7 books, 1 follower

Community Reviews

5 stars: 5 (18%)
4 stars: 10 (37%)
3 stars: 8 (29%)
2 stars: 2 (7%)
1 star: 2 (7%)

Xianshun Chen

90 reviews, 4 followers

February 3, 2021

A very nice book that teaches how to implement machine learning and data mining techniques such as NBC, recommenders, clustering, etc. The code is written in Java, and the book provides implementations for both Hadoop MapReduce and Apache Spark in a clean, simple-to-understand manner. I have re-coded most of the algorithms in the book, except for the chapters dealing with the bio material, which I am not particularly interested in at the moment.