Apache Spark is an open-source big-data processing framework built around speed, ease of use, and sophisticated analytics. Spark has several advantages over other big-data technologies such as Hadoop MapReduce and Storm. It provides a comprehensive, unified framework for managing big-data processing requirements across datasets that are diverse in nature (text data, graph data, etc.) and that come from a variety of sources (batch versus real-time streaming data). Spark enables applications in HDFS clusters to run up to a hundred times faster than Hadoop MapReduce in memory, and ten times faster even when running on disk. In this mini-book, the reader will learn about the Apache Spark framework and will develop Spark programs for big-data analysis use cases. The book covers all the libraries that are part of the Spark ecosystem: Spark Core, Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX.
This mini-book provides a good, fast-paced introduction to big-data processing with Apache Spark, an open-source big-data processing framework. The author explains the different parts of the Spark stack and how each is used. The book includes good samples showing how to use Spark across different topics.