Address Big Data challenges with the fast and scalable features of Spark
About This Book
Get to know the fundamentals of Spark 2
Learn to process Big Data faster for sharper analytics
Unlock the capabilities of various Spark components to perform efficient data processing, machine learning, and graph processing
Dive deeper and explore various facets of data science with Spark

Who This Book Is For
Are you a technologist who wants to expand your knowledge to perform data science operations in Spark, a data scientist who wants to understand how algorithms are implemented in Spark, or a newbie with minimal development experience who wants to learn about Big Data analytics? If so, this course is ideal for you.

What You Will Learn
An introduction to Big Data and data science
Understand Spark and its ecosystem of packages in data science
Consolidate, clean, and transform your data acquired from various data sources
Understand the Spark machine learning algorithms to build a simple pipeline
Explore graphical techniques to see what your data looks like
Build a recommendation engine

In Detail
When people want a way to process Big Data at speed, Spark is invariably the solution. With its ease of development (in comparison to the relative complexity of Hadoop), it’s unsurprising that it’s becoming popular with data analysts and engineers everywhere. It is one of the most widely used large-scale data processing engines and runs extremely fast.
The aim of the course is to make you comfortable and confident performing real-time data processing using Spark.
Let’s take a look at the learning journey. The course begins with the basics of Spark 2 and covers the core data processing framework and API, installation, and application development setup. Then, you’ll be introduced to the Spark programming model through real-world examples. Next, you’ll learn how to collect, clean, and visualize the data coming from Twitter with Spark Streaming. Then, you will get acquainted with Spark machine learning algorithms and different machine learning techniques. You will also learn to apply statistical analysis and mining operations to your dataset. The course will give you ideas on how to perform analysis, including graph processing. Finally, we will take up an end-to-end case study and apply all that we have learned so far.
By the end of the course, you should be able to put your learnings into practice for faster, slicker Big Data projects.
Style and approach
This course is a hands-on tutorial that starts with the fundamentals of Spark. After you have a good grip on Spark and have worked with a few examples, you will start to use Spark for all data science-related work. By the end of this course, you will have acquired the skills that will help you become a more comprehensive data scientist.
This course is a blend of text, videos, code examples, and assessments, all packaged up keeping your journey in mind. The curator of this course has combined some of the best that Packt has to offer in one complete package. It includes content from the following Packt products:
Data Science with Spark by Eric Charles
Spark for Data Science by Bikramaditya Singhal and Srinivas Duvvuri
Apache Spark 2 for Beginners by Rajanarayanan Thottuvaikkatumana