
Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions

Gain hands-on experience in building an efficient and scalable big data architecture on Kubernetes, utilizing leading technologies such as Spark, Airflow, Kafka, and Trino

Key Features
Leverage Kubernetes in a cloud environment to integrate seamlessly with a variety of tools
Explore best practices for optimizing the performance of big data pipelines
Build end-to-end data pipelines and discover real-world use cases using popular tools like Spark, Airflow, and Kafka
Purchase of the print or Kindle book includes a free PDF eBook

Book Description
In today's data-driven world, organizations across sectors need scalable and efficient solutions for processing large volumes of data. Kubernetes offers an open-source and cost-effective platform for deploying and managing big data tools and workloads, ensuring optimal resource utilization and minimizing operational overhead. If you want to master the art of building and deploying big data solutions using Kubernetes, then this book is for you.

Written by an experienced data specialist, Big Data on Kubernetes takes you through the entire process of developing scalable and resilient data pipelines, with a focus on practical implementation. Starting with the basics, you’ll progress toward learning how to install Docker and run your first containerized applications. You’ll then explore Kubernetes architecture and understand its core components. This knowledge will pave the way for exploring a variety of essential tools for big data processing such as Apache Spark and Apache Airflow. You’ll also learn how to install and configure these tools on Kubernetes clusters. Throughout the book, you’ll gain hands-on experience building a complete big data stack on Kubernetes.

By the end, you’ll be equipped with the skills and knowledge needed to tackle real-world big data challenges with confidence.

What you will learn
Install and utilize Docker to run containers and build concise images
Gain a deep understanding of Kubernetes architecture and its components
Deploy and manage Kubernetes clusters on different cloud platforms
Implement and manage data pipelines using Apache Spark and Apache Airflow
Deploy and configure Apache Kafka for real-time data ingestion and processing
Build and orchestrate a complete big data pipeline using open-source tools
Connect AI and ML platforms with Kubernetes-based data architectures

Who this book is for
If you are a data engineer, BI analyst, data team leader, data architect, or tech manager with a basic understanding of big data technologies, then this book is for you. Familiarity with the basics of Python programming, SQL queries, and YAML is required to understand the topics discussed in this book.

Table of Contents
Getting Started with Containers
Kubernetes Architecture
Kubernetes - Hands On
The Modern Data Stack
Big Data processing with Apache Spark
Apache Airflow for building pipelines
Apache Kafka for real time events and data ingestion
Deploying the Big Data Stack on Kubernetes
Data consumption layer
Building a Big Data Pipeline on Kubernetes
AI/ML Workloads on Kubernetes
Where to go from here

466 pages, Kindle Edition

Published July 19, 2024

7 people want to read

Ratings & Reviews


Community Reviews

5 stars: 0 (0%)
4 stars: 0 (0%)
3 stars: 1 (100%)
2 stars: 0 (0%)
1 star: 0 (0%)
Displaying 1 of 1 review
Vladislav Ladenkov
12 reviews
January 3, 2025
The book is good; it gives you a basic overview of the modern stack you can use on Kubernetes.
However, it's too basic, and a lot of pages explain things one can get from a technology's quickstart page.
