Course Agenda
Module 1: Getting Familiar with Spark
- Apache Spark in the Big Data Landscape and Its Purpose
- Apache Spark vs. Hadoop MapReduce
- Components of the Spark Stack
- Downloading and installing Spark
- Launching Spark
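The install-and-launch steps above boil down to a few commands (a sketch, assuming a prebuilt release downloaded from spark.apache.org; the version in the filename is illustrative, not prescriptive):

```shell
# Unpack a prebuilt Spark release (example version; pick yours from the download page)
tar -xzf spark-3.5.1-bin-hadoop3.tgz
cd spark-3.5.1-bin-hadoop3

# Launch the interactive Scala shell against a local master using all cores
./bin/spark-shell --master "local[*]"

# Or launch the Python shell instead
./bin/pyspark --master "local[*]"
```

The `local[*]` master string runs Spark in a single JVM, which is enough for the exercises in the early modules.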
Module 2: Working with Resilient Distributed Datasets (RDDs)
- Transformations and Actions on RDDs
- Loading and Saving Data in RDDs
- Key-Value Pair RDDs
- MapReduce and Pair RDD Operations
- Working with Sequence Files
- Using Partitioners and Their Impact on Performance
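The RDD operations above can be sketched in plain Python as a conceptual analogue (no Spark required; real RDDs are distributed, partitioned across a cluster, and lazily evaluated, none of which this illustrates):

```python
from functools import reduce

# Plain-Python analogue of common RDD operations (illustration only).
data = [1, 2, 3, 4, 5]

# Transformations: each builds a new dataset from an existing one
squares = list(map(lambda x: x * x, data))           # like rdd.map(...)
evens = list(filter(lambda x: x % 2 == 0, squares))  # like rdd.filter(...)

# Action: collapses the dataset down to a result on the driver
total = reduce(lambda a, b: a + b, evens)            # like rdd.reduce(...)

# Key-value pairs: a reduceByKey-style word count
words = ["spark", "rdd", "spark", "pair", "rdd", "spark"]
pairs = [(w, 1) for w in words]                      # like rdd.map(lambda w: (w, 1))
counts = {}
for key, value in pairs:                             # like pairRdd.reduceByKey(add)
    counts[key] = counts.get(key, 0) + value
```

In Spark itself the transformations build a lineage graph and nothing runs until an action is called; the eager Python version above only mirrors the shape of the API, not its execution model.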
Module 3: Spark Application Programming
- Master SparkContext
- Initialize Spark with Java
- Create and Run a Real-Time Project with Spark
- Pass functions to Spark
- Submit Spark applications to the cluster
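One point behind "pass functions to Spark": the functions you hand to transformations are serialized and shipped to remote executors, so they must be serializable. A local illustration using Python's pickle (an analogue only; Spark uses its own serialization machinery):

```python
import pickle

def add_one(x):
    # A top-level function serializes cleanly; closures over open resources
    # or methods dragging in large objects are the usual failure cases.
    return x + 1

payload = pickle.dumps(add_one)    # "ship" the function to a worker
restored = pickle.loads(payload)   # "receive" and rebuild it remotely
result = [restored(x) for x in [1, 2, 3]]
```

The same concern drives the cluster-submission step: the application and its functions are packaged and handed to the cluster via `spark-submit`.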
Module 4: Spark Libraries
Module 5: Spark configuration, monitoring, and tuning
- Understand the various components of a Spark cluster
- Configure Spark by modifying:
  - Spark properties
  - environment variables
  - logging properties
- Visualizing Jobs and DAGs
- Monitor Spark using the web UIs, metrics, and external instrumentation
- Understand performance tuning requirements
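The three configuration surfaces listed above map to files under Spark's conf/ directory: spark-defaults.conf for properties, spark-env.sh for environment variables, and log4j.properties (log4j2.properties on recent Spark versions) for logging. A minimal spark-defaults.conf sketch (values are illustrative, not tuning recommendations):

```
# conf/spark-defaults.conf -- example values only
spark.master              spark://master:7077
spark.driver.memory       2g
spark.executor.memory     4g
spark.serializer          org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled    true
```

Enabling the event log is what lets the history server and web UI reconstruct the job and DAG visualizations covered in this module.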
Module 6: Spark Streaming
- Understanding the Streaming Architecture – DStreams and RDD batches
- Receivers
- Common transformations and actions on DStreams
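The micro-batch idea behind DStreams can be sketched in plain Python: the streaming engine slices the input into small batches (each backed by an RDD) and applies the same transformation to every batch. A conceptual analogue with no Spark involved (batching by count here; real DStreams batch by time interval):

```python
def micro_batches(stream, batch_size):
    """Yield consecutive fixed-size batches, loosely like DStream's RDD batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:          # flush the final partial batch
        yield batch

# A map-like transformation applied per batch, as on a DStream
incoming = [3, 1, 4, 1, 5, 9, 2, 6]
processed = [[x * 10 for x in batch] for batch in micro_batches(incoming, 3)]
```

Receivers fit into this picture as the components that ingest records from a source and buffer them until the next batch boundary.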
Module 7: MLlib and GraphX