Details
This four-day Spark Developer course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in a thorough, hands-on overview of Apache Spark.
The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.
Objectives
After taking this class you will be able to:
- Describe Spark’s fundamental mechanics
- Use the core Spark APIs to operate on data
- Articulate and implement typical use cases for Spark
- Build data pipelines with Spark SQL and DataFrames
- Analyze Spark jobs using the UIs and logs
- Create Streaming and Machine Learning jobs
Prerequisites
• Required
◦ Basic to intermediate Linux knowledge, including the ability to use a text editor such as vi, and familiarity with basic commands such as mv, cp, ssh, grep, cd, useradd
◦ Knowledge of application development principles
• Recommended
◦ Knowledge of functional programming
◦ Knowledge of Scala or Python
◦ Beginner fluency with SQL
Course Overview
Lesson 1 – Introduction to Apache Spark (Day 1)
- Describe the features of Apache Spark
- Advantages of Spark
- How Spark fits in with the Big Data application stack
- How Spark fits in with Hadoop
- Define Apache Spark components
Lesson 2 – Load and Inspect Data in Apache Spark
- Describe different ways of getting data into Spark
- Create and use Resilient Distributed Datasets (RDDs)
- Apply transformations to RDDs
- Use actions on RDDs
- Load and inspect data in RDDs
- Cache intermediate RDDs
- Use Spark DataFrames for simple queries
- Load and inspect data in DataFrames
Lesson 3 – Build a Simple Apache Spark Application (Day 2)
- Define the lifecycle of a Spark program
- Define the function of SparkContext
- Create the application
- Define different ways to run a Spark application
- Run your Spark application
- Launch the application
Lesson 4 – Work with Pair RDDs (Day 2)
- Review loading and exploring data in RDDs
- Load and explore data in RDDs
- Describe and create pair RDDs
- Create and explore pair RDDs
- Control partitioning across nodes
Lesson 5 – Work with DataFrames (Day 3)
- Create DataFrames
◦ From existing RDDs
◦ From data sources
- Work with data in DataFrames
◦ Use DataFrame operations
◦ Use SQL
◦ Explore data in DataFrames
- Create user-defined functions (UDFs)
◦ UDFs used with Scala DSL
◦ UDFs used with SQL
◦ Create and use user-defined functions
- Repartition DataFrames
- Supplemental Lab: Build a standalone application
Lesson 6 – Monitor Apache Spark Applications (Day 3)
- Describe components of the Spark execution model
- Use the Spark Web UI to monitor Spark applications
- Debug and tune Spark applications
- Use the Spark Web UI
Lesson 7 – Introduction to Apache Spark Data Pipelines (Day 3)
- Identify components of the Apache Spark Unified Stack
- List benefits of Apache Spark over the Hadoop ecosystem
- Describe data pipeline use cases
Lesson 8 – Create an Apache Spark Streaming Application (Day 4)
- Describe Spark Streaming architecture
- Create DStreams and a Spark Streaming application
- Build and run a Streaming application which writes to HBase
- Apply operations on DStreams
- Define window operations
◦ Build and run a Streaming application with SQL
◦ Build and run a Streaming application with windows and SQL
- Describe how Streaming applications are fault-tolerant
Lesson 9 – Use Apache Spark GraphX (Day 4)
- Describe GraphX
- Define regular, directed, and property graphs
- Create a property graph
- Perform operations on graphs
Lesson 10 – Use Apache Spark MLlib (Day 4)
- Describe Spark MLlib
- Describe the Machine Learning techniques
◦ Classification
◦ Clustering
◦ Collaborative filtering
- Use collaborative filtering to predict user choice
- Load and inspect data using the Spark shell