
Overview

This four-day Spark Developer course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in a thorough, hands-on overview of Apache Spark.

 

The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.

 

Objectives

After taking this class you will be able to:

  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with Spark SQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Create streaming and machine learning jobs
Prerequisites

 

  • Required

    ◦ Basic to intermediate Linux knowledge, including:

      - The ability to use a text editor, such as vi

      - Familiarity with basic command-line utilities such as mv, cp, ssh, grep, cd, useradd

    ◦ Knowledge of application development principles

 

  • Recommended

    ◦ Knowledge of functional programming

    ◦ Knowledge of Scala or Python

    ◦ Beginner fluency with SQL


Course Overview

Lesson 1 – Introduction to Apache Spark (Day 1)

 

  • Describe the features of Apache Spark
  • Advantages of Spark
  • How Spark fits in with the Big Data application stack
  • How Spark fits in with Hadoop
  • Define Apache Spark components

 

Lesson 2 – Load and Inspect Data in Apache Spark

 

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformations to RDDs
  • Use actions on RDDs
  • Load and inspect data in RDDs
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
  • Load and inspect data in DataFrames

 

Lesson 3 – Build a Simple Apache Spark Application (Day 2)

 

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
  • Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
  • Launch the application
Lesson 4 – Work with Pair RDDs (Day 2)

 

  • Review loading and exploring data in RDDs
  • Load and explore data in RDDs
  • Describe and create Pair RDDs
  • Create and explore Pair RDDs
  • Control partitioning across nodes

Lesson 5 – Work with DataFrames (Day 3)

 

  • Create DataFrames

    ◦ From existing RDDs

    ◦ From data sources

  • Work with data in DataFrames

    ◦ Use DataFrame operations

    ◦ Use SQL

    ◦ Explore data in DataFrames

  • Create user-defined functions (UDFs)

    ◦ UDFs used with the Scala DSL

    ◦ UDFs used with SQL

    ◦ Create and use user-defined functions

  • Repartition DataFrames
  • Supplemental Lab: Build a standalone application

 

Lesson 6 – Monitor Apache Spark Applications (Day 3)

 

  • Describe components of the Spark execution model
  • Use the Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Use the Spark Web UI

 

Lesson 7 – Introduction to Apache Spark Data Pipelines (Day 3)

 

  • Identify components of the Apache Spark Unified Stack
  • List benefits of Apache Spark over the Hadoop ecosystem
  • Describe data pipeline use cases

 

Lesson 8 – Create an Apache Spark Streaming Application (Day 4)

  • Describe Spark Streaming architecture
  • Create DStreams and a Spark Streaming application
  • Build and run a Streaming application which writes to HBase
  • Apply operations on DStreams
  • Define window operations

    ◦ Build and run a Streaming application with SQL

    ◦ Build and run a Streaming application with windows and SQL

  • Describe how Streaming applications are fault-tolerant

 

Lesson 9 – Use Apache Spark GraphX (Day 4)

 

  • Describe GraphX
  • Define regular, directed, and property graphs
  • Create a property graph
  • Perform operations on graphs
  • Apply graph operations

 

Lesson 10 – Use Apache Spark MLlib (Day 4)

 

  • Describe Spark MLlib
  • Describe machine learning techniques

    ◦ Classification

    ◦ Clustering

    ◦ Collaborative filtering

  • Use collaborative filtering to predict user choice
  • Load and inspect data using the Spark shell
Agilitics Pte. Ltd. is a renowned Big Data analytics firm headquartered in Singapore, with operations in multiple countries. Its experts in big data believe in spreading knowledge for the betterment of the Big Data community and in growing a bigger, better talent pool for the industry.