
Hortonworks Data Platform (HDP) Developer: Java Training

Online Training by  Agilitics

Details

Hortonworks Data Platform (HDP) Developer: Java

This course provides Java programmers a deep-dive into Hadoop application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, custom input and output formats, joining large datasets, unit testing, and developing UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training.

Prerequisites
Students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. No prior Hadoop knowledge is required.

Target Audience

Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop.

Format: 50% Lecture/Discussion, 50% Hands-On Labs

Agenda Summary

Day 1: Understanding Hadoop, the Hadoop Distributed File System (HDFS) and MapReduce
Day 2: Partitioning, Sorting and Input/Output Formats
Day 3: Optimizing MapReduce Jobs, Advanced MapReduce Features and HBase Programming
Day 4: Pig and Hive Programming, Defining Workflows

Day 1 Objectives

• Describe Hadoop 2.X and the Hadoop Distributed File System
• Describe the YARN framework
• Describe the Purpose of NameNodes and DataNodes
• Describe the Purpose of HDFS High Availability (HA)
• Describe the Purpose of the Quorum Journal Manager
• List Common HDFS Commands
• Describe the Purpose of YARN
• List Open-Source YARN Use Cases
• List the Components of YARN
• Describe the Life Cycle of a YARN Application
• Define Map Aggregation
• Describe the Purpose of Combiners
• Describe the Purpose of In-Map Aggregation
• Describe the Purpose of Counters
• Describe the Purpose of User-Defined Counters

Day 1 Labs And Demonstrations
• Demonstration: Understanding Block Storage
• Configuring a Hadoop Development Environment
• Putting Files in HDFS with Java
• Demonstration: Understanding MapReduce
• Word Count
• Distributed Grep
• Inverted Index
• Using a Combiner
• Computing an Average
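The Day 1 combiner and in-map aggregation ideas can be sketched without a cluster. The following plain-Java toy (not Hadoop's `Mapper` API; class and method names are illustrative) shows the core of the "Using a Combiner" lab: rather than emitting a `(word, 1)` pair per token, the map side accumulates counts locally and emits each word once per split, which reduces shuffle traffic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of map-side word count with in-map aggregation.
// In real MapReduce the localCounts entries would be emitted to the
// shuffle instead of returned.
class WordCountSketch {

    // Stands in for one Mapper processing one input split (a list of lines).
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> localCounts = new HashMap<>(); // aggregation buffer
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    localCounts.merge(token, 1, Integer::sum); // combine locally
                }
            }
        }
        return localCounts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countWords(List.of("Hadoop is fast", "hadoop scales out"));
        System.out.println(counts.get("hadoop")); // 2
    }
}
```

A combiner achieves the same reduction after the map phase; in-map aggregation does it inside the mapper itself, trading memory for fewer emitted records.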

Day 2 Objectives
• Describe the Purpose of a Partitioner
• List the Steps for Writing a Custom Partitioner
• Describe How to Create and Distribute a Partition File
• Describe the Purpose of Sorting
• Describe the Purpose of Custom Keys
• Describe How to Write a Group Comparator
• List the Built-In Input Formats
• Describe the Purpose of Input Formats
• Define a Record Reader
• Describe How to Handle Records that Span Splits
• List the Built-In Output Formats
• Describe How to Write a Custom Output Format
• Describe the Purpose of the MultipleOutputs Class
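The custom-key and group-comparator objectives above come together in the secondary-sort pattern, which can be sketched in plain Java (the record and comparator names here are illustrative, not Hadoop's `WritableComparator` API): pack the value into a composite key, sort by both fields, but group reducer input by the natural key alone, so each reduce call sees its values already ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of the secondary-sort pattern.
class SecondarySortSketch {

    // Composite key: natural key plus the value we want sorted.
    record CompositeKey(String naturalKey, int value) {}

    // Sort comparator: natural key first, then value, so values arrive ordered.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing(CompositeKey::naturalKey)
                  .thenComparingInt(CompositeKey::value);

    // Group comparator: natural key only, so all values for a key reduce together.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing(CompositeKey::naturalKey);

    public static void main(String[] args) {
        List<CompositeKey> shuffle = new ArrayList<>(List.of(
            new CompositeKey("b", 3),
            new CompositeKey("a", 9),
            new CompositeKey("a", 1)));
        shuffle.sort(SORT);
        // First record is now ("a", 1): grouped by key, values ascending.
        System.out.println(shuffle.get(0).value()); // 1
    }
}
```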

Day 2 Labs And Demonstrations
• Writing a Custom Partitioner
• Using TotalOrderPartitioner
• Custom Sorting
• Demonstration: Combining Input Files
• Processing Multiple Inputs
• Writing a Custom Input Format
• Customizing Output
• Working with a Simple Moving Average
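For the custom-partitioner lab, the key idea is a single method deciding which reducer receives each key. The sketch below mirrors the arithmetic of Hadoop's default `HashPartitioner` (mask off the sign bit, then mod by the number of reducers); the `customPartition` rule is purely hypothetical, for illustration.

```java
// Plain-Java sketch of partitioner logic. A real Hadoop Partitioner
// subclasses org.apache.hadoop.mapreduce.Partitioner and overrides
// getPartition; here we just show the arithmetic.
class PartitionSketch {

    // Mirrors the default HashPartitioner: mask the sign bit, then mod.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical custom rule: route keys by first letter instead of hash,
    // e.g. to produce alphabetically ranged output files.
    static int customPartition(String key, int numReduceTasks) {
        int bucket = (Character.toLowerCase(key.charAt(0)) - 'a')
                     * numReduceTasks / 26;
        return Math.min(Math.max(bucket, 0), numReduceTasks - 1); // clamp
    }

    public static void main(String[] args) {
        int p = hashPartition("hadoop", 4);
        System.out.println(p >= 0 && p < 4); // always within [0, numReduceTasks)
    }
}
```

The contract to remember: the same key must always map to the same partition, and the result must lie in `[0, numReduceTasks)`.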

Day 3 Objectives
• List Optimization Best Practices
• Describe How to Optimize the Map and Reduce Phases
• Describe the Benefits of Data Compression
• Describe the Limits of Data Compression
• Describe the Configuration of Data Compression
• Describe the Purpose of a RawComparator
• Describe the Purpose of Localization
• List Scenarios for Performing Joins in MapReduce
• Describe the Purpose of the Bloom Filter
• Describe the Purpose of MRUnit and the MRUnit API
• Describe How to Set Up a Test
• Describe How to Test a Mapper
• Describe How to Test a Reducer
• Describe the Purpose of HBase
• Define the Differences Between a Relational Database and HBase
• Describe the HBase Architecture
• Demonstrate the Basics of HBase Programming
• Describe an HBase MapReduce Application

Day 3 Labs
• Using Data Compression
• Defining a RawComparator
• Performing a Map-Side Join
• Using a Bloom Filter
• Unit Testing a MapReduce Job
• Importing Data to HBase
• Creating an HBase MapReduce Job
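The Bloom filter lab applies a classic join optimization: build a filter over the smaller dataset's join keys, distribute it to the mappers, and use it to drop records from the larger dataset that cannot possibly match. The toy below illustrates the data structure itself; it is not Hadoop's `org.apache.hadoop.util.bloom` API, and the hashing scheme is a simplified stand-in.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: k bit positions per key, false positives
// possible, false negatives impossible.
class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th bit position from the key (toy double hashing).
    private int position(String key, int i) {
        int h = key.hashCode();
        int h2 = (h >>> 16) | 1;              // second hash, forced odd
        return Math.floorMod(h + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(position(key, i));
    }

    // True means "possibly present"; false means "definitely absent",
    // which is what lets a mapper safely skip non-matching records.
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(position(key, i))) return false;
        return true;
    }
}
```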

Day 4 Objectives
• Describe the Purpose of Apache Pig and Pig Latin
• Demonstrate the Use of the Grunt Shell
• List the Common Pig Data Types
• Describe the Purpose of the FOREACH GENERATE Operator
• Describe the Purpose of Pig User Defined Functions (UDFs)
• Describe the Purpose of Filter Functions
• Describe the Purpose of Accumulator UDFs
• Describe the Purpose of Algebraic Functions
• Describe the Purpose of Apache Hive
• Describe the Differences Between Apache Hive and SQL
• Describe Apache Hive Architecture
• Describe How to Load Data Into Hive
• Demonstrate How to Perform Queries
• Describe the Purpose of Hive User Defined Functions (UDFs)
• Write a Hive UDF
• Describe the Purpose of HCatalog
• Describe the Purpose of Apache Oozie
• Describe How to Define an Oozie Workflow
• Describe Pig and Hive Actions
• Describe How to Define an Oozie Coordinator Job

Day 4 Labs And Demonstrations
• Demonstration: Understanding Pig
• Writing a Pig UDF
• Writing a Pig Accumulator
• Writing an Apache Hive UDF
• Defining an Oozie Workflow
• Working with TF-IDF and the JobControl Class
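The final lab's TF-IDF statistic, which the course chains across several MapReduce jobs with the `JobControl` class, reduces to a small formula: tf-idf(term, doc) = tf × log(N / df), where tf is the term's count in the document, N the number of documents, and df the number of documents containing the term. A plain-Java sketch of the computation (illustrative names, in-memory rather than distributed):

```java
import java.util.List;

// Plain-Java sketch of TF-IDF; in the lab each factor (tf, df, N) would
// come from a separate MapReduce job wired together with JobControl.
class TfIdfSketch {

    static double tfIdf(String term, List<List<String>> docs, int docIndex) {
        long tf = docs.get(docIndex).stream().filter(term::equals).count();
        long df = docs.stream().filter(d -> d.contains(term)).count();
        if (tf == 0 || df == 0) return 0.0;
        return tf * Math.log((double) docs.size() / df);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("hadoop", "hdfs", "hadoop"),
            List.of("pig", "hive"));
        // "hadoop" appears twice in doc 0 and in 1 of 2 docs:
        System.out.println(tfIdf("hadoop", docs, 0)); // 2 * ln(2)
    }
}
```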
Agilitics Pte. Ltd. is a renowned Big Data analytics firm headquartered in Singapore with operations in multiple countries. They are experts in big data and believe in spreading knowledge for the betterment of the Big Data community and in growing a bigger and better talent pool for the industry.