
Big Data and Hadoop Developer

Total course duration: 5 days (40 hrs)
Each day has one 8-hour session

Module 1: Hadoop Architecture

What is Big Data, Hadoop Architecture, Hadoop ecosystem components, Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework, Hadoop Server Roles: NameNode, Secondary NameNode, and DataNode, Anatomy of File Write and Read.

Module 2: Hadoop Cluster Configuration and Data Loading

Hadoop Cluster Architecture, Hadoop Cluster Configuration files, Hadoop Cluster Modes, Multi-Node Hadoop Cluster, A Typical Production Hadoop Cluster, MapReduce Job execution, Common Hadoop Shell commands, Data Loading Techniques: Flume, Sqoop, Hadoop Copy Commands, Hadoop Project: Data Loading.

Module 3: Hadoop MapReduce Framework

Hadoop Data Types, Hadoop MapReduce paradigm, Map and Reduce tasks, MapReduce Execution Framework, Partitioners and Combiners, Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs), Output Formats (Text Output, Binary Output, Multiple Outputs), Hadoop Project: MapReduce Programming.

Module 4: Advanced MapReduce

Counters, Custom Writables, Unit Testing: JUnit and MRUnit testing framework, Error Handling, Tuning, Advanced MapReduce, Hadoop Project: Advanced MapReduce programming and error handling.

Module 5: Pig and Pig Latin

Installing and Running Pig, Grunt, Pig's Data Model, Pig Latin, Developing & Testing Pig Latin Scripts, Writing Evaluation, Filter, Load & Store Functions, Hadoop Project: Pig Scripting.

Module 6: Hive and HiveQL

Hive Architecture and Installation, Comparison with Traditional Databases, HiveQL: Data Types, Operators and Functions, Hive Tables (Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables), Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, Views, Map-side and Reduce-side Joins to optimize queries).

Module 7: Hadoop Project Environment

Some of the data sets you may work with as part of the project work:

I. Twitter Data Analysis: Download Twitter data, put it in HBase, and use Pig, Hive and MapReduce to measure the popularity of some hashtags.

II. Stack Exchange Ranking and Percentile Dataset: A dataset from Stack Overflow containing the ranking and percentile details of users.

III. Loan Dataset: Covers users who have taken loans, along with their EMI details, time period, etc.
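
As a rough illustration of the kind of processing involved in the Twitter project, the sketch below counts hashtags using the map, shuffle, and reduce steps of the MapReduce paradigm in plain Python. It is not course material and does not use Hadoop, HBase, Pig, or Hive; the function names (`map_tweet`, `shuffle`, `reduce_counts`, `hashtag_popularity`) and the sample tweets are hypothetical, chosen only for this example.

```python
import re
from collections import defaultdict

# Assumption for this sketch: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def map_tweet(text):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    return [(tag.lower(), 1) for tag in HASHTAG.findall(text)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one hashtag."""
    return key, sum(values)


def hashtag_popularity(tweets):
    """Run map, shuffle, and reduce in memory; most popular hashtag first."""
    pairs = [pair for text in tweets for pair in map_tweet(text)]
    counts = [reduce_counts(k, v) for k, v in shuffle(pairs).items()]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = ["Learning #Hadoop and #Pig", "#hadoop rocks", "no tags here"]
    for tag, count in hashtag_popularity(sample):
        print(tag, count)
```

On a real cluster the map and reduce functions would run as distributed tasks and the framework would perform the shuffle; here everything runs in one process to keep the idea visible.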

Pricing

25,000 INR + 14% Service Tax

 

Register for the Big Data (Hadoop Developer) batch coming up in September before 31st July and get 20% off. ...