AT TECHTRADE,
WITH OUR INDUSTRY-ORIENTED TRAINING AND LIVE PROJECTS, WE EDUCATE YOU AND MAKE YOU JOB-READY.

Apache Hadoop - Big Data

What is Hadoop?

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework. 


How does Hadoop help us?

Hadoop splits files into large blocks and distributes them across the nodes in a cluster. To process the data, Hadoop ships packaged code to the nodes, and each node works in parallel on the portion of the data it holds. This approach takes advantage of data locality (nodes manipulate the data they already have local access to), so the dataset is processed faster and more efficiently than in a conventional supercomputer architecture that relies on a parallel file system, where computation and data are connected over high-speed networking. For example, with a 128 MB block size, a 1 GB file is stored as 8 blocks, each replicated on several nodes, so up to 8 tasks can work on the file at once.
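
To see data locality from the client side, here is a minimal Java sketch that uses Hadoop's FileSystem API to ask the NameNode which DataNodes hold each block of a file; the path /user/demo/input.txt is a hypothetical example, and the cluster address is assumed to come from the standard configuration files:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/input.txt");  // hypothetical input file
            FileStatus status = fs.getFileStatus(file);
            // Ask the NameNode which blocks make up the file and which DataNodes hold them
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset " + block.getOffset()
                        + ", length " + block.getLength()
                        + ", hosts " + String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }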


The Future of Hadoop and Where It's Going

One area where Big Data technologies are headed is an architectural revamp of data security, so that massive data sets can be collected, streamed, and analyzed in a relatively secure way. Organizations implementing real-time analytics will undoubtedly require such advanced data-security layers and capabilities.


Our Course Structure:

1. Introduction to Big Data & Hadoop - 6 Hrs
Introduction to Big Data
Introduction to Hadoop
Why Hadoop?
History of Hadoop
Components of Hadoop
A brief overview of HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper and so on…
Scope of Hadoop



2. HDFS (Storing the Data) - 12 Hrs

Introduction to HDFS
HDFS Design
The Role of HDFS in Hadoop
Features of HDFS
Daemons of Hadoop and their functionality: Name Node, Secondary Name Node, Job Tracker, Data Node, Task Tracker
Anatomy of a File Write
Anatomy of a File Read
Network Topology: Nodes, Racks, Data Centers
Parallel Copying using DistCp
Basic Configuration for HDFS
Data Organization: Blocks, Replication
Rack Awareness
Heartbeat Signal
How to Store Data in HDFS
How to Read Data from HDFS (see the Java sketch after this list)
Accessing HDFS (Introduction to Basic UNIX Commands)
CLI commands
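
As a preview of the store and read topics above, here is a minimal Java sketch against the HDFS FileSystem API; the path /user/demo/hello.txt is a made-up example, and cluster settings are assumed to come from the usual core-site.xml:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/hello.txt");   // hypothetical path

            // Store: write a small text file into HDFS (true = overwrite if it exists)
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeBytes("hello hadoop\n");
            }

            // Read: stream the same file back line by line
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }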



3. MapReduce using Java (Processing the Data) - 15 Hrs

Introduction to MapReduce
MapReduce Architecture
Data Flow in MapReduce: Splits, Mapper, Partitioning, Sort and Shuffle, Combiner, Reducer
Understanding the Difference Between a Block and an InputSplit
Role of RecordReader
Basic Configuration of MapReduce
MapReduce Life Cycle: Driver Code, Mapper, Reducer
How MapReduce Works
Writing and Executing the Basic MapReduce Program using Java
Submission & Initialization of MapReduce Job
File Input/Output Formats in MapReduce Jobs: Text Input Format, Key Value Input Format, Sequence File Input Format, NLine Input Format
Joins: Map-side Joins, Reduce-side Joins
Word Count Example (full program sketched after this list)
Partitioner MapReduce Program
Side Data Distribution: Distributed Cache (with program)
Counters (with program): Types of Counters (Task Counters, Job Counters, User-Defined Counters), Propagation of Counters
Job Scheduling
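
For reference, below is a sketch of the classic Word Count program this module builds up to: driver code, mapper, and reducer in one class, using the org.apache.hadoop.mapreduce API. This mirrors the standard Hadoop tutorial example; input and output paths come from the command line:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import java.io.IOException;
    import java.util.StringTokenizer;

    public class WordCount {

        // Mapper: emit (word, 1) for every token in the input split
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sum the counts for each word after sort and shuffle
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Driver: configure and submit the job
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // the combiner reuses the reducer
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }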



4. PIG - 10 Hrs
Introduction to Apache PIG
Introduction to PIG Data Flow Engine
MapReduce vs. PIG
When should PIG be used?
Data Types in PIG
Basic PIG programming
Modes of Execution in PIG: Local Mode, MapReduce Mode
Execution Mechanisms: Grunt Shell, Script, Embedded
Operators/Transformations in PIG
PIG UDFs with program
Word Count Example in PIG (embedded-mode sketch after this list)
MapReduce and PIG: Comparison
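
Below is a sketch of the PIG word count using the embedded execution mechanism listed above: Java drives a PigServer in local mode and registers Pig Latin statements. The input file input.txt and output directory wordcount_out are placeholders:

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigWordCount {
        public static void main(String[] args) throws Exception {
            // Embedded execution: run Pig Latin from Java in local mode
            PigServer pig = new PigServer(ExecType.LOCAL);
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("grouped = GROUP words BY word;");
            pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
            // STORE triggers execution of the whole data flow
            pig.store("counts", "wordcount_out");
            pig.shutdown();
        }
    }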



5. SQOOP - 6 Hrs
Introduction to SQOOP
Use of SQOOP
Connect to a MySQL Database
SQOOP Commands: Import, Export, Eval, Codegen, etc… (an import sketch follows this list)
Joins in SQOOP
Export to MySQL
Export to HBase
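
As a small illustration of the import command above, here is a hedged sketch that drives SQOOP from Java, assuming SQOOP 1.x on the classpath (Sqoop.runTool accepts the same arguments as the sqoop command line). The connection URL, credentials, and table name are placeholders:

    import org.apache.sqoop.Sqoop;

    public class SqoopImportDemo {
        public static void main(String[] args) {
            // Equivalent to running "sqoop import ..." on the command line
            String[] importArgs = {
                "import",
                "--connect", "jdbc:mysql://localhost:3306/testdb",  // hypothetical database
                "--username", "demo",
                "--password", "demo",
                "--table", "employees",                             // hypothetical table
                "--target-dir", "/user/demo/employees",
                "--num-mappers", "1"
            };
            int exitCode = Sqoop.runTool(importArgs);
            System.exit(exitCode);
        }
    }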



6. HIVE - 8 Hrs
Introduction to HIVE
HIVE Meta Store
HIVE Architecture
Tables in HIVE: Managed Tables, External Tables
Hive Data Types: Primitive Types, Complex Types
Partition
Joins in HIVE
HIVE UDFs and UDAFs with Programs
Word Count Example
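
As a taste of HIVE in practice, here is a hedged sketch of the word-count idea above, run through the HiveServer2 JDBC driver from Java. The table words, the credentials, and the default port 10000 are assumptions for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcDemo {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // HiveServer2 is assumed to be listening on its default port 10000
            try (Connection con = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hive", "");
                 Statement stmt = con.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS words (word STRING) "
                        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");
                // Word count in HiveQL over the hypothetical words table
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT word, COUNT(*) FROM words GROUP BY word")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                    }
                }
            }
        }
    }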



7. HBASE - 12 Hrs
Introduction to HBASE
Basic Configurations of HBASE
Fundamentals of HBase
NoSQL
HBase Data Model: Table and Row, Column Family and Column Qualifier, Cell and its Versioning
Categories of NoSQL Databases: Key-Value Database, Document Database, Column Family Database
HBASE Architecture: HMaster, Region Servers, Regions, MemStore, Store
SQL vs. NoSQL
HBASE and RDBMS: Comparison
HDFS vs. HBase
Client-side buffering or bulk uploads
Designing HBase Tables
HBase Operations: Get, Scan, Put, Delete (Put and Get sketched after this list)
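
A minimal sketch of the Put and Get operations above, assuming the HBase 1.x Java client API and a pre-created table named demo with one column family cf (both hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBasePutGet {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("demo"))) {

                // Put: write one cell (row key "row1", column family "cf", qualifier "name")
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("hadoop"));
                table.put(put);

                // Get: read the same cell back
                Get get = new Get(Bytes.toBytes("row1"));
                Result result = table.get(get);
                byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"));
                System.out.println("name = " + Bytes.toString(value));
            }
        }
    }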



8. MongoDB - 6 Hrs
What is MongoDB?
Where to Use?
Configuration on Windows
Inserting Data into MongoDB
Reading the MongoDB data
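
A small Java sketch of the insert and read steps above, assuming the MongoDB Java driver (the MongoClients API, driver 3.7 or later) and a local mongod on the default port 27017; the database and collection names are placeholders:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;

    public class MongoDemo {
        public static void main(String[] args) {
            // Assumes a local mongod listening on the default port 27017
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoDatabase db = client.getDatabase("testdb");           // hypothetical names
                MongoCollection<Document> students = db.getCollection("students");

                // Insert one document
                students.insertOne(new Document("name", "Asha").append("course", "Hadoop"));

                // Read matching documents back
                for (Document doc : students.find(new Document("course", "Hadoop"))) {
                    System.out.println(doc.toJson());
                }
            }
        }
    }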



9. Cluster Setup - 6 Hrs
Downloading and Installing Ubuntu 12.x
Installing Java
Installing Hadoop
Creating Cluster
Increasing/Decreasing the Cluster Size
Monitoring the Cluster Health
Starting and Stopping the Nodes



10. Zookeeper - 6 Hrs
Introduction to Zookeeper
Data Model
Operations
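
A minimal sketch of the basic operations (create, getData, delete) with the standard Zookeeper Java client; the server address localhost:2181 and the znode path /demo are assumptions for illustration:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkDemo {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // Connect to a local Zookeeper server (default port 2181), 5 s session timeout
            ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();   // the handshake is asynchronous; wait until the session is live

            // Create a persistent znode holding a small payload
            String path = zk.create("/demo", "hello".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Read the data back (false = no watch, null = no Stat)
            byte[] data = zk.getData(path, false, null);
            System.out.println(path + " = " + new String(data));

            zk.delete(path, -1);   // version -1 matches any version
            zk.close();
        }
    }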



11. OOZIE - 6 Hrs
Introduction to OOZIE
Use of OOZIE
Where to use OOZIE?
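
To give a feel for how OOZIE is used, here is a hedged sketch that submits a workflow through the OOZIE Java client API, assuming an OOZIE server at its default URL and a workflow application already deployed at a hypothetical HDFS path:

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;

    public class OozieSubmit {
        public static void main(String[] args) throws Exception {
            // OOZIE server assumed at its default URL
            OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

            Properties conf = oozie.createConfiguration();
            // Hypothetical workflow application deployed in HDFS
            conf.setProperty(OozieClient.APP_PATH, "hdfs://localhost:8020/user/demo/my-workflow");
            conf.setProperty("nameNode", "hdfs://localhost:8020");
            conf.setProperty("jobTracker", "localhost:8032");

            // Submit and start the workflow job, then report its status
            String jobId = oozie.run(conf);
            System.out.println("Workflow job " + jobId + " status: "
                    + oozie.getJobInfo(jobId).getStatus());
        }
    }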



12. Flume - 6 Hrs
Introduction to Flume
Uses of Flume
Flume Architecture: Flume Master, Flume Collectors, Flume Agents





