Hadoop / Big Data Training Syllabus

Take a Hadoop / Big Data training course in Chennai and learn how large datasets are stored and processed at scale. The curriculum covers both the fundamentals and the advanced concepts of Hadoop and Big Data in depth.

Our comprehensive Hadoop / Big Data training syllabus, curated by industry experts with 10+ years of experience, covers Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, and Pig in depth. As part of the Hadoop / Big Data course curriculum, we provide assignments, training materials, and placement assistance.

Big Data Introduction

  • What is Big Data
  • Evolution of Big Data
  • Benefits of Big Data
  • Operational vs Analytical Big Data
  • Need for Big Data Analytics
  • Big Data Challenges

Hadoop Cluster:

  • Master Nodes
    • NameNode
    • Secondary NameNode
    • JobTracker
  • Client Nodes
  • Slaves
  • Hadoop configuration
  • Setting up a Hadoop cluster

HDFS:

  • Introduction to HDFS
  • HDFS Features
  • HDFS Architecture
  • Blocks
  • Goals of HDFS
  • The NameNode & DataNode
  • Secondary NameNode
  • The JobTracker
  • The Process of a File Read
  • The Process of a File Write
  • Data Replication
  • Rack Awareness
  • HDFS Federation
  • Configuring HDFS
  • HDFS Web Interface
  • Fault tolerance
  • NameNode failure management
  • Access HDFS from Java
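
To make the Blocks and Data Replication topics above concrete, here is a minimal Python sketch of the arithmetic involved. It assumes the common defaults of a 128 MB block size and a replication factor of 3 (both are configurable, through `dfs.blocksize` and `dfs.replication`). The function name `hdfs_footprint` and the example file sizes are illustrative only and are not part of Hadoop.

```python
import math

BLOCK_SIZE_MB = 128      # assumed default block size (dfs.blocksize)
REPLICATION_FACTOR = 3   # assumed default replication factor (dfs.replication)


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the DataNodes.
    replicas = blocks * replication
    # A partial last block only occupies the space of the data it holds,
    # so raw storage is simply file size times replication.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


if __name__ == "__main__":
    print(hdfs_footprint(1024))  # a hypothetical 1 GB file
    print(hdfs_footprint(200))   # a hypothetical 200 MB file
```

Under these assumptions, a 1 GB file is split into 8 blocks and stored as 24 block replicas.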

YARN:

  • Introduction to YARN
  • Why YARN
  • Classic MapReduce vs YARN
  • Advantages of YARN
  • YARN Architecture
    • Resource Manager
    • Node Manager
    • Application Master
  • Application submission in YARN
  • Node Manager containers
  • Resource Manager components
  • YARN applications
  • Scheduling in YARN
    • Fair Scheduler
    • Capacity Scheduler
  • Fault tolerance

MapReduce:

  • What is MapReduce
  • Why MapReduce
  • How MapReduce works
  • Difference between Hadoop 1 & Hadoop 2
  • Identity mapper & reducer
  • Data flow in MapReduce
  • Input Splits
  • Relation Between Input Splits and HDFS Blocks
  • The flow of Job Submission in MapReduce
  • Job submission & Monitoring
  • MapReduce algorithms
    • Sorting
    • Searching
    • Indexing
    • TF-IDF

Hadoop Fundamentals:

  • What is Hadoop
  • History of Hadoop
  • Hadoop Architecture
  • Hadoop Ecosystem Components
  • How does Hadoop work
  • Why Hadoop & Big Data
  • Hadoop Cluster introduction
  • Cluster Modes
    • Standalone
    • Pseudo-distributed
    • Fully distributed
  • HDFS Overview
  • Introduction to MapReduce
  • Hadoop in demand

HDFS Operations:

  • Starting HDFS
  • Listing files in HDFS
  • Writing a file into HDFS
  • Reading data from HDFS
  • Shutting down HDFS

HDFS Command Reference:

  • Listing contents of the directory
  • Displaying and printing disk usage
  • Moving files & directories
  • Copying files and directories
  • Displaying file contents

Java Overview For Hadoop:

  • Object-oriented concepts
  • Variables and Data types
  • Static data type
  • Primitive data types
  • Objects & Classes
  • Java Operators
  • Method and its types
  • Constructors
  • Conditional statements
  • Looping in Java
  • Access Modifiers
  • Inheritance
  • Polymorphism
  • Method overloading & overriding
  • Interfaces

MapReduce Programming:

  • Hadoop data types
  • The Mapper Class
    • Map method
  • The Reducer Class
    • Shuffle Phase
    • Sort Phase
    • Secondary Sort
    • Reduce Phase
  • The Job class
    • Job class constructor
  • JobContext interface
  • Combiner Class
    • How Combiner works
    • Record Reader
    • Map Phase
    • Combiner Phase
    • Reducer Phase
    • Record Writer
  • Partitioners
    • Input Data
    • Map Tasks
    • Partitioner Task
    • Reduce Task
    • Compilation & Execution
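
The phases listed above (map, combine, partition, shuffle and sort, reduce) can be sketched as a small in-memory word count. This is a plain Python simulation for illustration, not the Hadoop Java API: the function names are hypothetical, and the partitioner uses a toy byte-sum hash rather than Hadoop's default hash partitioning.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: pick the reducer for a key (deterministic toy hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def run_job(lines, num_reducers=2):
    """Run the simulated job and return the final word counts."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        # Map and combine each record independently, like separate map tasks.
        for key, value in combine(map_phase(line)):
            # Shuffle: route the key to its partition and group its values.
            buckets[partition(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:
        # Sort: each reducer processes its keys in sorted order.
        for key in sorted(bucket):
            # Reduce: sum all partial counts for the key.
            result[key] = sum(bucket[key])
    return result


if __name__ == "__main__":
    print(run_job(["big data big", "data hadoop"]))
```

The combiner is optional in a real job; it is included here only to show where local aggregation reduces the amount of data that reaches the shuffle.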

Hadoop Ecosystem

Pig:

  • What is Apache Pig?
  • Why Apache Pig?
  • Pig features
  • Where should Pig be used
  • Where not to use Pig
  • The Pig Architecture
  • Pig components
  • Pig vs MapReduce
  • Pig vs SQL
  • Pig vs Hive
  • Pig Installation
  • Pig Execution Modes & Mechanisms
  • Grunt Shell Commands
  • Pig Latin – Data Model
  • Pig Latin Statements
  • Pig data types
  • Pig Latin operators
  • Case Sensitivity
  • Grouping & Co-grouping in Pig Latin
  • Sorting & Filtering
  • Joins in Pig Latin
  • Built-in Functions
  • Writing UDFs
  • Macros in Pig

HBase:

  • What is HBase
  • History of HBase
  • The NoSQL Scenario
  • HBase & HDFS
  • Physical Storage
  • HBase vs RDBMS
  • Features of HBase
  • HBase Data model
  • Master server
  • Region servers & Regions
  • HBase Shell
  • Create table and column family
  • The HBase Client API

Spark:

  • Introduction to Apache Spark
  • Features of Spark
  • Spark built on Hadoop
  • Components of Spark
  • Resilient Distributed Datasets
  • Data Sharing using Spark RDD
  • Iterative Operations on Spark RDD
  • Interactive Operations on Spark RDD
  • Spark shell
  • RDD transformations
  • Actions
  • Programming with RDD
    • Start Shell
    • Create RDD
    • Execute Transformations
    • Caching Transformations
    • Applying Action
    • Checking output
  • GraphX overview

Impala:

  • Introducing Cloudera Impala
  • Impala Benefits
  • Features of Impala
  • Relational databases vs Impala
  • How Impala works
  • Architecture of Impala
  • Components of Impala
    • The Impala Daemon
    • The Impala Statestore
    • The Impala Catalog Service
  • Query Processing Interfaces
  • Impala Shell Command Reference
  • Impala Data Types
  • Creating & deleting databases and tables
  • Inserting & overwriting table data
  • Record Fetching and ordering
  • Grouping records
  • Using the Union clause
  • Working of Impala with Hive
  • Impala vs Hive vs HBase

MongoDB Overview:

  • Introduction to MongoDB
  • MongoDB vs RDBMS
  • Why & Where to use MongoDB
  • Databases & Collections
  • Inserting & querying documents
  • Schema Design
  • CRUD Operations

Oozie & Hue Overview:

  • Introduction to Apache Oozie
  • Oozie Workflow
  • Oozie Coordinators
  • Property File
  • Oozie Bundle system
  • CLI and extensions
  • Overview of Hue

Hive:

  • What is Hive?
  • Features of Hive
  • The Hive Architecture
  • Components of Hive
  • Installation & configuration
  • Primitive types
  • Complex types
  • Built-in functions
  • Hive UDFs
  • Views & Indexes
  • Hive Data Models
  • Hive vs Pig
  • Co-groups
  • Importing data
  • Hive DDL statements
  • Hive Query Language
  • Data types & Operators
  • Type conversions
  • Joins
  • Sorting & controlling data flow
  • Local vs MapReduce mode
  • Partitions
  • Buckets

Sqoop:

  • Introducing Sqoop
  • Sqoop installation
  • Working of Sqoop
  • Understanding connectors
  • Importing data from MySQL to Hadoop HDFS
  • Selective imports
  • Importing data to Hive
  • Importing to HBase
  • Exporting data to MySQL from Hadoop
  • Controlling import process

Flume:

  • What is Flume?
  • Applications of Flume
  • Advantages of Flume
  • Flume architecture
  • Data flow in Flume
  • Flume features
  • Flume Event
  • Flume Agent
    • Sources
    • Channels
    • Sinks
  • Log Data in Flume

ZooKeeper Overview:

  • ZooKeeper Introduction
  • Distributed Application
  • Benefits of Distributed Applications
  • Why use ZooKeeper
  • ZooKeeper Architecture
  • Hierarchical Namespace
  • Znodes
  • Stat structure of a Znode
  • Electing a leader

Kafka Basics:

  • Messaging Systems
    • Point-to-Point
    • Publish-Subscribe
  • What is Kafka
  • Kafka Benefits
  • Kafka Topics & Logs
  • Partitions in Kafka
  • Brokers
  • Producers & Consumers
  • What are Followers
  • Kafka Cluster Architecture
  • Kafka as a Pub-Sub Messaging
  • Kafka as a Queue Messaging
  • Role of ZooKeeper
  • Basic Kafka Operations
    • Creating a Kafka Topic
    • Listing Topics
    • Starting Producer
    • Starting Consumer
    • Modifying a Topic
    • Deleting a Topic
  • Integration With Spark

Scala Basics:

  • Introduction to Scala
  • Spark & Scala interdependence
  • Objects & Classes
  • Class definition in Scala
  • Creating Objects
  • Scala Traits
  • Basic Data Types
  • Operators in Scala
  • Control structures
  • Fields in Scala
  • Functions in Scala
  • Collections in Scala
    • Mutable collection
    • Immutable collection

Project

Conclusion

Have you gone through our Hadoop / Big Data course syllabus? If it looks like the industry-oriented Hadoop / Big Data training curriculum you were looking for, our Hadoop / Big Data training in Chennai is the right place for you. Enrol now.