Hadoop / Big Data Training Syllabus

    Take a Hadoop / Big Data Training Course in Chennai and master modern methods of storing and processing data. The course curriculum covers both the fundamentals and the more advanced concepts of Hadoop / Big Data in depth.

    Our comprehensive Hadoop / Big Data training syllabus, curated by industry experts with 10+ years of experience, covers Hadoop Ecosystem components such as HDFS, YARN, MapReduce, Hive, and Pig in depth. As part of the Hadoop / Big Data course curriculum, we provide assignments, training materials, and placement assistance.

    Hadoop / Big Data Training Syllabus

    Big Data Introduction

    • What is Big Data
    • Evolution of Big Data
    • Benefits of Big Data
    • Operational vs Analytical Big Data
    • Need for Big Data Analytics
    • Big Data Challenges

    Hadoop Cluster:

    • Master Nodes
      • NameNode
      • Secondary NameNode
      • JobTracker
    • Client Nodes
    • Slaves
    • Hadoop configuration
    • Setting up a Hadoop cluster

    HDFS:

    • Introduction to HDFS
    • HDFS Features
    • HDFS Architecture
    • Blocks
    • Goals of HDFS
    • The NameNode & DataNode
    • Secondary NameNode
    • The JobTracker
    • The Process of a File Read
    • The Process of a File Write
    • Data Replication
    • Rack Awareness
    • HDFS Federation
    • Configuring HDFS
    • HDFS Web Interface
    • Fault tolerance
    • NameNode failure management
    • Access HDFS from Java
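
    As a rough illustration of the Blocks and Data Replication topics above, the sketch below estimates how many blocks a file occupies and how much raw storage its replicas consume. The 128 MB block size and replication factor of 3 are the usual Hadoop 2+ defaults and are treated as assumptions here; the function name and the 300 MB file are hypothetical.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (block_count, raw_bytes_stored) for a single file.

    Assumed defaults: 128 MB blocks and a replication factor of 3.
    The final block only takes up as much space as the data it holds,
    so raw storage is simply file size times the replication factor.
    """
    # Integer ceiling division: a partly filled last block still counts.
    block_count = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    raw_bytes = file_size_bytes * replication
    return block_count, raw_bytes


blocks, raw = hdfs_footprint(300 * MB)
print(blocks, raw // MB)  # 3 blocks, 900 MB of raw storage
```

    A 300 MB file therefore splits into two full blocks plus one partial block, and each of the three blocks is stored three times across the cluster.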

    YARN:

    • Introduction to YARN
    • Why YARN
    • Classic MapReduce vs YARN
    • Advantages of YARN
    • YARN Architecture
      • ResourceManager
      • NodeManager
      • ApplicationMaster
    • Application submission in YARN
    • NodeManager containers
    • ResourceManager components
    • YARN applications
    • Scheduling in YARN
      • Fair Scheduler
      • Capacity Scheduler
    • Fault tolerance

    MapReduce:

    • What is MapReduce
    • Why MapReduce
    • How MapReduce works
    • Difference between Hadoop 1 & Hadoop 2
    • Identity mapper & reducer
    • Data flow in MapReduce
    • Input Splits
    • Relation Between Input Splits and HDFS Blocks
    • The flow of Job Submission in MapReduce
    • Job submission & Monitoring
    • MapReduce algorithms
      • Sorting
      • Searching
      • Indexing
      • TF-IDF
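
    To give a feel for the "How MapReduce works" and "Data flow in MapReduce" topics above, here is a minimal single-machine sketch of the map, shuffle/sort, and reduce steps using a word count. It is plain Python rather than the Hadoop API, and all function names are hypothetical; a real job distributes these steps across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle: collect all values that share a key, then sort by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reducer: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


print(word_count(["big data", "Big Hadoop"]))
# [('big', 2), ('data', 1), ('hadoop', 1)]
```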

    Hadoop Fundamentals:

    • What is Hadoop
    • History of Hadoop
    • Hadoop Architecture
    • Hadoop Ecosystem Components
    • How does Hadoop work
    • Why Hadoop & Big Data
    • Hadoop Cluster introduction
    • Cluster Modes
      • Standalone
      • Pseudo-distributed
      • Fully distributed
    • HDFS Overview
    • Introduction to MapReduce
    • Hadoop in demand

    HDFS Operations:

    • Starting HDFS
    • Listing files in HDFS
    • Writing a file into HDFS
    • Reading data from HDFS
    • Shutting down HDFS

    HDFS Command Reference:

    • Listing contents of the directory
    • Displaying and printing disk usage
    • Moving files & directories
    • Copying files and directories
    • Displaying file contents

    Java Overview For Hadoop:

    • Object-oriented concepts
    • Variables and Data types
    • Static data type
    • Primitive data types
    • Objects & Classes
    • Java Operators
    • Method and its types
    • Constructors
    • Conditional statements
    • Looping in Java
    • Access Modifiers
    • Inheritance
    • Polymorphism
    • Method overloading & overriding
    • Interfaces

    MapReduce Programming:

    • Hadoop data types
    • The Mapper Class
      • Map method
    • The Reducer Class
      • Shuffle Phase
      • Sort Phase
      • Secondary Sort
      • Reduce Phase
    • The Job class
      • Job class constructor
    • JobContext interface
    • Combiner Class
      • How Combiner works
      • Record Reader
      • Map Phase
      • Combiner Phase
      • Reducer Phase
      • Record Writer
    • Partitioners
      • Input Data
      • Map Tasks
      • Partitioner Task
      • Reduce Task
      • Compilation & Execution
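
    The Partitioners topic above can be sketched in a few lines. Hadoop's default HashPartitioner picks a reducer by taking the key's non-negative hash code modulo the number of reduce tasks; the sketch below applies the same modulo idea in plain Python. Here zlib.crc32 is only a stand-in for the Java hashCode(), and the function names and sample pairs are hypothetical.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index in [0, num_reducers) from a stable hash of the key."""
    # crc32 is a stand-in hash (an assumption); Hadoop uses the key's hashCode().
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Distribute (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


pairs = [("big", 1), ("data", 1), ("big", 1), ("hadoop", 1)]
for index, bucket in enumerate(route(pairs, 2)):
    print(index, bucket)
```

    Because the partition depends only on the key, every pair with the same key lands on the same reducer, which is what lets the reducer see all values for that key together.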

    Hadoop Ecosystem

    Pig:

    • What is Apache Pig?
    • Why Apache Pig?
    • Pig features
    • Where should Pig be used
    • Where not to use Pig
    • The Pig Architecture
    • Pig components
    • Pig vs MapReduce
    • Pig vs SQL
    • Pig vs Hive
    • Pig Installation
    • Pig Execution Modes & Mechanisms
    • Grunt Shell Commands
    • Pig Latin – Data Model
    • Pig Latin Statements
    • Pig data types
    • Pig Latin operators
    • Case Sensitivity
    • Grouping & Co-grouping in Pig Latin
    • Sorting & Filtering
    • Joins in Pig Latin
    • Built-in Functions
    • Writing UDFs
    • Macros in Pig

    HBase:

    • What is HBase
    • History of HBase
    • The NoSQL Scenario
    • HBase & HDFS
    • Physical Storage
    • HBase vs RDBMS
    • Features of HBase
    • HBase Data model
    • Master server
    • Region servers & Regions
    • HBase Shell
    • Create table and column family
    • The HBase Client API

    Spark:

    • Introduction to Apache Spark
    • Features of Spark
    • Spark built on Hadoop
    • Components of Spark
    • Resilient Distributed Datasets
    • Data Sharing using Spark RDD
    • Iterative Operations on Spark RDD
    • Interactive Operations on Spark RDD
    • Spark shell
    • RDD transformations
    • Actions
    • Programming with RDD
      • Start Shell
      • Create RDD
      • Execute Transformations
      • Caching Transformations
      • Applying Action
      • Checking output
    • GraphX overview

    Impala:

    • Introducing Cloudera Impala
    • Impala Benefits
    • Features of Impala
    • Relational databases vs Impala
    • How Impala works
    • Architecture of Impala
    • Components of Impala
      • The Impala Daemon
      • The Impala Statestore
      • The Impala Catalog Service
    • Query Processing Interfaces
    • Impala Shell Command Reference
    • Impala Data Types
    • Creating & deleting databases and tables
    • Inserting & overwriting table data
    • Fetching & ordering records
    • Grouping records
    • Using the Union clause
    • How Impala works with Hive
    • Impala vs Hive vs HBase

    MongoDB Overview:

    • Introduction to MongoDB
    • MongoDB vs RDBMS
    • Why & Where to use MongoDB
    • Databases & Collections
    • Inserting & querying documents
    • Schema Design
    • CRUD Operations

    Oozie & Hue Overview:

    • Introduction to Apache Oozie
    • Oozie Workflow
    • Oozie Coordinators
    • Property File
    • Oozie Bundle system
    • CLI and extensions
    • Overview of Hue

    Hive:

    • What is Hive?
    • Features of Hive
    • The Hive Architecture
    • Components of Hive
    • Installation & configuration
    • Primitive types
    • Complex types
    • Built-in functions
    • Hive UDFs
    • Views & Indexes
    • Hive Data Models
    • Hive vs Pig
    • Co-groups
    • Importing data
    • Hive DDL statements
    • Hive Query Language
    • Data types & Operators
    • Type conversions
    • Joins
    • Sorting & controlling data flow
    • Local vs MapReduce mode
    • Partitions
    • Buckets

    Sqoop:

    • Introducing Sqoop
    • Sqoop installation
    • How Sqoop works
    • Understanding connectors
    • Importing data from MySQL to Hadoop HDFS
    • Selective imports
    • Importing data to Hive
    • Importing data to HBase
    • Exporting data to MySQL from Hadoop
    • Controlling import process

    Flume:

    • What is Flume?
    • Applications of Flume
    • Advantages of Flume
    • Flume architecture
    • Data flow in Flume
    • Flume features
    • Flume Event
    • Flume Agent
      • Sources
      • Channels
      • Sinks
    • Log Data in Flume

    ZooKeeper Overview:

    • ZooKeeper Introduction
    • Distributed Application
    • Benefits of Distributed Applications
    • Why use ZooKeeper
    • ZooKeeper Architecture
    • Hierarchical Namespace
    • Znodes
    • Stat structure of a Znode
    • Electing a leader

    Kafka Basics:

    • Messaging Systems
      • Point-to-Point
      • Publish-Subscribe
    • What is Kafka
    • Kafka Benefits
    • Kafka Topics & Logs
    • Partitions in Kafka
    • Brokers
    • Producers & Consumers
    • What are Followers
    • Kafka Cluster Architecture
    • Kafka as a Pub-Sub Messaging System
    • Kafka as a Queue Messaging System
    • Role of ZooKeeper
    • Basic Kafka Operations
      • Creating a Kafka Topic
      • Listing Topics
      • Starting Producer
      • Starting Consumer
      • Modifying a Topic
      • Deleting a Topic
    • Integration With Spark

    Scala Basics:

    • Introduction to Scala
    • Spark & Scala interdependence
    • Objects & Classes
    • Class definition in Scala
    • Creating Objects
    • Scala Traits
    • Basic Data Types
    • Operators in Scala
    • Control structures
    • Fields in Scala
    • Functions in Scala
    • Collections in Scala
      • Mutable collection
      • Immutable collection

    Project

    Conclusion

    Have you gone through our Hadoop / Big Data course syllabus? If it looks like the industry-oriented Hadoop / Big Data training curriculum you need, our Hadoop / Big Data training in Chennai is your ideal destination. Enrol now.