Hadoop Architecture and Administration

5 Day Course
Hands On
Code BIGDAT4


Overview

This course provides delegates with a comprehensive understanding of all the steps necessary to install, operate and maintain a Hadoop cluster using the Apache Hadoop Distribution.

From installation and configuration through load balancing, security and tuning, this course will provide hands-on preparation for the real-world challenges faced by Hadoop administrators.

Objectives

After completing this course, you will be able to:

  • Understand Hadoop Architecture and Administration
  • Deploy a Hadoop Cluster
  • Run Applications
  • Configure a Hadoop Cluster
  • Tune Hadoop performance
  • Manage and maintain a Hadoop Cluster
  • Secure Hadoop
  • Manage the Hadoop ecosystem

Modules

Understanding Big Data and Hadoop (4 topics)

  • Big data
  • Apache Hadoop 2.0
  • Hadoop cluster components and ecosystem
  • Role of a Hadoop cluster administrator

Planning (7 topics)

  • Planning a Hadoop 2.0 cluster
  • Cluster sizing
  • Hardware
  • Network and software considerations
  • Popular Hadoop distributions
  • Workload and usage patterns
  • Industry recommendations

Hadoop Architecture and Cluster Setup (7 topics)

  • Hadoop server roles
  • Hadoop installation
  • Installing Hadoop daemons
  • Initial configuration
  • Deploying a multi-node Hadoop cluster
  • Optimizing the network architecture
  • Installing Hadoop clients

HDFS (4 topics)

  • Defining key design assumptions and architecture
  • Setting basic configuration parameters
  • Configuring and setting up the file system
  • Issuing commands from the console

Creating a fault-tolerant file system (4 topics)

  • Isolating single points of failure
  • Maintaining high availability
  • Triggering manual failover
  • Automating failover with ZooKeeper

YARN (2 topics)

  • YARN architecture
  • Identifying the new daemons

MapReduce (4 topics)

  • MapReduce fundamentals
  • Installing and setting up the MapReduce environment
  • Providing redundancy and load balancing with rack awareness
  • Working with schedulers

Administering MapReduce (3 topics)

  • Managing MapReduce jobs
  • Tracking progress with monitoring tools
  • Commissioning compute nodes

Ecosystem information access (2 topics)

  • Enabling SQL-like querying with Hive
  • Installing Pig to create MapReduce jobs

Ecosystem additional elements (2 topics)

  • Using HBase for a tabular view on HDFS
  • Configuring Oozie to schedule workflows

Spark Ecosystem (3 topics)

  • Introduction to Spark
  • Overview of Spark application programming
  • Spark configuration, monitoring and tuning

Backup, Recovery and Maintenance (4 topics)

  • Data backup and recovery
  • Enabling trash
  • Namespace quotas
  • Manual failover or metadata recovery

Employing Built-in Tools (2 topics)

  • Managing processes using JVM metrics
  • Benchmarking to ensure continuous performance

Tuning (2 topics)

  • Using Ganglia to assess performance
  • Benchmarking to verify performance
