Course Outline
Introduction
- Introduction to Cloud Computing and Big Data solutions
- Overview of Apache Hadoop Features and Architecture
Setting up Hadoop
- Planning a Hadoop cluster (on-premise, cloud, etc.)
- Selecting the OS and Hadoop distribution
- Provisioning resources (hardware, network, etc.)
- Downloading and installing the software
- Sizing the cluster for flexibility
Working with HDFS
- Understanding the Hadoop Distributed File System (HDFS)
- Overview of the HDFS Command Reference
- Accessing HDFS
- Performing Basic File Operations on HDFS
- Using S3 as a complement to HDFS
Overview of MapReduce
- Understanding Data Flow in the MapReduce Framework
- Map, Shuffle, Sort and Reduce
- Demo: Computing Top Salaries
Working with YARN
- Understanding resource management in Hadoop
- Working with ResourceManager, NodeManager, and ApplicationMaster
- Scheduling jobs under YARN
- Scheduling for large numbers of nodes and clusters
- Demo: Job scheduling
Integrating Hadoop with Spark
- Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.)
- Understanding Resilient Distributed Datasets (RDDs)
- Creating an RDD
- Implementing RDD Transformations
- Demo: Implementing a Text Search Program for Movie Titles
Managing a Hadoop Cluster
- Monitoring Hadoop
- Securing a Hadoop cluster
- Adding and removing nodes
- Running a performance benchmark
- Tuning a Hadoop cluster to optimize performance
- Backup, recovery and business continuity planning
- Ensuring high availability (HA)
Upgrading and Migrating a Hadoop Cluster
- Assessing workload requirements
- Upgrading Hadoop
- Moving from on-premise to cloud and vice versa
- Recovering from failures
Troubleshooting
Summary and Conclusion
Requirements
- System administration experience
- Experience with Linux command line
- An understanding of big data concepts
Audience
- System administrators
- DBAs