Hadoop Fundamentals LiveLessons (Video Training)

Video Description

Apache Hadoop is a freely available open source toolset that enables big data analysis. This Hadoop Fundamentals LiveLessons tutorial demonstrates the core components of Hadoop, including the Hadoop Distributed File System (HDFS) and MapReduce. In addition, the tutorial demonstrates how to use Hadoop at several levels, including the native Java interface, C++ pipes, and the universal streaming program interface. Examples of high-level tools include the Pig scripting language and the Hive "SQL-like" interface. Finally, the steps for installing Hadoop on a desktop virtual machine, in a cloud environment, and on a local stand-alone cluster are presented. Topics covered in this tutorial apply to Hadoop version 2 (i.e., MR2 or YARN).

The source code repository for this LiveLesson can be found at www.clustermonkey.net/download/LiveLessons/Hadoop_Fundamentals/.

About the Author:

Douglas Eadline, PhD, began his career as a practitioner and a chronicler of the Linux Cluster HPC revolution and now documents big data analytics. Starting with the first Beowulf How To document, Dr. Eadline has written hundreds of articles, white papers, and instructional documents covering virtually all aspects of HPC. Prior to starting and editing the popular ClusterMonkey.net web site in 2005, he served as Editor-in-chief for ClusterWorld Magazine and was Senior HPC Editor for Linux Magazine. Currently, he is a consultant to the HPC industry and writes a monthly column in HPC Admin Magazine. Both clients and readers have recognized Dr. Eadline's ability to present a "technological value proposition" in a clear and accurate style. He has practical hands-on experience in many aspects of HPC, including hardware and software design, benchmarking, storage, GPU, cloud, and parallel computing.

Table of Contents

  1. Introduction
    1. Introduction to Hadoop Fundamentals LiveLessons 00:02:30
  2. Lesson 1: Background Concepts
    1. Learning objectives 00:00:35
    2. 1.1 Understand the problem Hadoop solves 00:10:37
    3. 1.2 Understand the Hadoop approach 00:03:36
    4. 1.3 Understand the Hadoop Project 00:06:42
  3. Lesson 2: Running Hadoop on a Desktop or Laptop
    1. Learning objectives 00:00:56
    2. 2.1 Install Hortonworks HDP Sandbox 00:09:53
  4. Lesson 3: The Hadoop Distributed File System
    1. Learning objectives 00:00:45
    2. 3.1 Understand HDFS basics 00:24:12
    3. 3.2a Use HDFS tools 00:17:53
    4. 3.2b Do HDFS administration 00:26:18
    5. 3.3 Use HDFS in programs 00:17:33
  5. Lesson 4: Hadoop MapReduce
    1. Learning objectives 00:00:46
    2. 4.1 Understand the MapReduce paradigm 00:07:45
    3. 4.2 Develop and run a Java MapReduce application 00:15:51
    4. 4.3 Understand how MapReduce works 00:17:45
  6. Lesson 5: Hadoop Examples
    1. Learning objectives 00:00:35
    2. 5.1 Use the Streaming Interface 00:10:37
    3. 5.2 Use the Pipes Interface 00:07:24
    4. 5.3 Run the Hadoop grep example 00:06:28
    5. 5.4 Debug MapReduce 00:11:04
  7. Lesson 6: Higher Level Tools
    1. Learning objectives 00:00:41
    2. 6.1 Use Pig 00:07:59
    3. 6.2 Use Hive 00:06:37
  8. Lesson 7: Setting Up Hadoop in the Cloud
    1. Learning objectives 00:00:35
    2. 7.1 Use Whirr to launch Hadoop in the Cloud 00:12:08
  9. Lesson 8: Set Up Hadoop on a Local Cluster
    1. Learning objectives 00:00:41
    2. 8.1 Specify and prepare servers 00:19:19
    3. 8.2 Install and configure Hadoop Core 00:27:00
    4. 8.3 Install and configure Pig and Hive 00:03:46
    5. 8.4 Install and configure Ganglia 00:05:05
    6. 8.5 Perform simple administration and monitoring 00:07:30