Introduction to Big Data

Machine Learning

Jesus A. Gonzalez

August 10, 2019

Introduction

Hadoop

What is Big Data?

Hadoop Examples

Hadoop doesn’t solve just any problem

Solutions for Big Data

Big Data and the Cloud

Hadoop Architecture

Hadoop Distributed File System (HDFS)

Hadoop Replication

MapReduce Engine

Types of Nodes

Types of Nodes: NameNode

Types of Nodes: DataNode

Types of Nodes: JobTracker

Types of Nodes: TaskTracker

Topology Awareness

Writing a File to HDFS

Command Line Interface

MapReduce

Reduce Operation

Hadoop MapReduce

Example of a MapReduce Job

MapReduce Operations in Sequence

Fundamental Data Types

Data Flow Example

Fault Tolerance
