Login

eastmananoop · 08-16-2017, 10:33 PM

HADOOP DISTRIBUTED FILE SYSTEM

[attachment=16321]

Introduction

Apache Hadoopis asoftware frameworkthat supports data-intensivedistributed applicationsunder afree license.It enables applications to work with thousands of nodes andpetabytesof data.
Hadoop was inspired byGoogle'sMapReduceandGoogle File System(GFS) papers.
Hadoop is a Apacheproject being built and used by a global community of contributors, using theJavaprogramming language.
Yahoo!has been the largest contributor to the project, and uses Hadoop extensively across its businesses.
Hadoop was created byDoug Cutting, who named it after his son's toy elephant.

Benefits of Hadoop

Hadoop is designed to run on cheap commodity hardware
It automatically handles data replication and node failure
It does the hard work you can focus on processing data
Cost Saving and efficient and reliable data processing

Overall Architecture

A small Hadoop cluster will include a single master and multiple worker nodes.
The master node consists of:-
Jobtracker
Tasktracker
Namenode
Datanode.
A slave orworker nodeconsists of a datanodeand tasktracker, though it is possible to have data-only worker nodes, and compute-only worker nodes