Rack Awareness by Sohel Teli

Sohel Teli August 29, 2018 Big Data

Rack Awareness:

Rack awareness is knowledge of the cluster topology, or more specifically of how the data nodes are distributed across the racks of a Hadoop cluster. This knowledge matters because of one assumption: data nodes collocated in the same rack have more bandwidth and lower latency between them, whereas two data nodes in separate racks have comparatively less bandwidth and higher latency.

The main purposes of rack awareness are:

- Increasing the availability of data blocks
- Better cluster performance

Let us assume the cluster has 9 data nodes with a replication factor of 3.

Let us also assume that there are 3 physical racks where these machines are placed:

Rack1: DN1;DN2;DN3

Rack2: DN4;DN5;DN6

Rack3: DN7;DN8;DN9
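
One common way to give Hadoop this rack layout is a topology script, referenced by the `net.topology.script.file.name` property in core-site.xml. Hadoop calls the script with one or more host names or IP addresses as arguments and reads one rack path per argument from its standard output. The following is a minimal sketch for the example cluster above; the host names `dn1` to `dn9` and the rack paths are assumptions made for illustration, not values from a real cluster.

```python
#!/usr/bin/env python3
"""Hypothetical rack topology script for the 9-node example cluster."""
import sys

# Assumed host names; a real cluster would list its own hosts or IP addresses.
RACK_BY_HOST = {
    "dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack1",
    "dn4": "/rack2", "dn5": "/rack2", "dn6": "/rack2",
    "dn7": "/rack3", "dn8": "/rack3", "dn9": "/rack3",
}

# Hosts that are not in the table fall back to a single default rack.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per argument, one per line.
    print("\n".join(resolve(sys.argv[1:])))
```

Keeping the lookup in a plain dictionary makes the mapping easy to review; larger clusters often derive the rack from the IP subnet or from an external inventory file instead.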

The following diagram depicts an example block placement when HDFS and YARN are not rack aware:

What happens if Rack1 goes down? Potentially, the data in Block1 is lost, because all of its replicas may sit in Rack1.
- Without rack awareness, the entire cluster is treated as if it were placed in a single default rack (default-rack).

The following diagram depicts an example block placement when HDFS and YARN are rack aware:

What happens if Rack1 goes down? We still have replicas of the block on data nodes in other racks.

So evidently rack awareness increases data availability. The HDFS balancer and the decommissioning of data nodes are also rack-aware operations.
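
The availability argument can be checked with a small simulation. The sketch below is a simplified model, not HDFS code: it assumes the commonly described default placement for a replication factor of 3 (first replica on the writer's node, second on a node in a different rack, third on another node in the second replica's rack), and it models the non-rack-aware case as picking any three distinct nodes. All function names are hypothetical.

```python
import random

# Rack layout of the example cluster.
RACKS = {
    "Rack1": ["DN1", "DN2", "DN3"],
    "Rack2": ["DN4", "DN5", "DN6"],
    "Rack3": ["DN7", "DN8", "DN9"],
}
RACK_OF = {dn: rack for rack, nodes in RACKS.items() for dn in nodes}


def place_rack_aware(writer, rng):
    """Simplified rack-aware placement for replication factor 3."""
    local_rack = RACK_OF[writer]
    # The second and third replicas share one remote rack, on different nodes.
    remote_rack = rng.choice([r for r in RACKS if r != local_rack])
    second, third = rng.sample(RACKS[remote_rack], 2)
    return [writer, second, third]


def place_rack_unaware(writer, rng):
    """Simplified placement when every node appears to be in one default rack."""
    others = [dn for dn in RACK_OF if dn != writer]
    return [writer] + rng.sample(others, 2)


def survives_rack_failure(replicas, failed_rack):
    """True if at least one replica lives outside the failed rack."""
    return any(RACK_OF[dn] != failed_rack for dn in replicas)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same placements
    aware = place_rack_aware("DN1", rng)
    unaware = place_rack_unaware("DN1", rng)
    print("rack aware:  ", aware, survives_rack_failure(aware, "Rack1"))
    print("rack unaware:", unaware, survives_rack_failure(unaware, "Rack1"))
```

In this model a rack-aware placement always spans two racks, so it survives the loss of any single rack. A non-rack-aware placement can land on DN1, DN2 and DN3, in which case losing Rack1 loses the block.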

What about performance?

- Faster replication. With the default placement policy and a replication factor of 3, two of a block's replicas are placed on different data nodes in the same rack, so the transfer between them benefits from the higher bandwidth and lower latency inside that rack.
- Better data locality for YARN. If YARN cannot create a container on the data node that holds the requested data, it tries to create the container on another data node in the same rack. This performs better than a container in a different rack because of the higher bandwidth and lower latency between data nodes in the same rack.
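
The idea of "closeness" behind both points can be sketched as a distance over topology paths. The example below assumes two-level paths of the form `/rack/node` and counts the hops from each node up to their closest common ancestor, which gives 0 for the same node, 2 for two nodes in the same rack and 4 for nodes in different racks. The function names are hypothetical, and the locality labels are used here only as descriptive names for the three cases.

```python
def distance(path_a, path_b):
    """Sum of hops from each node up to their closest common ancestor."""
    parts_a = path_a.strip("/").split("/")
    parts_b = path_b.strip("/").split("/")
    common = 0
    for a, b in zip(parts_a, parts_b):
        if a != b:
            break
        common += 1
    return (len(parts_a) - common) + (len(parts_b) - common)


def locality(data_path, container_path):
    """Classify a container's location relative to the data it reads.

    Assumes both paths are two-level /rack/node paths.
    """
    d = distance(data_path, container_path)
    if d == 0:
        return "NODE_LOCAL"   # container runs on the node that holds the data
    if d == 2:
        return "RACK_LOCAL"   # different node, same rack
    return "OFF_SWITCH"       # different rack


if __name__ == "__main__":
    data = "/rack1/dn1"
    for candidate in ["/rack1/dn1", "/rack1/dn2", "/rack2/dn4"]:
        print(candidate, distance(data, candidate), locality(data, candidate))
```

A scheduler that prefers the smallest distance will pick the data's own node first, then a node in the same rack, and only then a node in another rack, which matches the fallback order described above.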
