Friday , November 8 2024

Hadoop-3.1.0 Multi-Node Connection and Configuration the Easy Way

HADOOP MULTI NODE INSTALLATION PROCESS

[ Note: I am setting up this multi-node cluster in the simplest possible way, creating and running everything as root on the default users (master, slave1, slave2, etc.). Please don't create dedicated Hadoop users such as hadoop or hduser to configure and share the Hadoop installation. I am using the root user on every machine to make the multi-node connections.]

Step 1: Update one or all packages on your system

Step 2: Update packages taking obsoletes into account
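On a CentOS/RHEL-style system (which the /etc/sysconfig/iptables paths later in this guide imply), steps 1 and 2 map onto yum roughly as follows; run them as root on every machine:

```shell
# Step 1: update one or all packages on the system
yum -y update

# Step 2: update packages taking obsoletes into account
yum -y upgrade
```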

Step 3: Check the hostname in master & slave systems and rename it accordingly

***check host name of master and slaves by using below command***

***rename hostname of your machines***

In Master machine:

master

***similarly for slaves***

In Slave Machines:

slave1
slave2
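On a systemd-based distribution the hostname can be checked and set with hostnamectl; a sketch for the three machines:

```shell
# check the current hostname (run on each machine)
hostname

# rename the machines (run the matching line on each box)
hostnamectl set-hostname master    # on the master machine
hostnamectl set-hostname slave1    # on the first slave
hostnamectl set-hostname slave2    # on the second slave
```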

Step 4: Edit /etc/hosts

Now let's SSH into master, slave1, and slave2 and edit the /etc/hosts file on each, so that we can use hostnames instead of IP addresses whenever we want to reach or ping any of these machines.

***Add the ip address and hostname of your master and slave machines by using below lines in master, slave1, slave2***

192.168.1.9 master
192.168.1.3 slave1
192.168.1.23 slave2

Step 5: Generate a password-less SSH key:

Install OpenSSH Server:

Make sure port 22 is open:

Edit /etc/sysconfig/iptables (IPv4 firewall):

***If iptables is not installed on the machine, please follow step 6 to install it***

***Add the below line and save it***

Start OpenSSH Service:

If your site uses IPv6, and you are editing ip6tables, use the line:

***Add the below line and save it***

-A RH-Firewall-1-INPUT -m tcp -p tcp --dport 22 -j ACCEPT

Restart iptables:

Generate public ssh key:

In master machine

Create ssh key:

***create password-less public ssh key by using below command***

*** adding the public key to the authorized_keys file by using below command***

Set right and Authorizing permission to ssh key:
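A typical sequence for the key-generation and permission steps above, assuming the root account's default ~/.ssh paths:

```shell
# create a password-less RSA key pair (empty passphrase)
ssh-keygen -t rsa -P ""

# add our own public key to the authorized_keys file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# set the permissions sshd insists on
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```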

Copy ssh key from master to slave1:

***Only the public key is copied to the servers (slave1, slave2) using the commands below. The private key should never be copied to another machine***

[Note: If the above command shows an error, please follow step 6 to configure and disable the firewall and iptables]

Copy ssh key from master to slave2:

[Note: If the above command shows an error, please follow step 6 to configure and disable the firewall and iptables]
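The usual tool for copying only the public key to another host is ssh-copy-id; a sketch run from the master:

```shell
# push the public key to each slave (prompts for the root password once per host)
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2
```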

Test the new key:

***pinging ssh connection from master to itself***

***pinging ssh connection from master to slave1 machine***

***pinging ssh connection from master to slave2 machine***
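Testing boils down to confirming that each hop no longer asks for a password, e.g.:

```shell
ssh master    # from master to itself; type 'exit' to return
ssh slave1    # should log straight in without a password prompt
ssh slave2
```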

In slave1 machine

Create ssh key:

***create password-less public ssh key by using below command***

*** adding the public key to the authorized_keys file by using below command***

Set right and Authorizing permission to ssh key:

Copy ssh key from slave1 to master:

***Only the public key is copied to the other machines (master, slave2) using the commands below. The private key should never be copied to another machine***

[Note: If the above command shows an error, please follow step 6 to configure and disable the firewall and iptables]

Copy ssh key from slave1 to slave2:

[Note: If the above command shows an error, please follow step 6 to configure and disable the firewall and iptables]

Test the new keys:

***pinging ssh connection from slave1 to itself***

***pinging ssh connection from slave1 to master machine***

***pinging ssh connection from slave1 to slave2 machine***

In slave2 machine

Create ssh key:

***create password-less public ssh key by using below command***

*** adding the public key to the authorized_keys file by using below command***

Set right and Authorizing permission to ssh key:

Copy ssh key from slave2 to master:

***Only the public key is copied to the other machines (master, slave1) using the commands below. The private key should never be copied to another machine***

[Note: If the above command shows an error, please follow step 6 to configure and disable the firewall and iptables]

Copy ssh key from slave2 to slave1:

[Note: If the above command shows an error, please follow step 6 to configure and disable the firewall and iptables]

Test the new keys:

***pinging ssh connection from slave2 to itself***

***pinging ssh connection from slave2 to master machine***

***pinging ssh connection from slave2 to slave1 machine***

Step 6: Disable Firewall and Iptables

[Note: Follow these steps only if you're facing problems with the SSH connection or pinging between the master and slave machines]

Disable firewalld:

Stop firewalld:

Check the status of firewalld:

***Since the firewalld service should not be started while the iptables services are running, we prevent the firewalld service from starting automatically at boot***

Install iptables:

Enable iptables:

Start iptables:

Stop iptables:

Stop ip6tables:
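On a CentOS 7-style system this whole step can be sketched with systemctl and yum:

```shell
# disable, stop and inspect firewalld
systemctl disable firewalld
systemctl stop firewalld
systemctl status firewalld
# keep firewalld from ever starting while iptables is in use
systemctl mask firewalld

# install and manage the classic iptables services
yum -y install iptables-services
systemctl enable iptables
systemctl start iptables

# if SSH between the machines still fails, stop the firewalls entirely
systemctl stop iptables
systemctl stop ip6tables
```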

Step 7: Installing Java 1.8

[ Note: Install Java individually on the master machine and on each slave machine]

***Download jdk 1.8 from oracle website***

***Extract jdk-1.8 rpm file by using below command***

***move the extracted java file from /home/Download to /usr/local ***
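Assuming the tarball flavour of the JDK was downloaded (the exact file name and version, jdk-8u171 here, are placeholders for whatever you fetched from the Oracle website):

```shell
cd /home/Download
# unpack the JDK archive
tar -xzf jdk-8u171-linux-x64.tar.gz
# move the extracted Java folder to /usr/local
mv jdk1.8.0_171 /usr/local/
```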

Step 8 : Installing Hadoop

[ Note: Download and install Hadoop on the master machine alone, then share the Hadoop installation folder with the slave machines]

***Use the below command to download hadoop-3.1.0 in master system***

***move the hadoop-3.1.0.tar.gz file from /home/Download to /usr/local in master system***

***To extract or untar the hadoop-3.1.0 file***
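The Apache archive still serves this release, so the download and extract steps look roughly like:

```shell
# download hadoop-3.1.0 on the master system
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz

# move the tarball to /usr/local and untar it there
mv hadoop-3.1.0.tar.gz /usr/local/
cd /usr/local
tar -xzf hadoop-3.1.0.tar.gz
```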

Step 9: Edit Hadoop Configuration Files

1. Edit ~/.bash_profile

****Add the below lines*****

***source it to reflect changes***

***Now Check the JAVA VERSION using below command***

***Now Check the HADOOP VERSION using below command***
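The lines to add depend on where Java ended up; with the paths used in this guide (the JDK folder name is a placeholder), ~/.bash_profile would gain something like:

```shell
# ~/.bash_profile additions (adjust the JDK folder name to your download)
export JAVA_HOME=/usr/local/jdk1.8.0_171
export HADOOP_HOME=/usr/local/hadoop-3.1.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# reload the profile, then verify both versions
source ~/.bash_profile
java -version
hadoop version
```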

2. Edit core-site.xml

****Add the below lines*****
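A minimal core-site.xml for this cluster points the default filesystem at the master's namenode; port 9000 is a common choice (an assumption here, not stated elsewhere in this guide):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.9:9000</value>
  </property>
</configuration>
```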

3. Edit hdfs-site.xml

****Add the below lines*****
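Given the namenode directory created in Step 10 and the datanode directories created in Step 14, a matching hdfs-site.xml sketch (replication 2 because there are two datanodes):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop-3.1.0/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop-3.1.0/datanode</value>
  </property>
</configuration>
```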

4. Edit mapred-site.xml

****Add the below lines*****
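For YARN mode, mapred-site.xml needs at least the framework name:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```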

5. Edit yarn-site.xml

****Add the below lines*****
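A yarn-site.xml sketch pointing the node managers at the master's resource manager and enabling the shuffle service that MapReduce needs:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.1.9</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```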

6. Edit hadoop-env.sh

****Add the below line*****
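hadoop-env.sh needs JAVA_HOME spelled out (the JDK folder name below is a placeholder for your install). Also, because everything in this guide runs as root, Hadoop 3.x's start scripts want the daemon-user variables set, otherwise start-dfs.sh and start-yarn.sh refuse to run as root:

```shell
# in /usr/local/hadoop-3.1.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_171

# required when the daemons are started as root (Hadoop 3.x)
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```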

Step 10: Create namenode directory in master machine
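On the master, create the directory the namenode will use for its metadata:

```shell
mkdir -p /usr/local/hadoop-3.1.0/namenode
```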

Step 11: Modify Masters file and add ip address of namenode in master system

***create a masters file in master machine***

***add ip address of master machine in masters file****

Step 12: Modify Slaves file and add ip addresses of datanodes in master system

***create slaves file in master machine***

***add ip addresses of slave machines in slaves file***

To view contents of masters file:

***it will show master ip address***

To view contents of Slaves file:

***it will show list of slaves ip address***
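Both files live in the Hadoop configuration directory; a sketch that creates and then displays them (note that Hadoop 3.x actually reads etc/hadoop/workers for the datanode list, so if the daemons ignore slaves, put the same entries in workers):

```shell
cd /usr/local/hadoop-3.1.0/etc/hadoop

# masters: the namenode's IP
echo "192.168.1.9" > masters

# slaves: one datanode IP per line
printf "192.168.1.3\n192.168.1.23\n" > slaves

cat masters   # shows the master ip address
cat slaves    # shows the list of slaves ip addresses
```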

Step 13: Copy hadoop-3.1.0 file to slaves

To secure-copy the Hadoop folder from the master machine's /usr/local/hadoop-3.1.0 to the slave machines

***Type the below command in master machine and copy the file from Master to Slave1***

***Type the below command in master machine and copy the file from Master to Slave2***
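scp -r copies the whole installation tree; from the master:

```shell
# copy the Hadoop installation from master to each slave
scp -r /usr/local/hadoop-3.1.0 root@slave1:/usr/local/
scp -r /usr/local/hadoop-3.1.0 root@slave2:/usr/local/
```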

Step 14: Make the datanode directory on both Slave1 and Slave2 machines

***make a directory in hadoop-3.1.0 for datanode in slave machines***
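On each slave:

```shell
mkdir -p /usr/local/hadoop-3.1.0/datanode
```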

Step 15: Format namenode on master machine

Step 16: Start Namenode and Datanode

Step 17: Start Nodemanager and Resourcemanager
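Steps 15 to 17 boil down to three commands on the master (format only once; re-formatting an existing namenode wipes the HDFS metadata):

```shell
# Step 15: format the namenode (first time only)
hdfs namenode -format

# Step 16: start NameNode, SecondaryNameNode and the DataNodes
start-dfs.sh

# Step 17: start ResourceManager and the NodeManagers
start-yarn.sh
```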

Step 18: Start Hadoop Daemons on Master and Slaves

***Type the jps command in the master machine***

***It will show output daemons of master machine***

10307 Jps
7961 ResourceManager
7105 NameNode
7352 SecondaryNameNode

***Type the jps command in the slave1 machine***

***It will show output daemons of slave1 machine***

2780 Jps
2181 NodeManager
1996 DataNode

***Type the jps command in the slave2 machine***

***It will show output daemons of slave2 machine ***

1735 Jps
2328 NodeManager
1983 DataNode

[ Note: If any of the daemons is missing on any of the machines, format the namenode, restart the services, and check once again. If the output still does not match the above, recheck the Hadoop configuration files from the beginning.]

Step 19: Check Hadoop Web Interfaces (HADOOP WEB UI)

***web url for DFS health reports of master machine***

http://192.168.1.9:9870/dfshealth.html#tab-overview

***web url for DFS health reports of slave1 machine***

http://192.168.1.3:9864

***web url for DFS health reports of slave2 machine***

http://192.168.1.23:9864

***to verify through command line***

root@master: /usr/local/hadoop-3.1.0/etc/hadoop > hdfs dfsadmin -report

***web url for resource manager***

http://192.168.1.9:8088/cluster

Congrats! We have now successfully installed Hadoop on multiple nodes in YARN mode.

——–
For any doubts/clarifications
contact me in email : [email protected]
whats app: +91 7401256086


About Prabhu Sundarraj

~An enthusiastic Data Engineer with more than 2 years of experience. ~Having very good exposure to dealing with MULTI-BILLION events per day and all the follow-up challenges that come with them. ~Building scalable big-data pipelines/applications/data warehouses with Hadoop, Spark, and Scala. Have been working with the Big Data stack for the past 2-plus years. ~An active follower of the data industry, to understand the landscape very well. ~Seeking an opportunity to boost my career

6 comments

  1. It's an awesome consolidation of all prerequisites. But I have one doubt here:
    may I know why you are using IP addresses inside the masters and slaves files?
    I think it is better to use hostnames everywhere; that is the reason we configured /etc/hosts across all hosts.
    Please clarify.

    • Yes, you are right. We can use hostnames instead of IP addresses, but only once you configure the /etc/hostname file.
      For better understanding I am using IP addresses instead of hostnames, so that beginners can follow easily. You can also try with hostnames and check whether it pings or not.

    • Reason to configure /etc/hosts:
      we need to mention the IP address as well as the hostname of the desired machine in /etc/hosts on the machine we are configuring, in order to connect to it.
      Then we can ping with either (ssh hostname / ssh ipaddress); we can use whichever we like.

      For example, suppose you add only the hostname, and not the IP address, to your master machine's /etc/hosts.
      If you then try to ping with (ssh hostname), will it connect or not?
      The answer is: obviously it will not connect.
      Only when you add the IP address of that hostname on that line does the master machine know which IP address that hostname belongs to; only through the IP address does the master machine know how to reach that machine.
      So we are using IP addresses for more reliable connections.

  2. Really awesome, good work!
    Anyone can learn Hadoop multi-node installation; this site explains each and every line.
    Very useful ji.. Thanks!

  3. good thing you did…
    I really appreciate you Prabhu Sundarraj
    share knowledge as much as you can…

  4. Great work bro!
