Thursday, July 25, 2013

Installing and Configuring Hadoop in Fully Distributed Mode

To set up a multi-node cluster, you first need to understand how to set up a pseudo-distributed cluster. My earlier blog post walks through that process.

Now that you know how to set up a pseudo-distributed Hadoop cluster, you can follow the steps below to build your multi-node cluster. I will describe setting up a two-node cluster with a "master" and a "slave" machine. The master is where the name node and job tracker run, and it is the single point of interaction and failure in the cluster. The slaves run a data node and a task tracker each and act on the direction of the HDFS and MapReduce masters. You can use the same process to scale up to as many nodes as you want.

1. Pick the second computer that you want to make the slave in the cluster. We will call it "slave". Find its IP address by running the ifconfig command in the terminal. If your cluster is on a Wi-Fi LAN and the IP address is assigned by DHCP and changes every time you log in, consider setting up a static IP address, as done for the master node in my previous tutorial on setting up a static IP address.

2. Follow steps 1 through 6 of the pseudo-distributed cluster setup post on the slave machine. Set the hostname on this machine to "slave".

3. Define the slave machine in the /etc/hosts file of the master and vice versa.
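
As a rough sketch of what those entries look like: each line maps an IP address to a hostname. The two addresses below are placeholders I made up, so replace them with what ifconfig reports on your machines. The snippet writes to a scratch file first so you can review the lines before touching /etc/hosts.

```shell
# Hypothetical addresses: replace with the values ifconfig reports on each machine.
MASTER_IP="192.168.1.10"
SLAVE_IP="192.168.1.11"

# Build the two entries in a scratch file so they can be reviewed first.
HOSTS_SNIPPET="$(mktemp)"
printf '%s\t%s\n' "$MASTER_IP" "master" "$SLAVE_IP" "slave" > "$HOSTS_SNIPPET"
cat "$HOSTS_SNIPPET"

# After reviewing, append the entries on BOTH machines (requires root):
#   sudo sh -c "cat '$HOSTS_SNIPPET' >> /etc/hosts"
```

Once the entries are in place, `ping master` from the slave and `ping slave` from the master should both resolve.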

4. Copy the slave machine's id_rsa.pub key, generated in step 5 of the pseudo-distributed setup, to the master machine.
On Slave: $ scp -r .ssh/id_rsa.pub hduser@master:/home/hduser/
It will ask for the password of the master machine's hduser account at this point.
Now append the slave's id_rsa.pub to the authorized_keys file of the master.
On Master: $ cat $HOME/id_rsa.pub >> $HOME/.ssh/authorized_keys
On Master: $ rm -rf $HOME/id_rsa.pub
Now copy the authorized_keys file from the master to the slave.
On Master: $ scp -r .ssh/authorized_keys hduser@slave:/home/hduser/.ssh/
It will ask for the password of the slave machine's hduser account at this point.
After this step, passwordless ssh communication is set up between your master and slave machines for the hduser account. You can test it with the following commands:
On Slave (logged in as hduser): $ ssh master
On Master (logged in as hduser): $ ssh slave
SSH login to the other system no longer needs a password, and the two systems can talk to each other.
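
If ssh still asks for a password after these steps, one common cause is file permissions: with its default StrictModes setting, sshd ignores key files that other users can write to. Below is a minimal sketch of tightening them. SSH_DIR is a hypothetical variable that defaults to a scratch directory so you can try the commands safely; on the real machines it would be $HOME/.ssh.

```shell
# SSH_DIR is an assumption: set it to "$HOME/.ssh" on the real machines.
SSH_DIR="${SSH_DIR:-$(mktemp -d)}"

# Make sure the file exists (touch does not change existing content).
touch "$SSH_DIR/authorized_keys"

# Only the owner may enter the directory or read/write the key list.
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"

# Show the resulting permissions.
ls -ld "$SSH_DIR" "$SSH_DIR/authorized_keys"
```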

5. Now, on the master machine where you set up your pseudo-distributed cluster, make the following changes:
  • Add "slave" to the hadoop/conf/slaves file. Leave master listed there too, so that a data node and a task tracker also run on the master machine. The slave nodes have to be listed in the slaves file, one per line.
  • Change the dfs.replication property in the hdfs-site.xml file from 1 to 2, so that two copies of each block are stored redundantly in the cluster.
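
The two changes above can be sketched as follows. CONF_DIR is a hypothetical variable that should point at your hadoop/conf directory; it defaults to a scratch directory here so the snippet is safe to try. The XML is only printed for you to paste into hdfs-site.xml by hand, since editing the file automatically depends on how yours is laid out.

```shell
# CONF_DIR is an assumption: point it at your real hadoop/conf directory.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# One host per line; master stays listed so it also runs a data node and task tracker.
printf '%s\n' master slave > "$CONF_DIR/slaves"
cat "$CONF_DIR/slaves"

# The property to set inside <configuration> in hdfs-site.xml (edit by hand):
cat <<'EOF'
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
EOF
```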

6. Now copy the entire /home/hduser/bigdata/hadoop/ directory from the master to the slave machine.
On Master: $ scp -r bigdata/hadoop/* hduser@slave:/home/hduser/bigdata/hadoop/
Note that the absolute path of the Hadoop home must be the same on both machines. The slave should also have the same directory structure as the master:

- bigdata (our main bigdata projects related directory)
      - hadoop (Hadoop App Folder. This one is scp 'ed from the master.)
      - hadoopdata (data directory)
         - name (dfs.name.dir points to this directory)
         - data (dfs.data.dir points to this directory)
         - tmp  (hadoop.tmp.dir points to this directory)
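
If any of these directories are missing on the slave, mkdir -p can create the whole tree in one go. This is a small sketch: BASE is a hypothetical variable that defaults to a scratch location, and on the real slave it would be /home/hduser/bigdata.

```shell
# BASE is an assumption: on the real slave, set BASE=/home/hduser/bigdata.
BASE="${BASE:-$(mktemp -d)/bigdata}"

# -p creates missing parent directories and does not complain if they already exist.
mkdir -p "$BASE/hadoop" \
         "$BASE/hadoopdata/name" \
         "$BASE/hadoopdata/data" \
         "$BASE/hadoopdata/tmp"

# List the resulting directory tree.
find "$BASE" -type d | sort
```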

7. Clear the masters and slaves files on the slave machine.
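
One way to empty a file without deleting it is to redirect the shell's no-op command into it. A small sketch, assuming CONF_DIR (a hypothetical variable) points at hadoop/conf on the slave; it defaults to a scratch directory here:

```shell
# CONF_DIR is an assumption: point it at hadoop/conf on the slave machine.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# ":" produces no output, so the redirection truncates each file to zero bytes
# (and creates it if it does not exist yet).
for f in masters slaves; do
    : > "$CONF_DIR/$f"
done

# Both files should now show a size of 0.
ls -l "$CONF_DIR/masters" "$CONF_DIR/slaves"
```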

8. Now clear the $HOME/bigdata/hadoopdata/data directory on both machines.
On Master and Slave: $ rm -rf $HOME/bigdata/hadoopdata/data/*

9. Format the name node on the master machine:
On Master: $ hadoop namenode -format

Your cluster is now ready to run Hadoop in fully distributed mode. Run start-all.sh on the master machine. It starts the name node, data node, job tracker, secondary name node, and task tracker daemons on the master, and a data node and a task tracker on the slave. You can check the URL master:50070 for name node administration and master:50030 for MapReduce administration. When you are done with your Hadoop work, don't forget to run stop-all.sh on the master node to stop the daemons running on the cluster.
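
One way to confirm that the daemons listed above actually came up is to compare the output of jps (the JDK's Java process lister) against the names you expect on each node. The helper below is a hypothetical sketch and not part of Hadoop; the daemon names are the class names that Hadoop 1.x processes show in jps.

```shell
# check_daemons is a hypothetical helper, not a Hadoop command.
# $1: the text output of `jps`; remaining arguments: daemon names expected on this node.
check_daemons() {
    jps_output="$1"
    shift
    missing=0
    for d in "$@"; do
        # -w matches whole words only, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" | grep -qw "$d"; then
            echo "missing: $d"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the master:
#   check_daemons "$(jps)" NameNode SecondaryNameNode JobTracker DataNode TaskTracker
# Usage on the slave:
#   check_daemons "$(jps)" DataNode TaskTracker
```

The function prints one line per missing daemon and returns a non-zero status if any are missing, so it can also be used in a script after start-all.sh.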

