Setup Hadoop Single Node

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. This guide helps you set up a Hadoop single node, which is required before you can set up a Hadoop multi-node cluster or set up Hive.

  1. Install Java

    a. Choose Installed Java Version: alternatives --config java

    b. Show Java location: echo $JAVA_HOME

    c. If Java cannot be found, install it by following the Setup Java guide
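A quick sanity check for this step (a sketch; the hint text printed on failure is my own):

```shell
# Verify a JDK is on the PATH and JAVA_HOME is set; prints a hint otherwise.
command -v java || echo "java not found - install a JDK first"
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
```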

  2. Create hadoop user

    useradd hadoop
    passwd hadoop
    
  3. Log out of root and log in as the hadoop user

    su - hadoop
    
  4. Create ssh key

    ssh-keygen -t rsa
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
    chmod 0600 $HOME/.ssh/authorized_keys
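The 0600 mode matters: sshd silently ignores an authorized_keys file that is group- or world-readable. The snippet below demonstrates the expected permissions on a scratch directory so it can be run anywhere; on the node itself the file lives under $HOME/.ssh:

```shell
# Demonstration on a temporary directory; the path is illustrative.
d=$(mktemp -d)
touch "$d/authorized_keys"
chmod 0600 "$d/authorized_keys"
stat -c '%a' "$d/authorized_keys"   # prints 600 (GNU stat)
rm -rf "$d"
```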
    
  5. Install Hadoop 1

    su - root
    wget http://mirrors.digipower.vn/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
    tar -xzf hadoop-1.2.1.tar.gz
    mv hadoop-1.2.1 /opt/hadoop
    chown -R hadoop /opt/hadoop
    
  6. Configure Hadoop 1

    conf/core-site.xml

    # add the following <property> elements inside the <configuration> element
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
    

    conf/hdfs-site.xml

    # add the following <property> elements inside the <configuration> element
    <property>
      <name>dfs.data.dir</name>
      <value>/opt/hadoop/dfs/name/data</value>
      <final>true</final>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/opt/hadoop/dfs/name</value>
      <final>true</final>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    

    conf/mapred-site.xml

    # add the following <property> element inside the <configuration> element
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>
    

    conf/hadoop-env.sh

    # point JAVA_HOME at the actual JDK path; ${JAVA_HOME} only resolves if it
    # is already exported in the hadoop user's environment
    export JAVA_HOME=${JAVA_HOME}
    # disable IPv6
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
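
For orientation, here is what the full conf/core-site.xml might look like after the edit (the XML header lines and the <configuration> element are already present in the stock file; only the <property> entries are added):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
```

The other two files follow the same pattern with their respective properties.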
    

  7. Start ssh service

    chkconfig sshd on
    service sshd start
    
  8. Format name node

    # do not format a running Hadoop 1 filesystem as you will lose all the data currently in the cluster (in HDFS)
    su - hadoop
    cd /opt/hadoop
    bin/hadoop namenode -format
    
  9. Start Hadoop 1

    bin/start-all.sh
    
  10. Test Hadoop 1 service

    jps
    $JAVA_HOME/bin/jps
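On a healthy single node, jps should list five Hadoop daemons: NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker. The sketch below demonstrates a count-based check against simulated jps output (the process IDs are made up) so it runs anywhere; on the node, replace the printf with the real jps call:

```shell
# Simulated `jps` output; on the node, pipe the real jps into grep instead.
jps_output='1201 NameNode
1315 DataNode
1402 SecondaryNameNode
1497 JobTracker
1583 TaskTracker
1660 Jps'
printf '%s\n' "$jps_output" | grep -cE 'NameNode|DataNode|JobTracker|TaskTracker'   # prints 5
```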
    
  11. Access Hadoop 1 web interfaces

    JobTracker: http://localhost:50030/

    NameNode: http://localhost:50070/

    TaskTracker: http://localhost:50060/

  12. List running jobs

    bin/hadoop job -list
    
  13. Terminate a job by ID

    bin/hadoop job -kill [job-id]
    
  14. Stop Hadoop 1

    bin/stop-all.sh
    
  15. Environment Variables for Hadoop 1

    vi $HOME/.bash_profile
    
    export HADOOP_PREFIX=/opt/hadoop
    export PATH=$PATH:$HADOOP_PREFIX/bin
    
  16. Reload .bash_profile

    source $HOME/.bash_profile
    
