Set Up Hadoop MapReduce Next Generation (YARN) on a Single Node

Apache Hadoop MapReduce Next Generation (hadoop-2.x) brings significant improvements over the previous stable release (hadoop-1.x). This guide helps you set up Hadoop MapReduce Next Generation (YARN) on a single node, which is a prerequisite for the multi-node setup.

  1. Install Java

    a. Choose the installed Java version: alternatives --config java

    b. Show the Java location: echo $JAVA_HOME

    c. If Java cannot be found, install it by following the Setup Java instructions
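
    As a quick sanity check, the following should print the installed JDK version:

    java -version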

  2. Create yarn user

    useradd yarn
    passwd yarn
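
    To confirm the account was created:

    id yarn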
    
  3. Switch from the root user to the yarn user

    su - yarn
    
  4. Generate an SSH key

    ssh-keygen -t rsa
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
    chmod 0600 $HOME/.ssh/authorized_keys
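
    Hadoop's start scripts connect over SSH even on a single node, so confirm that passwordless login works (accept the host key fingerprint on the first connection):

    ssh localhost
    # should log in without a password prompt; type exit to return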
    
  5. Install Hadoop

    su - root
    cd /opt
    wget http://mirrors.maychuviet.vn/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
    tar -xzf hadoop-2.7.1.tar.gz
    mv hadoop-2.7.1 /opt/yarn
    chown -R yarn /opt/yarn
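
    To verify the unpacked distribution, it should print its version:

    /opt/yarn/bin/hadoop version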
    
  6. Declare Hadoop Environment Variables

    Do this as the yarn user (su - yarn), since that account runs the Hadoop daemons and must see these variables:

    vi $HOME/.bash_profile
    
    export HADOOP_PREFIX=/opt/yarn
    # export HADOOP_ROOT_LOGGER=DEBUG,console
    # export HADOOP_HOME=/opt/yarn
    # export HADOOP_MAPRED_HOME=/opt/yarn
    # export HADOOP_COMMON_HOME=/opt/yarn
    # export HADOOP_HDFS_HOME=/opt/yarn
    # export YARN_HOME=/opt/yarn
    # export HADOOP_YARN_HOME=/opt/yarn
    # export HADOOP_CONF_DIR=/opt/yarn/etc/hadoop
    # export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    # export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    
  7. Reload .bash_profile

    source $HOME/.bash_profile
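
    Verify that the variable took effect:

    echo $HADOOP_PREFIX    # should print /opt/yarn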
    
  8. Create HDFS directory

    mkdir -p /opt/yarn_data/hdfs/namenode
    mkdir -p /opt/yarn_data/hdfs/datanode
    # if the directories were created as root, hand them to the yarn user
    # so the HDFS daemons can write to them
    chown -R yarn /opt/yarn_data
    
  9. Configure Hadoop

    cd $HADOOP_PREFIX/etc/hadoop
    

    a. core-site.xml

    vi core-site.xml
    
    <!-- add the following property inside the <configuration> tag -->
    <!-- fs.defaultFS supersedes the deprecated fs.default.name in hadoop-2.x -->
    <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
    </property>
    

    b. hdfs-site.xml

    vi hdfs-site.xml
    
    <!-- add the following properties inside the <configuration> tag -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/opt/yarn_data/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/opt/yarn_data/hdfs/datanode</value>
    </property>
    

    c. mapred-site.xml

    vi mapred-site.xml
    
    <!-- If this file does not exist, create it and paste the content provided below -->
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
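
    In the hadoop-2.x tarball this file typically ships only as a template, so it can also be seeded by copying it (run from $HADOOP_PREFIX/etc/hadoop):

    cp mapred-site.xml.template mapred-site.xml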
    

    d. yarn-site.xml

    vi yarn-site.xml
    
    <!-- add the following properties inside the <configuration> tag -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <!-- the class key must match the aux-service name declared above (mapreduce_shuffle) -->
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    
  10. Start the SSH service (as root)

    chkconfig sshd on
    service sshd start
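
    On a newer, systemd-based distribution the equivalent would be:

    systemctl enable --now sshd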
    
  11. Format the NameNode

    # do not format a running Hadoop filesystem as you will lose all the data currently in the cluster (in HDFS)
    su - yarn
    cd /opt/yarn
    bin/hdfs namenode -format
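
    A successful format populates the NameNode metadata directory; a quick check:

    ls /opt/yarn_data/hdfs/namenode/current
    # should list VERSION and the initial fsimage files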
    
  12. Start the HDFS service

    sbin/start-dfs.sh
    
  13. Start the Hadoop MapReduce (YARN) service

    sbin/start-yarn.sh
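
    After start-dfs.sh and start-yarn.sh, jps (shipped with the JDK) gives a quick view of which daemons are running:

    jps
    # expected on a single node: NameNode, DataNode, SecondaryNameNode,
    # ResourceManager, and NodeManager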
    
  14. Start the Job History Server

    sbin/mr-jobhistory-daemon.sh start historyserver
    
  15. Stop the services

    sbin/stop-yarn.sh
    sbin/stop-dfs.sh
    sbin/mr-jobhistory-daemon.sh stop historyserver
    
  16. Hadoop Web Interfaces

    NameNode: http://localhost:50070/

    ResourceManager: http://localhost:8088/

    MapReduce JobHistory Server: http://localhost:19888/
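
    To exercise the full stack and produce a job you can watch in these UIs (and list in the next step), run the bundled examples jar; paths are relative to /opt/yarn and the jar name assumes the 2.7.1 tarball layout:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 4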

  17. List running jobs

    bin/hadoop job -list
    # hadoop-2.x marks this invocation deprecated; bin/mapred job -list is the equivalent
    
  18. Terminate a job by ID

    bin/hadoop job -kill [job-id]
    
