Set Up Hadoop MapReduce Next Generation on a Single Node

Apache Hadoop MapReduce Next Generation (hadoop-2.x) includes significant improvements over the previous stable release (hadoop-1.x). This guide helps you set up Hadoop MapReduce Next Generation (YARN) on a single node, which is required before you set up a multi-node cluster.

  1. Install Java

    a. Choose the installed Java version: alternatives --config java

    b. Show the Java location: echo $JAVA_HOME

    c. If Java cannot be found, install it by following the Setup Java instructions

  2. Create the yarn user

    useradd yarn
    passwd yarn
  3. Log out of the root account and log in as the yarn user

    su - yarn
  4. Generate an SSH key

    ssh-keygen -t rsa
    cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys
    chmod 0600 $HOME/.ssh/authorized_keys
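The append in step 4 adds a duplicate entry each time it is rerun. A minimal sketch of an idempotent variant follows, assuming a POSIX shell. The function name `authorize_key` is hypothetical, and the path in the usage comment assumes the default prompts of `ssh-keygen -t rsa` were accepted, which writes the public key to `$HOME/.ssh/id_rsa.pub`.

```shell
# authorize_key: hypothetical helper, not part of OpenSSH.
# Appends a public key to an authorized_keys file only if it is not already listed.
authorize_key() {
  pub_key_file="$1"
  auth_file="$2"
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -F fixed strings, -x whole-line match, -q quiet, -f read patterns from the key file
  if ! grep -qxF -f "$pub_key_file" "$auth_file"; then
    cat "$pub_key_file" >> "$auth_file"
  fi
  chmod 0600 "$auth_file"
}

# Usage (assumes the default key location):
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
#   ssh localhost    # should now log in without a password prompt
```

Running the function twice leaves a single entry in the file, so the step can be repeated safely.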
  5. Install Hadoop

    su - root
    # assumes hadoop-2.7.1.tar.gz has already been downloaded to /opt
    cd /opt
    tar -xzf hadoop-2.7.1.tar.gz
    mv hadoop-2.7.1 /opt/yarn
    chown -R yarn /opt/yarn
  6. Declare Hadoop Environment Variables

    vi $HOME/.bash_profile
    # add the following lines to the file
    export HADOOP_PREFIX=/opt/yarn
    # export HADOOP_ROOT_LOGGER=DEBUG,console
    # export HADOOP_HOME=/opt/yarn
    # export HADOOP_MAPRED_HOME=/opt/yarn
    # export HADOOP_COMMON_HOME=/opt/yarn
    # export HADOOP_HDFS_HOME=/opt/yarn
    # export YARN_HOME=/opt/yarn
    # export HADOOP_YARN_HOME=/opt/yarn
    # export HADOOP_CONF_DIR=/opt/yarn/etc/hadoop
    # export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
  7. Reload .bash_profile

    source $HOME/.bash_profile
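After reloading the profile, it is worth confirming that the variable points at the unpacked distribution. This is a minimal sketch, assuming a POSIX shell; `check_hadoop_prefix` is a hypothetical helper name, not a Hadoop command.

```shell
# check_hadoop_prefix: hypothetical helper, not shipped with Hadoop.
# Succeeds only if HADOOP_PREFIX is set and contains an executable bin/hadoop.
check_hadoop_prefix() {
  if [ -z "${HADOOP_PREFIX:-}" ]; then
    echo "HADOOP_PREFIX is not set" >&2
    return 1
  fi
  if [ ! -x "$HADOOP_PREFIX/bin/hadoop" ]; then
    echo "no executable bin/hadoop under $HADOOP_PREFIX" >&2
    return 1
  fi
  echo "HADOOP_PREFIX OK: $HADOOP_PREFIX"
}

# Usage:
#   check_hadoop_prefix && "$HADOOP_PREFIX/bin/hadoop" version
```

If the check fails, the export line is probably missing from the profile of the user who is currently logged in.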
  8. Create HDFS directories

    mkdir -p /opt/yarn_data/hdfs/namenode
    mkdir -p /opt/yarn_data/hdfs/datanode
  9. Configure Hadoop

    cd $HADOOP_PREFIX/etc/hadoop

    a. core-site.xml

    vi core-site.xml
    <!-- add the following property tags inside the <configuration> tag -->

    b. hdfs-site.xml

    vi hdfs-site.xml
    <!-- add the following property tags inside the <configuration> tag -->

    c. mapred-site.xml

    vi mapred-site.xml
    <!-- If this file does not exist, create it and paste the content provided below -->
    <?xml version="1.0"?>

    d. yarn-site.xml

    vi yarn-site.xml
    <!-- add the following property tags inside the <configuration> tag -->
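The property values for the four files are not listed in this guide. As a rough sketch only, the script below generates minimal versions into a scratch directory for review. Everything in it is an assumption rather than this guide's configuration: the helper `write_site_xml` is hypothetical, the property names are standard Hadoop 2.x settings commonly used for a single-node setup, port 9000 is a conventional choice, and the directory values reuse the paths from step 8.

```shell
# write_site_xml: hypothetical helper that writes a Hadoop *-site.xml
# from name=value arguments. Values are not XML-escaped, so keep them simple.
write_site_xml() {
  out_file="$1"; shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    for pair in "$@"; do
      name="${pair%%=*}"    # text before the first '='
      value="${pair#*=}"    # text after the first '='
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$name" "$value"
    done
    echo '</configuration>'
  } > "$out_file"
}

# Assumption: write to a scratch directory, then review the files before
# copying them into $HADOOP_PREFIX/etc/hadoop.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

write_site_xml "$CONF_DIR/core-site.xml" \
  "fs.defaultFS=hdfs://localhost:9000"
write_site_xml "$CONF_DIR/hdfs-site.xml" \
  "dfs.replication=1" \
  "dfs.namenode.name.dir=file:/opt/yarn_data/hdfs/namenode" \
  "dfs.datanode.data.dir=file:/opt/yarn_data/hdfs/datanode"
write_site_xml "$CONF_DIR/mapred-site.xml" \
  "mapreduce.framework.name=yarn"
write_site_xml "$CONF_DIR/yarn-site.xml" \
  "yarn.nodemanager.aux-services=mapreduce_shuffle"

echo "generated config files in $CONF_DIR"
```

Generating into a scratch directory keeps the sketch from overwriting an existing configuration. Compare the output with the files shipped in the distribution before replacing anything.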
  10. Start the SSH service

    chkconfig sshd on
    service sshd start
  11. Format the NameNode

    # Do not format a running Hadoop filesystem: formatting erases all data currently stored in the cluster (in HDFS)
    su - yarn
    cd /opt/yarn
    bin/hdfs namenode -format
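The warning in step 11 can be backed by a small guard that runs before formatting. This is a minimal sketch, assuming the NameNode storage directory is the one created in step 8 (the hdfs-site.xml contents are not shown in this guide, so that path is an assumption). It relies on the fact that a formatted NameNode directory contains a `current/VERSION` file. The function name `format_guard` is hypothetical.

```shell
# format_guard: hypothetical helper, not part of Hadoop.
# Returns 0 when the directory has no NameNode image yet, 1 when it already has one.
format_guard() {
  name_dir="$1"
  if [ -f "$name_dir/current/VERSION" ]; then
    echo "refusing to format: $name_dir already holds a filesystem image" >&2
    return 1
  fi
  echo "no existing image in $name_dir"
}

# Usage (directory is an assumption based on step 8):
#   format_guard /opt/yarn_data/hdfs/namenode && bin/hdfs namenode -format
```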
  12. Start HDFS service

  13. Start Hadoop Map-Reduce service

  14. Job History Server

    sbin/ start historyserver
  15. Stop service

    sbin/ stop historyserver
  16. Hadoop Web Interfaces

    NameNode: http://localhost:50070/

    ResourceManager: http://localhost:8088/

    MapReduce JobHistory Server: http://localhost:19888/
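The three interfaces can also be probed from the command line. This is a minimal sketch, assuming `curl` is installed and the services listen on the default ports listed above; `check_web_ui` is a hypothetical helper name.

```shell
# check_web_ui: hypothetical helper that reports whether a URL answers within 5 seconds.
check_web_ui() {
  name="$1"
  url="$2"
  # -s silences progress output, -o /dev/null discards the page body
  if curl -s -o /dev/null --max-time 5 "$url"; then
    echo "$name: up"
  else
    echo "$name: down"
  fi
}

# Usage:
#   check_web_ui NameNode http://localhost:50070/
#   check_web_ui ResourceManager http://localhost:8088/
#   check_web_ui JobHistory http://localhost:19888/
```

A "down" result only means that nothing answered on that port. The daemon logs remain the place to look for the cause.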

  17. List running jobs

    bin/hadoop job -list
  18. Terminate a job by ID

    bin/hadoop job -kill [job-id]
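The job ID needed for the kill command can be pulled out of the listing. This is a minimal sketch, assuming job IDs have the usual `job_<timestamp>_<sequence>` form and that `grep` supports the `-o` option; `extract_job_ids` is a hypothetical helper name.

```shell
# extract_job_ids: hypothetical helper that reads text on stdin
# and prints each distinct job ID found, one per line.
extract_job_ids() {
  grep -Eo 'job_[0-9]+_[0-9]+' | sort -u
}

# Usage:
#   bin/hadoop job -list | extract_job_ids
#   bin/hadoop job -kill "$(bin/hadoop job -list | extract_job_ids | head -n 1)"
```

The second usage line kills whichever job sorts first, so check the listing before running it on a machine with more than one job.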


