Setup HBase

HBase is the Hadoop database: a distributed, scalable big data store. Use Apache HBase™ when you need random, real-time read/write access to your big data. The project's goal is to host very large tables -- billions of rows by millions of columns -- atop clusters of commodity hardware.

  1. Install Requirements

    a. Install Hadoop

    b. Setup ZooKeeper

  2. Setup HBase

    wget http://mirror.reverse.net/pub/apache/hbase/1.1.1/hbase-1.1.1-bin.tar.gz
    tar -xzf hbase-1.1.1-bin.tar.gz
    mv hbase-1.1.1 /opt/hbase
    
  3. Setup HBase Standalone

    This is the default mode. In standalone mode, HBase does not use HDFS -- it uses the local filesystem instead -- and it runs all HBase daemons and a local ZooKeeper in the same JVM. ZooKeeper binds to a well-known port so that clients can talk to HBase.

    a. Edit hbase-env.sh

    vi /opt/hbase/conf/hbase-env.sh
    
    # uncomment export JAVA_HOME and set JAVA_HOME directory path
    export JAVA_HOME=/opt/jdk1.7.0_79/
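
A wrong JAVA_HOME is an easy mistake to make at this step, so it can help to check the path before starting HBase. The following is a minimal sketch; `check_java_home` is a hypothetical helper written for this post, not something shipped with HBase.

```shell
# Sketch: verify that a JAVA_HOME candidate contains an executable java binary.
# check_java_home is a hypothetical helper, not part of HBase.
check_java_home() {
  java_home="${1%/}"   # strip one trailing slash, if present
  if [ -x "$java_home/bin/java" ]; then
    echo "ok: $java_home/bin/java"
  else
    echo "missing: $java_home/bin/java" >&2
    return 1
  fi
}

# Usage (path taken from this post):
#   check_java_home /opt/jdk1.7.0_79/
```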
    

    b. Edit hbase-site.xml

    vi /opt/hbase/conf/hbase-site.xml
    
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>file:///opt/hbase/hfiles</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/opt/zookeeper</value>
      </property>
    </configuration>
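
If you would rather generate the file than edit it by hand, the standalone configuration above could be written by a small script. This is only a sketch: `write_standalone_site` is a hypothetical helper, and its defaults are simply the two paths used in this post.

```shell
# Sketch: generate a standalone hbase-site.xml in a given conf directory.
# write_standalone_site is a hypothetical helper, not part of HBase.
write_standalone_site() {
  conf_dir="$1"
  root_dir="${2:-/opt/hbase/hfiles}"   # local directory for HBase data
  zk_dir="${3:-/opt/zookeeper}"        # local directory for ZooKeeper data
  cat > "$conf_dir/hbase-site.xml" <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$root_dir</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$zk_dir</value>
  </property>
</configuration>
EOF
}

# Usage:
#   write_standalone_site /opt/hbase/conf
```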
    
  4. Setup HBase Pseudo-distributed

    In pseudo-distributed mode, HBase still runs entirely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process. Pseudo-distributed mode is simply fully-distributed mode run on a single host.

    a. Edit hbase-env.sh

    vi /opt/hbase/conf/hbase-env.sh
    
    # uncomment export JAVA_HOME and set JAVA_HOME directory path
    export JAVA_HOME=/opt/jdk1.7.0_79/
    

    b. Edit hbase-site.xml

    vi /opt/hbase/conf/hbase-site.xml
    
    <!-- Example: HDFS is running on localhost at port 9000 -->
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/opt/zookeeper</value>
      </property>
    </configuration>
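
The setting that separates this configuration from the standalone one is hbase.cluster.distributed. As a rough illustration, the sketch below reads an hbase-site.xml and reports which mode it describes. `hbase_mode` is a hypothetical helper, and it assumes each `<name>` line is directly followed by its `<value>` line, as in the files shown in this post.

```shell
# Sketch: report which mode an hbase-site.xml describes.
# hbase_mode is a hypothetical helper; it assumes <name> and <value>
# sit on adjacent lines, as in the examples in this post.
hbase_mode() {
  site_file="$1"
  # Print the value on the line after the hbase.cluster.distributed name.
  distributed=$(sed -n '/<name>hbase.cluster.distributed<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$site_file")
  if [ "$distributed" = "true" ]; then
    echo "distributed"
  else
    echo "standalone"   # the property is absent or not "true"
  fi
}

# Usage:
#   hbase_mode /opt/hbase/conf/hbase-site.xml
```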
    
  5. Start and stop HBase

    cd /opt/hbase/bin/
    ./start-hbase.sh
    ./stop-hbase.sh
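
After start-hbase.sh returns, the JDK's jps tool is a quick way to confirm that the daemons are up: you would expect to see HMaster in standalone mode, and separate HMaster and HRegionServer processes in pseudo-distributed mode. The sketch below wraps that check; `has_daemon` is a hypothetical helper that reads jps output on standard input.

```shell
# Sketch: succeed if a given Java process name appears in jps output.
# has_daemon is a hypothetical helper; it reads `jps` output on stdin.
has_daemon() {
  # jps prints "<pid> <main class name>" per line; match the name exactly.
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage:
#   jps | has_daemon HMaster && echo "HMaster is running"
```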
    
  6. Use HBase Shell

    cd /opt/hbase/bin/
    ./hbase shell
    
    # show status of HBase
    1.8.7-p357 :001 > status
    # check whether the table exists
    1.8.7-p357 :002 > exists 'tbl1'
    # create table
    1.8.7-p357 :003 > create 'tbl1', 'cf1'
    # list tables
    1.8.7-p357 :004 > list
    # put data into table
    1.8.7-p357 :005 > put 'tbl1', 'row1', 'cf1:a', 'value1'
    1.8.7-p357 :006 > put 'tbl1', 'row2', 'cf1:b', 'value2'
    1.8.7-p357 :007 > put 'tbl1', 'row3', 'cf1:c', 'value3'
    # scan all table data
    1.8.7-p357 :008 > scan 'tbl1'
    # get single row data
    1.8.7-p357 :009 > get 'tbl1', 'row1'
    # disable table
    1.8.7-p357 :010 > disable 'tbl1'
    # check table is disabled
    1.8.7-p357 :011 > is_disabled 'tbl1'
    # enable table
    1.8.7-p357 :012 > enable 'tbl1'
    # check table is enabled
    1.8.7-p357 :013 > is_enabled 'tbl1'
    # truncate table - disable, drop, and recreate table
    1.8.7-p357 :014 > truncate 'tbl1'
    # drop table (a table must be disabled before it can be dropped)
    1.8.7-p357 :015 > disable 'tbl1'
    1.8.7-p357 :016 > drop 'tbl1'
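
The same commands can also be run without typing them interactively, because the HBase shell accepts the path to a command file as its argument. The sketch below generates such a file for a throwaway table. `smoke_test_commands` and the /tmp/smoke.txt path are hypothetical names chosen for illustration.

```shell
# Sketch: emit a short HBase shell script for a given table and column family.
# smoke_test_commands is a hypothetical helper, not part of HBase.
smoke_test_commands() {
  table="$1"
  family="$2"
  cat <<EOF
create '$table', '$family'
put '$table', 'row1', '$family:a', 'value1'
scan '$table'
disable '$table'
drop '$table'
exit
EOF
}

# Usage:
#   smoke_test_commands tbl1 cf1 > /tmp/smoke.txt
#   /opt/hbase/bin/hbase shell /tmp/smoke.txt
```

The trailing exit makes the shell quit when the script finishes instead of dropping back to the prompt.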
    
  7. Start and stop a backup HBase Master (HMaster) server

    The HMaster server controls the HBase cluster.

    Warning: Running multiple HMaster instances on the same hardware does not make sense in a production environment.

    cd /opt/hbase/bin/
    # start
    ./local-master-backup.sh start 2 3 5
    # stop (kill a backup master without killing the entire cluster)
    # cat /tmp/hbase-USER-X-master.pid |xargs kill -9
    cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
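
The PID file name follows the pattern hbase-USER-X-master.pid, where X is the offset passed when the backup master was started. A small wrapper can build that path so you do not have to type it. This is a sketch: both function names are hypothetical, and it assumes PID files live in /tmp unless HBASE_PID_DIR is set.

```shell
# Sketch: build the PID file path for a backup master started at an offset.
# Both helpers are hypothetical; PID files are assumed to live in /tmp
# unless HBASE_PID_DIR points elsewhere.
backup_master_pidfile() {
  offset="$1"
  user="${2:-$USER}"
  echo "${HBASE_PID_DIR:-/tmp}/hbase-$user-$offset-master.pid"
}

# Kill one backup master by offset, leaving the rest of the cluster running.
stop_backup_master() {
  pid_file=$(backup_master_pidfile "$@")
  [ -f "$pid_file" ] || { echo "no PID file: $pid_file" >&2; return 1; }
  kill -9 "$(cat "$pid_file")"
}

# Usage:
#   stop_backup_master 1 testuser
```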
    
  8. Start and stop additional RegionServers

    The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode.

    cd /opt/hbase/bin/
    # ./local-regionservers.sh [--config ] [start|stop] offset(s)
    ./local-regionservers.sh start 2 3 4 5
    ./local-regionservers.sh stop 3
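
Each offset also determines which ports the extra RegionServer listens on. The sketch below assumes the base ports 16200 (RPC) and 16300 (info/web UI) described in the HBase reference guide for additional RegionServers. Treat those numbers as an assumption and check local-regionservers.sh in your own version. `regionserver_ports` is a hypothetical helper.

```shell
# Sketch: print the ports an additional RegionServer would use per offset.
# ASSUMPTION: base ports 16200 (RPC) and 16300 (info); verify against
# local-regionservers.sh in your HBase version.
regionserver_ports() {
  for offset in "$@"; do
    echo "offset $offset: rpc $((16200 + offset)) info $((16300 + offset))"
  done
}

# Usage:
#   regionserver_ports 2 3 4 5
```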
    
