Config Hive

Configuration management overview

  1. Runtime configuration

    • Hive by default gets its configuration from <install-dir>/conf/hive-default.xml
    • The location of the Hive configuration directory can be changed by setting the HIVE_CONF_DIR environment variable.
    • Configuration variables can be changed by (re-)defining them in <install-dir>/conf/hive-site.xml
    • Log4j configuration is stored in <install-dir>/conf/hive-log4j.properties
    • Hive configuration is an overlay on top of hadoop - it inherits the hadoop configuration variables by default.
    • Hive configuration can be manipulated by:
      • Editing hive-site.xml and defining any desired variables (including hadoop variables) in it
      • Using the set command from the CLI (see below)
      • Invoking hive using the syntax:
        bin/hive -hiveconf x1=y1 -hiveconf x2=y2
        

        this sets the variables x1 and x2 to y1 and y2 respectively

      • Setting the HIVE_OPTS environment variable to "-hiveconf x1=y1 -hiveconf x2=y2" which does the same as above.
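The options above can be combined into a private configuration directory. The following is a minimal sketch, not a definitive setup: the directory path and the property value are illustrative assumptions, and it assumes HIVE_HOME points at <install-dir>. It copies the stock configuration files first so that defaults and log4j settings stay available, then writes a hive-site.xml override and points HIVE_CONF_DIR at the result.

```shell
#!/bin/sh
# Hypothetical location for a private configuration directory.
conf_dir="${TMPDIR:-/tmp}/my-hive-conf"
mkdir -p "$conf_dir"

# Assumption: HIVE_HOME is <install-dir>. Copy the stock files if present so
# the defaults and log4j properties remain available in the new directory.
if [ -d "${HIVE_HOME:-}/conf" ]; then
  cp -R "$HIVE_HOME/conf/." "$conf_dir/"
fi

# hive-site.xml uses the same <configuration>/<property> layout as Hadoop.
# The value below is an example path, not a recommended default.
cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.querylog.location</name>
    <value>/var/tmp/hive-querylogs</value>
  </property>
</configuration>
EOF

# Later bin/hive invocations from this shell read configuration from here.
export HIVE_CONF_DIR="$conf_dir"
```

Keeping overrides in a separate directory leaves the files under <install-dir>/conf untouched, which can make upgrades and rollbacks easier.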
  2. Configuring Hive or Hadoop Properties

    • Hive queries are executed as map-reduce jobs and, therefore, the behavior of such queries can be controlled by the Hadoop configuration variables.
    • The cli command 'SET' can be used to set any hadoop (or hive) configuration variable. For example:
      hive> SET mapred.job.tracker=myhost.mycompany.com:50030;
      hive> SET -v;
      

      The latter shows all the current settings. Without the -v option only the variables that differ from the base hadoop configuration are displayed.
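The same SET statements can be issued non-interactively. The sketch below assumes the hive launcher is on the PATH and uses its -e option to run a quoted string of statements. The helper name set_statements is hypothetical, and mapred.reduce.tasks=4 is only an example value.

```shell
#!/bin/sh
# Hypothetical helper: turn key=value arguments into SET statements and
# finish with a bare SET, which lists the variables that were overridden.
set_statements() {
  out=""
  for kv in "$@"; do
    out="${out}SET ${kv}; "
  done
  printf '%s' "${out}SET;"
}

stmts=$(set_statements mapred.job.tracker=local mapred.reduce.tasks=4)

# Run the statements only if a hive launcher is available; otherwise just
# print what would have been executed.
if command -v hive >/dev/null 2>&1; then
  hive -e "$stmts"
else
  printf '%s\n' "$stmts"
fi
```

Settings made this way last only for that one invocation. Use hive-site.xml for values that should persist.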

  3. Error Logs

    Hive uses log4j for logging. By default, the CLI does not emit logs to the console. The default logging level is WARN, and the logs are written to the file /tmp/<user.name>/hive.log

    Note: In local mode, the log file name is ".log" instead of "hive.log".

    To emit logs to the console, add the arguments shown below:

    bin/hive -hiveconf hive.root.logger=INFO,console
    

    Alternatively, to change only the logging level while still logging to the file, use:

    bin/hive -hiveconf hive.root.logger=INFO,DRFA
    

    Note that setting hive.root.logger via the 'set' command does not change logging properties since they are determined at initialization time.
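Because the logger is fixed at startup, one way to avoid retyping the option is to put it in HIVE_OPTS, as described in the runtime configuration section. This is a minimal sketch that appends to any value HIVE_OPTS already holds. The INFO,console value is taken from the example above.

```shell
#!/bin/sh
# Append the logger override to HIVE_OPTS, keeping any options already set.
# "${HIVE_OPTS:+$HIVE_OPTS }" expands to the old value plus a space only when
# HIVE_OPTS is non-empty, so no stray leading space is introduced.
export HIVE_OPTS="${HIVE_OPTS:+$HIVE_OPTS }-hiveconf hive.root.logger=INFO,console"

# Show what later bin/hive invocations from this shell will receive.
printf 'HIVE_OPTS=%s\n' "$HIVE_OPTS"
```

The setting applies to every Hive session started from that shell until HIVE_OPTS is changed or unset.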

    Hive also stores query logs on a per-session basis in /tmp/<user.name>/ by default; this location can be changed in hive-site.xml with the hive.querylog.location property.

    Logging during Hive execution on a Hadoop cluster is controlled by Hadoop configuration. Usually Hadoop will produce one log file per map and reduce task stored on the cluster machine(s) where the task was executed. The log files can be obtained by clicking through to the Task Details page from the Hadoop JobTracker Web UI.

    When using local mode (mapred.job.tracker=local), Hadoop/Hive execution logs are produced on the client machine itself. Starting with v0.6, Hive uses hive-exec-log4j.properties (falling back to hive-log4j.properties only if it is missing) to determine where these logs are delivered by default. The default configuration file produces one log file per query executed in local mode and stores it under /tmp/<user.name>/. The intent of providing a separate configuration file is to enable administrators to centralize execution log capture if desired (on an NFS file server, for example). Execution logs are invaluable for debugging run-time errors.
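To inspect these logs on the client, something like the following sketch may help. It assumes the default locations described above are in use and that Java's user.name matches the login name reported by id -un. Neither holds if the log4j properties or hive.querylog.location have been changed.

```shell
#!/bin/sh
# Assumption: <user.name> equals the current login name.
log_dir="/tmp/$(id -un)"

if [ -d "$log_dir" ]; then
  # Newest files first; by default session and execution logs both land here.
  ls -t "$log_dir" | head -n 5

  # Show the tail of the main CLI log if it exists.
  if [ -f "$log_dir/hive.log" ]; then
    tail -n 50 "$log_dir/hive.log"
  fi
else
  echo "no log directory at $log_dir yet"
fi
```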

    Error logs are very useful to debug problems. Please send them with any bugs (of which there are many!) to hive-dev@hadoop.apache.org.
