[HIVE] The Outline

April 2015

Apache Hive

(https://cwiki.apache.org/confluence/display/Hive/Home)
Hive defines a simple SQL-like query language, called QL, that lets users familiar with SQL query the data. The language also allows programmers familiar with the MapReduce framework to plug in custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support. QL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs), and table functions (UDTFs).
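
To give a feel for what QL looks like, here is a minimal sketch that writes a two-statement script and runs it through the hive CLI if one is installed. The table page_views, its columns, and the file name are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# Write a small HiveQL script. Table and column names are hypothetical.
hql_file="${TMPDIR:-/tmp}/page_views_example.hql"
cat > "$hql_file" <<'EOF'
CREATE TABLE IF NOT EXISTS page_views (user_id STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url;
EOF

# Run the script only when the hive CLI is on the PATH.
if command -v hive >/dev/null 2>&1; then
    hive -f "$hql_file"
else
    echo "hive not found; script written to $hql_file"
fi
```

The SELECT reads like ordinary SQL; Hive turns the GROUP BY into a MapReduce job behind the scenes.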

Hive provides:

  • Tools to enable easy data extract/transform/load (ETL)
  • A mechanism to impose structure on a variety of data formats
  • Access to files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase
  • Query execution via MapReduce

Components of Hive include HCatalog and WebHCat.

  • HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools, including Pig and MapReduce, to more easily read and write data on the grid.
  • WebHCat provides a service that you can use to run Hadoop MapReduce (or YARN), Pig, and Hive jobs, or to perform Hive metadata operations, through an HTTP (REST-style) interface.
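
As a rough illustration of the REST-style interface, the sketch below builds the URL of the WebHCat status resource and queries it with curl when possible. The host localhost and the port 50111 (the WebHCat default) are assumptions about the setup, and webhcat_url is a hypothetical helper, not part of Hive.

```shell
#!/bin/sh
# Build a WebHCat URL. $1 = host, $2 = port, $3 = resource under /templeton/v1.
webhcat_url() {
    printf 'http://%s:%s/templeton/v1/%s\n' "$1" "$2" "$3"
}

# Assumed values: a WebHCat server on localhost using the default port 50111.
status_url=$(webhcat_url localhost 50111 status)
echo "$status_url"

# Query the server only when curl is available; a failure is reported, not fatal.
if command -v curl >/dev/null 2>&1; then
    curl -s --max-time 5 "$status_url" || echo "WebHCat not reachable at $status_url"
fi
```

Other resources live under the same /templeton/v1 prefix, so the helper can be reused by changing the third argument.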

Basic Installation
(following the Getting Started guide in the Hive documentation)

  1. Set up the environment variables in /etc/profile.d/custom.sh
    (then apply them with the source command)
  2. Download the release (from a mirror in the Korean region, in my case) and extract it to a temporary directory
  3. Copy the extracted hive directory to /usr/local/
  4. Create the configuration files from the bundled templates
  5. Modify hive-site.xml: in [hive.exec.local.scratchdir], replace ${system:java.io.tmpdir}/${system:user.name} with /tmp/hive
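
Steps 1, 4 and 5 can be sketched as three small shell functions. This is only an outline under some assumptions: the install directory is taken to be /usr/local/hive, the template is assumed to be named hive-default.xml.template, and the function names are hypothetical. Every path is passed in as an argument, so defining the functions changes nothing on the system.

```shell
#!/bin/sh
# Step 1: write the environment variables to a profile script.
# $1 = profile script to create, $2 = Hive install directory (assumed layout).
write_hive_profile() {
    cat > "$1" <<EOF
export HIVE_HOME=$2
export PATH=\$HIVE_HOME/bin:\$PATH
EOF
}

# Step 4: create hive-site.xml from the bundled template if it is missing.
# $1 = Hive conf directory. The template name is an assumption.
init_hive_site() {
    [ -f "$1/hive-site.xml" ] ||
        cp "$1/hive-default.xml.template" "$1/hive-site.xml"
}

# Step 5: replace the unresolved placeholder with a fixed path.
# $1 = path to hive-site.xml. Note that this rewrites every occurrence,
# so any other property using the same placeholder changes as well.
patch_scratchdir() {
    sed 's|\${system:java\.io\.tmpdir}/\${system:user\.name}|/tmp/hive|g' \
        "$1" > "$1.new" && mv "$1.new" "$1"
}
```

A typical run, as root, might be write_hive_profile /etc/profile.d/custom.sh /usr/local/hive, followed by init_hive_site /usr/local/hive/conf and patch_scratchdir /usr/local/hive/conf/hive-site.xml. The sed pattern is single-quoted so the shell does not try to expand the ${...} placeholders itself.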

Troubleshooting

  1. jline library: IncompatibleClassChangeError

    1. Cause: Hive has upgraded to JLine 2, but the old jline 0.9.94 still exists in the Hadoop lib directory.
    2. Resolution (from the common issues listed in the Hive cwiki):
      1. Delete jline from the Hadoop lib directory (it is only pulled in transitively from ZooKeeper).
      2. export HADOOP_USER_CLASSPATH_FIRST=true
        (This is the solution I applied.)
      3. If this error occurs during mvn test, run mvn clean install on the root project and on the itests directory.
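
The first two options can be sketched in shell as follows. The helper find_jline_jars is hypothetical, and the search assumes that HADOOP_HOME points at the Hadoop install directory; where exactly the old jar sits depends on the Hadoop version.

```shell
#!/bin/sh
# List jline jars below a directory to see which versions Hadoop ships.
# $1 = directory to search (for example the Hadoop install directory).
find_jline_jars() {
    find "$1" -type f -name 'jline-*.jar' 2>/dev/null
}

# Option 2 from the list: ask Hadoop to put user-supplied jars (here the
# jline that comes with Hive) ahead of its own on the classpath.
export HADOOP_USER_CLASSPATH_FIRST=true

# Search only when HADOOP_HOME is set. Nothing is deleted here; removing
# a jar that shows up (option 1) is left as a manual step.
if [ -n "${HADOOP_HOME:-}" ]; then
    find_jline_jars "$HADOOP_HOME"
fi
```

To make the export permanent, one option is to add that line to the same profile script used for the other environment variables.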
