Hive
Apache Hive Architecture :
The above figure shows the architecture of Apache Hive and its major components.
The major components of Apache Hive are :
1. Hive Client
2. Hive Services
3. Processing and Resource Management
4. Distributed Storage
HIVE CLIENT :
Hive supports applications written in languages such as Python, Java, C++, and Ruby, which can query Hive through its JDBC, ODBC, and Thrift drivers.
Hence, one can write a Hive client application in the language of one's choice.
Hive clients are categorized into three types :
1. Thrift clients : The Hive server is based on Apache Thrift, so it can serve requests from any Thrift client.
2. JDBC clients : Hive lets Java applications connect to it through the JDBC driver, which uses Thrift to communicate with the Hive server.
3. ODBC clients : The Hive ODBC driver lets applications based on the ODBC protocol connect to Hive. Like the JDBC driver, the ODBC driver uses Thrift to communicate with the Hive server.
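As a small illustration of how a JDBC client addresses the Hive server, the sketch below builds a connection URL of the form jdbc:hive2://host:port/database. The helper name hive_jdbc_url and the host hive.example.com are hypothetical; port 10000 and the database "default" are assumed as the usual HiveServer2 defaults, and a real deployment may use different values or append extra session parameters.

```python
def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL (hypothetical helper, for illustration only).

    Assumes the plain jdbc:hive2://host:port/database form with no extra
    session or security parameters.
    """
    if not host:
        raise ValueError("host must be non-empty")
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # hive.example.com is a placeholder host, not a real server.
    print(hive_jdbc_url("hive.example.com"))
    print(hive_jdbc_url("hive.example.com", port=10001, database="sales"))
```

A Java application would pass such a URL to its JDBC driver manager, and the driver would then talk to the Hive server over Thrift as described above.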
HIVE SERVICES :
To execute queries, Hive provides various services such as HiveServer2 and Beeline.
The various services offered by Hive are :
1. Beeline
2. HiveServer2
3. Hive Driver
4. Hive Compiler
5. Optimizer
6. Execution Engine
7. Metastore
8. HCatalog
9. WebHCat
PROCESSING AND RESOURCE MANAGEMENT :
Hive internally uses the MapReduce framework as its de facto engine for executing queries.
MapReduce is a software framework for writing applications that process massive amounts of data in parallel on large clusters of commodity hardware.
A MapReduce job splits the input data into chunks, which are processed in parallel by map tasks; the map outputs are then grouped by key and passed to reduce tasks.
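The split, map, and reduce flow described above can be sketched in a few lines of plain Python. This is only a single-process toy model of the idea, not how Hive or Hadoop implement it; the function names (split_into_chunks, map_task, shuffle, reduce_task, word_count) and the word-count task are chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_into_chunks(lines: List[str], chunk_size: int) -> List[List[str]]:
    """Split the input into fixed-size chunks, one per (simulated) map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_task(chunk: List[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the shuffle phase would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one key."""
    return key, sum(values)


def word_count(lines: List[str], chunk_size: int = 2) -> Dict[str, int]:
    chunks = split_into_chunks(lines, chunk_size)
    mapped = [map_task(chunk) for chunk in chunks]  # run in parallel on a real cluster
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    data = ["hive on hadoop", "hadoop stores data", "hive queries data"]
    print(word_count(data))
```

On a real cluster each chunk would be handled by a separate map task on a different node, and the grouping step would move data across the network before the reduce tasks run.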
DISTRIBUTED STORAGE :
Hive is built on top of Hadoop, so it uses the underlying Hadoop Distributed File System (HDFS) for distributed storage.
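To make the storage layer a little more concrete, the sketch below computes where a table's files would typically sit in the distributed file system. It assumes the commonly used default warehouse directory /user/hive/warehouse, a <database>.db subdirectory for non-default databases, and key=value subdirectories for partitions; the helper table_location and the table names are hypothetical, and actual locations depend on the cluster configuration and on whether a table is managed or external.

```python
import posixpath
from typing import Dict, Optional


def table_location(
    table: str,
    database: str = "default",
    warehouse_dir: str = "/user/hive/warehouse",  # assumed default, configurable in practice
    partition: Optional[Dict[str, str]] = None,
) -> str:
    """Return the conventional storage path for a table (illustrative only)."""
    parts = [warehouse_dir]
    if database != "default":
        # Non-default databases conventionally get their own "<name>.db" directory.
        parts.append(f"{database}.db")
    parts.append(table)
    # Each partition column adds one "key=value" directory level, in order.
    for key, value in (partition or {}).items():
        parts.append(f"{key}={value}")
    return posixpath.join(*parts)


if __name__ == "__main__":
    print(table_location("orders"))
    print(table_location("orders", database="sales", partition={"dt": "2024-01-01"}))
```

The data files for the table or partition would then be stored as ordinary files under that directory.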