Hive Services :
Hive provides the following services:
- Hive CLI: The Hive CLI (Command Line Interface) is a shell where we can execute Hive queries and commands.
- Hive Web User Interface: The Hive Web UI is an alternative to the Hive CLI. It provides a web-based GUI for executing Hive queries and commands.
- Hive metastore: It is a central repository that stores the structure information of the tables and partitions in the warehouse. This includes column names and their types, the serializers and deserializers used to read and write data, and the corresponding HDFS files where the data is stored.
- Hive Server: It is also referred to as the Apache Thrift Server. It accepts requests from different clients and passes them to the Hive Driver.
- Hive Driver: It receives queries from different sources, such as the web UI, CLI, Thrift, and JDBC/ODBC drivers, and transfers them to the compiler.
- Hive Compiler: The compiler parses the query and performs semantic analysis on the different query blocks and expressions. It converts HiveQL statements into MapReduce jobs.
- Hive Execution Engine: The optimizer generates the logical plan in the form of a DAG of MapReduce tasks and HDFS tasks. The execution engine then executes these tasks in the order of their dependencies.
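The last step in the list, executing tasks in the order of their dependencies, can be illustrated with a small sketch. This is not Hive's actual execution engine: the task names and the `plan` dictionary are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical task DAG for one query. Each key maps to the set of tasks
# it depends on. The names are illustrative, not real Hive task identifiers.
plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "move_result_to_hdfs": {"aggregate"},
}


def execute(plan):
    """Visit every task only after all of its dependencies have been visited."""
    executed = []
    for task in TopologicalSorter(plan).static_order():
        # A real engine would launch the MapReduce or HDFS task here.
        executed.append(task)
    return executed


order = execute(plan)
print(order)
```

The two scan tasks have no dependencies, so either may come first. The only guarantee is that each task appears after everything it depends on.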
MetaStore :
Hive metastore (HMS) is a service that stores metadata for Apache Hive and other services in a backend RDBMS, such as MySQL or PostgreSQL.
Impala, Spark, Hive, and other services share the metastore.
The connections to and from HMS include HiveServer, Ranger, and the NameNode, which represents HDFS.
Beeline, Hue, JDBC, and Impala shell clients make requests to HiveServer through Thrift or JDBC.
The HiveServer instance reads/writes data to HMS.
By default, redundant HMS instances operate in active/active mode.
The physical data resides in a backend RDBMS, one dedicated to HMS.
All connections are routed to a single RDBMS service at any given time.
HMS talks to the NameNode over Thrift and functions as a client to HDFS.
HMS connects directly to Ranger and the NameNode (HDFS), and so does HiveServer.
One or more HMS instances on the backend can talk to other services, such as Ranger.
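To make the idea of metadata in a backend RDBMS more concrete, here is a minimal sketch. It makes several assumptions: an in-memory SQLite database stands in for MySQL or PostgreSQL, the single `table_meta` table is a simplification rather than the real HMS schema, and the `sales` table with its HDFS location is hypothetical.

```python
import json
import sqlite3

# In-memory SQLite stands in for the backend RDBMS (an assumption of this sketch).
backend = sqlite3.connect(":memory:")
backend.execute(
    "CREATE TABLE table_meta ("
    "name TEXT PRIMARY KEY, columns TEXT, serde TEXT, location TEXT)"
)


def register_table(name, columns, serde, location):
    """Record a table's structure. The data files themselves stay in HDFS."""
    backend.execute(
        "INSERT INTO table_meta VALUES (?, ?, ?, ?)",
        (name, json.dumps(columns), serde, location),
    )
    backend.commit()


def describe_table(name):
    """Return the metadata a client would need to read the table, or None."""
    row = backend.execute(
        "SELECT columns, serde, location FROM table_meta WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return None
    columns, serde, location = row
    return {"columns": json.loads(columns), "serde": serde, "location": location}


register_table(
    "sales",
    [["id", "int"], ["amount", "double"]],
    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "hdfs:///warehouse/sales",  # hypothetical location
)
info = describe_table("sales")
print(info)
```

Because every lookup goes through the same backend, any service that asks for `sales` sees the same columns, SerDe, and location. This is the sense in which Impala, Spark, and Hive share the metastore.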
Comparison with Traditional Database :
RDBMS | Hive |
It is used to maintain a database. | It is used to maintain a data warehouse. |
It uses SQL (Structured Query Language). | It uses HQL (Hive Query Language, also called HiveQL). |
The schema is fixed. | The schema can vary. |
Normalized data is stored. | Both normalized and denormalized data are stored. |
Tables in an RDBMS are sparse. | Tables in Hive are dense. |
It doesn't support partitioning. | It supports automatic partitioning. |
No partitioning method is used. | The sharding method is used for partitioning. |
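The partitioning rows in the table can be illustrated with a small sketch of how rows are grouped into per-partition directories. It assumes the `column=value` directory naming that Hive uses for partitions. The warehouse path, the sample rows, and the helper functions are hypothetical.

```python
from collections import defaultdict

WAREHOUSE = "/warehouse/sales"  # hypothetical table location


def partition_path(row, partition_column):
    """Directory a row lands in, following the column=value naming convention."""
    return f"{WAREHOUSE}/{partition_column}={row[partition_column]}"


def write_partitioned(rows, partition_column):
    """Group rows by partition directory, deriving each partition from the data."""
    layout = defaultdict(list)
    for row in rows:
        layout[partition_path(row, partition_column)].append(row)
    return dict(layout)


rows = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "IN", "amount": 7.5},
    {"id": 3, "country": "US", "amount": 3.0},
]
layout = write_partitioned(rows, "country")

# A query filtered on the partition column only needs to read one directory.
us_rows = layout[f"{WAREHOUSE}/country=US"]
print(sorted(layout))
```

The point of the layout is that a filter on the partition column can skip every other directory instead of scanning the whole table.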