HBase Concepts :
HBase is a distributed, column-oriented database built on top of the Hadoop Distributed File System.
It is an open-source project and is horizontally scalable.
Its data model is similar to Google’s Bigtable and is designed to provide quick random access to huge amounts of structured data.
It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS).
It is part of the Hadoop ecosystem and provides random, real-time read/write access to data in HDFS.
One can store the data in HDFS either directly or through HBase.
Data consumers read and access the data in HDFS randomly through HBase.
HBase sits on top of HDFS and provides read and write access to the data stored there.
HBase Vs RDBMS :
RDBMS | HBase |
It is queried with SQL (Structured Query Language) | It is a NoSQL database; SQL is not required |
It has a fixed schema | No fixed schema |
It is row-oriented | It is column-oriented |
It is hard to scale horizontally | It scales horizontally |
It is static in nature | Dynamic in nature |
Slower retrieval of data on very large data sets | Faster random retrieval of data on very large data sets |
It follows the ACID (Atomicity, Consistency, Isolation and Durability) properties. | It is usually classified as a CP (consistent and partition-tolerant) system under the CAP (Consistency, Availability, Partition tolerance) theorem. |
It can handle structured data | It can handle structured, unstructured as well as semi-structured data |
It cannot handle sparse data | It can handle sparse data |
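The "no fixed schema" and "sparse data" rows above can be illustrated with a small in-memory sketch. This is a toy model, not the HBase client API: the class `SparseTable`, its methods, and the `family:qualifier` column names are hypothetical and chosen only for illustration. It shows that each row stores only the cells that were actually written, so rows may carry different columns and missing cells take up no space.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of a sparse, schema-flexible table (not the HBase API)."""

    def __init__(self):
        # row key -> {"family:qualifier": value}; only written cells exist.
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write a single cell; the column need not be declared beforehand."""
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        """Read a single cell, returning `default` when it was never written."""
        return self._rows.get(row_key, {}).get(column, default)

    def cell_count(self):
        """Number of cells actually stored across all rows."""
        return sum(len(cells) for cells in self._rows.values())


table = SparseTable()
table.put("user#1", "info:name", "Ada")
table.put("user#1", "info:email", "ada@example.com")
table.put("user#2", "info:name", "Lin")
table.put("user#2", "stats:logins", 7)  # a column that no other row has
```

A fixed-schema table with these three columns would reserve six cells for the two rows; here only the four written cells are stored.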
Schema Design :
An HBase table can scale to billions of rows and any number of columns, depending on your requirements.
Such a table can store terabytes of data.
HBase tables support high read and write throughput at low latency.
A single value in each row is indexed; this value is known as the row key.
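Because HBase keeps rows sorted by row key, a lookup by key or a scan over a key range does not need to touch unrelated rows. The following is a minimal sketch of that idea using a sorted list, assuming string row keys compared lexicographically. `RowKeyIndex` and the `order#`/`user#` key prefixes are hypothetical names for illustration, not part of HBase.

```python
import bisect


class RowKeyIndex:
    """Toy sorted row-key index (illustrative only, not the HBase API)."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> row data

    def put(self, row_key, row):
        """Insert or overwrite a row, keeping the key list sorted."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = row

    def get(self, row_key):
        """Point lookup by row key; returns None if the row does not exist."""
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        """Yield (key, row) pairs with start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


index = RowKeyIndex()
for key in ["order#0003", "order#0001", "user#0002", "order#0002"]:
    index.put(key, {"id": key})

# Range scan over every key that starts with "order#" ("~" sorts after digits).
order_keys = [key for key, _ in index.scan("order#", "order#~")]
```

Any query that cannot be phrased as a key lookup or a key-range scan would have to visit every row in this model, which is why the row key deserves most of the design effort.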
HBase schema design is very different from relational database schema design.
Some general concepts to follow when designing a schema in HBase:
· Row key: Each HBase table is indexed on the row key. HBase tables have no built-in secondary indexes.
· Atomicity: Avoid designing a table that requires atomicity across multiple rows. Operations on HBase rows are atomic only at the row level.
· Even distribution: Reads and writes should be uniformly distributed across all nodes in the cluster. Design the row key so that related entities are stored in adjacent rows, which increases read efficiency.
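One common way to balance the even-distribution goal against keeping related entities adjacent is a composite row key with a hashed prefix. The sketch below assumes a hypothetical key layout `<bucket>|<entity_id>|<timestamp>`; the bucket count, the separator, and the function name `make_row_key` are illustrative choices, not HBase requirements. The prefix is derived from the entity ID, so different entities spread across buckets while all rows of one entity share a prefix and sort next to each other.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical bucket count; tune to the cluster size


def make_row_key(entity_id: str, timestamp_ms: int) -> str:
    """Build '<bucket>|<entity_id>|<timestamp>' (a hypothetical key layout)."""
    # Hash the entity ID so the bucket is stable for a given entity.
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    # Zero-pad the timestamp so lexicographic order matches numeric order.
    return f"{bucket:02d}|{entity_id}|{timestamp_ms:013d}"


# Three events for each of 50 users, sorted the way a key-ordered store keeps them.
keys = sorted(
    make_row_key(f"user{u}", 1_700_000_000_000 + t)
    for u in range(50)
    for t in range(3)
)
```

The trade-off in this layout is that a scan across all entities in time order must visit every bucket, so it suits workloads that mostly read one entity at a time.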