
HBASE

HBase Concepts :

HBase is a distributed, column-oriented database built on top of the Hadoop Distributed File System (HDFS).

It is an open-source project and is horizontally scalable.

Its data model is similar to Google's Bigtable and is designed to provide quick random access to huge amounts of structured data.

It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS).

It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in the Hadoop File System.

One can store the data in HDFS either directly or through HBase.

Data consumers read and access the data in HDFS randomly through HBase.

HBase sits on top of the Hadoop File System and provides read and write access.
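As a rough illustration of the Bigtable-style data model mentioned above, the following is a minimal in-memory sketch in Python. It is not the HBase client API: the class name `ToyTable`, its methods, and the `family:qualifier` column naming used here are assumptions made for the example. It only shows the idea that each row key maps to a sparse set of columns, and that each cell can hold several timestamped versions.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a sparse, column-oriented table (not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value) versions
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        # Only cells that are actually written take up space, so rows can be sparse.
        self._rows[row_key][column].append((timestamp, value))

    def get(self, row_key, column):
        # Random read of one cell: return the newest version, or None if absent.
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda tv: tv[0])[1]

    def columns(self, row_key):
        # Each row may carry a different set of columns.
        return sorted(self._rows.get(row_key, {}))


table = ToyTable()
table.put("user1", "info:name", "Ana", timestamp=1)
table.put("user1", "info:name", "Anna", timestamp=2)   # newer version of the same cell
table.put("user2", "info:email", "b@example.org", timestamp=1)

print(table.get("user1", "info:name"))
print(table.columns("user2"))
```

Here `user1` and `user2` hold different columns, which is what makes the layout suitable for sparse data.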

 

HBase Vs RDBMS :

RDBMS | HBase
It requires SQL (Structured Query Language) | NoSQL
It has a fixed schema | No fixed schema
It is row-oriented | It is column-oriented
It is not scalable | It is scalable
It is static in nature | Dynamic in nature
Slower retrieval of data | Faster retrieval of data
It follows the ACID (Atomicity, Consistency, Isolation and Durability) property. | It follows CAP (Consistency, Availability, Partition-tolerance) theorem.
It can handle structured data | It can handle structured, unstructured as well as semi-structured data
It cannot handle sparse data | It can handle sparse data

Schema Design :

An HBase table can scale to billions of rows and any number of columns, based on your requirements.

A single table can store terabytes of data.

HBase tables support high read and write throughput at low latency.

A single value in each row is indexed; this value is known as the row key.
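To make the role of the row key concrete, here is a small Python sketch of a store that keeps rows sorted by key, so that a single-row lookup and a range scan over adjacent keys are both cheap. HBase keeps rows ordered by row key, but the class `SortedRowStore` and its methods are hypothetical and written only for this illustration; the real client API is different.

```python
import bisect


class SortedRowStore:
    """Toy store indexed only by row key (illustrative, not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._data = {}   # row key -> row contents

    def put(self, row_key, row):
        if row_key not in self._data:
            bisect.insort(self._keys, row_key)
        self._data[row_key] = row

    def get(self, row_key):
        # Point lookup by row key.
        return self._data.get(row_key)

    def scan(self, start_key, stop_key):
        # Return rows with start_key <= key < stop_key, in key order.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


store = SortedRowStore()
store.put("user#003", {"info:name": "Cy"})
store.put("user#001", {"info:name": "Ana"})
store.put("user#002", {"info:name": "Bo"})

print(store.get("user#002"))
print([key for key, _ in store.scan("user#001", "user#003")])
```

Because the row key is the only index in this sketch, any lookup on another field would have to visit every row, which is why row key design matters so much.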

The HBase schema design is very different compared to the relational database schema design.

Some general concepts to follow while designing a schema in HBase:

·       Row key: Each HBase table is indexed on the row key. There are no secondary indices available on an HBase table.

·       Atomicity: Avoid designing a table that requires atomicity across multiple rows. Operations in HBase are atomic only at the row level.

·       Even distribution: Reads and writes should be uniformly distributed across all nodes available in the cluster. Design the row key so that related entities are stored in adjacent rows, which increases read efficiency.
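One common way to balance the last two goals is to prefix the row key with a small hash-derived "salt". The sketch below shows this in Python under stated assumptions: the bucket count `NUM_BUCKETS`, the `|` separator, the key layout, and the function names are all hypothetical choices for illustration, not something HBase prescribes.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical number of salt buckets


def salt_for(entity_id, buckets=NUM_BUCKETS):
    # Derive a stable bucket number from the entity id.
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def make_row_key(entity_id, timestamp_ms, buckets=NUM_BUCKETS):
    # Layout: salt | entity id | zero-padded timestamp.
    # The salt spreads different entities across the key space, while all
    # rows of one entity share a prefix and therefore sort next to each other.
    return f"{salt_for(entity_id, buckets):02d}|{entity_id}|{timestamp_ms:013d}"


keys = sorted(
    make_row_key(entity, ts)
    for entity in ("sensor-1", "sensor-2", "sensor-3")
    for ts in (1000, 2000)
)
for key in keys:
    print(key)
```

The trade-off in this design is that a scan over all entities has to visit every salt bucket, so salting suits workloads that mostly read one entity at a time.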
