HBASE

HBase Concepts :

HBase is a distributed, column-oriented database built on top of the Hadoop Distributed File System (HDFS).

It is an open-source project and is horizontally scalable.

Its data model is similar to Google’s Bigtable and is designed to provide quick random access to huge amounts of structured data.

It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS).

It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in the Hadoop File System.

One can store the data in HDFS either directly or through HBase.

Data consumers read and access the data in HDFS randomly through HBase.

HBase sits on top of the Hadoop File System and provides read and write access.

HBase Vs RDBMS :

RDBMS	HBase
Requires SQL (Structured Query Language)	Does not require SQL (NoSQL)
Has a fixed schema	Has no fixed schema
Row-oriented	Column-oriented
Hard to scale horizontally	Horizontally scalable
Static in nature	Dynamic in nature
Slower random access on very large datasets	Fast random access on very large datasets
Follows the ACID (Atomicity, Consistency, Isolation, Durability) properties	Subject to the CAP (Consistency, Availability, Partition tolerance) theorem; favors consistency and partition tolerance
Handles structured data	Handles structured, semi-structured, and unstructured data
Poorly suited to sparse data	Handles sparse data well
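
The "no fixed schema" and "sparse data" rows can be illustrated with a small in-memory sketch. This is not HBase client code: the `SparseTable` class and its methods are hypothetical names that only mimic how a cell is addressed by row key, column family, and column qualifier. It assumes, as in HBase, that column families are declared up front while qualifiers are not.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self, column_families):
        # Assumption: column families are declared when the table is created.
        self.column_families = set(column_families)
        # row key -> {"family:qualifier": value}; only cells that exist are stored.
        self.rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers need no prior declaration, so rows may have different columns.
        self.rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key, family, qualifier):
        # A missing cell returns None; nothing was ever stored for it.
        return self.rows.get(row_key, {}).get(f"{family}:{qualifier}")

    def cell_count(self):
        # Number of cells actually stored across all rows.
        return sum(len(cells) for cells in self.rows.values())


table = SparseTable(["info", "contact"])
table.put("user1", "info", "name", "Asha")
table.put("user1", "contact", "email", "asha@example.com")
table.put("user2", "info", "name", "Ravi")
table.put("user2", "info", "city", "Pune")  # a column that user1 does not have
```

In this sketch a row that lacks a column stores nothing for it, whereas a fixed-schema relational row would typically carry a NULL in that position.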

Schema Design :

An HBase table can scale to billions of rows and any number of columns, depending on your requirements.

A single table can store terabytes of data.

HBase tables support high read and write throughput at low latency.

A single value in each row is indexed; this value is known as the row key.
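
A rough way to picture an index on the row key is a map kept in sorted key order, which makes both point lookups and range scans cheap. The sketch below is a simplified stand-in, not HBase code: `RowKeyIndex`, its methods, and the `order#`/`user#` key prefixes are all hypothetical, and Python strings are used where HBase would compare raw bytes.

```python
import bisect


class RowKeyIndex:
    """Toy sorted row-key index (hypothetical; stands in for sorted row storage)."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> row data

    def put(self, row_key, data):
        # Insert new keys at their sorted position; overwrite existing rows.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = data

    def get(self, row_key):
        # Point lookup by row key.
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        # Range scan over [start_key, stop_key), returned in row-key order.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


index = RowKeyIndex()
for key in ["order#0003", "order#0001", "user#0002", "order#0002"]:
    index.put(key, {"id": key})

# "$" is the character right after "#", so this range covers every "order#" key.
orders = index.scan("order#", "order$")
```

Because the row key is the only index, a query that cannot be phrased as a key lookup or a key range has to examine every row.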

HBase schema design differs greatly from relational database schema design.

Some general concepts to follow when designing a schema in HBase:

· Row key: Each HBase table is indexed on the row key. HBase tables have no built-in secondary indexes.

· Atomicity: Avoid designing a table that requires atomicity across multiple rows. Operations on HBase rows are atomic only at the row level.

· Even distribution: Reads and writes should be uniformly distributed across all nodes in the cluster. At the same time, design the row key so that related entities are stored in adjacent rows, which improves read efficiency.
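
One common way to balance even distribution against adjacency is to prefix the row key with a small hash-derived bucket. The sketch below is one possible layout under stated assumptions, not a prescribed HBase scheme: the `salted_row_key` helper, the `<bucket>|<entity_id>|<timestamp>` format, the `sensorN` identifiers, and the bucket count of 4 are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # assumed number of salt buckets, chosen only for illustration


def salted_row_key(entity_id, timestamp):
    """Build '<bucket>|<entity_id>|<timestamp>' (hypothetical key layout)."""
    # The bucket depends only on the entity, so all rows of one entity share a
    # prefix and sort next to each other, while entities spread across buckets.
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    # Zero-padding keeps timestamps in numeric order when compared as text.
    return f"{bucket:02d}|{entity_id}|{timestamp:010d}"


# Three readings for each of 100 hypothetical sensors, in row-key order.
keys = sorted(
    salted_row_key(f"sensor{n}", ts)
    for n in range(100)
    for ts in (1700000000, 1700000060, 1700000120)
)

# How many rows landed in each bucket.
bucket_counts = Counter(key.split("|")[0] for key in keys)
```

The trade-off in this layout is that reading one entity stays a single contiguous scan, but reading a time range across all entities needs one scan per bucket.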
