
DATA FORMAT

A data/file format defines how information is stored in HDFS.

Hadoop does not impose a single default file format; the right choice depends on how the data will be used.

The main performance bottlenecks for applications that use HDFS are the time spent searching for (reading) information and the time spent writing it.

Managing the processing and storage of large volumes of information is complex, which is why choosing a suitable data format matters.

The choice of an appropriate file format can produce the following benefits:

  • Optimum writing time
  • Optimum reading time
  • File divisibility
  • Schema evolution and compression support

Some of the most commonly used formats in the Hadoop ecosystem are:

● Text/CSV: A plain text file or CSV is the most common format both outside and within the Hadoop ecosystem.
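
Because Text/CSV is plain text, it can be read with ordinary tooling; a minimal sketch using Python's standard `csv` module (the sample data here is invented for illustration):

```python
import csv
import io

# A tiny CSV sample; in HDFS this would simply be a plain text file.
raw = "id,name,score\n1,alice,90\n2,bob,85\n"

# DictReader parses each line into a dict keyed by the header row.
# Note that every value comes back as a string: CSV carries no types.
rows = list(csv.DictReader(io.StringIO(raw)))

print(rows[0]["name"])  # first record's name column
print(len(rows))        # number of data rows
```

The absence of types and of an embedded schema is exactly why plain text is the most portable format but rarely the most efficient one.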

● SequenceFile: The SequenceFile format stores data as binary key-value pairs; it supports compression but does not store metadata.
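
To illustrate the key-value idea, here is a conceptual sketch only, not the real SequenceFile layout (which adds a header, sync markers, and optional record/block compression): length-prefixed binary key-value records, written and read back with Python's `struct` module.

```python
import struct

def write_records(records):
    """Serialize (key, value) string pairs as length-prefixed binary."""
    out = bytearray()
    for key, value in records:
        k, v = key.encode("utf-8"), value.encode("utf-8")
        # Two big-endian 4-byte lengths, then the raw key and value bytes.
        out += struct.pack(">II", len(k), len(v)) + k + v
    return bytes(out)

def read_records(data):
    """Parse the binary blob back into (key, value) string pairs."""
    records, offset = [], 0
    while offset < len(data):
        klen, vlen = struct.unpack_from(">II", data, offset)
        offset += 8
        key = data[offset:offset + klen].decode("utf-8")
        offset += klen
        value = data[offset:offset + vlen].decode("utf-8")
        offset += vlen
        records.append((key, value))
    return records

blob = write_records([("k1", "v1"), ("k2", "hello")])
print(read_records(blob))
```

Storing records in binary like this avoids the parsing cost of text, which is the core trade-off SequenceFile makes.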

● Avro: Avro is a row-based storage format. This format embeds the schema of your data in JSON. Avro supports block compression and is splittable, making it a good choice for most cases when using Hadoop.
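
A hypothetical Avro schema for a simple user record, written in Avro's JSON schema notation (the record and field names here are illustrative):

```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id",    "type": "long"},
    {"name": "name",  "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

Because the schema travels with the data, readers can evolve independently of writers, for example by adding optional fields such as the nullable `email` above.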

● Parquet: Parquet is a column-based binary storage format that can store nested data structures. This format is very efficient in terms of disk input/output operations when queries read only a subset of the columns.
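
A minimal sketch of why columnar storage saves I/O, using plain Python lists rather than the actual Parquet encoding: when values are grouped by column, a query that needs only one column touches only that column's data.

```python
# Invented sample records, for illustration only.
rows = [
    {"id": 1, "name": "alice", "score": 90},
    {"id": 2, "name": "bob",   "score": 85},
]

# Row layout: all values of each record stored together (Text, Avro).
row_store = [list(r.values()) for r in rows]

# Columnar layout: all values of each column stored together (Parquet).
column_store = {col: [r[col] for r in rows] for col in rows[0]}

# Reading just "score" touches one contiguous list instead of every row.
scores = column_store["score"]
print(scores)
```

In a real Parquet file the same principle lets the reader skip entire column chunks on disk, and storing similar values together also compresses better.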

● RCFile (Record Columnar File): RCFile is a columnar format that divides data into groups of rows, and within each group stores the data column by column.

● ORC (Optimized Row Columnar): ORC is considered an evolution of the RCFile format; it retains all of its benefits while adding improvements such as better compression and faster queries.
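
In practice, formats like ORC are usually chosen at table-creation time. A hedged Hive example, assuming a hypothetical table named `events` (the table and column names are illustrative):

```sql
-- Hypothetical Hive table stored in ORC; names are illustrative.
CREATE TABLE events (
  event_id   BIGINT,
  event_type STRING,
  event_time TIMESTAMP
)
STORED AS ORC;
```

The same `STORED AS` clause accepts other formats discussed above, such as `PARQUET`, `SEQUENCEFILE`, or `TEXTFILE`, which makes it easy to compare formats for a given workload.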
