Thursday, May 12, 2022


Resilient Distributed Datasets

Rahul Saini, May 12, 2022 · Big Data, Big Data Unit-4


Resilient Distributed Datasets (RDDs) are the fundamental data structure of Spark. An RDD is an immutable, distributed collection of objects. Each RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes.
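To make "immutable" and "logical partitions" concrete, here is a minimal single-process sketch. The class `ToyRDD` is hypothetical and is not Spark's API; it only assumes that a dataset is stored as a fixed tuple of partitions and that a transformation such as `map` builds a new dataset instead of changing the old one. (Real PySpark offers similarly named methods, e.g. `RDD.map`, `RDD.glom` and `RDD.collect`, but they run across a cluster.)

```python
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ToyRDD(Generic[T]):
    """Hypothetical, single-process stand-in for an RDD: an immutable
    collection split into logical partitions. Not Spark's API."""

    def __init__(self, partitions: List[Tuple[T, ...]]) -> None:
        # Store partitions as a tuple of tuples so they cannot be mutated in place.
        self._partitions: Tuple[Tuple[T, ...], ...] = tuple(tuple(p) for p in partitions)

    def num_partitions(self) -> int:
        return len(self._partitions)

    def glom(self) -> List[List[T]]:
        # Return a copy of each partition's contents, one list per partition.
        return [list(p) for p in self._partitions]

    def map(self, f: Callable[[T], U]) -> "ToyRDD[U]":
        # A transformation never modifies self: it builds a new dataset by
        # applying f partition by partition (each could run on a different node).
        return ToyRDD([tuple(f(x) for x in p) for p in self._partitions])

    def collect(self) -> List[T]:
        # Flatten all partitions back into one list, in partition order.
        return [x for p in self._partitions for x in p]


numbers = ToyRDD([(1, 2), (3, 4), (5,)])
squares = numbers.map(lambda x: x * x)
print(numbers.collect())   # [1, 2, 3, 4, 5]  (the original is unchanged)
print(squares.glom())      # [[1, 4], [9, 16], [25]]
```

Because each partition is processed independently, the work in `map` could be handed to different machines; that independence is what lets Spark compute partitions on different nodes.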

Formally, an RDD is a read-only, partitioned collection of records. RDDs can be created only through deterministic operations on either data in stable storage or other RDDs. An RDD is a fault-tolerant collection of elements that can be operated on in parallel.
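The "deterministic operations" requirement is what makes RDDs fault-tolerant: Spark records how each RDD was derived from its parents (its lineage), so a lost partition can be recomputed rather than restored from a replica. The sketch below models that idea in plain Python. The class `LineageRDD`, its methods and the `compute_calls` counter are hypothetical illustrations, not Spark's API; the "stable storage" is just an in-memory list.

```python
from typing import Callable, Dict, List, Optional


class LineageRDD:
    """Hypothetical sketch: each dataset records how it was derived
    (its parent and a deterministic per-partition function) instead of
    replicating its data. Not Spark's API."""

    def __init__(self, compute: Callable[[int], List[int]], num_partitions: int,
                 parent: Optional["LineageRDD"] = None) -> None:
        self._compute = compute          # deterministic: same index -> same result
        self.num_partitions = num_partitions
        self.parent = parent             # lineage pointer to the source dataset
        self._cache: Dict[int, List[int]] = {}
        self.compute_calls = 0           # counts (re)computations, for illustration

    @staticmethod
    def from_storage(data: List[List[int]]) -> "LineageRDD":
        # Source dataset: each partition is read from "stable storage" (here, a list).
        return LineageRDD(lambda i: list(data[i]), len(data))

    def map(self, f: Callable[[int], int]) -> "LineageRDD":
        # The child stores only the recipe: "apply f to partition i of my parent".
        return LineageRDD(lambda i: [f(x) for x in self.partition(i)],
                          self.num_partitions, parent=self)

    def partition(self, i: int) -> List[int]:
        # Serve from memory if present; otherwise (re)compute from lineage.
        if i not in self._cache:
            self.compute_calls += 1
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that wipes one in-memory partition.
        self._cache.pop(i, None)


storage = [[1, 2], [3, 4]]
base = LineageRDD.from_storage(storage)
doubled = base.map(lambda x: x * 2)

before = doubled.partition(1)      # computed from base: [6, 8]
doubled.lose_partition(1)          # the "node" holding partition 1 fails
after = doubled.partition(1)       # rebuilt from lineage, not from a replica
print(before, after)               # [6, 8] [6, 8]
```

Recomputation only gives the same answer because every step is deterministic; a transformation that depended on, say, the current time would break this guarantee. In real Spark, `RDD.toDebugString()` prints the lineage of an RDD.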

There are two ways to create RDDs: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system, such as a shared file system, HDFS, HBase, or any data source offering a Hadoop InputFormat.
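In PySpark the two creation paths correspond to `SparkContext.parallelize(c, numSlices)` and `SparkContext.textFile(name, minPartitions)`. The plain-Python sketch below only mimics what they produce, a list of partitions; the helper names `parallelize` and `text_file` are hypothetical, and the even, contiguous slicing is an assumption made for illustration (real `textFile` splits input by storage blocks).

```python
import os
import tempfile
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def parallelize(collection: Sequence[T], num_slices: int) -> List[List[T]]:
    """Hypothetical stand-in for SparkContext.parallelize: split an in-driver
    collection into num_slices contiguous, roughly equal partitions."""
    n = len(collection)
    return [list(collection[i * n // num_slices:(i + 1) * n // num_slices])
            for i in range(num_slices)]


def text_file(path: str, num_partitions: int) -> List[List[str]]:
    """Hypothetical stand-in for SparkContext.textFile: read an external file
    and split its lines into partitions."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    return parallelize(lines, num_partitions)


# 1. Parallelize an existing collection held by the driver program.
print(parallelize(list(range(10)), 3))   # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]

# 2. Reference a dataset in external storage (a local temp file stands in for HDFS).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\ndelta\n")
print(text_file(tmp.name, 2))            # [['alpha', 'beta'], ['gamma', 'delta']]
os.remove(tmp.name)
```

Either way the result is the same kind of object, a partitioned dataset, so everything downstream works identically regardless of where the data came from.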

Spark uses the RDD abstraction to make MapReduce operations faster and more efficient. Let us first discuss how MapReduce operations take place and why they are not so efficient.
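As a reference point for "MapReduce operations", here is a toy word count split into the classic map, shuffle and reduce phases. The helper names `map_phase`, `shuffle` and `reduce_phase` are hypothetical and are not Hadoop or Spark APIs; the sketch assumes the intermediate `(word, 1)` pairs simply stay in memory, which is the behaviour Spark aims for, whereas classic Hadoop MapReduce writes intermediate and job output to disk. In PySpark the same pipeline is usually written with `flatMap`, `map` and `reduceByKey`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_phase(partition: List[str]) -> List[Tuple[str, int]]:
    # Hypothetical helper: emit (word, 1) for every word in every line of this partition.
    return [(word, 1) for line in partition for word in line.split()]


def shuffle(mapped: List[List[Tuple[str, int]]], num_reducers: int) -> List[Dict[str, List[int]]]:
    # Hypothetical helper: route each key to one reducer by hash,
    # so that all values for the same key end up together.
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped:
        for key, value in pairs:
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets


def reduce_phase(bucket: Dict[str, List[int]]) -> Dict[str, int]:
    # Hypothetical helper: sum the counts collected for each word.
    return {key: sum(values) for key, values in bucket.items()}


partitions = [["to be or", "not to be"], ["to do"]]
mapped = [map_phase(p) for p in partitions]   # intermediate pairs kept in memory
counts: Dict[str, int] = {}
for bucket in shuffle(mapped, num_reducers=2):
    counts.update(reduce_phase(bucket))       # each key lives in exactly one bucket
print(sorted(counts.items()))  # [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```

The shuffle is the expensive step in both systems, because it moves data between nodes; the difference the text goes on to discuss is mainly where the intermediate results live between steps.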
