Data Processing Operators

Pig Latin, the language of Apache Pig, is a high-level procedural language for querying large data sets using Hadoop and the MapReduce platform.

A Pig Latin statement is an operator that takes a relation as input and produces another relation as output. The exceptions are LOAD and STORE, which read data from and write data to the file system.
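
As a minimal sketch of this relation-in, relation-out idea, the two statements below chain together: the first produces a relation and the second consumes it. The file name words.txt and the field name word are hypothetical, chosen only for illustration.

```pig
-- Assumed input: a text file with one word per line (hypothetical file name).
words = LOAD 'words.txt' AS (word:chararray);

-- FILTER takes the relation `words` as input and produces
-- a new relation `long_words` as output.
long_words = FILTER words BY SIZE(word) > 5;
```

Each alias on the left-hand side names a new relation, so later statements can keep building on earlier ones.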

These operators are the main tools Pig Latin provides to operate on the data.

They allow you to transform it by sorting, grouping, joining, projecting, and filtering.

The Apache Pig operators can be classified as :

Relational Operators :

Relational operators are the main tools Pig Latin provides to operate on the data.

Some of the Relational Operators are :

LOAD: The LOAD operator is used to load data from the file system or HDFS storage into a Pig relation.

FOREACH: This operator generates data transformations based on columns of data. It is used to add or remove fields from a relation.

FILTER: This operator selects tuples from a relation based on a condition.

JOIN: The JOIN operator is used to perform an inner equijoin of two or more relations based on common field values.

ORDER BY: The ORDER BY operator is used to sort a relation based on one or more fields, in either ascending or descending order, using the ASC and DESC keywords.

GROUP: The GROUP operator groups together the tuples with the same group key (key field).

COGROUP: COGROUP is the same as the GROUP operator. For readability, programmers usually use GROUP when only one relation is involved and COGROUP when multiple relations are involved.
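
The relational operators listed above can be sketched together in one short script. This is only an illustration under stated assumptions: the files customers.csv and orders.csv, their comma-separated layout, and all field names are hypothetical and do not come from any real data set.

```pig
-- LOAD: read two hypothetical comma-separated files into relations.
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (id:int, name:chararray, city:chararray);
orders    = LOAD 'orders.csv' USING PigStorage(',')
            AS (order_id:int, customer_id:int, amount:double);

-- FILTER: keep only the tuples that satisfy a condition.
big_orders = FILTER orders BY amount > 100.0;

-- FOREACH ... GENERATE: project a subset of the fields.
order_amounts = FOREACH big_orders GENERATE customer_id, amount;

-- JOIN: inner equijoin of two relations on a common field value.
joined = JOIN customers BY id, orders BY customer_id;

-- GROUP: collect the tuples that share the same key into one bag per key.
by_customer = GROUP orders BY customer_id;

-- FOREACH over the grouped relation: compute one total per customer.
totals = FOREACH by_customer
         GENERATE group AS customer_id, SUM(orders.amount) AS total;

-- ORDER BY: sort by total, largest first.
sorted_totals = ORDER totals BY total DESC;

-- COGROUP: group two relations by key in a single statement.
customer_orders = COGROUP customers BY id, orders BY customer_id;
```

Note that none of these statements prints anything on its own; they only define relations.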

Diagnostic Operators :

The LOAD statement simply loads the data into the specified relation in Apache Pig.

To verify the execution of the LOAD statement, you have to use the diagnostic operators.

Some Diagnostic Operators are :

DUMP: The DUMP operator is used to run Pig Latin statements and display the results on the screen.

DESCRIBE: Use the DESCRIBE operator to review the schema of a particular relation. The DESCRIBE operator is best used for debugging a script.

ILLUSTRATE: The ILLUSTRATE operator is used to review how data is transformed through a sequence of Pig Latin statements. The ILLUSTRATE command is your best friend when it comes to debugging a script.

EXPLAIN: The EXPLAIN operator is used to display the logical, physical, and MapReduce execution plans of a relation.
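
A minimal sketch of the four diagnostic operators applied to one relation might look like the following. The file orders.csv and its fields are hypothetical, used only to give the operators something to inspect.

```pig
-- Hypothetical input file and schema, for illustration only.
orders = LOAD 'orders.csv' USING PigStorage(',')
         AS (order_id:int, customer_id:int, amount:double);
big_orders = FILTER orders BY amount > 100.0;

-- DESCRIBE: print the schema of the relation.
DESCRIBE big_orders;

-- DUMP: run the statements above and print the resulting tuples.
DUMP big_orders;

-- ILLUSTRATE: show, on a small sample, how the data changes step by step.
ILLUSTRATE big_orders;

-- EXPLAIN: print the logical, physical, and MapReduce execution plans.
EXPLAIN big_orders;
```

DESCRIBE is a cheap first check because it only inspects the schema, whereas DUMP actually executes the script against the data.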
