Pig Latin

Pig Latin :

Pig Latin is the data flow language that Apache Pig uses to analyze data in Hadoop.

It is a textual language that abstracts the Java MapReduce programming idiom into a higher-level notation.

Pig Latin statements are the basic constructs used to process the data.

Each statement is an operator that accepts a relation as input and generates another relation as output.

·       A statement can span multiple lines.
·       Each statement must end with a semicolon.
·       A statement may include expressions and schemas.
·       By default, statements are processed using multi-query execution.
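
The relation-in, relation-out idea can be sketched in plain Java. This is only a model and does not use the Pig API: a relation is represented as a list of `Object[]` tuples, and the class name, method names, and sample data are all hypothetical. The comments show the Pig Latin statement that each method stands in for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a Pig Latin data flow; it does not use the Pig API.
// A "relation" is modeled as a list of tuples, and a tuple as an Object[].
class DataFlowSketch {

    // Models: adults = FILTER people BY age >= 18;
    static List<Object[]> filterAdults(List<Object[]> people) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] t : people) {
            if ((Integer) t[1] >= 18) {
                out.add(t);
            }
        }
        return out;
    }

    // Models: names = FOREACH adults GENERATE name;
    static List<Object[]> projectName(List<Object[]> adults) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] t : adults) {
            out.add(new Object[] { t[0] });
        }
        return out;
    }

    // Hypothetical sample data standing in for a loaded relation (name, age).
    static List<Object[]> samplePeople() {
        return Arrays.asList(
            new Object[] { "asha", 34 },
            new Object[] { "ben", 12 },
            new Object[] { "chen", 18 });
    }

    public static void main(String[] args) {
        // Each step takes a relation and returns a new relation.
        List<Object[]> names = projectName(filterAdults(samplePeople()));
        for (Object[] t : names) {
            System.out.println(t[0]);
        }
    }
}
```

Each method takes one relation and returns a new one, so the output of one statement can be passed straight to the next.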


User-Defined Functions :

Apache Pig provides extensive support for User-Defined Functions (UDFs).

Using UDFs, we can define our own functions and call them from Pig Latin scripts.

UDF support is provided in six programming languages:

·       Java
·       Jython
·       Python
·       JavaScript
·       Ruby
·       Groovy

For writing UDFs, complete support is provided in Java and limited support in the remaining languages.

Using Java, you can write UDFs that cover all parts of the processing, such as data load/store, column transformation, and aggregation.

Since Apache Pig itself is written in Java, UDFs written in Java run more efficiently than those written in the other languages.

Types of UDFs in Java :

Filter Functions :

  • Filter functions are used as conditions in FILTER statements.
  • These functions accept a Pig value as input and return a Boolean value.
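
In Pig itself, a Java filter UDF extends `org.apache.pig.FilterFunc` and implements `exec(Tuple)`, which returns a `Boolean`. The sketch below mirrors that shape in plain Java so that it runs without Pig on the classpath. The class `IsAdultSketch`, the `Object[]` stand-in for a tuple, and the age condition are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Dependency-free stand-in for a Pig filter UDF. A real one would extend
// org.apache.pig.FilterFunc and take an org.apache.pig.data.Tuple.
class IsAdultSketch {

    // Hypothetical condition: the first field is an Integer age of 18 or more.
    // A missing or null field is treated as "does not match".
    static Boolean exec(Object[] input) {
        if (input == null || input.length == 0 || input[0] == null) {
            return Boolean.FALSE;
        }
        return (Integer) input[0] >= 18;
    }

    // Models: adults = FILTER ages BY IsAdult(age);
    static List<Object[]> filter(List<Object[]> relation) {
        List<Object[]> kept = new ArrayList<>();
        for (Object[] tuple : relation) {
            if (exec(tuple)) {
                kept.add(tuple);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Object[]> ages = Arrays.asList(
            new Object[] { 34 }, new Object[] { 12 }, new Object[] { 18 });
        System.out.println(filter(ages).size());
    }
}
```

The function only decides whether a tuple is kept. It never changes the tuple, which is what separates a filter function from an eval function.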

Eval Functions :

  • Eval functions are used in FOREACH-GENERATE statements.
  • These functions accept a Pig value as input and return a Pig result.
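
A real Java eval UDF extends `org.apache.pig.EvalFunc<T>` and implements `exec(Tuple)`, which returns a value of type `T`. As a rough illustration, the following plain-Java sketch keeps the same shape without depending on Pig. The class `UpperSketch`, the `Object[]` tuple stand-in, and the upper-casing behaviour are assumptions, not part of the Pig API.

```java
import java.util.Locale;

// Dependency-free stand-in for a Pig eval UDF. A real one would extend
// org.apache.pig.EvalFunc<String> and take an org.apache.pig.data.Tuple.
class UpperSketch {

    // Hypothetical transformation: upper-case the first field.
    // Returns null when there is nothing to transform.
    static String exec(Object[] input) {
        if (input == null || input.length == 0 || input[0] == null) {
            return null;
        }
        return ((String) input[0]).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Models: upper_names = FOREACH names GENERATE Upper(name);
        String[] names = { "asha", "ben" };
        for (String name : names) {
            System.out.println(exec(new Object[] { name }));
        }
    }
}
```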

Algebraic Functions :

  • Algebraic functions act on inner bags in a FOREACH-GENERATE statement.
  • These functions are used to perform full MapReduce operations on an inner bag.
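
In Pig, an algebraic UDF implements the `Algebraic` interface, whose `getInitial`, `getIntermed`, and `getFinal` methods name the classes that run in the map, combine, and reduce phases. The sketch below models those three phases for a COUNT-style function in plain Java, without the Pig API. The class `AlgebraicCountSketch`, its method names, and the way the bag is split into chunks are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Dependency-free model of an algebraic COUNT. In Pig, the three phases are
// separate classes named by Algebraic.getInitial/getIntermed/getFinal.
class AlgebraicCountSketch {

    // Initial phase (map side): reduce one chunk of the bag to a partial count.
    static long initial(List<String> chunk) {
        return chunk.size();
    }

    // Intermediate phase (combiner): merge partial counts into one partial count.
    static long intermed(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final phase (reduce side): merge the remaining partial counts.
    static long finalPhase(List<Long> partials) {
        return intermed(partials);
    }

    // Runs the three phases over a bag that has been split into chunks.
    static long count(List<List<String>> chunks) {
        List<Long> partials = new ArrayList<>();
        for (List<String> chunk : chunks) {
            partials.add(initial(chunk));
        }
        long combined = intermed(partials);
        return finalPhase(Collections.singletonList(combined));
    }

    // Hypothetical bag of five items, split into two chunks.
    static List<List<String>> sampleChunks() {
        return Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c", "d", "e"));
    }

    public static void main(String[] args) {
        System.out.println(count(sampleChunks()));
    }
}
```

The split works because partial counts can be summed in any grouping and still give the same total. That property is what lets the work run in the combiner before the reducer sees it.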
