Introduction to Pig :
Apache Pig is a high-level platform for processing large datasets on Hadoop.
It provides a high level of abstraction over MapReduce: Pig scripts are translated into MapReduce jobs that run on the cluster.
It provides a high-level scripting language, Pig Latin, for writing data analysis programs.
Pig Latin (the language) and the Pig Engine (the execution engine that turns Pig Latin into MapReduce jobs) are the two main components of Apache Pig.
When Pig runs on a Hadoop cluster, its results are stored in HDFS.
One limitation of MapReduce is its long development cycle: writing the mapper and reducer, compiling and packaging the code, submitting the job, and retrieving the output all take considerable time.
Apache Pig reduces development time with its multi-query approach, in which one script describes a whole series of operations instead of each one being written as a separate job.
Pig is especially useful for programmers who do not have a Java background.
For example, a task that needs roughly 200 lines of Java MapReduce code can often be written in about 10 lines of Pig Latin.
Programmers who know SQL need less effort to learn Pig Latin.
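To make the abstraction concrete, here is a minimal sketch of the classic word count in Pig Latin. The input file `input.txt` and the output directory `wordcount_out` are hypothetical names chosen for illustration; the operators and functions used (`LOAD`, `TextLoader`, `FOREACH ... GENERATE`, `TOKENIZE`, `FLATTEN`, `GROUP`, `COUNT`, `STORE`) are standard Pig Latin.

```pig
-- Word count in Pig Latin (input.txt and wordcount_out are hypothetical paths).
lines  = LOAD 'input.txt' USING TextLoader() AS (line:chararray);
-- Split each line into words; FLATTEN turns the bag of words into one row per word.
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Bring identical words together, then count the rows in each group.
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS cnt;
-- When running on a cluster, this output directory is created in HDFS.
STORE counts INTO 'wordcount_out';
```

The same job written directly against the MapReduce Java API needs a mapper class, a reducer class, and driver code, which is where most of the difference in length comes from.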
Execution Modes of Pig :
Apache Pig scripts can be executed in three ways :
Interactive Mode (Grunt shell) :
You can run Apache Pig in interactive mode using the Grunt shell.
In this shell, you enter Pig Latin statements one at a time and view the results with the DUMP operator.
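A short interactive session might look like the following, assuming a hypothetical comma-separated file `students.txt` with a name and an age on each line. Start the shell in local mode with `pig -x local` (or plain `pig` for MapReduce mode) and type each statement at the `grunt>` prompt. Pig only records the `LOAD` and `FILTER` steps in its plan; the job actually runs when `DUMP` asks for output.

```pig
-- Typed one statement at a time at the grunt> prompt.
students = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);
adults   = FILTER students BY age >= 18;
DESCRIBE adults;  -- prints the schema of the adults relation
DUMP adults;      -- runs the job and prints each tuple to the console
```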
Batch Mode (Script) :
You can run Apache Pig in batch mode by writing the Pig Latin statements in a single file with the .pig extension and executing that file.
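In batch mode, the same kind of statements are saved to a file. The sketch below assumes a hypothetical script named `top_adults.pig` and the same hypothetical `students.txt` input; it uses `STORE` rather than `DUMP` so the result is written to a directory. It could be run with `pig -x local top_adults.pig`, or with `pig top_adults.pig` against a Hadoop cluster.

```pig
-- top_adults.pig (hypothetical file name): the five oldest adult students.
students = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);
adults   = FILTER students BY age >= 18;
by_age   = ORDER adults BY age DESC;
top5     = LIMIT by_age 5;
-- Batch scripts normally write results with STORE instead of printing them.
STORE top5 INTO 'top_adults_out' USING PigStorage(',');
```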
Embedded Mode (UDF) :
Apache Pig lets you define your own functions (User Defined Functions, or UDFs) in programming languages such as Java and use them in your scripts.
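On the Pig Latin side, a UDF is made available with `REGISTER` and can be given a short alias with `DEFINE`. This sketch assumes a hypothetical jar `myudfs.jar` containing a Java class `myudfs.UPPER` that extends `org.apache.pig.EvalFunc<String>` and upper-cases its argument; neither the jar nor the class ships with Pig, and the Java source is not shown here.

```pig
-- Hypothetical jar and class: myudfs.jar must contain the compiled class myudfs.UPPER.
REGISTER myudfs.jar;
-- Give the UDF a short alias for use in the script.
DEFINE TO_UPPER myudfs.UPPER();
students = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);
-- Call the UDF like any built-in function.
shouted  = FOREACH students GENERATE TO_UPPER(name) AS name, age;
DUMP shouted;
```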
Comparison of Pig with Databases :
Pig | SQL |
Pig Latin is a procedural language. | SQL is a declarative language. |
In Apache Pig, the schema is optional. Data can be loaded without declaring a schema, and fields are then referenced by position ($0, $1, $2, etc.). | A schema is mandatory in SQL. |
The data model in Apache Pig is nested relational. | The data model in SQL is flat relational. |
Apache Pig offers limited opportunity for query optimization. | SQL offers more opportunity for query optimization. |
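The first three rows of the table can be seen in a few lines of Pig Latin. This sketch, again assuming the hypothetical `students.txt`, loads data without an `AS` schema, refers to fields by position, and builds the result step by step (the procedural style). The `GROUP` output is nested: each row holds a group key plus a bag of the matching rows.

```pig
-- No AS clause: fields have no names and are addressed by position, starting at $0.
raw    = LOAD 'students.txt' USING PigStorage(',');
-- Name the fields while projecting; the casts give the untyped fields a type.
people = FOREACH raw GENERATE (chararray)$0 AS name, (int)$1 AS age;
-- GROUP produces a nested relation: (group, {bag of matching people tuples}).
by_age = GROUP people BY age;
DESCRIBE by_age;  -- prints the nested schema of the grouped relation
sizes  = FOREACH by_age GENERATE group AS age, COUNT(people) AS n;
DUMP sizes;
```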
Grunt :
Grunt is the interactive command shell of Apache Pig.
It is mainly used to write and run Pig Latin statements interactively.
Pig scripts can also be executed from the Grunt shell, which is the native shell Apache Pig provides for running Pig queries.
From Grunt, you can invoke operating-system shell commands with sh and Hadoop file system commands with fs.
Syntax of sh command :
grunt> sh ls
Syntax of fs command :
grunt> fs -ls
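A slightly longer sketch of both commands follows; each line could be typed at the `grunt>` prompt or placed in a `.pig` script. The directory `pig_demo` and the file `students.txt` are hypothetical. `fs` passes its arguments to the Hadoop file system shell, and `sh` runs an ordinary operating-system command.

```pig
-- Hadoop file system commands through fs (arguments follow the hadoop fs syntax).
fs -mkdir pig_demo
fs -put students.txt pig_demo/
fs -ls pig_demo
fs -cat pig_demo/students.txt
-- An operating-system command through sh.
sh ls -l
```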