Phases of Compiler

What are the Phases of Compiler Design?

A compiler operates in phases, and each phase transforms the source program from one representation to another. Every phase takes its input from the previous phase and feeds its output to the next phase of the compiler.
There are 6 phases in a compiler. Each of these phases helps convert the high-level language into machine code. The phases of a compiler are:

  1. Lexical analysis
  2. Syntax analysis
  3. Semantic analysis
  4. Intermediate code generation
  5. Code optimization
  6. Code generation


Phase 1: Lexical Analysis

Lexical analysis is the first phase, in which the compiler scans the source code. It reads the source left to right, character by character, and groups the characters into tokens.
Here, the character stream from the source program is grouped into meaningful sequences by identifying the tokens. The analyzer enters the corresponding tokens into the symbol table and passes each token to the next phase.
The primary functions of this phase are:

  • Identify the lexical units in the source code
  • Classify lexical units into classes such as constants and reserved words, and enter them in different tables
  • Ignore comments in the source program
  • Identify tokens that are not part of the language

Example:
x = y + 10

Tokens

x     identifier
=     assignment operator
y     identifier
+     addition operator
10    number
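
The grouping of characters into tokens can be sketched with Python's `re` module. This is a minimal illustration, not a complete scanner: the token class names in `TOKEN_SPEC` and the function name `tokenize` are assumptions made for this example, and only the tokens needed for `x = y + 10` are covered.

```python
import re

# Hypothetical token classes for this sketch; a real language has many more.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan left to right and return a list of (token class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            # A character that belongs to no token class is a lexical error.
            raise SyntaxError(f"illegal character {lexeme!r} at position {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("x = y + 10"))
```

A character such as `$` matches none of the token classes above, so `tokenize("x = $")` raises a `SyntaxError`, which corresponds to identifying a token that is not part of the language.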

Phase 2: Syntax Analysis

Syntax analysis is all about discovering structure in code. It determines whether or not a text follows the expected format. The main aim of this phase is to check whether the source code written by the programmer is syntactically correct.
Syntax analysis applies the rules of the specific programming language and constructs a parse tree from the tokens. It also determines the structure of the source program according to the grammar, or syntax, of the language.
Here is a list of tasks performed in this phase:

  • Obtain tokens from the lexical analyzer
  • Check whether the expression is syntactically correct
  • Report all syntax errors
  • Construct a hierarchical structure which is known as a parse tree

Example

Any identifier/number is an expression
If x is an identifier and y + 10 is an expression, then x = y + 10 is a statement.
Consider the parse tree for the following example:

(a+b)*c

[Figure: Example of Syntax Analysis, the parse tree for (a+b)*c]

In Parse Tree

  • Interior node: a record with an operator field and two fields for children
  • Leaf: a record with two or more fields; one for the token and the others for information about the token

The parse tree is then passed to the next phase, which:

  • Ensures that the components of the program fit together meaningfully
  • Gathers type information and checks for type compatibility
  • Checks that the operands are permitted by the source language
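
The construction of a parse tree for (a+b)*c can be sketched with a small recursive-descent parser. This is a minimal sketch under stated assumptions: the grammar (only `+`, `*`, parentheses, and identifiers), the nested-tuple tree shape, and the name `parse` are all chosen for illustration. Characters outside that grammar are silently skipped by the `re.findall` call, which a real scanner would not do.

```python
import re


def parse(source):
    """Parse +, * and parentheses into a nested-tuple tree.

    Interior nodes are (operator, left, right); leaves are identifier strings.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|[()+*]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        # factor -> identifier | "(" expr ")"
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("missing closing parenthesis")
            advance()
            return node
        if tok is not None and (tok[0].isalpha() or tok[0] == "_"):
            return advance()
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        # term -> factor ("*" factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # expr -> term ("+" term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("(a+b)*c"))
```

The interior nodes carry the operator and two children, and the leaves carry the identifier, which mirrors the record layout described above. An input such as `(a+b` raises a `SyntaxError` for the missing parenthesis.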

Phase 3: Semantic Analysis

Semantic analysis checks the semantic consistency of the code. It uses the syntax tree from the previous phase along with the symbol table to verify that the given source code is semantically consistent. It also checks whether the code conveys an appropriate meaning.
The semantic analyzer checks for type mismatches, incompatible operands, functions called with improper arguments, undeclared variables, and so on.
The functions of the semantic analysis phase are:

  • Stores the type information it gathers in the symbol table or syntax tree
  • Performs type checking
  • Reports a semantic error in the case of a type mismatch for which no type conversion rule satisfies the desired operation
  • Collects type information and checks for type compatibility
  • Checks whether the source language permits the operands

Example

float x = 20.2;
float y = x*30;

In the above code, the semantic analyzer will typecast the integer 30 to the float 30.0 before the multiplication.
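
The type check and the implicit conversion in this example can be sketched as a walk over the expression tree. This is an illustrative sketch only: the node shapes, the two type names "int" and "float", and the function name `check` are assumptions for this example, and real compilers handle many more types and rules.

```python
def check(node, symbols):
    """Return (typed_node, type) for a tiny expression tree.

    Nodes are ("num", value), ("var", name), or (operator, left, right).
    `symbols` maps declared variable names to "int" or "float".
    """
    kind = node[0]
    if kind == "num":
        return node, "float" if isinstance(node[1], float) else "int"
    if kind == "var":
        if node[1] not in symbols:
            raise NameError(f"undeclared variable {node[1]!r}")
        return node, symbols[node[1]]

    left, left_type = check(node[1], symbols)
    right, right_type = check(node[2], symbols)
    if left_type == right_type:
        return (kind, left, right), left_type

    # Mixed int/float operands: wrap the int side in an explicit conversion.
    if left_type == "int":
        left = ("int_to_float", left)
    else:
        right = ("int_to_float", right)
    return (kind, left, right), "float"


symbols = {"x": "float"}  # from the declaration `float x = 20.2;`
typed, result_type = check(("*", ("var", "x"), ("num", 30)), symbols)
print(typed, result_type)
```

The integer operand 30 ends up wrapped in an `int_to_float` node, and a reference to a name missing from `symbols` raises a `NameError`, which corresponds to the undeclared-variable check.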

Phase 4: Intermediate Code Generation

Once the semantic analysis phase is over, the compiler generates intermediate code. This code represents a program for some abstract machine.
Intermediate code sits between the high-level language and the machine-level language. It needs to be generated in a form that is easy to translate into the target machine code.
Functions of intermediate code generation:

  • It is generated from the semantic representation of the source program
  • It holds the values computed during the process of translation
  • It helps in translating the intermediate code into the target language
  • It maintains the precedence ordering of the source language
  • It holds the correct number of operands for each instruction

Example

total = count + rate * 5

The intermediate code, using the three-address code method, is:

t1 := int_to_float(5)
t2 := rate * t1
t3 := count + t2
total := t3
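
The translation from an expression tree to these four instructions can be sketched as a post-order walk that creates a new temporary for every operation. The tree shape and the name `to_three_address` are assumptions made for this illustration. The sketch also assumes that semantic analysis has already inserted the `int_to_float` conversion into the tree.

```python
import itertools


def to_three_address(target, tree):
    """Flatten an expression tree into a list of three-address instructions.

    Leaves are strings (variable names or constants); interior nodes are
    (operator, left, right) or ("int_to_float", operand).
    """
    code = []
    temps = (f"t{n}" for n in itertools.count(1))  # t1, t2, t3, ...

    def emit(node):
        if isinstance(node, str):  # leaf: already an address
            return node
        if node[0] == "int_to_float":  # unary conversion
            operand = emit(node[1])
            temp = next(temps)
            code.append(f"{temp} := int_to_float({operand})")
            return temp
        op, left, right = node
        left_addr = emit(left)
        right_addr = emit(right)
        temp = next(temps)
        code.append(f"{temp} := {left_addr} {op} {right_addr}")
        return temp

    result = emit(tree)
    code.append(f"{target} := {result}")
    return code


tree = ("+", "count", ("*", "rate", ("int_to_float", "5")))
for line in to_three_address("total", tree):
    print(line)
```

Because operands are emitted before the operation that uses them, the multiplication is generated before the addition, which preserves the precedence ordering of the source expression.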

Phase 5: Code Optimization

The next phase is code optimization, which works on the intermediate code. This phase removes unnecessary lines of code and rearranges the sequence of statements to speed up the execution of the program without wasting resources. The main goal of this phase is to improve the intermediate code so that the generated code runs faster and occupies less space.
The primary functions of this phase are:

  • Establishes a trade-off between execution speed and compilation speed
  • Improves the running time of the target program
  • Generates streamlined code that is still in intermediate representation
  • Removes unreachable code and gets rid of unused variables
  • Moves statements whose results do not change inside a loop out of the loop

Example:
Consider the following code:

a = int_to_float(10)
b = c * a
d = e + b
f = d

It can become:

b = c * 10.0
f = e + b
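
The two rewrites in this example, folding the conversion of a constant at compile time and removing the copy through a temporary, can be sketched as a single pass over a basic block. This is a simplified illustration, not a general optimizer. The instruction format, the operator names `int_to_float` and `copy`, and the function name `optimize` are assumptions for this sketch. It also assumes that each temporary is used only once after it is defined and is not needed after the block.

```python
def optimize(block, temporaries):
    """Apply constant folding and copy elimination to one basic block.

    Each instruction is (dest, op, args). `temporaries` names the variables
    that are not needed after the block (an assumption of this sketch).
    """
    constants = {}  # temporary name -> constant folded at compile time
    result = []
    for dest, op, args in block:
        # Substitute operands whose value is already known.
        args = [constants.get(arg, arg) for arg in args]

        if op == "int_to_float" and args[0].isdigit() and dest in temporaries:
            constants[dest] = str(float(args[0]))  # fold: 10 becomes 10.0
            continue

        if op == "copy" and result and result[-1][0] == args[0] and args[0] in temporaries:
            # `d = e + b` followed by `f = d` becomes `f = e + b`.
            _, prev_op, prev_args = result.pop()
            result.append((dest, prev_op, prev_args))
            continue

        result.append((dest, op, args))
    return result


block = [
    ("a", "int_to_float", ["10"]),
    ("b", "*", ["c", "a"]),
    ("d", "+", ["e", "b"]),
    ("f", "copy", ["d"]),
]
for dest, op, args in optimize(block, temporaries={"a", "d"}):
    print(dest, "=", f" {op} ".join(args))
```

Here `a` and `d` are treated as temporaries, so they disappear from the output, while `b` is kept because it is not in the `temporaries` set.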

Phase 6: Code Generation

Code generation is the final phase of a compiler. It takes its input from the code optimization phase and produces the target code or object code as a result. The objective of this phase is to allocate storage and generate relocatable machine code.
It also allocates memory locations for the variables. The instructions in the intermediate code are converted into machine instructions. This phase converts the optimized intermediate code into the target language.
The target language is the machine code. Therefore, all the memory locations and registers are also selected and allotted during this phase. The code generated by this phase is executed to take inputs and generate the expected outputs.

Example

a = b + 60.0
could be translated into register-based instructions such as:

MOVF b, R1
ADDF #60.0, R1
MOVF R1, a
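
A naive code generator for a statement of the form dest = left op right can be sketched as follows. This is only an illustration under several assumptions: the mnemonics MOVF, ADDF, and MULF are taken from the example above, the operand order is assumed to be source first and destination second, constants are assumed to be written as immediates with a leading `#`, and a single register is used with no real register allocation. The name `generate` is hypothetical.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}  # mnemonics taken from the example above


def generate(dest, left, op, right, register="R1"):
    """Emit instructions for `dest = left op right` using one register."""

    def operand(value):
        # Constants start with a digit and become immediates such as #60.0.
        return f"#{value}" if value[0].isdigit() else value

    return [
        f"MOVF {operand(left)}, {register}",  # load the left operand
        f"{OPCODES[op]} {operand(right)}, {register}",  # combine with the right operand
        f"MOVF {register}, {dest}",  # store the result in memory
    ]


for instruction in generate("a", "b", "+", "60.0"):
    print(instruction)
```

A production code generator would also choose registers, pick addressing modes, and avoid redundant loads and stores, none of which this sketch attempts.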

Symbol Table Management

A symbol table contains a record for each identifier, with fields for the attributes of the identifier. This component makes it easier for the compiler to search for an identifier's record and retrieve it quickly. The symbol table also helps with scope management. The symbol table and the error handler interact with all the phases, and the symbol table is updated accordingly.
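
One common way to organize such a table is as a stack of dictionaries, one per scope. The sketch below is a minimal illustration of that idea. The class name `SymbolTable`, its method names, and the attribute names used in the example are assumptions, not a fixed interface.

```python
class SymbolTable:
    """Stack of scopes; each scope maps an identifier to its attribute record."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")


table = SymbolTable()
table.declare("x", type="float")
table.enter_scope()
table.declare("x", type="int")  # shadows the outer x
print(table.lookup("x"))
table.exit_scope()
print(table.lookup("x"))
```

Declaring the same name twice in one scope raises an error, which corresponds to the multiply declared identifier error that the symbol table is responsible for detecting.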

Error Handling Routine

During compilation, errors may occur in any of the phases listed below:

  • Lexical analyzer: Wrongly spelled tokens
  • Syntax analyzer: Missing parenthesis
  • Intermediate code generator: Mismatched operands for an operator
  • Code Optimizer: When the statement is not reachable
  • Code Generator: When the memory is full or proper registers are not allocated
  • Symbol tables: Error of multiple declared identifiers

The most common errors are invalid character sequences in scanning, invalid token sequences in parsing, and type and scope errors in semantic analysis.
An error may be encountered in any of the above phases. After finding an error, the phase needs to deal with it so that compilation can continue. These errors are reported to the error handler, which handles them so that the compilation process can proceed. Generally, the errors are reported in the form of a message.

Example:  Position := initial + rate*60


Viva Questions 

1) What is a compiler?

A compiler is a computer program that converts source code written in a high-level language into a low-level language such as assembly or machine code.

2) What is compiler design?

Compiler design involves developing software that can read and interpret source code written in a high-level programming language and produce binary code that can be executed by a computer. A compiler is the tool responsible for this transformation; it reads the source code, checks it for mistakes, and outputs the program in machine language. The generated binary code can be run on a computer without further translation.

3) List various types of compilers.

The three types of compilers are:

  • Single-Pass Compilers
  • Two-Pass Compilers
  • Multipass Compilers

4) What is an assembler?

An assembler is a piece of software that converts programs written in assembly language into machine language.

5) What is a Symbol Table?

A symbol table is a data structure in which each identifier is represented by a record that includes fields for the identifier's attributes. Because of the way it is organized, we can easily store or retrieve information from the correct record based on the identifier. A lexical analyzer adds an identifier to the symbol table whenever it finds one. However, the lexical analyzer usually cannot determine all of an identifier's attributes, so later phases fill them in.

6) What tools are used for writing compilers in Python?

PLY (Python Lex-Yacc) is a widely used toolkit for creating compilers in Python. PLY is a Python implementation of the well-known tools Lex and Yacc, which are used for writing compilers in C.

7) What Is Code Motion?

Code motion is an optimization technique that reduces the amount of code executed inside a loop. It applies to any expression that produces the same value on every iteration of the loop, known as a loop-invariant expression. Such an expression is moved to just before the loop so that it is computed only once.
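
The effect of code motion can be shown with a before-and-after pair written by hand. This is only an illustration of the transformation a compiler might perform; the function names and the particular expression are made up for this example.

```python
import math


def scale_before(values, angle):
    """The expression math.cos(angle) * 2.0 is recomputed on every iteration."""
    result = []
    for v in values:
        factor = math.cos(angle) * 2.0  # same value each time round the loop
        result.append(v * factor)
    return result


def scale_after(values, angle):
    """The same expression moved to just before the loop and computed once."""
    factor = math.cos(angle) * 2.0  # hoisted out of the loop
    result = []
    for v in values:
        result.append(v * factor)
    return result


data = [1.0, 2.0, 3.0]
print(scale_before(data, 0.5) == scale_after(data, 0.5))
```

Both functions return the same list, but the second evaluates the cosine once instead of once per element. The move is safe here because neither `angle` nor the result of `math.cos` changes inside the loop.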

8) Explain what YACC is?

YACC (Yet Another Compiler-Compiler) is a parser generator originally developed for Unix. It is used to generate a parser, a piece of software that determines whether or not the source code of a program is valid according to the syntactic rules of the language. In most cases, YACC is used in conjunction with Lex, a lexical analyzer generator that produces the lexer.

9) Differentiate Tokens, Patterns, and Lexeme.

  • Tokens: A token is a category of lexical unit, such as identifier, number, or keyword, that the parser treats as a single symbol.
  • Patterns: A pattern is the rule that describes the set of strings that can form a particular token. It is usually written as a regular expression.
  • Lexeme: A lexeme is the actual sequence of characters in the source code that matches the pattern for a token. For example, in x = 10 the lexeme x is an instance of the identifier token.

10) What Are The Benefits Of Intermediate Code Generation?

  • A compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
  • You can make a compiler for multiple languages by connecting their respective front ends to the same back end.
  • The code generation process can be optimized by applying a machine-independent code optimizer to intermediate code.

11) What are the six phases of a compiler?

The 6 phases of a compiler are:

  • Lexical Analysis.
  • Syntactic Analysis or Parsing.
  • Semantic Analysis.
  • Intermediate Code Generation.
  • Code Optimization.
  • Code Generation.

12) What are the two types of compiler design?

The two types of compiler design are:

  • Cross-compiler: A cross-compiler runs on one platform and generates executable code for a different platform.
  • Source-to-source compiler: A source-to-source compiler translates source code written in one programming language into source code in another programming language.

13) What is meant by three address codes in the compiler?

As an intermediate code, three-address code is simple to produce and simple to translate into machine language. Each instruction has at most three addresses (two for the operands and one for the result) and a single operator. The value computed by each instruction is saved in a temporary variable created by the compiler.

14) What are the compiler design tools?

The tools used for compiler construction are as follows:

  • Scanner Generator
  • Parser Generator
  • Data-flow analysis engines
  • Automatic code generators
  • Compiler construction toolkits
  • Syntax-directed translation engines

15) Describe the Front End Of A Compiler?

A compiler's front end consists of the phases, or parts of phases, that depend mainly on the source language and are largely independent of the target machine. The front end can also perform some code optimization, and it includes the error handling that goes with each of these phases. These phases include:

  • Lexical analysis
  • Syntactic analysis
  • Semantic analysis
  • Generation of intermediate code
  • Creation of the symbol table

16) Describe the Back-end Phases Of A Compiler?

The back-end phases of a compiler are the parts that depend on the target machine. They generally do not depend on the source language, only on the intermediate language. These include:

  • Code optimization
  • Code generation, along with the necessary error handling and symbol-table operations

17) Which language is used in compiler design?

Compilers can be written in many languages, and C is a common choice. As an example of the process, a user writes a program in C (a high-level language). The C compiler translates the program into an assembly program (a low-level language). Afterward, software called an assembler converts the assembly program into machine code (object code).

18) What tools are used for compiler construction?

The same kinds of tools are used to build a compiler whatever the language being compiled, such as Java or C++. Examples of this are:

  • A parser generator
  • A lexical analyzer generator
  • A compiler front-end framework

19) What is bootstrapping in compiler design?

Bootstrapping is the technique of writing a compiler in the same language that it compiles. An initial version is built with an existing compiler or a simpler implementation, and later versions are then compiled by the compiler itself.

20) Can you explain context-free grammar and its importance in compiler design?

A grammar is a set of rules that specify how the valid strings of a language are formed. A context-free grammar is a grammar in which every production rule has a single nonterminal on its left-hand side. It is important in compiler design because the syntax of most programming languages is specified by a context-free grammar, and the parser relies on it to recognize the structure of the program being translated.

21) What is SDD in compiler design?

A syntax-directed definition (SDD) is a type of abstract specification. It is an extension of a context-free grammar in which each grammar production X -> a has an associated set of semantic rules of the form s = f(b1, b2, ..., bk), where s is an attribute computed by the function f from the attributes b1 to bk. An attribute can be a string, a number, a type, or a memory location. Semantic rules are code snippets typically added at the end of a production and enclosed in curly brackets.

Example: E --> E1 + T { E.val = E1.val + T.val}
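
The rule in this example computes the attribute val of a node from the val attributes of its children. The sketch below evaluates it over a hand-built parse tree. The tree encoding and the function name `val` are assumptions for illustration, and the two extra productions E -> T and T -> num are added only so that the example has leaves to evaluate.

```python
def val(node):
    """Compute the synthesized attribute `val` of a parse-tree node.

    Node shapes (assumed for this sketch):
      ("E", E1, "+", T)  for E -> E1 + T
      ("E", T)           for E -> T
      ("T", number)      for T -> num
    """
    if node[0] == "T":
        return node[1]  # T.val = num.lexval
    if len(node) == 2:
        return val(node[1])  # E -> T      { E.val = T.val }
    _, e1, _, t = node
    return val(e1) + val(t)  # E -> E1 + T { E.val = E1.val + T.val }


# Parse tree for 1 + 2 + 3, which groups as (1 + 2) + 3.
tree = ("E", ("E", ("E", ("T", 1)), "+", ("T", 2)), "+", ("T", 3))
print(val(tree))
```

Each call returns a value that depends only on the node's children, which is what makes val a synthesized attribute.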

22) What Is syntax in compiler design?

After lexical analysis, the second phase is syntax analysis, or parsing. It analyzes the syntactic structure of the input and checks whether it complies with the syntax rules of the language in which it is written.

23) What is Parsing in Compiler Design? 

Parsing is the process of analyzing a sequence of tokens to determine its grammatical structure. The parser is the part of the translator that organizes the linear sequence of tokens into a structure, such as a parse tree, according to a predetermined set of rules called a grammar.

24) What is lexical analysis?

Lexical analysis is the process of determining which lexemes are present in the source program and grouping them into tokens. Lexemes are the basic character sequences of a program, such as identifiers, keywords, numbers, and operators. Whitespace and comments are typically discarded during this phase.

25) What is an overview of the structure of a typical compiler?

Lexical analyzers, parsers, optimizers, and code generators are the backbone of any compiler. The lexical analyzer breaks the source text into individual tokens. The parser then analyzes the code's structure and produces a syntax tree. After the tree is constructed and checked, it is translated into an intermediate form, which the optimizer examines and improves. The final step is for the code generator to convert the result into machine code or assembly language.

26) What is a linker?

A linker is a piece of software that combines multiple object files and libraries into a single executable or library file, resolving the references between them.

27) What are the two parts of a compilation?

Analysis and synthesis are the two parts of a compilation.

Analysis: The analysis part breaks the source program into its constituent pieces and creates an intermediate representation of it.
Synthesis: The synthesis part constructs the desired target program from the intermediate representation.

28) What are compilers and interpreters?

  • Compiler: A compiler translates the complete source code before the program runs. A compiled program generally runs faster than an interpreted one because the translation is done only once.
  • Interpreter: An interpreter translates and executes the source code one statement at a time. Interpreted programs generally run more slowly than compiled ones because translation happens while the program runs.

29) Why is parsing important for compiler design?

Parsing determines the syntactic structure of the source code, which makes it central to compiler design. A compiler must be able to parse the code in order to understand it and produce the correct output.

30) Define Compiler-compiler.

Compiler-compilers, compiler-generators, and translator-writing systems are all names that have been used to refer to different types of software that assist in developing compilers. To a large extent, they are centered on a specific model of languages. As a result, they are suited for the generation of compilers for languages that share a similar model.

31) What is the fastest compiler language?

C++ is an efficient programming language. Its fast runtime and extensive Standard Template Library (STL) make it a favorite among competitive programmers.

32) What are the properties of optimizing a compiler?

The properties of an optimizing compiler are:

  • It is a program that transforms source code into a form that may be executed.
  • It results in a smaller overall size of the code.
  • It leads to an improvement in performance.
  • It improves the overall quality of the code.

33) What are the basic goals of code movement?

The primary objectives of code movement (code motion) are to reduce the size of the code and to reduce how often a piece of code is executed, typically by moving loop-invariant computations out of loops.

34) List the various compiler construction tools.

  • Compiler construction toolkit: an integrated set of routines for building the various phases of a compiler
  • Debugger: a tool for setting breakpoints and stepping through a program
  • Source code management tool: a tool that monitors and controls the project's codebase and the projects it depends on
  • Performance profiling tool: a tool that measures how much time and space the software uses

35) Write a regular expression for an identifier?

A regular expression for an identifier is letter (letter | digit)*, where letter stands for any alphabetic character and digit for any of 0 to 9. If underscores are allowed, it can be written as [A-Za-z_][A-Za-z0-9_]*. A string is a legitimate identifier only if the whole string matches this pattern: it must start with a letter or underscore and contain only letters, digits, and underscores after that.
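
Checking a string against an identifier pattern can be sketched with Python's `re` module. The pattern below assumes the common rule that an identifier starts with a letter or underscore and continues with letters, digits, or underscores; individual languages differ, so treat both the pattern and the function name `is_identifier` as assumptions for this example.

```python
import re

# Assumed rule: a letter or underscore, then letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True only if the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(text) is not None


for candidate in ["rate1", "_tmp", "1rate", "a-b", ""]:
    print(repr(candidate), is_identifier(candidate))
```

Using `fullmatch` rather than `search` matters here: `search` would accept `a-b` because part of the string matches, while `fullmatch` requires the entire string to fit the pattern.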

36) What are some examples of compile-time and runtime errors?

Any fault during the software compilation process is a compile-time error. Any error that occurs while a program is being executed is referred to as a runtime error. 

Examples: Syntax errors, type errors, and name errors are examples of compile-time errors. Runtime errors include illegal type conversion, division by zero, and indexing outside the allowed range.

37) What does semantic analysis do?

Semantic analysis is an essential component of compiler design. It checks that a syntactically correct program is also meaningful. Using the syntax tree and the symbol table, it performs type checking, verifies that identifiers are declared before they are used, and checks that operators and functions are applied to compatible operands.

38) What are the various types of intermediate code representation?

Intermediate code represents the source program in a simpler form that sits between the high-level language and the target code.

Several intermediate code representations are in common use. Examples of this are:

  • Postfix notation
  • Syntax tree
  • Three-address code

39) Can you explain what syntax and semantics mean in the context of compiler design?

While semantics relates to the meaning of the language, syntax refers to the rules guiding the structure of a programming language. A compiler must comprehend both the syntax and semantics of the source and destination languages to translate code from one language to another correctly.

40) What Are The Properties Of Optimizing Compilers?

  • The generated code should be as small as possible.
  • There should be no unreachable code.
  • All unused or unnecessary code must be eliminated.
  • Optimizing compilers should apply code improvements to the source program.
  • Elimination of common subexpressions.
  • Strength reduction, code motion, and the elimination of dead code.

41) Can you explain the various function parameter passing options?

  • Call by name
  • Call by value
  • Copy-restore
  • Call by reference

42) Define Symbol Table.

A symbol table is a type of data structure that the compiler uses to track how the variables are used. It keeps records of names and information about how they are used and bound.

43) What is the Application of Compilers?

Here are some important applications of Compilers:

  • Compilers are used to implement higher-level programming languages.
  • They support optimization for parallelism in computer architectures.
  • They are used in designing new memory hierarchies for computer systems.
  • They are used extensively in program translation.
  • They can be used in hardware synthesis, binary translation, and the interpretation of database queries, among other program translations.
  • They can be used in conjunction with various other software productivity tools.

44) Which language is used in compiler design?

Compilers can be written in many languages, and C is a common choice. As an example of the process, a user writes a program in C (a high-level language). The C compiler translates the program into an assembly program (a low-level language). Afterward, software called an assembler converts the assembly program into machine code (object code).

45) Why is compiler design used?

The principles of compiler design provide an in-depth look at the translation and optimization processes. The design of a compiler includes the primary translation mechanism, error detection, and error recovery. It covers lexical, syntax, and semantic analysis in the front end, and code generation and optimization in the back end.

46) Is compiler design difficult?

The process of building a compiler is a difficult one. A good compiler takes concepts from formal language theory, the study of algorithms, artificial intelligence, systems design, computer architecture, and the theory of programming languages and applies them to translate a program.

47) What is Backpatching in compiler design?

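Backpatching is the technique of filling in information that was not known when an instruction was generated, most often the target label of a jump. When three-address code (TAC) is generated in a single pass for boolean expressions and control-flow statements, the address that a goto statement should jump to may not be known yet. The compiler leaves the target blank, keeps a list of the incomplete instructions, and fills in the label's address once it is known.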
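
Backpatching can be sketched for a simple if statement, where the jump over the body has to be emitted before the length of the body is known. This is a minimal illustration under assumptions: the instruction spelling `ifFalse ... goto N`, the use of list indices as labels, and the function name `compile_if` are chosen for this example only.

```python
def compile_if(condition, then_code):
    """Emit three-address code for `if condition: then_code` in one pass.

    Jump targets are instruction indices within the returned list.
    """
    code = []

    # The target of this jump is not known yet, so emit a placeholder
    # and remember its position.
    code.append(None)
    hole = len(code) - 1

    code.extend(then_code)

    # Now the address of the instruction after the body is known,
    # so go back and patch the jump.
    end_label = len(code)
    code[hole] = f"ifFalse {condition} goto {end_label}"
    return code


for index, instruction in enumerate(compile_if("a < b", ["x := 1", "y := 2"])):
    print(index, instruction)
```

A fuller implementation keeps lists of unfinished jumps, since several jumps may need the same target, and patches the whole list in one step once the label is known.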

48) Is the compiler software or hardware?

The compiler is a piece of software. It takes a program written in a high-level language (the source language) and translates it into a low-level language (the object, target, or machine language, made up of 0s and 1s).

49) Who compiles the compiler?

A very basic compiler can be written in assembly language or machine code. Once that initial compiler exists, it can be used to compile a more capable compiler, which can in turn compile later versions of itself.

50) What is an example of a compiler?

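Java, C, C++, and C# are languages rather than compilers. Examples of compilers for them are javac (Java), GCC and Clang (C and C++), and csc (C#).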

51) What is a linker in compiler design?

When many object files (created by a compiler or an assembler) need to be combined into a single executable file, library file, or other "object" file, a linker or link editor is the computer system tool used to do so.
