Chapter 2. Compilers

When a programmer writes down a solution of her problems, she writes a program on a special programming language. These programming languages are very different from the proper languages of computers, from the machine languages. Therefore we have to produce the executable forms of programs created by the programmer. We need a software or hardware tool, that translates the source language program – written on a high level programming language – to the target language program, a lower level programming language, mostly to a machine code program.

There are two fundamental methods to execute a program written on higher level language. The first is using an interpreter. In this case, the generated machine code is not saved but executed immediately. The interpreter is considered as a special computer, whose machine code is the high level language. Essentially, when we use an interpreter, then we create a two-level machine; its lower level is the real computer, on which the higher level, the interpreter, is built. The higher level is usually realized by a computer program, but, for some programming languages, there are special hardware interpreter machines.

The second method is using a compiler program. The difference of this method from the first is that here the result of translation is not executed, but it is saved in an intermediate file called target program.

The target program may be executed later, and the result of the program is received only then. In this case, in contrast with interpreters, the times of translation and execution are distinguishable.

In the respect of translation, the two translational methods are identical, since the interpreter and the compiler both generate target programs. For this reason we speak about compilers only. We will deal the these translator programs, called compilers (Figure 2.1).

Figure 2.1. The translator.

The translator.


Our task is to study the algorithms of compilers. This chapter will care for the translators of high level imperative programming languages; the translational methods of logical or functional languages will not be investigated.

First the structure of compilers will be given. Then we will deal with scanners, that is, lexical analysers. In the topic of parsers – syntactic analysers –, the two most successful methods will be studied: the LL and the LALR} parsing methods. The advanced methods of semantic analysis use O-ATG grammars, and the task of code generation is also written by this type of grammars. In this book these topics are not considered, nor we will study such important and interesting problems as symbol table handling, error repairing or code optimising. The reader can find very new, modern and efficient methods for these methods in the bibliography.

2.1. 2.1 The structure of compilers

A compiler translates the source language program (in short, source program) into a target language program (in short, target program). Moreover, it creates a list by which the programmer can check her private program. This list contains the detected errors, too.

Using the notation program (input)(output) the compiler can be written by

  compiler (source program)(target program, list).  

In the next, the structure of compilers are studied, and the tasks of program elements are described, using the previous notation.

The first program of a compiler transforms the source language program into character stream that is easy to handle. This program is the source handler.

  source handler(source program)(character stream).  

The form of the source program depends from the operating system. The source handler reads the file of source program using a system, called operating system, and omits the characters signed the end of lines, since these characters have no importance in the next steps of compilation. This modified, “poured” character stream will be the input data of the next steps.

The list created by the compiler has to contain the original source language program written by the programmer, instead of this modified character stream. Hence we define a list handler program,

  list handler (source program, errors)(list),  

which creates the list according to the file form of the operating system, and puts this list on a secondary memory.

It is practical to join the source handler and the list handler programs, since they have same input files. This program is the source handler.

  source handler (source program, errors)(character stream, list).  

The target program is created by the compiler from the generated target code. It is located on a secondary memory, too, usually in a transferable binary form. Of course this form depends on the operating system. This task is done by the code handler program.

  code handler (target code)(target program).  

Using the above programs, the structure of a compiler is the following (Figure 2.2):

source handler (source program, errors) (character string, list),

compiler (character stream)(target code, errors),

code handler (target code)(target program).

Figure 2.2.  The structure of compilers.

The structure of compilers.


This decomposition is not a sequence: the three program elements are executed not sequentially. The decomposition consists of three independent working units. Their connections are indicated by their inputs and outputs.

In the next we do not deal with the handlers because of their dependentness on computers, operating system and peripherals – although the outer form, the connection with the user and the availability of the compiler are determined mainly by these programs. The task of the program compiler is the translation. It consists of two main subtasks: analysing the input character stream, and to synthetizing the target code. The first problem of the analysis is to determine the connected characters in the character stream. These are the symbolic items, e.g., the constants, names of variables, keywords, operators. This is done by the lexical analyser, in short, scanner.

From the character stream the scanner makes a series of symbols and during this task it detects lexical errors.

  scanner (character stream)(series of symbols, lexical errors).  

This series of symbols is the input of the syntactic analyser, in short, parser. Its task is to check the syntactic structure of the program. This process is near to the checking the verb, the subject, predicates and attributes of a sentence by a language teacher in a language lesson. The errors detected during this analysis are the syntactic errors. The result of the syntactic analysis is the syntax tree of the program, or some similar equivalent structure.

  parser (series of symbols)(syntactically analysed program, syntactic errors).  

The third program of the analysis is the semantic analyser. Its task is to check the static semantics. For example, when the semantic analyser checks declarations and the types of variables in the expression a + b, it verifies whether the variables a and b are declared, do they are of the same type, do they have values? The errors detected by this program are the semantic errors.

  semantic analyser (syntactically analysed program)(analysed program, semantic errors).  

The output of the semantic analyser is the input of the programs of synthesis. The first step of the synthesis is the code generation, that is made by the code generator:

  code generator (analysed program)(target code).  

The target code usually depends on the computer and the operating system. It is usually an assembly language program or machine code. The next step of synthesis is the code optimisation:

  code optimiser (target code)(target code).  

The code optimiser transforms the target code on such a way that the new code is better in many respect, for example running time or size.

Figure 2.3.  The programs of the analysis and the synthesis.

The programs of the analysis and the synthesis.


As it follows from the considerations above, a compiler consists of the next components (the structure of the compiler program is in the Figure 2.3):

source handler (source program, errors)(character stream, list),

scanner (character stream)(series of symbols, lexical errors),

parser (series of symbols)(syntactically analysed program, syntactic errors),

semantic analyser (syntactically analysed program)(analysed program, semantic errors),

code generator (analysed program)(target code),

code optimiser (target code)(target code),

code handler (target code)(target program).

The algorithm of the part of the compiler, that performs analysis and synthesis, is the next:

Compiler

  1  determine the symbolic items in the text of source program   2  check the syntactic correctness of the series of symbols   3  check the semantic correctness of the series of symbols   4  generate the target code   5  optimise the target code 

The objects written in the first two points will be analysed in the next sections.

Exercises

2.1-1 Using the above notations, give the structure of interpreters.

2.1-2 Take a programming language, and write program details in which there are lexical, syntactic and semantic errors.

2.1-3 Give respects in which the code optimiser can create better target code than the original.