Summary of the chapter:In this chapter, we discuss the basic expressions, notations, definitions and theorems of the scientific field of formal languages and automata theory. In the first part of this chapter, we introduce the alphabet, the word, the language and the operations over them. In the second part, we show general rewriting systems and a way to define algorithms by rewriting systems, namely Markov (normal) algorithms. In the third part, we describe a universal method to define a language by a generative grammar. Finally, in the last part of this chapter we show the Chomsky hierarchy: a classification of generative grammars are based on the form of their production rules, and the classification of formal languages are based on the classification of generative grammars generating them.

An alphabet is a finite nonempty set of symbols. We can use the union, the intersection and the relative complement set operations over the alphabet. The absolute complement operation can be used if a universe set is defined.

**Example 1.** *Let the alphabet E be the English alphabet, the alphabet V contains the vowels,
and the alphabet C is the set of the consonants.
Then V*∪*C=E, V*∩*C=*∅ *and*

A word is a finite sequence of symbols of the alphabet. The length of the word is the number of symbols it is composed of, with repetitions. Two words are equal if they have the same length and they have the same letter in each position. This might sound trivial, but let us see the following example:

**Example 2.** *Let the alphabet V=*{1,2,+} *and the words p=*1+1*,
q=*2*.
In this case *1+1≠ 2*, i.e. p*≠*q,
because these two words have different lengths, and also different letters in the first position.*

There is a special operation on words called concatenation, this is the operation of joining two words end to end.

**Example 3.** *Let the alphabet E be the English alphabet. Let the word p=railway, and the word q=station over the alphabet E.
Then, the concatenation of p and q is p· q=railwaystation. The length of p is* ∣*p*∣=7 *and the length
of q is* ∣*q*∣=7, *as well.*

If there is no danger of confusion we can omit the dot from between the parameters of the concatenation operation.
It is obvious that the equation ∣*pq*∣=∣*p*∣+∣*q*∣
holds for all words *p* and *q*.
There is a special word called an empty word,
whose length is 0 and denoted by λ. The empty word is the identity element of the concatenation operation,
λ*p*=*p*λ=p holds for each word *p* over any alphabet.
The word *u* is a prefix of the word *p* if there exists a word *w* such
that *p*=*uw*, and the word *w* is a suffix of the word *p* if there exists a word
*u* such that *p*=*uw*. The word *v* is a subword
of word *p* if there exists words *u*,*w* such that *p*=*uvw*.
The word *u* is a proper prefix of the word *p*, if it is a prefix of *p*,
and the following properties hold: *u*≠λ and *u*≠*p*.
The word *w* is a proper suffix of the word *p*, if it is a suffix of *p*,
and *w*≠λ, *w*≠ *p*.
The word *v* is a proper subword of the word *p*, if it is a subword of *p*,
and *v*≠λ, *v*≠*p*. As you can see,
both the prefixes and suffixes are subwords, and both the proper prefixes and proper suffixes are proper subwords, as well.

**Example 4.** *Let the alphabet E be the English alphabet. Let the word p = railwaystation, and the words u=rail,
v = way and w = station. In this example, the word u is a prefix, the word w is a suffix and the word v is a subword of the word p.
However, the word uv is also a prefix of the word p, and it is a subword of the word p, as well. The word uvw is a suffix of the word p,
but it is not a proper suffix.*

We can use the exponentiation operation on a word in a classical way, as well. *p*^{0} = λ by definition,
*p*^{1} = *p*,
and *p** ^{i}* =

A language over an alphabet is not necessarily a finite set of words, and it is usually denoted by *L*.
For a given alphabet *V* the language *L* over *V* is
*L* ⊆ V^{ *}. We have a set again, so we can use the classical
set operations, union, intersection, and the complement operation, if the operands are defined over the same alphabet.
For the absolute complement operation we use *V*^{ *} for universe, so
We can also use the concatenation operation. The concatenation of the languages *L*_{1} and
*L*_{2} contains
all words *pq* where *p* ∈ *L*_{1} and *q* ∈
*L*_{2}. The exponentiation operation is defined in a classical way,
as well. *L*^{0}={λ} by definition, *L*^{1}=*L*,
and *L ^{i}*=

The algebraic approach to formal languages can be useful for readers who prefer the classical mathematical models.
The free monoid on an alphabet *V* is the monoid whose elements are from *V*^{ *}. From this point of
view both operations, concatenation, (also called product) - which is not a commutative operation in this case
-, and the union operation (also called addition), create a free monoid on set *V*, because these operations are associative,
and they both have an identity element.

Associative:

(

*L*_{1}·*L*_{2}) ·*L*_{3}=*L*_{1}· (*L*_{2}·*L*_{3}),(

*L*_{1}∪*L*_{2}) ∪*L*_{3}=*L*_{1}∪ (*L*_{2}∪*L*_{3}),

where

*L*_{1},*L*_{2},*L*_{3}⊆*V*^{ *}.Identity element:

*L*^{0}·*L*_{1}=*L*_{1}·*L*^{0}=*L*_{1},*L*∪_{e}*L*_{1}=*L*_{1}∪*L*=_{e}*L*_{1},

where

*L*_{1}⊆*V*^{ *},*L*^{ 0}= {λ},*L*= ∅._{e}

The equation *L*_{e} · *L*_{1} = *L*_{1} · *L _{e}* =