**Tartalom**

Summary of the chapter:In this chapter, we deal with the simplest languages of the Chomsky hierarchy, i.e., the regular languages. We show that they can be characterized by regular expressions. Another description of this class of languages can be given by finite automata: both the class of deterministic finite automata and the class of nondeterministic finite automata accept this class. The word problem (parsing) can be solved in real-time for this class by the deterministic finite automata. This class is closed under regular and under set-theoretical operations. This class also has a characterization in terms of analyzing the classes of possible continuations of the words (Myhill-Nerode theorem). We also present Mealy-type and Moore-type transducers: finite transducers are finite automata with output.

In order to be comprehensive we present the definition of regular grammars here again.

**Definition ** (Regular grammars). *A grammar G=(N,T,S,P) is regular if each of its productions
has one of the following forms: A → u, A → uB, where A,B∈ N, u∈ T ^{*}.
The languages that can be generated by regular grammars are the regular
languages (they are also called type 3 languages of the Chomsky hierarchy).*

Animation 2. presents an example for a derivation in a regular grammar.

We note here that the grammars and languages of the definition above are commonly referred to as right-linear grammars and languages, and regular grammars and languages are defined in a more restricted way:

**Definition 12.** (Alternative definition of regular grammars).
*A grammar G = *( *N, T, S, P *) *is regular if
each of its productions has one of the following forms: A → a, A → aB, S → *λ, *where
A,B *∈ *N, a *∈ *T. The languages that can be generated by these grammars
are the regular languages.*

Now we show that the two definitions are equivalent in the sense that they define the same class of languages.

**Theorem 2.** *The language classes defined by our original definition
and by the alternative definition coincide. *

**Proof.** The proof consists of two parts: we need to show that languages generated by grammars of one
definition can also be generated by grammars of the other definition.

It is clear the grammars satisfying the alternative definition satisfy our original definition at the same time, therefore, every language that can be generated by the grammars of the alternative definition can also be generated by grammars of the original definition.

Let us consider the other direction. Let a grammar *G *= ( *N, T, S, P *) may contain rules of types
*A* → *u* and *A* → *uB*
(*A,B* ∈ *N, u* ∈ *T*^{*}).
Then, we give a grammar *G'* = ( *N', T, S', P' *) such that it may
only contain rules of the forms *A* → *a, A* → *aB, S'* → λ
(where *A,B* ∈ *N', a* ∈ *T*) and it
generates the same language as *G*, i.e., *L *(*G*) = *L* (*G'*).

First we obtain a grammar *G''* such that *L *(*G*) = *L* (*G''*)
and *G''* = ( *N'', T, S, P'' *) may contain only
rules of the following forms: *A* → *a, A* → *aB, A* → *B, A* → λ
(where *A,B* ∈ *N'', a* ∈ *T*).
Let us check every rule in *P*: if it has one of the forms above, then we copy it to the set *P''*,
else we will do the following:

if the rule is of the form

*A*→*a*_{1}...*a*for_{k}*k*> 1, where*a*∈_{i}*T, i*∈{1,...,*k*}, then let the new nonterminals*X*_{1},...,*X*_{k}_{-1}be introduced (new set for every rule) and added to the set*N''*and instead of the actual rule the next set of rules is added to*P''*:*A*→*a*_{1}*X*_{1},*X*_{1}→*a*_{2}*X*_{2}, ...,*X*_{k}_{-2}→*a*_{k}_{-1}*X*_{k}_{-1},*X*_{k}_{-1}→*a*._{k}if the rule is of the form

*A*→*a*_{1}...*a*for_{k}B*k*> 1, where*a*∈_{i}*T, i*∈ {1,...,*k*}, then let the new nonterminals*X*_{1},...,*X*_{k}_{-1}be introduced (new set for every rule) and put to the set*N''*, and instead of the actual rule the next set of rules is added to*P''*:*A*→*a*_{1}*X*_{1},*X*_{1}→*a*_{2}*X*_{2}, ...,*X*_{k}_{-2}→*a*_{k}_{-1}*X*_{k}_{-1},*X*_{k}_{-1}→*a*._{k}B

When every rule is analyzed (and possibly replaced by a set of rules) we have grammar *G''*. It is
clear that the set of productions *P''* of *G''* may contain only rules of the forms
*A* → *a, A* → *aB, A* → *a, A* → λ
(where *A,B* ∈ *N'', a*∈ *T*), since we have copied only rules from *P*
that have these forms, and all the rules that are added to the set of productions *P''* by
replacing a rule are of the forms *A* → *aB, A* → *a*
(where *A,B* ∈ *N'', a* ∈ *T*).
Moreover, exactly the same words can be derived in *G''* and in *G*. The derivation graphs in the
two grammars can be mapped in a bijective way. When a derivation step is applied in G with a rule
*A* → *a*_{1}... that is not in *P''*, then the rules must be used
in *G''* that are used to replace
the rule *A* → *a*_{1}...: applying the first added rule first
*A* → *a*_{1}*X*_{1}, then there is only one
rule that can be applied in *G''*, since there is only one rule added with *X*_{1} in its left hand
side... therefore, one needs to apply all the rules of the chain that was used to replace the
original rule *A* → *a*_{1}.... Then, if there is a nonterminal *B* at the end of the rule
*A* → *a*_{1}...,
then it is located at the end of the last applied rule of the chain, and thus the
derivation can be continued in the same way in both grammars. The other way around, if we use a
newly introduced rule in the derivation in *G''*, then we must use all the rules of the chain, and
also we can use the original rule that was replaced by this chain of rules in the grammar *G*. In
this way, there is a one-to-one correspondence in the completed derivations of the two grammars.
(See Figure 2.1. for an example of replacing a long rule by a sequence of shorter rules.)

**2.1. ábra - In derivations the rules with long right hand side are replaced by chains of shorter
rules, resulting binary derivation trees in the new grammar.**

Now we may have some rules in *P''* that do not satisfy the alternative definition. The form of
these rules can only be *A* → *B* and *C* → λ
(where *A,B,C* ∈ *N'', C* ≠ *S*).
The latter types of rules can be eliminated by the Empty-word lemma (see Theorem 1.).
Therefore, we can assume that we have a grammar *G'''* = ( *N''', T, S', P''' *) such that
*L *(*G'''*) = *L* (*G*)
and *P'''* may contain only rules of the forms *A* → *aB, A* → *B, A*
→ *a, S'* → λ (where
*A,B* ∈ *N''', a*∈ *T* and in
case *S'* → λ ∈ *P'''* the start symbol *S'* does not occur on
the right hand side of any of the rules of *P'''*). Let us define the following sets of nonterminals:

let

*U*_{1}(*A*) = {*A*},let

*U*_{i}_{+1}(*A*) =*U*(_{i}*A*) ∪ {*B*∈*N'''*∣∃*C*∈*U*(_{i}*A*) such that*C*→*B*∈*P'''*}, for*i*> 1.

Since *N'''* is finite there is a minimal index *k* such that
*U _{k}*(

QED.

Further we will call a grammar a *regular grammar in normal form* if it satisfies our
alternative definitions. In these grammars the structures of the rules are more restricted: if we
do not derive the empty word, then we can use only rules that have exactly one terminal in their
right hand side.

**Example 19.** *Let*

G= ({S,A,B},{0,1,2},S,

{S→ 010B,A→B,A→ 2021,B→ 2A,B→S,B→ λ }).

*Give a grammar that is equivalent to G and is in normal form. *

*Solution: *

*Let us exclude first the rules that contain more than one terminal symbols.
Such rules are S *→ 010*B and A* → 2021. *Let us substitute them by the sets*

{

S→ 0X_{1},X_{1}→ 1X_{2},X_{2}→ 0B}

*and *

{

A→ 2X_{3},X_{3}→ 0X_{4},X_{4}→ 2X_{5},X_{5}→ 1}

*of rules, respectively, where the newly introduced nonterminals are *{*X*_{1}, *X*_{2}}
*and *{*X*_{3}, *X*_{4}, *X*_{5}}, *respectively. *
*The obtained grammar is *

G''= ({S, A, B, X_{1},X_{2},X_{3},X_{4},X_{5}},{0,1,2},S,

{S→ 0X_{1},X_{1}→ 1X_{2},X_{2}→ 0B,A→B,A→ 2X_{3},X_{3}→ 0X_{4},X_{4}→ 2X_{5},X_{5}→ 1,B→ 2A,B→S,B→ λ ).

*Now, by applying the Empty-word lemma, we can exclude rule B *→ λ *(the empty word can be
derived from the nonterminals B and A in our example) and by doing so we obtain grammar *

G'''= ({S, A, B, X_{1},X_{2},X_{3},X_{4},X_{5}}, {0,1,2},S,

{S→ 0X_{1},X_{1}→ 1X_{2},X_{2}→ 0B,X_{2}→ 0,A→B,A→ 2X_{3},X_{3}→ 0X_{4},X_{4}→ 2X_{5},X_{5}→ 1,B→ 2A,B→ 2,B→S}).

*Now we are excluding the chain rules A *→ *B, B *→ *S. To do this step, first,
we obtain the following sets: *

U_{0} (S) = {S} | U_{0} (A) = {A} | U_{0} (B) = {B} | ||

U_{1} (S) = {S} = U(S) | U_{1} (A) = {A,B} | U_{1} (B) = {B,S} | ||

U_{2} (A) = {A,B,S} | U_{2} (B) = {B,S} = U(B) | |||

U_{3} (A) = {A,B,S} = U(A) |

*Actually for those nonterminals that are not appearing in chain rules these sets are the trivial
sets, e.g., U* (*X*_{1}) = *U*_{0}
(*X*_{1}) = {*X*_{1}}. *Thus, finally, we obtain grammar*

G'= ({S, A, B, X_{1},X_{2},X_{3},X_{4},X_{5}}, {0,1,2},S,

{S→ 0X_{1},A→ 0X_{1},B→ 0X_{1},X_{1}→ 1X_{2},X_{2}→ 0B,X_{2}→ 0,A→ 2X_{3},X_{3}→ 0X_{4},X_{4}→ 2X_{5},X_{5}→ 1,B→ 2A,A→ 2A,B→ 2,A→ 2 }).

*Since our transformations preserve the generated language, every obtained grammar (G'', G''' and
also G') is equivalent to G. Moreover, G' is in normal form. Thus the problem is solved.
*

**Exercise 8.** *Let*

G= ({S, A, B, C},{a,b},S,

{S→abA,S→A,A→B,B→abab,B→aA,B→aaC,B→ λ,C→aaS}).

*Give a grammar that is equivalent to G and is in normal form.*

**Exercise 9.** *Let*

G= ({S, A, B, C},{a,b,c},S,

{S→A,S→B,A→aaB,B→A,B→acS,B→C,C→c,C→ λ }).

*Give a grammar that is equivalent to G and is in normal form. *

**Exercise 10.** *Let*

G= ({S, A, B, C, D},{a,b,c},S,

{S→aA,S→bB,A→B,B→A,B→ccccC,B→acbcB,C→caacA,C→cba}).

*Give a grammar that is equivalent to G and is in normal form. *

**Exercise 11.** *Let*

G= ({S, A, B, C, D},{1,2,3,4},S,

{S→ 11A,S→ 12B,A→B,B→C,B→ 14,B→ 4431,C→ 3D,D→ 233C}).

*Give a grammar that is equivalent to G and is in normal form. *