In this section, we will describe the regular languages in another way. First, we define the syntax and the semantics of regular expressions, and then we show that they describe the same class of languages as the class that can be generated by regular grammars.

**Definition 13.** (Regular expressions).
*Let T be an alphabet and V = T *∪ {∅, λ, +, ·, ^{*},(,)} *be defined
as its extension, such that
T *∩ {∅, λ, +, ·, ^{*},(,)} = ∅. *Then, we define the regular
expressions over T as expressions over the alphabet V in an inductive way:*

*Base of the induction:*∅

*is a (basic) regular expression,*λ

*is a (basic) regular expression,**every a*∈*T is a (basic) regular expression;*

*Induction steps**if p and r are regular expressions, then (p+r) is also a regular expression,**if p and r are regular expressions, then*(*p*·*r*)*is also a regular expression (we usually eliminate the sign*·*and write*(*pr*)*instead*(*p*·*r*)*),**if r is a regular expression, then r*^{*}is also a regular expression.

*Further, every regular expression can be obtained from the basic regular expressions using finite
number of induction steps.*

**Definition 14** (Languages described by regular expressions).
*Let T be an alphabet. Then, we define languages described by the regular expressions over the
alphabet T following their inductive definition.*

*Basic languages:*∅

*refers to the empty language*{},λ

*refers to the language*{λ},*for every a*∈*T, the expression refers to the language*{*a*};

*Induction steps:**if p and r refer to the languages L*(_{p}and L_{r}, respectively, then the regular expressions*p+r*)*refers to the language L*∪_{p}*L*_{r},*if p and r refer to the languages L*(_{p}and L_{r}, respectively, then the regular expressions*p · r*)*or*(*pr*)*refers to the language L*_{p}· L_{r},*if r refers to a language L*_{r}then r^{*}refers to the language L_{r}^{*}.

The language operations used in the above definition are the *regular operations*:

the addition (+) is the union of two languages,

the product is the concatenation of two languages, and

the reflexive and transitive closure of concatenation is the (Kleene-star) iteration of languages.

Two regular expressions are equivalent if they describe the same language. Here are some examples:

(p+r) | ≡ | (r+p) | (commutativity of union) |

((p+r)+q) | ≡ | (p+(r+q)) | (associativity of union) |

(r+∅) | ≡ | r | (additive zero element, unit element for union) |

((pr)q) | ≡ | (p(rq) | (associativity of concatenation) |

(rλ) | ≡ | (λr) ≡ r | (unit element for concatenation) |

((p+r)q) | ≡ | ((pq)+(rq)) | (left distributivity) |

(p(r+q)) | ≡ | ((pr)+(pq)) | (right distributivity) |

(r∅) | ≡ | ∅ | (zero element for concatenation) |

λ ^{*} | ≡ | λ | (iteration of the unit element) |

∅ ^{*} | ≡ | λ | (iteration of the zero element) |

(rr^{*}) | ≡ | (r^{*}r) | (positive iteration) |

Actually, the values of the last equivalence are frequently used, and they are denoted by *r*^{+},
i.e., *r*^{+} ≡ *rr*^{*} by definition.
This type of iteration which does not allow for 0 times
iteration, i.e., only positive numbers of iterations are allowed, is usually called Kleene-plus iteration.

With the use of the above equivalences we can write most of the regular expressions in a shorter form: some of the brackets can be eliminated without causing any ambiguity in the language described. The elimination of the brackets is usually based on the associativity (both for the union and for the concatenation). Some further brackets can be eliminated by fixing a precedence order among the regular operations: the unary operation (Kleene-)star is the strongest, then the concatenation (the product), and (as usual) the union (the addition) is the weakest.

We have regular grammars to generate languages and regular expressions to describe languages, but these concepts are not independent. First we will prove one direction of the equivalence between them.

**Theorem 3.** *Every language described by a regular expression can be
generated by a regular grammar.*

**Proof.** The proof goes by induction based on the definition of regular expressions. Let *r* be a basic
regular expression, then

if

*r*is ∅, then the empty language can be generated by the regular grammar(

*S, A, T, S,*{*A → a*});if

*r*is λ, then the language {λ} can be generated by the regular grammar(

*S, T, S,*{*S*→ λ});if

*r*is*a*for a terminal*a*∈*T*, then the language {*a*} is generated by the regular grammar(

*S, T, S,*{*S → a*}).

If *r* is not a basic regular expression then the following cases may occur:

*r*is (*p+q*) with some regular expressions*p,q*such that the regular grammars*G*= (_{p}*N*) and_{p}, T, S_{p}, P_{p}*G*= (_{q}*N*) generate the languages described by expression_{q}, T, S_{q}, P_{q}*p*and*q*, respectively, where*N*∩_{p}*N*= ∅ (this can be done by renaming nonterminals of a grammar without affecting the generated language). Then let_{q}*G*= (*N*∪_{p}*N*∪ {_{q}*S*},*T, S, P*∪_{p}*P*∪ {_{q}*S*→*S*, S→_{p}*S*}),_{q}where

*S*∉*N*∪_{p}*N*is a new symbol. It can be seen that_{q}*G*generates the language described by expression*r*.*r*is (*p*·*q*) with some regular expressions*p,q*such that the regular grammars*G*= (_{p}*N*) and_{p}, T, S_{p}, P_{p}*G*= (_{q}*N*) generate the languages described by expression_{q}, T, S_{q}, P_{q}*p*and*q*, respectively, where*N*∩_{p}*N*= ∅. Then let_{q}*G*= (*N*∪_{p}*N*∪ {_{q},T, S_{p}, P_{q}*A*→*uB*∣*A*→*uB*∈*P*∈_{p}, A,B*N*∈_{p}, u*T*^{*}} ∪ {*A*→*uS*∣_{q}*A*→*u*∈*P*∈_{p}, A*N*∈_{p}, u*T*^{*}}).It can be shown that

*G*generates the language described by expression*r*.*r*is a regular expression of the form*q*^{*}for a regular expression*q*. Further let*G*= (_{q}*N*) be a regular grammar that generates the language described by expression_{q}, T, S_{q}, P_{q}*q*. Then, let*G*= (*N*∪ {_{q}*S*},*T,S, P*∪ {_{q}*S*→ λ,*S*→*S*} ∪ {_{q}*A*→*uS*∣_{q}*A*→*u*∈*P*,_{q}*A*∈*N*∈_{q}, u*T*^{*}}),where

*S*∉*N*. It can be shown that_{q}*G*generates the language described by expression*r*.

Since every regular expression built by finitely many applications of the induction step, for any regular expression one can construct a regular grammar such that the grammar generates the language described by the expression.

QED.

When we want to have a grammar that generates the language given by a regular expression, the process can be faster, if we know that every singleton, i.e., language containing only one word, can easily be obtained by a grammar having only one nonterminal (the start symbol) and only one production that allows to generate the given word in one step.

**Example 20.** *Let r = *(*cbb*)^{*}(*ab*+*ba*) *be a regular expression (that describes
a language over the alphabet *{*a,b,c*}). *Give a regular grammar that generates this language.*

*Solution:*

*Let us build up r from its subexpressions.
According to the above observation, the language *{*cbb*}* can be generated by the grammar *

G=({_{cbb}S},{_{cbb}a,b,c},S,{_{cbb}S→_{cbb}cbb}).

*Now, let us use our induction step to
obtain the grammar G*_{(}_{cbb}_{)*} *that generates the language *(*cbb*)^{*}: then

G_{(}_{cbb}_{)*}= ({S},{_{cbb}, S_{(cbb)*}a,b,c},S_{(}_{cbb}_{)*},

{

S→_{cbb}ccb,S_{(}_{cbb}_{)*}→ λ,S_{(}_{cbb}_{)*}→S,_{cbb}S_{(}_{cbb}_{)}→cbb S_{(}_{cbb}_{)}}).

*The languages *{*ab*}* and *{*ba*}* can be generated by grammars *

G= ({_{ab}S},{_{ab}a,b,c},S,{_{ab}S→_{ab}ab})

*and*

G= ({_{ab}S},{_{ba}a,b,c},S,{_{ba}S→_{ba}ba}),

*respectively. Their union, ab+ba, then is generated by the grammar *

G= ({_{ab+ba}S},{_{ab},S_{ba},S_{ab+ba}a,b,c},S,_{ab+ba}

{

S→_{ab}ab, S→_{ba}ba, S→_{ab+ba}S→_{ab}, S_{ab+ba}S})_{ba}

*according to our induction step. *

*Finally, we need the concatenation of the previous expressions *(*cbb*)* ^{*}
and *(

G_{(}_{cbb}_{*)(}_{ab+ba}_{)}= ({S,_{cbb}S_{(}_{cbb}_{)*},S}, {_{ab},S_{ba},S_{ab+ba}a,b,c},S_{(}_{cbb}_{)*},

{

S→_{cbb}cbbS,_{ab+ba}S_{(}_{cbb}_{)*}→S,_{ab+ba}S_{(}_{cbb}_{)*}→S,_{cbb}S_{(}_{cbb}_{)*}→cbb S_{(}_{cbb}_{)*},

S→_{ab}ab, S→_{ba}ba, S→_{ab+ba}S,_{ab}S→_{ab+ba}S})_{ba}

*due to our induction step. The problem is solved.*

**Exercise 12.**
*Give a regular expression that describes the language containing exactly those words that contain
three consecutive a's over the alphabet *{*a,b*}.

**Exercise 13.**
*Give a regular expression that describes the language containing exactly those words that do not
contain two consecutive a's (over the alphabet *{*a,b*}).

**Exercise 14.**
*Give a regular grammar that generates the language* 0^{*}(1+22)(2^{*}+00).

**Exercise 15.**
*Give a regular grammar that generates the language* 0+1(1+0)^{*}.

**Exercise 16.**
*Give a regular grammar that generates the language *(*a+bb*(*b+*(*cc*)^{*}))^{*}(*ababa+c ^{*}*).