## 2.3. Finite Automata as Language Recognizers

In this section we first define several variations of the finite automata distinguished by the properties of the transition function.

Definition 15 (Finite automata). Let A = ( Q, T, q0, δ, F ). It is a finite automaton (recognizer), where Q is the finite set of (inner) states, T is the input (or tape) alphabet, q0 Q is the initial state, F Q is the set of final (or accepting) states and δ is the transition function as follows.

• δ : Q × (T ∪ {λ}) → 2Q (for nondeterministic finite automata with allowed λ-transitions);

• δ : Q × T → 2Q (for nondeterministic finite automata without λ-transitions);

• δ : Q × TQ (for deterministic finite automata, λ can be partially defined);

• δ : Q × TQ (for completely defined deterministic finite automata (it is not allowed that δ is partial function, it must be completely defined).

One can observe, that the second variation is a special case of the first one (not having λ-transitions). The third variation is a special case of the second one having sets with at most one element as images of the transition function, while the fourth case is more specific allowing sets exactly with one element.

One can imagine a finite automaton as a machine equipped with an input tape. The machine works on a discrete time scale. At every point of time the machine is in one of its states, then it reads the next letter on the tape (the letter under the reading head), or maybe nothing (in the first variations), and then, according to the transition function (depending on the actual state and the letter being read, if any) it goes to a/the next state. It may happen in some variations that there is no transition defined for the actual state and letter, then the machine gets stuck and cannot continue its run.

There are two widely used ways to present automata: by Cayley tables or by graphs. When an automaton is given by a Cayley table, then the 0th line and the 0th column of the table are reserved for the states and for the alphabet, respectively (and it is marked in the 0th element of the 0th row). In some cases it is more convenient to put the states in the 0th row, while in some cases it is a better choice to put the alphabet there. We will look at both possibilities. The initial state should be the first among the states (it is advisable to mark it by a → sign also). The final states should also be marked, they should be circled. The transition function is written into the table: the elements of the set δ(q,a) are written (if any) in the field of the column and row marked by the state q and by the letter a. In the case when λ-transitions are also allowed, then the 0th row or the column (that contains the symbols of the alphabet) should be extended by the empty word (λ) also. Then λ-transitions can also be indicated in the table.

Automata can also be defined in a graphical way: let the vertices (nodes, that are drawn as circles in this case) of a graph represent the states of the automaton (we may write the names of the states into the circles). The initial state is marked by an arrow going to it not from a node. The accepting states are marked by double circles. The labeled arcs (edges) of the graph represent the transitions of the automaton. If p ∈ δ (q,a) for some p,qQ, aT ∪ {λ}, then there is an edge from the circle representing state q to the circle representing state p and this edge is labeled by a. (Note that our graph concept is wider here than the usual digraph concept, since it allows multiple edges connecting two states, in most cases these multiple edges are drawn as normal edges having more than 1 labels.)

In this way, implicitly, the alphabet is also given by the graph (only those letters are used in the automaton which appear as labels on the edges).

In order to provide even more clarification, we present an example. We describe the same nondeterministic automaton both by a table and by a graph.

Example 21. Let an automaton be defined by the following Cayley table:

T Qq0q1q2q3
aq1q1q2, q3-
bq0q0-q3
cq0q2-q1,q2,q3

Figure 2.2. shows the graph representation of the same automaton.

These automata are used to accept words, and thus, languages:

Definition 16. (Language accepted by finite automaton). Let A = (Q, T, q0, δ, F) be an automaton and w T* be an input word. We say that w is accepted by A if there is a run of the automaton, i.e., there is an alternating sequence q0 t1 q1 ... qk-1 tk qk of states and transitions, that starts with the initial state q0, (qi Q for every i, they are not necessarily distinct, e.g., qi = qj is allowed even if i ≠ j) and for every of its transition ti of the sequence

• ti : qi δ (qi-1,ai) in nondeterministic cases

• ti : qi = δ (qi-1,ai) in deterministic cases,

where a1... ak = w, and qk F. This run is called an accepting run.

All words that A accepts form L(A), the language accepted (or recognized) by the automaton A.

Example 22. Let A be the automaton drawn in the next animations. We show a non-accepting run of a non-deterministic automaton A (with λ-transitions) in Animation 3.

Animation 3.

However the word 1100 is accepted by A, since it has also an accepting run that is shown in Animation 4.

Animation 4.

These finite automata are also called finite-state acceptors or Rabin-Scott automata. Let us see the language class(es) that can be accepted by these automata.

Two automata are equivalent if they accept the same language.

We have defined four types of finite automata and by the definition it seems that the latter ones are more restricted than the former ones. However, it turns out that all four versions characterize the same language class:

Theorem 4. For every finite automaton there is an equivalent (completely defined) deterministic finite automaton.

Proof. The proof is constructive. Let A = (Q, T, q0, δ, F) be a nondeterministic finite automaton (allowing λ-transitions). Let us define, first, the λ-closure of an arbitrary set q' of states.

• let U1 ({q'}) = {q'},

• let Ui+1 ({q'}) = Ui (q') ∪ {pQ∣∃rUi (q') such that p ∈ δ (r,λ)}, for i > 1.

Since Q is finite, there is a value k such that Uk (q') = Uk+1 (q'), let us denote this set by U(q'). Practically, this set contains all the states that can be reached starting from a state of q' by only λ-transitions.

Now we are ready to construct the automaton A' = (Q', T, U (q0), δ', F'), where Q' = 2Q, F'Q' includes every element q'Q' such that q'F ≠ ∅. The transition function δ' is defined as follows: for any aT and q'Q'. Actually while this can be done for all subsets of Q, subsets which cannot be reached by transitions from U (q0) by δ' can be deleted (these useless states are not needed).

One can observe that A' is a completely defined deterministic automaton. Also, every run of A has an equivalent run of A', in fact, A' simulates every possible run of A on the input at the same time. Conversely, if A' has an accepting run, then A also has at least one accepting run for the same input. Therefore, A and A' accept the same language, consequently they are equivalent.

QED

Our previous proof gives an algorithm for the ''determinization'' of any finite automaton having only states reachable from the initial state as we will see it in details in Example 23. Note that even if we deleted these useless states, the automaton may not be minimal in the sense that the same language can be accepted by a completely defined deterministic finite automaton having less number of states than our automaton.

Example 23. Let a nondeterministic automaton be defined by the following Cayley table (note that in this algorithm the rows refer to the states of the automaton and the columns to the letters of the alphabet, and in this automaton λ-transitions are allowed):

Q Tabλ
q0q0,q1q2-
q1q1-q2
q2q0q1 -

We start with the λ-closure of the initial state U (q0) = {q0}. This set will count as the initial state of the new automaton: let it be in the first row of the table of this new automaton. Let us see which sets of states can be obtained from this set by using the letters of the alphabet:

• by letter a the set {q0, q1} is obtained, however, its λ-closure is {q0, q1, q2};

• by letter b the set {q2} is obtained and its λ-closure is {q2}.

Let us write these two sets in the second and third row of the table. Now let us see what sets of states can be reached from these sets. First, let us see the set {q0, q1, q2}.

• by letter a the set {q0, q1} is obtained, however, its λ-closure is {q0, q1, q2};

• by letter b the set {q1, q2} is obtained and its λ-closure is {q1, q2}.

Since this latter set is not in the table yet, it is added to the fourth row. Now let us see the set {q2}.

• by letter a the set {q0} is obtained, and its λ-closure is {q0};

• by letter b the set {q1} is obtained and its λ-closure is {q1, q2}.

Since both of these two sets are already in the table we do not need to add a new row. Finally, let us analyse the set {q1,q2} (that is the last row of the table).

• by letter a the set {q0, q1} is obtained, and its λ-closure is {q0, q1, q2};

• by letter b the set {q1} is obtained and its λ-closure is {q1, q2}.

These sets are in the table. So the table is filled. The initial state of the new deterministic automaton is {q0}. The final states are: {q0, q1, q2}, {q2}, and {q1, q2}. The next table shows the resulting deterministic finite automaton:

Q Tab
→ {q0}{q0, q1, q2}{q2}
⊂{q0, q1, q2}⊃{q0, q1, q2}{q1, q2}
⊂{q2}⊃{q0}{q1, q2}
⊂{q1, q2}⊃{q0, q1, q2}{q1, q2}

Example 24. Animation 5. shows an example how to obtain a completely defined deterministic automaton that is equivalent to the original nondeterministic automaton.

Animation 5.

Let A = (Q, T, q0, δ, F) be a deterministic finite automaton such that each of its states is reachable from its initial state (there are no useless states). Then we can construct the minimal deterministic finite automaton that is equivalent to A in the following way:

Let us divide the set of states into two groups obtaining the classification C1 = {F, Q\F}. (We denote the class where state q is by C1[q].)

Then, for i > 1 the classification Ci is obtained from Ci-1: the states p and q are in the same class by Ci if and only if they are in the same class by Ci-1 and for every aT they behave similarly: δ (p,a) and δ (q,a) are in the same class by Ci.

Set Q is finite and, therefore, there is a classification Cm such that it is the same as Cm+1.

Then, we can define the minimal completely defined deterministic automaton that is equivalent to A: its states are the groups of the classification Cm, the initial state is the group containing the initial state of the original automaton, the final states are those groups that are formed from final states of the original automaton, formally:

(Cm, T, Cm [q0], δCm, FCm),

where δCm(Cm [q], a) = Cm [δ (q,a)] for every Cm [q] ∈ Cm, aT and FCm = {Cm [q]∣qF}.

It may happen that there are some words wT* that are not prefixes of any words of a regular language L. Then, the minimal completely defined deterministic automaton contains a sink state, that is the state where the word w and other words with the same property lead the automaton. When we want to have a minimal deterministic finite automaton for these languages, allowing partial (not completely defined) finite automata, then we may delete this sink state (with the transitions into it) by decreasing the number of the states by one.

Let us see yet another example. When applying the minimization algorithm it is more convenient to put the states to the 0th row of the table and the letters of the alphabet to the 0th column of the table.

Example 25. Let the deterministic automaton A be given as follows:

T Qq0q1q2q3q4q5q6
aq2q5q1q1q2q1q0
bq1q0q3q4q5q3q2

Give a minimal deterministic automaton that is equivalent to A.

Solution:

Before applying the algorithm we must check which states can be reached from the initial state: from q0 one can reach the states q0, q2, q1, q3, q5, q4. Observe that the automaton cannot enter state q6, therefore, this state (column) is deleted. The task is to minimize the automaton

T Qq0q1q2q3q4q5
aq2q5q1q1q2q1
bq1q0q3q4q5q3

by the algorithm. When we perform the first classification of the states C1 = {Q1, Q2} by separating the accepting and non-accepting states: Q1 = {q2, q4, q5}, Q2 = {q0, q1, q3} then we have:

QQ1Q2
T q2q4q5q0q1q3
aQ2Q1Q2Q1Q1Q2
bQ2Q1Q2Q2Q2Q1

Then C2 = {Q11, Q12, Q21, Q22} with Q11 = {q2, q5}, Q12 = {q4}, Q21 = {q0, q1}, Q22 = {q3}. Then according to this classification we have

QQ11Q12Q21Q22
T q2q5q4q0q1q3
aQ21Q21Q11Q11Q11Q21
bQ22Q22Q11Q21Q21Q12

Since C3 = C2 we have the solution, the minimal deterministic finite automaton equivalent to A:

T QQ11Q12Q21Q22
aQ21Q11Q11Q21
bQ22Q11Q21Q12

We conclude this subsection by a set of exercises.

Exercise 17. Give a finite automaton that accepts the language of words that contain the consecutive substring baab over the alphabet {a,b}.

Exercise 18. Let the automaton A be given in a Cayley table as follows:

T Qq0q1q2q3q4
0q0q1q2q4q1
1q1q2q3q3q4

Draw the graph of A. What is the type of A (e.g., nondeterministic with allowed λ-transitions)?

Exercise 19. The graph of the automaton A is given in Figure 2.3.

Describe A by utilizing a Cayley table. What is the type of A (e.g., deterministic with partially defined transition function)? What is the language that A recognizes?

Exercise 20. Let a nondeterministic automaton with λ-transitions, A, be defined by the following Cayley table:

Q Tabcλ
q0q0, q3q1--
q1q1q3-q2
q2-q0q2-
q3q1q1q2q0

Give a completely defined deterministic automaton that is equivalent to A.

Exercise 21. Let a nondeterministic automaton A be defined by the following table:

Q T01
q0q0, q1q1
q1q1q2, q3
q2q0-
q3q3q0, q1, q2

Give a completely defined deterministic automaton that accepts the same language as A.

Exercise 22. Let a nondeterministic automaton A be defined by the graph shown in Figure 2.4.

Give a completely defined deterministic automaton that accepts the same language as A.

Exercise 23. Let the deterministic automaton A be given as follows:

T Qq0q1q2q3q4q5q6q7
aq6q3q1q7q1q0q4q1
bq4q7q1q2q3q1q5q6
cq5q2q6q1q3q6q0q6

Give a minimal deterministic automaton that is equivalent to A.

Exercise 24. Let the deterministic automaton A be given by the following table.

T Qq0q1q2q3q4q5q6q7q8
0q3q5q7q0q2q0q4q2q5
1q8q1q3q2q6q7q3q5q2

Give a minimal deterministic automaton that is equivalent to A. (Hint: check first which states can be reached from the initial state.)

Exercise 25. Let a nondeterministic automaton A be defined by the graph shown in Figure 2.5.

Give a minimal deterministic automaton that is equivalent to A.

### 2.3.1. Synthesis and Analysis of Finite Automata

Now we are going to show that exactly the class of regular languages can be described with the help of the finite automata. First we show that every regular language (type 3 language) can be accepted by finite automata.

Theorem 5. Every language generated by a regular grammar is accepted by a finite automaton.

Proof. The proof is constructive. Let G = (N, T, S, P) be a regular grammar in normal form. Then, let the finite automaton

A = (Q, T, q0, δ,F)

be defined as follows:

let Q = N ∪ {F'} (where F'N),

q0 = S.

Let the transition function δ be defined by the elements of P: let Bδ (A,a) if AaBP; and let F'δ (A,a) if AaP. Further, let the set of accepting states be {F'} if S →λ is not in P and let F = {F',S} if S → λ ∈ P.

One can see that the successful derivations in the grammar and the accepting runs of the automaton have a one-to-one correspondence.

QED.

Example 26. Let

G = ({S, A, B, C}, {a, b, c}, S,

```{  S → abbA,
S → baaB,
S → λ,
A → aS,
A → aC,
B → bC,
C → aS,
C → cc
}) ```

generating the language L(G). Give a finite automaton that accepts the language L(G).

Solution:

First we must exclude the rules containing more than 1 terminals, as we did in the proof of Theorem 2. In this way the grammar

G' =({S, A, B, C, X1, X2, X3, X4, X5}, {a, b, c}, S,

```{  S → aX1,
X1 → bX2,
X2 → bA,
S → bX3,
X3 → aX4,
X4 → aB,
S → λ,
A → aS,
A → aC,
B → bC,
C → aS,
C → cX5,
X5 → c
}) ```

can be obtained. Then, we can draw the graph of automaton A as it can be seen in Figure 2.6.

Now we are going to show that the class of finite automata cannot accept more languages than the class that can be described by regular expressions.

Theorem 6. Every language accepted by a finite automaton can be described by a regular expression.

Proof. The proof is constructive. We present an algorithm that shows how one can construct a regular expression from a finite automaton. We can restrict ourselves to deterministic finite automaton, since we have already seen that they are equivalent to the nondeterministic finite automata. Let the states of a deterministic finite automaton be 1,2,... for the sake of simplicity, i.e., let A = ({1,... n}, T, 1, δ, F) be given. Let denote the regular expression that describes the language accepted by the automaton ({1,...,k} ∪ {i,j}, T, i, δ', {j}), where δ' is the restriction of δ containing only transitions from the set {i} ∪ {1,...,k} to the set {1,...,k} ∪ {j} (1 ≤ i,jn, 0 ≤ kn and 1,..., 0 means the empty set).

Then, describes the regular language that is given by direct transition(s) from state i to j. Therefore, gives the language {λ} ∪ {aδ (i,a) = i} (this language contains the empty word and possibly some one-letter-long words, i.e., it is described by a basic regular expression or finite union of basic regular expressions).

Further, describes the language {aδ (i,a) = j} if ij (it can contain some one-letter-long words). These regular expressions can easily be obtained and proven to describe the languages mentioned.

Now we use induction on the upper index:

for 1 ≤ kn. This expression can be seen intuitively: starting from state i we could reach state j by using the first k states in our path, either without using state k (the part refers to this case) or by a path reaching state , and then reaching it arbitrarily many times (including 0 times: , and finally reaching state j from state

Finally, gives a regular expression that describes exactly the language accepted by A. In this way it is constructively proven that for any finite automaton one can construct a regular expression that describes the language accepted by the automaton.

QED.

Example 27. Let the Cayley table of automaton A be given as follows:

T Qq1q2q3
aq2q3q1
bq3q2-

Describe the accepted language L(A) by a regular expression.

Solution:

Let us describe the regular expressions for i,j ∈ {1,2,3} by using the transitions of A.

Now by using the inductive step we compute for i,j ∈ {1,2,3}.

Now we can continue by computing the expressions

To describe L(A) we need Let us compute it:

Thus, the regular expression

(ab*a + b)(aab*a + ab)* aab* + ab* + (ab*a + b)(aab*a + ab)* = (ab*a + b)(aab*a + ab)* (λ + aab*) + aab*

describes L(A). The problem is solved.

Now we have proven that the three concepts we have discussed in this chapter, the regular grammars, regular expressions and finite automata are equivalent in the sense that each of them characterize exactly the class of regular languages (see Figure 2.7).

The aim of the analysis of a finite automaton is the task to describe the accepted language in another way, usually, by regular expressions. The synthesis of a finite automaton is the construction of the automaton that accepts a regular language that is usually given by a regular expression. Kleene has proven the equivalence of finite automata and regular expressions.

The minimization algorithm is very important, since the minimal, completely defined, deterministic finite automaton is (the only known) unique identification of a regular language. In this way, we can decide if two regular languages (given by regular expressions, grammars or automata) coincide or not.

We close this subsection by a set of exercises.

Exercise 26. Let

G = ({S, A, B}, {0,1}, S,

```{  S → 000A,
S → 111B,
A → λ,
A → 0S,
A → 11,
B → 1S,
B → 000
}) ```

generating language L(G). Give a finite automaton that accepts language L(G). (Hint: first transform the grammar to normal form.)

Exercise 27. Let

G = ({S, A, B, C}, {a,b}, S,

```{  S → aaaA,
S → bbB,
S → C,
A → S,
A → baB,
A → ba,
B → S,
B → C,
B → b,
B → λ,
C → B,
C → aA
}) ```

generating language L(G). Give a finite automaton that accepts language L(G).

Exercise 28. Let

G = ({S, A, B}, {a, b, c}, S,

```{  S → abA,
S → bccS,
A → bS,
A → c,
A → B,
B → S,
B → aA,
B → bcc,
B → λ
}) ```

generating language L(G). Give a finite automaton that accepts language L(G).

Exercise 29. Let the automaton A be defined by the following table:

T Qq1q2q3
0q1q3q3
1q2q2q1

Give a regular expression that describes the language accepted by automaton A.

Exercise 30. Let automaton A, accepting language L(A), be defined by the Cayley table:

T Q→⊂q1q2q3
aq1q3-
b-q2q1
cq2q2-

Give a regular expression that describes language L(A).

Exercise 31. Let automaton A be as it is shown in Figure 2.8. Give a regular expression that defines the same language as A.

### 2.3.2. The Word Problem

The word problem is to decide whether any given word w belongs to a given regular language (or not). This can be done very efficiently by using a deterministic finite automaton that accepts the given language. (Such an automaton can be constructed from a regular expression, first by Theorem 3. and then/or from a grammar by Theorem 5. and then/or from a nondeterministic finite automaton by Theorem 4.) Reading the word w can be done by ∣w∣ steps, and then, if the automaton accepts w, i.e., it arrived at an accepting state in this run, then w is an element of the language. Otherwise, (if there were some undefined transitions and the automaton gets stuck, or by reading the word finally a non accepting state is reached) w does not belong to the given regular language. The decision on an input word of length n is done in at most n steps, therefore, this is a real time algorithm (i.e., linear time algorithm with coefficient 1). There is no faster algorithm that can read the input, so the word problem for regular languages can be solved very efficiently.

Exercise 32. Let automaton A be given as it can be seen in Figure 2.9.

```
Decide whether the words

abab,
baba,
aaaabbb,
bbbaaaab,
baabaabaabb and
aaaabbbababaaa are in L(A).
```

Exercise 33. Let the nondeterministic automaton A be defined by the Cayley table:

Q T01λ
q0q1, q4q5q2
q1q3q1, q4-
q2q1q2-
q3--q2
q4-q4-
q5q5q5q0

Decide which of the words 0011, 0101, 0110, 01110010, 100, 1011110 belong to the accepted language of A.

Exercise 34. Let

G = ({S, A, B}, {a, b, c}, S,

```{  S → abcS,
S → bcA,
S → acB,
A → aS,
A → a,
B → bS,
B → bcc,
S → ccc
}) ```

be a regular grammar. Decide which of the following words can be generated by G:

```
abcccc,
acbcc,
acbbacbccc,
bca,
bcacacc,
bcaabcacbcc,
ccc,
cccacbc.
```

Exercise 35. Given the regular language

a* + (a+b)*baba (a+b)*),

decide if the following words are in the described language or not:

```
aaaaa,
aaabaa,
ababa,
abbabaaba.
```