1.2. 1.2 Finite automata and regular languages

Finite automata are computing models with input tape and a finite set of states (Fig. 1.1). Among the states some are called initial and some final. At the beginning the automaton read the first letter of the input word written on the input tape. Beginning with an initial state, the automaton read the letters of the input word one after another while change its states, and when after reading the last input letter the current state is a final one, we say that the automaton accepts the given word. The set of words accepted by such an automaton is called the language accepted (recognized) by the automaton.

Figure 1.1.  Finite automaton.

Finite automaton.

Definition 1.9 A nondeterministic finite automaton (NFA) is a system , where

is a finite, nonempty set of states,

is the input alphabet,

is the set of transitions (or of edges), where ,

is the set of initial states,

is the set of final states.

An NFA is in fact a directed, labelled graph, whose vertices are the states and there is a (directed) edge labelled with from vertex to vertex if . Among vertices some are initial and some final states. Initial states are marked by a small arrow entering the corresponding vertex, while the final states are marked with double circles. If two vertices are joined by two edges with the same direction then these can be replaced by only one edge labelled with two letters. This graph can be called a transition graph.

Example 1.9 Let , where , ,

,

The automaton can be seen in Fig. 1.2.

Figure 1.2.  The finite automaton of Example 1.9.

The finite automaton of Example 1.9.


In the case of an edge vertex is the start-vertex, the end-vertex and the label. Now define the notion of the walk as in the case of graphs. A sequence

of edges of a NFA is a walk with the label . If then and . Such a walk is called an empty walk. For a walk the notation

will be used, or if then we write shortly . Here is the start-vertex and the end-vertex of the walk. The states in a walk are not necessary distinct. A walk is productive if its start-vertex is an initial state and its end-vertex is a final state. We say that an NFA accepts or recognizes a word if this word is the label of a productive walk. The empty word is accepted by an NFA if there is an empty productive walk, i.e. there is an initial state which is also a final state.

Figure 1.3.  Nondeterministic finite automata.

Nondeterministic finite automata.

Figure 1.4.  Transition tables of the NFA in Fig. 1.3

Transition tables of the NFA in Fig. 1.3


The set of words accepted by an NFA will be called the language accepted by this NFA. The language accepted or recognized by NFA is

The NFA and are equivalent if .

Sometimes it is useful the following transition function:

This function associate to a state and input letter the set of states in which the automaton can go if its current state is and the head is on input letter .

Denote by the cardinal (the number of elements) of .

Footnote. The same notation is used for the cardinal of a set and length of a word, but this is no matter of confusion because for word we use lowercase letters and for set capital letters. The only exception is , but this could not be confused with a word.

An NFA is a deterministic finite automaton (DFA) if

In Fig. 1.2 a DFA can be seen.

Condition can be replaced by

If for a DFA for each state and for each letter then it is called a complete DFA.

Every DFA can be transformed in a complete DFA by introducing a new state, which can be called a snare state. Let be a DFA. An equivalent and complete DFA will be , where is the new state and . It is easy to see that . Using the transition function we can easily define the transition table. The rows of this table are indexed by the elements of , its columns by the elements of . At the intersection of row and column we put . In the case of Fig. 1.2, the transition table is:

The NFA in Fig. 1.3 are not deterministic: the first (automaton A) has two initial states, the second (automaton B) has two transitions with from state (to states and ). The transition table of these two automata are in Fig. 1.4. is set of words over which do not begin with two zeroes (of course is in language), is the set of words which contain as a subword.

Eliminating inaccessible states.

Let be a finite automaton. A state is accessible if it is on a walk which starts by an initial state. The following algorithm determines the inaccessible states building a sequence , , of sets, where is the set of initial states, and for any is the set of accessible states, which are at distance at most from an initial state.

Inaccessible-States(A)

  1     2     3  
                     REPEAT
                     4       5    
                     FOR
                   all    6       
                     DO
                   
                  
                     FOR
                   all    7          
                     DO
                   
                     8  
                     UNTIL
                   
                     9    10  
                     RETURN
                   
                   
               

The inaccessible states of the automaton can be eliminated without changing the accepted language.

If and then the running time of the algorithm (the number of steps) in the worst case is , because the number of steps in the two embedded loops is at most and in the loop rEPEAT at most .

Set has the property that if and only if . The above algorithm can be extended by inserting the condition to decide if language is or not empty.

Eliminating nonproductive states.

Let be a finite automaton. A state is productive if it is on a walk which ends in a terminal state. For finding the productive states the following algorithm uses the function :

This function for a state and a letter gives the set of all states from which using this letter the automaton can go into the state .

Nonproductive-States(A)

  1     2     3  
                     REPEAT
                     4       5    
                     FOR
                   all    6       
                     DO
                   
                  
                     FOR
                   all    7          
                     DO
                   
                     8  
                     UNTIL
                   
                     9    10  
                     RETURN
                   
                   
               

The nonproductive states of the automaton can be eliminated without changing the accepted language.

If is the number of states, the number of letters in the alphabet, then the running time of the algorithm is also as in the case of the algorithm INACCESSIBLE-STATES .

The set given by the algorithm has the property that if and only if . So, by a little modification it can be used to decide if language is or not empty.

1.2.1. 1.2.1 Transforming nondeterministic finite automata

As follows we will show that any NFA can be transformed in an equivalent DFA.

Theorem 1.10 For any NFA one may construct an equivalent DFA.

Proof. Let be an NFA. Define a DFA , where

,

edges of are those triplets for which are not empty, and ,

,

.

We prove that .

a) First prove that . Let . Then there exists a walk

Using the transition function of NFA we construct the sets , . Then and since we get , so . Thus, there exists a walk

There are sets for which , and for we have , and

is a productive walk. Therefore . That is .

b) Now we show that . Let . Then there is a walk

Using the definition of we have , i.e. there exists , that is by the definitions of and there is such that . Similarly, there are the states such that , where , thus, there is a walk

so .

In constructing DFA we can use the corresponding transition function :

The empty set was excluded from the states, so we used here instead of .

Figure 1.5.  The equivalent DFA with NFA The equivalent DFA with NFA \mathrm{A} in Fig. 1.3. in Fig. 1.3.

The equivalent DFA with NFA \mathrm{A} in Fig. 1.3.

Example 1.10 Apply Theorem 1.10 to transform NFA in Fig. 1.3. Introduce the following notation for the states of the DFA:

where is the initial state. Using the transition function we get the transition table:

This automaton contains many inaccessible states. By algorithm INACCESSIBLE-STATES we determine the accessible states of DFA:

.

Initial state is also a final state. States and are final states. States are inaccessible and can be removed from the DFA. The transition table of the resulted DFA is

The corresponding transition graph is in Fig. 1.5.

The algorithm given in Theorem 1.10 can be simplified. It is not necessary to consider all subset of the set of states of NFA. The states of DFA can be obtained successively. Begin with the state and determine the states for all . For the newly obtained states we determine the states accessible from them. This can be continued until no new states arise.

In our previous example is the initial state. From this we get

The transition table is

which is the same (excepted the notation) as before.

The next algorithm will construct for an NFA the transition table of the equivalent DFA , but without to determine the final states (which can easily be included). Value of ISIN () in the algorithm is true if state is already in and is false otherwise. Let be an ordered list of the letters of .

NFA-DFA(A)

  1     2     3        
                      
                      counts the rows.   4        
                      
                      counts the states.   5  
                        REPEAT
                        6    
                        FOR
                      
                           
                      
                      counts the columns.   7       
                        DO
                      
                        8          
                        IF
                      
                        9             
                        THEN
                      
                     
                        IF
                      
                     
                        ISIN
                     ()  10                
                        THEN
                      
                       11                
                        ELSE
                      
                       12                     13                     14                     15             
                        ELSE
                      
                       16      17  
                        UNTIL
                      
                       18  
                        RETURN
                      transition table  of  
                  

Since loop rEPEAT is executed as many times as the number of states of new automaton, in worst case the running time can be exponential, because, if the number of states in NFA is , then DFA can have even states. (The number of subsets of a set of elements is , including the empty set.)

Theorem 1.10 will have it that to any NFA one may construct an equivalent DFA. Conversely, any DFA is also an NFA by definition. So, the nondeterministic finite automata accepts the same class of languages as the deterministic finite automata.

1.2.2. 1.2.2 Equivalence of deterministic finite automata

In this subsection we will use complete deterministic finite automata only. In this case has a single element. In formulae, sometimes, instead of set we will use its single element. We introduce for a set the function which give us the single element of set , so . Using walks which begin with the initial state and have the same label in two DFA's we can determine the equivalence of these DFA's. If only one of these walks ends in a final state, then they could not be equivalent.

Consider two DFA's over the same alphabet and . We are interested to determine if they are or not equivalent. We construct a table with elements of form , where and . Beginning with the second column of the table, we associate a column to each letter of the alphabet . If the first element of the th row is then at the cross of th row and the column associated to letter will be the pair .

In the first column of the first row we put and complete the first row using the above method. If in the first row in any column there occur a pair of states from which one is a final state and the other not then the algorithm ends, the two automata are not equivalent. If there is no such a pair of states, every new pair is written in the first column. The algorithm continues with the next unfilled row. If no new pair of states occurs in the table and for each pair both of states are final or both are not, then the algorithm ends and the two DFA are equivalent.

If , and then taking into account that in worst case loop rEPEAT is executed times, loop fOR times, the running time of the algorithm in worst case will be , or if then .

Our algorithm was described to determine the equivalence of two complete DFA's. If we have to determine the equivalence of two NFA's, first we transform them into complete DFA's and after this we can apply the above algorithm.

DFA-Equivalence( )

  1  write in the first column of the first row the pair    2     3  
                        REPEAT
                        4       5    let  be the pair in the first column of the th row   6    
                        FOR
                      all    7       
                        DO
                      write in the column associated to  in the th row                 the pair   8          
                        IF
                      one state in  is final                    and the other not  9             
                        THEN
                      
                     
                        RETURN
                      NO  10             
                        ELSE
                      write pair                        in the next empty row of the first column,                      if not occurred already in the first column 11  
                        UNTIL
                      the first element of th row becomes empty  12  
                        RETURN
                      YES 

Example 1.11 Determine if the two DFA's in Fig. 1.6 are equivalent or not. The algorithm gives the table

The two DFA's are equivalent because all possible pairs of states are considered and in every pair both states are final or both are not final.

Figure 1.6.  Equivalent DFA's (Example 1.11).

Equivalent DFA's (Example 1.11).

Example 1.12 The table of the two DFA's in Fig. 1.7 is:

These two DFA's are not equivalent, because in the last column of the second row in the pair the first state is final and the second not.

Figure 1.7.  Non equivalent DFA's (Example 1.12).

Non equivalent DFA's (Example 1.12).

1.2.3. 1.2.3 Equivalence of finite automata and regular languages.

We have seen that NFA's accept the same class of languages as DFA's. The following theorem states that this class is that of regular languages.

Theorem 1.11 If is a language accepted by a DFA, then one may construct a regular grammar which generates language .

Proof. Let be the DFA accepting language , that is . Define the regular grammar with the productions:

If for and , then put production in .

If and , then put also production in .

Prove that .

Let and . Thus, since accepts word , there is a walk

Then there are in the productions

(in the right-hand side of the last production does not occur, because ), so there is the derivation

Therefore, .

Conversely, let and . Then there exists a derivation

in which productions

were used, which by definition means that in DFA there is a walk

and since is a final state, .

If the DFA accepts also the empty word , then in the above grammar we introduce a new start symbol instead of , consider the new production and for each production introduce also .

Example 1.13 Let be a DFA, where . The corresponding transition table is

The transition graph of is in Fig. 1.8. By Theorem 1.11 we define regular grammar with the productions in

One may prove that .

Figure 1.8.  DFA of the Example 1.13.

DFA of the Example 1.13.


The method described in the proof of Theorem 1.11 easily can be given as an algorithm. The productions of regular grammar obtained from the DFA can be determined by the following algorithm.

Regular-Grammar-from-DFA(A)

  1     2  
                        FOR
                      all    3    
                        DO
                      
                     
                        FOR
                      all    4       
                        DO
                      
                     
                        FOR
                      all    5          
                        DO
                      
                     
                        IF
                      
                        6             
                        THEN
                      
                        7                
                        IF
                      
                        8                   
                        THEN
                      
                        9  
                        IF
                      
                       10    
                        THEN
                      
                       11  
                        RETURN
                      
                      
                  

It is easy to see that the running time of the algorithm is , if the number of states is and the number of letter in alphabet is . In lines 2–4 we can consider only one loop, if we use the elements of . Then the worst case running time is , where is the number of transitions of DFA. This is also , since all transitions are possible. This algorithm is:

Regular-Grammar-from-Dfa'(A)

  1     2  
                        FOR
                      all    3    
                        DO
                      
                        4       
                        IF
                      
                        5          
                        THEN
                      
                        6  
                        IF
                      
                        7    
                        THEN
                      
                        8  
                        RETURN
                      
                      
                  

Theorem 1.12 If is a regular language, then one may construct an NFA that accepts language .

Proof. Let be the grammar which generates language . Define NFA :

, where (i.e. is a new symbol),

For every production , define transition in .

For every production , define transition in .

Prove that .

Let , . Then there is in a derivation of word :

This derivation is based on productions

Then, by the definition of the transitions of NFA there exists a walk

Thus, . If , there is production , but in this case the initial state is also a final one, so . Therefore, .

Let now . Then there exists a walk

If is the empty word, then instead of we have in the above formula , which also is a final state. In other cases only can be as last symbol. Thus, in there exist the productions

and there is the derivation

thus, and therefore .

Example 1.14 Let be a regular grammar. The NFA associated is , where . The corresponding transition table is

The transition graph is in Fig. 1.9. This NFA can be simplified, states and can be contracted in one final state.

Figure 1.9.  NFA associated to grammar in Example 1.14.

NFA associated to grammar in Example 1.14.


Using the above theorem we define an algorithm which associate an NFA to a regular grammar .

NFA-from-Regular-Grammar(A)

  1     2     3  
                        FOR
                      all    4    
                        DO
                      
                     
                        FOR
                      all    5       
                        DO
                      
                     
                        IF
                      
                        6          
                        THEN
                      
                        7       
                        FOR
                      all    8          
                        DO
                      
                     
                        IF
                      
                        9             
                        THEN
                      
                       10  
                        IF
                      
                       11    
                        THEN
                      
                       12    
                        ELSE
                      
                       13  
                        RETURN
                      A 

As in the case of algorithm REGULAR-GRAMMAR-FROM-DFA , the running time is , where is number of nonterminals and the number of terminals. Loops in lines 3, 4 and 7 can be replaced by only one, which uses productions. The running time in this case is better and is equal to , if is the number of productions. This algorithm is:

NFA-from-Regular-Grammar'(A)

  1     2     3  
                        FOR
                      all    4    
                        DO
                      
                     
                        IF
                      
                        5       
                        THEN
                      
                        6    
                        IF
                      
                        7       
                        THEN
                      
                        8  
                        IF
                      
                        9    
                        THEN
                      
                       10    
                        ELSE
                      
                       11  
                        RETURN
                      A 

From Theorems 1.10, 1.11 and 1.12 results that the class of regular languages coincides with the class of languages accepted by NFA's and also with class of languages accepted by DFA's. The result of these three theorems is illustrated in Fig. 1.10 and can be summarised also in the following theorem.

Figure 1.10.  Relations between regular grammars and finite automata. To any regular grammar one may construct an NFA which accepts the language generated by that grammar. Any NFA can be transformed in an equivalent DFA. To any DFA one may construct a regular grammar which generates the language accepted by that DFA.

Relations between regular grammars and finite automata. To any regular grammar one may construct an NFA which accepts the language generated by that grammar. Any NFA can be transformed in an equivalent DFA. To any DFA one may construct a regular grammar which generates the language accepted by that DFA.

Theorem 1.13 The following three class of languages are the same:

the class of regular languages,

the class of languages accepted by DFA's,

the class of languages accepted by NFA's.

1.2.3.1.  Operation on regular languages

It is known (see Theorem 1.8) that the set of regular languages is closed under the regular operations, that is if are regular languages, then languages , and are also regular. For regular languages are true also the following statements.

The complement of a regular language is also regular. This is easy to prove using automata. Let be a regular language and let be a DFA which accepts language . It is easy to see that the DFA accepts language . So, is also regular.

The intersection of two regular languages is also regular. Since , the intersection is also regular.

The difference of two regular languages is also regular. Since , the difference is also regular.

1.2.4. 1.2.4 Finite automata with special moves

A finite automaton with -moves (FA with -moves) extends NFA in such way that it may have transitions on the empty input , i.e. it may change a state without reading any input symbol. In the case of a FA with -moves for the set of transitions it is true that .

The transition function of a FA with -moves is:

The FA with -moves in Fig. 1.11 accepts words of form , where and .

Figure 1.11.  Finite automata with Finite automata with \epsilon -moves.-moves.

Finite automata with \epsilon -moves.

Theorem 1.14 To any FA with -moves one may construct an equivalent NFA (without -moves).

Let be an FA with -moves and we construct an equivalent NFA . The following algorithm determines sets and . For a state denote by the set of states (including even ) in which one may go from using -moves only. This may be extended also to sets

Clearly, for all and both and may be computed. Suppose in the sequel that these are given.

The following algorithm determine the transitions using the transition function , which is defined in line 5.

If and , then lines 2–6 show that the running time in worst case is .

Eliminate-Epsilon-Moves(A)

  1     2  
                        FOR
                      all    3    
                        DO
                      
                     
                        FOR
                      all    4       
                        DO
                      
                        5             6     7  
                        RETURN
                      
                      
                  

Example 1.15 Consider the FA with -moves in Fig. 1.11. The corresponding transition table is:

Apply algorithm ELIMINATE-EPSILON-MOVES .

, ,

, and its intersection with is not empty, thus .

,

.

,

,

,

,

.

The transition table of NFA is:

and the transition graph is in Fig. 1.12.

Figure 1.12.  NFA equivalent to FA with NFA equivalent to FA with \varepsilon -moves given in Fig. 1.11.-moves given in Fig. 1.11.

NFA equivalent to FA with \varepsilon -moves given in Fig. 1.11.


Define regular operations on NFA: union, product and iteration. The result will be an FA with -moves.

Operation will be given also by diagrams. An NFA is given as in Fig. 1.13(a). Initial states are represented by a circle with an arrow, final states by a double circle.

Figure 1.13.  (a) Representation of an NFA. Initial states are represented by a circle with an arrow, final states by a double circle. (b) Union of two NFA's.

(a) Representation of an NFA. Initial states are represented by a circle with an arrow, final states by a double circle. (b) Union of two NFA's.


Let and be NFA. The result of any operation is a FA with -moves . Suppose that always. If not, we can rename the elements of any set of states.

Union. , where

,

,

,

,

.

For the result of the union see Fig. 1.13(b). The result is the same if instead of a single initial state we choose as set of initial states the union . In this case the result automaton will be without -moves. By the definition it is easy to see that .

Product. , where

,

,

,

,

.

For the result automaton see Fig. 1.14(a). Here also .

Figure 1.14.  (a) Product of two FA. (b) Iteration of an FA.

(a) Product of two FA. (b) Iteration of an FA.

Iteration. , where

,

,

,

,

.

The iteration of an FA can be seen in Fig. 1.14(b). For this operation it is also true that .

The definition of these tree operations proves again that regular languages are closed under the regular operations.

1.2.5. 1.2.5 Minimization of finite automata

A DFA is called minimum state automaton if for any equivalent complete DFA it is true that . We give an algorithm which builds for any complete DFA an equivalent minimum state automaton.

Figure 1.15.  Minimization of DFA.

Minimization of DFA.


States and of an DFA are equivalent if for arbitrary word we reach from both either final or nonfinal states, that is

if for any word

If two states are not equivalent, then they are distinguishable. In the following algorithm the distinguishable states will be marked by a star, and equivalent states will be merged. The algorithm will associate list of pair of states with some pair of states expecting a later marking by a star, that is if we mark a pair of states by a star, then all pairs on the associated list will be also marked by a star. The algorithm is given for DFA without inaccessible states. The used DFA is complete, so contains exact one element, function defined in Subsection 1.2.2, which gives the unique element of the set, will be also used here.

Automaton-Minimization( )

  1  mark with a star all pairs of states  for which  and  or  and    2  associate an empty list with each unmarked pair    3  
                        FOR
                      all unmarked pair of states  and for all symbol            examine pairs of states           
                     
                        IF
                      any of these pairs is marked,             
                        THEN
                      mark also pair  with all the elements on the list before                associated with pair              
                     
                        ELSE
                      
                     
                        IF
                      all the above pairs are unmarked                
                        THEN
                      put pair  on each list associated with pairs                   , unless   4  merge all unmarked (equivalent) pairs 

Figure 1.16.  Minimum automaton equivalent with DFA in Fig. 1.15.

Minimum automaton equivalent with DFA in Fig. 1.15.


After finishing the algorithm, if a cell of the table does not contain a star, then the states corresponding to its row and column index, are equivalent and may be merged. Merging states is continued until it is possible. We can say that the equivalence relation decomposes the set of states in equivalence classes, and the states in such a class may be all merged.

Remark. The above algorithm can be used also in the case of an DFA which is not complete, that is there are states for which does not exist transition. Then a pair may occur, and if is a final state, consider this pair marked.

Example 1.16 Let be the DFA in Fig. 1.15. We will use a table for marking pairs with a star. Marking pair means putting a star in the cell corresponding to row and column (or row and column ).

First we mark pairs , , , and (because is the single final state). Then consider all unmarked pairs and examine them as the algorithm requires. Let us begin with pair . Associate with it pairs , that is , . Because pair is already marked, mark also pair .

In the case of pair the new pairs are and . With pair associate pair on a list, that is

Now continuing with one obtain pairs and , with which nothing are associated by algorithm.

Continue with pair . The associated pairs are and . None of them are marked, so associate with them on a list pair , that is

Now continuing with we get the pairs and , and because this latter is marked we mark pair and also pair associated to it on a list. Continuing we will get the table in Fig. 1.15, that is we get that and . After merging them we get an equivalent minimum state automaton (see Fig. 1.16).

1.2.6. 1.2.6 Pumping lemma for regular languages

The following theorem, called pumping lemma for historical reasons, may be efficiently used to prove that a language is not regular. It is a sufficient condition for a regular language.

Theorem 1.15 (pumping lemma) For any regular language there exists a natural number (depending only on ), such that any word of with length at least may be written as such that

(1) ,

(2) ,

(3) for all .

Proof. If is a regular language, then there is such an DFA which accepts (by Theorems 1.12 and 1.10). Let be this DFA, so . Let be the number of its states, that is . Let and . Then, because the automaton accepts word , there are states and walk

Figure 1.17.  Sketch of DFA used in the proof of the pumping lemma.

Sketch of DFA used in the proof of the pumping lemma.


Because the number of states is and , by the pigeonhole principle states can not all be distinct (see Fig. 1.17), there are at least two of them which are equal.

Footnote. Pigeonhole principle: If we have to put more than objects into boxes, then at least one box will contain at least two objects.

Let , where and is the least such index. Then . Decompose word as:

.

This decomposition immediately yields to and . We will prove that for any .

Because , there exists an walk

and because of , this may be written also as

From this walk can be omitted or can be inserted many times. So, there are the following walks:

Therefore for all , and this proves the theorem.

Example 1.17 We use the pumping lemma to show that is not regular. Assume that is regular, and let be the corresponding natural number given by the pumping lemma. Because the length of the word is , this word can be written as in the lemma. We prove that this leads to a contradiction. Let be the decomposition as in the lemma. Then , so and can contain no other letters than , and because we must have , word contains at least one . Then for will contain a different number of 's and 's, therefore for any . This is a contradiction with the third assertion of the lemma, this is why that assumption that is regular, is false. Therefore .

Because the context-free grammar generates language , we have . From these two follow that .

Example 1.18 We show that is not regular. ( is the number of 0's in , while the number of 1's).

We proceed as in the previous example using here word , where is the natural number associated by lemma to language .

Example 1.19 We prove, using the pumping lemma, that is not a regular language. Let be, where here is also the natural number associated to by the pumping lemma. From we have that contains no other letters than , but it contains at least one. By lemma we have , that is not possible. Therefore is not regular.

Pumping lemma has several interesting consequences.

Corollary 1.16 Regular language is not empty if and only if there exists a word , , where is the natural number associated to by the pumping lemma.

Proof. The assertion in a direction is obvious: if there exists a word shorter than in , then . Conversely, let and let be the shortest word in . We show that . If , then we apply the pumping lemma, and give the decomposition , and . This is a contradiction, because and is the shortest word in . Therefore .

Corollary 1.17 There exists an algorithm that can decide if a regular language is or not empty.

Proof. Assume that , where is a DFA. By Consequence 1.16 and Theorem 1.15 language is not empty if and only if it contains a word shorter than , where is the number of states of automaton . By this it is enough to decide that there is a word shorter than which is accepted by automaton . Because the number of words shorter than is finite, the problem can be decided.

When we had given an algorithm for inaccessible states of a DFA, we remarked that the procedure can be used also to decide if the language accepted by that automaton is or not empty. Because finite automata accept regular languages, we can consider to have already two procedures to decide if a regular languages is or not empty. Moreover, we have a third procedure, if we take into account that the algorithm for finding productive states also can be used to decide on a regular language when it is empty.

Corollary 1.18 A regular language is infinite if and only if there exists a word such that , where is the natural number associated to language , given by the pumping lemma.

Proof. If is infinite, then it contains words longer than , and let be the shortest word longer than in . Because is regular we can use the pumping lemma, so , where , thus is also true. By the lemma . But because and the shortest word in longer than is , we get . From we get also .

Conversely, if there exists a word such that , then using the pumping lemma, we obtain that , and for any , therefore is infinite.

Now, the question is: how can we apply the pumping lemma for a finite regular language, since by pumping words we get an infinite number of words? The number of states of a DFA accepting language is greater than the length of the longest word in . So, in there is no word with length at least , when is the natural number associated to by the pumping lemma. Therefore, no word in can be decomposed in the form , where , , , and this is why we can not obtain an infinite number of words in .

1.2.7. 1.2.7 Regular expressions

In this subsection we introduce for any alphabet the notion of regular expressions over and the corresponding representing languages. A regular expression is a formula, and the corresponding language is a language over . For example, if , then , , are regular expressions over which represent respectively languages , , . The exact definition is the following.

Definition 1.19 Define recursively a regular expression over and the language it represent.

is a regular expression representing the empty language.

is a regular expression representing language .

If , then is a regular expression representing language .

If , are regular expressions representing languages and respectively, then , , are regular expressions representing languages , and respectively.

Regular expression over can be obtained only by using the above rules a finite number of times.

Some brackets can be omitted in the regular expressions if taking into account the priority of operations (iteration, product, union) the corresponding languages are not affected. For example instead of we can consider .

Two regular expressions are equivalent if they represent the same language, that is if , where and are the languages represented by regular expressions and respectively. Figure 1.18 shows some equivalent expressions.

Figure 1.18.  Properties of regular expressions.

Properties of regular expressions.


We show that to any finite language can be associated a regular expression which represent language . If , then . If , then , where for any expression is a regular expression representing language . This latter can be done by the following rule. If , then , else if , where depends on , then , where the brackets are omitted. We prove the theorem of Kleene which refers to the relationship between regular languages and regular expression.

Theorem 1.20 (Kleene's theorem) Language is regular if and only if there exists a regular expression over representing language .

Proof. First we prove that if is a regular expression, then language which represents is also regular. The proof will be done by induction on the construction of expression.

If , , , then , , respectively. Since is finite in all three cases, it is also regular.

If , then , where and are the languages which represent the regular expressions and respectively. By the induction hypothesis languages and are regular, so is also regular because regular languages are closed on union. Cases and can be proved by similar way.

Conversely, we prove that if is a regular language, then a regular expression can be associated to it, which represent exactly the language . If is regular, then there exists a DFA for which . Let the states of the automaton . Define languages for all and . is the set of words, for which automaton goes from state to state without using any state with index greater than . Using transition graph we can say: a word is in , if from state we arrive to state following the edges of the graph, and concatenating the corresponding labels on edges we get exactly that word, not using any state . Sets can be done also formally:

, if ,

,

for all .

We can prove by induction that sets can be described by regular expressions. Indeed, if , then for all and languages are finite, so they can be expressed by regular expressions representing exactly these languages. Moreover, if for all and language can be expressed by regular expression, then language can be expressed also by regular expression, which can be corresponding constructed from regular expressions representing languages , , and respectively, using the above formula for .

Finally, if is the set of final states of the DFA , then can be expressed by a regular expression obtained from expressions representing languages using operation .

Further on we give some procedures which associate DFA to regular expressions and conversely regular expression to DFA.

1.2.7.1.  Associating regular expressions to finite automata.

We present here three methods, each of which associate to a DFA the corresponding regular expression.

Method 1. Using the result of the theorem of Kleene, we will construct the sets , and write a regular expression which represents the language , where is the set of final states of the automaton.

Figure 1.19.  DFA from Example 1.20, to which regular expression is associated by Method 1.

DFA from Example 1.20, to which regular expression is associated by Method 1.

Figure 1.20.  DFA in Example 1.21 to which a regular expression is associated by Method 1. The computation are in Figure 1.21.

DFA in Example 1.21 to which a regular expression is associated by Method 1. The computation are in Figure 1.21.

Figure 1.21.  Determining a regular expression associated to DFA in Figure 1.20 using sets Determining a regular expression associated to DFA in Figure 1.20 using sets R_{ij}^{k} ..

Determining a regular expression associated to DFA in Figure 1.20 using sets R_{ij}^{k} .

Example 1.20 Consider the DFA in Fig. 1.19.

Then the regular expression corresponding to is .

Example 1.21 Find a regular expression associated to DFA in Fig. 1.20. The computations are in Figure 1.21. The regular expression corresponding to is .

Method 2. Now we generalize the notion of finite automaton, considering words instead of letters as labels of edges. In such an automaton each walk determine a regular expression, which determine a regular language. The regular language accepted by a generalized finite automaton is the union of regular languages determined by the productive walks. It is easy to see that the generalized finite automata accept regular languages.

The advantage of generalized finite automata is that the number of its edges can be diminuted by equivalent transformations, which do not change the accepted language, and leads to a graph with only one edge which label is exactly the accepted language. The possible equivalent transformations can be seen in Fig. 1.22. If some of the vertices 1, 2, 4, 5 on the figure coincide, in the result they are merged, and a loop will arrive.

First, the automaton is transformed by corresponding -moves to have only one initial and one final state. Then, applying the equivalent transformations until the graph will have only one edge, we will obtain as the label of this edge the regular expression associated to the automaton.

Figure 1.22.  Possible equivalent transformations for finding regular expression associated to an automaton.

Possible equivalent transformations for finding regular expression associated to an automaton.

Figure 1.23.  Transformation of the finite automaton in Fig. 1.19.

Transformation of the finite automaton in Fig. 1.19.

Example 1.22 In the case of Fig. 1.19 the result is obtained by steps illustrated in Fig. 1.23. This result is , which represents the same language as obtained by Method 1 (See example 1.20).

Example 1.23 In the case of Fig. 1.20 is not necessary to introduce new initial and final state. The steps of transformations can be seen in Fig. 1.23. The resulted regular expression can be written also as , which is the same as obtained by the previous method.

Figure 1.24.  Steps of Example 1.23.

Steps of Example 1.23.

Method 3. The third method for writing regular expressions associated to finite automata uses formal equations. A variable is associated to each state of the automaton (to different states different variables). Associate to each state an equation which left side contains , its right side contains sum of terms of form or , where is a variable associated to a state, and is its corresponding input symbol. If there is no incoming edge in the state corresponding to then the right side of the equation with left side contains , otherwise is the sum of all terms of the form for which there is a transition labelled with letter from state corresponding to to the state corresponding to . If the state corresponding to is also an initial and a final state, then on right side of the equation with the left side will be also a term equal to . For example in the case of Fig. 1.20 let these variable corresponding to the states . The corresponding equation are

.

If an equation is of the form , where are arbitrary words not containing , then it is easy to see by a simple substitution that is a solution of the equation.

Because these equations are linear, all of them can be written in the form or , where do not contain any variable. Substituting this in the other equations the number of remaining equations will be diminuted by one. In such a way the system of equation can be solved for each variable.

The solution will be given by variables corresponding to final states summing the corresponding regular expressions.

In our example from the first equation we get . From here , or , and solving this we get . Variable can be obtained immediately and we obtain .

Using this method in the case of Fig. 1.19, the following equations will be obtained

Therefore

.

Adding the two equations we will obtain

, from where (considering as and as ) we get the result

.

From here the value of after the substitution is

,

which is equivalent to the expression obtained using the other methods.

1.2.7.2.  Associating finite automata to regular expressions.

Associate to the regular expression a generalized finite automaton:

After this, use the transformations in Fig. 1.25 step by step, until an automaton with labels equal to letters from or will be obtained.

Example 1.24 Get started from regular expression . The steps of transformations are in Fig. 1.26(a)-(e). The last finite automaton (see Fig. 1.26(e)) can be done in a simpler form as can be seen in Fig. 1.26(f). After eliminating the -moves and transforming in a deterministic finite automaton the DFA in Fig. 1.27 will be obtained, which is equivalent to DFA in Fig. 1.19.

Figure 1.25.  Possible transformations to obtain finite automaton associated to a regular expression.

Possible transformations to obtain finite automaton associated to a regular expression.

Figure 1.26.  Associating finite automaton to regular expression Associating finite automaton to regular expression \varepsilon+(0+1)^{*}1 ..

Associating finite automaton to regular expression \varepsilon+(0+1)^{*}1 .

Figure 1.27.  Finite automaton associated to regular expression Finite automaton associated to regular expression \varepsilon+(0+1)^{*}1 ..

Finite automaton associated to regular expression \varepsilon+(0+1)^{*}1 .

Exercises

1.2-1 Give a DFA which accepts natural numbers divisible by 9.

1.2-2 Give a DFA which accepts the language containing all words formed by

a. an even number of 0's and an even number of 1's,

b. an even number of 0's and an odd number of 1's,

c. an odd number of 0's and an even number of 1's,

d. an odd number of 0's and an odd number of 1's.

1.2-3 Give a DFA to accept respectively the following languages:

.

1.2-4 Give an NFA which accepts words containing at least two 0's and any number of 1's. Give an equivalent DFA.

1.2-5 Minimize the DFA's in Fig. 1.28.

Figure 1.28.  DFA's to minimize for Exercise 1.2-5

DFA's to minimize for Exercise 1.2-5


1.2-6 Show that the DFA in 1.29.(a) is a minimum state automaton.

Figure 1.29.  Finite automata for Exercises 1.2-6 and 1.2-7.

Finite automata for Exercises 1.2-6 and 1.2-7.


1.2-7 Transform NFA in Fig. 1.29.(b) in a DFA, and after this minimize it.

1.2-8 Define finite automaton which accepts all words of the form (), and finite automaton which accepts all words of the form (). Define the union automaton , and then eliminate the -moves.

1.2-9 Associate to DFA in Fig. 1.30 a regular expression.

Figure 1.30.  DFA for Exercise1.2-9.

DFA for Exercise1.2-9.


1.2-10 Associate to regular expression a DFA.

1.2-11 Prove, using the pumping lemma, that none of the following languages are regular:

.

1.2-12 Prove that if is a regular language, then is also regular.

1.2-13 Prove that if is a regular language, then the following languages are also regular.

.

1.2-14 Show that the following languages are all regular.

,

,

.