The Chomsky hierarchy was described first by Noam Chomsky in 1956. It classifies the generative grammars based on the forms of their production rules. The Chomsky hierarchy also classifies languages, based on the classes of generative grammars generating them. The formal definition is provided below.

**Definition 6.** (Chomsky hierarchy) *Let G *= (* N, T, S, P *)* be
a generative grammar.*

*Type 0 or unrestricted grammars. Each generative grammar is unrestricted.**Type 1 or context-sensitive grammars. G is called context-sensitive, if all of its production rules have a form**p*_{1}*Ap*_{2}→*p*_{1}*qp*_{2},*or**S*→ λ,*where p*_{1}*, p*_{2}∈*V*∈^{ *}, A*N and q*∈*V*^{ +}.*If S*→ λ ∈*P then S does not appear in the right hand side word of any other rule.**Type 2 or context-free grammars. The grammar G is called context-free, if all of its productions have a form**A*→*p*,*where A*∈*N and p*∈*V*^{ *}.*Type 3 or regular grammars. G is called regular, if all of its productions have a form**A*→*r*,*or**A*→*rB*,*where A,B*∈*N and r*∈*T*^{ *}.

**Example 11.** * Let the generative grammar G*_{0}* be*

G_{0}= ({S,X},{x,y},S,P)P= {S→SXSy,XS→y,X→SXS,S→yxx}.

*This grammar is unrestricted, because the second rule is not a context-sensitive rule.*

**Example 12.** * Let the generative grammar G*_{1}* be*

G_{1}= ({S,X},{x,y},S,P)P= {S→SXSy,XSy→XyxXy,S→yXy,X→y}.

*This grammar is context-sensitive, because each production rule has an appropriate form.*

**Example 13.** * Let the generative grammar G*_{2}* be*

G_{2}= ({S,X},{x,y},S,P)P= {S→SyS,S→XX,S→yxy,X→ySy,X→ λ }.

*This grammar is context-free, because the left hand side of each production rule is a nonterminal symbol.*

**Example 14.** * Let the generative grammar G be*

G= ({S,X},{x,y},S,P)P= {S→xyS,S→X,X→yxS,S→x,X→ λ }.

*This grammar is regular, because the left hand side of each production rule is a nonterminal, and the right hand side
contains at most one nonterminal symbol, in last position.*

**Exercise 5.** *What is the type of the following grammars?*

1. G= ({S,A},{0,1},S,P)P= {S→ 0101A,S→ λ,A→ 1S,A→ 000 }.2. G= ({S,A,B},{0,1},S,P)P= {S→ 0A01B,S→ λ, 0A01 → 01A101,A→ 0BB1,B→ 1A1B,B→ 0011,A→ 1 }.3. G= ({S,A,B},{0,1},S,P)P= {S→ 0ABS1,S→ 10, 0AB→ 01SAB, 1SA→ 111,A→ 0,B→ 1 }.4. G= ({S,A},{0,1},S,P)P= {S→ 00A,S→ λ,A→ λ,A→S1S}.

**Definition 7.** *The language L is regular, if there exists a regular grammar G
such that L = L *(*G*).

The same kind of definitions are given for the context-free, context-sensitive, and recursively enumerable languages:

**Definition 8.** *The language L is context-free, if there exists a context-free
grammar G such that L = L *(*G*).

**Definition 9.** *The language L is context-sensitive, if there exists a context-sensitive grammar G
such that L = L *(*G*).

**Definition 10.** *The language L is called recursively enumerable, if there exists an unrestricted grammar
G such that L = L *(*G*).

Although these are the most often described classes of the Chomsky hierarchy, there are also a number of subclasses which are worth investigating. For example, in chapter 3 we introduce the linear languages. A grammar is linear, if it is context-free, and the right hand side of its rules contain maximum one nonterminal symbol. The class of linear languages is a proper subset of the class of context-free languages, and it is a proper superset of the regular language class. Example 15. shows a typical linear language and a typical linear grammar.

**Example 15.** *Let L be the language of the palindromes over the alphabet T *= {*a,b*}.
*(Palindromes are words that read the same backward or forward.) Language L is generated by the following linear grammar.*

G= ({S},{a,b},S,P)P= {S→aSa,S→bSb,S→a,S→b,S→ λ }.

*Language L is linear, because it can be generated by the linear grammar G.*

**Example 16.** *In Example 9 we can see a context-free grammar,
which is not linear, because the production A → AA is not a linear rule. However, this context-free grammar generates a regular language,
because the same language can be generated by the following regular grammar.*

G= ({S,A},{0,1},S,P, andP= {S→ 1A,A→ 1A,A→ 0A,A→ λ }.

It is obvious that context-sensitive grammars are unrestricted as well, because each generative grammar is unrestricted. It is also obvious that regular grammars are context-free as well, because in regular grammars the left hand side of each rule is a nonterminal, which is the only condition to be satisfied for a grammar in order to be context-free.

Let us investigate the case of the context-free and context sensitive grammars.

**Example 17.** *Let the grammar G be*

G= ({S,A},{x,y},S,P)P= {S→AxA,A→SyS,A→ λ,S→ λ }.

*This grammar is context-free, because the left hand side of each rule contains one nonterminal. At the same time, this grammar
is not context-sensitive, because*

*the rule A*→ λ*is not allowed in context-sensitive grammars,**if S*→ λ ∈*P, then the rule A → SyS is not allowed.*

This example shows that there are context-free grammars which are not context-sensitive. Although this statement holds for grammars, we can show that in the case of languages the Chomsky hierarchy is a real hierarchy, because each context-free language is context-sensitive as well. To prove this statement, let us consider the following theorem.

**Theorem 1.** *For each context-free grammar G we can give context-free grammar G',
which is context-sensitive as well, such that L *(*G*) = *L *(*G'*).

**Proof.** *We give a constructive proof of this theorem. We are going to show the necessary steps
to receive an equivalent context-sensitive grammar G' for a context-free grammar G = *( *N, T, S, P *).

First, we have to collect each nonterminal symbol from which the empty word can be derived. To do this, let the set

*U*(1) be*U*(1) = {*A*∣*A*∈*N, A*→ λ ∈*P*}. Then let*U*(*i*) =*U*(*i*-1) ∪ {*A*∣*A*→*B*_{1}*B*_{2}...*B*∈_{n}*P*,*B*_{1},*B*_{2},...,*B*∈_{n}*U*(*i*-1)}. Finally, we have an integer i such that*U*(*i*) =*U*(*i*-1) =*U*which contains all of the nonterminal symbols from which the empty word can be derived.Second, we are going to create a new set of rules. The right hand side of these rules should not contain the empty word. Let

*P'*= (*P*∪*P*_{1})\{*A*→ λ∣*A*∈*N*}. The set*P*_{1}contains production rules as well.*B*→*p*∈*P*_{1}if*B*→*q*∈*P*and we get*p*from*q*by removing some letters contained in set*U*.Finally, if

*S*∉*U*, then*G'*= (*N, T, S, P'*). If set*U*contains the start symbol, then the empty word can be derived from*S*, and λ∈*L*(*G*). In this case, we have to add a new start symbol to the set of nonterminals, and we also have to add two new productions to the set*P*_{1}, and*G'*= (*N*∪{*S'*},*T, S', P'*∪ {*S'*→ λ,*S'*→*S*}), so*G'*generates the empty word, and it is context sensitive.

QED.

**Example 18.** *Let the context-free grammar G be*

G= ({S,A,B},{x,y},S,P)P= {S→ASxB,S→AA,A→ λ,B→SyA}.

*Now, we create a context-sensitive generative grammar G', such that
L *(*G'*) = *L* (*G*).

1. U(1) = {A},U(2) = {A,S} =U.2. P'= {S→ASxB, S→SxB, S→AxB, S→xB,S→AA, S→A,B→SyA, B→yA, B→Sy, B→y}.3. G'= ({S,A,B,S'},{x,y},S',P'∪{S'→ λ,S'→S}).

**Exercise 6.** *Create a context-sensitive generative grammar G', such
that L *(*G'*) = *L *(*G*)*!*

G =({S,A,B,C},{a,b},S,P)P= {S→aAbB,S→aCCb,C→ λ,A→C,B→ACC,A→aSb}.

**Exercise 7.** *Create a context-sensitive generative grammar G',
such that L *(*G'*) = *L *(*G*)*!*

G= ({S,X,Y},{0,1},S,P)P= {S→ λ,S→XXY,X→Y0Y1,Y→ 1XS1,X→S00S}.

In the following chapters we are going to investigate the language classes of the Chomsky hierarchy. We will show algorithms to decide if a word is in a language generated by a generative grammar. We will learn methods which help to specify the position of a language in the Chomsky hierarchy, and we are going to investigate the closure properties of the different language classes.

We will introduce numerous kinds of automata. Some of them accept languages, and we can use them as an alternative language definition tool, however, some of them calculate output words from the input word, and we can see them as programmable computational tools. First of all, in the next chapter, we are going to deal with the most simple class of the Chomsky hierarchy, the class of regular languages.