Tartalom
Summary of the chapter: In this chapter, we deal with a family of languages generated by context-sensitive grammars of the Chomsky hierarchy, i.e., the context-sensitive languages. We are going to prove that monotone grammars generate the same language class. Normal forms of these grammars, such as Kuroda and Révész normal forms are provided. An example for a non-context-free, context-sensitive language is also given. The language class accepted by linear bounded automata coincides with the class of context-sensitive languages. This class is closed under the regular operations (union, concatenation, Kleene-star) and under the set theoretic operations complement and intersection. The word problem is solvable for these languages but no efficient algorithm is known for the general case.
For better understanding, we start this section by recalling the definition of context-sensitive grammars and languages.
Definition 26 (Context-sensitive grammars). A grammar G = (N, T, S, P) is context-sensitive if each of its productions has one of the following forms: pAq → puq, where A ∈ N, p, q ∈ (N ∪ T)*, u ∈ (N ∪ T)+; S → λ, and if S →λ ∈ P, then S does not occur in the right hand side of any rule in P. The languages that can be generated by context-sensitive grammars are the context-sensitive languages.
Example 53. Animation 9. shows an example for a context-sensitive grammar with a sample derivation.
We present yet another definition:
Definition 27. (Monotone grammars). A grammar G = (N, T, S, P) is monotone (or length-non-decreasing) if for each of its rules p → q (p ∈ (N ∪ T)* N (N∪T)*, q ∈ (N ∪ T)+)∣p∣ ≤ ∣q∣, but the possible rule S →λ. If S → λ is contained in P, then S does not occur in the right hand side of any rules of the grammar.
It is an interesting property of monotone grammars that the terminals can be rewritten: this definition is so general that it allows to this (if there is a nonterminal close to that terminal), for example, by rule aaaaB → cccCCcc (with a, c ∈ T; B, C ∈ N).
According to the definitions, it is obvious that every context-sensitive grammar is monotone. The opposite will also be proven in this section, but first we are going to investigate a few normal forms.
In this subsection, two normal forms are presented.
Definition 28. (Kuroda normal form). A monotone grammar G = (N, T, S, P) is in Kuroda normal form, if it is monotone, and each of its rules is in one of the following forms:
AB → CD, A → BC, A → B, A → a, S → λ
(A,B,C,D ∈ N, a ∈ T).
Since grammars in Kuroda normal form are monotone, in case S → λ is in the set of productions, the start symbol S cannot be in any right hand side of a rule. Kuroda normal form is a normal form, therefore we have the following theorem:
Theorem 25. There is an equivalent grammar in Kuroda normal form for every monotone grammar.
Proof. Let a monotone grammar G = (N, T, S, P) be given. The proof is constructive: we present an algorithmic way to obtain the grammar in Kuroda normal form that is equivalent to G. Since G is monotone, the generated language L(G) contains the empty word λ, if and only if there is a production S → λ in G. We need to deal only with the rules of the form p → q with ∣p∣ ≤ ∣q∣.
As a first step of our proof (algorithm), we obtain a grammar G'' that generates the same language as G, moreover, it has rules containing terminals only in rules of the form A → a. So for each terminal a let us introduce a new nonterminal Xa (Xa ∉ N), and replace each occurrence of all terminals in every rule by their new nonterminals (for example, a is replaced by Xa in every rule, both left and right hand side); and add the rules of the form Xa → a to the set of productions for each terminal:
N'' = N ∪{Xa∣a ∈ T},
G'' = (N'', T, S,
{p' → q'∣p → q ∈ P, and p' and q' are obtained from p and q, respectively, by replacing the occurrences of each terminal to the appropriate new nonterminal} ∪ {Xa → a∣a ∈ T}).
Observe that in G'' only nonterminals are rewritten. It can be seen that G' generates the same language as G, and the terminals can be derived only in the last steps of the derivations.
Now, if a rule of the monotone grammar G'' is not allowed to be in a grammar in Kuroda normal form, then this rule must have longer right hand side than it is allowed in Kuroda normal form (i.e., 2). Let us substitute each of these rules by a sequence of appropriate rules.
Let a rule A1... Am → B1... Bn in P. Based on the definition of monotone grammars it is clear that m ≤ n. Then
if n ≤ 2, then the rule is allowed in Kuroda normal form, and we leave it as is.
if m = 1 and n < 2, we can do the same replacement as we have done at the Chomsky normal form for context-free grammars (see the proof of Theorem 16.):
A1 → B1... Bn
is replaced by the set of rules
A1 → B1X1, X1 → B2X2, ... Xn-2 → Bn-1Bn,
where X1,...,Xn-2 are new nonterminals, were not in the grammar so far.
if m ≥ 2, n > 2, then
A1... Am → B1... Bn
is replaced by the set of rules
A1A2 → B1X1, X1A3 → B2X2, ... Xm-2Am → Bm-1Xm-1,
Xm-1 → BmXm, ... , Xn-2 → Bn-1Bn,
where X1,...,Xn-2 are new nonterminals, not used in the grammar before. See also Figure 5.1.
5.1. ábra - In derivations the rules with long right hand side are replaced by chains of shorter rules.
![]() |
The resulting grammar generates the same language as G, and it is in Kuroda normal form.
QED.
Example 54. Let
G = ({S, A, B, C}, {0,1,2}, S,
{S → ABAB00, ABA → A111A, A111 → B221, B → 2, B → CC, BB → CBA, C → S, C → 021 }).
Give a Kuroda normal form grammar that is equivalent to G.
Solution:
In the first step, by introducing the nonterminals D0, D1, D2 (using them instead of the terminals) we obtain G'' as follows:
G'' = ({S, A, B, C , D0, D1, D2}, {0,1,2}, S,
{S → ABABD0D0, ABA → AD1D1D1A, AD1D1D1 → BD2D2D1, B → D2, B → CC, BB → CBA, C → S, C → D0D2D1, D0 → 0, D1 → 1, D2 → 2 }).
Now, we need to replace the rules which have too many letters (having right hand side longer than 2):
the original rule | replaced by the set of rules | |
S → ABABD0D0: | S → AX1, X1 → BX2, X2 → AX3, X3 → BX4, X4 → D0D0, | |
ABA → AD1D1D1A: | AB → AX5, X5A → D1X6, X6 → D1X7, X7 → D1A, | |
AD1D1D1 → BD2D2D1: | AD1 → BX8, X8D1 → D2X9, X9D1 → D2D1, | |
BB → CBA: | BB → CX10, X10 → BA, | |
C → D0D2D1: | C → D0X11, X11 → D2D1. |
All the other rules are kept, thus we have the solution
G' = ({S, A, B, C, D0, D1, D2, X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11}, {0,1,2}, S,
{S → AX1, X1 → BX2, X2 → AX3, X3 → BX4, X4 → D0D0,
AB → AX5, X5A → D1X6, X6 → D1X7, X7 → D1A, AD1 → BX8,
X8D1 → D2X9, X9D1 → D2D1, B → D2, B → CC, BB → CX10, X10 → BA,
C → S, C → D0X11, X11 → D2D1, D0 → 0, D1 → 1, D2 → 2}).
The next observation was proven by György Révész, so this normal form is caled Révész normal form. Every rule AB → CD of a Kuroda normal form grammar can be replaced by a chain of rules
AB → AX, AX → YX, YX → YD, YD → CD,
where X and Y are newly introduced nonterminals that are used only in these rules in the new grammar.
Definition 29. (Révész normal form). A grammar G = (N, T, S, P) is in Révész normal form, if each of its rules is in one of the following forms:
AB → AC, AB → CB, A → BC, A → B, A → a, S → λ
(A,B,C ∈ N, a ∈ T and S does not occur in the right hand side of any rule if S → λ ∈ P).
By using Révész's observation the following result is obtained:
Theorem 26. There is an equivalent grammar in Révész normal form for every monotone grammar.
Example 55. Let
G = ({S, A, B}, {a,b,c}, S,
{S → BaB, BaB → BABa, A → BbB, A → c, B → BABB, B → AbbA, B → aB, B → c, S → λ }).
Give a Révész normal form grammar that is equivalent to G.
Solution:
First, we obtain grammar G' that is in Kuroda normal form and generates the same language as G. Thus,
G'' = ({S, A, B, Ca, Cb, Cc}, {a,b,c}, S,
{S → BCaB, BCaB → BABCa, A → BCbB, A → Cc, B → BABB, B → ACbCbA, B → CaB, B → Cc, S → λ }), and
G' = ({S, A, B, Ca, Cb, Cc, X1, X2, X3, X4, X5, X6, X7, X8}, {a,b,c}, S,
{S → BX1, X1 → CaB, BCa → BX2, X2B → AX3, X3 → BCa, A → BX4, X4 → CbB, A → Cc, B → BX5, X5 → AX6, X6 → BB, B → AX7, X7 → CbX8, X8 → CbA, B → CaB, B → Cc, S → λ }).
Further, we need to replace the following rule by a chain of rules:
the original rule | replaced by the set of rules | |
X2B → AX3 | X2B → X2Y1, X2Y1 → Y2Y1, Y2Y1 → Y2X3, Y2X3 → AX3. |
Thus, the Révész normal form grammar that is equivalent to G is
G''' = ({S, A, B, Ca, Cb, Cc, X1, X2, X3, X4, X5, X6, X7, X8, Y1, X2}, {a,b,c}, S,
{S → BX1, X1 → CaB, BCa → BX2,
X2B → X2Y1, X2Y1 → Y2Y1, Y2Y1 → Y2X3, Y2X3 → AX3
X3 → BCa, A → BX4, X4 → CbB, A → Cc,
B → BX5, X5 → AX6, X6 → BB,
B → AX7, X7 → CbX8, X8 → CbA, B → CaB, B → Cc, S → λ}).
One may observe that grammars in Révész normal form satisfy the conditions of the definition of context-sensitive grammars, and thus one can construct an equivalent context-sensitive grammar for any monotone grammar, i.e., the following theorem is proven.
Theorem 27. The class of languages generated by monotone grammars coincides with the class of context-sensitive languages.
By the previous theorem, we may use any monotone grammar to generate a context-sensitive language.
As we have shown by the Empty-word lemma (Theorem 1.) every context-free language can be generated by context-sensitive grammars. Now, we are going to give an example that proves that the class of context-free languages is strictly included in the class of context-sensitive languages.
G = ({S, A, B, C}, {a,b,c}, S,
{S → λ, S → abc, S → aABC, A → aABC, A → aBC, CB → BC, aB → ab, bB → bb, bC → bc, cC → cc }).
Then λ and abc can be derived directly from S. Then every other (finished) derivation in this grammar applies S → aABC, and then n times the rule A → aABC (n ∈ ℕ, n ≥ 0) and finally the rule A → aBC. In this way the sentential form starts with n + 2 a's and it contains n + 2 B's and C's. Then, every B must be positioned before the C's in a terminating derivation. Hence the generated language is {ajbjcj∣j ∈ ℕ}. See Animation 10. for a terminal derivation in this grammar.
Remember that in Example 41. we have shown that this language is not context-free, but as it can be seen from Example 56., it is context-sensitive.
We finish this section with some exercises.
Exercise 70. Give a monotone grammar that generates the language
{anbmcndm∣n, m ∈ ℕ}.
Exercise 71. Let
G = ({S, A, B}, {0,1}, S,
{S → SAS, SA → B0B0S, S → 1, A → S0S, B0B0 → 0S0S }).
Give a Kuroda normal form grammar that is equivalent to G.
Exercise 72. Let
G = ({S, A, B, C}, {d,e}, S,
{S → λ, S → BeBe, C → BeBe, BeBe → dAdA, eB → dede, Bd → CAC, A → ede, B → dd }).
Give a Révész normal form grammar that generates the same language as G.
Exercise 73. Let
G = ({S, A, B, C, D}, {a,b,c}, S,
{S → λ, S → AaBb, Aa → ccBbBb, Bb → CACA, bB → DaDa, DaD → CAC, bBbBb → ABCaD, A → a, B → bb, C → D, C → ccc }).
Give a Révész normal form grammar that generates the same language as G.
Exercise 74. Let
G = ({S, A, B, C}, {a,b,c,d}, S,
{S → BBC, S → SAB, A → cdC, cdA → CBbb, Bb → aa, bbA → dbd, A → a, A → d, B → b, C → SdS, C → cd }).
Give a grammar in Révész normal form that generates the same language as G.