In this section, we prove that the class of context-sensitive languages is closed under union, concatenation and Kleene-star. At the end of the section we also see that it is closed under the remaining set-theoretic operations, complementation and intersection.
Theorem 28. The class of context-sensitive languages is closed under the regular operations.
Proof. The proof is constructive. Let L1 and L2 be two context-sensitive languages. Let the monotone grammars G1 = (N1, T, S1, P1) and G2 = (N2, T, S2, P2) be in Kuroda normal form and generate the languages L1 and L2, respectively, such that N1 ∩ N2 = ∅ (this can be achieved by renaming the nonterminals of one of the grammars, which does not affect the generated language).
First, we show the closure under union.
If λ ∉ L1 ∪ L2, then let
G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1, S → S2}),
where S ∉ N1 ∪ N2 is a new symbol. It can be seen that G generates the language L1 ∪ L2.
If λ ∈ L1 ∪ L2 (i.e., S1 → λ ∈ P1 and/or S2 → λ ∈ P2), then let
G = (N1 ∪ N2 ∪ {S}, T, S,
P1 ∪ P2 ∪ {S → S1, S → S2, S → λ} \ {S1 → λ, S2 → λ}),
where S ∉ N1 ∪ N2. In this way, G generates the language L1 ∪ L2.
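To make the construction concrete, the union case can be written as a small procedure on grammar descriptions. The following Python sketch is only an illustration: the representation of a grammar as a tuple (N, T, S, P) with productions stored as pairs (left, right) of tuples of symbols, as well as the function name union_grammar, are our own choices. It assumes that the two input grammars already have disjoint nonterminal sets and that the start symbols do not occur on any right-hand side.

# A grammar is a tuple (N, T, S, P): set of nonterminals, set of terminals,
# start symbol, and a set of productions given as pairs (left, right),
# where left and right are tuples of symbols.  Illustrative sketch only.

def union_grammar(g1, g2, new_start="S"):
    (n1, t, s1, p1), (n2, _, s2, p2) = g1, g2
    assert not (n1 & n2), "nonterminal sets must be disjoint"
    lam1 = ((s1,), ()) in p1                    # does G1 contain S1 -> lambda?
    lam2 = ((s2,), ()) in p2                    # does G2 contain S2 -> lambda?
    p = (p1 | p2) - {((s1,), ()), ((s2,), ())}  # drop S1 -> lambda, S2 -> lambda
    p |= {((new_start,), (s1,)), ((new_start,), (s2,))}  # S -> S1, S -> S2
    if lam1 or lam2:                            # lambda belongs to L1 union L2
        p |= {((new_start,), ())}               # S -> lambda
    return (n1 | n2 | {new_start}, t, new_start, p)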
The closure under concatenation is proven by distinguishing the following four cases:
If λ ∉ L1 and λ ∉ L2, then let
G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2}),
where S ∉ N1 ∪ N2 is a new symbol.
If λ ∈ L1 and λ ∉ L2, then let
G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2, S → S2} \ {S1 → λ}),
where S is a new symbol.
If λ ∉ L1 and λ ∈ L2, then let
G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2, S → S1} \ {S2 → λ}),
where S ∉ N1 ∪ N2 is a new symbol.
If λ ∈ L1 and λ ∈ L2, then let
G = (N1 ∪ N2 ∪ {S}, T, S,
P1 ∪ P2 ∪ {S → S1S2, S → S1, S → S2, S → λ} \ {S1 → λ, S2 → λ}),
where S ∉ N1 ∪ N2.
It can be easily seen that G generates the language L1L2.
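With the same illustrative representation as above, the four cases of the concatenation construction collapse into one procedure. As before, this is a sketch under our own conventions, not a fixed algorithm from the text.

# Concatenation of two context-sensitive languages, following the four cases
# of the proof.  Grammars are tuples (N, T, S, P) with productions (left, right).

def concat_grammar(g1, g2, new_start="S"):
    (n1, t, s1, p1), (n2, _, s2, p2) = g1, g2
    assert not (n1 & n2), "nonterminal sets must be disjoint"
    lam1 = ((s1,), ()) in p1
    lam2 = ((s2,), ()) in p2
    p = (p1 | p2) - {((s1,), ()), ((s2,), ())}  # drop S1 -> lambda, S2 -> lambda
    p |= {((new_start,), (s1, s2))}             # S -> S1 S2
    if lam1:
        p |= {((new_start,), (s2,))}            # lambda in L1: S -> S2
    if lam2:
        p |= {((new_start,), (s1,))}            # lambda in L2: S -> S1
    if lam1 and lam2:
        p |= {((new_start,), ())}               # lambda in L1 L2: S -> lambda
    return (n1 | n2 | {new_start}, t, new_start, p)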
Finally, let us consider the closure under Kleene-star. Let L be a context-sensitive language, and let G1 = (N1, T, S1, P1) and G2 = (N2, T, S2, P2) be two grammars in Kuroda normal form that both generate the language L, such that N1 ∩ N2 = ∅.
Let
G = (N1 ∪ N2 ∪ {S, S'}, T, S,
P1 ∪ P2 ∪ {S → λ, S → S1, S → S1S2, S → S1S2S', S' → S1, S' → S1S2, S' → S1S2S'} \ {S1 → λ, S2 → λ}),
where S, S' ∉ N1 ∪ N2 are new symbols. Then G generates the language L*.
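The Kleene-star construction can be sketched in the same way. Here the two arguments are the two disjoint copies G1 and G2 of a grammar for the same language L, exactly as in the proof, and the fresh symbols S and S' are passed in as parameters; the function name star_grammar is, again, only illustrative.

# Kleene star, following the proof: g1 and g2 are grammars in Kuroda normal
# form for the SAME language L, with disjoint nonterminal sets.

def star_grammar(g1, g2, new_start="S", new_aux="S'"):
    (n1, t, s1, p1), (n2, _, s2, p2) = g1, g2
    assert not (n1 & n2), "nonterminal sets must be disjoint"
    S, X = new_start, new_aux
    p = (p1 | p2) - {((s1,), ()), ((s2,), ())}  # drop S1 -> lambda, S2 -> lambda
    p |= {
        ((S,), ()),            # S  -> lambda
        ((S,), (s1,)),         # S  -> S1
        ((S,), (s1, s2)),      # S  -> S1 S2
        ((S,), (s1, s2, X)),   # S  -> S1 S2 S'
        ((X,), (s1,)),         # S' -> S1
        ((X,), (s1, s2)),      # S' -> S1 S2
        ((X,), (s1, s2, X)),   # S' -> S1 S2 S'
    }
    return (n1 | n2 | {S, X}, t, S, p)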
QED.
The closure of the class of context-sensitive languages under complementation was a famous problem that remained open for more than 20 years. In the 1980s, Immerman and Szelepcsényi solved it independently of each other. We present this result without proof.
Theorem 29. The class of context-sensitive languages is closed under complementation.
Theorem 30. The class of context-sensitive languages is closed under intersection.
Proof. The proof uses the fact that this class is closed under both union and complementation. Consider the context-sensitive languages L1 and L2. The complement of each of them is context-sensitive by the theorem of Immerman and Szelepcsényi. Their union is also context-sensitive, as we have proven constructively, and so is the complement of this union. However, by De Morgan's law, this last language is exactly L1 ∩ L2.
QED.
Exercise 75. Give a monotone grammar that generates the language of marked-copy:
{wcw ∣ w ∈ {a,b}*}
over the alphabet {a,b,c}. (Hint: use context-sensitive rules to allow some nonterminals to terminate, i.e., to change them to terminals only at the correct place.)
Exercise 76. Give context-sensitive grammars that generate the union and the concatenations (in both orders) of the languages generated by the grammars G1 and G2, where
G1 = ({S1, A1, B1, C1}, {a,b,c}, S1,
{S1 → λ, S1 → C1aA1, S1 → B1, aA1 → B1B1, B1 → B1b, B1bb → cba, C1 → cA1 }) and
G2 = ({S2, A2, B2, C2}, {a,b,c}, S2,
{S2 → cccS2aA2a, S2 → bA2, ccS2 → C2bA2, bA2 → A2b, A2 → cB2C2a, A2 → aa, cB2 → bB2, bB2 → baccabB2, C2 → C2c, C2c → A2S2, C2cc → cccc }).
Exercise 77. Let
G1 = ({S, A, B}, {0,1}, S,
{S → λ, S → AB, A → 0AB, 0A → 1A, AB → 11 }) and
G2 = ({S, B, C, D}, {0,1}, S,
{S → λ, S → BC, S → D, B → BC, B → 1, 1C → 1D0, D → DD, D → 11, 1D → C0, BDC → 00D11 }).
Give context-sensitive grammars that generate
L(G1) ∪ L(G2),
L(G1) L(G2),
L(G2) L(G1),
(L(G1))* and
(L(G2))*.
The word problem for context-sensitive grammars is decidable. Since this language class is exactly the class accepted by linear bounded automata, the problem can be solved nondeterministically in linear space. It is also known that quadratic space is sufficient for a deterministic algorithm: the required space is c·∣w∣², where c is a constant and ∣w∣ is the length of the word. It is an open problem whether deterministic linear space (i.e., c·∣w∣ with a constant c) is sufficient or not. Regarding time complexity, no deterministic or nondeterministic algorithm is known that solves the word problem in polynomial time (for arbitrary context-sensitive grammars/languages).
Now we present a naive solution to the word problem; it is very inefficient, but it shows that the problem is solvable. Let G = (N, T, S, P) and w be given. If the input word is the empty word (w = λ), then it is in the language if and only if S → λ ∈ P. If ∣w∣ > 0, then we may consider only the rules u → v of P with the property ∣u∣ ≤ ∣v∣. We then use a breadth-first search algorithm. Let the initial node of the graph be labeled by S, and consider the productions as possible operators on the sentential forms. The search-graph is obtained by applying every applicable rule to every node, where every label appears at most once in the graph. This graph is usually infinite, but we need only a finite portion of it: since each of the rules is monotone, we may cut every branch of the search-space that contains a sentential form longer than w, because the solution cannot lie in the continuation of such a branch. When all parts of the search-graph representing sentential forms not longer than w have been obtained, we check whether w is a node of the graph. If so, then w can be derived and is in the generated language; otherwise it is not.
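As an illustration, this naive procedure can be implemented directly, for example in Python as below. Sentential forms are represented as tuples of symbols, productions as pairs (left, right) of such tuples, and the function name generates is ours; none of this is fixed notation.

from collections import deque

def generates(grammar, w):
    # Naive decision of the word problem for a monotone grammar.
    # grammar = (N, T, S, P) with productions given as pairs (left, right)
    # of tuples of symbols; w is a tuple of terminal symbols.
    n, t, s, p = grammar
    if w == ():                                   # the empty word
        return ((s,), ()) in p                    # in L(G) iff S -> lambda is a rule
    rules = [(l, r) for (l, r) in p if len(l) <= len(r)]  # keep monotone rules only
    start = (s,)
    seen = {start}
    queue = deque([start])
    while queue:                                  # breadth-first search
        form = queue.popleft()
        if form == w:
            return True
        for left, right in rules:
            for i in range(len(form) - len(left) + 1):
                if form[i:i + len(left)] == left:
                    new = form[:i] + right + form[i + len(left):]
                    if len(new) <= len(w) and new not in seen:  # prune long forms
                        seen.add(new)
                        queue.append(new)
    return False

Every sentential form of length at most ∣w∣ is visited at most once, so the search terminates; however, the number of such forms can be exponential in ∣w∣, which is why this procedure is only of theoretical interest.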
Exercise 78. Check whether the words abc and ccc are in the language generated by the grammar
G = ({S, A, B, C, D, E}, {a,b,c}, S,
{S → ab, S → BS, S → ABS, A → a, A → BCE, AB → BA, B → b, BC → bc, CE → abc, D → b, ED → aE, E → DD, bA → SS }).