3. fejezet - Linear Languages

Summary of the chapter: In this chapter, we deal with a family of languages between the regular and context-free languages of the Chomsky hierarchy, i.e., the linear languages. We give an example for a non-regular linear language. A normal form for linear grammars is proven. The class of one-turn pushdown automata recognizes exactly the class of linear languages. This class is closed under union, but it is not closed under concatenation, Kleene-star, complement and intersection.

3.1. Linear Grammars

First let us recall the definition of linear grammars.

Definition 19. (Linear grammars). A grammar G = (N, T, S, P) is linear if each of its productions has one of the following forms: Au, AuBv, where A,BN, u,vT*. The languages that can be generated by linear grammars are the linear languages.

This class of languages are between the (classes of) type 3 and type 2 languages of the Chomsky hierarchy, thus we may call them type 2.5 languages. The linear grammars inherit the property of the regular grammars that there is at most one nonterminal in any sentential form. However, this nonterminal is not restricted to be at the end of the sentential form (as it was in the regular case), it can be in an arbitrary place.

Now, we give an example, where the nonterminal is in the middle of the sentential forms in every derivation.

Example 36. Let

G = ({S}, {a,b}, S, {SaSb, S →λ}).

Then, every (finished) derivation in this grammar has the following form: applying the first rule n times (n ∈ ℕ, n ≥ 0) and then applying the second rule. Hence, the generated language is {anbnn ∈ ℕ}. See also Animation 8. for an example derivation in this grammar.

Animation 8.

Remember that in Example 29. we have shown that this language is not regular, and thus, with Example 36 we have just proven that the class of linear languages strictly includes the class of regular languages.

Theorem 11. Every linear language can be generated by a grammar having productions in the following forms, only:

AaB, ABa, Aa, S→λ.

(We call a grammar with this property a linear grammar in normal form.)

Proof. The proof is constructive. Given G = (N, T, S, P) a linear grammar, we show how one can construct a grammar G' that is in normal form and equivalent to G.

If there is a (are) rule(s) of the form A → λ (AS), then first the Empty-word lemma (Theorem 1.) is applied and grammar G'' = (N'', T, S', P'') is obtained that may only contain λ in the right hand side of the rule S' → λ. Then in G'' either S = S', N'' = N or S'N and in this latter case N'' = N ∪ {S'}.

Now as an intermediate step, we replace each of the rules AuBv (where u ≠ λ, v ≠ λ) with rules AuX, XBv, where X is a newly introduced nonterminal, i.e., XN''. After the substitution of each rule of this form grammar G''' = (N''', T, S', P''') is obtained and it is equivalent to the original grammar G. (See the left side of Figure 3.1., for an example.)

Now let us eliminate the rules having more than one terminal in their right hand side (i.e., they have long right hand side). Actually, rules of the form Aa1... ak for k > 1, where aiT, i ∈ {1,...,k} and Aa1... akB for k > 1, where aiT, i ∈ {1,...,k}, BN''' can be substituted in the same manner as we have eliminated them in regular grammars (see the proof of Theorem 2 for details). We present a similar technique for the rules of the form ABa1... ak for k > 1, where aiT, i ∈ {1,...,k}, BN''', since rules of this type were not present in regular grammars. Every rule of the above form is substituted by a chain of shorter rules introducing new nonterminals into the grammar: let the new nonterminals X1,..., Xk-1 be introduced and put to the set N''', and instead of the actual rule the next set of rules is added to P''':

AX1ak, X1X2ak-1, ..., Xk-2Xk-1a2, Xk-1Ba1.

(See the right hand side of Figure 3.1, for an example.)

Now a grammar G'''' = (N'''', T, S' ,P'''') is obtained and the set of productions P'''' can contain only rules of the following forms

Aa, AaB, ABa, AB, S' → λ

(A, BN'''', aT). Now, as a final step of our (algorithm) proof we need to exclude the chain rules (rules of the form AB). This step can be done in a similar way as we showed in the proof of Theorem 2: first the set U(A) is determined for each nonterminal A, and then the grammar G' = (N'''', T, S', P') is obtained having P' = {Ar∣∃ BN'''' such that BrP'''', rN'''' and BU(A)}. This grammar is in a normal form and it generates the same language as G, so the (construction) proof is finished.

QED.

Example 37. Let

G = ({S, A, B},{1,3,7}, S,

```{  S → 11S37,
S → 7A,
S → B313,
A → 333B7777,
A → S, B → λ, B → 731
}). ```

Give a linear grammar in a normal form that is equivalent to G.

Solution:

Since there is a rule B → λ in the grammar, we start by applying the Empty-word lemma. Then set U = {B} and it is the set of nonterminals from which the empty word λ can be derived. Consequently, we obtain the grammar

G'' = ({S, A, B}, {1,3,7}, S,

```{  S → 11S37,
S → 7A,
S → B313, S → 313,
A → 333B7777, A → 3337777,
A → S,
B → 731
}).```

We have the rules S → 11S37 and A → 333B7777 having terminals on both sides of the nonterminal in the right hand side, thus, the intermediate step results in the grammar

G''' = ({S, A, B, X1, X2}, {1,3,7}, S,

```{  S → 11X1, X1 → S37,
S → 7A,
S → B313,
S → 313,
A → 333X2, X2 → B7777,
A → 3337777,
A → S,
B → 731
}).```

Now let us replace the rules with more than one terminals on their right hand side by chains of rules having exactly one terminal on their right hand sides:

G'''' = ({S, A, B, X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20, X21}, {1,3,7}, S,

```{  S → 1X3, X3 → 1X1,
X1 → X47, X4 → S3,
S → 7A,
S → X53, X5 → X61, X6 → B3,
S → 3X7, X7 → 1X8, X8 → 3,
A → 3X9, X9 → 3X10, X10 → 3X2,
X2 → X117, X11 → X127, X12 → X137, X13 → B7,
A → 3X14, X14 → 3X15, X15 → 3X16, X16 → 7X17, X17 → 7X18, X18 → 7X19, X19 → 7,
A → S,
B → 7X20, X20 → 3X21, X21 → 1
}).```

However, grammar G'''' contains the chain rule A → S, and thus

U(S) = {S}, U(A) = {A,S},

for the other nonterminals the trivial sets are obtained since they do not appear in any chain rules: U(B) = {B} and U(Xi) = {Xi} (for 1 ≤ i ≤ 21). Thus, the result is:

G' = ({S, A, B, X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20, X21}, {1,3,7}, S,

```{  S → 1X3, A → 1X3,
X3 → 1X1,
X1 → X47,
X4 → S3,
S → 7A, A → 7A,
S → X53, A → X53,
X5 → X61,
X6 → B3,
S → 3X7, A → 3X7,
X7 → 1X8,
X8 → 3,
A → 3X9,
X9 → 3X10,
X10 → 3X2,
X2 → X117,
X11 → X127,
X12 → X137,
X13 → B7,
A → 3X14,
X14 → 3X15,
X15 → 3X16,
X16 → 7X17,
X17 → 7X18,
X18 → 7X19,
X19 → 7,
B → 7X20,
X20 → 3X21,
X21 → 1
}).```

It is linear, in normal form and equivalent to G.

As special cases of linear grammars the right-linear grammars (i.e., our regular grammars) and also the so-called left-linear grammars are defined.

Definition 20. (Left-linear grammars). A grammar G = (N, T, S, P) is left-linear if each of its productions has one of the following forms: Au, ABu, where A,BN, uT*.

We state the following interesting result about the languages that can be generated by left-linear grammars, without proof.

Theorem 12. The language class that can be generated by left-linear grammars is exactly the class of regular languages.

We note here that even though regular languages can be generated by using left linear or right linear rules, using both in the same grammar leads to a different language class.

Finally, in this section, we provide a few exercises.

Exercise 49. Give a linear grammar that generates the language

{0m 1n 2nn, m ∈ ℕ}.

Exercise 50. Give a linear grammar that generates the language

{a3n b2nn ∈ ℕ}.

Exercise 51. Give a linear grammar in normal form that generates the language

{ucvu, v ∈ {a,b}* such that the number of a's are the same in u and v}.

Exercise 52. Let

G = ({S, A, B, C}, {a, b, c}, S,

```{  S → aaSbc,
S → B,
S → cccC,
A → λ,
A → aaa,
A → aCabc,
B → A,
B → Bbbb,
C → cAb,
C → c
}).```

Give a linear grammar in normal form that is equivalent to G.