4.3. Pumping Lemma for Context-free Languages

Although it is easy to find the exact position of a grammar in the Chomsky hierarchy, it is sometimes much more challenging to find the position of a language in the Chomsky hierarchy. The Bar-Hillel lemma is the first pumping - also called iteration - lemma, which gives properties shared by all context-free languages. Thus, if a language does not satisfy the conditions of the lemma, it is not context-free. This lemma - and its variations - can be used to show that a language is not context-free. On the other hand, languages satisfying the conditions may be not context-free.

Theorem 17. (Bar-Hillel lemma) For each context-free language L there exists an integer n ≥ 1 such that each string pL, ∣p∣ ≥ n can be written in a form p = uvwxy, wherevwx∣ ≤ n, ∣vx∣ ≥ 1 and uviwxiyL holds for each integer i ≥ 0.

Proof. Let L1 = L \ {λ}. It is ovious that if language L1 satisfies the above conditions then language L = L1 ∪ {λ} also holds the above conditions, so it is enough to prove the lemma for λ-free context-free languages.

Theorem 16 shows that each λ-free context-free language can be generated by a Chomsky normal form grammar. Further on let us assume that grammar G generating L1 is in Chomsky normal form.

Let us mark the number of nonterminals in grammar G by k, and let n = 2k+1. Let p be a word generated by grammar G, and let ∣p∣ ≥ n. In this case, the height of the derivation tree of p is more than k+2, where the last step is a nonterminal to terminal derivation. Let us investigate the last k+2 height part of the longest path of the derivation tree. There must be a nonterminal A which appears twice, because the number of the nonterminals in G is less than the number of the nonterminals in this part. So there must be terminal words v, x such that A* vAx. Here ∣vx∣ ≥1 because A has two different occurrences in the path, and the length of the generated word is increased by one in each derivation step, except for the derivation steps when we change a nonterminal to a terminal symbol. Also, there is a terminal word w, which can be derived from the last appearance of A on the path, so A* w also holds. Moreover, there are terminal words u, y such that S* uAy. Based on these facts it is easy to show that

S* uAy* uwy
S* uAy* uvAxy* uvwxy
S* uAy* uvAxy* uvvAxxy* uvvwxxy

This proves that uviwxiyL holds for each integer i ≥ 0.

Finally, ∣vwx∣ ≤ n, because the word vwx was derived with a derivation subtree of height at most k+2, - where the last step was a nonterminal to terminal derivation, - so the length of the word vwx is maximum 2k which is less than n.

QED.

Example 41. The following classical example shows an application of the Bar-Hillel lemma. We are going to prove that language L = {ajbjcjj ≥ 0} is not context-free. In order to do this suppose to the contrary that language L is context-free. Let j ≥ (n / 3), then, by the Bar-Hillel lemma, ajbjcj can be written in a form uviwxiy such thatvx∣ ≥ 1 and uviwxiyL holds for each integer i ≥ 0. First, neither of v nor x should contain two or more different letters, because repeating them would change the form of the words in L. So v is a unary word (some power of a letter, e.g. aa...a) and x is a unary word as well. In this case, when we increase the integer i, we change the number of one or two different letters, but we cannot change the number of each letter, which is a contradiction.

Example 42. Let us consider the language L ={ajbkcjdkj, k ≥ 0}. Suppose to the contrary that language L is context-free. Then, the word anbncndnL can be written in a form uvwxy such that uviwxiyL for each i ≥ 0. Suppose that the word v contains the letter a. In this case, the word x cannot contain the letter c, becausevwx∣ ≤ n. Also, if the word v contains the letter b, then the word x cannot contain the letter d. In this case, we cannot increase the number of the letters a and c at the same time, and also we cannot increase the number of the letters b and d together, which means that this language does not satisfy the conditions of the Bar-Hillel lemma, consequently it is not context-free.

Example 43. In this example we consider the language L = {wcww ∈ {a,b}*}, and we use the Bar-Hillel lemma to prove that this language is not context-free. Suppose to the contrary that the language L is context-free, in this case the word anbncanbn can be written in a form uvwxy such that uviwxiyL for each i ≥0. The only possible solution is that the word v contains letters before the letter c and the word x contains letters after the letter c, because in other cases the number of the letters before and after c will not be the same. Now,vwx∣≤ n, so the word v can contain only letter b and the word x can contain only letter a, which is a contradiction.

Example 44. In this example we show that the language L = {app pime} is not context-free. Suppose to the contrary that the language L is context-free. Let pn, so ap can be written in a form uvwxy such thatvx∣ ≥ 1 and uviwxiyL holds for each integer i ≥ 0. Now, let q = ∣vx∣, r = ∣uwy∣, so ar+i · q∈ L holds for each integer i ≥ 0. This means that r+i · q is a prime for each integer i ≥ 0. Here r ≠ 1, because 1+i · q = 1 for i = 0, and 1 is not a prime number. Let i=r, then r+r · q should be a prime, but r+r · q = r · (1+q) is not a prime, so we have a contradiction.