A generative grammar is said to be λ-free grammar if none of its production rules contains the empty word λ on the right hand side.
We have to note that each λ ∉ *L*
context-free language can be generated by some λ-free context-free grammar.

**Definition 21.** *The grammar G* = (*N, T, S, P*) *is in Chomsky normal form,
if all of its production rules has the form:*

*A*→*BC or**A*→*a*,

*where A, B, C* ∈ *N and a* ∈ *T*.

This normal form was introduced by Noam Chomsky for λ-free context-free languages. Using the Chomsky normal form instead of the universal context-free grammar makes it more simple to store the grammar in the memory of the computer, to calculate using the grammar and to prove theorems about context-free languages. First, we have to prove that each λ-free context-free language can be generated by a Chomsky normal form grammar.

**Theorem 16.** *For each λ-free context-free grammar G* = (*N, T, S, P*)
*one can give Chomsky normal form
grammar G'* = (*N', T, S, P'*) *such that L*(*G*) = *L*(*G'*).

**Proof.** We are going to give a constructive proof of this theorem. We are going to show the necessary steps
to construct a Chomsky normal form grammar *G'* = (*N', T, S, P'*) which is equivalent to the original λ-free
context-free grammar *G* = (*N, T, S, P*). It can easily be seen that we get equivalent grammars after each step.

First of all we create the grammar

*G*_{1}= (*N*_{1},*T*,*S*,*P*_{1}) such that all of its production rules are of the form:*A*→*p*, or*A*→*a*,

where

*A*∈*N, a*∈*T*and*p*∈*N*^{+}. In this step we eliminate each terminal symbol from the production rules whose right-hand side contains more than one letter. To do this we introduce new nonterminal symbols for each such terminal symbol. Let the set of new nonterminals be*N*= {_{new}*X*∣*a*∈*T, A*→*paq*∈*P*, ∣*pq*∣≠ 0} and let*N*_{1}=*N*∪*N*. Then let the set_{new}*P*_{1}be the union of 3 different sets:{

*A*→*p*∣*A*→*p*∈*P*,*p*∈ {*T*∪*N*^{+}}} ⊂*P*_{1}(we keep the rules for which the right hand side contains just one terminal, or contains only nonterminal symbols),{

*X*→*a*∣*X*∈*N*} ⊂_{new}*P*_{1}(we add new rules, the right hand side contains the terminal symbol, and the left hand side contains the nonterminal introduced for it),{

*A*→*p*_{0}*X*_{1}*p*_{1}*X*_{2}...*X*∣_{n}p_{n}*A*→*p*_{0}*a*_{1}*p*_{1}*a*_{2}...*a*∈_{n}p_{n}*P, a*_{1},*a*_{2},...,*a*∈_{n}*T*,*X*_{1},*X*_{2},...,*X*∈_{n}*N*,_{new}*p*_{0},*p*_{1},...,*p*∈_{n}*N*^{*}, ∣*p*_{0}*a*_{1}*p*_{1}*a*_{2}...*a*∣≥ 2} ∈_{n}p_{n}*P*_{1}(here we change each appearance of the terminals to the nonterminal introduced for it in each rule, with right hand side of at least two letters).

Now we have the sets

*N*_{1}and*P*_{1}and the grammar*G*_{1}= (*N*_{1},*T,S,P*_{1}) satisfies the above conditions. It can be easily shown that*L*(*G*_{1}) =*L*(*G*).The next step is to eliminate the long rules. We create the grammar

*G*_{2}= (*N*_{2},*T, S, P*_{2}) such that all of its production rules are of the form:*A*→*B*, or*A*→*BC*, or*A*→*a*,

where

*A, B, C*∈*N*and*a*∈*T*. To reach our goal, we have to replace the long rules in*P*_{1}with short ones in*P*_{2}. For each rule*A*→*B*_{1}*B*_{2}...*B*∈_{k}*P*_{1},*k*≥ 3 we introduce new nonterminals*Z*_{1},*Z*_{2},...,*Z*_{k}_{-2}. The set*N*_{2}contains these new nonterminals and the nonterminals contained by the set*N*_{1}. In the set*P*_{2}we keep those productions rules from the set*P*_{1}whose right hand side contains at most two letters and instead of each*A*→*B*_{1}*B*_{2}...*B*∈_{k}*P*_{1},*k*≥ 3 rule we introduce the rules*A*→*B*_{1}*Z*_{1},*Z*_{1}→*B*_{2}*Z*_{2}, ⋮*Z*_{k}_{-3}→*B*_{k}_{-2}*Z*_{k}_{-2},*Z*_{k}_{-2}→*B*_{k}_{-1}*B*._{k}The grammar

*G*_{2}= (*N*_{2},*T, S, P*_{2}) has no long rules and*L*(*G*_{2}) =*L*(*G*_{1}).The third step is to eliminate the rules of the form

*A*→*B*, where*A, B*∈*N.*First, for each nonterminal letter

*A*let us collect all nonterminal letters*B*_{1},*B*_{2}, ...,*B*such that_{k}*A*can be derived from*B*, 1 ≤_{i}*i*≤*k*. Let*U*(*A*) = {*B*_{1},*B*_{2}, ...,*B*} for each nonterminal_{k}*A*. The following formulas make this pocedure simple:*U*_{1}(*A*) = {*A*}*U*_{i}_{+1}(*A*) =*U*(_{i}*A*) ∪ {*B*∣*B*→*C*∈*P*_{2},*C*∈*U*(_{i}*A*)}if

*U*(_{k}*A*) =*U*_{k}_{+1}(*A*) then*U*(*A*) =*U*(_{k}*A*)

When we have set

*U*for each nonterminal letter, we can define set*P'*with the following formula:*P'*= {*B*→*p*∣*A*→*p*∈*P*_{2},*B*∈*U*(*A*),*p*∉*N*_{2}}. Then*N'*=*N*_{2},*G'*= (*N', T, S, P'*) and*L*(*G'*) =*L*(*G*_{2}) =*L*(*G*_{1}) =*L*(*G*).

QED.

**Example 40.** *In this example we have a λ-free context-free grammar G, and we are going to create the
grammar G' which is in Chomsky normal form and generates the same language as G.*

G= ({S, A, B}, {a,b,c},S, P)P= {S→ABaba,A→c,A→B,A→AS,B→AbA,B→S}

*The terminals a and b appear in a rule which has more than 1 letter on the right hand side, so we have to add two new nonterminals X*_{a}and X_{b}to the set of nonterminals: N_{1}= {*S, A, B, X*}._{a}, X_{b}*Now we add two new rules (X*_{a}→ a and X_{b}→ b) to the set of production rules and replace the terminal symbol a by X_{a}and b by X_{b}in the rules which have more than one letter on the right hand side. Now we have*G*_{1}= ({*S, A, B, X*}, {_{a}, X_{b}*a,b,c*},*S, P*_{1}),*P*_{1}= {*A*→*c*,*A*→*B*,*A*→*AS*,*B*→*S*,*X*→_{a}*a*,*X*→_{b}*b*,*S*→*ABX*,_{a}X_{b}X_{a}*B*→*AX*}._{b}A*In the set P*_{1}there are two long rules, S → ABX_{a}X_{b}X_{a}and B → AX_{b}A. We add new nonterminals Z_{1}, Z_{2}, Z_{3}to the first rule and Z_{4}to the second one, and replace the rule S → ABX_{a}X_{b}X_{a}by rules*S*→*AZ*_{1},*Z*_{1}→*BZ*_{2},*Z*_{2}→*X*_{a}Z_{3},*Z*_{3}→*X*,_{b}X_{a}*and the rule B → AX*_{b}A by rules*B*→*AZ*_{4},*Z*_{4}→*X*._{b}A*Now we have**G*_{2}= ({*S, A, B, C, D, Z*_{1},*Z*_{2},*Z*_{3},*Z*_{4}}, {*a,b,c*},*S, P*_{2}),*P*_{2}= {*A*→*c*,*A*→*B*,*A*→*AS*,*B*→*S*,*X*→_{a}*a*,*X*→_{b}*b*,*S*→*AZ*_{1},*Z*_{1}→*BZ*_{2},*Z*→_{2}*X*_{a}Z_{3},*Z*_{3}→*X*,_{b}X_{a}*B*→*AZ*_{4},*Z*_{4}→*X*}._{b}A*U*(*B*) = {*B,A*}*and U*(*S*) = {*S,B,A*},*and finally we have:**G'*= ({*S, A, B, C, D, Z*_{1},*Z*_{2},*Z*_{3},*Z*_{4}}, {*a,b,c*},*S, P'*).*P'*= {*A*→*c*,*A*→*AS*,*X*→_{a}*a*,*X*→_{b}*b*,*S*→*AZ*_{1},_{B}→*AZ*_{1},*A*→*AZ*_{1},*Z*_{1}→*BZ*_{2},*Z*_{2}→*X*_{a}Z_{3},*Z*_{3}→*X*,_{b}X_{a}*B*→*AZ*_{4},*A*→*AZ*_{4},*Z**4*→*X*}_{b}A

**Exercise 55.** *In this exercise we have a λ-free context-free grammar G, and you have to create
a grammar G' which is in Chomsky normal form and generates the same language as G.*

G= ({S, A, B}, {x,y,z},S, P)P= {S→BB,A→S,A→xxzz,A→y,B→AxzxA,B→A}

**Exercise 56.** *Create a grammar G' which is in Chomsky normal form and generates the same language as G.*

G= ({S, A, B}, {x,y},S, P)P= {S→ABBAB,S→x,A→BB,A→S,A→B,B→ASA,B→y}

**Exercise 57.** Create a grammar G' which is in Chomsky normal form and generates the same language as G.

G= ({S, X, Z}, {x,y},S, P)P= {S→XZ,S→ZX,X→xy,X→S,Z→S,Z→yx,Z→X,Z→ZZ}

**Exercise 58.** *Create a grammar G' which is in Chomsky normal form and generates the same language as G.*

G= ({S}, {a,+,*,(,)},S, P)P= {S→S+S,S→S*S,S→ (S),S→a}

**Exercise 59.** *Create a grammar G' which is in Chomsky normal form and generates the same language as G.*

G= ({S, A, B}, {x,y},S, P)P= {S→AxxB,S→A,S→B,B→A,A→y,A→SB}