4.2. Chomsky Normal Form

A generative grammar is said to be λ-free grammar if none of its production rules contains the empty word λ on the right hand side. We have to note that each λ ∉ L context-free language can be generated by some λ-free context-free grammar.

Definition 21. The grammar G = (N, T, S, P) is in Chomsky normal form, if all of its production rules has the form:

  1. ABC or

  2. Aa,

where A, B, CN and aT.

This normal form was introduced by Noam Chomsky for λ-free context-free languages. Using the Chomsky normal form instead of the universal context-free grammar makes it more simple to store the grammar in the memory of the computer, to calculate using the grammar and to prove theorems about context-free languages. First, we have to prove that each λ-free context-free language can be generated by a Chomsky normal form grammar.

Theorem 16. For each λ-free context-free grammar G = (N, T, S, P) one can give Chomsky normal form grammar G' = (N', T, S, P') such that L(G) = L(G').

Proof. We are going to give a constructive proof of this theorem. We are going to show the necessary steps to construct a Chomsky normal form grammar G' = (N', T, S, P') which is equivalent to the original λ-free context-free grammar G = (N, T, S, P). It can easily be seen that we get equivalent grammars after each step.

  1. First of all we create the grammar G1 = (N1, T, S, P1) such that all of its production rules are of the form:

    1. Ap, or

    2. Aa,

    where AN, aT and pN+. In this step we eliminate each terminal symbol from the production rules whose right-hand side contains more than one letter. To do this we introduce new nonterminal symbols for each such terminal symbol. Let the set of new nonterminals be Nnew = {XaT, ApaqP, ∣pq∣≠ 0} and let N1 = NNnew. Then let the set P1 be the union of 3 different sets:

    1. {ApApP, p ∈ {TN+}} ⊂ P1 (we keep the rules for which the right hand side contains just one terminal, or contains only nonterminal symbols),

    2. {XaXNnew} ⊂ P1 (we add new rules, the right hand side contains the terminal symbol, and the left hand side contains the nonterminal introduced for it),

    3. {Ap0X1p1X2... XnpnAp0a1p1a2... anpnP, a1, a2,...,anT, X1, X2,...,XnNnew, p0, p1,...,pnN*, ∣p0a1p1a2... anpn∣≥ 2} ∈ P1 (here we change each appearance of the terminals to the nonterminal introduced for it in each rule, with right hand side of at least two letters).

    Now we have the sets N1 and P1 and the grammar G1 = (N1, T,S,P1) satisfies the above conditions. It can be easily shown that L(G1) = L(G).

  2. The next step is to eliminate the long rules. We create the grammar G2 = (N2, T, S, P2) such that all of its production rules are of the form:

    1. AB, or

    2. ABC, or

    3. Aa,

    where A, B, CN and aT. To reach our goal, we have to replace the long rules in P1 with short ones in P2. For each rule AB1B2...BkP1, k ≥ 3 we introduce new nonterminals Z1, Z2,...,Zk-2. The set N2 contains these new nonterminals and the nonterminals contained by the set N1. In the set P2 we keep those productions rules from the set P1 whose right hand side contains at most two letters and instead of each AB1B2...BkP1, k ≥ 3 rule we introduce the rules

    AB1Z1,
    Z1B2Z2,
    ⋮
    Zk-3Bk-2Zk-2,
    Zk-2Bk-1Bk.
    

    The grammar G2 = (N2, T, S, P2) has no long rules and L(G2) = L(G1).

  3. The third step is to eliminate the rules of the form AB, where A, BN.

    First, for each nonterminal letter A let us collect all nonterminal letters B1, B2, ..., Bk such that A can be derived from Bi, 1 ≤ ik. Let U(A) = {B1, B2, ..., Bk} for each nonterminal A. The following formulas make this pocedure simple:

    1. U1(A) = {A}

    2. Ui+1(A) = Ui(A) ∪ {BBCP2, CUi(A)}

    3. if Uk(A) = Uk+1(A) then U(A) = Uk(A)

    When we have set U for each nonterminal letter, we can define set P' with the following formula: P' = {BpApP2, BU(A), pN2}. Then N' = N2, G' = (N', T, S, P') and L(G') = L(G2) = L(G1) = L(G).

QED.

Example 40. In this example we have a λ-free context-free grammar G, and we are going to create the grammar G' which is in Chomsky normal form and generates the same language as G.

G = ({S, A, B}, {a,b,c},S, P)
P = {
  SABaba,
  Ac,
  AB,
  AAS,
  BAbA,
  BS
}
  1. The terminals a and b appear in a rule which has more than 1 letter on the right hand side, so we have to add two new nonterminals Xa and Xb to the set of nonterminals: N1 = {S, A, B, Xa, Xb}. Now we add two new rules (Xa → a and Xb → b) to the set of production rules and replace the terminal symbol a by Xa and b by Xb in the rules which have more than one letter on the right hand side. Now we have

    G1 = ({S, A, B, Xa, Xb}, {a,b,c}, S, P1),
    P1 = {
    Ac,
    AB,
    AAS,
    BS,
    Xaa,
    Xbb,
    SABXaXbXa,
    BAXbA
    }.
  2. In the set P1 there are two long rules, S → ABXaXbXa and B → AXbA. We add new nonterminals Z1, Z2, Z3 to the first rule and Z4 to the second one, and replace the rule S → ABXaXbXa by rules

    SAZ1,
    Z1BZ2,
    Z2XaZ3,
    Z3XbXa,
    and the rule  B → AXbA by rules
    BAZ4,
    Z4XbA.
    Now we have
    G2 = ({S, A, B, C, D, Z1, Z2, Z3, Z4}, {a,b,c}, S, P2),
    P2 = {
    Ac,
    AB,
    AAS,
    BS,
    Xaa,
    Xbb,
    SAZ1,
    Z1BZ2,
    Z2XaZ3,
    Z3XbXa,
    BAZ4,
    Z4XbA
    }.
  3. U(B) = {B,A} and U(S) = {S,B,A}, and finally we have:

    G' = ({S, A, B, C, D, Z1, Z2, Z3, Z4}, {a,b,c}, S, P').
    P' = {
    Ac,
    AAS,
    Xaa,
    Xbb,
    SAZ1, BAZ1, AAZ1,
    Z1BZ2,
    Z2XaZ3,
    Z3XbXa,
    BAZ4, AAZ4,
    Z4XbA
    }

Exercise 55. In this exercise we have a λ-free context-free grammar G, and you have to create a grammar G' which is in Chomsky normal form and generates the same language as G.

G = ({S, A, B}, {x,y,z}, S, P)
P = {
   SBB,
   AS,
   Axxzz,
   Ay,
   BAxzxA,
   BA
}

Exercise 56. Create a grammar G' which is in Chomsky normal form and generates the same language as G.

G = ({S, A, B}, {x,y}, S, P)
P = {
   SABBAB,
   Sx,
   ABB,
   AS,
   AB,
   BASA,
   By
}

Exercise 57. Create a grammar G' which is in Chomsky normal form and generates the same language as G.

G = ({S, X, Z}, {x,y}, S, P)
P = {
   SXZ,
   SZX,
   Xxy,
   XS,
   ZS,
   Zyx,
   ZX,
   ZZZ
}

Exercise 58. Create a grammar G' which is in Chomsky normal form and generates the same language as G.

G = ({S}, {a,+,*,(,)}, S, P)
P = {
   SS+S,
   SS*S,
   S → (S),
   Sa
}

Exercise 59. Create a grammar G' which is in Chomsky normal form and generates the same language as G.

G = ({S, A, B}, {x,y}, S, P)
P = {
   SAxxB,
   SA,
   SB,
   BA,
   Ay,
   ASB
}