Chapter 18. Relational Database Design

The relational datamodel was introduced by Codd in 1970. It is the most widely used datamodel—extended with the possibilities of the World Wide Web—, because of its simplicity and flexibility. The main idea of the relational model is that data is organised in relational tables, where rows correspond to individual records and columns to attributes. A relational schema consists of one or more relations and their attribute sets. In the present chapter only schemata consisting of one relation are considered for the sake of simplicity. In contrast to the mathematical concept of relations, in the relational schema the order of the attributes is not important, always sets of attributes are considered instead of lists. Every attribute has an associated domain that is a set of elementary values that the attribute can take values from. As an example, consider the following schema.

 Employee(Name,Mother's name,Social Security Number,Post,Salary) 

The domain of attributes Name and Mother's name is the set of finite character strings (more precisely its subset containing all possible names). The domain of Social Security Number is the set of integers satisfying certain formal and parity check requirements. The attribute Post can take values from the set {Director,Section chief,System integrator,Programmer,Receptionist,Janitor,Handyman}. An instance of a schema is a relation if its columns correspond to the attributes of and its rows contain values from the domains of attributes at the attributes' positions. A typical row of a relation of the Employee schema could be

 (John Brown,Camille Parker,184-83-2010,Programmer,$172,000) 

There can be dependencies between different data of a relation. For example, in an instance of the Employee schema the value of Social Security Number determines all other values of a row. Similarly, the pair (Name,Mother's name) is a unique identifier. Naturally, it may occur that some set of attributes do not determine all attributes of a record uniquely, just some of its subsets.

A relational schema has several integrity constraints attached. The most important kind of these is functional dependency. Let and be two sets of attributes. functionally depends on , in notation, means that whenever two records are identical in the attributes belonging to , then they must agree in the attribute belonging to , as well. Throughout this chapter the attribute set is denoted by for the sake of convenience.

Example 18.1 Functional dependencies Consider the schema

 R(Professor,Subject,Room,Student,Grade,Time). 

The meaning of an individual record is that a given student got a given grade of a given subject that was taught by a given professor at a given time slot. The following functional dependencies are satisfied.

In Example 18.1 the attribute set StT uniquely determines the values of all other attributes, furthermore it is minimal such set with respect to containment. This kind attribute sets are called keys. If all attributes are functionally dependent on a set of attributes , then is called a superkey. It is clear that every superkey contains a key and that any set of attributes containing a superkey is also a superkey.

18.1. 18.1 Functional dependencies

Some functional dependencies valid for a given relational schema are known already in the design phase, others are consequences of these. The StT P dependency is implied by the StT Su and Su P dependencies in Example 18.1. Indeed, if two records agree on attributes St and T, then they must have the same value in attribute Su. Agreeing in Su and Su P implies that the two records agree in P, as well, thus StT P holds.

Definition 18.1 Let be a relational schema, be a set of functional dependencies over . The functional dependency is logically implied by , in notation , if each instance of that satisfies all dependencies of also satisfies . The closure of a set of functional dependencies is the set given by

18.1.1. 18.1.1 Armstrong-axioms

In order to determine keys, or to understand logical implication between functional dependencies, it is necessary to know the closure of a set of functional dependencies, or for a given dependency the question whether it belongs to must be decidable. For this, inference rules are needed that tell that from a set of functional dependencies what others follow. The Armstrong-axioms form a system of sound and complete inference rules. A system of rules is sound if only valid functional dependencies can be derived using it. It is complete, if every dependency that is logically implied by the set is derivable from using the inference rules.

Armstrong-axioms

  • (A1) Reflexivity implies .

  • (A2) Augmentation If , then for arbitrary , holds.

  • (A3) Transitivity If and hold, then holds, as well.

Example 18.2 Derivation by the Armstrong-axioms. Let and , then is a key:

  1. is given.

  2. 1. is augmented by (A2) with .

  3. is given.

  4. 3. is augmented by (A2) with .

  5. transitivity (A3) is applied to 2. and 4.

Thus it is shown that is superkey. That it is really a key, follows from algorithm Closure( ) .

There are other valid inference rules besides (A1)–(A3). The next lemma lists some, the proof is left to the Reader (Exercise 18.1-5).

Lemma 18.2

  1. Union rule .

  2. Pseudo transitivity .

  3. Decomposition If holds and , then holds, as well.

The soundness of system (A1)–(A3) can be proven by easy induction on the length of the derivation. The completeness will follow from the proof of correctness of algorithm Closure( ) by the following lemma. Let denote the closure of the set of attributes with respect to the family of functional dependencies , that is .

Lemma 18.3 The functional dependency follows from the family of functional dependencies by the Armstrong-axioms iff .

Proof. Let where 's are attributes, and assume that . follows by the Armstrong-axioms for all by the definition of . Applying the union rule of Lemma 18.2 follows. On the other hand, assume that can be derived by the Armstrong-axioms. By the decomposition rule of Lemma 18.2 follows by (A1)–(A3) for all . Thus, .

18.1.2. 18.1.2 Closures

Calculation of closures is important in testing equivalence or logical implication between systems of functional dependencies. The first idea could be that for a given family of functional dependencies in order to decide whether , it is enough to calculate and check whether holds. However, the size of could be exponential in the size of input. Consider the family of functional dependencies given by

consists of all functional dependencies of the form , where , thus . Nevertheless, the closure of an attribute set with respect to can be determined in linear time of the total length of functional dependencies in . The following is an algorithm that calculates the closure of an attribute set with respect to . The input consists of the schema , that is a finite set of attributes, a set of functional dependencies defined over , and an attribute set .

Closure( )

  1     2     3        
                      Functional dependencies not used yet.   4  
                        REPEAT
                        5       6    
                        FOR
                      all  in    7       
                        DO
                      
                     
                        IF
                      
                        8          
                        THEN
                      
                        9               10      11  
                        UNTIL
                      
                      
                  

It is easy to see that the attributes that are put into any of the 's by Closure( ) really belong to . The harder part of the correctness proof of this algorithm is to show that each attribute belonging to will be put into some of the 's.

Theorem 18.4 Closure( ) correctly calculates .

Proof. First we prove by induction that if an attribute is put into an during Closure( ) , then really belongs to .

Base case: . I this case and by reflexivity (A1) .

Induction step: Let and assume that . is put into , because there is a functional dependency in , where and . By induction, holds, which implies using Lemma 18.3 that holds, as well. By transitivity (A3) and implies . By reflexivity (A1) and , holds. Applying transitivity again, is obtained, that is .

On the other hand, we show that if , then is contained in the result of Closure( ) . Suppose in contrary that , but , where is the result of Closure( ) . By the stop condition in line 9 this means . An instance of the schema is constructed that satisfies every functional dependency of , but does not hold in if . Let be the following two-rowed relation:

Let us suppose that the above violates a functional dependency of , that is , but is not a subset of . However, in this case Closure( ) could not have stopped yet, since .

implies using Lemma 18.3 that follows from by the Armstrong-axioms. (A1)–(A3) is a sound system of inference rules, hence in every instance that satisfies , must hold. However, the only way this could happen in instance is if .

Let us observe that the relation instance given in the proof above provides the completeness proof for the Armstrong-axioms, as well. Indeed, the closure calculated by Closure( ) is the set of those attributes for which follows from by the Armstrong-axioms. Meanwhile, for every other attribute , there exist two rows of that agree on , but differ in , that is does not hold.

The running tome of Closure( ) is , where is the length of he input. Indeed, in the rEPEAT uNTIL loop of lines 4–11 every not yet used dependency is checked, and the body of the loop is executed at most times, since it is started again only if , that is a new attribute is added to the closure of . However, the running time can be reduced to linear with appropriate bookkeeping.

  1. For every yet unused dependency of it is kept track of how many attributes of are not yet included in the closure ().

  2. For every attribute those yet unused dependencies are kept in a doubly linked list whose left side contains .

  3. Those not yet used dependencies are kept in a linked list , whose left side 's every attribute is contained in the closure already, that is for which .

It is assumed that the family of functional dependencies is given as a set of attribute pairs , representing . The Linear-Closure( ) algorithm is a modification of Closure( ) using the above bookkeeping, whose running time is linear. is the schema, is the given family of functional dependencies, and we are to determine the closure of attribute set .

Algorithm Linear-Closure( ) consists of two parts. In the initialisation phase (lines 1–13) the lists are initialised. The loops of lines 2–5 and 6–8, respectively, take time. The loop in lines 9–11 means steps. If the length of the input is denoted by , then this is steps altogether.

During the execution of lines 14–23, every functional dependency is examined at most once, when it is taken off from list . Thus, lines 15–16 and 23 take at most steps. The running time of the loops in line 17–22 can be estimated by observing that the sum is decreased by one in each execution, hence it takes steps, where is the value obtained in the initialisation phase. However, , thus lines 14–23 also take time in total.

Linear-Closure( )

  1        Initialisation phase.   2  
                        FOR
                      all    3    
                        DO
                      
                     
                        FOR
                      all    4       
                        DO
                      add  to list    5       6  
                        FOR
                      all    7    
                        DO
                      
                     
                        FOR
                      all  of list    8       
                        DO
                      
                        9  
                        FOR
                      all   10    
                        DO
                      
                     
                        IF
                      
                       11       
                        THEN
                      add  to list   12    13        End of initialisation phase.  14  
                        WHILE
                      
                      is nonempty  15    
                        DO
                      
                       16       delete  from list   17       
                        FOR
                      all   18          
                        DO
                      
                     
                        FOR
                      all  of list   19             
                        DO
                      
                       20                
                        IF
                      
                       21                   
                        THEN
                      add  to list   22                      delete  from list   23      24  
                        RETURN
                      
                     
                  

18.1.3. 18.1.3 Minimal cover

Algorithm Linear-Closure( ) can be used to test equivalence of systems of dependencies. Let and be two families of functional dependencies. and are said to be equivalent, if exactly the same functional dependencies follow from both, that is . It is clear that it is enough to check for all functional dependencies in whether it belongs to , and vice versa, for all in , whether it is in . Indeed, if some of these is not satisfied, say is not in , then surely . On the other hand, if all are in , then a proof of a functional dependency from can be obtained from dependencies in in such a way that to the derivation of the dependencies of from , the derivation of from is concatenated. In order to decide that a dependency from is in , it is enough to construct the closure of attribute set with respect to using Linear-Closure( ) , then check whether holds. The following special functional dependency system equivalent with is useful.

Definition 18.5 The system of functional dependencies is a minimal cover of the family of functional dependencies iff is equivalent with , and

  1. functional dependencies of are in the form , where is an attribute and ,

  2. no functional dependency can be dropped from , i.e., ,

  3. the left sides of dependencies in are minimal, that is , .

Every set of functional dependencies have a minimal cover, namely algorithm Minimal-Cover( ) constructs one.

Minimal-Cover( )

  1     2  
                        FOR
                      all    3    
                        DO
                      
                     
                        FOR
                      all    4       
                        DO
                      
                        5        Each right hand side consists of a single attribute.   6  
                        FOR
                      all    7    
                        DO
                      
                     
                        WHILE
                      there exists    8          9        Each left hand side is minimal.  10  
                        FOR
                      all   11    
                        DO
                      
                     
                        IF
                      
                       12       
                        THEN
                      
                       13        No redundant dependency exists. 

After executing the loop of lines 2–4, the right hand side of each dependency in consists of a single attribute. The equality follows from the union rule of Lemma 18.2 and the reflexivity axiom. Lines 6–8 minimise the left hand sides. In line 11 it is checked whether a given functional dependency of can be removed without changing the closure. is the closure of attribute set with respect to the family of functional dependencies .

Claim 18.6 Minimal-Cover( ) calculates a minimal cover of .

Proof. It is enough to show that during execution of the loop in lines 10–12, no functional dependency is generated whose left hand side could be decreased. Indeed, if a dependency would exist, such that for some held, then would also hold, where is the set of dependencies considered when is checked in lines 6–8. , which implies (see Exercise 18.1-1). Thus, should have been decreased already during execution of the loop in lines 6–8.

18.1.4. 18.1.4 Keys

In database design it is important to identify those attribute sets that uniquely determine the data in individual records.

Definition 18.7 Let be a relational schema. The set of attributes is called a superkey, if . A superkey is called a key, if it is minimal with respect to containment, that is no proper subset is key.

The question is how the keys can be determined from ? What makes this problem hard is that the number of keys could be super exponential function of the size of . In particular, Yu and Johnson constructed such relational schema, where , but the number of keys is . Békéssy and Demetrovics gave a beautiful and simple proof of the fact that starting from functional dependencies, at most key can be obtained. (This was independently proved by Osborne and Tompa.)

The proof of Békéssy and Demetrovics is based on the operation they introduced, which is defined for functional dependencies.

Definition 18.8 Let and be two functional dependencies. The binary operation is defined by

Some properties of operation is listed, the proof is left to the Reader (Exercise 18.1-3). Operation is associative, furthermore it is idempotent in the sense that if and for some , then .

Claim 18.9 (Békéssy and Demetrovics) Let be a relational schema and let be a listing of the functional dependencies. If is a key, then , where is an ordered subset of the index set , and is a trivial dependency in the form .

Proposition 18.9 bounds in some sense the possible sets of attributes in the search for keys. The next proposition gives lower and upper bounds for the keys.

Claim 18.10 Let be a relational schema and let . Let us assume without loss of generality that . Let and . If is a key in the schema , then

The proof is not too hard, it is left as an exercise for the Reader (Exercise 18.1-4). The algorithm List-keys( ) that lists the keys of the schema is based on the bounds of Proposition 18.10. The running time can be bounded by , but one cannot expect any better, since to list the output needs that much time in worst case.

List-Keys( )

  1        Let  and  be as defined in Proposition 18.10   2  
                        IF
                      
                        3    
                        THEN
                      
                     
                        RETURN
                      
                        4        
                      is the only key.   5  
                        IF
                      
                        6    
                        THEN
                      
                     
                        RETURN
                      
                        7        
                      is the only key.   8     9  
                        FOR
                      all permutations  of the attributes of   10    
                        DO
                         11       
                        FOR
                      
                      
                     
                        TO
                      
                        12          
                        DO
                      
                       13             
                        IF
                      
                       14                
                        THEN
                      
                       15         16  
                        RETURN
                      
                      
                  

Exercises

18.1-1 Let be a relational schema and let and be families of functional dependencies over . Show that

a. .

b. .

c. If , then .

Formulate and prove similar properties of the closure – with respect to – of an attribute set .

18.1-2 Derive the functional dependency from the set of dependencies using Armstrong-axioms (A1)–(A3).

18.1-3 Show that operation is associative, furthermore if for functional dependencies we have and for some , then .

18.1-4 Prove Proposition 18.10.

18.1-5 Prove the union, pseudo transitivity and decomposition rules of Lemma 18.2.