18.2. 18.2 Decomposition of relational schemata

A decomposition of a relational schema is a collection of subsets of such that

The 's need not be disjoint, in fact in most application they must not be. One important motivation of decompositions is to avoid anomalies.

Example 18.3 Anomalies Consider the following schema

 SUPPLIER-INFO(SNAME,ADDRESS,ITEM,PRICE) 

This schema encompasses the following problems:

  1. Redundancy. The address of a supplier is recorded with every item it supplies.

  2. Possible inconsistency (update anomaly). As a consequence of redundancy, the address of a supplier might be updated in some records and might not be in some others, hence the supplier would not have a unique address, even though it is expected to have.

  3. Insertion anomaly. The address of a supplier cannot be recorded if it does not supply anything at the moment. One could try to use NULL values in attributes ITEM and PRICE, but would it be remembered that it must be deleted, when a supplied item is entered for that supplier? More serious problem that SNAME and ITEM together form a key of the schema, and the NULL values could make it impossible to search by an index based on that key.

  4. Deletion anomaly. This is the opposite of the above. If all items supplied by a supplier are deleted, then as a side effect the address of the supplier is also lost.

All problems mentioned above are eliminated if schema SUPPLIER-INFO is replaced by two sub-schemata:

 SUPPLIER(SNAME,ADDRESS), 

 SUPPLIES(SNAME,ITEM,PRICE). 

In this case each suppliers address is recorded only once, and it is not necessary that the supplier supplies a item in order its address to be recorded. For the sake of convenience the attributes are denoted by single characters (SNAME), (ADDRESS), (ITEM), (PRICE).

Question is that is it correct to replace the schema by and ? Let be and instance of schema . It is natural to require that if and is used, then the relations belonging to them are obtained projecting to and , respectively, that is and . and contains the same information as , if can be reconstructed using only and . The calculation of from and can bone by the natural join operator.

Definition 18.11 The natural join of relations of schemata () is the relation belonging to the schema , which consists of all rows that for all there exists a row of relation such that . In notation .

Example 18.4 Let , , and . The natural join of and belongs to the schema , and it is the relation .

If is the natural join of and , that is , then és by Lemma 18.12. If , then the original relation could not be reconstructed knowing only and .

18.2.1. 18.2.1 Lossless join

Let be a decomposition of schema , furthermore let be a family of functional dependencies over . The decomposition is said to have lossless join property (with respect to ), if every instance of that satisfies also satisfies

That is, relation is the natural join of its projections to attribute sets , . For a decomposition , let denote the the mapping which assigns to relation the relation . Thus, the lossless join property with respect to a family of functional dependencies means that for all instances that satisfy .

Lemma 18.12 Let be a decomposition of schema , and let be an arbitrary instance of . Furthermore, let . Then

  1. .

  2. If , then .

  3. .

The proof is left to the Reader (Exercise 18.2-7).

18.2.2. 18.2.2 Checking the lossless join property

It is relatively not hard to check that a decomposition of schema has the lossless join property. The essence of algorithm Join-Test( ) is the following.

A array is constructed, whose column corresponds to attribute , while row corresponds to schema . if , otherwise .

The following step is repeated until there is no more possible change in the array. Consider a functional dependency from . If a pair of rows and agree in all attributes of , then their values in attributes of are made equal. More precisely, if one of the values in an attribute of is , then the other one is set to , as well, otherwise it is arbitrary which of the two values is set to be equal to the other one. If a symbol is changed, then each of its occurrences in that column must be changed accordingly. If at the end of this process there is an all row in , then the decomposition has the lossless join property, otherwise, it is lossy.

Join-Test( )

  1        Initialisation phase.   2  
                        FOR
                      
                      
                     
                        TO
                      
                        3    
                        DO
                      
                     
                        FOR
                      
                      
                     
                        TO
                      
                        4       
                        DO
                      
                     
                        IF
                      
                        5          
                        THEN
                      
                        6          
                        ELSE
                      
                        7        End of initialisation phase.   8     9  
                        REPEAT
                       10      11    
                        FOR
                      all   12       
                        DO
                      
                     
                        FOR
                      
                      
                     
                        TO
                      
                       13          
                        DO
                      
                     
                        FOR
                      
                      
                     
                        TO
                      
                       14             
                        DO
                      
                     
                        IF
                      for all  in  
                       15                
                        THEN
                      
                     
                        Equate(
                        
                        )
                       16  
                        UNTIL
                      
                       17  
                        IF
                      there exist an all 0 row in   18    
                        THEN
                      
                     
                        RETURN
                      “Lossless join”  19    
                        ELSE
                      
                     
                        RETURN
                      “Lossy join” 

Procedure Equate( ) makes the appropriate symbols equal.

Equate( )

  1  
                        FOR
                      
                        2    
                        DO
                      
                     
                        IF
                      
                        3       
                        THEN
                        4          
                        FOR
                      
                     
                     
                        TO
                      
                        5             
                        DO
                      
                     
                        IF
                      
                        6                
                        THEN
                      
                        7       
                        ELSE
                        8          
                        FOR
                      
                      
                     
                        TO
                      
                        9             
                        DO
                      
                     
                        IF
                      
                       10                
                        THEN
                      
                      
                  

Example 18.5 Checking lossless join property Let , , , , , , furthermore let the functional dependencies be . The initial array is shown on Figure 18.1(a). Using values 1,2,5 in column can be equated to 1. Then applying value 3 of column again can be changed to 1. The result is shown on Figure 18.1(b). Now can be used to change values 2,3,5 of column to 0. Then applying (the only nonzero) value 1 of column can be set to 0. Finally, makes it possible to change values 3 and 4 in column to be changed to 0. The final result is shown on Figure 18.1(c). The third row consists of only zeroes, thus the decomposition has the lossless join property.

Figure 18.1.  Application of Join-test( Application of Join-test( R,F,\rho ) . ) .

Application of Join-test( R,F,\rho ) .


It is clear that the running time of algorithm Join-test( ) is polynomial in the length of the input. The important thing is that it uses only the schema, not the instance belonging to the schema. Since the size of an instance is larger than the size of the schema by many orders of magnitude, the running time of an algorithm using the schema only is negligible with respect to the time required by an algorithm processing the data stored.

Theorem 18.13 Procedure Join-test( ) correctly determines whether a given decomposition has the lossless join property.

Proof. Let us assume first that the resulting array contains no all zero row. itself can be considered as a relational instance over the schema . This relation satisfies all functional dependencies from , because the algorithm finished since there was no more change in the table during checking the functional dependencies. It is true for the starting table that its projections to every 's contain an all zero row, and this property does not change during the running of the algorithm, since a 0 is never changed to another symbol. It follows, that the natural join contains the all zero row, that is . Thus the decomposition is lossy. The proof of the other direction is only sketched.

Logic, domain calculus is used. The necessary definitions can be found in the books of Abiteboul, Hull and Vianu, or Ullman, respectively. Imagine that variable is written in place of zeroes, and is written in place of 's in column , and Join-test( ) is run in this setting. The resulting table contains row , which corresponds to the all zero row. Every table can be viewed as a shorthand notation for the following domain calculus expression

where is the th row of . If is the starting table, then formula (18.1) defines exactly. As a justification note that for a relation , contains the row iff contains for all a row whose th coordinate is if is an attribute of , and arbitrary values represented by variables in the other attributes.

Consider an arbitrary relation belonging to schema that satisfies the dependencies of . The modifications (equating symbols) of the table done by Join-test( ) do not change the set of rows obtained from by (18.1), if the modifications are done in the formula, as well. Intuitively it can be seen from the fact that only such symbols are equated in (18.1), that can only take equal values in a relation satisfying functional dependencies of . The exact proof is omitted, since it is quiet tedious.

Since in the result table of Join-test( ) the all 's row occurs, the domain calculus formula that belongs to this table is of the following form:

It is obvious that if (18.2) is applied to relation belonging to schema , then the result will be a subset of . However, if satisfies the dependencies of , then (18.2) calculates . According to Lemma 18.12, holds, thus if satisfies , then (18.2) gives back exactly, so , that is the decomposition has the lossless join property.

Procedure Join-test( ) can be used independently of the number of parts occurring in the decomposition. The price of this generality is paid in the running time requirement. However, if is to be decomposed only into two parts, then Closure( ) or Linear-Closure( ) can be used to obtain the same result faster, according to the next theorem.

Theorem 18.14 Let be a decomposition of , furthermore let be a set of functional dependencies. Decomposition has the lossless join property with respect to iff

These dependencies need not be in , it is enough if they are in .

Proof. The starting table in procedure Join-test( ) is the following:

It is not hard to see using induction on the number of steps done by Join-test( ) that if the algorithm changes both values of the column of an attribute to 0, then . This is obviously true at the start. If at some time values of column must be equated, then by lines 11–14 of the algorithm, there exists , such that the two rows of the table agree on , and . By the induction assumption holds. Applying Armstrong-axioms (transitivity and reflexivity), follows.

On the other hand, let us assume that , that is . Then this functional dependency can be derived from using Armstrong-axioms. By induction on the length of this derivation it can be seen that procedure Join-test( ) will equate the two values of column , that is set them to 0. Thus, the row of will be all 0 iff , similarly, the row of will be all 0 iff .

18.2.3. 18.2.3 Dependency preserving decompositions

The lossless join property is important so that a relation can be recovered from its projections. In practice, usually not the relation belonging to the underlying schema is stored, but relations for an appropriate decomposition , in order to avoid anomalies. The functional dependencies of schema are integrity constraints of the database, relation is consistent if it satisfies all prescribed functional dependencies. When during the life time of the database updates are executed, that is rows are inserted into or deleted from the projection relations, then it may happen that the natural join of the new projections does not satisfy the functional dependencies of . It would be too costly to join the projected relations – and then project them again – after each update to check the integrity constraints. However, the projection of the family of functional dependencies to an attribute set can be defined: consists of those functional dependencies , where . After an update, if relation is changed, then it is relatively easy to check whether still holds. Thus, it would be desired if family would be logical implication of the families of functional dependencies . Let .

Definition 18.15 The decomposition is said to be dependency preserving. If

Note that , hence always holds. Consider the following example.

Example 18.6 Let be the underlying schema, furthermore let be the functional dependencies. Let the decomposition be . This has the lossless join property by Theorem 18.14. consists of besides the trivial dependencies. Let and . Two rows are inserted into each of the projections belonging to schemata and , respectively, so that functional dependencies of the projections are satisfied:

In this case and satisfy the dependencies prescribed for them separately, however in the dependency does not hold.

It is true as well, that none of the decompositions of this schema preserves the dependency . Indeed, this is the only dependency that contains on the right hand side, thus if it is to be preserved, then there has to be a subschema that contains , but then the decomposition would not be proper. This will be considered again when decomposition into normal forms is treated.

Note that it may happen that decomposition preserves functional dependencies, but does not have the lossless join property. Indeed, let , , and let the decomposition be .

Theoretically it is very simple to check whether a decomposition is dependency preserving. Just needs to be calculated, then projections need to be taken, finally one should check whether the union of the projections is equivalent with . The main problem with this approach is that even calculating may need exponential time.

Nevertheless, the problem can be solved without explicitly determining . Let . will not be calculated, only its equivalence with will be checked. For this end, it needs to be decidable for all functional dependencies that if is taken with respect to , whether it contains . The trick is that is determined without full knowledge of by repeatedly taking the effect to the closure of the projections of onto the individual 's. That is, the concept of -operation on an attribute set is introduced, where is another set of attributes: is replaced by , where the closure is taken with respect to . Thus, the closure of the part of that lies in is taken with respect to , then from the resulting attributes those are added to , which also belong to .

It is clear that the running time of algorithm Preserve( ) is polynomial in the length of the input. More precisely, the outermost fOR loop is executed at most once for each dependency in (it may happen that it turns out earlier that some dependency is not preserved). The body of the rEPEAT uNTIL loop in lines 3–7. requires linear number of steps, it is executed at most times. Thus, the body of the fOR loop needs quadratic time, so the total running time can be bounded by the cube of the input length.

Preserve( )

  1  
                        FOR
                      all    2    
                        DO
                      
                        3       
                        REPEAT
                        4             5          
                        FOR
                      
                      
                     
                        TO
                      
                        6             
                        DO
                      
                        7       
                        UNTIL
                      
                        8    
                        IF
                      
                        9       
                        THEN
                      
                     
                        RETURN
                      “Not dependency preserving”  10  
                        RETURN
                      “Dependency preserving” 

Example 18.7 Consider the schema , let the decomposition be , and dependencies be . That is, by the visible cycle of the dependencies, every attribute determines all others. Since and do not occur together in the decomposition one might think that the dependency is not preserved, however this intuition is wrong. The reason is that during the projection to , not only the dependency is obtained, but , as well, since not , but is projected. Similarly, and are obtained, as well, but is a logical implication of these by the transitivity of the Armstrong axioms. Thus it is expected that Preserve( ) claims that is preserved.

Start from the attribute set . There are three possible operations, the -operation, the -operation and the -operation. The first two obviously does not add anything to , since , that is the closure of the empty set should be taken, which is empty (in the present example). However, using the -operation:

In the next round using the -operation the actual is changed to , finally applying the -operation on this, is obtained. This cannot change, so procedure Preserve( ) stops. Thus, with respect to the family of functional dependencies

holds, that is . It can be checked similarly that the other dependencies of are in (as a fact in ).

Theorem 18.16 The procedure Preserve( ) determines correctly whether the decomposition is dependency preserving.

Proof. It is enough to check for a single functional dependency whether whether the procedure decides correctly if it is in . When an attribute is added to in lines 3–7, then Functional dependencies from are used, thus by the soundness of the Armstrong-axioms if Preserve( ) claims that , then it is indeed so.

On the other hand, if , then Linear-closure( ) (run by as input) adds the attributes of one-by-one to . In every step when an attribute is added, some functional dependency of is used. This dependency is in one of 's, since is the union of these. An easy induction on the number of functional dependencies used in procedure Linear-closure( ) shows that sooner or later becomes a subset of , then applying the -operation all attributes of are added to .

18.2.4. 18.2.4 Normal forms

The goal of transforming (decomposing) relational schemata into normal forms is to avoid the anomalies described in the previous section. Normal forms of many different strengths were introduced in the course of evolution of database theory, here only the Boyce–Codd normal formát (BCNF) and the third, furthermore fourth normal form (3NF and 4NF) are treated in detail, since these are the most important ones from practical point of view.

18.2.4.1.  Boyce-Codd normal form

Definition 18.17 Let be relational schema, be a family of functional dependencies over . is said to be in Boyce-Codd normal form if and implies that is a superkey.

The most important property of BCNF is that it eliminates redundancy. This is based on the following theorem whose proof is left to the Reader as an exercise (Exercise 18.2-8).

Theorem 18.18 Schema is in BCNF iff for arbitrary attribute and key there exists no , for which ; ; and .

In other words, Theorem 18.18 states that “BCNF There is no transitive dependence on keys”. Let us assume that a given schema is not in BCNF, for example and hold, but does not, then the same value could occur besides many different values, but at each occasion the same value would be stored with it, which is redundant. Formulating somewhat differently, the meaning of BCNF is that (only) using functional dependencies an attribute value in a row cannot be predicted from other attribute values. Indeed, assume that there exists a schema , in which the value of an attribute can be determined using a functional dependency by comparison of two rows. That is, there exists two rows that agree on an attribute set , differ on the set and the value of the remaining (unique) attribute can be determined in one of the rows from the value taken in the other row.

If the value ? can be determined by a functional dependency, then this value can only be , the dependency is , where is an appropriate subset of . However, cannot be a superkey, since the two rows are distinct, thus is not in BCNF.

18.2.4.2.  3NF

Although BCNF helps eliminating anomalies, it is not true that every schema can be decomposed into subschemata in BCNF so that the decomposition is dependency preserving. As it was shown in Example 18.6, no proper decomposition of schema preserves the dependency. At the same time, the schema is clearly not in BCNF, because of the dependency.

Since dependency preserving is important because of consistency checking of a database, it is practical to introduce a normal form that every schema has dependency preserving decomposition into that form, and it allows minimum possible redundancy. An attribute is called prime attribute, if it occurs in a key.

Definition 18.19 The schema is in third normal form, if whenever , then either is a superkey, or is a prime attribute.

The schema of Example 18.3 with the dependencies and is not in 3NF, since is the only key and so is not a prime attribute. Thus, functional dependency violates the 3NF property.

3NF is clearly weaker condition than BCNF, since “or is a prime attribute” occurs in the definition. The schema in Example 18.6 is trivially in 3NF, because every attribute is prime, but it was already shown that it is not in BCNF.

18.2.4.3.  Testing normal forms

Theoretically every functional dependency in should be checked whether it violates the conditions of BCNF or 3NF, and it is known that can be exponentially large in the size of . Nevertheless, it can be shown that if the functional dependencies in are of the form that the right hand side is a single attribute always, then it is enough to check violation of BCNF, or 3NF respectively, for dependencies of . Indeed, let be a dependency that violates the appropriate condition, that is is not a superkey and in case of 3NF, is not prime. . In the step when Closure( ) puts into (line 8) it uses a functional dependency from that and . This dependency is non-trivial and is (still) not prime. Furthermore, if were a superkey, than by , would also be a superkey. Thus, the functional dependency from violates the condition of the normal form. The functional dependencies easily can be checked in polynomial time, since it is enough to calculate the closure of the left hand side of each dependency. This finishes checking for BCNF, because if the closure of each left hand side is , then the schema is in BCNF, otherwise a dependency is found that violates the condition. In order to test 3NF it may be necessary to decide about an attribute whether it is prime or not. However this problem is NP-complete, see Problem 18-4.

18.2.4.4.  Lossless join decomposition into BCNF

Let be a relational schema (where is the set of functional dependencies). The schema is to be decomposed into union of subschemata , such that the decomposition has the lossless join property, furthermore each endowed with the set of functional dependencies is in BCNF. The basic idea of the decomposition is simple:

  • If is in BCNF, then ready.

  • If not, it is decomposed into two proper parts , whose join is lossless.

  • Repeat the above for and .

In order to see that this works one has to show two things:

  • If is not in BCNF, then it has a lossless join decomposition into smaller parts.

  • If a part of a lossless join decomposition is further decomposed, then the new decomposition has the lossless join property, as well.

Lemma 18.20 Let be a relational schema (where is the set of functional dependencies), be a lossless join decomposition of . Furthermore, let be a lossless join decomposition of with respect to . Then is a lossless join decomposition of .

The proof of Lemma 18.20 is based on the associativity of natural join. The details are left to the Reader (Exercise 18.2-9).

This can be applied for a simple, but unfortunately exponential time algorithm that decomposes a schema into subschemata of BCNF property. The projections in lines 4–5 of Naive-BCNF may be of exponential size in the length of the input. In order to decompose schema , the procedure must be called with parameters . Procedure Naive-BCNF is recursive, is the actual schema with set of functional dependencies . It is assumed that the dependencies in are of the form , where is a single attribute.

Naive-BCNF( )

  1  
                           WHILE
                         there exists , that violates BCNF   2    
                           DO
                         
                           3          4          5          6       
                           RETURN
                         
                        
                           (Naive-BCNF(
                           
                           ), Naive-BCNF}(
                           
                           ))
                           7  
                           RETURN
                         
                         
                     

However, if the algorithm is allowed overdoing things, that is to decompose a schema even if it is already in BCNF, then there is no need for projecting the dependencies. The procedure is based on the following two lemmae.

Lemma 18.21

  1. A schema of only two attributes is in BCNF.

  2. If is not in BCNF, then there exists two attributes and in , such that holds.

Proof. If the schema consists of two attributes, , then there are at most two possible non-trivial dependencies, and . It is clear, that if some of them holds, then the left hand side of the dependency is a key, so the dependency does not violate the BCNF property. However, if none of the two holds, then BCNF is trivially satisfied.

On the other hand, let us assume that the dependency violates the BCNF property. Then there must exists an attribute , since otherwise would be a superkey. For this , holds.

Let us note, that the converse of the second statement of Lemma 18.21 is not true. It may happen that a schema is in BCNF, but there are still two attributes that satisfy . Indeed, let , . This schema is obviously in BCNF, nevertheless .

The main contribution of Lemma 18.21 is that the projections of functional dependencies need not be calculated in order to check whether a schema obtained during the procedure is in BCNF. It is enough to calculate for pairs of attributes, which can be done by Linear-closure( ) in linear time, so the whole checking is polynomial (cubic) time. However, this requires a way of calculating without actually projecting down the dependencies. The next lemma is useful for this task.

Lemma 18.22 Let and let be the set of functional dependencies of scheme . Then

The proof is left for the Reader (Exercise 18.2-10). The method of lossless join BCNF decomposition is as follows. Schema is decomposed into two subschemata. One is that is in BCNF, satisfying . The other subschema is , hence by Theorem 18.14 the decomposition has the lossless join property. This is applied recursively to , until such a schema is obtained that satisfies property 2 of Lemma 18.21. The lossless join property of this recursively generated decomposition is guaranteed by Lemma 18.20.

Polynomial-BCNF( )

  1     2        
                         is the schema that is not known to be in BCNF during the procedure.   3     4  
                           WHILE
                         there exist  in , such that  
                        
                           AND
                         
                           5    
                           DO
                         Let  and  be such a pair   6          7          8       
                           WHILE
                         there exist  in , such that    9          
                           DO
                         
                          10               11         12         13    14  
                           RETURN
                         
                         
                     

The running time of Polynomial-BCNF( ) is polynomial, in fact it can be bounded by , as follows. During each execution of the loop in lines 4–12 the size of is decreased by at least one, so the loop body is executed at most times. is calculated in line 4 for at most pairs that can be done in linear time using Linear-closure that results in steps for each execution of the loop body. In lines 8–10 the size of is decreased in each iteration, so during each execution of lines 3–12, they give at most iteration. The condition of the command wHILE of line 8 is checked for pairs of attributes, each checking is done in linear time. The running time of the algorithm is dominated by the time required by lines 8–10 that take steps altogether.

18.2.4.5.  Dependency preserving decomposition into 3NF

We have seen already that its is not always possible to decompose a schema into subschemata in BCNF so that the decomposition is dependency preserving. Nevertheless, if only 3NF is required then a decomposition can be given using Minimal-Cover( ) . Let be a relational schema and be the set of functional dependencies. Using Minimal-Cover( ) a minimal cover of is constructed. Let .

Theorem 18.23 The decomposition is dependency preserving decomposition of into subschemata in 3NF.

Proof. Since and the functional dependency is in , the decomposition preserves every dependency of . Let us suppose indirectly, that the schema is not in 3NF, that is there exists a dependency that violates the conditions of 3NF. This means that the dependency is non-trivial and is not a superkey in and is not a prime attribute of . There are two cases possible. If , then using that is not a superkey follows. In this case the functional dependency contradicts to that was a member of minimal cover, since its left hand side could be decreased. In the case when , holds. is not prime in , thus is not a key, only a superkey. However, then would contain a key such that . Furthermore, would hold, as well, that contradicts to the minimality of since the left hand side of could be decreased.

If the decomposition needs to have the lossless join property besides being dependency preserving, then given in Theorem 18.23 is to be extended by a key of . Although it was seen before that it is not possible to list all keys in polynomial time, one can be obtained in a simple greedy way, the details are left to the Reader (Exercise 18.2-11).

Theorem 18.24 Let be a relational schema, and let be a minimal cover of . Furthermore, let be a key in . Then the decomposition is a lossless join and dependency preserving decomposition of into subschemata in 3NF.

Proof. It was shown during the proof of Theorem 18.23 that the subschemata are in 3NF for . There cannot be a non-trivial dependency in the subschema , because if it were, then would not be a key, only a superkey.

The lossless join property of is shown by the use of Join-test( ) procedure. Note that it is enough to consider the minimal cover of . More precisely, we show that the row corresponding to in the table will be all 0 after running Join-test( ) . Let be the order of the attributes of as Closure( ) inserts them into . Since is a key, every attribute of is taken during Closure( ) . It will be shown by induction on that the element in row of and column of is 0 after running Join-test( ) .

The base case of is obvious. Let us suppose that the statement is true for and consider when and why is inserted into . In lines 6–8 of Closure( ) such a functional dependency is used where . Then , for some . The rows corresponding to and agree in columns of (all 0 by the induction hypothesis), thus the entries in column of are equated by Join-test( ) . This value is 0 in the row corresponding to , thus it becomes 0 in the row of , as well.

It is interesting to note that although an arbitrary schema can be decomposed into subschemata in 3NF in polynomial time, nevertheless it is NP-complete to decide whether a given schema is in 3NF, see Problem 18-4. However, the BCNF property can be decided in polynomial time. This difference is caused by that in order to decide 3NF property one needs to decide about an attribute whether it is prime. This latter problem requires the listing of all keys of a schema.

18.2.5. 18.2.5 Multivalued dependencies

Example 18.8 Besides functional dependencies, some other dependencies hold in Example 18.1, as well. There can be several lectures of a subject in different times and rooms. Part of an instance of the schema could be the following.

A set of values of Time and Room attributes, respectively, belong to each given value of Subject, and all other attribute values are repeated with these. Sets of attributes SR and StG are independent, that is their values occur in each combination.

The set of attributes is said to be multivalued dependent on set of attributes , in notation , if for every value on , there exists a set of values on that is not dependent in any way on the values taken in . The precise definition is as follows.

Definition 18.25 The relational schema satisfies the multivalued dependency , if for every relation of schema and arbitrary tuples of that satisfy , there exists tuples such that

holds.

Footnote. It would be enough to require the existence of , since the existence of would follow. However, the symmetry of multivalued dependency is more apparent in this way.

In Example 18.8 S TR holds.

Remark 18.26 Functional dependency is equality generating dependency, that is from the equality of two objects it deduces the equality of other other two objects. On the other hand, multivalued dependency is tuple generating dependency, that is the existence of two rows that agree somewhere implies the existence of some other rows.

There exists a sound and complete axiomatisation of multivalued dependencies similar to the Armstrong-axioms of functional dependencies. Logical implication and inference can be defined analogously. The multivalued dependency is logically implied by the set of multivalued dependencies, in notation , if every relation that satisfies all dependencies of also satisfies .

Note, that implies . The rows and of Definition 18.25 can be chosen as and , respectively. Thus, functional dependencies and multivalued dependencies admit a common axiomatisation. Besides Armstrong-axioms (A1)–(A3), five other are needed. Let be a relational schema.

  • (A4) Complementation: .

  • (A5) Extension: If holds, and , then .

  • (A6) Transitivity: .

  • (A7) .

  • (A8) If holds, , furthermore for some disjoint from holds, then is true, as well.

Beeri, Fagin and Howard proved that (A1)–(A8) is sound and complete system of axioms for functional and Multivalued dependencies together. Proof of soundness is left for the Reader (Exercise 18.2-12), the proof of the completeness exceeds the level of this book. The rules of Lemma 18.2 are valid in exactly the same way as when only functional dependencies were considered. Some further rules are listed in the next Proposition.

Claim 18.27 The followings are true for multivalued dependencies.

  1. Union rule: .

  2. Pseudotransitivity: .

  3. Mixed pseudotransitivity: .

  4. Decomposition rule for multivalued dependencies: if and holds, then , and holds, as well.

The proof of Proposition 18.27 is left for the Reader (Exercise 18.2-13).

18.2.5.1.  Dependency basis

Important difference between functional dependencies and multivalued dependencies is that immediately implies for all in , however is deduced by the decomposition rule for multivalued dependencies from only if there exists a set of attributes such that and , or . Nevertheless, the following theorem is true.

Theorem 18.28 Let be a relational schema, be a set of attributes. Then there exists a partition of the set of attributes such that for the multivalued dependency holds if and only if is the union of some 's.

Proof. We start from the one-element partition . This will be refined successively, while the property that holds for all in the actual decomposition, is kept. If and is not a union of some of the 's, then replace every such that neither nor is empty by and . According to the decomposition rule of Proposition 18.27, both and hold. Since is finite, the refinement process terminates after a finite number of steps, that is for all such that holds, is the union of some blocks of the partition. In order to complete the proof one needs to observe only that by the union rule of Proposition 18.27, the union of some blocks of the partition depends on in multivalued way.

Definition 18.29 The partition constructed in Theorem 18.28 from a set of functional and multivalued dependencies is called the dependency basis of (with respect to ).

Example 18.9 Consider the familiar schema

 R(Professor,Subject,Room,Student,Grade,Time) 

of Examples 18.1 and 18.8. Su RT was shown in Example 18.8. By the complementation rule Su PStG follows. Su P is also known. This implies by axiom (A7) that Su P. By the decomposition rule Su Stg follows. It is easy to see that no other one-element attribute set is determined by Su via multivalued dependency. Thus, the dependency basis of Su is the partition {P,RT,StG}.

We would like to compute the set of logical consequences of a given set of functional and multivalued dependencies. One possibility is to apply axioms (A1)–(A8) to extend the set of dependencies repeatedly, until no more extension is possible. However, this could be an exponential time process in the size of . One cannot expect any better, since it was shown before that even can be exponentially larger than . Nevertheless, in many applications it is not needed to compute the whole set , one only needs to decide whether a given functional dependency or multivalued dependency belongs to or not. In order to decide about a multivalued dependency , it is enough to compute the dependency basis of , then to check whether can be written as a union of some blocks of the partition. The following is true.

Theorem 18.30 (Beeri) In order to compute the dependency basis of a set of attributes with respect to a set of dependencies , it is enough to consider the following set of multivalued dependencies:

  1. All multivalued dependencies of and

  2. for every in the set of multivalued dependencies , where , and the 's are single attributes.

The only thing left is to decide about functional dependencies based on the dependency basis. Closure( ) works correctly only if multivalued dependencies are not considered. The next theorem helps in this case.

Theorem 18.31 (Beeri) Let us assume that and the dependency basis of with respect to the set of multivalued dependencies obtained in Theorem 18.30 is known. holds if and only if

  1. forms a single element block in the partition of the dependency basis, and

  2. There exists a set of attributes that does not contain , is an element of the originally given set of dependencies , furthermore .

Based on the observations above, the following polynomial time algorithm can be given to compute the dependency basis of a set of attributes .

Dependency-Basis( )

  1        
                         The collection of sets in the dependency basis is .   2  
                           REPEAT
                           3    
                           FOR
                         all    4       
                           DO
                         
                        
                           IF
                         there exists  such that    5          
                           THEN
                         
                           6  
                           UNTIL
                         
                         does not change   7  
                           RETURN
                         
                        
                     

It is immediate that if changes in lines 3–5. of Dependency-basis( ) , then some block of the partition is cut by the algorithm. This implies that the running time is a polynomial function of the sizes of and . In particular, by careful implementation one can make this polynomial to , see Problem 18-5.

18.2.5.2.  Fourth normal form 4NF

The Boyce-Codd normal form can be generalised to the case where multivalued dependencies are also considered besides functional dependencies, and one needs to get rid of the redundancy caused by them.

Definition 18.32 Let be a relational schema, be a set of functional and multivalued dependencies over . is in fourth normal form (4NF), if for arbitrary multivalued dependency for which and , holds that is superkey in .

Observe that 4NF BCNF. Indeed, if violated the BCNF condition, then , furthermore could not contain all attributes of , because that would imply that is a superkey. However, implies by (A8), which in turn would violate the 4NF condition.

Schema together with set of functional and multivalued dependencies can be decomposed into , where each is in 4NF and the decomposition has the lossless join property. The method follows the same idea as the decomposition into BCNF subschemata. If schema is not in 4NF, then there exists a multivalued dependency in the projection of onto that violates the 4NF condition. That is, is not a superkey in , neither is empty, nor is a subset of , furthermore the union of and is not . It can be assumed without loss of generality that and are disjoint, since is implied by using (A1), (A7) and the decomposition rule. In this case can be replaced by subschemata and , each having a smaller number of attributes than itself, thus the process terminates in finite time.Two things has to be dealt with in order to see that the process above is correct.

  • Decomposition has the lossless join property.

  • How can the projected dependency set be computed?

  • The first problem is answered by the following theorem.

Theorem 18.33 The decomposition of schema has the lossless join property with respect to a set of functional and multivalued dependencies iff

Proof. The decomposition of schema has the lossless join property iff for any relation over the schema that satisfies all dependencies from holds that if and are two tuples of , then there exists a tuple satisfying and , then it is contained in . More precisely, is the natural join of the projections of on and of on , respectively, which exist iff . Thus the fact that is always contained in is equivalent with that .

To compute the projection of the dependency set one can use the following theorem of Aho, Beeri and Ullman. is the set of multivalued dependencies that are logical implications of and use attributes of only.

Theorem 18.34 (Aho, Beeri és Ullman) consists of the following dependencies:

  • For all , if , then .

  • For all , if , then .

Other dependencies cannot be derived from the fact that holds in .

Unfortunately this theorem does not help in computing the projected dependencies in polynomial time, since even computing could take exponential time. Thus, the algorithm of 4NF decomposition is not polynomial either, because the 4NF condition must be checked with respect to the projected dependencies in the subschemata. This is in deep contrast with the case of BCNF decomposition. The reason is, that to check BCNF condition one does not need to compute the projected dependencies, only closures of attribute sets need to be considered according to Lemma 18.21.

Exercises

18.2-1 Are the following inference rules sound?

a. If and , then .

b. If and , then .

c. If and , then .

18.2-2 Prove Theorem 18.30, that is show the following. Let be a set of functional and multivalued dependencies, and let . Then

a. , and

b. .

Hint. Use induction on the inference rules to prove b.

18.2-3 Consider the database of an investment firm, whose attributes are as follows: (stockbroker), (office of stockbroker), (investor), (stock), (amount of stocks of the investor), (dividend of the stock). The following functional dependencies are valid: , , , .

a. Determine a key of schema .

b. How many keys are in schema ?

c. Give a lossless join decomposition of into subschemata in BCNF.

d. Give a dependency preserving and lossless join decomposition of into subschemata in 3NF.

18.2-4 The schema of Exercise 18.2-3 is decomposed into subschemata , , and . Does this decomposition have the lossless join property?

18.2-5 Assume that schema of Exercise 18.2-3 is represented by , , and subschemata. Give a minimal cover of the projections of dependencies given in Exercise 18.2-3. Exhibit a minimal cover for the union of the sets of projected dependencies. Is this decomposition dependency preserving?

18.2-6 Let the functional dependency of Exercise 18.2-3 be replaced by the multivalued dependency . That is , represents the stock's dividend “history”.

a. Compute the dependency basis of .

b. Compute the dependency basis of .

c. Give a decomposition of into subschemata in 4NF.

18.2-7 Consider the decomposition of schema . Let , furthermore . Prove:

a. .

b. If , then .

c. .

18.2-8 Prove that schema is in BCNF iff for arbitrary and key , it holds that there exists no , for which ; ; and .

18.2-9 Prove Lemma 18.20.

18.2-10 Let us assume that and the set of functional dependencies of schema is . Prove that .

18.2-11 Give a running time algorithm to find a key of the relational schema .

Hint. Use that is superkey and each superkey contains a key. Try to drop attributes from one-by-one and check whether the remaining set is still a key.

18.2-12 Prove that axioms (A1)–(A8) are sound for functional and multivalued dependencies.

18.2-13 Derive the four inference rules of Proposition 18.27 from axioms (A1)–(A8).