18.3. 18.3 Generalised dependencies

Two such dependencies will be discussed in this section that are generalizations of the previous ones, however cannot be axiomatised with axioms similar to (A1)–(A8).

18.3.1. 18.3.1 Join dependencies

Theorem 18.33 states that multivalued dependency is equivalent with that some decomposition the schema into two parts has the lossless join property. Its generalisation is the join dependency.

Definition 18.35 Let be a relational schema and let . The relation belonging to is said to satisfy the join dependency

if

In this setting satisfies multivalued dependency iff it satisfies the join dependency . The join dependency expresses that the decomposition has the lossless join property. One can define the fifth normal form, 5NF.

Definition 18.36 The relational schema is in fifth normal form, if it is in 4NF and has no non-trivial join dependency.

The fifth normal form has theoretical significance primarily. The schemata used in practice usually have primary keys. Using that the schema could be decomposed into subschemata of two attributes each, where one of the attributes is a superkey in every subschema.

Example 18.10 Consider the database of clients of a bank (Client-number,Name,Address,accountBalance). Here C is unique identifier, thus the schema could be decomposed into (CN,CA,CB), which has the lossless join property. However, it is not worth doing so, since no storage place can be saved, furthermore no anomalies are avoided with it.

There exists an axiomatisation of a dependency system if there is a finite set of inference rules that is sound and complete, i.e. logical implication coincides with being derivable by using the inference rules. For example, the Armstrong-axioms give an axiomatisation of functional dependencies, while the set of rules (A1)–(A8) is the same for functional and multivalued dependencies considered together. Unfortunately, the following negative result is true.

Theorem 18.37 The family of join dependencies has no finite axiomatisation.

In contrary to the above, Abiteboul, Hull and Vianu show in their book that the logical implication problem can be decided by an algorithm for the family of functional and join dependencies taken together. The complexity of the problem is as follows.

Theorem 18.38

  • It is NP-complete to decide whether a given join dependency is implied by another given join dependency and a functional dependency.

  • It is NP-hard to decide whether a given join dependency is implied by given set of multivalued dependencies.

18.3.2. 18.3.2 Branching dependencies

A generalisation of functional dependencies is the family of branching dependencies. Let us assume that and there exists no rows in relation over schema , such that they contain at most distinct values in columns of , but all values are pairwise distinct in some column of . Then is said to be -dependent on , in notation . In particular, holds if and only if functional dependency holds.

Example 18.11 Consider the database of the trips of an international transport truck.

  • One trip: four distinct countries.

  • One country has at most five neighbours.

  • There are 30 countries to be considered.

Let be the attributes of the countries reached in a trip. In this case does not hold, however another dependency is valid:

The storage space requirement of the database can be significantly reduced using these dependencies. The range of each element of the original table consists of 30 values, names of countries or some codes of them (5 bits each, at least). Let us store a little table ( bits) that contains a numbering of the neighbours of each country, which assigns to them the numbers 0,1,2,3,4 in some order. Now we can replace attribute by these numbers (), because the value of gives the starting country and the value of determines the second country with the help of the little table. The same holds for the attribute , but we can decrease the number of possible values even further, if we give a table of numbering the possible third countries for each pair. In this case, the attribute can take only 4 different values. The same holds for , too. That is, while each element of the original table could be encoded by 5 bits, now for the cost of two little auxiliary tables we could decrease the length of the elements in the second column to 3 bits, and that of the elements in the third and fourth columns to 2 bits.

The -closure of an attribute set can be defined:

In particular, . In case of branching dependencies even such basic questions are hard as whether there exists an Armstrong-relation for a given family of dependencies.

Definition 18.39 Let be a relational schema, be a set of dependencies of some dependency family defined on . A relation over schema is Armstrong-relation for , if the set of dependencies from that satisfies is exactly , that is .

Armstrong proved that for an arbitrary set of functional dependencies there exists Armstrong-relation for . The proof is based on the three properties of closures of attributes sets with respect to , listed in Exercise 18.1-1. For branching dependencies only the first two holds in general.

Lemma 18.40 Let , furthermore let be a relational schema. For one has

  1. and

  2. .

There exists such mapping and natural numbers that there exists no Armstrong-relation for in the family if -dependencies.Grant Minker investigated numerical dependencies that are similar to branching dependencies. For attribute sets the dependency holds in a relation over schema if for every tuple value taken on the set of attributes , there exists at most distinct tuple values taken on . This condition is stronger than that of , since the latter only requires that in each column of there are at most values, independently of each other. That allows different projections. Numerical dependencies were axiomatised in some special cases, based on that Katona showed that branching dependencies have no finite axiomatisation. It is still an open problem whether logical implication is algorithmically decidable amongst branching dependencies.

Exercises

18.3-1 Prove Theorem 18.38.

18.3-2 Prove Lemma 18.40.

18.3-3 Prove that if , then holds besides the two properties of Lemma 18.40.

18.3-4 A mapping is called a closure, if it satisfies the two properties of Lemma 18.40 and and the third one of Exercise 18.3-3. Prove that if is a closure, and is the family of dependencies defined by , then there exists an Armstrong-relation for in the family of -dependencies (functional dependencies) and in the family of -dependencies, respectively.

18.3-5 Let be the closure defined by

Prove that there exists no Armstrong-relation for in the family of -dependencies, if .

  PROBLEMS  

18-1 External attributes

Maier calls attribute an external attribute in the functional dependency with respect to the family of dependencies over schema , if the following two conditions hold:

  1. , or

  2. .

Design an running time algorithm, whose input is schema and output is a set of dependencies equivalent with that has no external attributes.

18-2 The order of the elimination steps in the construction of minimal cover is important

In the procedure Minimal-cover( ) the set of functional dependencies was changed in two ways: either by dropping redundant dependencies, or by dropping redundant attributes from the left hand sides of the dependencies. If the latter method is used first, until there is no more attribute that can be dropped from some left hand side, then the first method, this way a minimal cover is obtained really, according to Proposition 18.6. Prove that if the first method applied first and then the second, until there is no more possible applications, respectively, then the obtained set of dependencies is not necessarily a minimal cover of .

18-3 BCNF subschema

Prove that the following problem is coNP-complete: Given a relational schema with set of functional dependencies , furthermore , decide whether is in BCNF.

18-4 3NF is hard to recognise

Let be a relational schema, where is the system of functional dependencies.

The size key problem is the following: given a natural number , determine whether there exists a key of size at most .

The prime attribute problem is the following: for a given , determine whether it is a prime attribute.

  • a. Prove that the size key problem is NP-complete. Hint. Reduce the vertex cover problem to the prime attribute problem.

  • b. Prove that the prime attribute problem is NP-complete by reducing the size key problem to it.

  • c. Prove that determining whether the relational schema is in 3NF is NP-complete. Hint. Reduce the prime attribute problem to it.

18-5 Running time of Dependency-basis

Give an implementation of procedure Dependency-basis , whose running time is .

  CHAPTER NOTES  

The relational data model was introduced by Codd [ 66 ] in 1970. Functional dependencies were treated in his paper of 1972 [ 70 ], their axiomatisation was completed by Armstrong [ 20 ]. The logical implication problem for functional dependencies were investigated by Beeri and Bernstein [ 32 ], furthermore Maier [ 232 ]. Maier also treats the possible definitions of minimal covers, their connections and the complexity of their computations in that paper. Maier, Mendelzon and Sagiv found method to decide logical implications among general dependencies [ 233 ]. Beeri, Fagin and Howard proved that axiom system (A1)–(A8) is sound and complete for functional and multivalued dependencies taken together [ 34 ]. Yu and Johnson [ 353 ] constructed such relational schema, where and the number of keys is . Békéssy and Demetrovics [ 42 ] gave a simple and beautiful proof for the statement, that from functional dependencies at most keys can be obtained, thus Yu and Johnson's construction is extremal.

Armstrong-relations were introduced and studied by Fagin [ 100 ], [ 101 ], furthermore by Beeri, Fagin, Dowd and Statman [ 33 ].

Multivalued dependencies were independently discovered by Zaniolo [ 356 ], Fagin [ 102 ] and Delobel [ 83 ].

The necessity of normal forms was recognised by Codd while studying update anomalies [ 68 ], [ 67 ]. The Boyce–Codd normal form was introduced in [ 69 ]. The definition of the third normal form used in this chapter was given by Zaniolo [ 357 ]. Complexity of decomposition into subschemata in certain normal forms was studied by Lucchesi and Osborne [ 225 ], Beeri and Bernstein [ 32 ], furthermore Tsou and Fischer [ 328 ].

Theorems 18.30 and 18.31 are results of Beeri [ 31 ]. Theorem 18.34 is from a paper of Aho, Beeri és Ullman [ 6 ].

Theorems 18.37 and 18.38 are from the book of Abiteboul, Hull and Vianu [ 3 ], the non-existence of finite axiomatisation of join dependencies is Petrov's result [ 271 ].

Branching dependencies were introduced by Demetrovics, Katona and Sali, they studied existence of Armstrong-relations and the size of minimal Armstrong-relations [ 84 ], [ 85 ], [ 86 ], [ 292 ]. Katona showed that there exists no finite axiomatisation of branching dependencies in (ICDT'92 Berlin, invited talk) but never published.

Possibilities of axiomatisation of numerical dependencies were investigated by Grant and Minker [ 142 ], [ 143 ].

Good introduction of the concepts of this chapter can be found in the books of Abiteboul, Hull and Vianu [ 3 ], Ullman [ 331 ] furthermore Thalheim [ 321 ], respectively.