The relational datamodel was introduced by Codd in 1970. It is the most widely used datamodel—extended with the possibilities of the World Wide Web—, because of its simplicity and flexibility. The main idea of the relational model is that data is organised in relational tables, where rows correspond to individual **records** and columns to **attributes**. A **relational schema** consists of one or more relations and their attribute sets. In the present chapter only schemata consisting of one relation are considered for the sake of simplicity. In contrast to the mathematical concept of relations, in the relational schema the order of the attributes is not important, always **sets** of attributes are considered instead of **lists**. Every attribute has an associated **domain** that is a set of elementary values that the attribute can take values from. As an example, consider the following schema.

Employee(Name,Mother's name,Social Security Number,Post,Salary) |

The domain of attributes **Name** and **Mother's name** is the set of finite character strings (more precisely its subset containing all possible names). The domain of **Social Security Number** is the set of integers satisfying certain formal and parity check requirements. The attribute **Post** can take values from the set {Director,Section chief,System integrator,Programmer,Receptionist,Janitor,Handyman}. An **instance** of a schema is a relation if its columns correspond to the attributes of and its rows contain values from the domains of attributes at the attributes' positions. A typical row of a relation of the Employee schema could be

(John Brown,Camille Parker,184-83-2010,Programmer,$172,000) |

There can be dependencies between different data of a relation. For example, in an instance of the Employee schema the value of Social Security Number determines all other values of a row. Similarly, the pair (Name,Mother's name) is a unique identifier. Naturally, it may occur that some set of attributes do not determine all attributes of a record uniquely, just some of its subsets.

A relational schema has several **integrity constraints** attached. The most important kind of these is **functional dependency**. Let and be two sets of attributes.
**functionally depends** on , in notation, means that whenever two records are identical in the attributes belonging to , then they must agree in the attribute belonging to , as well. Throughout this chapter the attribute set is denoted by for the sake of convenience.

**Example 18.1 **
*Functional dependencies* Consider the schema

R(Professor,Subject,Room,Student,Grade,Time). |

The meaning of an individual record is that a given student got a given grade of a given subject that was taught by a given professor at a given time slot. The following functional dependencies are satisfied.

Su P: One subject is taught by one professor.

PT R: A professor teaches in one room at a time.

StT R: A student attends a lecture in one room at a time.

StT Su: A student attends a lecture of one subject at a time.

SuSt G: A student receives a unique final grade of a subject.

In Example 18.1 the attribute set **StT** uniquely determines the values of all other attributes, furthermore it is minimal such set with respect to containment. This kind attribute sets are called **keys**. If all attributes are functionally dependent on a set of attributes , then is called a **superkey**. It is clear that every superkey contains a key and that any set of attributes containing a superkey is also a superkey.

Some functional dependencies valid for a given relational schema are known already in the design phase, others are consequences of these. The **StT**
**P** dependency is implied by the **StT**
**Su** and **Su**
**P** dependencies in Example 18.1. Indeed, if two records agree on attributes **St** and **T**, then they must have the same value in attribute **Su**. Agreeing in **Su** and **Su**
**P** implies that the two records agree in **P**, as well, thus **StT**
**P** holds.

**Definition 18.1 **
*Let be a relational schema, be a set of functional dependencies over . The functional dependency is logically implied by , in notation , if each instance of that satisfies all dependencies of also satisfies . The closure of a set of functional dependencies is the set given by*

In order to determine keys, or to understand logical implication between functional dependencies, it is necessary to know the closure of a set of functional dependencies, or for a given dependency the question whether it belongs to must be decidable. For this, **inference rules** are needed that tell that from a set of functional dependencies what others follow. The **Armstrong-axioms** form a system of **sound** and **complete** inference rules. A system of rules is sound if only valid functional dependencies can be derived using it. It is complete, if every dependency that is logically implied by the set is derivable from using the inference rules.

*
Armstrong-axioms
*

(A1)

**Reflexivity**implies .(A2)

**Augmentation**If , then for arbitrary , holds.(A3)

**Transitivity**If and hold, then holds, as well.

**Example 18.2 **
*Derivation by the Armstrong-axioms.* Let and , then is a key:

is given.

1. is augmented by (A2) with .

is given.

3. is augmented by (A2) with .

transitivity (A3) is applied to 2. and 4.

Thus it is shown that is superkey. That it is really a key, follows from algorithm *
Closure(
)
*.

There are other valid inference rules besides (A1)–(A3). The next lemma lists some, the proof is left to the Reader (Exercise 18.1-5).

**Union rule**.**Pseudo transitivity**.**Decomposition***If holds and , then holds, as well.*

The soundness of system (A1)–(A3) can be proven by easy induction on the length of the derivation. The completeness will follow from the proof of correctness of algorithm *
Closure(
)
* by the following lemma. Let denote the

**Lemma 18.3 **
*The functional dependency follows from the family of functional dependencies by the Armstrong-axioms iff .*

**Proof. **Let where 's are attributes, and assume that . follows by the Armstrong-axioms for all by the definition of . Applying the union rule of Lemma 18.2
follows. On the other hand, assume that can be derived by the Armstrong-axioms. By the decomposition rule of Lemma 18.2
follows by (A1)–(A3) for all . Thus, .

Calculation of closures is important in testing equivalence or logical implication between systems of functional dependencies. The first idea could be that for a given family of functional dependencies in order to decide whether , it is enough to calculate and check whether holds. However, the size of could be exponential in the size of input. Consider the family of functional dependencies given by

consists of all functional dependencies of the form , where , thus . Nevertheless, the closure of an attribute set with respect to can be determined in linear time of the total length of functional dependencies in . The following is an algorithm that calculates the closure of an attribute set with respect to . The input consists of the schema , that is a finite set of attributes, a set of functional dependencies defined over , and an attribute set .

*
Closure(
)
*

1 2 3 Functional dependencies not used yet. 45 6`REPEAT`

all in 7`FOR`

`DO`

8`IF`

9 10 11`THEN`

`UNTIL`

It is easy to see that the attributes that are put into any of the 's by *
Closure(
)
* really belong to . The harder part of the correctness proof of this algorithm is to show that each attribute belonging to will be put into some of the 's.

**Theorem 18.4 **
*
*

`Closure(`

`)`

correctly calculates .
**Proof. **First we prove by induction that if an attribute is put into an during *
Closure(
)
*, then really belongs to .

*Base case:*
. I this case and by reflexivity (A1) .

*Induction step:* Let and assume that . is put into , because there is a functional dependency in , where and . By induction, holds, which implies using Lemma 18.3 that holds, as well. By transitivity (A3) and implies . By reflexivity (A1) and , holds. Applying transitivity again, is obtained, that is .

On the other hand, we show that if , then is contained in the result of *
Closure(
)
*. Suppose in contrary that , but , where is the result of

`Closure(`

`)`

Let us suppose that the above violates a functional dependency of , that is , but is not a subset of . However, in this case *
Closure(
)
* could not have stopped yet, since .

implies using Lemma 18.3 that follows from by the Armstrong-axioms. (A1)–(A3) is a sound system of inference rules, hence in every instance that satisfies , must hold. However, the only way this could happen in instance is if .

Let us observe that the relation instance given in the proof above provides the completeness proof for the Armstrong-axioms, as well. Indeed, the closure calculated by *
Closure(
)
* is the set of those attributes for which follows from by the Armstrong-axioms. Meanwhile, for every other attribute , there exist two rows of that agree on , but differ in , that is

The running tome of *
Closure(
)
* is , where is the length of he input. Indeed, in the

`rEPEAT`

`uNTIL`

For every yet unused dependency of it is kept track of how many attributes of are not yet included in the closure ().

For every attribute those yet unused dependencies are kept in a doubly linked list whose left side contains .

Those not yet used dependencies are kept in a linked list , whose left side 's every attribute is contained in the closure already, that is for which .

It is assumed that the family of functional dependencies is given as a set of attribute pairs , representing . The *
Linear-Closure(
)
* algorithm is a modification of

`Closure(`

`)`

Algorithm *
Linear-Closure(
)
* consists of two parts. In the initialisation phase (lines 1–13) the lists are initialised. The loops of lines 2–5 and 6–8, respectively, take time. The loop in lines 9–11 means steps. If the length of the input is denoted by , then this is steps altogether.

During the execution of lines 14–23, every functional dependency is examined at most once, when it is taken off from list . Thus, lines 15–16 and 23 take at most steps. The running time of the loops in line 17–22 can be estimated by observing that the sum is decreased by one in each execution, hence it takes steps, where is the value obtained in the initialisation phase. However, , thus lines 14–23 also take time in total.

*
Linear-Closure(
)
*

1 Initialisation phase. 2all 3`FOR`

`DO`

all 4`FOR`

add to list 5 6`DO`

all 7`FOR`

`DO`

all of list 8`FOR`

9`DO`

all 10`FOR`

`DO`

11`IF`

add to list 12 13 End of initialisation phase. 14`THEN`

is nonempty 15`WHILE`

16 delete from list 17`DO`

all 18`FOR`

`DO`

all of list 19`FOR`

20`DO`

21`IF`

add to list 22 delete from list 23 24`THEN`

`RETURN`

Algorithm *
Linear-Closure(
)
* can be used to test equivalence of systems of dependencies. Let and be two families of functional dependencies. and are said to be

`Linear-Closure(`

`)`

**Definition 18.5 **
*The system of functional dependencies is a minimal cover of the family of functional dependencies iff is equivalent with , and*

*functional dependencies of are in the form , where is an attribute and ,**no functional dependency can be dropped from , i.e., ,**the left sides of dependencies in are minimal, that is , .*

Every set of functional dependencies have a minimal cover, namely algorithm *
Minimal-Cover(
)
* constructs one.

*
Minimal-Cover(
)
*

1 2all 3`FOR`

`DO`

all 4`FOR`

5 Each right hand side consists of a single attribute. 6`DO`

all 7`FOR`

`DO`

there exists 8 9 Each left hand side is minimal. 10`WHILE`

all 11`FOR`

`DO`

12`IF`

13 No redundant dependency exists.`THEN`

After executing the loop of lines 2–4, the right hand side of each dependency in consists of a single attribute. The equality follows from the union rule of Lemma 18.2 and the reflexivity axiom. Lines 6–8 minimise the left hand sides. In line 11 it is checked whether a given functional dependency of can be removed without changing the closure. is the closure of attribute set with respect to the family of functional dependencies .

**Claim 18.6 **
*
*

`Minimal-Cover(`

`)`

calculates a minimal cover of .
**Proof. **It is enough to show that during execution of the loop in lines 10–12, no functional dependency is generated whose left hand side could be decreased. Indeed, if a dependency would exist, such that for some
held, then would also hold, where is the set of dependencies considered when is checked in lines 6–8. , which implies (see Exercise 18.1-1). Thus, should have been decreased already during execution of the loop in lines 6–8.

In database design it is important to identify those attribute sets that uniquely determine the data in individual records.

**Definition 18.7 **
*Let be a relational schema. The set of attributes is called a superkey, if . A superkey is called a key, if it is minimal with respect to containment, that is no proper subset is key.*

The question is how the keys can be determined from ? What makes this problem hard is that the number of keys could be super exponential function of the size of . In particular, Yu and Johnson constructed such relational schema, where , but the number of keys is . Békéssy and Demetrovics gave a beautiful and simple proof of the fact that starting from functional dependencies, at most key can be obtained. (This was independently proved by Osborne and Tompa.)

The proof of Békéssy and Demetrovics is based on the operation they introduced, which is defined for functional dependencies.

**Definition 18.8 **
*Let and be two functional dependencies. The binary operation is defined by*

Some properties of operation is listed, the proof is left to the Reader (Exercise 18.1-3). Operation is associative, furthermore it is idempotent in the sense that if and for some , then .

**Claim 18.9 **
**(Békéssy and Demetrovics)**
*Let be a relational schema and let be a listing of the functional dependencies. If is a key, then , where is an ordered subset of the index set , and is a trivial dependency in the form .*

Proposition 18.9 bounds in some sense the possible sets of attributes in the search for keys. The next proposition gives lower and upper bounds for the keys.

**Claim 18.10 **
*Let be a relational schema and let . Let us assume without loss of generality that . Let and . If is a key in the schema , then*

The proof is not too hard, it is left as an exercise for the Reader (Exercise 18.1-4). The algorithm *
List-keys(
)
* that lists the keys of the schema is based on the bounds of Proposition 18.10. The running time can be bounded by , but one cannot expect any better, since to list the output needs that much time in worst case.

*
List-Keys(
)
*

1 Let and be as defined in Proposition 18.10 23`IF`

`THEN`

4 is the only key. 5`RETURN`

6`IF`

`THEN`

7 is the only key. 8 9`RETURN`

all permutations of the attributes of 10`FOR`

11`DO`

`FOR`

12`TO`

13`DO`

14`IF`

15 16`THEN`

`RETURN`

**Exercises**

18.1-1 Let be a relational schema and let and be families of functional dependencies over . Show that

*a.*
.

*b.*
.

*c.* If , then .

Formulate and prove similar properties of the closure – with respect to – of an attribute set .

18.1-2 Derive the functional dependency from the set of dependencies using Armstrong-axioms (A1)–(A3).

18.1-3 Show that operation is associative, furthermore if for functional dependencies we have and for some , then .

18.1-4 Prove Proposition 18.10.

18.1-5 Prove the union, pseudo transitivity and decomposition rules of Lemma 18.2.