Lossless Joins |
A set of relations satisfies the lossless join property if the instances can
be joined without creating invalid data (i.e. new rows). The term lossless join
may be somewhat confusing. A join that is not lossless will contain extra,
invalid rows. A join that is lossless will not contain extra, invalid rows. Thus
the term gainless join might be more appropriate.
To give an example of incorrect information created by an invalid join let us
take the following data structure:
R(student, course, instructor, hour, room, grade)
Assuming that only one section of a class is offered during a semester we can
define the following functional dependencies:
- (HOUR, ROOM)
COURSE
- (COURSE, STUDENT)
GRADE
- (INSTRUCTOR, HOUR)
ROOM
- (COURSE)
INSTRUCTOR
- (HOUR, STUDENT)
ROOM
Take the following sample data:
STUDENT |
COURSE |
INSTRUCTOR |
HOUR |
ROOM |
GRADE |
Smith |
Math 1 |
Jenkins |
8:00 |
100 |
A |
Jones |
English |
Goldman |
8:00 |
200 |
B |
Brown |
English |
Goldman |
8:00 |
200 |
C |
Green |
Algebra |
Jenkins |
9:00 |
400 |
A |
The following four relations, each in 4th normal form, can be generated from
the given and implied dependencies:
R1(STUDENT, HOUR, COURSE)
R2(STUDENT, COURSE, GRADE)
R3(COURSE, INSTRUCTOR)
R4(INSTRUCTOR, HOUR, ROOM)
Note that the dependencies (HOUR, ROOM)
COURSE and (HOUR, STUDENT)
ROOM are not explicitly represented in the preceding decomposition. The goal is
to develop relations in 4th normal form that can be joined to answer any ad hoc
inquiries correctly. This goal can be achieved without representing every
functional dependency as a relation. Furthermore, several sets of relations may
satisfy the goal.
|