Fifth normal form

Fifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is in 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys. A join dependency *{A, B, …, Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.[1]

5NF is the final normal form as far as removing redundancy is concerned. A sixth normal form (6NF) also exists, but its purpose is not to remove redundancy, and it is therefore adopted only by a few data warehouses, where it can be useful to make tables irreducible. The fifth normal form was first described by Ronald Fagin in his 1979 conference paper Normal forms and relational database operators.[2]

Example

Consider the following example:

Traveling-salesman product availability by brand
Traveling salesman | Brand   | Product type
Jack Schneider     | Acme    | Vacuum cleaner
Jack Schneider     | Acme    | Breadbox
Mary Jones         | Robusto | Pruning shears
Mary Jones         | Robusto | Vacuum cleaner
Mary Jones         | Robusto | Breadbox
Mary Jones         | Robusto | Umbrella stand
Louis Ferguson     | Robusto | Vacuum cleaner
Louis Ferguson     | Robusto | Telescope
Louis Ferguson     | Acme    | Vacuum cleaner
Louis Ferguson     | Acme    | Lava lamp
Louis Ferguson     | Nimbus  | Tie rack

The table's predicate is: products of the type designated by product type, made by the brand designated by brand, are available from the traveling salesman designated by traveling salesman. The primary key is the composite of all three columns. The table is also in 4NF, since it contains no multivalued dependencies (2-part join dependencies): no column (which by itself is not a candidate key or a superkey) is a determinant for the other two columns.

In the absence of any rules restricting the valid possible combinations of traveling salesman, brand, and product type, the three-attribute table above is necessary in order to model the situation correctly. Suppose, however, that the following rule applies:

A traveling salesman has certain brands and certain product types in their repertoire. If brand B1 and brand B2 are in their repertoire, and product type P is in their repertoire, then (assuming brand B1 and brand B2 both make product type P) the traveling salesman must offer products of product type P made by brand B1 as well as those made by brand B2.

In that case, it is possible to split the table into three:

Product types by traveling salesman
Traveling salesman | Product type
Jack Schneider     | Vacuum cleaner
Jack Schneider     | Breadbox
Mary Jones         | Pruning shears
Mary Jones         | Vacuum cleaner
Mary Jones         | Breadbox
Mary Jones         | Umbrella stand
Louis Ferguson     | Telescope
Louis Ferguson     | Vacuum cleaner
Louis Ferguson     | Lava lamp
Louis Ferguson     | Tie rack

Brands by traveling salesman
Traveling salesman | Brand
Jack Schneider     | Acme
Mary Jones         | Robusto
Louis Ferguson     | Robusto
Louis Ferguson     | Acme
Louis Ferguson     | Nimbus

Product types by brand
Brand   | Product type
Acme    | Vacuum cleaner
Acme    | Breadbox
Acme    | Lava lamp
Robusto | Pruning shears
Robusto | Vacuum cleaner
Robusto | Breadbox
Robusto | Umbrella stand
Robusto | Telescope
Nimbus  | Tie rack
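
The decomposition can be checked mechanically. The following is a minimal sketch that uses Python sets as a stand-in for relational tables; the helper names project and join_three are illustrative assumptions, not part of any database API. It projects the three-column table onto each pair of columns and confirms that the natural join of the three projections returns exactly the original rows.

```python
# Rows of the original table: (traveling salesman, brand, product type).
ORIGINAL = {
    ("Jack Schneider", "Acme", "Vacuum cleaner"),
    ("Jack Schneider", "Acme", "Breadbox"),
    ("Mary Jones", "Robusto", "Pruning shears"),
    ("Mary Jones", "Robusto", "Vacuum cleaner"),
    ("Mary Jones", "Robusto", "Breadbox"),
    ("Mary Jones", "Robusto", "Umbrella stand"),
    ("Louis Ferguson", "Robusto", "Vacuum cleaner"),
    ("Louis Ferguson", "Robusto", "Telescope"),
    ("Louis Ferguson", "Acme", "Vacuum cleaner"),
    ("Louis Ferguson", "Acme", "Lava lamp"),
    ("Louis Ferguson", "Nimbus", "Tie rack"),
}


def project(rows, *indexes):
    """Project each row onto the given column positions, dropping duplicates."""
    return {tuple(row[i] for i in indexes) for row in rows}


def join_three(salesman_product, salesman_brand, brand_product):
    """Natural join of the three two-column tables into three-column rows."""
    return {
        (salesman, brand, product)
        for (salesman, brand) in salesman_brand
        for (other_brand, product) in brand_product
        if brand == other_brand and (salesman, product) in salesman_product
    }


salesman_product = project(ORIGINAL, 0, 2)  # product types by traveling salesman
salesman_brand = project(ORIGINAL, 0, 1)    # brands by traveling salesman
brand_product = project(ORIGINAL, 1, 2)     # product types by brand

rejoined = join_three(salesman_product, salesman_brand, brand_product)
print(rejoined == ORIGINAL)  # True: the decomposition is lossless for this data
```

The equality holds here because the sample data satisfies the rule stated above; for data that violates the rule, the join would produce rows that the original table lacks.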

In this case, it is impossible for Louis Ferguson to refuse to offer vacuum cleaners made by Acme (assuming Acme makes vacuum cleaners) if he sells anything else made by Acme (lava lamps) and he also sells vacuum cleaners made by any other brand (Robusto).

Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling Robusto's breadboxes and vacuum cleaners. In the previous setup we would have to add two new entries, one for each product type (<Jack Schneider, Robusto, Breadbox> and <Jack Schneider, Robusto, Vacuum cleaner>). With the new setup we need to add only a single entry (<Jack Schneider, Robusto>) in "Brands by traveling salesman".
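
The insertion scenario can be illustrated with a small sketch. It assumes Python sets as stand-ins for the decomposed tables and uses only the rows relevant to Jack Schneider, Acme, and Robusto; the function name join_three is a hypothetical helper. Adding one row to the brands-by-salesman table yields two new rows in the joined view.

```python
# Subset of the decomposed tables that is relevant to Jack Schneider.
salesman_product = {
    ("Jack Schneider", "Vacuum cleaner"),
    ("Jack Schneider", "Breadbox"),
}
salesman_brand = {("Jack Schneider", "Acme")}
brand_product = {
    ("Acme", "Vacuum cleaner"),
    ("Acme", "Breadbox"),
    ("Acme", "Lava lamp"),
    ("Robusto", "Pruning shears"),
    ("Robusto", "Vacuum cleaner"),
    ("Robusto", "Breadbox"),
    ("Robusto", "Umbrella stand"),
    ("Robusto", "Telescope"),
}


def join_three(salesman_product, salesman_brand, brand_product):
    """Natural join of the three two-column tables into three-column rows."""
    return {
        (salesman, brand, product)
        for (salesman, brand) in salesman_brand
        for (other_brand, product) in brand_product
        if brand == other_brand and (salesman, product) in salesman_product
    }


before = join_three(salesman_product, salesman_brand, brand_product)

# A single insertion into "Brands by traveling salesman".
salesman_brand_after = salesman_brand | {("Jack Schneider", "Robusto")}
after = join_three(salesman_product, salesman_brand_after, brand_product)

# The joined view gains one row per product type Jack already sells
# that Robusto also makes.
new_rows = after - before
for row in sorted(new_rows):
    print(row)
# ('Jack Schneider', 'Robusto', 'Breadbox')
# ('Jack Schneider', 'Robusto', 'Vacuum cleaner')
```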

Usage

Only in rare situations does a 4NF table not conform to 5NF; for instance, when the decomposed tables are cyclic. These are situations in which a complex real-world constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within the table must be carried partly by the application responsible for insertions, deletions, and updates to it, and there is a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the possibility of such inconsistencies.

A table T is in fifth normal form (5NF) or projection-join normal form (PJ/NF) if it cannot have a lossless decomposition into any number of smaller tables. The case where all the smaller tables after the decomposition have the same candidate key as the table T is excluded.
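
To make the application-side burden concrete, the following sketch shows the kind of check an application would have to run against an unnormalized three-column table. It assumes Python sets as a stand-in for the table, and the function name missing_rows is hypothetical. The function returns the rows that the join of the table's three two-column projections requires but that the table itself lacks; an empty result means the table satisfies the join dependency.

```python
def missing_rows(rows):
    """Rows implied by the join of the three pairwise projections but absent from rows.

    Each row is a (salesman, brand, product_type) tuple.
    """
    salesman_brand = {(s, b) for (s, b, p) in rows}
    salesman_product = {(s, p) for (s, b, p) in rows}
    brand_product = {(b, p) for (s, b, p) in rows}
    joined = {
        (s, b, p)
        for (s, b) in salesman_brand
        for (b2, p) in brand_product
        if b == b2 and (s, p) in salesman_product
    }
    return joined - rows


# Louis sells an Acme product and sells vacuum cleaners (from Robusto),
# and Acme makes vacuum cleaners, yet the Acme vacuum cleaner row is absent.
inconsistent = {
    ("Louis Ferguson", "Robusto", "Vacuum cleaner"),
    ("Louis Ferguson", "Acme", "Lava lamp"),
    ("Jack Schneider", "Acme", "Vacuum cleaner"),
}

print(missing_rows(inconsistent))
# {('Louis Ferguson', 'Acme', 'Vacuum cleaner')}
```

In the decomposed design no such check is needed, because the three-column rows are derived by joining the smaller tables rather than stored directly.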

References

  1. Analysis of normal forms for anchor-tables
  2. S. Krishna (1991). Introduction to Data Base and Knowledge Base Systems. World Scientific. ISBN 9810206208. "The fifth normal form was introduced by Fagin."
