Functional Dependencies
Notation
In Database theory: [math]\{A,B\}\to \{C,D\}[/math] is abbreviated to: [math]AB\to CD[/math]
Simple hierarchical Example
9 single entries for 2022 sorted by social media followers
SPARQL query
result as of 2022-06-13
A | B | C | D | E | F | G | H | I | J |
---|---|---|---|---|---|---|---|---|---|
single | language | collectionId | collection | collectionType | performerId | performer | followerCount | youtubeVideoId | publicationYear |
Let Somebody Go | English | http://www.wikidata.org/entity/Q107597380 | Music of the Spheres | album | http://www.wikidata.org/entity/Q83287 | Selena Gomez | 65629044 | EptPhiK_q0E | 2022 |
As It Was | English | http://www.wikidata.org/entity/Q111343757 | Harry's House | album | http://www.wikidata.org/entity/Q3626966 | Harry Styles | 36156370 | H5v3kku4y6Q | 2022 |
Let Somebody Go | English | http://www.wikidata.org/entity/Q107597380 | Music of the Spheres | album | http://www.wikidata.org/entity/Q45188 | Coldplay | 23541189 | EptPhiK_q0E | 2022 |
Bam Bam | English | http://www.wikidata.org/entity/Q56071495 | Camila Cabello singles discography | singles discography | http://www.wikidata.org/entity/Q47447 | Ed Sheeran | 17661915 | -8VfKZCOo_I | 2022 |
Bam Bam | English | http://www.wikidata.org/entity/Q111083420 | Familia | album | http://www.wikidata.org/entity/Q47447 | Ed Sheeran | 17661915 | -8VfKZCOo_I | 2022 |
Sacrifice | English | http://www.wikidata.org/entity/Q110400486 | Dawn FM | album | http://www.wikidata.org/entity/Q2121062 | The Weeknd | 15549868 | VafTMsrnSTU | 2022 |
Out of Time | English | http://www.wikidata.org/entity/Q110400486 | Dawn FM | album | http://www.wikidata.org/entity/Q2121062 | The Weeknd | 15549868 | 2fDzCWNS3ig | 2022 |
Bam Bam | English | http://www.wikidata.org/entity/Q56071495 | Camila Cabello singles discography | singles discography | http://www.wikidata.org/entity/Q18810940 | Camila Cabello | 12717885 | -8VfKZCOo_I | 2022 |
Bam Bam | English | http://www.wikidata.org/entity/Q111083420 | Familia | album | http://www.wikidata.org/entity/Q18810940 | Camila Cabello | 12717885 | -8VfKZCOo_I | 2022 |
Functional Dependencies
The functional dependencies in this example are derived from the structure of the SPARQL query
- A≡single
- B≡language
- C≡collectionId
- D≡collection
- E≡collectionType
- F≡performerId
- G≡performer
- H≡followerCount
- I≡youtubeVideoId
- J≡publicationYear
- [math]A\to BCFIJ[/math]
- [math]C\to DE[/math]
- [math]F\to GH[/math]
[math]R={{\{A,B,C,D,E,F,G,H,I,J\}},{A\to BCFIJ,C\to DE,F\to GH}}[/math]
functional dependencies diagram
The following diagram shows the functional dependencies of this example. The nodes represent the attributes and the edges the dependencies.
Attribute Closures
The attribute closure is the set of all attributes that are "reachable/determinable" from a given attribute node. In this example the results are quite obvious since the functional dependencies form a tree and the reachable nodes are all nodes of the subtree of a given attribute.
attribute | closure |
---|---|
A | ABCDEFGHIJ |
B | B |
C | CDE |
D | D |
E | E |
F | FGH |
G | G |
H | H |
I | I |
J | J |
Keys for the example
There is only a single candidate key: A≡single
It is the only attribute that has the full set of attributes as it's closure. A≡single is also a super key and primary key
With a slightly modified query we'll see that wikidata uses a surrogate key as the primary key for all items. Querying Wikidata's graph data more often than not leads to results that need to be normalized to be useful in the context of relational databases.
Extending the above query with a column for the wikidata Id of the single shows how the surrogate key is related to the single entries.
9 single entries for 2022 sorted by social media followers with wikidata Id link
query
result
Normal forms
1st Normal Form - 1NF
A relation is in first normal form if and only if no attribute domain has relations as elements Wikipedia Article on First normal form
- attribute domain means the type of attribute (think of attribute a cell of a table in spreadsheet and the domain as the type you selected for the cells of a column)
- the type not having releations is sometime also describe as "atomic" meaning having a base type such a string, float, integer and not tuples
- an example for a cell having a relations a elements would be having LAT and LON of a geocoordinate in the same cell
- if you have lists of values or even full tables in a cell than the relation is surely not in 1st normal Form 1NF
- if a single cell in a column violates 1NF the whole columns type is violating 1NF which makes the whole table violate 1NF
- in modern systems where you want to be lenient with user input its often hard to enforce 1NF - database designers might try to "cutting corners" around this by treating the type
as string event if it contains relations data such as a full list of comma separated elements, a json or xml encoded or even uuencoded binary string.
The example above is in first normal form the types of the columns are:
2nd Normal Form
3rd Normal Form
Boyce Codd Normal Form
Synthesis
Links
Wikipedia
The following graph shows some relevant wikipedia articles. If the nodes are not clickable in your browser you might want to open the svg image in a separate tab to get a clickable version of this graph.
- https://en.wikipedia.org/wiki/Category:Database_theory
- https://en.wikipedia.org/wiki/Functional_dependency
- https://en.wikipedia.org/wiki/Database_normalization
- https://en.wikipedia.org/wiki/First_normal_form
- https://en.wikipedia.org/wiki/Second_normal_form
- https://en.wikipedia.org/wiki/Third_normal_form
- https://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form
- https://en.wikipedia.org/wiki/Lossless_join_decomposition
- https://en.wikipedia.org/wiki/Candidate_key
- https://en.wikipedia.org/wiki/Surrogate_key
- https://en.wikipedia.org/wiki/Superkey