Databases and Data Mining

Course offered in the second semester of the M1.

This course gives a general introduction to data management through relational databases and to data mining, from algorithms to theoretical aspects.

Topics covered:

Databases:

  • Relational model, relational calculus, SQL
  • Relational algebra, equivalence between relational calculus and algebra, indexing and optimisation
  • Functional dependencies, axiomatisation, Armstrong’s relations
  • Inclusion dependencies, data exchange, chase, query rewriting
  • Openness to current issues

Data Mining:

  • Introduction to data mining (itemset/rule mining, enumeration algorithms)
  • Constraint-based pattern mining: study of constraint properties and pruning
  • Assessment of results (statistical significance, relevance w.r.t. background knowledge, user priors)
  • Advanced pattern mining: formal concept analysis, sequences, graphs, dynamic graphs, attributed graphs
  • Clustering: general aims, distances, similarities, algorithms (k-means, EM, density-based clustering, spectral clustering, hierarchical clustering), graph clustering
  • Anomaly detection
  • Current issues and challenges

Teaching methods: lectures and lab sessions (TP).
Form(s) of assessment: written exam (50%) and project(s) (50%).

References:

  • M. Levene, G. Loizou. A Guided Tour of Relational Databases and Beyond. Springer, 1999.
  • S. Abiteboul, R. Hull and V. Vianu. Fondements des bases de données. Addison-Wesley, 1995.
  • R. Ramakrishnan and J. Gehrke. Database Management Systems, third edition. 2003.

    Material available at pages.cs.wisc.edu/~dbbook/

  • Online course: https://cs.stanford.edu/people/widom/DB-mooc.html
  • Charu C. Aggarwal. Data Mining. Springer, 2015.
  • Mohammed J. Zaki, Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, 2014.

Expected prior knowledge:

Link:

https://perso.liris.cnrs.fr/ecoquery/dokuwiki/doku.php?id=enseignement:dbdm:start