Data Mining:
Statistical Modeling and Learning from Data
Time | Monday: General Concepts | Tuesday: Linear Models | Wednesday: Unsupervised ML, Non-linear Models | Thursday: SVM and VC Theory | Friday: Evaluation |
9:30-10:30 | Theory | Theory | Theory | Theory | Individual Evaluation |
10:30-10:45 | Break |
10:45-11:45 | Practice | Practice | Practice | Practice | Individual Evaluation |
11:45-13:30 | Lunch Break |
13:30-14:30 | Theory | Theory | Theory | Theory | Group Project |
14:30-15:30 | Theory | Practice | Theory | Theory | |
15:45-16:45 | Practice | Practice | Practice | Group Project Presentation |
- Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin, "Learning from Data", AMLBook 2012
- David J. Hand, Heikki Mannila, Padhraic Smyth, "Principles of Data Mining", MIT Press 2001
One part of the final evaluation will be a group project with an oral presentation of the results. The project will involve submitting a solution to a Kaggle class competition. If you are not familiar with how Kaggle works, we strongly recommend that you try making a submission to one of its competitions.
General concepts of machine learning (learning problem, approximation-generalization, learning curve…)
Linear models (linear regression, logistic regression, Lasso)
Non-linear models (SVM, naive Bayes, decision trees, neural networks)
Unsupervised ML (SVD, NMF, k-means, text analysis)
The course aims to provide basic skills for the analysis and statistical modeling of data, with special attention to machine learning, both supervised and unsupervised. An important objective is operational knowledge of the techniques and algorithms covered, so the lectures will focus on both the theoretical and the practical aspects of machine learning. The practical part requires a good knowledge of programming, preferably in Python. The expected outcomes include (1) an understanding of the theoretical foundations of machine learning and (2) the ability to use some Python libraries for machine learning in simple applications.
Topics will include:
An overview of the theoretical aspects of machine learning will be followed by the application of algorithms to real problems such as image classification, text mining and spam detection. The exercises will be implemented in an interactive Python environment, using standard tools for data analysis and visualization such as the Scientific Python stack, scikit-learn, pandas and NLTK.
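To give a flavour of this kind of exercise, here is a minimal sketch of spam detection with a multinomial naive Bayes classifier. It is an illustration only, not course material: the toy messages, the labels and the helper names (`tokenize`, `train_naive_bayes`, `predict`) are all hypothetical, and the sketch uses only the Python standard library so that it runs without any installation. In scikit-learn, `CountVectorizer` and `MultinomialNB` provide comparable functionality.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Log prior: fraction of training documents in class c.
        score = math.log(class_counts[c] / n_docs)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, invented for this illustration.
train_docs = [
    "win money now",
    "free prize claim now",
    "cheap money offer",
    "meeting at noon tomorrow",
    "lecture notes attached",
    "see you at the lecture",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "claim your free money"))   # spam
print(predict(model, "notes from the meeting"))  # ham
```

With only six training messages the classifier simply memorizes which words belong to which class; a realistic exercise would hold out part of the data to estimate the out-of-sample error.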