Data Mining Algorithms

Name of the subject in Hungarian: Adatbányászati algoritmusok

Last updated: 27 January 2017

Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics
Course ID: VISZD308
Assessment: 4/0/0/v (weekly hours of lecture/practice/laboratory: 4/0/0; v = exam)
Credit: 5
3. Course coordinator and department: Dr. Gyula Katona, Department of Computer Science and Information Theory
Web page of the course: http://www.cs.bme.hu/adatalg
4. Instructors
Bálint Daróczy, HAS Institute for Computer Science and Control
Dr. Gyula Katona, Department of Computer Science and Information Theory
5. Required knowledge

- linear algebra

- basic programming techniques in any programming language 

7. Objectives, learning outcomes and obtained knowledge
- Introduction to the fundamental concepts and methods of data mining and data science
- Practical application of data science using widely used tools
8. Synopsis

- Linear and polynomial regression in one and multiple dimensions; optimization: gradient descent and least squares

- Supervised learning (classification): nearest neighbour methods, decision trees, logistic regression, non-linear classification, neural networks, support vector networks, time series classification and dynamic time warping

- Advanced classification methods: semi-supervised learning, multi-class classification, multi-task learning, ensemble methods (bagging, boosting, stacking, Dietterich's ensembles of classifiers)

- Evaluation of classifiers: cross-validation, bias-variance trade-off

- Clustering: k-means (k-medoids, FurthestFirst), hierarchical clustering, Kleinberg's impossibility theorem, internal and external cluster evaluation, speed of convergence

- Principal component analysis, low-rank approximation, collaborative filtering and applications (recommender systems, drug-target prediction)

- Density estimation and anomaly detection

- Frequent itemset mining

- Processing and mining of biomedical data (next-generation sequencing, gene expression, biomedical time series)

- Additional applications and problems: preprocessing, scaling, overfitting, hyperparameter optimization, imbalanced classification

- Tools: Octave/Matlab, Python, R, Hadoop

9. Method of instruction: two 2-hour lectures per week
10. Assessment

- during the semester: 5 homework assignments

- final: oral exam 

12. Consultations: in office hours or by appointment.
13. References, textbooks and resources

Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining
http://www-users.cs.umn.edu/~kumar/dmbook/index.php

Ferenc Bodon, Krisztián Buza: Adatbányászat (Data Mining), electronic lecture notes
http://www.cs.bme.hu/~buza/pdfs/adatbanyaszat-cover.pdf

14. Required learning hours and assignments
Contact hours: 56
Preparation for classes during the semester: 14
Preparation for in-class tests: 25
Completing homework assignments: 25
Studying assigned written material: 0
Exam preparation: 30
Total: 150
15. Syllabus prepared by

Dr. Krisztián Buza, research fellow, MTA-TTK (HAS Research Centre for Natural Sciences)

Bálint Daróczy, HAS Institute for Computer Science and Control