BME VIK - Nagyméretű adathalmazok kezelése

Name of the subject in Hungarian: Nagyméretű adathalmazok kezelése

3. Course coordinator and department Dr. Gyula Katona, Department of Computer Science and Information Theory

Web page of the course cs.bme.hu/nagyadat

4. Instructors

Dr. Gyula Katona	associate professor	Department of Computer Science and Information Theory
Bálint Daróczy	PhD	MTA SZTAKI

5. Required knowledge Database theory, graph theory, basic algorithmic techniques

6. Pre-requisites

Mandatory:

NEM ( TárgyEredmény( "BMEVISZM144" , "jegy" , _ ) >= 2
VAGY
TárgyEredmény("BMEVISZM144", "FELVETEL", AktualisFelev()) > 0)

The expression above is in Neptun's own syntax (NEM = NOT, VAGY = OR) and has been left unchanged for technical reasons.

The mandatory prerequisites are listed on the website of the given degree programme and in its curriculum.

7. Objectives, learning outcomes and obtained knowledge The course gives an overview of the special theoretical and practical problems that arise when handling large data sets. Students gain insight into current trends in the field and into theoretical and practical questions of data mining, relational databases, large graphs, and data streams.

8. Synopsis

1. Basic tasks of machine learning, discriminative and generative models, attribute types.

2. Nearest neighbor search: normalization, distance.

3. Decision trees: tree-building models (C4.5, regression trees), purity measures, splits.

4. Pre-pruning and post-pruning, handling of continuous variables.

5. Naive Bayes: handling of continuous variables, m-estimate.

6. Perceptron: activation function, stochastic gradient.

7. Clustering: centroid-based methods (k-means, bisecting k-means).

8. Density-based methods (DBSCAN, OPTICS), hierarchical clustering (linkage).

9. Recommender systems: collaborative filtering (matrix factorization, nearest neighbor methods), content-based recommendation.

10. Search: index building, ranking (TF-IDF, BM25, PageRank)

11. Support vector machines (SVM): maximal margin, kernel functions

12. Principal Component Analysis (PCA)

13. Artificial neural networks (ANN): unsupervised case (restricted Boltzmann machines).

14. Artificial neural networks (ANN): supervised case (multilayer perceptron).

9. Method of instruction Lectures and computer-aided practice sessions.

10. Assessment

Signature:

Two midterms, each of which must be completed with a result of at least 40%. Homework is optional; the extra points earned are added to the midterm results.

Final:

The grade is based on the midterm results and can be improved in an oral exam.

13. References, textbooks and resources Tan, Steinbach, Kumar: Introduction to Data Mining, Pearson Education, 2nd revised edition (2013)

14. Required learning hours and assignment

In class	42
Preparation for classes	28
Preparation for midterms	20
Homework	15
Reading assignment
Preparation for final	15
Total	120

Comments

Dr. Gyula Katona	associate professor	Department of Computer Science and Information Theory
Bálint Daróczy	PhD	MTA SZTAKI