Very Large Databases

Name of the subject in Hungarian: Nagyméretű adathalmazok kezelése

Last updated: 2 December 2015

Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics

Engineering Information Technology, MSc

Theory of Computation, minor specialization

Course ID    Semester    Assessment    Credit    Course semester
VISZMA01     2           2/1/0/v       4
3. Course coordinator and department: Dr. Gyula Katona, Department of Computer Science and Information Theory
Web page of the course: cs.bme.hu/nagyadat
4. Instructors

Dr. Gyula Katona

associate professor

Department of Computer Science and Information Theory

Bálint Daróczy 

PhD

MTA SZTAKI

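5. Required knowledge: Database theory, graph theory, basic algorithmic techniques
6. Pre-requisites
Mandatory:
NEM ( TárgyEredmény( "BMEVISZM144" , "jegy" , _ ) >= 2
VAGY
TárgyEredmény("BMEVISZM144", "FELVETEL", AktualisFelev()) > 0)

In this notation NEM = NOT, VAGY = OR, "jegy" = grade, "FELVETEL" = enrollment and AktualisFelev() = the current semester. The course therefore cannot be taken by a student who has already received a passing grade (2 or better) in BMEVISZM144, or who is enrolled in BMEVISZM144 in the current semester.

The above is Neptun's own notation; for technical reasons it has not been changed.

The mandatory prerequisite requirements are listed on the website and in the curriculum of the respective programme.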

7. Objectives, learning outcomes and obtained knowledge: The course gives an overview of the special theoretical and practical problems that arise when handling large data sets. Students gain insight into modern trends and into the theoretical and practical questions of data mining, relational databases, large graphs and data streams.
8. Synopsis
1. Basic tasks of machine learning, discriminative and generative models, attribute types.

2. Nearest neighbor search: normalization, distance measures.

3. Decision trees: tree-building models (C4.5, regression trees), purity measures, splits.

4. Pre-pruning and post-pruning, handling of continuous variables.

5. Naive Bayes: handling continuous variables, m-estimate.

6. Perceptron: activation function, stochastic gradient descent.

7. Clustering: centroid-based methods (k-means, bisecting k-means).

8. Density-based methods (DBSCAN, OPTICS), hierarchical clustering (linkage).

9. Recommendation systems: collaborative filtering (matrix factorization, nearest neighbor methods), content-based recommendation.

10. Search: index building, ranking (TF-IDF, BM25, PageRank).

11. Support vector machines (SVM): maximum margin, kernel functions.

12. Principal Component Analysis (PCA).

13. Artificial Neural Networks (ANN): unsupervised case (Restricted Boltzmann Machines).

14. Artificial Neural Networks (ANN): supervised case (multilayer perceptron).
9. Method of instruction: Lectures and computer-aided practice sessions.
10. Assessment

Signature (requirement for completing the course):

Two midterm tests; each must reach at least 40%. Homework is optional; points earned on it are added to the midterm results.

Final:

The grade is based on the midterm results and can be improved at an oral exam.

13. References, textbooks and resources: Tan, Steinbach, Kumar: Introduction to Data Mining, Pearson Education, 2nd revised edition (2013).
14. Required learning hours and assignment (hours)
In class                      42
Preparation for classes       28
Preparation for midterms      20
Homework                      15
Reading assignment
Preparation for final         15
Total                        120
Comments

Dr. Gyula Katona

associate professor

Department of Computer Science and Information Theory

Bálint Daróczy 

PhD

MTA SZTAKI