Data Science - Part 2

Name of the subject in Hungarian: Adatbányászat - 2

Last updated: November 10, 2010

Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics

Computer Engineering major

BSc program

Course ID: VISZA084
Semester:
Assessment: 0/0/2/f
Credit: 2
Subject semester:
3. Course coordinator and department: Dr. Katona Gyula,
Web page of the course: www.cs.bme.hu/....
4. Instructors

Name               Position          Department
András A. BENCZÚR  Lecturer          Department of Computer Science and Information Theory
András LUKÁCS      Lecturer          Department of Computer Science and Information Theory
Gyula Y. KATONA    Assoc. professor  Department of Computer Science and Information Theory
Gábor WIENER       Assoc. professor  Department of Computer Science and Information Theory

5. Required knowledge

The course requires basic knowledge of data mining (see also the course Data Mining: Models and Algorithms). A background in probability theory and linear algebra is important. Knowledge of combinatorics and algorithms is an advantage.

May be studied in the same semester as "Data mining - Part 1".

7. Objectives, learning outcomes and obtained knowledge

The aim of the course is to discuss advanced data mining techniques, together with useful knowledge from related disciplines, in support of real-world data mining projects, especially in bioinformatics. By the end of the course, students will be able to analyze biological (genomic, microarray, pathway, protein, chemical) data sets using complex data mining methods.

8. Synopsis

1.  Advanced classification methods: Bagging, boosting, AdaBoost.
2.  Random forest. Implementation of models in WEKA.
3.  Support Vector Machine. Kernel methods, graph kernels. Protein function prediction.
4.  Similarity measures, fingerprint-based similarity search. Sketches.
5.  Dimensionality reduction by spectral methods, singular value decomposition, low-rank approximation.
6.  Spectral clustering, bi-clustering for microarrays.
7.  Mixture models. Maximum likelihood estimators, EM-algorithm.
8.  Gaussian mixture models. Midterm test.
9.  Search engines, web information retrieval, PageRank and beyond.
10. Rank learning for Protein Structure Prediction.
11. Text mining, natural language processing. Building databases and networks from PubMed and BioMed Central.
12. Graph mining algorithms. Frequent subgraph mining in microarray-based co-expression networks.
13. Semi-supervised classification of network data, graph stacking in biological networks.
14. Feature selection methods for unbalanced data sets. Final test.

 

9. Method of instruction

Handouts, PowerPoint presentations, relevant research papers, course web page, mailing list and wiki. Regular weekly office hours for consultation.

10. Assessment

Case study on a practical problem: choosing the model and the solution method, and implementing the algorithm.

Grading principles:

Model: 40%
Solution method: 40%
Implementation: 20%

12. Consultations

You can reach the instructor at the following e-mail address for consultation:

András A. Benczúr: benczur@ilab.sztaki.hu

13. References, textbooks and resources

 

14. Required learning hours and assignment

Number of contact hours: 28
Preparation for classes: 12
Preparation for tests:
Homework: 20
Assigned reading:
Preparation for the exam:
Total: 60
15. Syllabus prepared by

Name               Position  Department
András A. Benczúr  Lecturer  Department of Computer Science and Information Theory

Comments

May be studied in the same semester as "Data mining - Part 1".