BME VIK - Media and Text Mining

Media and Text Mining

Name of the subject in Hungarian: Média- és szövegbányászat

Last updated: 21 June 2017

Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics

Business Information Systems
Specialization Analytical Business Intelligence
MSc program

Course ID	Semester	Assessment	Credit	Subject semester
VITMM275		3/0/1/v	5

3. Course coordinator and department: Dr. Szűcs Gábor

4. Instructors

Name:	Position:	Department:
Gábor Szűcs PhD	associate professor	TMIT

5. Required knowledge

Calculus, Algebra, Probability theory

6. Pre-requisites

Recommended:

There is no obligatory prerequisite subject.

However, we recommend taking Data Mining Techniques (BMEVISZM185) before this subject.

7. Objectives, learning outcomes and obtained knowledge

The course introduces students to the identification, assessment and analysis of intelligent information search systems and multimedia retrieval systems. It also covers content handling techniques, where the content may be text, media, or both.

8. Synopsis

Problems in Analytical Business Intelligence at multinational companies. Metadata systems and standards: DC, RDF, MPEG-7.
Typical task types in Media and Text Mining. Search, classification, clustering, forecasting and their combinations.
Methods for media and text analysis, search techniques, indexing, ranking procedures. Bag of words model.
Searching on the Web, Web Mining. PageRank, webgraph methods, HITS, Boolean search, weighting schemes (tf-idf, etc.), cosine distance.
Dimension reduction methods, feature extraction and feature selection techniques, chi-square, eigenvalue based methods, independent component analysis (ICA).
Classification of pictures, videos. Discretization. Types and methods of media classification. Support vector machine for media classification.
Text analysis. Stemming algorithms, Porter stemmer, Lovins stemmer. Language detection, language dependency. Shallow and deep parsing. POS tagging. Syntax tree parsers, dependency graph parser. Stanford tools.
Text classification. Types and methods of text classification. Gini index. C4.5, C5.0, Random Forest. Automatic text processing at enterprises.
Text and media clustering. Various distance measures. Agglomerative and divisive clustering. Hierarchical clustering (bottom-up and top-down), k-means clustering, density-based clustering.
Relation extraction from text. Co-occurrence, pattern-matching and supervised learning methods. Convolution kernels with SVM in relation extraction. Gathering business news, information extraction from the news.
Hierarchical taxonomy systems, catalogue search, thesaurus. Folksonomy, multi-user methods. Concept mining. Annotation. Sentiment analysis.
Content-Based Image Retrieval. Line detection, skeletonization. Images and time series in multimedia.
Media-indexing. Probability models in video and audio searches. Applications of Hidden Markov Models.
Developing media retrieval and search systems in enterprises. Marketing applications, online media applications.
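
As an illustration of the tf-idf weighting and cosine measure named in the synopsis, the following is a minimal sketch in plain Python. It is not taken from the course material: the function names, the whitespace tokenizer, the tf * log(N / df) weighting variant and the three sample documents are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df).

    Assumption: documents are tokenized by lowercasing and splitting on
    whitespace; other tf-idf variants (smoothing, normalization) exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample corpus, for illustration only.
docs = [
    "text mining finds patterns in text",
    "media mining finds patterns in images",
    "stock prices fell sharply today",
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vecs[0], vecs[1])  # documents share several terms
sim_02 = cosine_similarity(vecs[0], vecs[2])  # documents share no terms
```

Cosine distance is then 1 minus the similarity, so documents with no terms in common are at distance 1.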

Laboratory:

The tasks should be solved with data mining and text mining software (e.g. SAS software modules).
Searching techniques in a predefined corpus.
Media classification exercises.
Picture clustering exercises.
Text analysis.
Text categorization.
Content-Based Image Retrieval with a large set of pictures.
Lift diagram analysis.
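
For the lift diagram exercise, one common definition of cumulative lift can be sketched as follows. This is an illustrative assumption, not the procedure used in the laboratory (which relies on tools such as SAS): the function name, the rounding of the cut-off and the sample labels and scores are hypothetical.

```python
def cumulative_lift(y_true, scores, fraction):
    """Lift of the top `fraction` of cases, ranked by descending score.

    Lift = (positive rate among the top cases) / (overall positive rate).
    Assumes y_true holds 0/1 labels and contains at least one positive.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if sum(y_true) == 0:
        raise ValueError("lift is undefined without positive cases")
    # Sort the labels by their model score, highest score first.
    ranked = [y for _, y in sorted(zip(scores, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    # Number of cases in the selected top segment (at least one).
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(ranked[:k]) / k
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate


# Hypothetical labels and classifier scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
lift_top_20 = cumulative_lift(y_true, scores, 0.2)
lift_all = cumulative_lift(y_true, scores, 1.0)
```

Plotting this value for increasing fractions gives the lift curve; by construction it equals 1 when the whole data set is selected.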

9. Method of instruction

Lectures and laboratory exercises.

10. Assessment

a. During the teaching period there is one in-class test (ZH).

b. In the examination period: a homework assignment must be written and defended orally at the examination. The other part of the examination is written.

c. A condition for the signature is a passing mark on the ZH test (above 40%). The in-class test (ZH) can be retaken once. In the repeat period there is one further (final) opportunity to retake it.

d. Another condition for the signature is attendance at no fewer than 5 laboratory exercises.

11. Recaps

The test can be repeated once in the teaching period, and there is a final opportunity in the official recap period. Missed laboratory exercises cannot be made up. The condition for the signature is passing one of the tests and completing at least 5 laboratory exercises successfully.

12. Consultations

Consultation with the lecturers of the subject is possible at a pre-arranged time.

13. References, textbooks and resources

Blanken, de Vries, Blok, Feng (eds.): Multimedia Retrieval. Springer, 2007.
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008.
Ronen Feldman, James Sanger: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, 2007.

14. Required learning hours and assignment

Lessons	56
Preparation for lessons (10 for lectures and 8 for laboratories)	18
Preparation for test	20
Homework	16
Study of assigned written materials	0
Preparation for exam	40
Total	150

15. Syllabus prepared by

Name:	Position:	Department:
Zsolt T. Kardkovács PhD	assistant professor	TMIT
Gábor Szűcs PhD	associate professor	TMIT
Domonkos Tikk PhD	senior research fellow	TMIT
