Lecture by Thomas Hain
15 October 2025
You are warmly invited to the talk of Prof. Thomas Hain.
15 October 2025, 15:00-17:00, BME Building I, Room I.B.023, 1117 Budapest, Magyar tudósok krt. 2.
Prof. Thomas Hain Bio
Professor Thomas Hain received a Dipl.-Ing. in electrical and communication engineering from the University of Technology, Vienna, in 1993 and a Ph.D. in information engineering from Cambridge University in 2002.
From 1994 to 1997 he worked at Philips Speech Processing in Vienna as a Senior Technologist, before working as a Research Associate alongside his Ph.D. study. In 2001 he became a Lecturer at Cambridge University, moving to The University of Sheffield as a member of the Speech and Hearing Research Group in 2004. After a series of intermediate promotions he was appointed Full Professor in 2013. In 2016 he became Head of the Speech and Hearing Research Group and a member of the Machine Learning Research Group, and from 2017 Visiting Professor at the Nagoya Institute of Technology. Shortly after, in 2018, he formed the Voicebase Centre for Speech and Language Technology, which he leads. Since 2019 he has also been Director of the UKRI Centre for Doctoral Training in Speech and Language Technologies and Their Applications.

Prof. Hain's research has received funding from many national and international sources, with a total volume of more than £15M. He has also co-founded a number of companies working in speech processing and machine learning. He is the author of more than 300 publications on machine learning and speech recognition. His current research interests include natural language processing; speech, audio and multimedia technology; machine learning; and complex system optimisation and design.
Professor Hain was Associate Editor of ACM Transactions on Speech and Language Processing and an Area Chair for Interspeech 2019. He served on the IEEE Speech and Language Technical Committee from 2012 to 2014 and from 2016 to 2019, and on the ISCA Technical Committee from 2018 to 2022. He is currently a member of the editorial board of Computer Speech and Language. In 2021 he was elected a Fellow of the International Speech Communication Association (ISCA).
Lecture abstracts
Selecting Data for Semi-Supervised ASR
Training of ASR models has long followed the path of multi-style training, on the assumption that more data is better data. Labelled data is still hard to come by, so semi-supervised training is often used. The amounts of unlabelled data, however, are vast, and the question of data selection may be important again. In this talk we briefly review standard strategies for semi-supervised training and data selection. We then present recent work on data selection using new methods for word error rate estimation, and report results on ASR training.
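The abstract does not spell out the selection procedure, so the following is only a rough illustration of one common semi-supervised recipe: a seed model pseudo-labels the unlabelled audio, an estimated word error rate scores each hypothesis, and only the most reliable utterances are kept for the next training round. The functions transcribe and estimate_wer, the threshold and the budget are hypothetical placeholders, not the methods presented in the talk.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Utterance:
    audio: object            # stand-in for a waveform or feature matrix
    pseudo_label: str = ""   # hypothesis produced by the seed ASR model
    est_wer: float = 1.0     # estimated word error rate of that hypothesis

def select_pseudo_labelled(
    unlabelled: List[Utterance],
    transcribe: Callable[[object], str],           # hypothetical seed ASR decoder
    estimate_wer: Callable[[object, str], float],  # hypothetical WER estimator
    max_est_wer: float = 0.15,                     # assumed selection threshold
    budget: int = 10_000,                          # assumed data budget (utterances)
) -> List[Utterance]:
    # Pseudo-label every unlabelled utterance and score the hypothesis.
    for utt in unlabelled:
        utt.pseudo_label = transcribe(utt.audio)
        utt.est_wer = estimate_wer(utt.audio, utt.pseudo_label)
    # Keep only utterances the estimator considers reliable, best first.
    reliable = [u for u in unlabelled if u.est_wer <= max_est_wer]
    reliable.sort(key=lambda u: u.est_wer)
    return reliable[:budget]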
Self-supervised Models for Robust Speech Content Representations
Self-supervised models have revolutionised language and speech processing: unlabelled data can be used to inform and bootstrap models for a vast range of tasks. However, even though most models are trained on large amounts of data, domain generalisation can be poor. Model training is typically very costly, and fine-tuning on a task may not lead to good results because of domain mismatches. In this presentation some properties of SSL-derived methods are explored, leading to novel ways to tune models to a domain and to content-oriented task types. Instead of using model-specific loss functions, a generic alignment loss allows for fast tuning at much lower computational cost.
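The abstract mentions replacing model-specific loss functions with a generic alignment loss for cheap domain tuning. As a loose sketch only (not the method of the talk), the code below keeps the self-supervised encoder frozen and trains a small residual adapter under a cosine alignment loss that pulls its output towards reference features from the target domain; the adapter shape, the 768-dimensional features and the random tensors standing in for real SSL outputs are all assumptions.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Small bottleneck adapter placed on top of frozen SSL features.
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # residual adaptation of the representation

def alignment_loss(adapted: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Generic frame-level alignment loss: pull adapted frames towards
    # reference representations (e.g. features from matched in-domain data).
    return 1.0 - nn.functional.cosine_similarity(adapted, target, dim=-1).mean()

# Illustrative usage with random tensors standing in for SSL features.
dim = 768                                  # assumed SSL hidden size
adapter = Adapter(dim)
optimiser = torch.optim.Adam(adapter.parameters(), lr=1e-4)

source_feats = torch.randn(8, 200, dim)    # frozen SSL features, source domain
target_feats = torch.randn(8, 200, dim)    # reference features, target domain

for _ in range(10):                        # a few cheap adapter-only updates
    loss = alignment_loss(adapter(source_feats), target_feats)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()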

