In Plzeň computers are taught to understand human speech. In this process they use CESNET services

We recently celebrated 30 years of the CESNET network. Reliable fast connectivity is essential for professional workplaces in an academic environment. But they also need tools for collaboration and communication, data storage, easy identification and access, and computationally intensive computing to study and research. CESNET can offer them all this on its network.

One of the active users of these services are scientists and students of the Faculty of Applied Sciences of the University of West Bohemia in Plzeň. In the Artificial Intelligence Division of the Department of Cybernetics, they are engaged in research on machine learning methods for intelligent decision making, classification and prediction. They focus on human-computer voice communication in natural language, computer vision, as well as applications of the research methods in other fields.

“The Department of Cybernetics is very wide. Within the Artificial Intelligence department we work on speech technologies, we have a team that focuses on computer vision, and we also work on general artificial intelligence and machine learning. At the department we have colleagues who do robotics and control theory, but also experts in biocybernetics,” says Jan Švec, a scientist in the field of artificial intelligence.

Without MetaCenter it would not be possible
For computing neural networks, training speech models and many other tasks, the department needs a lot of computing power. Suitable conditions are provided by the MetaCentre environment operated by CESNET. “We provide interconnected computing and data resources for solving very demanding tasks that typically exceed the capabilities of stand-alone academic departments,” says Jan Hoidekr from CESNET’s Distributed Computing Department.

The Department of Cybernetics in Plzeň has been using MetaCenter services since 2007, when they purchased their first cluster and integrated it into the common infrastructure. They praise the cooperation. “The integration of computing and data resources into one cluster gives us great flexibility. For our purposes we only need MetaCenter’s resources for part of the year, so it would not be worth buying and running everything ourselves. We see this as a significant advantage of sharing these capacities,” says Jan Švec.

Optimisation across the CESNET network is one of the main purposes of MetaCenter. Another great advantage is a unified working environment with easy user management and secure access to services. Interested parties sign in with one user account. “We can add new users and students very easily, manage them and create shared project spaces for collaboration,” adds Jan Švec.
According to Jan Hoidekr, the collaboration with the Department of Cybernetics has been an excellent experience, very useful for both sides: “We are confident that such partnerships will find application across the academic community, regardless of the specific discipline.”

Subtitles for Czech Television
The Department of Cybernetics uses the MetaCentre for tasks in the field of speech technology, speech recognition, speech synthesis or processing and understanding of communication content. These technologies have been used, for example, in the pilot research phase of the Czech Television live subtitling project or in the Holocaust Audiovisual Archive project. The development of applications for both projects was preceded by extensive research and calculations.

First, a huge amount of source data had to be processed using machine learning methods to help the models learn to understand human speech. Jan Lehečka, a researcher, describes how such learning takes place: “We train the model using audio recordings that it listens to and processes. It is able to absorb a huge amount of information in a very short time. In the next phase, we teach the model how to solve the problem. We show it exactly what we want it to do, which it easily learns from a small amount of data as it already has a deep knowledge of the language acquired in the first phase.”

These are computationally demanding tasks with large amounts of data. Training is therefore only possible thanks to the supercomputers that are part of the MetaCenter. Just to give you an idea: one dataset is tens of terabytes. “Our particular model was trained for about two weeks on four NVIDIA A100 graphics cards,” adds Jan Lehečka.

After the model is trained, the application is prepared for final users. In the case of live subtitling, the findings from the pilot research phase of the project have been successfully applied to the commercial sphere and SpeechTech, s.r.o. is currently developing and operating an automatic subtitling service for Czech Television programmes on its own infrastructure.

The system recognises spoken speech and transcribes it into text in real time. The goal is accurate transcription, but also precise timing so that the text appears on the screen when it is spoken. The problem arises in programmes where there is background noise or where multiple people are talking at the same time. These shows are then difficult to automatically transcribe correctly. Typically, these are talk shows and sports broadcasts. This is where the so-called shadow speakers come in, who coax the audio track so that it is intelligible to the machine, which then transcribes it without any problems.

Oral history projects
Speech recognition and indexing technology was also used in a major oral history project. It is an audiovisual archive of the Holocaust that captures survivor testimonies recorded by the USC Shoah Foundation. “The main task was speech recognition and transcription in a specific situation and environment. People here often speak in affect, with emotion, English is usually not their first language, they have an accent… So from our point of view it is a challenging task,” explains Aleš Pražák, a speech recognition specialist.

Accuracy in transcribing proper names and places is important, as these can then be used to search the archive. If the names were misspelled, the output from the archive would be distorted. Valuable testimonies stored in a comprehensive and searchable archive are used by researchers and students to develop specific research topics. Through the testimonies of individual people, they piece together stories and link information.

Following this project, researchers from the Department of Cybernetics were approached by the Institute for the Study of Totalitarian Regimes (ISTR), which has its own oral archive. For each interview the Institute has sets of scanned documents that complement the recorded statements: investigation files, travel documents, protocols…

“We have therefore linked speech processing and image processing. We have transcribed the scans into text, which enables searching and processing of information,” says Jan Švec. Cooperation with colleagues from the ÚSTR goes beyond the oral history archives. Last year an archive of scanned NKVD/KGB documents obtained from Ukrainian archives held on Czechoslovak citizens was completed.

Germans envy us
At the Faculty of Applied Sciences, they are very satisfied with CESNET services. “I am impressed with the way the CESNET e-infrastructure works. Such solutions are not really common in Europe,” says Jan Švec, who continues “For example, I asked colleagues in Germany how they compute on clusters. The answer was that they had a cluster somewhere and if they wanted to use it, they had to go to their professor, he would open an account for them and then they could compute, but only at a certain time. They are surprised when we tell them that we have a national infrastructure and how it works. We’ll tell the students: Fill in the application form here. We add them to the system and in an hour they can store data and compute on powerful servers, just use all the infrastructure. Moreover, everything is shared across academia. If we have a problem, we contact user support and get it resolved right away.”

In addition to the CESNET e-infrastructure, the Faculty is also involved in other research infrastructures, such as LINDAT/CLARIAH-CZ, a network for researchers in linguistics and related fields. Here, research data is deposited in a repository in full compliance with the principles of open science promoted by the EOSC project. “Once we have a dataset worth publishing, we publish it. We also add detailed metadata. We already have this path relatively trodden and we know how to deal with it. Our experience will also be shared within the EOSC-CZ project working groups in which we are involved,” adds Jan Švec.

How to understand DNA
The Faculty of Applied Sciences at the University of West Bohemia cannot imagine how they could work without the CESNET infrastructure and its services. But nothing like that is imminent. They can then think of ways to use the extra capacity for new tasks.

“Artificial intelligence is already crossing over into other disciplines at the faculty. One of them is biocybernetics. Even this sector cannot do without large computing power and adequate storage space,” points out Jan Hoidekr from CESNET.

Lukáš Kuhajda, a research assistant at the Faculty of Applied Sciences of the University of West Bohemia, is his successor: “We use the resources available in the CESNET infrastructure mainly to train neural networks, specifically in the Regulation in DNA project. We can immediately validate the model outputs in the lab.”

This confirms that the faculty can use the technologies being developed not only for text understanding, but even for understanding DNA sequences or protein sequences. And this is just the beginning…

Last change: 11.8.2023