Powered by OpenAIRE graph

Quorate Technology Limited

3 Projects
  • Funder: UK Research and Innovation | Project Code: EP/S001271/1
    Funder Contribution: 517,456 GBP

    Neural machine translation (NMT) has recently made major advances in translation quality, and the technology has been rapidly adopted by industry leaders such as Google and Amazon, and by international organisations such as the UN and the EU. However, high-performing neural models require many millions of human-translated sentences for training, and for many real-world applications there is not enough data to build useful MT systems. In this project I plan to stretch the resources and capabilities we have in order to develop robust MT technologies that can be deployed for low-resource language pairs and for highly specialised low-resource domains.

    I will investigate making translation significantly more robust, guided by the intuition that translated (or parallel) corpora contain enormous redundancy and are an inefficient way to learn to translate. Inspired by human learning, we will study Bayesian models which build up meaning compositionally and are able to learn to learn, creating models that need only a few training examples. We will also develop machine learning techniques, such as transfer learning and data augmentation, to extract knowledge from monolingual resources and from parallel resources in other languages and domains.

    This proposal combines fundamental research in rapid deep learning with lower-risk data-driven machine learning research in order to deliver useful products to our industry partners. My team will provide translations for language pairs that were not previously well served by automatic machine translation, allowing our partners, BBC World Service and BBC Monitoring, to cover under-resourced languages. Building on an existing scalable platform, created within the EU project Scalable Understanding of Multilingual MediA (SUMMA), we can already deploy multilingual capabilities in the newsroom.
The innovation fellowship will contribute to the commercialisation and sustainability of the SUMMA translation components, but crucially it will allow us to cover a wider range of topical and strategic languages. Access to a high-quality translation platform for low-resource languages will help the BBC deliver impartial reporting across the world. Collaboration with our industry partner Quorate will demonstrate the commercial potential of our research in the highly specialised domain of financial trading. In the long term, this project will have a wider impact on British industry by breaking down language barriers affecting international trade, and by significantly improving the quality and resilience of transformative AI language technologies.
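One common way to realise the data-augmentation idea above — extracting knowledge from monolingual resources — is back-translation, where a reverse-direction model turns monolingual target-language text into synthetic parallel training pairs. A minimal sketch in plain Python, with a toy word-reversing stand-in for a trained target-to-source model (all names here are hypothetical illustrations; the proposal does not commit to this specific technique):

```python
# Back-translation sketch: a reverse (target->source) model generates
# synthetic source sentences for monolingual target-language text,
# producing extra (source, target) pairs to train the forward model on.

def back_translate(monolingual_target, reverse_model):
    """Pair each target sentence with a synthetic source translation."""
    return [(reverse_model(t), t) for t in monolingual_target]

# Toy stand-in for a trained target->source NMT system (hypothetical):
def toy_reverse_model(sentence):
    return " ".join(reversed(sentence.split()))

# Synthetic parallel data to mix into the real training corpus:
augmented = back_translate(["guten morgen welt"], toy_reverse_model)
```

In practice the synthetic pairs are mixed with the genuine parallel corpus, which is what lets monolingual data improve a low-resource translation system.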

  • Funder: UK Research and Innovation | Project Code: EP/L016427/1
    Funder Contribution: 4,746,530 GBP

    Overview: We propose a Centre for Doctoral Training in Data Science. Data science is an emerging discipline that combines machine learning, databases, and other research areas in order to generate new knowledge from complex data. Interest in data science is exploding in industry and the public sector, both in the UK and internationally. Students from the Centre will be well prepared to work on tough problems involving large-scale unstructured and semi-structured data, which increasingly arise across a wide variety of application areas.

    Skills need: There is a significant industrial need for students who are well trained in data science, and skilled data scientists are in high demand. A report by the McKinsey Global Institute cites a shortage of up to 190,000 qualified data scientists in the US; the situation in the UK is likely to be similar. A 2012 report in the Harvard Business Review concludes: "Indeed the shortage of data scientists is becoming a serious constraint in some sectors." A report on the Nature web site cited an astonishing 15,000% increase in job postings for data scientists in a single year, from 2011 to 2012. Many of our industrial partners (see letters of support) have expressed a pressing need to hire in data science.

    Training approach: We will train students using a rigorous and innovative four-year programme designed not only to train students in cutting-edge research but also to foster interdisciplinary interaction between students and to build students' practical expertise through work with a wide consortium of partners. The first year of the programme combines taught coursework with a sequence of small research projects; taught coursework will include courses in machine learning, databases, and other research areas. Years 2-4 of the programme will consist primarily of an intensive PhD-level research project.
    The programme will provide students with breadth across the interdisciplinary scope of data science, depth in a specialist area, training in leadership and communication skills, and an appreciation of practical issues in applied data science. All students will receive individual supervision from at least two members of Centre staff. The training programme will be especially characterised by opportunities for combining theory and practice, and for student-led and peer-to-peer learning.

  • Funder: UK Research and Innovation | Project Code: EP/R012067/1
    Funder Contribution: 734,106 GBP

    Speech recognition has made major advances in the past few years. Error rates have been reduced by more than half on standard large-scale tasks such as Switchboard (conversational telephone speech), MGB (multi-genre broadcast recordings), and AMI (multiparty meetings). These research advances have quickly translated into commercial products and services: speech-based applications and assistants such as Apple's Siri, Amazon's Alexa, and Google voice search have become part of daily life for many people. Underpinning the improved accuracy of these systems are advances in acoustic modelling, with deep learning having had an outstanding influence on the field.

    However, speech recognition is still very fragile: it has been successfully deployed in specific acoustic conditions and task domains - for instance, voice search on a smartphone - and degrades severely when the conditions change. This is because speech recognition is highly vulnerable to additive noise from multiple acoustic sources, and to reverberation. In both cases, acoustic conditions which have essentially no effect on the accuracy of human speech recognition can have a catastrophic impact on the accuracy of a state-of-the-art automatic system.

    A reason for such brittleness is the lack of a strong model of acoustic robustness. Robustness is usually addressed through multi-condition training, in which the training set comprises speech examples across the many required acoustic conditions, often constructed by mixing speech with noise at different signal-to-noise ratios. For a limited set of acoustic conditions these techniques can work well, but they are inefficient: they offer no model of multiple acoustic sources, nor do they factorise the causes of variability.
For instance, the best reported speech recognition result for transcription of the AMI corpus test set using single distant microphone recordings is about 38% word error rate (for non-overlapped speech), compared to an error rate of about 5% for human listeners. In the past few years several approaches have tried to address these problems: explicitly learning to separate multiple sources; factorised acoustic models using auxiliary features; and learned spectral masks for multi-channel beamforming.

SpeechWave will pursue an alternative approach to robust speech recognition: the development of acoustic models which learn directly from the speech waveform. The motivation to operate directly in the waveform domain arises from the insight that redundancy in speech signals is highly likely to be a key factor in the robustness of human speech recognition. Current approaches to speech recognition separate non-adaptive signal processing components from the adaptive acoustic model, and in so doing lose the redundancy - and, typically, information such as the phase - present in the speech waveform. Waveform models are particularly exciting because they combine the previously distinct signal processing and acoustic modelling components.

In SpeechWave, we shall explore novel waveform-based convolutional and recurrent networks which combine speech enhancement and recognition in a factorised way, together with approaches based on kernel methods and on recent advances in sparse signal processing and speech perception. Our research will be evaluated on standard large-scale speech corpora. In addition, we shall participate in, and organise, international challenges to assess the performance of speech recognition technologies. We shall also validate our technologies in practice, in the context of the speech recognition challenges faced by our project partners BBC, Emotech, Quorate, and SRI.
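The multi-condition training described above is typically built by mixing clean speech with noise at a prescribed signal-to-noise ratio. A minimal sketch of that mixing step in plain Python (the function name and list-of-samples interface are illustrative assumptions, not part of the project):

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db` dB,
    then add it to `speech` sample-by-sample.

    `speech` and `noise` are equal-length lists of float samples.
    """
    p_speech = sum(s * s for s in speech) / len(speech)   # mean speech power
    p_noise = sum(n * n for n in noise) / len(noise)      # mean noise power
    # Solve p_speech / (scale**2 * p_noise) = 10 ** (snr_db / 10) for scale:
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Repeating this over many noise types and SNR levels yields the multi-condition corpus; the abstract's point is that this brute-force coverage, while effective for the conditions seen, neither models multiple sources nor factorises the causes of variability.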
