Summary
Automatic Speech Recognition (ASR) is widely considered the most natural man-machine interface across a broad range of technologies. Current commercial ASR systems rely on a ‘rich’ representation of the acoustic signal for words and their variants, which creates major challenges for deploying ASR in areas where it could have substantial social impact. Our central goal is to translate research results from the ERC-funded project MORPHON into a novel ASR system that removes such barriers. We have previously demonstrated that a universal set of phonological features delivers an isolated word recognition system (FlexSR) with enhanced phoneme recognition accuracy. FlexSR is more robust to non-standard speech and dialect variation, and can be easily adapted to new languages. These aspects are problematic for current ASR systems, whose language model (LM) relies on the probabilistic sequencing of whole words and is trained on large written text corpora. Obtaining sufficient training data for a new LM is prohibitively expensive. Instead, MorSR will incorporate linguistic information about word structure to reject improbable words. This reduces the search space and increases the probability of identifying the correct words. A major outcome will be an innovative LM based on linguistic principles. Unlike existing approaches, it is based on speech data, so it captures crucial regularities that are lost in text corpora. Combined with FlexSR's key strength in identifying subtle phonological contrasts, MorSR will not only improve predictions of word sequences in running speech, but also dramatically reduce the training data required to adapt the system to a new language. MorSR's strengths include: (a) prediction of fine-grained possibilities of word sequences based on grammatical principles; (b) a considerably smaller training data requirement; (c) easy adaptation to new languages; and (d) fast, secure and accurate operation.
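The idea of rejecting improbable words on the basis of word structure can be illustrated with a minimal sketch. It is not the MorSR implementation: the toy stem and suffix lists, the candidate scores, and the function names `is_well_formed` and `prune_and_rescore` are all assumptions made for illustration. Candidates that cannot be split into a known stem plus a known suffix are dropped, and the remaining scores are renormalised, so the search space shrinks and the relative probability of each surviving word rises.

```python
# Toy lexicon: every entry here is an illustrative assumption, not MorSR data.
STEMS = {"walk", "talk", "jump"}
SUFFIXES = {"", "s", "ed", "ing"}


def is_well_formed(word: str) -> bool:
    """Return True if the word splits into a known stem plus a known suffix."""
    return any(
        word.endswith(suffix) and word[: len(word) - len(suffix)] in STEMS
        for suffix in SUFFIXES
    )


def prune_and_rescore(candidates: dict[str, float]) -> dict[str, float]:
    """Drop ill-formed candidates and renormalise the remaining scores."""
    kept = {w: s for w, s in candidates.items() if is_well_formed(w)}
    total = sum(kept.values())
    return {w: s / total for w, s in kept.items()} if total > 0 else {}


# Hypothetical recogniser output: candidate word -> acoustic score.
candidates = {"walked": 0.35, "walkt": 0.30, "talked": 0.20, "wocked": 0.15}

pruned = prune_and_rescore(candidates)   # only "walked" and "talked" survive
best = max(pruned, key=pruned.get)       # highest-scoring surviving candidate
```

In this toy run, two of the four candidates are rejected before any word-sequence model is consulted, and the surviving candidates end up with higher normalised scores than they started with.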
More information & hyperlinks
Web resources: | https://cordis.europa.eu/project/id/838058 |
Start date: | 01-03-2019 |
End date: | 31-08-2020 |
Total budget - Public funding: | 149 919,00 Euro - 149 919,00 Euro |
Cordis data
Status: | CLOSED |
Call topic: | ERC-2018-PoC |
Update Date: | 27-04-2024 |
Geographical location(s)