Multilingual Automated Speech-to-Speech Translator
#1

(01-21-2010, 07:14 PM)justlikeheaven Wrote: This post describes IBM MASTOR, a speech-to-speech
translation system that can translate spontaneous, free-form speech in real
time on both laptops and hand-held PDAs.

1. INTRODUCTION
Automatic speech-to-speech (S2S) translation breaks down communication barriers between people speaking different languages, but an efficient and robust S2S translation system poses many challenges, particularly for dialects of Arabic. The aim of this project is to provide language support to military, medical and humanitarian personnel
during operations in foreign territories, through a two-way, real-time speech-to-speech translation system designed for specific tasks such as medical triage and force protection. The casual, colloquial speaking style of such conversations is a major challenge.

Translation for under-studied languages is also difficult because of the lack of
sufficient speech data, adverse acoustic environments, and the scarcity of training data and linguistic resources: spelling standards, transcriptions, lexicons, dictionaries and annotated corpora are often missing. On top of this, these complicated algorithms
and programs must be made to fit into small devices for mobile users.

SYSTEM OVERVIEW
The general framework of the MASTOR system
has three components: automatic speech recognition (ASR), machine translation (MT) and text-to-speech synthesis (TTS). Baseline acoustic models for English and Mandarin are developed for large-vocabulary continuous speech and trained on over 200 hours of speech collected from about 2,000 speakers per language. Grapheme-based,
phonetic, and context-sensitive grapheme approaches are used for pronunciation and acoustic modeling.
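The ASR, MT and TTS stages form a simple cascade. A minimal sketch of that cascade is below; the component classes and method names are hypothetical stand-ins, not the actual MASTOR or ViaVoice API:

```python
from dataclasses import dataclass

@dataclass
class S2SPipeline:
    """Chains the three stages: ASR -> MT -> TTS."""
    asr: object   # speech -> source-language text
    mt: object    # source-language text -> target-language text
    tts: object   # target-language text -> speech

    def translate(self, audio):
        source_text = self.asr.recognize(audio)       # ASR stage
        target_text = self.mt.translate(source_text)  # MT stage
        return self.tts.synthesize(target_text)       # TTS stage

# Stub components standing in for the real engines.
class StubASR:
    def recognize(self, audio):
        return "hello"

class StubMT:
    def translate(self, text):
        return text.upper()   # pretend "translation"

class StubTTS:
    def synthesize(self, text):
        return text.encode()  # pretend waveform

pipe = S2SPipeline(StubASR(), StubMT(), StubTTS())
result = pipe.translate(b"raw-audio")
print(result)  # b'HELLO'
```

In the real system each stage is a statistical model; the cascade structure is what matters here, since errors made by ASR propagate into MT and TTS.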


AUTOMATIC SPEECH RECOGNITION

Acoustic Models:
The pronunciation dictionary greatly influences ASR performance, and creating an appropriate dictionary is a major challenge when porting the system to a new language. One approach to overcoming the absence of written short vowels is to use
grapheme-based acoustic models. Two further approaches address the fact that the same grapheme may produce different phonetic sounds depending on its context: a full phonetic approach that models short vowels explicitly, and a context-sensitive grapheme approach for the letter "A". Phoneme-based pronunciations would require vowelization of every word, and because dialectal words are difficult to
vowelize accurately, this line of ASR has not made much progress.
Speech recognition for both the laptop and hand-held systems is
based on the IBM ViaVoice engine. This highly efficient and robust framework uses rank-based acoustic scores derived from tree-clustered, context-dependent Gaussian models. A stack-based search algorithm combines n-gram language model probabilities with these acoustic scores to yield the most probable word sequence given the input speech.
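The stack-based (best-first) search described above can be sketched on a toy word lattice. The frames, acoustic scores and bigram LM below are invented for illustration; ViaVoice's actual decoder is far more elaborate:

```python
import heapq

# Candidate words per time frame with made-up log acoustic scores.
frames = [
    {"how": -1.0, "cow": -2.5},
    {"are": -1.2, "our": -1.4},
    {"you": -0.8, "ewe": -3.0},
]

# Toy bigram log-probabilities; unseen bigrams get a floor score.
bigram = {("<s>", "how"): -0.5, ("how", "are"): -0.3, ("are", "you"): -0.2}
FLOOR = -5.0

def decode(frames):
    # Stack entries: (negated total score, word sequence) so the heap
    # always pops the currently most promising partial hypothesis.
    stack = [(0.0, ("<s>",))]
    best = None
    while stack:
        neg_score, seq = heapq.heappop(stack)
        t = len(seq) - 1
        if t == len(frames):                     # complete hypothesis
            if best is None or -neg_score > best[0]:
                best = (-neg_score, seq[1:])
            continue
        for word, acoustic in frames[t].items(): # extend by one frame
            lm = bigram.get((seq[-1], word), FLOOR)
            heapq.heappush(stack, (neg_score - (acoustic + lm), seq + (word,)))
    return best

score, words = decode(frames)
print(words)  # ('how', 'are', 'you')
```

Each hypothesis is scored by the sum of acoustic and LM log-probabilities, exactly the combination the paragraph describes; a production decoder would additionally prune the stack to bound its size.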

Language Modeling
Language modeling (LM), which estimates the probability of word sequences,
is crucial for high-performance ASR. The approach used here is to build statistical tri-gram LMs, using techniques that fall into three categories: 1) obtaining additional training material automatically; 2) interpolating domain-specific LMs with other LMs; 3) improving the robustness and accuracy of distribution estimation with limited in-domain resources. The English language model linearly interpolates two components: the first is built from in-domain data and the second acts as a background model.
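Linear interpolation of an in-domain LM with a background LM amounts to a weighted average of their probabilities. A minimal sketch follows; the trigram tables and the weight 0.7 are made-up examples, not the values used in MASTOR:

```python
def interpolated_prob(ngram, p_indomain, p_background, lam=0.7):
    """P(w|h) = lam * P_indomain(w|h) + (1 - lam) * P_background(w|h)."""
    return lam * p_indomain.get(ngram, 0.0) + (1 - lam) * p_background.get(ngram, 0.0)

# Toy trigram tables: P(w3 | w1 w2).
p_medical = {("where", "does", "it"): 0.30}   # in-domain (e.g. medical triage)
p_general = {("where", "does", "it"): 0.05}   # background model

p = interpolated_prob(("where", "does", "it"), p_medical, p_general)
print(round(p, 3))  # 0.7*0.30 + 0.3*0.05 = 0.225
```

The interpolation weight would normally be tuned on held-out in-domain data; a larger weight trusts the domain-specific model more.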

SPEECH TRANSLATION
NLU/NLG-based Speech Translation:

A statistical translation method based on natural language understanding (NLU) and natural language generation (NLG) is used. Statistical machine translation translates a sentence W in the source language into a sentence A in the target language by choosing the A for which a statistical model estimates the highest probability P(A|W).
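The decision rule behind this can be sketched as picking the candidate A that maximizes the sum of a translation log-score and a language-model log-score (i.e. P(A|W) ∝ P(W|A)·P(A)). The candidate sentences and scores below are invented for illustration:

```python
candidates = {
    # A (target sentence): (log P(W|A) translation score, log P(A) LM score)
    "where does it hurt": (-2.0, -1.5),
    "where it hurts":     (-1.8, -2.5),
}

def best_translation(candidates):
    # argmax over A of log P(W|A) + log P(A)
    return max(candidates, key=lambda a: sum(candidates[a]))

chosen = best_translation(candidates)
print(chosen)  # 'where does it hurt'  (-3.5 beats -4.3)
```

Note how the LM score can overrule a slightly better translation score, which is why a strong target-language model matters for fluent output.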

full report download:
Reply

#2

full report download:
[attachment=1306]
Reply


