语音识别外文翻译外文文献英文文献

资源描述

《语音识别外文翻译外文文献英文文献》由会员分享，可在线阅读，更多相关《语音识别外文翻译外文文献英文文献（14页珍藏版）》请在金锄头文库上搜索。

1、Speech RecognitionVictor Zue, Ron Cole, & Wayne WardMIT Laboratory for Computer Science, Cambridge, Massachusetts, USA Oregon Graduate Institute of Science & Technology, Portland, Oregon, USACarnegie Mellon University, Pittsburgh, Pennsylvania, USA1 Defining the Problem Speech recognition is the pro

2、cess of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as commands & control, data entry, and document preparation. They can also serve as the input to further linguistic processing in

3、 order to achieve speech understanding, a subject covered in section. Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in Figure. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas

4、 a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies, and is much more difficult to recognize than speech read from script. Some systems require speaker enrollment-a user must provide samples of his or her speech before using them

5、, whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of

6、words, language models or artificial grammars are used to restrict the combination of words. The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language are s

7、pecified in terms of a context-sensitive grammar. One popular measure of the difficulty of the task, combining the vocabulary size and the 1 language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (s

8、ee section for a discussion of language modeling in general and perplexity in particular). Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.Spee

9、ch recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabili

10、ties are exemplified by the acoustic differences of the phoneme，At word boundaries, contextual variations can be quite dramatic-making gas shortage sound like gash shortage in American English, and devo andare sound like devandare in Italian. Second, acoustic variabilities can result from changes in

11、 the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speakers physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract s

12、ize and shape can contribute to across-speaker variabilities. Figure shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, 2 typically once every 10-20 msec (see sectionsand 1

13、1.3 for signal representation and digital signal processing, respectively). These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the

14、 values of the model parameters. Speech recognition systems attempt to model the sources of variability described above in several ways. At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the signal,

15、 and de-emphasize speaker-dependent characteristics. At the acoustic phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent acoustic models to those

16、of the current speaker during system use, (see section). Effects of linguistic context at the acoustic phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context dependent acoustic modeling. Word level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect

展开阅读全文

语音识别 外文翻译 外文文献 英文文献

语音识别外文翻译外文文献英文文献