Research on TANDEM-Based Discriminative Training of Acoustic Models in Speech Evaluation Systems


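University of Science and Technology of China
Master's Thesis

Research on TANDEM-Based Discriminative Training of Acoustic Models in Speech Evaluation Systems

Name: Gong Shu (龚澍)
Degree applied for: Master
Major: Signal and Information Processing
Advisor: Liu Qingfeng (刘庆峰)
Date: 2010-05-18

摘要 (Abstract)

In recent years, speech evaluation systems, typified by computer-assisted language learning, have been used more and more widely in oral examinations and language teaching. They make scoring fairer and more efficient and keep examination results objective. They also make teaching feedback more timely and accurate, which stimulates students' interest in learning. The mainstream speech evaluation systems today use maximum likelihood estimation (MLE) modeling on MFCC features. This approach is mature and reliable, but it has drawbacks: it is sensitive to incorrect model assumptions and is relatively weak at discriminating between patterns, which limits further gains in evaluation performance. This thesis therefore introduces discriminative training and TANDEM features to improve the existing system in two respects: the acoustic model training criterion and the acoustic features.

The thesis is organized as follows.

Chapter 1 outlines the background of speech evaluation technology, explains in some detail the basic principles and implementation of speech scoring systems and pronunciation error detection systems, and focuses on the speech recognition theory that underlies speech evaluation, including acoustic features, acoustic models, and language models.

Chapter 2 first uses Bayesian decision theory to point out the shortcomings of the traditional MLE criterion, and on that basis introduces the idea of discriminative training of acoustic models. It then derives and compares the objective functions and parameter update algorithms of the various discriminative training criteria and places them in a single unified training framework. Next, it analyzes how the different scoring measures of a speech evaluation system correspond to the objective functions of the different discriminative training criteria, which provides a theoretical basis for applying discriminative training in speech evaluation systems.

Chapter 3 first analyzes the strengths and weaknesses of the HMM/GMM and HMM/ANN frameworks. It then presents the TANDEM method, a feature-transforming front end that combines the advantages of both, and applies it to a Mandarin pronunciation error detection system. TANDEM uses a discriminatively trained neural network to estimate phone-level posterior probabilities. After a series of post-processing steps, the original MFCC features are converted into TANDEM features, which serve as the input to an evaluation system based on HMM statistical models that then performs scoring or error detection. The experiments show that TANDEM substantially improves error detection performance, and that the gain is larger still when it is combined with adaptation methods such as MLLR.

Chapter 4 first analyzes the feasibility of combining TANDEM features with discriminative training, then describes the architecture, scoring features, and performance measures of the English scoring system. Finally, four systems are built (MFCC-MLE, TANDEM-MLE, MFCC-MPE, and TANDEM-MPE) and tested on the Child and Middle test sets under different configurations. The results show that TANDEM-based discriminative training of acoustic models is an effective and practical way to improve the performance of current English pronunciation evaluation systems.

Chapter 5 summarizes the thesis and points out its shortcomings and directions for improvement.

Keywords: speech evaluation system, speech error detection, speech scoring, discriminative training, minimum phone error, TANDEM, multi-layer perceptron

ABSTRACT

In recent years, speech assessment and evaluation systems, represented by computer-assisted language learning systems, have been applied more and more in oral exams and language learning activities. These systems not only help teachers score oral tests more objectively and efficiently but also give students immediate and accurate evaluations of their pronunciation proficiency. Most current speech assessment and evaluation systems use maximum likelihood estimation (MLE) to estimate the parameters of models based on MFCC features. This popular statistical method also has some disadvantages: when models are confusable or the training data is limited, it is unlikely to reach an optimal solution. To address this problem, this thesis proposes discriminative training criteria and TANDEM features, which can improve the performance of the current speech evaluation system.

The thesis is organized as follows.

Chapter 1 gives a brief summary of the development and background of speech evaluation, then explains the basic principles and system structure of the speech scoring system and the speech error detection system. Finally, it introduces some concepts of speech recognition technology that form the foundation of speech evaluation, such as acoustic features, acoustic models, and language models.

Chapter 2 first gives an overview of Bayesian decision theory. To overcome the weaknesses of MLE, discriminative training methods for hidden Markov models are brought into the speech evaluation system. Four typical discriminative training criteria and several methods for updating acoustic model parameters are introduced and then defined in a unified framework. Finally, the chapter analyzes the relationship between the target of the speech evaluation task and the objective function of each discriminative training criterion. The thesis proposes that the choice of discriminative function must be consistent with the measure of pronunciation evaluation.

Chapter 3 first compares the HMM/ANN framework with the HMM/GMM framework. HMM/ANN has advantages over HMM/GMM in discriminative training ability. However, incremental enhancements such as speaker adaptation and discriminative parameter estimation are not easily implemented in it. This work applies the TANDEM approach, which combines neural-network discriminative feature processing with Gaussian-mixture distribution modeling, to a Mandarin speech error detection system. An MLP network is trained to estimate the probability distributions, and the error detection system based on the HMM/GMM framework uses transformations of these estimates as its input features. The experimental results in this chapter show a large improvement in error detection performance, especially when maximum likelihood linear regression (MLLR) adaptation is used.

Chapter 4 analyzes the feasibility of combining TANDEM features with discriminative training, then introduces the system structure, scoring features, and performance measures of the English speech scoring system. Finally, four systems are designed and built: MFCC-MLE, TANDEM-MLE, MFCC-MPE, and TANDEM-MPE. They are tested on the Child and Middle data sets. The experimental results show that discriminative training based on TANDEM achieves the best evaluation performance, significantly outperforming MLE based on MFCC.

Chapter 5 concludes the thesis and discusses possible improvements.

Key Words: Speech Evaluation System, Speech Error Detection, Speech Scoring, Discriminative Training, Minimum Phone Error, TANDEM, Multi-Layer Perceptron

List of Figures

Figure 1.1 Structure of the speech scoring system
Figure 1.2 Structure of the pronunciation error detection system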
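
The TANDEM front end summarized in the Chapter 3 description (a neural network estimates phone posteriors, and transformed posteriors become the features fed to the HMM/GMM system) can be illustrated with a minimal sketch. The abstract does not say which post-processing steps the thesis uses; the log compression and PCA decorrelation below are assumptions taken from the commonly described TANDEM recipe. The random softmax outputs merely stand in for a trained MLP, and the sizes (200 frames, 40 phone classes, 25 output dimensions) as well as the function name `tandem_features` are hypothetical.

```python
import numpy as np

def tandem_features(posteriors, n_components, eps=1e-10):
    """Turn frame-level phone posteriors into decorrelated TANDEM-style features.

    posteriors: array of shape (frames, phone_classes), rows sum to 1.
    The log and PCA steps are assumed post-processing, not confirmed by the thesis.
    """
    # Log compression makes the heavily skewed posteriors more Gaussian-like.
    log_post = np.log(posteriors + eps)
    # Center each dimension before estimating the covariance.
    centered = log_post - log_post.mean(axis=0, keepdims=True)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Projecting onto the leading eigenvectors decorrelates the features,
    # which suits GMMs with diagonal covariance matrices.
    return centered @ eigvecs[:, order]

# Stand-in for MLP outputs: softmax over random logits (hypothetical data).
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 40))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

feats = tandem_features(posteriors, n_components=25)
print(feats.shape)  # (200, 25)
```

In a real system the PCA transform would be estimated once on training data and then reused for test utterances; this sketch fits and applies it on the same array only to stay short.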
