Research on Automatic Speech Recognition Based on Feature Compensation

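University of Science and Technology of China, Master's Thesis
Research on Automatic Speech Recognition Based on Feature Compensation
Author: Yang Zhao (杨钊)
Degree sought: Master's
Major: Signal and Information Processing
Supervisors: Liu Qingfeng (刘庆峰); Dai Lirong (戴礼荣)
Date: 2010-05-01

摘要 (Abstract, translated from Chinese)

This thesis studies front-end noise robustness in automatic speech recognition (ASR). The fundamental goal of speech recognition is to enable machines to understand human speech. Under laboratory conditions, many recognition systems already perform very well. In real environments, however, complex and changing noise and interference from unknown factors often make performance drop sharply, far below what practical use requires. Noise robustness has therefore long been a very important area of speech recognition research.

The root of the noise-robustness problem is the mismatch between the training and testing environments. In practice, this mismatch is caused by the speech-acquisition environment (e.g., additive noise and channel distortion) and by the speaker (e.g., speaking style and accent); all of these can be regarded as effects of noise. To keep a recognition system performing well under different noise conditions, various methods are needed to improve its robustness. These methods are diverse but generally fall into two categories. Front-end methods process the speech signal itself or the speech features to remove, or suppress as far as possible, the effect of noise. Back-end methods increase the tolerance and adaptability of the acoustic model so that it can tolerate a certain degree of noise, or adjust the model parameters to track changes in the noise environment. This thesis studies front-end noise-robust methods, improving some existing methods and proposing some new ones.

Chapter 1 briefly reviews the development of speech recognition technology and focuses on the main components of an ASR system within the statistical modeling framework. Because real-world noise is diverse, many noise-robust methods have been developed, each with its own characteristics and range of applicability. Accordingly, Chapter 2 gives a comprehensive introduction and summary of noise robustness from four aspects: robust feature extraction, speech enhancement, feature compensation/enhancement, and model compensation.

Chapter 3 first introduces the offline first-order vector Taylor series (VTS) feature compensation algorithm based on an explicit model. The offline algorithm is not ideal for practical use: its biggest drawback is its huge computational cost, which greatly reduces processing efficiency. Building on the offline algorithm, we therefore propose a practical first-order VTS feature compensation algorithm that preserves the performance of the offline algorithm while greatly improving real-time processing.

Although the practical first-order VTS algorithm performs well, like the offline algorithm it models the noise with a single Gaussian. Real-world noise is complex and diverse, so a single Gaussian may not describe the distribution of the noise parameters well; the clean-speech estimate then becomes inaccurate, which ultimately hurts recognition performance. To address this, Chapter 4 proposes a first-order VTS feature compensation algorithm with multi-Gaussian noise modeling. Experimental results show that multi-Gaussian noise modeling improves recognition performance to some extent.

Keywords: automatic speech recognition, noise robustness, vector Taylor series, feature compensation, practical implementation, multi-Gaussian modeling

Abstract

This thesis focuses on the noise-robust front end of automatic speech recognition (ASR). The ultimate purpose of speech recognition is to make computers understand spontaneous human speech. Many mature systems now achieve fairly high recognition accuracy in the laboratory. In real environments, however, their performance degrades too much for practical use because of disturbances from various noises and unknown factors. Noise robustness is therefore a very important part of speech recognition research.

The noise-robustness problem comes down to the mismatch between the training and testing environments. In the real world, this mismatch is caused by the speech-collection environment (additive noise, convolutional noise, etc.) and by the speaker (speaking style, accent, etc.); all of it can be regarded as the influence of noise. To keep a speech recognition system performing well under these conditions, various methods must be used to enhance its robustness. Noise-robust methods are diverse and can be roughly classified into two categories: front-end methods and back-end methods. Front-end methods mitigate the effect of noise by processing the speech signal or the speech features, while back-end methods adjust the models to follow changes in the environment so that the models match the real environment. This thesis focuses on front-end noise-robust methods: some existing algorithms are implemented and improved, and several new methods are proposed.

Chapter 1 gives an overview of the development of ASR and highlights the main components of an ASR system based on statistical modeling. Because noise is so diverse, there are many kinds of noise-robust methods, each with its own characteristics and range of applicability. Chapter 2 therefore gives a general introduction and summary from four aspects: robust feature extraction, speech enhancement, feature compensation/enhancement, and model adaptation.

Chapter 3 first introduces offline feature compensation based on a first-order vector Taylor series (VTS) approximation using an explicit model of environmental distortion. The offline algorithm is not ideal in practice: its biggest disadvantage is its huge computational cost, which reduces processing efficiency. A practical first-order VTS algorithm is therefore proposed; it keeps performance comparable to the offline algorithm while greatly increasing efficiency.

Although the practical first-order VTS algorithm performs well, like the offline algorithm it assumes that, for each utterance, the noise feature vector in the cepstral domain follows a single Gaussian probability density function (PDF). Because noise is diverse and complex, this may not describe the noise distribution well, so the clean speech is estimated inaccurately, which ultimately affects recognition performance. Chapter 4 therefore proposes a first-order VTS approximation that assumes the cepstral noise feature vector follows a multi-Gaussian PDF. The results show that this method improves system performance to some extent.

Keywords: ASR, noise robustness, VTS, feature compensation, practical, multi-Gaussian modeling

中国科学技术大学学位论文相关声明 (University of Science and Technology of China Thesis Declaration)

I declare that the thesis submitted is the result of research I carried out under the guidance of my supervisors. Except where specifically indicated and acknowledged, the thesis contains no research results published or written by others. Contributions to this research by colleagues who worked with me have been clearly stated in the thesis. I authorize the University of Science and Technology of China to hold partial rights to use this thesis, namely: the university may, in accordance with relevant regulations, submit paper and electronic copies of the thesis to relevant national departments or institutions, allow the thesis to be consulted and borrowed, and may …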
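The abstracts describe first-order VTS feature compensation only at a high level. What follows is a minimal, dependency-free sketch of the common textbook formulation, not the thesis's own implementation; every function name and all toy numbers are hypothetical. Assumptions: it works element-wise in the log filterbank (log-Mel) domain, where the explicit distortion model is y = x + h + log(1 + exp(n - x - h)) with additive noise n and channel h (the cepstral-domain version inserts the DCT matrix C, giving y = x + h + C log(1 + exp(C^-1 (n - x - h)))). Clean speech is a diagonal GMM, h is treated as a known constant, and the clean-speech estimate uses the common MMSE approximation x_hat = y - sum_j P(j|y) (mu_y_j - mu_x_j). A one-component noise model gives the single-Gaussian case; several components are one plausible way to realize multi-Gaussian noise modeling, with the posterior running over every speech/noise component pair.

```python
import math
from typing import List, Sequence, Tuple

# A diagonal Gaussian component: (mixture weight, mean vector, variance vector).
Component = Tuple[float, List[float], List[float]]


def softplus(a: float) -> float:
    """Numerically stable log(1 + exp(a))."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def speech_gain(a: float) -> float:
    """Stable 1 / (1 + exp(a)); with a = n - x - h this is dy/dx."""
    if a >= 0.0:
        e = math.exp(-a)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(a))


def vts_component(mu_x, var_x, mu_n, var_n, h):
    """First-order VTS mean/variance of noisy speech for one speech/noise pair.

    Assumed mismatch function (log filterbank domain, element-wise):
        y = x + h + log(1 + exp(n - x - h)),
    linearized at (x, n) = (mu_x, mu_n); h is treated as a known constant.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn, hd in zip(mu_x, var_x, mu_n, var_n, h):
        a = mn - mx - hd
        g = speech_gain(a)  # dy/dx at the expansion point; dy/dn = 1 - g
        mu_y.append(mx + hd + softplus(a))
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y


def log_gauss(y, mu, var) -> float:
    """Log density of a diagonal Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v
                      for yi, m, v in zip(y, mu, var))


def vts_compensate(y: Sequence[float], speech_gmm: List[Component],
                   noise_gmm: List[Component], h: Sequence[float]) -> List[float]:
    """MMSE-style estimate x_hat = y - sum_j P(j|y) (mu_y_j - mu_x_j).

    j runs over every (speech component, noise component) pair; a noise model
    with a single component is the single-Gaussian case.
    """
    log_post, offsets = [], []
    for c_k, mu_x, var_x in speech_gmm:
        for w_m, mu_n, var_n in noise_gmm:
            mu_y, var_y = vts_component(mu_x, var_x, mu_n, var_n, h)
            log_post.append(math.log(c_k) + math.log(w_m) + log_gauss(y, mu_y, var_y))
            offsets.append([my - mx for my, mx in zip(mu_y, mu_x)])
    # Normalize the posteriors in the log domain to avoid underflow.
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    return [yd - sum(p * off[d] for p, off in zip(post, offsets)) / total
            for d, yd in enumerate(y)]


if __name__ == "__main__":
    # Toy 2-dimensional models (hypothetical numbers, for illustration only).
    speech = [(0.6, [2.0, 1.0], [0.5, 0.5]), (0.4, [5.0, 4.0], [0.5, 0.5])]
    noise_single = [(1.0, [1.0, 1.0], [0.2, 0.2])]
    noise_multi = [(0.5, [0.5, 0.5], [0.1, 0.1]), (0.5, [1.5, 1.5], [0.1, 0.1])]
    channel = [0.0, 0.0]
    y = [5.2, 4.3]
    print("single-Gaussian noise:", vts_compensate(y, speech, noise_single, channel))
    print("multi-Gaussian noise: ", vts_compensate(y, speech, noise_multi, channel))
```

In this sketch the per-pair statistics from `vts_component` depend only on the model parameters, not on the frame y, so they could be cached once per utterance; only the posteriors must be recomputed per frame. Noise-parameter estimation (for example, re-estimating the noise means from the utterance) is deliberately left out.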
