Research on Speech Emotion Recognition Based on Multifractals

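School code: 10536   Student ID: 0810803562   Classification No.: TP391.4   Security classification: Public

Master's Thesis

Research on Speech Emotion Recognition Based on Multifractals

Degree applicant: Zhang Mi-xia
Training institution: Changsha University of Science & Technology
Supervisor and title: Professor Ye Ji-xiang
Discipline: Computer Application Technology
Research direction: Artificial Intelligence and Its Applications
Thesis submission date: March 2011
Thesis defense date: May 2010
Chair of the defense committee: Professor Che Sheng-bing

Speech Emotion Recognition based on Multifractal
by
Zhang Mi-xia
B.E. (Changsha University of Science & Technology) 2008
A thesis submitted in partial satisfaction of the requirements for the degree of Master of Engineering in Computer Application Technology at Changsha University of Science & Technology
Supervisor: Professor Ye Ji-xiang
March, 2011

Changsha University of Science & Technology: Declaration of Originality

I solemnly declare that the thesis submitted is the result of research I carried out independently under the guidance of my supervisor. Except for the content specifically marked and cited in the text, this thesis contains no work already published or written by any other individual or group. All individuals and groups who made important contributions to this research have been clearly acknowledged in the text. I am fully aware that I bear the legal consequences of this declaration.

Author's signature: ________   Date: ____ (year) __ (month) __ (day)

Authorization for Use of Thesis Copyright

The author of this thesis fully understands the university's regulations on retaining and using theses, and agrees that the university may retain the thesis and submit photocopies and electronic versions of it to the relevant national departments or institutions, and that the thesis may be consulted and borrowed. I authorize Changsha University of Science & Technology to include all or part of this thesis in relevant databases for retrieval, and to preserve and compile it by photocopying, reduced-size printing, scanning, or other means of reproduction.

This thesis is:
1. Confidential; this authorization applies after declassification in ____ (year).
2. Not confidential.
(Please tick the appropriate box above.)

Author's signature: ________   Date: ____ (year) __ (month) __ (day)
Supervisor's signature: ________   Date: ____ (year) __ (month) __ (day)

Abstract

With the rapid development of science and technology, new human-machine interaction (HMI) technology has become a research hotspot in computer science. Research on speech emotion recognition is of real practical significance for making computers more human-like and intelligent and for building new human-machine interaction environments, and it is expected to bring considerable economic and social benefits.

This thesis first outlines the research background and main content of the work, then reviews and analyzes several key techniques in current speech emotion recognition research in China and abroad, including emotion categorization, construction of emotional speech corpora, speech emotion feature extraction, and emotion classification algorithms. On this basis, multifractal theory is used to analyze the chaotic characteristics of speech signals in four emotional states (happiness, anger, sadness, and neutral), and multifractal spectrum features and the generalized Hurst exponent are extracted as new emotion feature parameters for speech emotion recognition. The specific work is as follows:

(1) Using the Berlin German emotional speech corpus EMO-DB, the variation of features such as fundamental frequency (pitch), energy amplitude, zero-crossing rate, formants, and Mel-frequency cepstral coefficients (MFCC) across the four emotional states of happiness, anger, sadness, and neutral is observed and analyzed.

(2) A method for extracting multifractal speech emotion features is proposed. Because traditional speech emotion features do not characterize the chaotic properties of speech, multifractal theory is used to analyze the multifractal characteristics of speech in different emotional states, and the multifractal spectrum parameters and the generalized Hurst exponent are extracted as new speech emotion features. Introducing multifractal features compensates for the weakness of traditional linear features in characterizing different emotion types.

(3) Because multifractal features separate high-intensity emotions (happiness and anger) well from low-intensity emotions (sadness and neutral), the intermediate nodes of an SVM binary tree perform a coarse classification between emotion categories, so that easily confused categories are grouped together and the subtle differences between them can be analyzed in depth. Each group at an intermediate node is then classified with the feature vectors that contribute most, the contribution values being determined empirically. Finally, an SVM binary-tree speech emotion recognizer based on these empirically selected features is implemented, with fairly satisfactory results.

Keywords: speech emotion recognition; speech emotion features; multifractal; generalized Hurst exponent

ABSTRACT

With the rapid development of science and technology, new Human-Machine Interaction (HMI) technology has become a very active research topic in computer science. Research on speech emotion recognition has important practical value for making computers more intelligent and human-like and for developing new human-machine interaction environments, and it will produce good economic and social benefits.

The thesis first introduces the research background of speech emotion recognition and the main research content, then reviews several key issues in current speech emotion recognition research, including emotional corpora, the categories of emotional states, feature extraction from emotional speech signals, emotional feature selection, and classification algorithms. After analyzing the methods currently used by others, we introduce multifractal theory into speech emotion recognition: by analyzing the multifractal features of four speech emotions (happiness, anger, sadness, and neutral), we propose multifractal spectrum parameters and the generalized Hurst exponent as new emotion feature parameters for speech emotion recognition. The contents are as follows:

(1) Based on the Berlin German emotional speech corpus EMO-DB, whose recordings we found to express the emotions well enough for our analysis and experiments, we select and define the features that are most important for distinguishing emotions (pitch, formants, energy, MFCC, etc.).

(2) To overcome the inadequacy of conventional linear emotion features in depicting different emotion types, we introduce multifractal theory into speech emotion recognition. By analyzing the multifractal features of speech in different emotional states, we propose multifractal spectrum parameters and the generalized Hurst exponent, which provide a new, nonlinear approach to speech emotion recognition.

(3) Because multifractal features discriminate well between high-intensity emotions (happiness and anger) and low-intensity emotions (sadness and neutral), a coarse classification is performed at the binary intermediate nodes of an SVM tree, so that easily confused emotions are grouped together and the nuances among them can be examined further. The group at each intermediate node is then classified using the features with the greatest contribution, which are determined empirically. Finally, SVM binary-tree speech emotion recognition based on these empirical features is realized with satisfactory results.

Key words: Speech emotion recognition; Speech emotion feature; Multifractal; Generalized Hurst exponent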
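Item (1) of the abstract lists frame-level features such as energy and zero-crossing rate. The sketch below shows one common way to compute short-time energy and zero-crossing rate, assuming the speech signal is a plain list of float samples. The function names are hypothetical, and the frame length and hop (25 ms and 10 ms at an assumed 8 kHz sampling rate) are illustrative values, not settings taken from the thesis.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames; trailing samples that
    do not fill a whole frame are dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


if __name__ == "__main__":
    fs = 8000  # sampling rate in Hz (illustrative)
    tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]  # 1 s of a 100 Hz sine
    frames = frame_signal(tone, frame_len=200, hop=80)  # 25 ms frames, 10 ms hop
    print(len(frames), short_time_energy(frames[0]), zero_crossing_rate(frames[0]))
```

A 100 Hz tone crosses zero about 200 times per second, so at 8 kHz its per-sample zero-crossing rate is close to 0.025. Frame values like these are typically aggregated into utterance-level statistics (mean, variance, range) before they are fed to a classifier.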
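Item (2) extracts the generalized Hurst exponent H(q). The abstract does not say which estimator the thesis uses (MF-DFA is another common choice), so the following is only a sketch of one standard estimator: the q-order structure function K_q(s) = ⟨|X(t+s) − X(t)|^q⟩ ∝ s^(qH(q)), computed on the signal's profile X (the cumulative sum of the mean-removed samples). The helper names and the default lag range 1 to 19 are assumptions made for illustration.

```python
import math
import random


def profile(signal):
    """Cumulative sum of the mean-removed signal."""
    mean = sum(signal) / len(signal)
    out, acc = [], 0.0
    for sample in signal:
        acc += sample - mean
        out.append(acc)
    return out


def structure_function(x, q, lag):
    """K_q(lag): mean of |x[t + lag] - x[t]| ** q over all valid t."""
    n = len(x) - lag
    return sum(abs(x[t + lag] - x[t]) ** q for t in range(n)) / n


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def generalized_hurst(x, q, lags=range(1, 20)):
    """Estimate H(q) for q > 0 from K_q(lag) ~ lag ** (q * H(q)).

    The log-log slope equals q * H(q), so it is divided by q. Assumes the
    increments are not all zero (otherwise math.log(0) raises an error)."""
    log_lag = [math.log(s) for s in lags]
    log_k = [math.log(structure_function(x, q, s)) for s in lags]
    return ols_slope(log_lag, log_k) / q


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
    walk = profile(noise)  # the profile of white noise is a random walk, so H(q) is near 0.5
    for q in (1, 2, 3):
        print(q, round(generalized_hurst(walk, q), 3))
```

For a monofractal signal such as this random walk, H(q) stays roughly constant as q changes; a clear dependence of H(q) on q is what signals multifractality, the property the abstract relies on.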
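The multifractal spectrum parameters in item (2) can be derived from H(q) by a Legendre transform. The abstract does not give the thesis's exact definitions, so this sketch assumes the standard MF-DFA relations τ(q) = qH(q) − 1, α = dτ/dq and f(α) = qα − τ(q), estimates the derivative by finite differences, and also returns the spectrum width Δα = α_max − α_min, one commonly used spectrum parameter. The function name is hypothetical.

```python
def multifractal_spectrum(qs, hs):
    """Return (alphas, fs, width) from generalized Hurst exponents.

    qs: q values sorted in ascending order; hs: H(q) at each q.
    Uses tau(q) = q*H(q) - 1, alpha = d tau/dq (central differences inside,
    one-sided differences at the two ends) and f(alpha) = q*alpha - tau(q).
    """
    if len(qs) != len(hs) or len(qs) < 3:
        raise ValueError("need at least three matching (q, H(q)) pairs")
    taus = [q * h - 1.0 for q, h in zip(qs, hs)]
    n = len(qs)
    alphas = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        alphas.append((taus[hi] - taus[lo]) / (qs[hi] - qs[lo]))
    fs = [q * a - t for q, a, t in zip(qs, alphas, taus)]
    width = max(alphas) - min(alphas)
    return alphas, fs, width


if __name__ == "__main__":
    qs = [float(q) for q in range(-5, 6)]
    mono = [0.5] * len(qs)                # constant H(q): monofractal
    multi = [0.7 - 0.05 * q for q in qs]  # H(q) decreasing in q: multifractal
    print(multifractal_spectrum(qs, mono)[2])   # width close to 0
    print(multifractal_spectrum(qs, multi)[2])  # clearly positive width
```

A constant H(q) collapses the spectrum to a single point (Δα ≈ 0, f = 1), while an H(q) that varies with q spreads it out. The one-sided end differences slightly underestimate Δα, so a denser or wider q grid gives a better estimate.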
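Item (3) arranges binary SVMs in a tree. The root separates the high-intensity group (happiness, anger) from the low-intensity group (sadness, neutral), and each child then separates the two emotions in its group. The routing logic can be sketched as follows. The feature layout and threshold rules are hypothetical stand-ins that serve only to exercise the tree. In a real system each node's decision would be a binary SVM trained on that node's subset of the data, using the features chosen for that node.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Features = Sequence[float]


@dataclass
class Node:
    """Internal tree node: goes_left(x) is one binary decision (e.g. one SVM)."""
    goes_left: Callable[[Features], bool]
    left: "Tree"
    right: "Tree"


Tree = Union[Node, str]  # a leaf is simply an emotion label


def predict(tree: Tree, x: Features) -> str:
    """Walk from the root to a leaf, applying one binary decision per level."""
    while isinstance(tree, Node):
        tree = tree.left if tree.goes_left(x) else tree.right
    return tree


# Hypothetical feature layout: x[0] drives the coarse (intensity) split,
# x[1] and x[2] drive the two fine splits. Thresholds are arbitrary toy values.
emotion_tree = Node(
    goes_left=lambda x: x[0] > 0.5,  # high-intensity group vs low-intensity group
    left=Node(goes_left=lambda x: x[1] > 0.5, left="anger", right="happiness"),
    right=Node(goes_left=lambda x: x[2] > 0.5, left="neutral", right="sadness"),
)

if __name__ == "__main__":
    print(predict(emotion_tree, [0.9, 0.2, 0.0]))  # routes to the "happiness" leaf
```

With scikit-learn, for example, a node's decision could be lambda x: clf.predict([x])[0] == 1 for a fitted sklearn.svm.SVC named clf, assuming that SVM was trained with label 1 for the left-hand group. A four-class tree of this shape needs only three binary classifiers, and each fine-level classifier can use a different feature subset, which makes the per-node feature choice described in item (3) possible.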