Research on Audio Signal Classification Based on Tensor Decomposition: Thesis Design

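摘要 (Abstract)

With the continuing progress of multimedia and Internet technologies, audio has become an important component of multimedia signals, and its information processing and mining attract growing research interest. Tensors, a multilinear analysis tool that has come into wide use in recent years, can handle high-dimensional and multimodal signals. When massive amounts of audio are queried online, audio classification can be used to screen out the undesirable and harmful content mixed into the results. This lowers labor costs and makes blocking harmful content more convenient, so audio classification has a very broad range of applications in today's society. With the development of artificial intelligence, researchers have shown great interest in audio scene analysis, and audio scene classification has gradually become a focus. Audio scene classification is a specific application of audio recognition: a given audio scene contains many kinds of audio signals. Conventional camera-based video surveillance is highly susceptible to weather such as heavy fog and rainstorms, as well as to blind spots in the field of view, and these external factors inevitably reduce its effectiveness. Audio classification avoids these drawbacks, because it requires only a device that captures sound and a device that receives it. Combining video surveillance with audio classification should benefit production and daily life in the future, and it works well as a complement to single-modality monitoring. This thesis studies audio classification in depth, using tensor analysis to mine the high-dimensional features and internal structure of audio signals in order to improve classification accuracy. Mel-frequency cepstral coefficients (MFCCs) are used as the audio features; the final features are generated after tensor modeling of the audio and Tucker decomposition; and a support vector machine (SVM) is used as the classifier for acoustic scene classification. Classification was performed on a total of 8,732 sound clips in 10 classes, including car horn, engine idling, gunshot, children playing, dog bark, and street music. The final classification accuracy is 92.4%, and the accuracy for every individual scene class exceeds 90%, laying a good foundation for audio scene classification and audio event detection.

Keywords: audio classification; feature extraction; tensor analysis; Mel-frequency cepstral coefficients; support vector machine

Abstract

With the progress of multimedia and Internet technology, audio has become an important component of multimedia signals, and its information processing and mining are favored by more and more researchers. Tensor analysis, a multilinear analysis tool widely used in recent years, can deal with high-dimensional and multimodal signals. Nowadays, when massive amounts of audio information are searched on the Internet, the undesirable and harmful content mixed into it can be classified automatically by audio classification. This not only reduces labor costs but also makes classifying harmful information more effective and convenient. Audio classification technology therefore has a very wide range of applications in today's society. With the development of artificial intelligence, researchers have shown great interest in the analysis of audio scenes, and audio scene classification has gradually become a focus. Audio scene classification is a specific application of audio recognition. A given audio scene contains all kinds of audio signals. Video monitoring by traditional cameras is highly susceptible to fog, heavy rain, and other weather conditions, as well as to blind areas in the field of view, which inevitably reduces monitoring efficiency. Audio classification overcomes these problems, since it requires only a device to collect sound and a device to receive it. Video monitoring combined with audio classification will have a positive impact on production and daily life in the future. This thesis further explores audio classification technology, using tensor analysis to mine the high-dimensional features and internal information of audio signals to improve classification accuracy. Mel-frequency cepstral coefficients are used as audio features; features are extracted after tensor modeling and Tucker decomposition of the audio; and finally a support vector machine (SVM) is used as the classifier on a total of 300 sound clips of six types: car horn, adult voices, wind noise, children playing, dog barking, and street music. The final classification accuracy is 99.1%, and the accuracy for each of the six scene types is above 90%, which lays a good foundation for audio scene classification and audio event detection.

Keywords: Audio classification; Feature extraction; Tensor analysis; Mel-frequency cepstral coefficient; Support vector machine

Contents
摘要 (Abstract)
Abstract
1 Introduction
1.1 Research Background and Significance
1.2 Analysis of Research Status and Progress in China and Abroad
1.2.1 Research Status
1.2.2 Application Progress
1.3 Main Contents of the Thesis
1.4 Thesis Organization
2 Overview of the Audio Scene Classification System
2.1 Introduction to the Audio System Structure
2.2 Audio Classification Topics
2.2.1 Application Scenarios and Advantages
2.2.2 Feature Extraction
2.2.3 Classifier Training
2.3 SVM Model
2.3.1 Binary SVM Classification
2.3.2 Multi-class SVM Classification
2.4 Chapter Summary
3 Audio Signal Feature Extraction
3.1 Audio Features
3.1.1 Time-Domain Features
3.1.2 Frequency-Domain Features
3.1.3 Time-Frequency Features
3.1.4 Cepstral Features
3.2 Common Audio Feature Extraction Methods
3.2.1 Features Commonly Used for Classification
3.2.2 MFCC Feature Extraction Pipeline
3.3 Chapter Summary
4 Tensor Fundamentals
4.1 The Concept of a Tensor
4.2 Tensor Operations
4.2.1 Tensor Fibers and Slices
4.2.2 Tensor Matricization
4.2.3 Operations Between Tensors
4.3 Tensor Decomposition
4.3.1 CP Decomposition of Tensors
4.3.2 Tucker Decomposition
4.4 Chapter Summary
5 Experiments and Result Analysis
5.1 Experimental Procedure Design
5.1.1 Audio Dataset
5.1.2 Experimental Environment and Configuration
5.1.3 Experimental Design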

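5.2 Analysis of Experimental Results
5.3 Chapter Summary
Conclusions
References
Research Output During Studies
Acknowledgments

1 Introduction

1.1 Research Background and Significance

As the Internet, artificial intelligence, and related technologies continue to mature, big data has gradually permeated every aspect of our lives. Big data refers to data collections characterized by large volume, rich content, and diverse types. China's New Generation Artificial Intelligence Development Plan, looking toward 2030, states: "After more than 60 years of evolution, and driven in particular by new theories and technologies such as the mobile Internet, big data, supercomputing, sensor networks, and brain science, together with the strong demands of economic and social development, artificial intelligence is developing at an accelerating pace and shows new characteristics such as deep learning, cross-domain integration, human-machine collaboration, open swarm intelligence, and autonomous control" [1]. On the first working day of 2019, Alibaba DAMO Academy released its "Top 10 Technology Trends of 2019", which held that artificial intelligence (AI) remains the hottest direction in science and technology and that the year would mark a new starting point for comprehensive cooperation between humans and AI. The ten trends range from those touching ordinary people's work and daily lives to those of national strategic importance, covering smart cities, AI chips, autonomous driving, and voice AI passing the Turing test in specific domains. From national strategy to technology hot spots, these developments confirm that AI has become an important future direction for human society. Driven by massive data, powerful computing, and advanced models, AI has moved from symbolism to connectionism and is now entering the field of cognitive science [2]. AI is no longer limited to functions such as perception, cognition, and control; it is expected to understand information as humans do, and scene understanding is one of the most worthwhile problems to explore in the AI field. Human perception of the objective world results from the combined action of the brain and multiple senses. Organically combining computer vision, hearing, touch, and smell can effectively help computers understand complex and changing indoor and outdoor scenes; among these, visual and acoustic scene understanding are the most widely applied. Visual scene understanding has steadily developed and matured on the theoretical foundation of computer vision (CV) [3], and it is an indispensable and challenging technology in that field's applications [4]. However, because visual information is easily affected by viewpoint, scale, background clutter, and occlusion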
