Research on P2P Streaming Media Traffic Identification Based on SVM Incremental Learning


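National University of Defense Technology, Master's Thesis

Name: Li Jin (李进)
Degree applied for: Master
Major: Control Science and Engineering
Supervisor: Wang Hui (王晖)
2010-11

Graduate School of National University of Defense Technology, Master's Thesis, page i

Abstract (Chinese)

With advances in P2P technology and multimedia information processing, P2P streaming media applications have grown rapidly on the Internet, and they have also brought a series of hidden dangers to social security and the security of public opinion. Because P2P streaming media systems are deployed in the open environment of the Internet, anyone who masters P2P streaming transmission technology can use it: hostile forces at home and abroad can use it to broadcast reactionary programs, and criminals at home and abroad can use it to spread violent and pornographic content. One organization has set up a P2P streaming media system, IPPOTV, outside the country and uses it to spread reactionary views; if this is not brought under control in time, it will inevitably have a harmful effect on society. However, applications built on P2P technology are decentralized and self-organizing, which makes effective supervision difficult. Accurate and fast identification of P2P streaming media traffic is therefore especially important.

Our research group previously studied P2P streaming media traffic identification and proposed a method based on packet size distribution features; this SVM-based identification method performs very well. In real networks, however, new P2P streaming media applications keep appearing and existing ones may change their protocols, whereas the SVM method identifies traffic with a fixed classification model produced in a single training phase. It lacks good incremental learning ability, so the existing SVM-based method still falls short of deployment in real network environments.

To let the identification method adapt quickly to a changing network environment while still identifying P2P streaming media traffic accurately, this thesis carries out the following work.

First, to make the identification method extensible, the thesis proposes an identification method based on packet size combination features, which solves the attribute-increment problem that may arise during identification.

Second, because new P2P streaming media applications keep appearing in the network, the thesis proposes CIOOL, a class incremental learning algorithm suited to P2P streaming media traffic identification. The algorithm makes full use of the knowledge in the existing classification model and accumulates knowledge quickly and efficiently, so the identification system can recognize new applications quickly. To further improve the efficiency of CIOOL, the thesis proposes FSR, a sample reduction algorithm for high-dimensional samples, which greatly reduces the number of samples while preserving the information of the sample space and so speeds up incremental learning.

Third, to handle existing P2P streaming media applications that change their protocols, the thesis proposes CRIOOL, an incremental learning algorithm based on class replacement. The algorithm avoids retraining the classification model and updates the knowledge in the model more efficiently.

Fourth, building on these techniques, the thesis designs a P2P streaming media traffic identification system with incremental learning ability and implements a prototype system, TV-EYE. Online experiments with the prototype in a campus network show that the techniques studied in the thesis are effective and that TV-EYE is already capable of practical supervision.

Keywords: P2P streaming media, traffic identification, SVM, incremental learning

ABSTRACT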

P2P streaming media applications have mushroomed on the Internet along with the development of peer-to-peer and multimedia technologies; at the same time, they bring potential hazards to social security and public opinion. Because the Internet is an open environment, hostile forces within and beyond the borders who master P2P streaming media technology can use it to spread reactionary programs, pornography, and violence. The Falun Gong group has built a P2P streaming media application, IPPOTV, and uses it to spread reactionary opinions; if it cannot be controlled in time, the influence on society will be severe. However, P2P streaming media applications share characteristics of P2P applications such as decentralization and self-organization, so it is hard to supervise them effectively. It is therefore important to identify the traffic of P2P streaming media applications quickly and accurately.

Our group has previously studied methods for identifying P2P streaming media traffic and proposed a new approach that uses packet size distribution features. This approach, based on Support Vector Machines (SVM), has excellent performance. However, new P2P streaming media applications keep emerging in real networks, and existing applications may change their protocols. The method classifies traffic with a fixed model produced by a single training process, so it lacks the ability to learn incrementally; that is, the SVM-based method cannot adapt well to real networks.

To make the method suit the complicated network environment, this thesis studies the following techniques.

First, it proposes a method based on packet size combination features, which solves the problem of attribute incremental learning.

Second, because new P2P streaming media applications constantly appear in the network, it proposes a class incremental learning algorithm, CIOOL. CIOOL makes full use of the knowledge in the old model and accumulates knowledge rapidly, so it can identify a new P2P streaming media application in a short time. To improve the performance of CIOOL, the thesis proposes a sample reduction algorithm, FSR, which substantially reduces the number of samples while preserving their spatial characteristics. FSR markedly improves the efficiency of CIOOL.

Third, existing P2P streaming media applications may change protocol or become defunct, which leaves invalid knowledge in the SVM model. To solve this problem, the thesis proposes a class replacement incremental learning algorithm, CRIOOL, which updates the knowledge in the model rapidly.

Fourth, based on these techniques, a prototype identification system, TV-EYE, has been designed and implemented in software. Experiments with TV-EYE on a campus network show that the techniques are effective and that TV-EYE is capable of supervising P2P streaming media.

Key Words: P2P streaming media, Traffic Identification, Support Vector Machines (SVM), Incremental Learning

List of Tables

Table 3.1 Main packet size distribution intervals of five P2P streaming media applications
Table 3.2 Nine main packet size distribution intervals
Table 3.3 Experimental data set
Table 3.4 Identification accuracy with different kernel functions
Table 3.5 Identification results of packet size combination features under different E values
Table 3.6 Identification results of packet count distribution features under different zeroing thresholds
Table 3.7 Identification results of packet size combination features under different time windows
Table 4.1 Experimental data set
Table 4.2 Number of support vectors in the 12 classification models produced by each of the two methods
Table 6.1 Online identification accuracy for five known applications

List of Figures

Figure 1.1 Organization of the thesis
Figure 2.1 Optimal hyperplane and SVM
Figure 2.2 Feature mapping of SVM
Figure 3.1 A-PSD of four different SopCast channels
Figure 3.2 A-PSD of five different applications
Figure 3.3 C-PSD of five different applications
Figure 3.4 SVM training process
Figure 3.5 SVM identification process
Figure 3.6 Effect of different rejection thresholds T on identification accuracy
Figure 4.1 Time consumption of retraining and CIOOL as the number of classes increases
Figure 4
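
The abstract describes classifying traffic from packet size distribution features computed over time windows, but this excerpt does not give the feature definition. The following is only a minimal sketch of what such a feature vector might look like, not the author's method: the bin edges in BIN_EDGES, the function names, and the (timestamp, size) input format are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges in bytes. The nine intervals used in the thesis
# are not listed in this excerpt, so these values are placeholders.
BIN_EDGES = [0, 100, 300, 600, 1000, 1300, 1600]


def packet_size_distribution(sizes, bin_edges=BIN_EDGES):
    """Return the fraction of packets that fall in each size bin."""
    counts = [0] * (len(bin_edges) - 1)
    for size in sizes:
        # Bin i covers the half-open range [bin_edges[i], bin_edges[i + 1]).
        idx = bisect_right(bin_edges, size) - 1
        if 0 <= idx < len(counts):
            counts[idx] += 1  # sizes outside every bin are ignored
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


def windowed_features(packets, window_seconds, bin_edges=BIN_EDGES):
    """Group (timestamp, size) pairs into fixed time windows.

    Returns a dict mapping window index to that window's feature vector.
    """
    windows = defaultdict(list)
    for timestamp, size in packets:
        windows[int(timestamp // window_seconds)].append(size)
    return {
        index: packet_size_distribution(sizes, bin_edges)
        for index, sizes in sorted(windows.items())
    }


if __name__ == "__main__":
    trace = [(0.5, 60), (1.2, 1400), (10.1, 500)]
    for index, vector in windowed_features(trace, window_seconds=10).items():
        print(index, vector)
```

Each vector could then be passed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`. Normalizing to fractions rather than raw counts keeps windows with different traffic volumes comparable, which is one plausible reason to prefer a distribution feature.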
