Ensemble of One-Class Support Vector Machines


Dissertation for the Degree of Master of Engineering

Ensemble of One-Class Support Vector Machines

School code: 10075
Student number: 20091331
Candidate: Chen Xuefang (陈雪芳)
Supervisors: Associate Prof. Xing Hongjie (邢红杰), Prof. Wang Xizhao (王熙照)
Degree category: Master of Engineering
Specialty: Computer Applied Technology
Degree-granting institution: Hebei University
Date of oral examination: May 2012

Abstract

In machine learning and pattern recognition, novelty detection can be treated as a one-class classification problem: a one-class classifier is trained on normal data only, yet it can label a test sample as either normal or novel. Many one-class classifiers have been proposed, of which the one-class support vector machine (OCSVM) and support vector data description (SVDD) are the most widely used. Their parameters are usually chosen by cross-validation. If the parameters are poorly chosen, the resulting classifier fits the distribution of the normal data badly and the classification boundary is not tight enough. To improve on a single one-class classifier, several one-class classifiers can be combined according to some rule, so that the combined classifier fits the distribution of the normal data better and yields a tighter classification boundary.

AdaBoost is a widely used method for combining classifiers. OCSVM, however, is a strong classifier, and using it as the base classifier of AdaBoost brings no significant ensemble gain. This dissertation therefore modifies AdaBoost to make it suitable for OCSVM. In addition, a selective ensemble method for SVDD is proposed. The method first replaces the training error and the negative correlation term of negative correlation learning with cross-correntropy and auto-correntropy, respectively, and builds the corresponding weight optimization model. The optimal weight vector is then obtained with the half-quadratic optimization technique. Base classifiers that do not contribute are removed while classification accuracy is preserved, which improves the performance of the ensemble.

Experimental results show that the first method (the OCSVM ensemble based on modified AdaBoost) improves on the classification performance of a single OCSVM. The second method (the selective ensemble of SVDDs) effectively reduces the number of base classifiers while keeping accuracy no lower than, and in some cases higher than, that of the ensemble using all base classifiers.

Keywords: one-class support vector machine; support vector data description; AdaBoost; selective ensemble

Contents

Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Research Status at Home and Abroad
1.3 Main Research Content and Organization of the Dissertation
1.3.1 Main Research Content
1.3.2 Organization of the Dissertation
1.4 Chapter Summary
Chapter 2 Support-Vector-Based Two-Class and One-Class Classifiers
2.1 Support Vector Machines
2.1.1 Linearly Separable Support Vector Machines
2.1.2 Linearly Non-Separable Support Vector Machines
2.1.3 Nonlinearly Separable Support Vector Machines
2.2 One-Class Support Vector Machine
2.3 Support Vector Data Description
2.4 Chapter Summary
Chapter 3 OCSVM Ensemble Based on Modified AdaBoost
3.1 Development of the AdaBoost Ensemble Method
3.1.1 The Boosting Ensemble Method
3.1.2 The AdaBoost Ensemble Method
3.2 The Modified AdaBoost Ensemble Method
3.3 Experimental Validation
3.3.1 Synthetic Data Sets
3.3.2 UCI Data Sets
3.4 Chapter Summary
Chapter 4 Selective Ensemble of Support Vector Data Descriptions
4.1 Negative Correlation Learning
4.2 NCL Based on Cross-Correntropy and Auto-Correntropy
4.3 Experimental Validation
4.3.1 Synthetic Data Sets
4.3.2 UCI Data Sets
4.4 Chapter Summary
Chapter 5 Summary and Outlook
5.1 Summary
5.2 Future Work
References
Acknowledgements
Research Achievements During the Degree Program

Chapter 1 Introduction

1.1 Research Background and Significance

Unlike two-class classification, one-class classification [1] needs only normal data for training, yet at test time it can identify novel data that never appeared during training. The whole process requires no prior knowledge of sample labels, which greatly relaxes the requirements on experimental data and saves the cost of labeling samples in advance, so it matters in practice. One-class classifiers have already been applied successfully, for example to intrusion detection in network security defense systems, to user-behavior anomaly detection in security audit systems and in finance, and to machine fault diagnosis in control engineering.

Many one-class classifiers have been proposed, such as Gaussian mixture models, hidden Markov models, Parzen window density estimators, KNN-based methods, the one-class support vector machine (OCSVM) [2], and support vector data description (SVDD) [3]. Of these, OCSVM and SVDD are the most widely used. A prominent problem with both, however, is parameter selection. Choosing parameters by cross-validation makes the models fairly stable, but once the parameters are poorly chosen, the resulting one-class classifier cannot fit the distribution of the normal data well and the classification boundary it constructs is not tight enough. To improve the performance of one-class classifiers, several of them can be combined according to some rule, so that the final classifier fits the distribution of the normal data better and produces a tighter classification boundary. In one-class classification, combining multiple one-class classifiers markedly improves performance [4]. Research on ensembles of two-class (or multi-class) classifiers is increasingly mature both at home and abroad, whereas combining multiple one-class classifiers is a new research direction in pattern recognition and machine learning.
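The idea described above, training on normal data only and then combining several one-class classifiers, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's method: it uses a Parzen window density estimator (one of the classifiers listed above) in place of OCSVM or SVDD, and it combines the members by an unweighted majority vote rather than by modified AdaBoost or correntropy-based weights. The names `parzen_score`, `ParzenOneClass`, and `ensemble_predict`, the three bandwidths, and the 5% rejection rate are all hypothetical choices made for the example.

```python
import math
import random


def parzen_score(x, train, h):
    """Average Gaussian kernel value between a 2-D point x and the training points."""
    total = 0.0
    for t in train:
        d2 = (x[0] - t[0]) ** 2 + (x[1] - t[1]) ** 2
        total += math.exp(-d2 / (2.0 * h * h))
    return total / len(train)


class ParzenOneClass:
    """One-class classifier (illustrative): accept x if its score reaches a threshold."""

    def __init__(self, h, reject_rate=0.05):
        self.h = h                      # kernel bandwidth (assumed parameter)
        self.reject_rate = reject_rate  # fraction of training data allowed outside

    def fit(self, train):
        # Only normal samples are seen here; no labels are needed.
        self.train = list(train)
        scores = sorted(parzen_score(x, self.train, self.h) for x in self.train)
        # Threshold set so that roughly reject_rate of the training data is rejected.
        self.threshold = scores[int(self.reject_rate * len(scores))]
        return self

    def predict(self, x):
        # +1 means "normal", -1 means "novel".
        return 1 if parzen_score(x, self.train, self.h) >= self.threshold else -1


def ensemble_predict(members, x):
    """Unweighted majority vote; an odd number of members avoids ties."""
    votes = sum(m.predict(x) for m in members)
    return 1 if votes > 0 else -1


random.seed(0)
# Normal data: 200 points from a standard 2-D Gaussian.
normal = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
# Base classifiers differ only in bandwidth.
members = [ParzenOneClass(h).fit(normal) for h in (0.3, 0.6, 1.2)]

print(ensemble_predict(members, (0.0, 0.0)))  # centre of the normal data
print(ensemble_predict(members, (8.0, 8.0)))  # far from all training points
```

The members differ only in the bandwidth `h`, which plays the role of the hard-to-choose parameter discussed above: a single badly chosen bandwidth gives a loose or overly tight boundary, while the vote depends less on any one choice.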
