Research on Classification Problems Combining Information-Entropy-Based Rough Set Reduction with Support Vector Machines


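Classification: TP391    Security level:

Master's Thesis

Research on Classification Problems Based on the Information Entropy of the Rough Set's Reduction and Support Vector Machine

Sun Zhengxing (孙正兴)

Supervisor: Professor Ren Xiaokang (任小康)
Major: Computer Application Technology
Research direction: Multimedia Information Processing
Thesis defense date: 2010.6
Degree conferral date: 2010.6
Chair of the defense committee:
Reviewers:
June 2010

Declaration of Originality

摘要 (Chinese abstract)

With the rapid development of science and technology, every field is producing large volumes of data. How to quickly extract useful knowledge and effective classification methods from these data is a central problem facing machine learning today. Rough set theory is a data analysis tool that can effectively analyze and process uncertain information that is imprecise, inconsistent, or incomplete, and it has been applied successfully in machine learning, pattern recognition, decision support, knowledge discovery, fault diagnosis, and other fields. Knowledge reduction, the key technique in applying rough sets, is one of the core problems of the theory and allows data to be processed quickly and effectively.

The support vector machine proposed by Vapnik et al. is a new machine learning theory that integrates several techniques, including the maximum-margin hyperplane, Mercer kernels, convex quadratic programming, sparse solutions, and slack variables, and it solves machine learning problems mainly by means of optimization methods. Because the theory offers global optimality, a simple structure, and strong generalization ability, it has been widely studied in recent years and applied to data classification, pattern recognition, and other fields. Since most multiclass classification problems can ultimately be converted into two-class problems, the original support vector machine problem involves only two-class classification.

By studying these two theories and combining their advantages, this thesis designs a classification algorithm that combines information-entropy-based rough set reduction with the support vector machine and applies it to the classification system of this thesis. Experimental results show that the algorithm improves both classification accuracy and speed. The main work of this thesis is as follows:

1. After studying rough set reduction methods, the heuristic reduction method based on information entropy is chosen as the attribute reduction tool for the proposed algorithm.
2. Building on the rough set discernibility matrix, a new rough set object reduction algorithm is proposed.
3. A classification algorithm combining information-entropy-based rough set reduction with the support vector machine is designed. The algorithm uses rough set attribute reduction to select the data features for the support vector machine, which effectively lowers the dimension of the input feature vectors. In addition, the object reduction method proposed here effectively removes redundant information and corrects inconsistent information, so that the classification performance of the separating surface is optimized.
4. Research and experiments on data sets from the UCI repository verify the effectiveness of the designed algorithm.

Keywords: rough set; support vector machine; reduction; discernibility matrix; classification

Abstract

With the development of science and technology, information from all fields is increasing rapidly. How to acquire useful knowledge and effective classification methods from these data has become a major problem. The rough set method is an excellent data analysis tool for processing uncertain information that is imprecise, inconsistent, or incomplete. It has been successfully applied to machine learning, pattern recognition, decision support, knowledge discovery, fault diagnosis, and other areas. Knowledge reduction is the key technology of rough sets and can process data rapidly and effectively.

The support vector machine is a new machine learning technique developed by Vapnik from the mid-1990s. It is characterized by the use of a maximal margin hyperplane, the theory of kernels, the absence of local minima, convex optimization, the sparseness of the solution, Mercer's theorem, and the capacity control obtained by acting on the margin. It is a new tool for machine learning that relies on optimization methods. Because support vector machines have not only a simpler structure but also better performance, especially better generalization ability, they have been widely applied in recent years to classification, pattern recognition, and other tasks. Since many multiclass problems can be divided into binary classification problems, the original support vector machine problem is designed to solve binary classification.

By studying the theories above and using their advantages, we designed a method based on information-entropy rough set reduction and the support vector machine. The algorithm was applied to the classification system, and the results demonstrate that the accuracy and speed of classification are improved. The main work is as follows:

1. Survey a variety of algorithms and techniques for rough set reduction, and select information entropy as the basis of the reduction algorithm.
2. Introduce a new reduction method based on the discernibility matrix.
3. Design a new classification algorithm based on rough set theory and the support vector machine. The SVM training data are preprocessed by feature selection through attribute reduction, so the dimension of the input vector is decreased. The reduction algorithm of this thesis is then used to remove redundant data and fix inconsistent information, so the quality of the hyperplane is optimized.
4. Research and experiments on several UCI data sets demonstrate the effectiveness of the algorithm.

Key words: Rough set; SVM; Reduction; Discernibility matrix; Classification

摘要 (Chinese abstract)
Abstract
Contents
1 Introduction
1.1 Research background and significance
1.2 Research status of rough set knowledge reduction
1.3 Research status of support vector machines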
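
As a rough illustration of the entropy-based heuristic attribute reduction named in the abstract, the sketch below greedily adds the condition attribute that most lowers the conditional entropy of the decision attribute, stopping once the selected subset is as informative as the full attribute set. This is a generic textbook-style heuristic written under stated assumptions, not the thesis's own algorithm, which this excerpt does not show. The function names `conditional_entropy` and `entropy_reduct` and the toy decision table are hypothetical, and the object reduction and SVM training steps are omitted.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, attrs, decision):
    """Return H(decision | attrs) for a decision table given as a list of dicts."""
    n = len(rows)
    # Each distinct combination of attribute values is one equivalence class;
    # count the decision values that occur inside it.
    blocks = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        blocks[key][row[decision]] += 1
    h = 0.0
    for counts in blocks.values():
        size = sum(counts.values())
        for c in counts.values():
            p = c / size
            h -= (size / n) * p * log2(p)
    return h


def entropy_reduct(rows, condition_attrs, decision):
    """Greedy forward selection (illustrative assumption, not the thesis's method).

    Adds the attribute that lowers H(D | B) the most until H(D | B)
    equals H(D | C) for the full condition attribute set C.
    """
    target = conditional_entropy(rows, condition_attrs, decision)
    selected = []
    remaining = list(condition_attrs)
    while remaining and conditional_entropy(rows, selected, decision) > target + 1e-12:
        best = min(
            remaining,
            key=lambda a: conditional_entropy(rows, selected + [a], decision),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy decision table: condition attributes a, b, c and decision d.
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
reduct = entropy_reduct(rows, ["a", "b", "c"], "d")
print(reduct)
```

In this toy table attribute `a` alone determines the decision, so the greedy search stops after one step. In a pipeline like the one the abstract describes, only the selected columns would then be passed to the support vector machine as input features.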
