Support Vector Machine (SVM), Intelligence Science Courseware


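2020/9/16, Chap8 SVM, Zhongzhi Shi

Knowledge Discovery (Data Mining), Chapter 3
Shi Zhongzhi (史忠植), Institute of Computing Technology, Chinese Academy of Sciences
Support Vector Machine

Outline
Overview of statistical learning methods
The statistical learning problem
Generalization ability of the learning process
Support vector machines
SVM optimization algorithms
Applications

Overview of statistical learning methods
Statistical methods infer the likely regularities of a phenomenon from its outward quantitative behaviour. Scientific regularities usually lie hidden fairly deep. The first clues typically emerge from statistical analysis of quantitative observations, after which a hypothesis or theory is proposed and studied further. When the theoretical study reaches a conclusion, it often still has to be verified in practice. In other words, whether data from observed natural phenomena or from specially designed experiments agree with the theory, to what degree they agree, and in which direction any deviation lies are all questions that call for statistical analysis.

Statistics has developed enormously over the past hundred years. The following framework roughly sketches that development:
1900-1920 Data description
1920-1940 The dawn of statistical models
1940-1960 The era of mathematical statistics
Challenges to random-model assumptions
Relaxation of structural-model assumptions
1990-1999 Modeling complex data structures

Between 1960 and 1980 a revolution took place in statistics: to estimate a dependency from observed data, it is enough to know some general properties of the set of functions to which the unknown dependency belongs. Four discoveries of the 1960s led this revolution:
The regularization principle for solving ill-posed problems, discovered by Tikhonov, Ivanov and Phillips;
Nonparametric statistics, discovered by Parzen, Rosenblatt and Chentsov;
The law of large numbers in functional space and its relation to the learning process, discovered by Vapnik and Chervonenkis;
Algorithmic complexity and its relation to inductive inference, discovered by Kolmogorov, Solomonoff and Chaitin.
These four discoveries also became an important foundation for research on learning processes.

Statistical learning methods
Traditional methods: statistics plays a foundational role in solving machine learning problems. Traditional statistics mainly studies asymptotic theory, that is, statistical properties as the number of samples tends to infinity. Statistical methods are mainly concerned with testing presupposed hypotheses and fitting data models, and they rely on an explicit underlying probability model.
Fuzzy sets
Rough sets
Support vector machines

A statistical method proceeds in three stages:
(1) Collecting data: sampling, experimental design
(2) Analyzing data: modeling, knowledge discovery, visualization
(3) Making inferences: prediction, classification
Common statistical methods include:
Regression analysis (multiple regression, autoregression, etc.)
Discriminant analysis (Bayes discriminant, Fisher discriminant, nonparametric discriminant, etc.)
Cluster analysis (hierarchical clustering, dynamic clustering, etc.)
Exploratory analysis (principal component analysis, correlation analysis, etc.)

Support vector machines
SVM is a machine learning method based on statistical learning theory. It was first proposed by Boser, Guyon and Vapnik at COLT-92 and has developed rapidly since.
Vapnik V N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Vapnik V N. 1998. Statistical Learning Theory. Wiley-Interscience Publication, John Wiley

[...] to use.

[Figure: three fits labelled underfitting, overfitting, and good fit.]
Problem of generalization: a small empirical risk $R_{\mathrm{emp}}$ does not imply a small true expected risk $R$.

Four parts of learning theory
1. Theory of consistency of learning processes: What are the (necessary and sufficient) conditions for consistency (convergence of $R_{\mathrm{emp}}$ to $R$) of a learning process based on the ERM principle?
2. Nonasymptotic theory of the rate of convergence of learning processes: How fast is the rate of convergence of a learning process?
3. Theory of controlling the generalization ability of learning processes: How can one control the rate of convergence (the generalization ability) of a learning process?
4. Theory of constructing learning algorithms: How can one construct algorithms that can control the generalization ability?

Structural Risk Minimization inductive principle (SRM)
ERM is intended for relatively large samples (large $l/h$, where $l$ is the sample size and $h$ the VC dimension). A large $l/h$ induces a small confidence term, which decreases the upper bound on the risk.
Small samples? A small empirical risk does not guarantee anything! We need to minimise both terms on the right-hand side of the risk bound:
The empirical risk of the chosen function
An expression depending on the VC dimension of the function set
Minimising both terms together is the Structural Risk Minimisation (SRM) principle.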
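The two terms on the right-hand side of the risk bound can be made concrete with a small sketch. The slides' own formula did not survive extraction, so the code assumes a commonly quoted form of the VC bound for 0-1 loss, $R \le R_{\mathrm{emp}} + \sqrt{(h(\ln(2l/h) + 1) - \ln(\eta/4))/l}$, which holds with probability $1 - \eta$. The function names, the value of `eta`, and the list of (VC dimension, empirical risk) pairs are all hypothetical and chosen only for illustration.

```python
import math


def vc_confidence(h, l, eta=0.05):
    """Confidence term sqrt((h * (ln(2l/h) + 1) - ln(eta/4)) / l).

    Assumed form of the VC bound for 0-1 loss (not taken from the slides).
    h is the VC dimension, l the sample size, 1 - eta the confidence level.
    """
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)


def guaranteed_risk(r_emp, h, l, eta=0.05):
    """Right-hand side of the bound: empirical risk plus confidence term."""
    return r_emp + vc_confidence(h, l, eta)


# With h fixed, the confidence term shrinks as l/h grows.
small_sample = vc_confidence(h=10, l=100)
large_sample = vc_confidence(h=10, l=10000)

# Hypothetical nested function sets, given as (VC dimension, empirical risk).
# Richer sets fit the l = 1000 training points better but have larger h.
structure = [(2, 0.30), (5, 0.18), (20, 0.10), (100, 0.04), (400, 0.01)]
l = 1000

bounds = [guaranteed_risk(r_emp, h, l) for h, r_emp in structure]
best_by_bound = min(range(len(structure)), key=lambda k: bounds[k])
best_by_emp = min(range(len(structure)), key=lambda k: structure[k][1])

print("confidence term, l=100 vs l=10000:", small_sample, large_sample)
print("index chosen by the bound:", best_by_bound)
print("index chosen by empirical risk alone:", best_by_emp)
```

In this toy setting the richest function set has the smallest empirical risk, but its confidence term dominates, so the bound selects a much simpler set. That is the trade-off the two terms are meant to capture.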

Let $S = \{Q(z, \alpha),\ \alpha \in \Lambda\}$. An admissible structure $S_1 \subset S_2 \subset \dots \subset S_n \subset \dots \subset S$ satisfies:
For each $k$, the VC dimension $h_k$ of $S_k$ is finite, and $h_1 \le h_2 \le \dots \le h_n \le \dots \le h_S$.
Every $S_k$ is either non-negative and bounded, or satisfies an inequality (lost in extraction) for some pair $(p, \tau_k)$.

The SRM principle, continued
For given $z_1, \dots, z_l$ and an admissible structure $S_1 \subset S_2 \subset \dots \subset S_n \subset \dots \subset S$, SRM chooses the function $Q(z, \alpha_l^k)$ minimising $R_{\mathrm{emp}}$ in the $S_k$ for which the guaranteed risk (the upper bound on the risk) is minimal.
SRM thus manages the unavoidable trade-off between quality of approximation and complexity of approximation.

[Figure: empirical risk, confidence interval, and bound on the risk plotted against h (marked h1, h*, hn) for the nested subsets S1, S*, Sn.]

Support vector machines (SVM)
SVMs are learning systems that
use a hypothesis space of linear functions in a high-dimensional feature space (kernel function);
are trained with a learning algorithm from optimization theory (Lagrange);
implement a learning bias derived from statistical learning theory (generalisation).
SVM is a classifier derived from the statistical learning theory of Vapnik and Chervonenkis.

Linear classifier
[Figure: classifier diagram with labels f, x, a, y_est, and data points of two kinds, one denoting +1 and the other -1.]
$f(x, w, b) = \operatorname{sign}(w \cdot x - b)$
How would you classify this data?
Copyright 2001, 2003, Andrew W. Moore
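As a minimal sketch of the decision rule f(x, w, b) = sign(w · x − b), the snippet below classifies a few points with a fixed weight vector. The weights, offset, and points are hypothetical, and mapping a score of exactly 0 to +1 is an assumption of this sketch, since the slides do not say how ties are handled.

```python
def sign(value):
    """Return +1 for non-negative input and -1 otherwise (tie rule assumed)."""
    return 1 if value >= 0 else -1


def linear_classify(x, w, b):
    """Evaluate f(x, w, b) = sign(w . x - b) for a single point x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return sign(score)


# Hypothetical 2-D example: the decision boundary is the line x1 + x2 = 1.
w = [1.0, 1.0]
b = 1.0
points = [[2.0, 2.0], [0.0, 0.0], [1.5, 0.5]]

labels = [linear_classify(p, w, b) for p in points]
print(labels)
```

Every choice of w and b that puts the +1 points on one side and the -1 points on the other is a valid linear classifier for the data, which is why the question of which boundary to prefer arises.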

Maximum margin
$f(x, w, b) = \operatorname{sign}(w \cdot x - b)$
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).
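To illustrate what "maximum margin" means, the sketch below computes the geometric margin of a separating boundary, taken here as the smallest value of y (w · x − b) / ||w|| over the data, and compares two candidate boundaries. The data and both candidates are hypothetical. The code only compares given boundaries; it is not an SVM training procedure, which would instead solve an optimisation problem over all w and b.

```python
import math


def geometric_margin(points, labels, w, b):
    """Smallest signed distance y * (w . x - b) / ||w|| over the data set.

    A positive result means every point lies on the correct side of the
    boundary w . x - b = 0.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) - b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical, linearly separable toy data.
points = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = [-1, -1, 1, 1]

# Two hypothetical boundaries (w, b); both separate the data correctly.
candidates = {
    "x1 = 1.5": ([1.0, 0.0], 1.5),
    "x1 = 0.5": ([1.0, 0.0], 0.5),
}

margins = {
    name: geometric_margin(points, labels, w, b)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins, best)
```

Both boundaries classify the toy data without error, but the one midway between the two classes keeps every point farther away, so it is the one a maximum margin classifier would prefer among these two.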
