Introduction to Machine Learning Algorithms and Related References

Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a linear, inherently multi-class classification method. It was originally introduced by Fisher for two classes [9], but was later extended to multiple classes by Rao [26]. In particular, LDA computes a classification function $g(x) = W^T x$, where $W$ is selected as the linear projection that maximizes the Fisher criterion

$$W_{\mathrm{opt}} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|},$$

where $S_W$ and $S_B$ are the within-class and the between-class scatter matrices (see, e.g., [7]). The corresponding optimal solution for this optimization problem is given by the solution of the generalized eigenproblem $S_B w = \lambda S_W w$, or directly by computing the eigenvectors of $S_W^{-1} S_B$. Since the rank of $S_W^{-1} S_B$ is bounded by the rank of $S_B$, there are $c-1$ non-zero eigenvalues, resulting in a $(c-1)$-dimensional subspace $L = W^T X \in \mathbb{R}^{(c-1) \times n}$ which preserves the most discriminant information. For classification of a new sample $x \in \mathbb{R}^m$, the class label $\ell \in \{1, \ldots, c\}$ is assigned according to the result of a nearest-neighbor classification. For that purpose, the Euclidean distances $d$ between the projected sample $g(x)$ and the class centers $v_i = W^T \mu_i$ in the LDA space are compared:

$$\ell = \arg\min_{1 \le i \le c} d(g(x), v_i).$$

Loog et al. [19] showed that for more than two classes, maximizing the Fisher criterion above provides only a suboptimal solution. In particular, optimizing the Fisher criterion yields an optimal solution with respect to the Bayes error for two classes, but this cannot be generalized to multiple classes. Nevertheless, LDA can be applied to many practical multi-class problems. This was also confirmed by the theoretical considerations of Martinez and Zhu [20], who showed, however, that increasing the number of classes decreases the separability.
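The steps above translate almost directly into code: accumulate the scatter matrices, solve the eigenproblem for $S_W^{-1} S_B$, keep the $c-1$ leading eigenvectors, and classify by the nearest projected class center. The following NumPy sketch illustrates this under our own assumptions (the function names lda_fit/lda_predict are ours, and a pseudo-inverse is used for $S_W$ to sidestep singularity); it is a minimal illustration, not the implementation from the cited papers.

```python
# Minimal multi-class LDA sketch following the description above.
# Assumption: labels are integers 0..c-1; pinv(S_w) guards against a
# singular within-class scatter matrix. Illustrative, not authoritative.
import numpy as np

def lda_fit(X, y):
    """X: (n, m) samples, y: (n,) integer labels. Returns W, class centers, classes."""
    classes = np.unique(y)
    mu = X.mean(axis=0)                      # global mean
    m = X.shape[1]
    S_w = np.zeros((m, m))                   # within-class scatter
    S_b = np.zeros((m, m))                   # between-class scatter
    for k in classes:
        Xk = X[y == k]
        mu_k = Xk.mean(axis=0)
        S_w += (Xk - mu_k).T @ (Xk - mu_k)
        diff = (mu_k - mu)[:, None]
        S_b += len(Xk) * (diff @ diff.T)
    # Generalized eigenproblem S_b w = lambda S_w w, solved via S_w^{-1} S_b.
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(evals.real)[::-1]
    W = evecs[:, order[:len(classes) - 1]].real   # the (c-1)-dim projection
    centers = np.array([X[y == k].mean(axis=0) @ W for k in classes])
    return W, centers, classes

def lda_predict(x, W, centers, classes):
    """Nearest class center in LDA space: l = argmin_i d(g(x), v_i)."""
    dists = np.linalg.norm(centers - x @ W, axis=1)
    return classes[np.argmin(dists)]
```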

[9] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179-188, 1936.
[26] C. R. Rao. The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society, Series B, 10(2):159-203, 1948.
[7] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, 2000.
[19] M. Loog, R. P. W. Duin, and R. Haeb-Umbach. Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Trans. PAMI, 23(7):762-766, 2001.
[20] A. M. Martinez and M. Zhu. Where are linear feature extraction methods applicable? IEEE Trans. PAMI, 27(12):1934-1944, 2005.

Multiple Kernel Learning

Recently, Multiple Kernel Learning (MKL) [25, 16, 29] has become a popular method to combine data from multiple information sources. The main idea is to create a weighted linear combination of the kernels obtained from each information source. Moreover, Rakotomamonjy et al. [25] showed that by using multiple kernels instead of one, a more effective decision function can be obtained. In particular, the kernel $K(x, x')$ can be considered a convex combination of $M$ basis kernels $K_j(x, x')$:

$$K(x, x') = \sum_{j=1}^{M} d_j K_j(x, x'),$$

where $d_j \ge 0$ are the weights of the kernels $K_j$ and $\sum_{j=1}^{M} d_j = 1$. Thus, the decision function $g(x)$ of an SVM with multiple kernels can be represented as

$$g(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) - b = \sum_{i=1}^{N} \alpha_i y_i \sum_{j=1}^{M} d_j K_j(x_i, x) - b,$$

where $x_i$ are the training samples and $y_i \in \{-1, +1\}$ are the corresponding class labels. Hence, when training an MKL model, the goal is to learn both the coefficients $\alpha_i$ and the weights $d_j$ in parallel.
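Once an MKL solver (such as SimpleMKL [25]) has produced the SVM coefficients $\alpha_i$, the bias $b$, and the kernel weights $d_j$, the decision function above is just a weighted sum of Gram matrices. The NumPy sketch below shows only that evaluation; the solver itself is not shown, and all names here (mkl_decision, rbf_kernel, linear_kernel) are illustrative assumptions.

```python
# Evaluating g(x) = sum_i alpha_i y_i sum_j d_j K_j(x_i, x) - b for a
# trained MKL model. The weights d must satisfy d_j >= 0 and sum_j d_j = 1.
import numpy as np

def linear_kernel(X, Z):
    return X @ Z.T

def rbf_kernel(gamma):
    # Returns a basis kernel K_j computing an (N, k) Gram matrix.
    return lambda X, Z: np.exp(-gamma * ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1))

def mkl_decision(x, X_train, y_train, alpha, b, kernels, d):
    """x: (m,) test sample; X_train: (N, m); y_train in {-1, +1}^N."""
    x = np.atleast_2d(x)
    # Convex combination of the basis Gram matrices, shape (N, 1).
    K = sum(dj * Kj(X_train, x) for dj, Kj in zip(d, kernels))
    return float((alpha * y_train) @ K[:, 0] - b)

# Example: two information sources, equal kernel weights.
# kernels = [linear_kernel, rbf_kernel(0.5)]; d = np.array([0.5, 0.5])
```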

[25] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. J. of Machine Learning Research, 9:2491-2521, 2008.
[16] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20(16):2626-2635, 2004.
[29] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. J. of Machine Learning Research, 7:1531-1565, 2006.

AdaBoost Learning

AdaBoost [6] is a popular machine learning method combining the properties of an efficient classifier and feature selection. The discrete version of AdaBoost defines a strong binary classifier

$$H(z) = \mathrm{sgn}\left(\sum_{t=1}^{T} \alpha_t h_t(z)\right)$$

using a weighted combination of $T$ weak learners $h_t$ with weights $\alpha_t$. At each new round $t$, AdaBoost selects a new hypothesis $h_t$ that best classifies the training samples that had high classification error in the previous rounds. Each weak learner is a simple threshold test,

$$h_t(z) = \begin{cases} +1 & \text{if } f_t(z) > \theta_t, \\ -1 & \text{otherwise}, \end{cases}$$

and may explore any feature $f$ of the data $z$. In the context of visual object recognition it is attractive to define $f$ in terms of local image properties over image regions $r$ and then use AdaBoost for selecting the features that maximize the classification performance. This idea was first explored by Viola and Jones.

The AdaBoost algorithm was proposed in 1995 by Yoav Freund and Robert E. Schapire and is one of the most practically valuable machine learning methods in use today. It is an iterative learning method that can boost a set of weak learning algorithms into a single strong learning algorithm. AdaBoost works by changing the distribution of the data: based on whether each sample in the training set was classified correctly in the current round, together with the overall classification accuracy of the previous round, it assigns a weight to every sample, and the weak classifiers obtained in each training round are finally fused together into the final decision classifier.
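The formulas above correspond to the classical discrete AdaBoost loop: pick the weak learner with the lowest weighted error, compute its weight $\alpha_t$, and re-weight the samples so that misclassified ones matter more in the next round. The sketch below uses threshold ("stump") weak learners with an additional polarity flag, a common generalization of the threshold test above; the stump parameterization and function names are our own illustrative assumptions.

```python
# Minimal discrete AdaBoost with threshold weak learners, as a sketch of
# the procedure above. Labels y must be in {-1, +1}. Illustrative only.
import numpy as np

def stump(X, f, theta, p):
    """Weak learner: +1 if p * X[:, f] > p * theta, else -1."""
    return np.where(p * X[:, f] > p * theta, 1, -1)

def train_adaboost(X, y, T):
    n, m = X.shape
    w = np.full(n, 1.0 / n)                  # the data distribution
    ensemble = []
    for _ in range(T):
        best = None
        for f in range(m):                   # exhaustive stump search
            for theta in np.unique(X[:, f]):
                for p in (+1, -1):
                    err = w[stump(X, f, theta, p) != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, theta, p)
        err, f, theta, p = best
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))  # learner weight
        w *= np.exp(-alpha * y * stump(X, f, theta, p))      # boost mistakes
        w /= w.sum()
        ensemble.append((alpha, f, theta, p))
    return ensemble

def predict_adaboost(ensemble, X):
    """H(z) = sgn(sum_t alpha_t h_t(z))."""
    return np.sign(sum(a * stump(X, f, t, p) for a, f, t, p in ensemble))
```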
