Top Ten Machine Learning Algorithms: EM

Chapter 5: EM

Geoffrey J. McLachlan and Shu Kay Ng

Contents

5.1 Introduction
5.2 Algorithm Description
5.3 Software Implementation
5.4 Illustrative Examples
    5.4.1 Example 5.1: Multivariate Normal Mixtures
    5.4.2 Example 5.2: Mixtures of Factor Analyzers
5.5 Advanced Topics
5.6 Exercises
References

Abstract: The expectation-maximization (EM) algorithm is a broadly applicable approach to the iterative computation of maximum likelihood (ML) estimates, useful in a variety of incomplete-data problems. In particular, the EM algorithm simplifies considerably the problem of fitting finite mixture models by ML, where mixture models are used to model heterogeneity in cluster analysis and pattern recognition contexts. The EM algorithm has a number of appealing properties, including its numerical stability, simplicity of implementation, and reliable global convergence. There are also extensions of the EM algorithm to tackle complex problems in various data mining applications. It is highly desirable, however, that its simplicity and stability be preserved.

5.1 Introduction

The expectation-maximization (EM) algorithm has been of considerable interest in recent years in the development of algorithms in various application areas such as data mining, machine learning, and pattern recognition [20, 27, 28]. The seminal paper of Dempster et al. [8] on the EM algorithm greatly stimulated interest in the use of finite mixture distributions to model heterogeneous data. This is because the fitting of [...]

Consider a g-component mixture model in which the density of an observation $y_j$ is

$$f(y_j; \Psi) = \sum_{i=1}^{g} \pi_i f_i(y_j; \theta_i) \qquad (j = 1, \ldots, n), \tag{5.1}$$

where the $\pi_i$ are the mixing proportions and the component density $f_i(y_j; \theta_i)$ is specified up to a vector $\theta_i$ of unknown parameters $(i = 1, \ldots, g)$. The vector of all the unknown parameters is given by $\Psi = (\pi_1, \ldots, \pi_{g-1}, \theta_1^T, \ldots, \theta_g^T)^T$, where the superscript $T$ denotes vector transpose. The parameter vector $\Psi$ can be estimated by ML.
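To make Equation (5.1) concrete, here is a minimal sketch of evaluating the mixture density, assuming multivariate normal component densities $f_i$ (as in Example 5.1); the function name and the SciPy calls are illustrative conventions, not code from the chapter.

```python
from scipy.stats import multivariate_normal

def mixture_density(y, weights, means, covs):
    """Evaluate the g-component mixture density of Equation (5.1) at each
    row of y, assuming multivariate normal component densities f_i."""
    return sum(w * multivariate_normal.pdf(y, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))
```

Here `weights` plays the role of $(\pi_1, \ldots, \pi_g)$, and `means` and `covs` together play the role of the component parameters $\theta_i$.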

The objective is to maximize the likelihood $L(\Psi)$, or equivalently the log likelihood $\log L(\Psi)$, as a function of $\Psi$ over the parameter space. That is, the ML estimate of $\Psi$, denoted $\hat\Psi$, is given by an appropriate root of the log likelihood equation,

$$\partial \log L(\Psi) / \partial \Psi = 0, \tag{5.2}$$

where $\log L(\Psi) = \sum_{j=1}^{n} \log f(y_j; \Psi)$ is the log likelihood function for $\Psi$, formed under the assumption of independent data $y_1, \ldots, y_n$.
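The log likelihood is thus a direct sum over observations; a correspondingly small sketch under the same normal-component assumption (again with illustrative names):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(y, weights, means, covs):
    """log L(Psi) = sum_j log f(y_j; Psi) for the mixture of Equation (5.1),
    assuming normal component densities."""
    dens = sum(w * multivariate_normal.pdf(y, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))
    return np.log(dens).sum()
```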

The aim of ML estimation [13] is to determine an estimate $\hat\Psi$ for each $n$, so that it defines a sequence of roots of Equation (5.2) that is consistent and asymptotically efficient. Such a sequence is known to exist under suitable regularity conditions [7].

With probability tending to one, these roots correspond to local maxima in the interior of the parameter space. For estimation models in general, the likelihood usually has a global maximum in the interior of the parameter space. Then typically a sequence of roots of Equation (5.2) with the desired asymptotic properties is provided by taking $\hat\Psi$, for each $n$, to be the root that globally maximizes $L(\Psi)$; in this case, $\hat\Psi$ is the MLE [18].

We shall henceforth refer to $\hat\Psi$ as the MLE, even in situations where it may not globally maximize the likelihood. Indeed, in the example on mixture models to be presented in Section 5.4.1, the likelihood is unbounded. However, there may still exist, under the usual regularity conditions, a sequence of roots of Equation (5.2) with the properties of consistency, efficiency, and asymptotic normality [16].

[...]

In the EM framework, the observed data $y$ are augmented to complete data with complete-data likelihood $L_c(\Psi)$. On the $(k+1)$th iteration, the E-step computes the $Q$-function,

$$Q(\Psi; \Psi^{(k)}) = E_{\Psi^{(k)}}\{\log L_c(\Psi) \mid y\},$$

where $E_{\Psi^{(k)}}$ denotes expectation using the parameter vector $\Psi^{(k)}$.

The M-step updates the estimate of $\Psi$ by that value $\Psi^{(k+1)}$ of $\Psi$ that maximizes the $Q$-function, $Q(\Psi; \Psi^{(k)})$, with respect to $\Psi$ over the parameter space [18]. The E- and M-steps are alternated repeatedly until the changes in the log likelihood values are less than some specified threshold. As mentioned in Section 5.1, the EM algorithm is numerically stable, with each EM iteration increasing the likelihood value: $L(\Psi^{(k+1)}) \geq L(\Psi^{(k)})$.

It can be shown that both the E- and M-steps will have particularly simple forms when the complete-data probability density function is from an exponential family [18]. Often in practice, the solution to the M-step exists in closed form.
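For the normal mixture model of Equation (5.1), both steps are indeed available in closed form. The following is a minimal sketch of the resulting EM loop, not the authors' implementation; the stopping threshold, the iteration cap, and all names are assumed conventions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_normal_mixture(y, weights, means, covs, tol=1e-6, max_iter=500):
    """EM for a g-component normal mixture; (weights, means, covs) holds
    the starting value Psi^(0). tol and max_iter are assumptions."""
    n, g = y.shape[0], len(weights)
    old_ll = -np.inf
    for _ in range(max_iter):
        # E-step: posterior component-membership probabilities tau_ij,
        # which determine Q(Psi; Psi^(k)) for this model.
        dens = np.column_stack([w * multivariate_normal.pdf(y, mean=m, cov=c)
                                for w, m, c in zip(weights, means, covs)])
        total = dens.sum(axis=1, keepdims=True)  # f(y_j; Psi^(k))
        tau = dens / total
        # M-step: closed-form maximizer of Q(Psi; Psi^(k)).
        nk = tau.sum(axis=0)
        weights = nk / n
        means = [tau[:, i] @ y / nk[i] for i in range(g)]
        covs = [(tau[:, i][:, None] * (y - means[i])).T @ (y - means[i]) / nk[i]
                for i in range(g)]
        # Alternate E- and M-steps until the log likelihood change is below
        # tol; each iteration satisfies L(Psi^(k+1)) >= L(Psi^(k)).
        ll = np.log(total).sum()
        if ll - old_ll < tol:
            break
        old_ll = ll
    return weights, means, covs, ll
```

Because the likelihood is monotone along the iterates, the change in log likelihood is a natural stopping criterion, which is what the loop above uses.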

In those instances where it does not, it may not be feasible to attempt to find the value of $\Psi$ that globally maximizes the function $Q(\Psi; \Psi^{(k)})$. For such situations, a generalized EM (GEM) algorithm [8] may be adopted, for which the M-step requires $\Psi^{(k+1)}$ to be chosen such that $\Psi^{(k+1)}$ increases the $Q$-function $Q(\Psi; \Psi^{(k)})$ over its value at $\Psi = \Psi^{(k)}$; that is,

$$Q(\Psi^{(k+1)}; \Psi^{(k)}) \geq Q(\Psi^{(k)}; \Psi^{(k)})$$

holds; see [18].
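One schematic way to satisfy the GEM condition is to take a single ascent step on the $Q$-function and accept it only if $Q$ does not decrease. In this sketch, `q_func` and `grad_q` are hypothetical placeholders for model-specific code, not functions from the chapter.

```python
def gem_m_step(psi, q_func, grad_q, lr=1e-2):
    """A GEM-style partial M-step: one gradient-ascent step on Q.
    q_func(psi) evaluates Q(psi; psi_k) and grad_q(psi) its gradient;
    both are hypothetical stand-ins. psi is a NumPy parameter vector."""
    candidate = psi + lr * grad_q(psi)
    # GEM only requires Q(psi_{k+1}; psi_k) >= Q(psi_k; psi_k).
    return candidate if q_func(candidate) >= q_func(psi) else psi
```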

Some of the drawbacks of the EM algorithm are: (a) it does not automatically produce an estimate of the covariance matrix of the parameter estimates (this disadvantage, however, can easily be removed by using appropriate methodology associated with the EM algorithm [18]); (b) it is sometimes very slow to converge; and (c) in some problems, the E- or M-steps may be analytically intractable. We shall briefly address the last two issues in Section 5.5.

[...]

Starting values for the EM algorithm. In applications where the log likelihood equation has multiple roots corresponding to local maxima, the EM algorithm should be applied from a wide choice of starting values in any search for all local maxima. In the context of finite mixture models, an initial parameter value can be obtained using the k-means clustering algorithm.
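A minimal sketch of such a k-means initialization follows, assuming normal components; using the within-cluster proportions, means, and covariances as $\Psi^{(0)}$ is a common convention and an assumption here (the text only names k-means as the initializer), as are the function name and the scikit-learn call.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_start(y, g, seed=0):
    """Build a starting value Psi^(0) for a g-component normal mixture
    from a k-means partition of the data (a sketch, not the chapter's code)."""
    labels = KMeans(n_clusters=g, n_init=10, random_state=seed).fit_predict(y)
    weights = np.bincount(labels, minlength=g) / len(y)
    means = [y[labels == i].mean(axis=0) for i in range(g)]
    covs = [np.cov(y[labels == i], rowvar=False) for i in range(g)]
    return weights, means, covs
```

These starting values can be passed directly to an EM loop such as the `em_normal_mixture` sketch above.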
