Machine learning course notes: cs229 notes9

Uploaded by: f****u · Document ID: 122460805 · Upload date: 2020-03-05 · Format: PDF · Pages: 9 · Size: 81.16 KB

CS229 Lecture notes
Andrew Ng

Part X
Factor analysis

When we have data x^{(i)} ∈ R^n that comes from a mixture of several Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting, we usually imagine problems where we have sufficient data to be able to discern the multiple-Gaussian structure in the data. For instance, this would be the case if our training set size m was significantly larger than the dimension n of the data.

Now consider a setting in which n ≫ m. In such a problem, it might be difficult to model the data even with a single Gaussian, much less a mixture of Gaussians.

Specifically, since the m data points span only a low-dimensional subspace of R^n, if we model the data as Gaussian, and estimate the mean and covariance using the usual maximum likelihood estimators,

  μ = (1/m) sum_{i=1}^m x^{(i)}
  Σ = (1/m) sum_{i=1}^m (x^{(i)} − μ)(x^{(i)} − μ)^T,

we would find that the matrix Σ is singular. This means that Σ^{−1} does not exist, and 1/|Σ|^{1/2} = 1/0. But both of these terms are needed in computing the usual density of a multivariate Gaussian distribution. Another way of stating this difficulty is that maximum likelihood estimates of the parameters result in a Gaussian that places all of its probability in the affine space spanned by the data,¹ and this corresponds to a singular covariance matrix.

¹This is the set of points x satisfying x = sum_{i=1}^m α_i x^{(i)}, for some α_i's so that sum_{i=1}^m α_i = 1.

More generally, unless m exceeds n by some reasonable amount, the maximum likelihood estimates of the mean and covariance may be quite poor. Nonetheless, we would still like to be able to fit a reasonable Gaussian model to the data, and perhaps capture some interesting covariance structure in the data. How can we do this?

In the next section, we begin by reviewing two possible restrictions on Σ, ones that allow us to fit Σ with small amounts of data but neither of which will give a satisfactory solution to our problem. We next discuss some properties of Gaussians that will be needed later; specifically, how to find marginal and conditional distributions of Gaussians. Finally, we present the factor analysis model, and EM for it.
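The singularity described above is easy to reproduce numerically. Below is a minimal NumPy sketch (the dimensions and data are made up for illustration; they are not from the notes): with m points in R^n and m ≤ n, the MLE covariance has rank at most m − 1, so its determinant vanishes and the Gaussian density is undefined.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 30, 10                 # dimension n much larger than sample size m
X = rng.normal(size=(m, n))   # m data points in R^n, one per row

# Maximum likelihood estimates: mu = mean, Sigma = (1/m) sum (x - mu)(x - mu)^T
mu = X.mean(axis=0)
D = X - mu
Sigma = D.T @ D / m

# The m centered points span at most an (m-1)-dimensional subspace of R^n,
# so Sigma has rank at most m - 1 and is singular.
print(np.linalg.matrix_rank(Sigma))   # at most m - 1 = 9
print(np.linalg.det(Sigma))           # ~0: |Sigma|^{1/2} = 0, so 1/|Sigma|^{1/2} blows up
```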

1  Restrictions of Σ

If we do not have sufficient data to fit a full covariance matrix, we may place some restrictions on the space of matrices Σ that we will consider. For instance, we may choose to fit a covariance matrix Σ that is diagonal. In this setting, the reader may easily verify that the maximum likelihood estimate of the covariance matrix is given by the diagonal matrix Σ satisfying

  Σ_{jj} = (1/m) sum_{i=1}^m (x_j^{(i)} − μ_j)^2.

Thus, Σ_{jj} is just the empirical estimate of the variance of the j-th coordinate of the data.

Recall that the contours of a Gaussian density are ellipses. A diagonal Σ corresponds to a Gaussian where the major axes of these ellipses are axis-aligned.

Sometimes, we may place a further restriction on the covariance matrix that not only must it be diagonal, but its diagonal entries must all be equal. In this setting, we have Σ = σ²I, where σ² is the parameter under our control. The maximum likelihood estimate of σ² can be found to be:

  σ² = (1/(mn)) sum_{j=1}^n sum_{i=1}^m (x_j^{(i)} − μ_j)^2.

This model corresponds to using Gaussians whose densities have contours that are circles (in 2 dimensions; or spheres/hyperspheres in higher dimensions).

If we were fitting a full, unconstrained, covariance matrix Σ to data, it was necessary that m ≥ n + 1 in order for the maximum likelihood estimate of Σ not to be singular. Under either of the two restrictions above, we may obtain non-singular Σ when m ≥ 2.

However, restricting Σ to be diagonal also means modeling the different coordinates x_i, x_j of the data as being uncorrelated and independent. Often, it would be nice to be able to capture some interesting correlation structure in the data. If we were to use either of the restrictions on Σ described above, we would therefore fail to do so. In this set of notes, we will describe the factor analysis model, which uses more parameters than the diagonal Σ and captures some correlations in the data, but also without having to fit a full covariance matrix.
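The two restricted maximum likelihood estimators above (diagonal Σ, and Σ = σ²I) are straightforward to compute. A minimal NumPy sketch, with illustrative data that is not from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 3
# Synthetic data with a different variance per coordinate
X = rng.normal(size=(m, n)) * np.array([1.0, 2.0, 3.0])

mu = X.mean(axis=0)
D = X - mu

# Diagonal restriction: Sigma_jj = (1/m) sum_i (x_j^(i) - mu_j)^2
Sigma_diag = np.diag((D ** 2).mean(axis=0))

# Spherical restriction Sigma = sigma^2 * I:
# sigma^2 = (1/(m*n)) sum_j sum_i (x_j^(i) - mu_j)^2
sigma2 = (D ** 2).mean()

# The spherical estimate is just the average of the diagonal variances
assert np.isclose(sigma2, np.trace(Sigma_diag) / n)
```

Note that both estimates are non-singular as soon as every coordinate has nonzero sample variance, which generically happens with m ≥ 2 points, matching the claim above.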

2  Marginals and conditionals of Gaussians

Before describing factor analysis, we digress to talk about how to find conditional and marginal distributions of random variables with a joint multivariate Gaussian distribution.

Suppose we have a vector-valued random variable

  x = [x_1; x_2],

where x_1 ∈ R^r, x_2 ∈ R^s, and x ∈ R^{r+s}. Suppose x ∼ N(μ, Σ), where

  μ = [μ_1; μ_2],   Σ = [Σ_{11}  Σ_{12}; Σ_{21}  Σ_{22}].

Here, μ_1 ∈ R^r, μ_2 ∈ R^s, Σ_{11} ∈ R^{r×r}, Σ_{12} ∈ R^{r×s}, and so on. Note that since covariance matrices are symmetric, Σ_{12} = Σ_{21}^T.

Under our assumptions, x_1 and x_2 are jointly multivariate Gaussian. What is the marginal distribution of x_1? It is not hard to see that E[x_1] = μ_1, and that Cov(x_1) = E[(x_1 − μ_1)(x_1 − μ_1)^T] = Σ_{11}. To see that the latter is true, note that by definition of the joint covariance of x_1 and x_2, we have that

  Cov(x) = Σ
         = [Σ_{11}  Σ_{12}; Σ_{21}  Σ_{22}]
         = E[(x − μ)(x − μ)^T]
         = E[ [x_1 − μ_1; x_2 − μ_2] [x_1 − μ_1; x_2 − μ_2]^T ]
         = E[ (x_1 − μ_1)(x_1 − μ_1)^T   (x_1 − μ_1)(x_2 − μ_2)^T ;
              (x_2 − μ_2)(x_1 − μ_1)^T   (x_2 − μ_2)(x_2 − μ_2)^T ].

Matching the upper-left subblocks in the matrices in the second and the last lines above gives the result.

Since marginal distributions of Gaussians are themselves Gaussian, we therefore have that the marginal distribution of x_1 is given by x_1 ∼ N(μ_1, Σ_{11}).
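The claim that the marginal of x_1 is N(μ_1, Σ_{11}) can also be checked empirically by sampling from the joint and discarding the x_2 block. A small NumPy sketch, where the specific μ and Σ are made up for illustration (r = 2, s = 1):

```python
import numpy as np

rng = np.random.default_rng(2)

# Joint Gaussian over x = [x1; x2], with x1 in R^2 and x2 in R^1
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
X1 = X[:, :2]                     # keep only the x1 block of each sample

# Empirical mean and covariance of the marginal should match mu_1 and Sigma_11
print(X1.mean(axis=0))            # approx [0.0, 1.0]  (= mu_1)
print(np.cov(X1, rowvar=False))   # approx [[2.0, 0.5], [0.5, 1.0]]  (= Sigma_11)
```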
