The Elements of Statistical Learning, Chapter 6
oxstar, SJTU
January 6, 2011

Ex. 6.2  Show that $\sum_{i=1}^N (x_i - x_0)\, l_i(x_0) = 0$ for local linear regression. Define $b_j(x_0) = \sum_{i=1}^N (x_i - x_0)^j\, l_i(x_0)$. Show that $b_0(x_0) = 1$ for local polynomial regression of any degree (including local constants). Show that $b_j(x_0) = 0$ for all $j \in \{1, 2, \ldots, k\}$ for local polynomial regression of degree $k$. What are the implications of this on the bias?

Proof  For local linear regression the vector-valued basis function is $b(x)^T = (1, x)$ and the regression matrix is $B = [\mathbf{1}, \mathbf{x}]$, so

$$(1, x_0) = b(x_0)^T = b(x_0)^T \big(B^T W(x_0) B\big)^{-1} B^T W(x_0) B = b(x_0)^T \big(B^T W(x_0) B\big)^{-1} B^T W(x_0)\, [\mathbf{1}, \mathbf{x}].$$

Reading off the two components, and recalling $l(x_0)^T = b(x_0)^T (B^T W(x_0) B)^{-1} B^T W(x_0)$, gives

$$\sum_{i=1}^N l_i(x_0) = 1, \qquad \sum_{i=1}^N l_i(x_0)\, x_i = x_0. \qquad (1)$$

Therefore

$$\sum_{i=1}^N (x_i - x_0)\, l_i(x_0) = \sum_{i=1}^N l_i(x_0)\, x_i - x_0 \sum_{i=1}^N l_i(x_0) = x_0 - x_0 \cdot 1 = 0. \qquad \square$$

From (1) we have

$$b_0(x_0) = \sum_{i=1}^N (x_i - x_0)^0\, l_i(x_0) = \sum_{i=1}^N l_i(x_0) = 1. \qquad \square$$

When $j \in \{1, 2, \ldots, k\}$, the basis function is $b(x)^T = (1, x, x^2, \ldots, x^k)$ and $B = [\mathbf{1}, \mathbf{x}, \mathbf{x}^2, \ldots, \mathbf{x}^k]$. By the same argument as for (1) we similarly have

$$x_0^j = \sum_{i=1}^N l_i(x_0)\, x_i^j. \qquad (2)$$

Expand $(x_i - x_0)^j$ without combining like terms: each of the $2^j$ terms can be written as $(-1)^b x_i^a x_0^b$ with $a + b = j$, and the number of positive terms equals the number of negative terms, i.e. $\sum_{n=1}^{2^j} (-1)^{b_n} = 0$. By (2), each such term contributes to $b_j(x_0)$ the amount

$$\sum_{i=1}^N (-1)^b x_i^a x_0^b\, l_i(x_0) = (-1)^b x_0^b \sum_{i=1}^N l_i(x_0)\, x_i^a = (-1)^b x_0^b x_0^a = (-1)^b x_0^j,$$

so that

$$b_j(x_0) = \sum_{i=1}^N (x_i - x_0)^j\, l_i(x_0) = \sum_{n=1}^{2^j} (-1)^{b_n} x_0^j = x_0^j \sum_{n=1}^{2^j} (-1)^{b_n} = 0. \qquad \square$$

Hence, expanding $f(x_i)$ in a Taylor series about $x_0$, the bias is

$$
\begin{aligned}
E\hat{f}(x_0) - f(x_0) &= \sum_{i=1}^N l_i(x_0) f(x_i) - f(x_0) \\
&= \Big( f(x_0) \sum_{i=1}^N l_i(x_0) - f(x_0) \Big) + f'(x_0) \sum_{i=1}^N (x_i - x_0)\, l_i(x_0) + c_2 f''(x_0) \sum_{i=1}^N (x_i - x_0)^2\, l_i(x_0) \\
&\quad + \cdots + c_{k+1} f^{(k+1)}(x_0) \sum_{i=1}^N (x_i - x_0)^{k+1}\, l_i(x_0) + \cdots \\
&= c_{k+1} f^{(k+1)}(x_0) \sum_{i=1}^N (x_i - x_0)^{k+1}\, l_i(x_0) + \cdots
\end{aligned}
$$

where the $c_n$ are the coefficients of the Taylor-expansion terms. Since $b_j(x_0) = 0$ for $j = 1, \ldots, k$, the bias depends only on the $(k+1)$th-degree and higher-order terms in the expansion of $f$. $\square$
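A quick numerical check of these weight identities can be reassuring. The sketch below (Python/NumPy, not part of the original solution; it assumes a Gaussian kernel for $W(x_0)$, and the helper name local_poly_weights is my own) computes the equivalent-kernel weights $l_i(x_0)$ for a local polynomial fit of degree $k$ and prints $b_j(x_0)$ for $j = 0, \ldots, k$, which should come out as $1, 0, \ldots, 0$ up to floating-point error.

```python
import numpy as np

def local_poly_weights(x, x0, degree, bandwidth):
    """Equivalent-kernel weights l_i(x0) for local polynomial regression.

    l(x0)^T = b(x0)^T (B^T W(x0) B)^{-1} B^T W(x0), with a Gaussian kernel
    supplying the diagonal of W(x0).
    """
    B = np.vander(x, degree + 1, increasing=True)      # rows b(x_i)^T = (1, x_i, ..., x_i^k)
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)     # diagonal of W(x0)
    b0 = x0 ** np.arange(degree + 1)                   # b(x0)^T = (1, x0, ..., x0^k)
    return np.linalg.solve(B.T @ (w[:, None] * B), b0) @ (w * B.T)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
x0, k = 0.37, 3
l = local_poly_weights(x, x0, degree=k, bandwidth=0.2)

# b_0(x0) should be 1 and b_j(x0) should vanish for j = 1, ..., k.
for j in range(k + 1):
    print(j, np.sum((x - x0) ** j * l))
```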

Ex. 6.3  Show that $\|l(x)\|$ (Section 6.1.2) increases with the degree of the local polynomial.

Proof  Preliminary: $B$ is an $N \times (d+1)$ regression matrix, so $B$ itself is not invertible, while $BB^T$ is invertible. From

$$\hat{f}(x_j) = b(x_j)^T \big(B^T W(x_j) B\big)^{-1} B^T W(x_j)\, \mathbf{y} = \sum_{i=1}^N l_i(x_j)\, y_i = l(x_j)^T \mathbf{y}$$

we have

$$l(x_j)^T = b(x_j)^T \big(B^T W(x_j) B\big)^{-1} B^T W(x_j), \qquad l(x_j) = W(x_j) B \big(B^T W(x_j) B\big)^{-1} b(x_j),$$

so that

$$
\begin{aligned}
\|l(x_j)\|^2 &= l(x_j)^T l(x_j) \\
&= b(x_j)^T \big(B^T W(x_j) B\big)^{-1} B^T W(x_j)\, W(x_j) B \big(B^T W(x_j) B\big)^{-1} b(x_j) \\
&= b(x_j)^T \big(B^T W(x_j) B\big)^{-1} B^T W(x_j) B\, B^T (BB^T)^{-1}\, (BB^T)^{-1} B\, B^T W(x_j) B \big(B^T W(x_j) B\big)^{-1} b(x_j) \\
&= b(x_j)^T B^T (BB^T)^{-1} (BB^T)^{-1} B\, b(x_j).
\end{aligned}
$$

Summing,

$$
\|l(x)\|^2 = \sum_{j=1}^{d+1} \|l(x_j)\|^2 = \sum_{j=1}^{d+1} b(x_j)^T B^T (BB^T)^{-1} (BB^T)^{-1} B\, b(x_j)
= \operatorname{trace}\big( BB^T (BB^T)^{-1} (BB^T)^{-1} BB^T \big) \;\; \text{(Prof. Zhang has proved it)}
= \operatorname{trace}(I_{d+1}) = d + 1.
$$

Hence $\|l(x)\| = \sqrt{d+1}$, which increases with the degree $d$ of the local polynomial. $\square$
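To see the qualitative claim numerically, here is a minimal self-contained sketch (Python/NumPy, not part of the original solution; it assumes a Gaussian kernel for $W(x_0)$) that computes $\|l(x_0)\|$ at a fixed interior point for local fits of increasing degree. The printed norms typically grow with the degree, in line with the exercise, since $\operatorname{Var}\hat f(x_0) = \sigma^2 \|l(x_0)\|^2$.

```python
import numpy as np

def lnorm(x, x0, degree, bandwidth):
    """||l(x0)|| for a local polynomial fit of the given degree (Gaussian kernel)."""
    B = np.vander(x, degree + 1, increasing=True)     # rows b(x_i)^T
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)    # diagonal of W(x0)
    b0 = x0 ** np.arange(degree + 1)                  # b(x0)^T
    l = np.linalg.solve(B.T @ (w[:, None] * B), b0) @ (w * B.T)
    return np.linalg.norm(l)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
# Degrees 0..3: local constant, linear, quadratic, cubic.
for degree in range(4):
    print(degree, lnorm(x, x0=0.5, degree=degree, bandwidth=0.15))
```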

Ex. 6.4  Suppose that the $p$ predictors $X$ arise from sampling relatively smooth analog curves at $p$ uniformly spaced abscissa values. Denote by $\operatorname{Cov}(X\,|\,Y) = \Sigma$ the conditional covariance matrix of the predictors, and assume this does not change much with $Y$. Discuss the nature of the Mahalanobis choice $A = \Sigma^{-1}$ for the metric in (6.14). How does this compare with $A = I$? How might you construct a kernel $A$ that (a) downweights high-frequency components in the distance metric; (b) ignores them completely?

Answer  $D = \sqrt{(x - x_0)^T \Sigma^{-1} (x - x_0)}$ is called the Mahalanobis distance from the point $x$ to $x_0$. It takes the correlations of the data set into account: if the predictors are highly correlated, the Mahalanobis distance is much more accurate than the Euclidean distance. If $A = I$, then $d = \sqrt{(x - x_0)^T (x - x_0)}$ is simply the Euclidean distance from $x$ to $x_0$. Prior to smoothing, we should standardize each predictor, for example

$$x_i' = \frac{x_i - E(x_i)}{\sqrt{\operatorname{Var}(x_i)}}.$$

Comparing with $\Sigma^{-1}$ when the standardized predictors are used, we have

$$
\operatorname{Cov}(x_i', x_j') = E\big[(x_i' - E x_i')(x_j' - E x_j')\big] = E(x_i' x_j')
= E\!\left[ \frac{x_i - E(x_i)}{\sqrt{\operatorname{Var}(x_i)}} \cdot \frac{x_j - E(x_j)}{\sqrt{\operatorname{Var}(x_j)}} \right]
= \frac{\operatorname{Cov}(x_i, x_j)}{\sqrt{\operatorname{Var}(x_i)}\,\sqrt{\operatorname{Var}(x_j)}}
= \rho(x_i, x_j),
$$

so after standardization the covariance matrix becomes its standardized version, the correlation matrix (a short numerical confirmation is sketched below).
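The identity $\operatorname{Cov}(x_i', x_j') = \rho(x_i, x_j)$ is easy to confirm numerically. The following sketch (Python/NumPy; the simulated covariance matrix is an arbitrary choice made for illustration) standardizes each predictor and checks that the covariance matrix of the standardized data matches the correlation matrix of the raw data.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulate 500 observations of 4 correlated predictors (illustrative covariance).
Sigma = np.array([[1.0, 0.8, 0.5, 0.2],
                  [0.8, 1.0, 0.6, 0.3],
                  [0.5, 0.6, 1.0, 0.4],
                  [0.2, 0.3, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=500)

# Standardize each predictor: x' = (x - E x) / sqrt(Var x).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance of the standardized predictors equals the correlation matrix of X.
print(np.allclose(np.cov(Xs, rowvar=False, bias=True), np.corrcoef(X, rowvar=False)))
```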

Hence, with standardized predictors, $A = I$ amounts to assuming $\rho(x_i, x_j) = 0$ for all $i \neq j$, i.e. that all dimensions of $x$ are uncorrelated. (a) To construct a kernel $A$ that downweights high-frequency components (the corresponding $x_i$'s) in the distance metric, we can shrink the corresponding entries $\operatorname{Cov}(x_i, x_j)$ or $\rho(x_i, x_j)$ toward zero in order to suppress their influence. (b) To construct a kernel $A$ that ignores them completely, we can set $\operatorname{Cov}(x_i, x_j)$ or $\rho(x_i, x_j)$ to 0.
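As a concrete illustration of this answer, here is a minimal sketch (Python/NumPy; which coordinates count as the high-frequency components, and the diagonal scaling used to suppress them, are assumptions of the example rather than anything prescribed by the solution) comparing the quadratic metric $(x - x_0)^T A (x - x_0)$ of (6.14) for $A = \Sigma^{-1}$ (Mahalanobis), $A = I$ (Euclidean), a version that downweights the chosen components, and a version that ignores them completely.

```python
import numpy as np

def quad_dist(x, x0, A):
    """Quadratic distance (x - x0)^T A (x - x0) appearing in the kernel of (6.14)."""
    d = x - x0
    return float(d @ A @ d)

rng = np.random.default_rng(3)
p = 6
base = rng.normal(size=(p, p))
Sigma = base @ base.T + p * np.eye(p)      # an assumed predictor covariance matrix

x, x0 = rng.normal(size=p), rng.normal(size=p)

A_mahal = np.linalg.inv(Sigma)             # Mahalanobis metric, A = Sigma^{-1}
A_eucl = np.eye(p)                         # Euclidean metric, A = I

# Suppose (for illustration only) the last two coordinates carry the
# high-frequency components.  Shrinking the corresponding rows/columns of A
# toward zero downweights them; setting them to zero ignores them completely.
w_down = np.diag([1, 1, 1, 1, 0.1, 0.1])
w_zero = np.diag([1, 1, 1, 1, 0.0, 0.0])
A_down = w_down @ A_eucl @ w_down
A_ignore = w_zero @ A_eucl @ w_zero

for name, A in [("Mahalanobis", A_mahal), ("Euclidean", A_eucl),
                ("downweighted", A_down), ("ignored", A_ignore)]:
    print(name, round(quad_dist(x, x0, A), 4))
```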
