Stanford University Machine Learning: Complete Collection of Problems and Answers

CS 229 Machine Learning (Problems and Answers)
Stanford University

Contents
(1) Assignment 1 (Supervised Learning)
(2) Assignment 1 Solutions (Supervised Learning)
(3) Assignment 2 (Kernels, SVMs, and Theory)
(4) Assignment 2 Solutions (Kernels, SVMs, and Theory)
(5) Assignment 3 (Learning Theory and Unsupervised Learning)
(6) Assignment 3 Solutions (Learning Theory and Unsupervised Learning)
(7) Assignment 4 (Unsupervised Learning and Reinforcement Learning)
(8) Assignment 4 Solutions (Unsupervised Learning and Reinforcement Learning)
(9) Problem Set #1: Supervised Learning
(10) Problem Set #1 Answer
(11) Problem Set #2: Naive Bayes, SVMs, and Theory
(12) Problem Set #2 Answer

CS 229, Public Course
Problem Set #1: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^\star$.

(a) Find the Hessian of the cost function
\[
J(\theta) = \frac{1}{2} \sum_{i=1}^m \left( \theta^T x^{(i)} - y^{(i)} \right)^2 .
\]

(b) Show that the first iteration of Newton's method gives us $\theta^\star = (X^T X)^{-1} X^T \vec{y}$, the solution to our least squares problem.
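Not part of the original problem statement, but worth recording here: a sketch of the standard derivation, assuming $X$ is the usual $m \times n$ design matrix with full column rank. In matrix form $J(\theta) = \frac{1}{2} \| X\theta - \vec{y} \|^2$, so
\[
\nabla_\theta J(\theta) = X^T X \theta - X^T \vec{y}, \qquad H = \nabla_\theta^2 J(\theta) = X^T X .
\]
One Newton step from any starting point $\theta^{(0)}$ then gives
\[
\theta^{(1)} = \theta^{(0)} - H^{-1} \nabla_\theta J(\theta^{(0)}) = \theta^{(0)} - (X^T X)^{-1} \left( X^T X \theta^{(0)} - X^T \vec{y} \right) = (X^T X)^{-1} X^T \vec{y},
\]
i.e., exactly the least squares solution after a single iteration.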

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize
\[
\ell(\theta) = -\frac{\lambda}{2} \theta^T \theta + \sum_{i=1}^m w^{(i)} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] .
\]
The $-\frac{\lambda}{2} \theta^T \theta$ here is what is known as a regularization term, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by
\[
\nabla_\theta \ell(\theta) = X^T z - \lambda \theta
\]
where $z \in \mathbb{R}^m$ is defined by $z_i = w^{(i)} (y^{(i)} - h_\theta(x^{(i)}))$, and the Hessian is given by
\[
H = X^T D X - \lambda I
\]
where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with $D_{ii} = -w^{(i)} h_\theta(x^{(i)}) (1 - h_\theta(x^{(i)}))$. For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we compute the weights
\[
w^{(i)} = \exp\left( -\frac{\| x - x^{(i)} \|^2}{2\tau^2} \right) .
\]
Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.
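A note added for clarity (not in the original text): with the gradient and Hessian above, each Newton-Raphson step for maximizing $\ell(\theta)$ is
\[
\theta := \theta - H^{-1} \nabla_\theta \ell(\theta),
\]
repeated until $\theta$ stops changing appreciably. Because the regularized weighted log-likelihood is concave, a handful of iterations typically suffices.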

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$.

The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X_train, y_train, x, tau) function in the lwlr.m file. This function takes as input the training set (the X_train and y_train matrices, in the form described in the class notes), a new query point x and the weight bandwidth tau. Given this input the function should 1) compute weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{h_\theta(x) > 0.5\}$ as the prediction.

We provide two additional functions that might help. The [X_train, y_train] = load_data; function will load the matrices from files in the data/ folder. The function plot_lwlr(X_train, y_train, tau, resolution) will plot the resulting classifier (assuming you have properly implemented lwlr.m). This function evaluates the locally weighted logistic regression classifier over a large grid of points and plots the resulting prediction as blue (predicting y = 0) or red (predicting y = 1). Depending on how fast your lwlr function is, creating the plot might take some time, so we recommend debugging your code with resolution = 50; and later increase it to at least 200 to get a better idea of the decision boundary.
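For concreteness, here is a minimal Octave/MATLAB sketch of what an lwlr.m implementation might look like, transcribing the formulas above directly; the fixed iteration cap, the zero initialization, and the assumption that x is an n-by-1 column vector are illustrative choices, not requirements of the problem.

    function y = lwlr(X_train, y_train, x, tau)
    % Locally-weighted logistic regression prediction at query point x.
    % A sketch under the assumptions stated above, not the official solution code.
    m = size(X_train, 1);
    n = size(X_train, 2);
    lambda = 0.0001;                          % regularization value given in the problem

    % 1) Compute weights w(i) = exp(-||x - x(i)||^2 / (2 tau^2)).
    d = X_train - repmat(x', m, 1);           % assumes x is an n x 1 column vector
    w = exp(-sum(d.^2, 2) / (2 * tau^2));

    % 2) Maximize ell(theta) with Newton's method.
    theta = zeros(n, 1);
    for iter = 1:20                           % fixed cap; could instead test convergence
        h = 1 ./ (1 + exp(-X_train * theta)); % h_theta(x(i)) for every training example
        z = w .* (y_train - h);
        grad = X_train' * z - lambda * theta;
        D = diag(-w .* h .* (1 - h));
        H = X_train' * D * X_train - lambda * eye(n);
        theta = theta - H \ grad;             % Newton update: theta - H^{-1} grad
    end

    % 3) Predict y = 1{h_theta(x) > 0.5}.
    y = double(1 / (1 + exp(-theta' * x)) > 0.5);
    end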

(b) Evaluate the system with a variety of different bandwidth parameters $\tau$. In particular, try $\tau = 0.01, 0.05, 0.1, 0.5, 1.0, 5.0$. How does the classification boundary change when varying this parameter? Can you predict what the decision boundary of ordinary (unweighted) logistic regression would look like?
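A possible driver for this sweep, using only the helper functions named in part (a); the loop structure and figure titles are illustrative, not part of the provided code.

    % Sweep the suggested bandwidth values and plot each resulting classifier.
    [X_train, y_train] = load_data;
    for tau = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
        figure;
        plot_lwlr(X_train, y_train, tau, 50);  % resolution = 50 while debugging
        title(sprintf('tau = %g', tau));
    end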

3. Multivariate least squares

So far in class, we have only considered cases where our target variable $y$ is a scalar value. Suppose that instead of trying to predict a single output, we have a training set with multiple outputs for each example:
\[
\{ (x^{(i)}, y^{(i)}),\; i = 1, \ldots, m \}, \qquad x^{(i)} \in \mathbb{R}^n,\; y^{(i)} \in \mathbb{R}^p .
\]
Thus for each training example, $y^{(i)}$ is vector-valued, with $p$ entries. We wish to use a linear model to predict the outputs, as in least squares, by specifying the parameter matrix $\Theta$ in
\[
y = \Theta^T x,
\]
where $\Theta \in \mathbb{R}^{n \times p}$.

(a) The cost function for this case is
\[
J(\Theta) = \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^p \left( (\Theta^T x^{(i)})_j - y^{(i)}_j \right)^2 .
\]
Write $J(\Theta)$ in matrix-vector notation (i.e., without using any summations). Hint: Start with the $m \times n$ design matrix
\[
X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} .
\]
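The excerpt is cut off shortly after this point. As a hedged sketch of where the hint leads (this completion is not text from the original), stacking the targets row-wise in the same fashion yields a compact form:
\[
Y = \begin{bmatrix} (y^{(1)})^T \\ \vdots \\ (y^{(m)})^T \end{bmatrix} \in \mathbb{R}^{m \times p}, \qquad
J(\Theta) = \frac{1}{2} \operatorname{tr}\left( (X\Theta - Y)^T (X\Theta - Y) \right) = \frac{1}{2} \left\| X\Theta - Y \right\|_F^2 .
\]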
