中科院机器学习题库-new.doc

Uploaded by: 灯火****19 · Document ID: 136890619 · Upload date: 2020-07-03 · Format: DOC · Pages: 44 · Size: 5.68MB
机器学习题库 (Machine Learning Question Bank)

一、 Maximum likelihood

1、 ML estimation of an exponential model (10)
A Gaussian distribution is often used to model data on the real line, but it is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases one can fit an exponential distribution, whose probability density function is given by

    p(x | b) = (1/b) e^(-x/b),  x >= 0

Given N observations x_i drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate of b.

2、 Repeat with a Poisson distribution:

3、

二、 Bayes

In a multiple-choice exam, suppose the probability that an examinee knows the correct answer is p, and the probability that he guesses is 1 - p. Assume that an examinee who knows the correct answer answers correctly with probability 1, and that a guess is correct with probability 1/m, where m is the number of options. Given that the examinee answered the question correctly, find the probability that he knew the correct answer.

1、 Conjugate priors
The readings for this week include a discussion of conjugate priors. Given a likelihood p(x | θ) for a class of models with parameters θ, a conjugate prior is a distribution p(θ | α) with hyperparameters α, such that the posterior distribution p(θ | x, α) belongs to the same family of distributions as the prior.

(a) Suppose that the likelihood is given by the exponential distribution with rate parameter λ:

    p(x | λ) = λ e^(-λx),  x >= 0

Show that the gamma distribution

    Gamma(λ | α, β) = (β^α / Γ(α)) λ^(α-1) e^(-βλ)

is a conjugate prior for the exponential. Derive the parameter update given observations x_1, …, x_N, and the prediction distribution p(x_{N+1} | x_1, …, x_N).

(b) Show that the beta distribution is a conjugate prior for the geometric distribution

    p(x | θ) = (1 - θ)^(x-1) θ,

which describes the number of times a coin is tossed until the first head appears, when the probability of heads on each toss is θ. Derive the parameter update rule and the prediction distribution.

(c) Suppose p_m(θ) is a conjugate prior for the likelihood p(x | θ); show that the mixture prior

    p(θ) = Σ_m w_m p_m(θ)

is also conjugate for the same likelihood, assuming the mixture weights w_m sum to 1.

(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood. Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed shape parameter.

(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights; i.e., the weights are the parameters to be learned.

三、 True/False

(1) Given n data points, if half are used for training and half for testing, the difference between the training error and the test error decreases as n increases.
(2) The maximum-likelihood estimator is unbiased and has the smallest variance among all unbiased estimators, so it has the smallest risk.
(3) Given regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set.
(4) Global linear regression uses all the sample points to predict the output for a new input, while local linear regression uses only the samples near the query point; therefore global linear regression is computationally more expensive than local linear regression.
(5) Boosting and Bagging both combine multiple classifiers by voting, and both determine each classifier's weight from its accuracy.
(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F)
While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases, since the example weights become concentrated at the most difficult examples.
(7) One advantage of Boosting is that it does not overfit. (F)
(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. ()
(9) In regression analysis, best-subset selection can perform feature selection, but it is computationally expensive when the number of features is large; ridge regression and the Lasso are computationally cheaper, and the Lasso can also perform feature selection.
(10) Overfitting is more likely to occur when the training data are limited.
(11) Gradient descent can get trapped in local minima, but the EM algorithm cannot.
(12) In kernel regression, the parameter that most affects the balance between overfitting and underfitting is the width of the kernel.
(13) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)
(14) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution w on the training data. (F)
(15) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution w on unseen test data. (F)
(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)
(20) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.

True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a kernel of degree less than or equal to two.
(21) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined.
False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost can't achieve zero training error.
(22) The L2 penalty in a ridge regression is equivalent to a Laplace prior on the weights. (F)
(23) The log-likelihood of the data will always increase through successive iterations of the expectation-maximization algorithm. (F)
(24) In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)

一、 Regression

1、 Consider a regularized regression problem. The figure below shows, for a quadratic regularization penalty and different values of the regularization parameter C, the log likelihood on the training set and the test set (mea
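As a numerical companion to question 1(c) of the maximum-likelihood section: setting the derivative of the exponential log likelihood to zero gives the sample mean as the ML estimate of the scale b. The sketch below checks this empirically; the function name and simulated data are illustrative only and not part of the original problem set.

```python
import math
import random

def exp_log_likelihood(b, xs):
    """Log likelihood of the scale parameter b for the exponential
    density p(x | b) = (1/b) * exp(-x/b), x >= 0."""
    n = len(xs)
    return -n * math.log(b) - sum(xs) / b

random.seed(0)
true_b = 2.5
# random.expovariate takes the rate lambda = 1/b, so the mean is true_b.
data = [random.expovariate(1 / true_b) for _ in range(10_000)]

# Part (c): d/db [log likelihood] = 0  =>  b_hat = (1/N) * sum(x_i),
# i.e. the sample mean.
b_hat = sum(data) / len(data)

# The closed-form estimate should dominate nearby values of b.
for other in (0.8 * b_hat, 1.2 * b_hat):
    assert exp_log_likelihood(b_hat, data) > exp_log_likelihood(other, data)
```

With 10,000 draws, b_hat lands close to the true scale 2.5, and the log likelihood at b_hat exceeds that of any perturbed value, consistent with the closed-form answer.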

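For the Bayes multiple-choice problem in Section 二, Bayes' rule gives P(knew | correct) = p / (p + (1 - p)/m) = mp / (mp + 1 - p). The sketch below sanity-checks this with exact rational arithmetic; the function name is made up for illustration.

```python
from fractions import Fraction

def p_knew_given_correct(p, m):
    """P(knew | correct) by Bayes' rule, with P(knew) = p,
    P(correct | knew) = 1 and P(correct | guessed) = 1/m."""
    evidence = p * 1 + (1 - p) * Fraction(1, m)  # P(correct)
    return p / evidence

# With p = 1/2 and m = 4 options: (1/2) / (1/2 + 1/8) = 4/5.
print(p_knew_given_correct(Fraction(1, 2), 4))  # → 4/5
```

Note that the posterior always exceeds the prior p (for p < 1), since a correct answer is evidence in favor of the examinee knowing the answer.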