Tianqi Chen Paper Presentation Slides (PPT): Introduction to Boosted Trees


Slide 1: Introduction to Boosted Trees
Tianqi Chen, Oct. 22, 2014

Slide 2: Outline
- Review of key concepts of supervised learning
- Regression Tree and Ensemble (What are we learning?)
- Gradient Boosting (How do we learn?)
- Summary

Slide 3: Elements in Supervised Learning
- Notation: $x_i \in \mathbb{R}^d$ is the i-th training example
- Model: how to make the prediction $\hat{y}_i$ given $x_i$
  - Linear model: $\hat{y}_i = \sum_j w_j x_{ij}$ (includes both linear and logistic regression)
  - The prediction score $\hat{y}_i$ can have different interpretations depending on the task:
    - Linear regression: $\hat{y}_i$ is the predicted score
    - Logistic regression: $1 / (1 + \exp(-\hat{y}_i))$ is the predicted probability of the instance being positive
    - Others: for example, in ranking $\hat{y}_i$ can be the rank score
- Parameters: the things we need to learn from the data
  - Linear model: $\Theta = \{ w_j \mid j = 1, \dots, d \}$

Slide 4: Elements Continued: Objective Function
- The objective function that is everywhere: $\text{Obj}(\Theta) = L(\Theta) + \Omega(\Theta)$
- Loss on training data: $L = \sum_i l(y_i, \hat{y}_i)$
  - Square loss: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$
  - Logistic loss: $l(y_i, \hat{y}_i) = y_i \ln(1 + e^{-\hat{y}_i}) + (1 - y_i) \ln(1 + e^{\hat{y}_i})$
- Regularization: how complicated is the model?
  - L2 norm: $\Omega(w) = \lambda \| w \|^2$
  - L1 norm (lasso): $\Omega(w) = \lambda \| w \|_1$
- The training loss measures how well the model fits the training data; the regularization term measures the complexity of the model
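To make the two loss definitions concrete, here is a minimal Python sketch (my own illustration, not part of the slides; the function names are mine):

```python
import numpy as np

def square_loss(y, y_hat):
    """Square loss: l(y, y_hat) = (y - y_hat)^2, elementwise."""
    return (y - y_hat) ** 2

def logistic_loss(y, y_hat):
    """Logistic loss: y*ln(1 + e^{-y_hat}) + (1 - y)*ln(1 + e^{y_hat}).

    y is a 0/1 label; y_hat is the raw score before the sigmoid.
    """
    return y * np.log1p(np.exp(-y_hat)) + (1 - y) * np.log1p(np.exp(y_hat))

y = np.array([1.0, 0.0, 1.0])       # labels
y_hat = np.array([2.0, -1.0, 0.5])  # predicted scores
print(square_loss(y, y_hat).sum(), logistic_loss(y, y_hat).sum())
```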

Slide 5: Putting Known Knowledge into Context
- Ridge regression: $\sum_i (y_i - w^\top x_i)^2 + \lambda \| w \|^2$ (linear model, square loss, L2 regularization)
- Lasso: $\sum_i (y_i - w^\top x_i)^2 + \lambda \| w \|_1$ (linear model, square loss, L1 regularization)
- Logistic regression: $\sum_i \left[ y_i \ln(1 + e^{-w^\top x_i}) + (1 - y_i) \ln(1 + e^{w^\top x_i}) \right] + \lambda \| w \|^2$ (linear model, logistic loss, L2 regularization)
- The conceptual separation of model, parameters, and objective also gives you engineering benefits: think of how you can implement SGD for both ridge regression and logistic regression
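That closing remark deserves a sketch: because the linear model is shared and only the loss gradient differs, a single SGD loop can train both objectives. A minimal illustration under my own naming, not code from the slides:

```python
import numpy as np

def sgd(X, y, grad_loss, lam=0.1, lr=0.01, epochs=100):
    """Generic SGD for a linear model with L2 regularization.

    grad_loss(y_i, score) returns dl/dscore; only this differs per objective.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            score = X[i] @ w
            # chain rule: dl/dw = (dl/dscore) * x_i, plus the L2 gradient 2*lam*w
            w -= lr * (grad_loss(y[i], score) * X[i] + 2 * lam * w)
    return w

# Ridge regression: square loss, dl/dscore = -2 (y - score)
ridge_grad = lambda y_i, s: -2 * (y_i - s)
# Logistic regression: logistic loss, dl/dscore = sigmoid(score) - y
logistic_grad = lambda y_i, s: 1 / (1 + np.exp(-s)) - y_i

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w_ridge = sgd(X, np.array([1.0, 2.0, 3.0]), ridge_grad)
w_logit = sgd(X, np.array([1.0, 0.0, 1.0]), logistic_grad)
```

Swapping `grad_loss` is the only change needed to move between the two objectives; the model and the optimizer stay untouched.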

Slide 6: Objective and Bias-Variance Trade-off
- Why do we want the objective to contain two components?
- Optimizing the training loss encourages predictive models: fitting the training data well at least gets you close to the training data, which is hopefully close to the underlying distribution
- Optimizing the regularization term encourages simple models: simpler models tend to have smaller variance in future predictions, making the predictions stable

Slide 7: Outline
- Review of key concepts of supervised learning
- Regression Tree and Ensemble (What are we learning?)
- Gradient Boosting (How do we learn?)
- Summary

Slide 8: Regression Tree (CART)
- Regression tree, also known as classification and regression tree:
  - Decision rules are the same as in a decision tree
  - Each leaf contains one score
- Example: the inputs are age, gender, occupation, ...; the task is to predict whether a person likes computer games. The tree first splits on "age < 15"; the "yes" branch splits on "is male?", giving leaf scores +2 (young and male) and +0.1 (young and female), while the "no" branch is a leaf with score -1. The prediction score sits in each leaf.

Slide 9: Regression Tree Ensemble
- tree1 is the tree from the previous slide (splits on "age < 15" and "is male?", leaf scores +2, +0.1, -1)
- tree2 splits on "uses the computer daily?", with leaf scores +0.9 (yes) and -0.9 (no)
- The prediction for an instance is the sum of the scores predicted by each tree: for the young boy, f = 2 + 0.9 = 2.9; for the grandfather, f = -1 - 0.9 = -1.9
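A minimal Python sketch of this two-tree example (the encoding of the inputs is my own; the tree structures and leaf scores come from the slide):

```python
def tree1(age, is_male):
    """Split on age < 15, then on gender; leaf scores from the slide."""
    if age < 15:
        return +2.0 if is_male else +0.1
    return -1.0

def tree2(uses_computer_daily):
    """Single split on daily computer use."""
    return +0.9 if uses_computer_daily else -0.9

def predict(age, is_male, uses_computer_daily):
    # the ensemble prediction is the sum of the scores of all trees
    return tree1(age, is_male) + tree2(uses_computer_daily)

print(predict(age=10, is_male=True, uses_computer_daily=True))    # 2.9 (the boy)
print(predict(age=70, is_male=True, uses_computer_daily=False))   # -1.9 (the grandfather)
```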

Slide 10: Tree Ensemble Methods
- Very widely used; look for GBM, random forest, ...
- Almost half of data mining competitions are won using some variant of tree ensemble methods
- Invariant to scaling of the inputs, so you do not need to do careful feature normalization
- Learn higher-order interactions between features
- Can be scalable, and are used in industry

Slide 11: Put into Context: Model and Parameters
- Model: assuming we have K trees, $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$, where each $f_k \in \mathcal{F}$, the space of functions containing all regression trees
- Think: a regression tree is a function that maps the attributes to a score
- Parameters: the structure of each tree and the scores in its leaves; or simply use the functions as the parameters, $\Theta = \{ f_1, f_2, \dots, f_K \}$
- Instead of learning weights in $\mathbb{R}^d$, we are learning functions (trees)

Slide 12: Learning a Tree on a Single Variable
- How can we learn functions? Define an objective (loss, regularization), and optimize it!
- Example: consider a regression tree on a single input t (time); I want to predict whether I like romantic music at time t
- The model is a regression tree that splits on time: first on "t < 2011/03/01" (the "no" branch is a leaf with score 1.0), then on "t < 2010/03/20" (leaf scores 0.2 and 1.2)
- Equivalently, this is a piecewise step function over time

Slide 13: Learning a Step Function
- Things we need to learn: the splitting positions, and the height in each segment
- Objective for a single-variable regression tree (a step function):
  - Training loss: how well does the function fit on the points?
  - Regularization: how do we define the complexity of the function? The number of splitting points, or the L2 norm of the height in each segment?
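As an illustration of what optimizing such an objective can look like (my own sketch, not from the slides), the brute-force search below enumerates split positions for a 1-D step function and scores each candidate by square loss plus a per-segment penalty:

```python
import numpy as np
from itertools import combinations

def fit_step_function(t, y, max_splits=3, gamma=0.5):
    """Brute-force search over split positions for a 1-D step function.

    Objective: sum of square losses + gamma * (number of segments);
    gamma plays the role of the regularizer on the number of splits.
    """
    order = np.argsort(t)
    t, y = t[order], y[order]
    candidates = (t[1:] + t[:-1]) / 2  # midpoints between sorted points
    best = (np.inf, None)
    for k in range(max_splits + 1):
        for splits in combinations(candidates, k):
            edges = np.concatenate(([-np.inf], splits, [np.inf]))
            loss = gamma * (k + 1)  # complexity: one penalty per segment
            for lo, hi in zip(edges[:-1], edges[1:]):
                seg = y[(t > lo) & (t <= hi)]
                if len(seg):
                    # under square loss the optimal height of a segment is its mean
                    loss += ((seg - seg.mean()) ** 2).sum()
            best = min(best, (loss, splits), key=lambda b: b[0])
    return best  # (objective value, chosen split positions)
```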

Slide 14: Learning a Step Function (Visually)
[Figure-only slide: data points with candidate step-function fits, illustrating the trade-off between training loss and the number of splits]

Slide 15: Coming Back: Objective for Tree Ensemble
- Model: assuming we have K trees, $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$
- Objective: $\text{Obj} = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k)$, i.e. the training loss plus the complexity of the trees
- Possible ways to define $\Omega$: the number of nodes in the tree, the depth, the L2 norm of the leaf weights, ... (detailed later)
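The slide leaves the precise definition of $\Omega$ for later; for reference, the concrete choice introduced later in this deck (and used by XGBoost) combines both ideas, penalizing the number of leaves $T$ together with the L2 norm of the leaf weights $w$:

```latex
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
```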

Slide 16: Objective vs. Heuristic
- When you talk about (decision) trees, it is usually in terms of heuristics: split by information gain, prune the tree, limit the maximum depth, smooth the leaf values
- Most heuristics map well onto objectives; taking the formal (objective) view lets us know what we are learning:
  - Information gain → training loss
  - Pruning → regularization defined by the number of nodes
  - Max depth → constraint on the function space
  - Smoothing leaf values → L2 regularization on the leaf weights

Slide 17: Regression Tree Is Not Just for Regression!
- A regression tree ensemble defines how you make the prediction score ...
