Introduction to Data Mining (English): chap4_basic_classification lecture

Uploaded by: 最**** | Document ID: 117174684 | Uploaded: 2019-11-18 | Format: PPT | Pages: 101 | Size: 1.74 MB

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation
Lecture Notes for Chapter 4, Introduction to Data Mining, by Tan, Steinbach, Kumar
Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Classification: Definition
- Given a collection of records (the training set). Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

Illustrating Classification Task
(Figure omitted in this text extract.)

Examples of Classification Task
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification Techniques
- Decision Tree based Methods
- Rule-based Methods
- Memory based reasoning
- Neural Networks
- Naïve Bayes and Bayesian Belief Networks
- Support Vector Machines

Example of a Decision Tree
Training Data: Refund (categorical), MarSt (categorical), TaxInc (continuous), Cheat (class)
Splitting Attributes: Refund, MarSt, TaxInc
Model: Decision Tree
  Refund
    Yes: NO
    No: MarSt
      Married: NO
      Single, Divorced: TaxInc
        < 80K: NO
        > 80K: YES
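As a rough illustration of the definition and the example tree above, the sketch below hard-codes the Refund / MarSt / TaxInc tree as a Python function and measures its accuracy on a small labeled test set. The test records, the function names `classify` and `accuracy`, and the choice to send an income of exactly 80K down the YES branch are assumptions made for this example; they are not taken from the slides.

```python
def classify(record):
    """Apply the example tree: test Refund, then MarSt, then TaxInc."""
    if record["Refund"] == "Yes":
        return "No"
    if record["MarSt"] == "Married":
        return "No"
    # Assumption: an income of exactly 80K falls on the "Yes" side.
    return "Yes" if record["TaxInc"] >= 80 else "No"


def accuracy(model, test_set):
    """Fraction of labeled test records whose predicted class is correct."""
    correct = sum(1 for record, label in test_set if model(record) == label)
    return correct / len(test_set)


# Hypothetical labeled test records (TaxInc in thousands), not from the slides.
test_set = [
    ({"Refund": "Yes", "MarSt": "Single", "TaxInc": 125}, "No"),
    ({"Refund": "No", "MarSt": "Married", "TaxInc": 100}, "No"),
    ({"Refund": "No", "MarSt": "Single", "TaxInc": 90}, "Yes"),
    ({"Refund": "No", "MarSt": "Divorced", "TaxInc": 60}, "Yes"),
]

print(accuracy(classify, test_set))
```

On these made-up records the tree gets three of the four labels right; the last record is misclassified because its income is below the 80K split.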

Another Example of Decision Tree
Training Data: Refund (categorical), MarSt (categorical), TaxInc (continuous), Cheat (class)
  MarSt
    Married: NO
    Single, Divorced: Refund
      Yes: NO
      No: TaxInc
        < 80K: NO
        > 80K: YES
There could be more than one tree that fits the same data!

Decision Tree Classification Task
(Figure omitted in this text extract; the learned model is labeled "Decision Tree".)

Apply Model to Test Data
(The same tree is shown on six consecutive slides while the test record is traced through it.)
  Refund
    Yes: NO
    No: MarSt
      Married: NO
      Single, Divorced: TaxInc
        < 80K: NO
        > 80K: YES
- Start from the root of the tree.
- Follow the branch that matches the test record at each node until a leaf is reached.
- Assign Cheat to "No".
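The "start from the root" procedure, and the remark that more than one tree can fit the same data, can be sketched as follows. This is a minimal illustration, not the slides' own code: the nested-tuple tree encoding, the helper names `branch` and `apply_model`, and the `>= 80` handling of the income threshold are all assumptions.

```python
from itertools import product


def income_node():
    # Internal node: (attribute, {branch_label: subtree}); a leaf is a class string.
    return ("TaxInc", {"<80K": "No", ">=80K": "Yes"})


# Tree with Refund at the root (first example).
tree_refund_first = ("Refund", {
    "Yes": "No",
    "No": ("MarSt", {
        "Married": "No",
        "Single": income_node(),
        "Divorced": income_node(),
    }),
})

# Tree with MarSt at the root (second example).
tree_marst_first = ("MarSt", {
    "Married": "No",
    "Single": ("Refund", {"Yes": "No", "No": income_node()}),
    "Divorced": ("Refund", {"Yes": "No", "No": income_node()}),
})


def branch(attribute, value):
    """Map an attribute value to a branch label (assumed split at 80K)."""
    if attribute == "TaxInc":
        return "<80K" if value < 80 else ">=80K"
    return value


def apply_model(tree, record):
    """Start at the root and follow matching branches until a leaf is reached."""
    node = tree
    while not isinstance(node, str):
        attribute, children = node
        node = children[branch(attribute, record[attribute])]
    return node


# Hypothetical records covering every branch combination.
records = [
    {"Refund": r, "MarSt": m, "TaxInc": t}
    for r, m, t in product(["Yes", "No"], ["Single", "Married", "Divorced"], [60, 95])
]
same = all(
    apply_model(tree_refund_first, rec) == apply_model(tree_marst_first, rec)
    for rec in records
)
print(same)
```

Both encodings predict "Yes" only when Refund is "No", the person is not married, and the income is on the high side of the split, so they agree on every record even though their root attributes differ.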

Decision Tree Induction
- Many Algorithms:
  - Hunt's Algorithm (one of the earliest)
  - CART
  - ID3, C4.5
  - SLIQ, SPRINT

General Structure of Hunt's Algorithm
- Let Dt be the set of training records that reach a node t.
- General Procedure:
  - If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt.
  - If Dt is an empty set, then t is a leaf node labeled by the default class, yd.
  - If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.

Hunt's Algorithm
(Figure: the tree is grown in four stages.)
1. A single leaf: Don't Cheat.
2. Refund: Yes -> Don't Cheat; No -> Don't Cheat.
3. Refund: Yes -> Don't Cheat; No -> Marital Status: Single, Divorced -> Cheat; Married -> Don't Cheat.
4. Refund: Yes -> Don't Cheat; No -> Marital Status: Single, Divorced -> Taxable Income (< 80K -> Don't Cheat; >= 80K -> Cheat); Married -> Don't Cheat.

Tree Induction
- Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.
- Issues:
  - Determine how to split the records
    - How to specify the attribute test condition?
    - How to determine the best split?
  - Determine when to stop splitting
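The general procedure of Hunt's algorithm can be sketched as a short recursive function. This is an illustrative sketch under several assumptions that the slides do not state: only categorical attributes with multiway splits are handled, the greedy criterion is the number of misclassified records after the split, the default class passed to each child is the majority class of its parent, and the training records are made up for the example.

```python
from collections import Counter


def majority_class(records, target):
    """Most frequent class label among the records."""
    return Counter(r[target] for r in records).most_common(1)[0][0]


def split_errors(records, attribute, target):
    """Records misclassified if each branch predicts its own majority class."""
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())


def hunt(records, attributes, target, default, domains):
    if not records:                      # Dt is empty: leaf with the default class
        return default
    classes = {r[target] for r in records}
    if len(classes) == 1:                # Dt is pure: leaf labeled with that class
        return classes.pop()
    if not attributes:                   # assumed stopping rule: nothing left to test
        return majority_class(records, target)
    # Greedy step: choose the attribute test with the fewest errors (assumed criterion).
    best = min(attributes, key=lambda a: split_errors(records, a, target))
    remaining = [a for a in attributes if a != best]
    fallback = majority_class(records, target)
    children = {}
    for value in domains[best]:          # recurse on each subset of Dt
        subset = [r for r in records if r[best] == value]
        children[value] = hunt(subset, remaining, target, fallback, domains)
    return (best, children)


# Hypothetical training records, not the table from the slides.
records = [
    {"Refund": "Yes", "MarSt": "Single", "Cheat": "No"},
    {"Refund": "No", "MarSt": "Married", "Cheat": "No"},
    {"Refund": "No", "MarSt": "Single", "Cheat": "Yes"},
    {"Refund": "No", "MarSt": "Divorced", "Cheat": "Yes"},
    {"Refund": "Yes", "MarSt": "Married", "Cheat": "No"},
]
domains = {"Refund": ["Yes", "No"], "MarSt": ["Single", "Married", "Divorced"]}

tree = hunt(records, ["Refund", "MarSt"], "Cheat", "No", domains)
print(tree)
```

A leaf is returned as a plain class label and an internal node as an `(attribute, children)` pair. Handling a continuous attribute such as Taxable Income would additionally require choosing a threshold, which this sketch leaves out.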
