ppt title

Uploaded by: jiups****uk12 | Document ID: 45255670 | Uploaded: 2018-06-15 | Format: PPT | Pages: 48 | Size: 2.87 MB

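Advanced Data Mining with SQL Server 2000

ZhaoHui Tang (唐朝晖)
Program Manager, SQL Server Analysis Services
Microsoft Corporation

Agenda
- Microsoft's data mining algorithms
- OLE DB for DM: data mining queries
- Data mining case study: clickstream analysis
    - Customer segmentation
    - Site affiliation analysis
    - Targeted banner ads
- Performance of Microsoft's data mining algorithms
- Questions and answers

Data Mining Algorithms in SQL Server 2000

Decision Trees
- A popular technique for classification and prediction tasks
    - Churn analysis
    - Credit risk analysis
- Easy to understand
    - Any path from a node to a leaf forms a rule
- Fast to build
- Predictions are based on the state of the leaf nodes
- Variants: C4.5, C5, CART, CHAID

[Tree diagram; each node shows the share of students who attend college]
    All students: 55% yes, 45% no
    IQ = high: 79% yes, 21% no
    IQ = low: 35% yes, 65% no
    Family income = high: 94% yes, 6% no
    Family income = low: 69% yes, 31% no

How a Decision Tree Works

Counts of students who attend college (Yes) or do not (No), by attribute value:

    Attribute               Value     Yes     No
    IQ                      high      300    100
    IQ                      medium    500   1000
    IQ                      low       200    900
    Parental encouragement  yes       700    400
    Parental encouragement  no        300   1600
    Family income           high      400    400
    Family income           low       600   1600
    Gender                  male      500   1100
    Gender                  female    500    900

[Bar chart: yes/no counts for IQ = High, IQ = Medium, IQ = Low]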
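The way a tree picks its first split from the counts in the "How a Decision Tree Works" table can be sketched as follows. This is a minimal illustration that assumes an entropy-based information gain criterion; the slides do not say which score this example uses, and the names `COUNTS`, `entropy`, and `information_gain` are hypothetical helpers, not part of any SQL Server API.

```python
import math

# (yes, no) counts per attribute value, as read from the slide's table.
# yes = attends college, no = does not attend college.
COUNTS = {
    "IQ": {"high": (300, 100), "medium": (500, 1000), "low": (200, 900)},
    "Parental encouragement": {"yes": (700, 400), "no": (300, 1600)},
    "Family income": {"high": (400, 400), "low": (600, 1600)},
    "Gender": {"male": (500, 1100), "female": (500, 900)},
}


def entropy(yes, no):
    """Shannon entropy (in bits) of a yes/no distribution given as counts."""
    total = yes + no
    result = 0.0
    for count in (yes, no):
        if count:  # an empty class contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(branches):
    """Entropy of the parent node minus the size-weighted entropy of its branches."""
    yes_total = sum(y for y, _ in branches.values())
    no_total = sum(n for _, n in branches.values())
    n_total = yes_total + no_total
    weighted = sum((y + n) / n_total * entropy(y, n) for y, n in branches.values())
    return entropy(yes_total, no_total) - weighted


# Score every candidate attribute and pick the one with the largest gain.
gains = {attr: information_gain(branches) for attr, branches in COUNTS.items()}
best = max(gains, key=gains.get)

for attr, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{attr}: {gain:.3f}")
print("split on:", best)
```

With these counts the largest gain belongs to parental encouragement, which agrees with the first split the deck shows for this data.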

[Bar charts: yes/no counts for PI = High and PI = FALSE, PE = TRUE and PE = FALSE, and Male and Female]

Recursive Splitting

[Tree diagram; each node shows the share of students who attend college]
    All students: 33% yes, 67% no
    Parental encouragement = yes: 63% yes, 37% no
    Parental encouragement = no: 16% yes, 84% no

    Attribute               Value     Yes     No
    IQ                      high      200     50
    IQ                      medium    400    250
    IQ                      low       100    100
    Parental encouragement  yes       700    400
    Parental encouragement  no          0      0
    Family income           high      300    100
    Family income           low       400    300
    Gender                  male      400    250
    Gender                  female    250    150

Microsoft Decision Trees
- Probabilistic classification tree
- Splitting methods: Bayesian score and entropy
- Forward pruning
- Tree shape: binary or N-ary
- Scalable framework

Clustering Algorithm (EM)
- A popular method for customer segmentation, mailing lists, customer profiling, and similar tasks
- Algorithm
    - Choose a set of initial points
    - Assign each point an initial cluster
    - Use probabilities to assign data points to each cluster
    - Compute a new center point from the weights
    - Loop until convergence

EM Illustration
[Diagram: cluster centers marked with X]

Microsoft Clustering Algorithm (Scalable EM)
[Flowchart boxes: data; fill buffer; build/update model; compressed data and sufficient statistics; decide whether data is compressed; stop?; final model]

OLE DB for Data Mining

OLE DB for DM
- Industry standard for data mining
- Built on existing technologies
    - SQL
    - OLE DB
- Defines common DM concepts
    - Case, nested case
    - Mining model
    - Model creation
    - Model training
    - Prediction
- Language-based API

Customers table

    Customer ID  Profession  Income  Gender  Risk
    1            Engineer    85      Male    No
    2            Worker      40      Male    Yes
    3            Doctor      90      Female  No
    4            Teacher     50      Female  No
    5            Worker      45      Male    No

DM Query Language

    Create Mining Model CreditRisk
    (
        CustomerID long key,
        Gender text discrete,
        Income long continuous,
        Profession text discrete,
        Risk text discrete predict
    )
    Using Microsoft_Decision_Trees

    Insert into CreditRisk (CustomerId, Gender, Income, Profession, Risk)
    Select CustomerID, Gender, Income, Profession, Risk
    From Customers

    Select NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk)
    From CreditRisk Prediction Join NewCustomers
    On CreditRisk.Gender = NewCustomers.Gender
    And CreditRisk.Income = NewCustomers.Income
    And CreditRisk.Profession = NewCustomers.Profession

Schema Rowsets
- Tabular data that provides metadata information
- Schema rowsets in OLE DB for DM
    - Mining_Services
    - Mining_Service_Parameters
    - Mining_Models
    - Mining_Columns
    - Mining_Model_Contents
    - Model_Content_PMML

Mining Model Content Schema Rowset

Schema Rowsets and the Thin-Client Browser

Case Study: Clickstream Analysis

Schema
    Customer
        CustomerGuid
        DayTimeOnLine
        NightTimeOnLine
        BrowserType
        EmailTime
        ChatTime
        GeoLocation
    WebClick
        CustomerGuid
        URLCategory
        TimeDuration
        ReferPage

Web Customer Segmentation

Web Visitor Segmentation

Segmenting customers based on the Customer table:

    Create Mining Model CustomerClustering
    (
        CustomerID text key,
        DayTimeOnline long continuous,
        NightTimeOnline long continuous,
        BrowserType text discrete,
        ChatTime long continuous,
        EmailTime long continuous,
        GeoLocation text discrete
    )
    Using Microsoft_Clustering

Segmenting based on Customer and WebClick:

    Create Mining Model CustomerClustering
    (
        CustomerID text key,
        DayTimeOnline long continuous,
        NightTimeOnline long continuous,
        BrowserType text discrete,
        ChatTime long continuous,
        EmailTime long continuous,
        GeoLocation text discrete,
        WebClick table (UrlCategory text key)
    )
    Using Microsoft_Clustering

Customer Segmentation on MSFTies

Web Site Affiliation Analysis

Using Microsoft Decision Trees for Association Analysis
[Diagram: decision trees whose nodes split on Insurance / No Insurance, Loan / No Loan, Business / No Business, Stock / No Stock, and Shopping / No Shopping]

Site Affiliation

    Create Mining Model SiteAffiliation
    (
        CustomerID text key,
        WebClick table predict (UrlCategory text key)
    )
    Using Microsoft_Decision_Trees

    Insert into SiteAffiliation (CustomerID, WebClick (skip, UrlCategory))
    OpenRowset(MSDataShape, data provider=SQLOLEDB;Server=
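The EM clustering loop outlined earlier in the deck (pick initial points, assign points to clusters by probability, recompute each center from the weights, repeat until convergence) can be sketched as below. This is a plain in-memory version for one-dimensional data, assuming Gaussian clusters. It does not reproduce the scalable EM variant the slides attribute to Microsoft, and the function name `em_1d` and the sample data are hypothetical.

```python
import math


def em_1d(points, initial_centers, max_iterations=100, tolerance=1e-6):
    """Illustrative EM for a 1-D Gaussian mixture; returns (means, variances, weights)."""
    k = len(initial_centers)
    means = list(initial_centers)
    variances = [1.0] * k          # assumed starting spread for every cluster
    weights = [1.0 / k] * k        # assumed equal starting cluster sizes

    for _ in range(max_iterations):
        # E-step: probability that each point belongs to each cluster.
        memberships = []
        for x in points:
            densities = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(densities)
            memberships.append([d / total for d in densities])

        # M-step: recompute each cluster from the membership weights.
        new_means = []
        for j in range(k):
            mass = sum(row[j] for row in memberships)
            mean = sum(row[j] * x for row, x in zip(memberships, points)) / mass
            var = sum(row[j] * (x - mean) ** 2 for row, x in zip(memberships, points)) / mass
            new_means.append(mean)
            variances[j] = max(var, 1e-3)   # floor keeps the density well defined
            weights[j] = mass / len(points)

        # Stop once the centers no longer move.
        shift = max(abs(old - new) for old, new in zip(means, new_means))
        means = new_means
        if shift < tolerance:
            break

    return means, variances, weights


# Hypothetical sample: two well-separated groups of values.
sample = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
means, variances, weights = em_1d(sample, initial_centers=[0.0, 6.0])
print([round(m, 3) for m in means], [round(w, 3) for w in weights])
```

Each point keeps a fractional membership in every cluster rather than a hard label, which is the behavior the slide describes as using probabilities to assign data points to each cluster.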
