Big Data and Data Mining Training Lecture Notes 3: Concepts, Attributes, and Instances

Uploaded by: 我*** | Document ID: 134894305 | Upload date: 2020-06-09 | Format: PPT | Pages: 36 | Size: 237.50 KB

Input: Concepts, Attributes, Instances

Module Outline
- Terminology
- What's a concept?
  Classification, association, clustering, numeric prediction
- What's in an example?
  Relations, flat files, recursion
- What's in an attribute?
  Nominal, ordinal, interval, ratio
- Preparing the input
  ARFF, attributes, missing values, getting to know data

witten & eibe

Terminology
Components of the input:
- Concepts: kinds of things that can be learned
  Aim: intelligible and operational concept description
- Instances: the individual, independent examples of a concept
  Note: more complicated forms of input are possible
- Attributes: measuring aspects of an instance
  We will focus on nominal and numeric ones

What's a concept?
Data mining tasks (styles of learning):
- Classification learning: predicting a discrete class
- Association learning: detecting associations between features
- Clustering: grouping similar instances into clusters
- Numeric prediction: predicting a numeric quantity
Concept: thing to be learned
Concept description: output of learning scheme

Classification learning
- Example problems: attrition prediction, using DNA data for diagnosis, weather data to predict play/not play
- Classification learning is supervised
  Scheme is being provided with actual outcome
- Outcome is called the class of the example
- Success can be measured on fresh data for which class labels are known (test data)
- In practice success is often measured subjectively

Association learning
- Examples: supermarket basket analysis (what items are bought together, e.g. milk + cereal, chips + salsa)
- Can be applied if no class is specified and any kind of structure is considered "interesting"
- Difference with classification learning:
  Can predict any attribute's value, not just the class, and more than one attribute's value at a time
  Hence: far more association rules than classification rules
  Thus: constraints are necessary
  Minimum coverage and minimum accuracy

Clustering
- Examples: customer grouping
- Finding groups of items that are similar
- Clustering is unsupervised
  The class of an example is not known
- Success often measured subjectively

Numeric prediction
- Classification learning, but "class" is numeric
- Learning is supervised
  Scheme is being provided with target value
- Measure success on test data

What's in an example?
- Instance: specific type of example
  Thing to be classified, associated, or clustered
  Individual, independent example of target concept
  Characterized by a predetermined set of attributes
- Input to learning scheme: set of instances/dataset
  Represented as a single relation/flat file
- Rather restricted form of input
  No relationships between objects
- Most common form in practical data mining

A family tree
Peter (M), Peggy (F), Steven (M), Graham (M), Pam (F), Grace (F), Ray (M), Ian (M), Pippa (F), Brian (M), Anna (F), Nikki (F)

[Table: Family tree represented as a table]

[Table: The "sister-of" relation; closed-world assumption]

[Table: A full representation in one table]

Generating a flat file
- Process of flattening a file is called "denormalization"
  Several relations are joined together to make one
- Possible with any finite set of finite relations
- Problematic: relationships without pre-specified number of objects
  Example: concept of nuclear-family
- Denormalization may produce spurious regularities that reflect structure of database
  Example: "supplier" predicts "supplier address"

[Table: The "ancestor-of" relation]

Recursion
- Infinite relations require recursion
- Appropriate techniques are known as "inductive logic programming" (e.g. Quinlan's FOIL)
- Problems: (a) noise and (b) computational complexity

Multi-instance problems
- Each example consists of several instances
- E.g. predicting drug activity
  Examples are molecules that are active/not active
  Instances are conformations of a molecule
  Molecule active (example positive) ⇔ at least one of its conformations (instances) is active (positive)
  Molecule not active (example negative) ⇔ all of its conformations (instances) are not active (negative)
- Problem: identifying the "truly" positive instances

What's in an attribute?
- Each instance is described by a fixed predefined set of features, its "attributes"
- But: number of attributes may vary in practice
  Possible solution: "irrelevant value" flag
- Related problem: existence of an attribute may depend on the value of another one
- Possible attribute types ("levels of measurement"):
  Nominal, ordinal, interval and ratio

Nominal quantities
- Values are distinct symbols
  Values themselves serve only as labels or names
  Nominal comes from the Latin word for name
- Example: attribute "outlook" from weather data
  Values: "sunny", "overcast", and "rainy"
- No relation is implied among nominal values (no ordering or distance measure)
- Only equality tests can be performed

Ordinal quantities
- Impose order on values
- But: no distance between values defined
- Example: attribute "temperature" in weather data
  Values: "hot" > "mild" > "cool"
- Note: addition and subtraction don't make sense
- Example rule: temperature < hot ⇒ play = yes
- Distinction between nominal and ordinal not always clear (e.g. attribute "outlook")

Interval quantities (numeric)
- Interval quantities are not only ordered but measured in fixed and equal units
- Example 1: attribute "temperature" expressed in degrees Fahrenheit
- Example 2: attribute "year"
- Difference of two values makes sense
- Sum or product doesn't make sense
  Zero point is not defined

Ratio quantities
- Ratio quantities are ones for which the measurement scheme defines a zero point
- Example: attribute "distance"
  Distance between an object and itself is zero
- Ratio quanti...
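
The four levels of measurement differ in which operations are meaningful, and a short Python sketch can make that concrete. The attribute names and values ("outlook", "temperature", Fahrenheit, distance) come from the slides; the function names, the list used to encode the ordinal order, and the numeric readings are hypothetical and chosen only for illustration.

```python
# Nominal: values are only labels, so equality is the only meaningful test.
OUTLOOK_VALUES = {"sunny", "overcast", "rainy"}

def same_outlook(a: str, b: str) -> bool:
    """Equality test on a nominal attribute."""
    return a == b

# Ordinal: values are ordered, but no distance between them is defined.
# The list position encodes order only (hypothetical encoding).
TEMPERATURE_ORDER = ["cool", "mild", "hot"]

def temperature_less_than(a: str, b: str) -> bool:
    """Order comparison on an ordinal attribute (cool < mild < hot)."""
    return TEMPERATURE_ORDER.index(a) < TEMPERATURE_ORDER.index(b)

def rule_predicts_play(temperature: str) -> bool:
    """The slide's example rule: temperature < hot => play = yes."""
    return temperature_less_than(temperature, "hot")

# Interval: differences are meaningful, ratios are not, because the
# zero point is arbitrary. Changing the unit changes the ratio.
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0

def temperature_difference(a: float, b: float) -> float:
    """Difference of two interval values in the same unit."""
    return a - b

# Ratio: a true zero point exists, so ratios survive a change of unit.
def km_to_miles(km: float) -> float:
    return km * 0.621371

if __name__ == "__main__":
    print(same_outlook("sunny", "rainy"))
    print(rule_predicts_play("mild"))
    print(temperature_difference(80.0, 40.0))
    # Ratio of two Fahrenheit readings versus the same readings in Celsius:
    print(80.0 / 40.0, fahrenheit_to_celsius(80.0) / fahrenheit_to_celsius(40.0))
    # Ratio of two distances is the same in km and in miles:
    print(10.0 / 5.0, km_to_miles(10.0) / km_to_miles(5.0))
```

The Fahrenheit and Celsius ratios disagree while the kilometre and mile ratios agree, which is one way to see why only ratio quantities support statements such as "twice as far".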

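The association learning slide names minimum coverage and minimum accuracy as the constraints that keep the number of rules manageable, without defining them. The sketch below assumes the usual reading: coverage (support) is the number of instances a rule predicts correctly, and accuracy (confidence) is that count divided by the number of instances the rule applies to. The baskets, function names, and thresholds are hypothetical; only the item names come from the slide.

```python
from typing import FrozenSet, List, Tuple

Basket = FrozenSet[str]

# Hypothetical supermarket baskets, invented for illustration.
BASKETS: List[Basket] = [
    frozenset({"milk", "cereal"}),
    frozenset({"milk", "cereal", "chips"}),
    frozenset({"chips", "salsa"}),
    frozenset({"milk"}),
    frozenset({"chips", "salsa", "milk"}),
]

def coverage_and_accuracy(
    baskets: List[Basket], antecedent: Basket, consequent: Basket
) -> Tuple[int, float]:
    """Return (coverage, accuracy) for the rule antecedent => consequent.

    coverage: number of baskets containing both sides of the rule
    accuracy: coverage divided by the number of baskets containing the antecedent
    """
    applies = [b for b in baskets if antecedent <= b]
    correct = [b for b in applies if consequent <= b]
    coverage = len(correct)
    accuracy = coverage / len(applies) if applies else 0.0
    return coverage, accuracy

def passes_constraints(
    baskets: List[Basket],
    antecedent: Basket,
    consequent: Basket,
    min_coverage: int,
    min_accuracy: float,
) -> bool:
    """Keep a rule only if it meets both minimum thresholds."""
    coverage, accuracy = coverage_and_accuracy(baskets, antecedent, consequent)
    return coverage >= min_coverage and accuracy >= min_accuracy

if __name__ == "__main__":
    milk, cereal = frozenset({"milk"}), frozenset({"cereal"})
    chips, salsa = frozenset({"chips"}), frozenset({"salsa"})
    print(coverage_and_accuracy(BASKETS, milk, cereal))
    print(coverage_and_accuracy(BASKETS, chips, salsa))
    print(passes_constraints(BASKETS, chips, salsa, min_coverage=2, min_accuracy=0.6))
```

On these baskets "milk => cereal" applies four times and is right twice, while "chips => salsa" applies three times and is right twice, so with a minimum accuracy of 0.6 only the second rule would be kept.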