Data Collection and Marketing Tools (English Version)

Uploaded by: 汽*** | Document ID: 571436219 | Uploaded: 2024-08-10 | Format: PPT | Pages: 54 | Size: 619 KB

Knowledge Discovery & Data Mining: Tools, Methods, and Experiences
Fosca Giannotti and Dino Pedreschi
Pisa KDD Lab, CNUCE-CNR & Univ. Pisa
A tutorial, EDBT2000

Contributors and acknowledgements
- The people at Pisa KDD Lab: Francesco BONCHI, Giuseppe MANCO, Mirco NANNI, Chiara RENSO, Salvatore RUGGIERI, Franco TURINI and many students
- The many KDD tutorialists and teachers who made their slides available on the web (all of them listed in the bibliography) ;-)
- In particular:
  - Jiawei HAN, Simon Fraser University, whose forthcoming book "Data Mining: Concepts and Techniques" has influenced the whole tutorial
  - Rajeev RASTOGI and Kyuseok SHIM, Lucent Bell Labs
  - Daniel A. KEIM, University of Halle
  - Daniel Silver, CogNova Technologies
- The EDBT2000 board, who accepted our tutorial proposal

Tutorial goals
- Introduce you to major aspects of the Knowledge Discovery Process, and to the theory and applications of Data Mining technology
- Provide a systematization of the many concepts around this area, along the following lines:
  - the process
  - the methods applied to paradigmatic cases
  - the support environment
  - the research challenges
- Important issues that will not be covered in this tutorial:
  - methods: time series, exception detection, neural nets
  - systems: parallel implementations

Tutorial Outline
1. Introduction and basic concepts
   1. Motivations, applications, the KDD process, the techniques
2. Deeper into DM technology
   1. Decision Trees and Fraud Detection
   2. Association Rules and Market Basket Analysis
   3. Clustering and Customer Segmentation
3. Trends in technology
   1. Knowledge Discovery Support Environment
   2. Tools, Languages and Systems
4. Research challenges

Introduction - module outline
- Motivations
- Application Areas
- KDD Decisional Context
- KDD Process
- Architecture of a KDD system
- The KDD steps in short

Evolution of Database Technology: from data management to data analysis
- 1960s:
  - Data collection, database creation, IMS and network DBMS.
- 1970s:
  - Relational data model, relational DBMS implementation.
- 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.).
- 1990s:
  - Data mining and data warehousing, multimedia databases, and Web technology.

Motivations: "Necessity is the Mother of Invention"
- Data explosion problem:
  - Automated data collection tools, mature database technology and the internet lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
- We are drowning in information, but starving for knowledge! (John Naisbitt)
- Data warehousing and data mining:
  - On-line analytical processing
  - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases.

A rapidly emerging field
- Also referred to as: Data dredging, Data harvesting, Data archeology
- A multidisciplinary field:
  - Database
  - Statistics
  - Artificial intelligence
    - Machine learning, Expert systems and Knowledge Acquisition
  - Visualization methods

Motivations for DM
- Abundance of business and industry data
- Competitive focus - Knowledge Management
- Inexpensive, powerful computing engines
- Strong theoretical/mathematical foundations
  - machine learning & logic
  - statistics
  - database management systems

What is DM useful for?
- Increase knowledge to base decisions upon.
- E.g., impact on marketing
[Figure labels: Marketing; Database Marketing; Data Warehousing; KDD & Data Mining]

The Value Chain
- Data
  - Customer data
  - Store data
  - Demographical data
  - Geographical data
- Information
  - X lives in Z
  - S is Y years old
  - X and S moved
  - W has money in Z
- Knowledge
  - A quantity Y of product A is used in region Z
  - Customers of class Y use x% of C during period D
- Decision
  - Promote product A in region Z.
  - Mail ads to families of profile P
  - Cross-sell service B to clients C

Application Areas and Opportunities
- Marketing: segmentation, customer targeting, ...
- Finance: investment support, portfolio management
- Banking & Insurance: credit and policy approval
- Security: fraud detection
- Science and medicine: hypothesis discovery, prediction, classification, diagnosis
- Manufacturing: process modeling, quality control, resource allocation
- Engineering: simulation and analysis, pattern recognition, signal processing
- Internet: smart search engines, web marketing

Classes of applications
- Market analysis
  - target marketing, customer relation management, market basket analysis, cross selling, market segmentation.
- Risk analysis
  - Forecasting, customer retention, improved underwriting, quality control, competitive analysis.
- Fraud detection
- Text (news group, email, documents) and Web analysis.

Market Analysis
- Where are the data sources for analysis?
  - Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies.
- Target marketing
  - Find clusters of "model" customers who share the same characteristics: interest, income level, spending habits, etc.
- Determine customer purchasing patterns over time
  - Conversion of a single to a joint bank account: marriage, etc.
- Cross-market analysis
  - Associations/co-relations between product sales
  - Prediction based on the association information.

Market Analysis (2): Market Analysis and Management
- Customer profiling
  - data mining can tell you what types of customers buy what products (clustering or classification).
- Identifying customer requirements
  - identifying the best products for different customers
  - use prediction to find what factors will attract new customers
- Provides summary information
  - various multidimensional summary reports;
  - statistical summary information (data central tendency and variation)

Risk Analysis
- Finance planning and asset evaluation:
  - cash flow analysis and prediction
  - contingent claim analysis to evaluate assets
  - cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
- Resource planning:
  - summarize and compare the resources and spending
- Competition:
  - monitor competitors and market directions (CI: competitive intelligence).
  - group customers into classes and class-based pricing procedures
  - set pricing strategy in a highly competitive market

Fraud Detection
- Applications:
  - widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
- Approach:
  - use historical data to build models of fraudulent behavior and use data mining to help identify similar instances.
- Examples:
  - auto insurance: detect a group of people who stage accidents to collect on insurance
  - money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
  - medical insurance: detect professional patients and a ring of doctors and a ring of references

Fraud Detection (2)
- More examples:
  - Detecting inappropriate medical treatment:
    - Australian Health Insurance Commission identifies that in many cases blanket screening tests were requested (save Australian $1m/yr).
  - Detecting telephone fraud:
    - Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.
    - British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.
  - Retail: Analysts estimate that 38% of retail shrink is due to dishonest employees.

Other applications
- Sports
  - IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat.
- Astronomy
  - JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
- Internet Web Surf-Aid
  - IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior pages, analyzing effectiveness of Web marketing, improving Web site organization, etc.
  - Watch for the PRIVACY pitfall!

What is KDD? A process!
- The selection and processing of data for:
  - the identification of novel, accurate, and useful patterns, and
  - the modeling of real-world phenomena.
- Data mining is a major component of the KDD process - automated discovery of patterns and the development of predictive and explanatory models.

The KDD process
[Figure: the KDD process. Labels: Data Sources; Data Consolidation; Warehouse; Consolidated Data; Selection and Preprocessing; Prepared Data; Data Mining; Patterns & Models; p(x)=0.02; Interpretation and Evaluation; Knowledge]

The KDD Process: Core Problems & Approaches
- Problems:
  - identification of relevant data
  - representation of data
  - search for valid pattern or model
- Approaches:
  - top-down deduction by expert
  - interactive visualization of data/models
  - * bottom-up induction from data *
[Figure labels: Data Mining; OLAP]

The steps of the KDD process
- Learning the application domain:
  - relevant prior knowledge and goals of application
- Data consolidation: Creating a target data set
- Selection and Preprocessing
  - Data cleaning: (may take 60% of effort!)
  - Data reduction and projection:
    - find useful features, dimensionality/variable reduction, invariant representation.
- Choosing functions of data mining
  - summarization, classification, regression, association, clustering.
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Interpretation and evaluation: analysis of results.
  - visualization, transformation, removing redundant patterns, ...
- Use of discovered knowledge

The virtuous cycle
[Figure labels: Identify Problem or Opportunity; Measure effect of Action; Act on Knowledge; Knowledge; Results; Strategy; Problem]

Applications, operations, techniques

Roles in the KDD process

Data mining and business intelligence
[Figure: layers of increasing potential to support business decisions. Labels, in slide order: Making Decisions; Data Presentation, Visualization Techniques; Data Mining, Information Discovery; Data Exploration; OLAP, MDA; Statistical Analysis, Querying and Reporting; Data Warehouses / Data Marts; Data Sources: Paper, Files, Information Providers, Database Systems, OLTP. Roles: End User; Business Analyst; Data Analyst; DBA]

Architecture of a KDD system
[Figure labels: Graphical User Interface; Data Consolidation; Selection and Preprocessing; Data Mining; Interpretation and Evaluation; Warehouse; Knowledge; Data Sources]

A business intelligence environment

[Figure: The KDD process (same diagram as above)]

Data consolidation and preparation
Garbage in, garbage out
- The quality of results relates directly to the quality of the data
- 50%-70% of KDD process effort is spent on data consolidation and preparation
- Major justification for a corporate data warehouse

Data consolidation
From data sources to consolidated data repository
[Figure labels: RDBMS; Legacy DBMS; Flat Files; External; Object/Relation DBMS; Multidimensional DBMS; Deductive Database; Data Consolidation and Cleansing; Warehouse]

Data consolidation
- Determine preliminary list of attributes
- Consolidate data into working database
  - Internal and External sources
- Eliminate or estimate missing values
- Remove outliers (obvious exceptions)
- Determine prior probabilities of categories and deal with volume bias

[Figure: The KDD process (same diagram as above)]

Data selection and preprocessing
- Generate a set of examples
  - choose sampling method
  - consider sample complexity
  - deal with volume bias issues
- Reduce attribute dimensionality
  - remove redundant and/or correlating attributes
  - combine attributes (sum, multiply, difference)
- Reduce attribute value ranges
  - group symbolic discrete values
  - quantize continuous numeric values
- Transform data
  - de-correlate and normalize values
  - map time-series data to static representation
- OLAP and visualization tools play a key role

[Figure: The KDD process (same diagram as above)]

Data mining tasks and methods
- Automated Exploration/Discovery
  - e.g. discovering new market segments
  - clustering analysis
- Prediction/Classification
  - e.g. forecasting gross sales given current factors
  - regression, neural networks, genetic algorithms, decision trees
- Explanation/Description
  - e.g. characterizing customers by demographics and purchase history
  - decision trees, association rules
[Figure labels: x1, x2; f(x), x; "if age ... 35 and income ... $35k then ..." (comparison operators lost in extraction)]

Automated exploration and discovery
- Clustering: partitioning a set of data into a set of classes, called clusters, whose members share some interesting common properties.
- Distance-based numerical clustering
  - metric grouping of examples (K-NN)
  - graphical visualization can be used
- Bayesian clustering
  - search for the number of classes which results in the best fit of a probability distribution to the data
  - AutoClass (NASA) is one of the best examples

Prediction and classification
- Learning a predictive model
- Classification of a new case/sample
- Many methods:
  - Artificial neural networks
  - Inductive decision tree and rule systems
  - Genetic algorithms
  - Nearest neighbor clustering algorithms
  - Statistical (parametric, and non-parametric)

Generalization and regression
- The objective of learning is to achieve good generalization to new unseen cases.
- Generalization can be defined as a mathematical interpolation or regression over a set of training points
- Models can be validated with a previously unseen test set or using cross-validation methods
[Figure axes: f(x), x]

Classification and prediction
- Classify data based on the values of a target attribute, e.g., classify countries based on climate, or classify cars based on gas mileage.
- Use the obtained model to predict some unknown or missing attribute values based on other information.

Summarizing: inductive modeling = learning
Objective: Develop a general model or hypothesis from specific examples
- Function approximation (curve fitting)
- Classification (concept learning, pattern recognition)
[Figure labels: x1, x2, A, B; f(x), x]

Explanation and description
- Learn a generalized hypothesis (model) from selected data
- Description/Interpretation of model provides new knowledge
- Methods:
  - Inductive decision tree and rule systems
  - Association rule systems
  - Link Analysis
  - ...

Exception/deviation detection
- Generate a model of normal activity
- Deviation from model causes alert
- Methods:
  - Artificial neural networks
  - Inductive decision tree and rule systems
  - Statistical methods
  - Visualization tools

Outlier and exception data analysis
- Time-series analysis (trend and deviation):
  - Trend and deviation analysis: regression, sequential pattern, similar sequences, trend and deviation, e.g., stock analysis.
  - Similarity-based pattern-directed analysis
  - Full vs. partial periodicity analysis
- Other pattern-directed or statistical analysis

[Figure: The KDD process (same diagram as above, with the consolidation stage labelled "Data Consolidation and Warehousing")]

Are all the discovered patterns interesting?
- A data mining system/query may generate thousands of patterns; not all of them are interesting.
- Interestingness measures:
  - easily understood by humans
  - valid on new or test data with some degree of certainty.
  - potentially useful
  - novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on the user's beliefs in the data, e.g., unexpectedness, novelty, etc.

Completeness vs. optimization
- Find all the interesting patterns: Completeness.
  - Can a data mining system find all the interesting patterns?
- Search for only interesting patterns: Optimization.
  - Can a data mining system find only the interesting patterns?
  - Approaches
    - First generate all the patterns and then filter out the uninteresting ones.
    - Generate only the interesting patterns - mining query optimization.

Interpretation and evaluation
Evaluation
- Statistical validation and significance testing
- Qualitative review by experts in the field
- Pilot surveys to evaluate model accuracy
Interpretation
- Inductive tree and rule models can be read directly
- Clustering results can be graphed and tabled
- Code can be automatically generated by some systems (IDTs, Regression models)

Interpretation and evaluation
- Visualization tools can be very helpful
  - sensitivity analysis (I/O relationship)
  - histograms of value distribution
  - time-series plots and animation
  - requires training and practice
[Figure labels: Response; Velocity; Temp]

Important dates of data mining
- 1989 IJCAI Workshop on KDD
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, eds., 1991)
- 1991-1994 Workshops on KDD
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., 1996)
- 1995-1998 AAAI Int. Conf. on KDD and DM (KDD95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- 1998 ACM SIGKDD
- 1999 SIGKDD99 Conf.

References - general
- P. Adriaans and D. Zantinge. Data Mining. Addison-Wesley: Harlow, England, 1996.
- M. S. Chen, J. Han, and P. S. Yu. Data mining: An overview from a database perspective. IEEE Trans. Knowledge and Data Engineering, 8:866-883, 1996.
- U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000. To appear.
- T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of ACM, 39:58-64, 1996.
- G. Piatetsky-Shapiro, U. Fayyad, and P. Smyth. From data mining to knowledge discovery: An overview. In U. M. Fayyad, et al. (eds.), Advances in Knowledge Discovery and Data Mining, 1-35. AAAI/MIT Press, 1996.
- G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991.
- Michael Berry and Gordon Linoff. Data Mining Techniques for Marketing, Sales and Customer Support. John Wiley & Sons, 1997.
- Sholom M. Weiss and Nitin Indurkhya. Predictive Data Mining: A Practical Guide. Morgan Kaufmann, 1997.
- W. H. Inmon, J. D. Welch, and Katherine L. Glassey. Managing the Data Warehouse. Wiley, 1997.
- T. Mitchell. Machine Learning. McGraw-Hill, 1997.

Main Web resources
- KDD Newsletter and comprehensive website
- ACM SIGKDD
- Journal of Data Mining and Knowledge Discovery

Tutorial Outline
- Introduction and basic concepts
  - Motivations, applications, the KDD process, the techniques
- Deeper into DM technology
  - Decision Trees and Fraud Detection
  - Association Rules and Market Basket Analysis
  - Clustering and Customer Segmentation
- Trends in technology
  - Knowledge Discovery Support Environment
  - Tools, Languages and Systems
- Research challenges
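Worked example (not part of the original slides): the data consolidation and data selection and preprocessing slides name three value-level operations: estimating missing values, normalizing values, and quantizing continuous numeric values. The sketch below is one possible reading of each, under stated assumptions: missing values are estimated with the column mean, normalization is min-max scaling to [0, 1], and quantization uses equal-width bins. The slides do not prescribe any of these choices, and the function names and the `incomes` sample are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def fill_missing_with_mean(values: List[Optional[float]]) -> List[float]:
    """Estimate missing entries (None) with the mean of the known entries.

    Assumption: mean imputation; the slides only say "eliminate or estimate".
    """
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot estimate missing values: no known values")
    estimate = mean(known)
    return [estimate if v is None else v for v in values]


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quantize(values: List[float], bins: int) -> List[int]:
    """Map each value to the index of an equal-width bin (0 .. bins - 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical sample: yearly income in thousands, one value missing.
incomes = [20.0, None, 40.0, 60.0]
filled = fill_missing_with_mean(incomes)
scaled = min_max_normalize(filled)
bands = quantize(filled, bins=2)
```

Mean imputation is only the simplest estimator; it shrinks the variance of the column, so a real project would weigh it against dropping the record or using a model-based estimate.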

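Worked example (not part of the original slides): the slide on interesting patterns lists support and confidence as objective interestingness measures without defining them. A minimal sketch using the usual association-rule definitions: support is the fraction of transactions that contain an itemset, and the confidence of a rule A -> B is support(A and B) divided by support(A). The basket data and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for basket in transactions if wanted <= basket)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    lhs = frozenset(antecedent)
    base = support(transactions, lhs)
    if base == 0:
        # The rule never fires, so its confidence is reported as 0.
        return 0.0
    return support(transactions, lhs | frozenset(consequent)) / base


# Hypothetical market baskets.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

bread_milk_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
bread_to_milk_conf = confidence(baskets, {"bread"}, {"milk"})     # 2 of the 3 bread baskets
```

A rule is typically kept only when both numbers clear user-chosen thresholds, which corresponds to the "generate all the patterns and then filter" approach on the completeness vs. optimization slide.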