Data Mining Courseware, Lesson 1

Uploaded by: 简****9 | Document No.: 116130734 | Upload date: 2019-11-16 | Format: PPT | Pages: 51 | Size: 248KB

Data Warehousing and Data Mining
Li Cuiping (李翠平), School of Information

Course Outline
- Introduction
- Frequent Patterns
- Classification
- Cluster Analysis
- Outlier Detection
- Data Warehouse and OLAP Technology for Data Mining
- Advanced Topics in Data Mining
  - Stream data mining
  - Time-series and sequential pattern mining
  - Graph and structured pattern mining
  - Spatiotemporal and multimedia data mining
  - Multi-relational and cross-database data mining
  - Social network analysis
  - Text and Web mining
  - Other interesting data mining topics
- Data Mining Applications & Examples (group presentations)

Data Mining: Concepts and Techniques

Course Requirements, Grading, and References
- Requirements: attend class and hand in assignments on time, and take an active part in class discussion.
- Grading:
  - Coursework (50%): attendance + in-class presentation
  - Final (50%):
- References:
  - Jiawei Han, Data Mining: Concepts and Techniques (data mining from a database perspective)
  - David J. Hand et al., Principles of Data Mining (data mining from a statistical perspective)
  - Wang Shan, Li Cuiping, et al., 数据仓库与数据分析原理 (Principles of Data Warehousing and Data Analysis)

Introduction
- Motivation: Why data mining?
- What is data mining?
- Data mining: On what kinds of data?
- Data mining functionalities
- Are all the patterns interesting?
- Data mining framework
- Integration of data mining and data warehousing
- Major data mining conferences

Necessity Is the Mother of Invention
- Data explosion problem
  - Automated data collection tools and mature database technology have led to tremendous amounts of data accumulated and/or waiting to be analyzed in databases, data warehouses, and other information repositories
- We are drowning in data, but starving for knowledge!
- Solution: data warehousing and data mining
  - Data warehousing and on-line analytical processing (OLAP)
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Evolution of Database Technology
- 1960s:
  - Data collection, database creation, IMS and network DBMS
- 1970s:
  - Relational data model, relational DBMS implementation
- 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  - Application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
- 2000s:
  - Stream data management and mining
  - Data mining and its applications
  - Web technology (XML, data integration) and global information systems

What Is Data Mining?
- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
  - Data mining: a misnomer?
- Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
- Watch out: is everything "data mining"? These are not:
  - (Deductive) query processing
  - Expert systems or small ML/statistical programs

Data Mining and KDD
- Data mining can also be viewed as one step of KDD.
- KDD is a user-centered, interactive, exploratory process. It covers extracting models from a specified database with data mining algorithms, plus the series of steps around the mining itself, such as preprocessing the data and presenting the results.
- Although data mining is the core of the whole process, it usually accounts for only 15% to 25% of the work in a KDD process.

[Figure 7.1: Data mining as a step in KDD. Labels: data sources, data integration, target data, data preprocessing, clean data, data mining, patterns, evaluation and presentation, knowledge.]

Data Mining: On What Kinds of Data?
- Relational databases
- Data warehouses
- Transactional databases
- Advanced databases and advanced applications
  - Object-relational databases
  - Temporal databases and time-series databases
  - Spatial databases and spatiotemporal databases
  - Text databases and multimedia databases
  - Heterogeneous databases and legacy databases
  - Data streams
  - The World Wide Web

Characteristics of Data Mining (1)
- First, the data sources for data mining must be real.
  - Data mining usually works on real data that already exists (such as supermarket transaction records), not on data collected specifically for the analysis. Data collection itself is therefore outside the focus of data mining; this is one of the features that distinguish data mining from most statistical tasks.

Characteristics of Data Mining (2)
- Second, the data processed by data mining must be massive.
  - If the dataset is small, plain statistical analysis is enough. A very large dataset, however, raises new problems such as efficient storage, fast access, and suitable representation of the data.

Characteristics of Data Mining (3)
- Third, queries are usually ad hoc queries posed by decision makers (users).
  - The query requirements are flexible and often cannot be stated precisely, so data mining techniques are used to search for possible answers.

Characteristics of Data Mining (4)
- Fourth, the mined knowledge usually cannot be anticipated: data mining discovers latent, novel knowledge.
  - Such knowledge is acceptable, understandable, and usable in a specific context, but it does not hold universally.

Data Mining Functionalities
- Concept description: characterization and discrimination
  - Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
- Association (correlation and causality)
  - Diaper → Beer [0.5%, 75%] (support, confidence). Correlation or causality?
- Classification and prediction
  - Construct models (functions) that describe and distinguish classes or concepts for future prediction
    - E.g., classify countries based on climate, or classify cars based on gas mileage
  - Presentation: decision tree, classification rules, neural network
  - Predict some unknown or missing numerical values

Data Mining Functionalities (2)
- Cluster analysis
  - Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
  - Maximizing intra-class similarity & minimizing inter-class similarity

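On the functionalities slide, the rule Diaper → Beer [0.5%, 75%] carries two numbers: support (the share of all transactions that contain both items) and confidence (the share of transactions containing diapers that also contain beer). The minimal Python sketch below computes both, assuming each transaction is a set of item names. The eight baskets are invented toy data, not figures from the course, chosen so that the confidence comes out at 75%.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0  # the left-hand side never occurs, so report zero confidence
    return support(transactions, lhs | rhs) / lhs_support


# Hypothetical toy baskets, invented for illustration (not course data).
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"diaper", "beer", "bread"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"milk"},
    {"bread", "chips"},
]

s = support(baskets, {"diaper", "beer"})
c = confidence(baskets, {"diaper"}, {"beer"})
# prints: diaper -> beer [support = 37.5%, confidence = 75.0%]
print(f"diaper -> beer [support = {s:.1%}, confidence = {c:.1%}]")
```

The sketch rescans every transaction for each itemset, which is fine for a toy list; for the large databases the slides have in mind, frequent-pattern algorithms such as Apriori or FP-growth are designed to avoid that repeated scanning.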
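The cluster-analysis slide states the goal as maximizing intra-class similarity and minimizing inter-class similarity. One way to see that trade-off is to compare the average distance between points in the same group with the average distance between points in different groups. The Python sketch below does this under stated assumptions: 2-D points, Euclidean distance as the dissimilarity measure, and hypothetical house coordinates and labels. It scores a grouping it is given; it is not a clustering algorithm.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def _mean(values: List[float]) -> float:
    """Arithmetic mean; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def intra_inter_distance(points: List[Point], labels: List[int]) -> Tuple[float, float]:
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Each unordered pair of points is visited once. Pairs sharing a label
    count as intra-cluster, all other pairs as inter-cluster. Distance is
    used as dissimilarity, so a small intra value and a large inter value
    mean high intra-class similarity and low inter-class similarity.
    """
    intra: List[float] = []
    inter: List[float] = []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            intra.append(d)
        else:
            inter.append(d)
    return _mean(intra), _mean(inter)


# Hypothetical house coordinates forming two obvious neighborhoods.
houses = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
good = [0, 0, 0, 1, 1, 1]  # keeps each neighborhood together
bad = [0, 1, 0, 1, 0, 1]   # mixes the two neighborhoods

for name, labels in (("good", good), ("bad", bad)):
    intra, inter = intra_inter_distance(houses, labels)
    print(f"{name}: mean intra = {intra:.2f}, mean inter = {inter:.2f}")
```

The "good" labeling yields a much smaller intra-cluster average and a larger inter-cluster average than the "bad" one. Clustering algorithms such as k-means search for labels that make this trade-off favorable; Euclidean distance is only one possible measure, and the slide does not fix one.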