Friday, September 14, 2018 · Data Mining: Concepts and Techniques

Knowledge Mining Theory and Technology (知识挖掘理论和技术)
Chapter 1
Wei Wei (魏玮)
School of Computer Science and Software, Hebei University of Technology

Textbook: Data Mining: Concepts and Techniques (数据挖掘:概念与技术)
- Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign
- www.cs.uiuc.edu/hanj
- © 2006 Jiawei Han and Micheline Kamber. All rights reserved.
- Chinese translation by Fan Ming and Meng Xiaofeng, China Machine Press, 2008

Data Mining: Concepts and Techniques
- 2nd ed., to be published in Jan. 2006
- Seven chapters will be covered in the Fall semester
- E-preprints will be used in this class

Reference Books
- Zhu Ming (ed.). Data Mining (数据挖掘). University of Science and Technology of China Press, 2002.
- Jiawei Han, Micheline Kamber (Canada). Data Mining: Concepts and Techniques (English edition, 2nd ed.). China Machine Press, April 2006.
- Mehmed Kantardzic (USA). Data Mining: Concepts, Models, Methods, and Algorithms. Translated by Shan Siqing, Chen Yin, Cheng Yan, et al. Tsinghua University Press, 2003.
- Richard J. Roiger, Michael W. Geatz (USA). Data Mining: A Tutorial-Based Primer (Chinese edition: 数据挖掘教程). Translated by Weng Jingnong. Tsinghua University Press, 2003.

Disciplinary Coverage
- Introduction
- Data Preprocessing
- Data Warehouse and OLAP Technology: An Introduction
- Advanced Data Cube Technology and Data Generalization
- Mining Frequent Patterns, Association and Correlations
- Classification and Prediction
- Cluster Analysis

Chapter 1. Introduction
- Motivation: Why data mining?
- What is data mining?
- Data mining: On what kind of data?
- Data mining functionality
- Are all the patterns interesting?
- Classification of data mining systems
- Data mining task primitives
- Integration of a data mining system with a DB and DW system
- Major issues in data mining

1.1 Motivation: Why Data Mining? Necessity Is the Mother of Invention
- Data explosion problem
  - Automated data collection tools, widely used database systems, a computerized society, and the Internet lead to tremendous amounts of data accumulated and/or waiting to be analyzed in databases, data warehouses, the WWW, and other information repositories
- We are drowning in data, but starving for knowledge!
- Solution: data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Fig 1.1 The evolution of database system technology

Evolution of Database Technology
- 1960s: Data collection, database creation, IMS and network DBMS
- 1970s: Relational data model, relational DBMS implementation
- 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.), application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s: Data mining, data warehousing, multimedia databases, and Web databases
- 2000s:
  - Stream data management and mining
  - Data mining and its applications
  - Web technology (XML, data integration) and global information systems

Fig 1.2 We are data rich, but information poor

1.2 What Is Data Mining?
- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
  - Data mining: a misnomer?
- Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Why Data Mining? Potential Applications
- Data analysis and decision support
  - Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
  - Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis
  - Fraud detection and detection of unusual patterns (outliers)
- Other applications
  - Text mining (news groups, email, documents) and Web mining
  - Stream data mining
  - Bioinformatics and bio-data analysis

Example 1: Market Analysis and Management
- Where does the data come from?
  - Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
- Target marketing
  - Find clusters of "model" customers who share the same characteristics: interests, income level, spending habits, etc.
  - Determine customer purchasing patterns over time
- Cross-market analysis
  - Find associations/correlations between product sales, and predict based on such associations
- Customer profiling
  - What types of customers buy what products (clustering or classification)
- Customer requirement analysis
  - Identify the best products for different customers
  - Predict what factors will attract new customers
- Provision of summary information
  - Multidimensional summary reports
  - Statistical summary information (data central tendency and variation)
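
The cross-market analysis item above (finding associations between product sales) can be made concrete with a small sketch. The slide does not define any measures, so the support, confidence, and lift calculations below are introduced here as an assumption, using their usual definitions from association analysis. The basket data and the function name `pair_statistics` are hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not from the slides): each set is one customer basket.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def pair_statistics(baskets):
    """Return {(a, b): (support, confidence of a -> b, lift)} for co-occurring pairs."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each pair; sorting gives every pair one canonical order.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n                       # share of baskets with both items
        confidence = count / item_counts[a]       # estimate of P(b | a)
        lift = confidence / (item_counts[b] / n)  # above 1: bought together more than chance
        stats[(a, b)] = (support, confidence, lift)
    return stats


stats = pair_statistics(transactions)
for pair, (support, confidence, lift) in sorted(stats.items()):
    print(pair, round(support, 2), round(confidence, 2), round(lift, 2))
```

A lift above 1 suggests that two products sell together more often than independent purchases would predict, which is one simple way to "predict based on such associations". A real analysis would use far more transactions and a minimum-support threshold to filter out rare pairs.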