Research on Algorithms for Mining Association Rules and Utility Patterns in Databases

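Xiangtan University Master's Thesis
Research on Algorithms for Mining Association Rules and Utility Patterns in Databases
Name: Xiong Xuedong (熊学栋)
Degree applied for: Master
Major: Computer Application Technology
Supervisor: Xiao Jianhua (肖建华)
2007-05-01

Abstract

In recent years, as digitization has spread through government agencies and enterprises, databases have come to play an increasingly important role in them. The large volumes of data accumulated in databases often conceal much useful and important information, and discovering that information efficiently and correctly has become an important problem. Data mining technology emerged in response. Association rule mining is currently the most widely applied data mining technique, and many related techniques and studies have been proposed. The association rule mining model treats every item equally and considers only whether an item appears in a transaction record. In practice, however, items differ markedly from one another. These differences can be quantified, and one way to do so is to use utility as the measure.

Alongside proposing a new association rule algorithm, this thesis also studies in detail a second class of problem, the mining of utility patterns. Utility pattern mining is a new branch of mining techniques. The utility pattern discovery problem is similar to association rule mining and sequence analysis: all three share the same data background, namely customer record data that grew out of market-basket data. Like the other two, utility mining seeks potentially useful, non-trivial new knowledge in these data to support decisions. It places more emphasis on meeting a minimum utility value, and can be viewed as a form of constrained itemset mining.

Continuing the research on association rules, this thesis presents an algorithm based on partitioning and decomposition. Built on the idea of partitioning, the algorithm needs to scan the database only once.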

It also substantially reduces the number of candidate itemsets and narrows the range that must be considered when checking them. Experiments show that the algorithm is considerably more efficient. For utility mining, building on a review of earlier work, this thesis recasts the problem as an optimization problem and proposes a heuristic algorithm based on a binary partition tree, which can find utility patterns in data effectively. Compared with pruning-based utility pattern discovery algorithms, it achieves a substantial improvement in performance. The main contributions of this research are an efficient association rule algorithm and a new utility mining algorithm; experimental comparisons of the algorithms' performance confirm that the results are an advance.

Keywords: data mining; association rules; frequent itemsets; utility mining; utility patterns

ABSTRACT

With the arrival of the information age, more and more companies and government agencies are becoming digitally equipped, and database technology is playing a more important role than ever before. A huge amount of information remains undiscovered in these accumulated databases, and extracting the useful information hidden in them efficiently and correctly has become a crucial challenge. Data mining technology addresses this problem. At present, the most popular data mining technique is association rule mining, and much research has been contributed in this area. The most widely studied association rule mining model considers only whether an item is included in a transaction, and thus treats all items equally. In real-world cases, however, items differ from one another. These differences can be quantified; one option is to use utility as a measure of the usefulness of the respective items.

This thesis studies association rule mining together with utility pattern mining, an emerging topic in the data mining community. Utility pattern mining resembles association rule mining and sequence analysis in that all three share the same form of target database, and all aim to find potentially useful, non-trivial knowledge that supports decisions. The thesis proposes an association rule mining algorithm based on partition and decomposition. The algorithm is grounded in the idea of partitioning the whole database, which saves RAM; it scans the database once and shrinks the number of candidate itemsets. The experiments show that it has an edge in efficiency.

Another contribution of the thesis is its study of utility pattern mining. The author proposes a heuristic algorithm based on a dynamic binary partition tree, obtained by casting the problem in an optimization framework. The experiments also show that it is more robust than earlier algorithms.

This thesis mainly deals with association rule mining and the utility itemset mining problem. The experiments dedicated to verifying the proposed algorithms show that the research is novel and constructive.

Key words: data mining; association rule; frequent itemsets; utility mining; utility pattern

Xiangtan University
Declaration of Originality of the Thesis

I solemnly declare that the thesis I have submitted was completed under the guidance of my supervisor.

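It is the result of research that I carried out independently. Except for content specifically marked and cited in the text, this thesis contains no work that has been published or written by any other individual or group. Individuals and groups who made important contributions to this research are clearly identified in the text. I am fully aware that I bear the legal consequences of this declaration.

Author's signature:    Date (year, month, day):

Authorization for Use of Thesis Copyright

The author of this thesis fully understands the university's regulations on retaining and using theses, and agrees that the university may retain copies and electronic versions of the thesis and submit them to the relevant national departments or institutions, and that the thesis may be consulted and borrowed. I authorize Xiangtan University to include all or part of this thesis in relevant databases for retrieval, and to preserve and compile it by photocopying, reduced-size printing, scanning, or other means of reproduction. Classified theses are handled according to university regulations.

Author's signature:    Date (year, month, day):
Supervisor's signature:    Date (year, month, day):

Chapter 1 Introduction

1.1 Background of Association Rules

The development of Internet-based global information systems has given us an unprecedented wealth of data. While this mass of information brings convenience, it also brings many problems: first, there is too much information to digest; second, it is hard to tell true information from false; third, information security is hard to guarantee; fourth, information comes in inconsistent forms that are hard to process uniformly. Being rich in data but poor in knowledge has become a typical problem. The aim of data mining is to extract the needed answers from massive data effectively, realizing the "data-information-knowledge-value" transformation.

Data mining refers to extraction from massive amounts of data by non-trivial methods.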

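What it extracts is latent, valuable knowledge (models or rules). The term has several synonyms: knowledge discovery in databases, information extraction, information discovery, intelligent data analysis, exploratory data analysis, information harvesting, data archeology, and so on. The development of data mining has been roughly as follows:

1989: IJCAI workshop on knowledge discovery in databases
  Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
1991-1994: KDD workshops
  Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
1995-1998: KDD international conferences (KDD95-98)
  Journal of Data Mining and Knowledge Discovery (1997)
1998: ACM SIGKDD, the SIGKDD1999-2002 conferences, and SIGKDD Explorations
More international conferences on data mining: PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, DaWaK, SPIE-DM, etc.

Data mining is the most active branch of database research, development, and application. It is an interdisciplinary field involving database technology, artificial intelligence, machine learning, neural networks, mathematics, statistics, pattern recognition, knowledge-base systems, knowledge acquisition, information extraction, high-performance computing, parallel computing, data visualization, and more. Data mining has been application-oriented from the start: it is not merely simple retrieval and querying against a particular database, but also performs statistics, analysis, synthesis, and reasoning on the data at the micro, meso, and even macro levels.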
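The abstract contrasts association rule mining, which only asks whether an item occurs in a transaction, with utility mining, which weights items by their usefulness and keeps itemsets that reach a minimum utility value. The following minimal Python sketch illustrates that contrast. It is not the thesis's algorithm: the toy transactions, the `unit_profit` table, the threshold, and the definition of itemset utility (quantity times unit profit, summed over the transactions that contain the itemset) are all assumptions made for illustration.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "caviar": 1},
    {"milk": 3, "bread": 1},
    {"caviar": 2, "milk": 1},
]

# Hypothetical per-unit profit, used here as the utility weight of each item.
unit_profit = {"bread": 1.0, "milk": 2.0, "caviar": 50.0}


def support(itemset, db):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in db if all(item in t for item in itemset))
    return hits / len(db)


def confidence(antecedent, consequent, db):
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, db)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), db) / base


def utility(itemset, db, profit):
    """Assumed utility: quantity * unit profit of the itemset's items,
    summed over the transactions that contain the whole itemset."""
    total = 0.0
    for t in db:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


# Keep only candidates whose utility reaches an assumed minimum utility value.
min_utility = 100.0
candidates = [{"bread", "milk"}, {"caviar"}]
high_utility = [c for c in candidates
                if utility(c, transactions, unit_profit) >= min_utility]

if __name__ == "__main__":
    for c in candidates:
        print(sorted(c), support(c, transactions),
              utility(c, transactions, unit_profit))
    print("high-utility itemsets:", [sorted(c) for c in high_utility])
```

In this toy data, {bread, milk} and {caviar} both have support 0.5, so a frequency-based model cannot tell them apart. Their utilities are 11.0 and 150.0 respectively, so only {caviar} passes the assumed threshold of 100.0.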
