Data Mining Research Frontiers - Jiawei Han

Uploaded: 2018-09-26. Format: PPT. Pages: 33. Size: 8.83 MB.

Data Mining: Unlimited New Research Frontiers
  Jiawei Han
  Data Mining Research Group
  Department of Computer Science
  University of Illinois at Urbana-Champaign
  Acknowledgements: NSF, ARL, NASA, AFOSR (MURI), DHS, Microsoft, IBM, Yahoo!, HP Lab & Boeing
  September 26, 2018

Outline
  An Introduction to Data Mining Research Group
  Mining and OLAPing Information Networks
  Mining Heterogeneous Information Networks
  Mining Text-Rich Information Networks
  OLAPing (multi-dimensional analysis) of information networks: TextCube, OLAP heterogeneous networks
  Taming the Web: WINACS (integrated mining of Web structures and contents)
  Mining Cyber-Physical Systems and Networks
  Conclusions

Data Mining and Data Warehousing: Jiawei Han's Group at CS, UIUC
  Mining patterns and knowledge discovery from massive data
  Data mining in heterogeneous information networks
  Exploring broad applications of data mining
  Developed many effective data mining algorithms, e.g., FPgrowth, PrefixSpan, gSpan, StarCubing, CrossMine, RankingCube, CrossClus, RankClus, and NetClus
  600+ research papers in conferences and journals
  Fellow of ACM, Fellow of IEEE, ACM SIGKDD Innovation Award, W. McDowell Award, Daniel Drucker Eminent Faculty Award
  Textbook, "Data Mining: Concepts and Techniques," adopted worldwide
  Project lead for NASA EventCube for Aviation Safety, 2008-2012
  Director of Information Network Academic Research Center, funded by Army Research Lab (ARL), 2009-2014

Data Mining Research Group at CS, UIUC

New Books on Data Mining & Link Mining
  Han, Kamber and Pei, Data Mining, 3rd ed., 2011
  Yu, Han and Faloutsos (eds.), Link Mining, 2010
  Sun and Han, Mining Heterogeneous Information Networks, 2012

Mining Heterogeneous Information Networks
  RankClus/NetClus vs. RankCompete: A Competing Random Walk Model for Rank-Based Clustering
  RankClass (KDD'11)
  Knowledge Propagation in Heterogeneous Network

Similarity Search and Role Discovery in Information Networks
  Path: ITI
  Path: ITIGITI
  Which images are most similar to me in Flickr?
  PathSim (VLDB'11): Meta Path-Guided Similarity Search in Networks
  A "dirty" information network (imaginary)
  Cleaned/inferred adversarial network (automatically inferred roles: Chief, Insurgent, Cell Lead)
  Role Discovery in Information Networks (KDD'10)

Interesting Results from Other Domains
  RankCompete: Organize your photo album automatically!
  Rank treatments for AIDS from MEDLINE

Meta-Path Based Co-authorship Prediction in DBLP
  Co-authorship prediction problem: whether two authors are going to collaborate for the first time
  Co-authorship encoded in meta-path: Author-Paper-Author
  Topological features encoded in meta-paths
  Meta-paths between authors under length 4
  Table columns: Meta-Path, Semantic Meaning

The Power of PathPredict
  Explain the prediction power of each meta-path: Wald test for logistic regression
  Higher prediction accuracy than using the projected homogeneous network: 7% higher in prediction accuracy
  Social relations play a more important role?

Case Study: Predicting Concrete Co-Authors
  High-quality predictive power for such a difficult task
  Using data in T0 = [1989, 1995] and T1 = [1996, 2002]
  Predict new coauthor relationships in T2 = [2003, 2009]

Intuitions
  Friends tend to hold similar opinions, while foes tend to hold conflicting opinions
  Based on users' sentiment scores on different objects, we can infer the similarity and dissimilarity (i.e., pseudo-friend and pseudo-foe relationships) between users
  Based on the inferred friendship, we can improve sentiment analysis and user clustering by considering global consistency on heterogeneous networks

State-of-the-Art
  Explores similar opinions instead of opposite opinions
  Typically considers text content while ignoring the InfoNet
  Relies on observed friendship (but many friendships are hidden)

Industry Need/Benefits
  Use sentiment analysis to understand and mine public opinions on product/market-related issues
  QoI-aware mining of text-rich multi-genre networks
  Intelligent methods for public opinion assessment

Insight
  Exploring opposite opinions may help to discover hidden friendship, which can produce better sentiment scores and user clustering (the two mutually enhance each other)
  1. Consider information and social networks
  2. Explore opposite and similar opinions
  3. Both observed & hidden friendship are valuable
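
The intuition that similar opinions suggest friends and conflicting opinions suggest foes can be illustrated with a small Python sketch. The slides do not say how similarity is measured, so everything here is an assumption made for illustration: the function names `opinion_similarity` and `infer_relations`, the use of cosine similarity over the objects both users rated, the threshold of 0.5, and the toy scores. It is not the method from the talk, and it leaves out the global-consistency step on the heterogeneous network.

```python
from itertools import combinations
from math import sqrt


def opinion_similarity(a, b):
    """Cosine similarity of two users' sentiment scores over shared objects.

    `a` and `b` map object name -> sentiment score in [-1, 1] (assumed scale).
    Returns 0.0 when the users share no objects or one vector is all zeros.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[o] * b[o] for o in shared)
    norm_a = sqrt(sum(a[o] ** 2 for o in shared))
    norm_b = sqrt(sum(b[o] ** 2 for o in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def infer_relations(scores, threshold=0.5):
    """Label user pairs as pseudo-friend or pseudo-foe.

    The threshold is a hypothetical cut-off: pairs whose similarity lies
    strictly between -threshold and +threshold are left unlabeled.
    """
    relations = {}
    for u, v in combinations(sorted(scores), 2):
        sim = opinion_similarity(scores[u], scores[v])
        if sim >= threshold:
            relations[(u, v)] = "pseudo-friend"
        elif sim <= -threshold:
            relations[(u, v)] = "pseudo-foe"
    return relations


# Toy data (invented): per-user sentiment scores on two products.
scores = {
    "alice": {"phone": 0.9, "tablet": -0.4},
    "bob": {"phone": 0.8, "tablet": -0.5},
    "carol": {"phone": -0.7, "tablet": 0.6},
}
relations = infer_relations(scores)
for pair, label in relations.items():
    print(pair, label)
```

With this toy data, alice and bob agree on both products and are labeled pseudo-friends, while carol holds the opposite view and is labeled a pseudo-foe of each. The negative similarities are exactly the "opposite opinions" signal that the slide says existing approaches ignore.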
