Hierarchical Classification of Documents with Error Control


Slide 1: Hierarchical Classification of Documents with Error Control
Chun-Hung Cheng, Jian Tang, Ada Wai-chee Fu, Irwin King

Slide 2: Overview
- Abstract
- Problem Description
- Document Classification Model
- Error Control Schemes
  - Recovery oriented scheme
  - Error masking scheme
- Experiments
- Conclusion

Slide 3: Abstract
- Traditional document classification (flat classification) involves only a single classifier
- The single classifier takes care of everything
- Slow, with high overhead

Slide 4: Abstract
- Hierarchical document classification
- Classes are organized in a class hierarchy
- One classifier is used at each internal node

Slide 5: Abstract
- Advantage: better performance
- Disadvantage: the result is wrong if the document is misclassified at any node

Slide 6: Abstract
- Introduce error control mechanisms
- Approach 1 (recovery oriented): detect and correct misclassification
- Approach 2 (error masking): mask errors by using multiple versions of classifiers

Slide 7: Problem Description
[Diagram: the class taxonomy, the training documents and the class-doc relation (class | doc_id | ...) feed a training system, which outputs statistics and feature terms]

Slide 8: Problem Description
[Diagram: the statistics and feature terms feed a classification system, which assigns each incoming document to a target class]

Slide 9: Problem Description
- Objective: achieve
  - higher accuracy
  - fast performance
- Our proposed algorithms provide a good trade-off between accuracy and performance

Slide 10: Document Classification Model
- Formally, we use a model from Chakrabarti et al. 1997
- Based on a naive Bayesian network
- For simplicity, we study a single-node classifier

Slide 11
- z_{i,d}: number of occurrences of term i in the incoming document d
- P_{j,c}: probability that a word in class c is j (estimated using the training data)
- The probability that an incoming document d belongs to class c is
  Pr(c|d) = Pr(c) Π_i P_{i,c}^z_{i,d} / Σ_{c'} Pr(c') Π_i P_{i,c'}^z_{i,d}
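As a rough illustration of the scoring on slide 11 (a minimal sketch, not the authors' code), the snippet below evaluates Pr(c|d) in log space from the term counts z_{i,d} and the per-class word probabilities P_{i,c}, and returns the most probable class at a single node. The dictionary layout and the assumption that the probabilities are already smoothed are choices made for the example.

```python
import math
from collections import Counter

def classify_node(doc_terms, class_priors, term_probs):
    """doc_terms: the terms of the incoming document d.
    class_priors: {class: Pr(c)}, estimated from the training data.
    term_probs: {class: {term: P_{i,c}}}, assumed already smoothed."""
    z = Counter(doc_terms)                       # z_{i,d}: occurrences of term i in d
    best_class, best_score = None, float("-inf")
    for c, prior in class_priors.items():
        score = math.log(prior)                  # log Pr(c)
        for term, count in z.items():
            p = term_probs[c].get(term)
            if p:                                # only terms known for class c contribute
                score += count * math.log(p)     # + z_{i,d} * log P_{i,c}
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical usage with two child classes:
# classify_node(["circuit", "circuit", "polymer"],
#               {"electronics": 0.5, "chemistry": 0.5},
#               {"electronics": {"circuit": 0.02, "polymer": 0.001},
#                "chemistry":   {"circuit": 0.001, "polymer": 0.02}})  # -> "electronics"
```

Working in log space is the usual way to avoid floating-point underflow when many per-term probabilities are multiplied.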

Slide 12: Feature Selection
- The previous formula involves all the terms
- Feature selection reduces cost by using only the terms with good discriminating power
- The training sets are used to identify the feature terms

Slide 13: Fisher's Index
- Fisher's Index indicates the discriminating power of a term
- Good discriminating power: large interclass distance, small intraclass distance
[Figure: distributions of w(t) for classes c1 and c2, contrasting the interclass distance with the intraclass distance]
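A small sketch of a Fisher-Index-style score for a single term, assuming the common formulation (squared distance between class means over the summed within-class variance); the paper's exact normalization may differ. The helper select_features is hypothetical and simply keeps the k highest-scoring terms as features.

```python
def fisher_index(term_freqs_by_class):
    """term_freqs_by_class: {class: [frequency of the term in each training doc]}."""
    means = {c: sum(fs) / len(fs) for c, fs in term_freqs_by_class.items()}
    classes = list(means)
    # interclass distance: squared differences between the per-class means
    interclass = sum((means[a] - means[b]) ** 2
                     for i, a in enumerate(classes) for b in classes[i + 1:])
    # intraclass distance: average squared deviation from each class mean
    intraclass = sum(sum((f - means[c]) ** 2 for f in fs) / len(fs)
                     for c, fs in term_freqs_by_class.items())
    return interclass / intraclass if intraclass > 0 else float("inf")

def select_features(per_term_freqs, k):
    """per_term_freqs: {term: {class: [freqs]}}; keep the k best-discriminating terms."""
    ranked = sorted(per_term_freqs,
                    key=lambda t: fisher_index(per_term_freqs[t]), reverse=True)
    return ranked[:k]
```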

Slide 14: Document Classification Model
- Consider only feature terms in the classification function p(ci|c,d)
- Pick the largest probability among all ci
- Use one classifier in each internal node

Slide 15: Recovery Oriented Scheme
- Database systems: after a failure in the DBMS, restart from a consistent state
- Document classification: when an error is detected, restart from a correct class (the High Confidence Ancestor, or HCA)

Slide 16: Recovery Oriented Scheme
- In practice, rollback is slow
- Better to identify wrong paths and avoid them
- To identify wrong paths, define a closeness indicator (CI): a document is on a wrong path when its CI falls below a threshold

Slide 17: Recovery Oriented Scheme
- The distance between the HCA and the current node is defined to be 2
[Figure: class tree with a wrong path highlighted; the HCA is two levels above the current node]

Slide 18: Recovery Oriented Scheme
- The distance between the HCA and the current node is defined to be 2
[Figure: the same class tree, continuing the wrong-path example]
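The sketch below illustrates the recovery-oriented idea rather than the paper's exact algorithm: classify top-down, monitor a closeness indicator at every step, and when it falls below the threshold, roll back to the high-confidence ancestor (a fixed number of levels up, 2 in the slides) and avoid the subtree that was judged to be a wrong path. The Node class and the choose_child and closeness callables are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)          # frozen nodes are hashable, so they can be blocked in a set
class Node:
    name: str
    children: tuple = ()

def classify_with_recovery(root, doc, choose_child, closeness, threshold, hca_distance=2):
    """choose_child(node, doc, candidates) -> one candidate child (a single-node classifier).
    closeness(node, doc) -> closeness indicator (CI) of the document at that node."""
    path = [root]                # path from the root down to the current node
    blocked = set()              # roots of subtrees already judged to be wrong paths
    while path[-1].children:
        candidates = [c for c in path[-1].children if c not in blocked]
        if not candidates:       # every child is blocked: stop at the current node
            break
        path.append(choose_child(path[-1], doc, candidates))
        if closeness(path[-1], doc) < threshold:        # error detected on this path
            hca = max(0, len(path) - 1 - hca_distance)  # index of the high-confidence ancestor
            blocked.add(path[hca + 1])                  # never re-enter the wrong subtree
            path = path[:hca + 1]                       # roll back and resume from the HCA
    return path[-1]
```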

Slide 19: Error Masking Scheme
- Software fault tolerance: run multiple versions of the software and take a majority vote
- Document classification: run classifiers of different designs and take a majority vote

Slide 20: O-Classifier
- The traditional classifier

Slide 21: N-Classifier
- Skips some intermediate levels

Slide 22: Error Masking Scheme
- Run three classifiers in parallel:
  - O-classifier
  - N-classifier
  - O-classifier using a new feature length
- This selection minimizes the time wasted waiting for the slowest classifier
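A minimal sketch of the error-masking vote, assuming each classifier is a callable that maps a document to a class label. The three-classifier line-up follows slide 22; running them in a thread pool and falling back to the first (O-) classifier when all three disagree are choices made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def masked_classify(doc, classifiers):
    """classifiers: e.g. [o_classifier, n_classifier, o_classifier_new_feature_length]."""
    with ThreadPoolExecutor(max_workers=len(classifiers)) as pool:
        votes = list(pool.map(lambda clf: clf(doc), classifiers))  # run the versions in parallel
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else votes[0]    # no majority: fall back to the O-classifier
```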

Slide 23: Experiments
- Data sets: US patents
  - Preclassified
  - Rich text content
  - Highly hierarchical
- 3 sets collected:
  - 3 levels / large number of documents
  - 4 levels / large number of documents
  - 7 levels / small number of documents

Slide 24: Experiments
- Algorithms compared:
  - Simple hierarchical
  - TAPER
  - Flat
  - Recovery oriented
  - Error masking
- Generally, flat is the slowest and the most accurate; simple hierarchical is the fastest and the least accurate

Slide 25: Accuracy (3 levels / large) [chart]
Slide 26: Accuracy (4 levels / large) [chart]
Slide 27: Accuracy (7 levels / small) [chart]
Slide 28: Performance (3 levels / large) [chart]
Slide 29: Performance (4 levels / large) [chart]
Slide 30: Performance (7 levels / small) [chart]

Slide 31: Conclusion
- Real-life applications have large taxonomies, where flat classification is too slow
- Our algorithm is faster than flat classification with as few as 4 levels
- The performance gain widens as the number of levels increases
- A good trade-off between accuracy and performance for most applications

Slide 32: Thank You
The End
