Establishing the central role of evaluation in information retrieval research: Cranfield is a place name


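The Principle of Information Retrieval
Department of Information Management, School of Information Engineering
Nanjing University of Finance & Economics, 2011

II. Course contents
5 Evaluation in information retrieval

How to measure user happiness
The key utility measure is user happiness.
Relevance of results (effectiveness)
Speed of response
Size of the index
User interface design (independent of the quality of the results)
Utility, success, completeness, satisfaction, worth, value, time, cost

5.1 Introduction

The role of evaluation
Evaluation plays an important role in information retrieval research.
Evaluation has always been central to the research and development of information retrieval systems, to the point that an algorithm and the way its effectiveness is evaluated are treated as one (Saracevic, SIGIR 1995).

Origins of information retrieval system evaluation
Kent et al. were the first to propose the concepts of precision and recall (the latter initially called "relevance") (Kent, 1955).

Origins of information retrieval system evaluation
Cranfield-like evaluation methodology
From the late 1950s to the early 1960s, Cranfield proposed an evaluation scheme based on a set of sample queries, a set of standard answers, and a corpus.
It is known as the "grand daddy" of IR evaluation.
It established the central role of evaluation in information retrieval research.
Cranfield is a place name and also the name of a research institute.

Origins of information retrieval system evaluation
Gerard Salton and the SMART system
Gerard Salton was the principal developer of the SMART system.
SMART was the first to provide a research platform: you could concentrate on the algorithm without worrying about indexing and the like.
It also provided evaluation computation: once you supplied the answers, it reported the commonly used metrics.

Origins of information retrieval system evaluation
Sparck Jones's book "Information Retrieval Experiment" mainly discusses IR experiments and evaluation.

How to measure information retrieval effectiveness (1/2)
We need a test collection consisting of three things:
A document collection
A test suite of information needs, expressible as queries
A set of relevance judgments, standardly a binary assessment of either relevant or not relevant for each query-document pair

How to measure information retrieval effectiveness (2/2)
And in this test collection:
A document is given a binary classification as either relevant or not relevant.
The collection and the suite of information needs have to be of a reasonable size.
Results are highly variable over different documents and information needs.
At least 50 information needs are required.

Difficulties
The difference between the stated information need and the query
Relevance is assessed relative to an information need, not a query.
The subjectivity of relevance decisions
Many systems contain various parameters that can be adjusted to tune system performance.
The correct procedure is to have one or more development test collections.

Difficulties
Voorhees estimated that judging relevance for a single query topic over a collection of 8 million documents would take one assessor nine months of work.
TREC proposed the pooling method, which greatly reduces the judging workload while keeping the evaluation results reliable.
Drawbacks: only a small number of queries can be handled, and even a small query set still takes more than ten assessors 1 to 2 months of work.

5.2 Standard test collections
The Cranfield collection
Text Retrieval Conference (TREC)
GOV2
NII Test Collections for IR Systems (NTCIR)
Reuters-21578 and Reuters-RCV1

The Cranfield collection
Collected in the United Kingdom starting in the late 1950s, it contains 1398 abstracts of aerodynamics journal articles and a set of 225 queries.
It allows precise quantitative measures.
But it is too small.

Text Retrieval Conference (TREC) (1/2)
The U.S. National Institute of Standards and Technology (NIST) has run a large IR test bed evaluation series since 1992, over a range of different test collections.
The best known test collections are the ones used for the TREC Ad Hoc track during the first 8 TREC evaluations, from 1992 to 1999.
They comprise 6 CDs containing 1.89 million documents (mainly, but not exclusively, newswire articles).

Text Retrieval Conference (TREC) (2/2)
TRECs 6-8 provide 150 information needs over about 528,000 newswire and Foreign Broadcast Information Service articles.
Relevance judgments are available only for the documents that were among the top k returned for some system which was entered in the TREC evaluation.

GOV2
Contains 25 million GOV2 web pages.
GOV2 is one of the largest Web test collections, but it is still more than 3 orders of magnitude smaller than the current size of the document collections indexed by the large web search companies.

NII Test Collections for IR Systems (NTCIR)
Similar in size to the TREC collections
Focused on East Asian languages and cross-language information retrieval

Reuters-21578 and Reuters-RCV1
Mostly used for text classification
The Reuters-21578 collection consists of 21578 newswire articles.
Reuters Corpus Volume 1 (RCV1) is much larger, consisting of 806,791 documents.

5.3 Evaluation of unranked retrieval sets
Precision
Recall
Accuracy
F measure

Precision
Precision (P) is the fraction of retrieved documents that are relevant.

Recall
Recall (R) is the fraction of relevant documents that are retrieved.

Another way to define P and R

The comparison of P and R
Typical web surfers prefer P to R.
Various professional searchers, such as paralegals and intelligence analysts, prefer R to P.
Individuals searching their hard disks prefer R to P.

The comparison of P and R
The two quantities clearly trade off against one another.
Recall is a non-decreasing function of the number of documents retrieved.
One can always get a recall of 1 (but very low precision) by retrieving all documents for all queries.
Precision usually decreases as the number of documents retrieved is increased.

Which is more difficult to measure?
Precision
Recall

Doubts about recall
The denominator is a value that cannot be determined, so a recall figure built on it is not practical either.
Why were the relevant documents not retrieved? Was it the design quality of the database or the skill of the user? Was it the indexing or the searching? Or does every database have some relevance coefficient of its own?

Doubts about precision
When a database contains many documents that should have been found but were not, is it meaningful that the retrieved documents are 100% accurate?
How do the non-relevant documents in the denominator of precision arise? Are they caused by the system, by the user's unclear expression when searching, or by the user's final selection?
Relative precision

Doubts about the relationship between recall and precision (1/2)
a: relevant and retrieved
c: relevant and unretrieved
b: irrelevant and retrieved
irrelevant and unretrieved

Doubts about the relationship between recall and precision (2/2)
The two are generally believed to be inversely related.
If a/(c+a) increases, c must decrease. There are two ways c can decrease: the c line moves down or the b line moves right. Either way b must increase, so a/(b+a) decreases. The converse also holds.
But this rests on an assumption, namely that a is constant.
The constant, however, is not a but (c+a). If c falls, a must rise, so a is a variable.
Therefore there is no necessary link between b and c.
One can imagine the b line and the c line both moving toward the edges at the same time. Having c and b both equal 0 is unlikely, but not impossible.
In fact, a retrieval system can improve both measures at the same time.

Contingency table
The other way to define P and R
P = tp / (tp + fp)
R = tp / (tp + fn)

F measure (1/4)
A single measure that trades off precision versus recall:
$F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}} = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$, where $\beta^2 = \frac{1 - \alpha}{\alpha}$
$\alpha \in [0, 1]$ and thus $\beta^2 \in [0, \infty]$
The default balanced F measure uses $\beta = 1$.

F measure (2/4)
Values of $\beta > 1$ emphasize recall.
It is a harmonic mean rather than the simpler average.

F measure (3/4)
The harmonic mean is always less than either the a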
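
The contingency-table definitions of P and R and the F measure can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `precision_recall_f`, its parameter names, and the example counts are assumptions made here, not part of the slides, and the F formula used is the standard weighted harmonic mean F = (β² + 1)PR / (β²P + R).

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Compute precision, recall and the F measure from contingency counts.

    tp: relevant documents that were retrieved (true positives)
    fp: non-relevant documents that were retrieved (false positives)
    fn: relevant documents that were not retrieved (false negatives)
    beta: weight; beta > 1 emphasizes recall, beta < 1 emphasizes precision
    (function and parameter names are hypothetical, chosen for illustration)
    """
    # P = tp / (tp + fp); defined as 0.0 here when nothing was retrieved
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # R = tp / (tp + fn); defined as 0.0 here when there are no relevant documents
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # F = (beta^2 + 1) * P * R / (beta^2 * P + R), the weighted harmonic mean
    b2 = beta * beta
    denom = b2 * precision + recall
    f = (b2 + 1) * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Made-up counts: 20 relevant retrieved, 40 non-relevant retrieved,
    # 60 relevant documents missed.
    p, r, f1 = precision_recall_f(tp=20, fp=40, fn=60)
    print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
    # With beta = 2 the score moves toward recall.
    _, _, f2 = precision_recall_f(tp=20, fp=40, fn=60, beta=2.0)
    print(f"F2={f2:.4f}")
```

With these made-up counts, P = 1/3 and R = 1/4, and the balanced F measure comes out below their simple average of about 0.29, which is the behaviour the harmonic mean is chosen for.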
