Search Engine Practice: Lecture Slides (搜索引擎实践课件)

Uploaded 2020-09-25. Format: PPT, 51 pages, 953.50 KB.

Search Engines: Information Retrieval in Practice
(All slides © Addison Wesley, 2008)

Evaluation
- Evaluation is key to building effective and efficient search engines
  - measurement is usually carried out in controlled laboratory experiments
  - online testing can also be done
- Effectiveness, efficiency, and cost are related
  - e.g., if we want a particular level of effectiveness and efficiency, this will determine the cost of the system configuration
  - efficiency and cost targets may impact effectiveness

Evaluation Corpus
- Test collections consisting of documents, queries, and relevance judgments

Test Collections
(table of example test collections omitted in this extract)

TREC Topic Example
(example topic omitted in this extract)

Relevance Judgments
- Obtaining relevance judgments is an expensive, time-consuming process
  - who does it? what are the instructions? what is the level of agreement?
- TREC judgments
  - depend on the task being evaluated
  - generally binary
  - agreement good because of the "narrative"

Pooling
- Exhaustive judgments for all documents in a collection are not practical
- The pooling technique is used in TREC:
  - top k results (for TREC, k varied between 50 and 200) from the rankings obtained by different search engines (or retrieval algorithms) are merged into a pool
  - duplicates are removed
  - documents are presented in some random order to the relevance judges
- Produces a large number of relevance judgments for each query, although still incomplete

Query Logs
- Used for both tuning and evaluating search engines
  - also for various techniques such as query suggestion
- Typical contents:
  - user identifier or user session identifier
  - query terms, stored exactly as the user entered them
  - list of URLs of results, their ranks on the result list, and whether they were clicked on
  - timestamp(s) recording the time of user events such as query submission and clicks

Query Logs (cont.)
- Clicks are not relevance judgments
  - although they are correlated
  - biased by a number of factors such as rank on the result list
- Clickthrough data can be used to predict preferences between pairs of documents
  - appropriate for tasks with multiple levels of relevance, focused on user relevance
  - various "policies" are used to generate preferences

Example Click Policy
- Skip Above and Skip Next
  (figure showing click data and the generated preferences omitted in this extract)

Query Logs (cont.)
- Click data can also be aggregated to remove noise
- Click distribution information can be used to identify clicks that have a higher frequency than would be expected
  - high correlation with relevance
  - e.g., using click deviation to filter clicks for preference-generation policies

Filtering Clicks
- Click deviation CD(d, p) for a result d in position p:
  CD(d, p) = O(d, p) − E(p)
  - O(d, p): observed click frequency for a document in rank position p over all instances of a given query
  - E(p): expected click frequency at rank p, averaged across all queries

Effectiveness Measures
- A is the set of relevant documents, B is the set of retrieved documents:
  Recall = |A ∩ B| / |A|
  Precision = |A ∩ B| / |B|

Classification Errors
- False positive (Type I error): a non-relevant document is retrieved
- False negative (Type II error): a relevant document is not retrieved
  - the false negative rate is 1 − Recall
- Precision is used when the probability that a positive result is correct is important

F Measure
- Harmonic mean of recall and precision:
  F = 2RP / (R + P)
  - the harmonic mean emphasizes the importance of small values, whereas the arithmetic mean is affected more by outliers that are unusually large
- More general form:
  Fβ = (β² + 1)RP / (R + β²P)
  - β is a parameter that determines the relative importance of recall and precision

Ranking Effectiveness
(worked ranking example omitted in this extract)

Summarizing a Ranking
- calculating recall and precision at fixed rank positions
- calculating precision at standard recall levels, from 0.0 to 1.0
  - requires interpolation
- averaging the precision values from the rank positions where a relevant document was retrieved

Average Precision
(worked example omitted in this extract)

Averaging Across Queries
(figure omitted in this extract)

Averaging
- Mean Average Precision (MAP)
  - summarizes rankings from multiple queries by averaging average precision
  - the most commonly used measure in research papers
  - assumes the user is interested in finding many relevant documents for each query
  - requires many relevance judgments in the text collection
- Recall-precision graphs are also useful summaries

MAP
(worked example omitted in this extract)

Recall-Precision Graph
(figure omitted in this extract)

Interpolation
- To average graphs, calculate precision at standard recall levels:
  P(R) = max { P' : R' ≥ R, (R', P') ∈ S }
  where S is the set of observed (R, P) points
- Defines precision at any recall level as the maximum precision observed in any recall-precision point at a higher recall level
  - produces a step function
  - defines precision at recall 0.0

Interpolation (cont.)
(figure omitted in this extract)

Average Precision at Standard Recall Levels
- Recall-precision graph plotted by simply joining the average precision points at the standard recall levels

Average Recall-Precision Graph
(figure omitted in this extract)

Graph for 50 Queries
(figure omitted in this extract)

Focusing on Top Documents
- Users tend to look at only the top part of the ranked result list to find relevant documents
- Some search tasks have only one relevant document
  - e.g., navigational search, question answering
- Recall is not appropriate
  - instead, need to measure how well the search engine does at retrieving relevant documents at very high ranks

Focusing on Top Documents (cont.)
- Precision at rank R
  - R is typically 5, 10, or 20
  - easy to compute, average, and understand
  - not sensitive to rank positions less than R
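The pooling steps described above (merge the top k from each system, remove duplicates, randomize presentation order) can be sketched in a few lines. `build_pool` and its arguments are illustrative names, not part of any TREC tooling:

```python
import random

def build_pool(rankings, k=100):
    """Merge the top-k results from several rankings into a judging pool.

    rankings: list of ranked lists of document ids, one per system.
    Using a set removes duplicates; shuffling presents documents to
    the relevance judges in random order.
    """
    pool = set()
    for ranking in rankings:
        pool.update(ranking[:k])   # top k from each system
    pool = list(pool)              # duplicates already removed by the set
    random.shuffle(pool)           # random presentation order for judges
    return pool

# Example: two systems, k = 3; the pool is the union of each top 3
system_a = ["d1", "d2", "d3", "d4"]
system_b = ["d3", "d5", "d6", "d7"]
print(sorted(build_pool([system_a, system_b], k=3)))
# → ['d1', 'd2', 'd3', 'd5', 'd6']
```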
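The Skip Above and Skip Next policy named on the Example Click Policy slide can be sketched as below, under one common interpretation (a clicked document is preferred over every unclicked document ranked above it, and over the next unclicked document below it); the function name and this reading of the policy are assumptions, not taken from the slides:

```python
def click_preferences(ranking, clicked):
    """Generate (preferred, non_preferred) document pairs from clicks.

    ranking: ranked list of document ids; clicked: set of clicked ids.
    """
    prefs = set()
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Skip Above: preferred over unclicked documents ranked higher
        for other in ranking[:i]:
            if other not in clicked:
                prefs.add((doc, other))
        # Skip Next: preferred over the next unclicked document below
        if i + 1 < len(ranking) and ranking[i + 1] not in clicked:
            prefs.add((doc, ranking[i + 1]))
    return prefs

# Clicks on d2 and d4 in the ranking d1..d5
print(sorted(click_preferences(["d1", "d2", "d3", "d4", "d5"], {"d2", "d4"})))
```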
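The click-deviation filter from the Filtering Clicks slide might look like the sketch below, which keeps only clicks whose frequency stands out against the expected frequency at their rank. The dictionary shapes and the threshold value are illustrative assumptions:

```python
def click_deviation(observed, expected, d, p):
    """CD(d, p) = O(d, p) - E(p): how much more often document d is
    clicked at rank p for this query than expected at that rank.

    observed: dict mapping (doc, rank) -> observed click frequency O(d, p)
    expected: dict mapping rank -> expected click frequency E(p)
    """
    return observed.get((d, p), 0.0) - expected[p]

def filter_clicks(clicks, observed, expected, threshold=0.0):
    """Keep clicks with deviation above the threshold before feeding
    them to a preference-generation policy (threshold is illustrative)."""
    return [(d, p) for d, p in clicks
            if click_deviation(observed, expected, d, p) > threshold]

observed = {("d1", 1): 0.7, ("d2", 2): 0.5}
expected = {1: 0.65, 2: 0.2}   # rank-1 results are clicked often regardless
print(filter_clicks([("d1", 1), ("d2", 2)], observed, expected, 0.1))
# d1's click barely exceeds the rank-1 baseline and is filtered out;
# d2's click at rank 2 stands out → [('d2', 2)]
```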
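The set-based definitions on the Effectiveness Measures slide translate directly to code; `precision_recall` is an illustrative name:

```python
def precision_recall(relevant, retrieved):
    """A = relevant set, B = retrieved set, as on the slide.

    Precision = |A ∩ B| / |B|, Recall = |A ∩ B| / |A|.
    """
    a, b = set(relevant), set(retrieved)
    hits = len(a & b)
    return hits / len(b), hits / len(a)

# 3 of 5 retrieved documents are relevant; 6 documents are relevant overall
p, r = precision_recall(relevant={1, 2, 3, 4, 5, 6}, retrieved={1, 2, 3, 7, 8})
print(p, r)   # → 0.6 0.5
```

Here the three relevant documents missed by the system are the false negatives, so the false negative rate is 1 − Recall = 0.5, matching the Classification Errors slide.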
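The two F measure formulas can be checked with a small helper (`f_measure` is an illustrative name); β = 1 recovers the harmonic mean, and the example shows how the harmonic mean is pulled toward the smaller of the two values:

```python
def f_measure(recall, precision, beta=1.0):
    """General F measure: F_beta = (beta^2 + 1)RP / (R + beta^2 * P).

    beta = 1 gives the harmonic mean 2RP / (R + P); beta > 1 weights
    recall more heavily, beta < 1 weights precision more heavily.
    """
    if recall == 0.0 and precision == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * recall * precision / (recall + b2 * precision)

# R = 0.5, P = 1.0: arithmetic mean is 0.75, but the harmonic mean is
# pulled toward the small value: 2 * 0.5 * 1.0 / 1.5 ≈ 0.667
print(f_measure(0.5, 1.0))
```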
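Average precision and MAP, as described on the Averaging slides, can be sketched as follows (function names are illustrative):

```python
def average_precision(ranking, relevant):
    """Average the precision values at the rank positions where a
    relevant document was retrieved; relevant documents that are
    never retrieved contribute zero."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant)

def mean_average_precision(runs):
    """MAP: the mean of average precision over queries.

    runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Relevant docs d1 and d3 found at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(round(average_precision(["d1", "d2", "d3"], {"d1", "d3"}), 3))
# → 0.833
```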
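The interpolation rule P(R) = max{P' : R' ≥ R, (R', P') ∈ S} from the Interpolation slide can be sketched as below; note how it produces a step function and assigns a precision value at recall 0.0 even though no (0.0, P) point was observed:

```python
def interpolated_precision(points, recall_level):
    """Precision at a recall level, interpolated as the maximum
    precision observed at any recall level >= the target.

    points: observed (recall, precision) pairs for one query."""
    candidates = [p for r, p in points if r >= recall_level]
    return max(candidates) if candidates else 0.0

# Observed (recall, precision) points for one query
S = [(0.2, 1.0), (0.4, 0.67), (0.6, 0.5), (0.8, 0.44), (1.0, 0.5)]
levels = [i / 10 for i in range(11)]   # standard recall levels 0.0 .. 1.0
print([interpolated_precision(S, level) for level in levels])
```

Note that the observed precision 0.44 at recall 0.8 is lifted to 0.5 by the later (1.0, 0.5) point, which is exactly what makes the interpolated curve non-increasing.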
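Precision at rank R from the final slide is the simplest of these measures; as the slide notes, it is easy to compute and average but ignores the ordering within the top R:

```python
def precision_at(ranking, relevant, k):
    """Precision at rank k (k typically 5, 10, or 20): the fraction
    of the top k results that are relevant. Swapping results within
    the top k does not change the score."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

# 2 of the top 5 results are relevant
print(precision_at(["d1", "d2", "d3", "d4", "d5"], {"d1", "d4", "d9"}, 5))
# → 0.4
```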
