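personalized search algorithms can significantly improve search quality.

Search engines have become the most widely used tool for helping Internet users retrieve information from the Web. For a given query, the results returned by current web search engines are reasonably satisfactory, but a major shortcoming of today's search engines is that they cannot discern the user's search intent. A search engine typically returns thousands of results, of which only a few meet the user's needs. The main reason is that general-purpose search engines take the user's query keywords as their only input, and keywords alone cannot fully represent the differing query intents of different users. If the system knew the user's personal preferences, it could undoubtedly provide results that better match the user's interests. Each person judges differently whether a query result meets his or her needs; the goal of personalized search is, for the same query, to provide different users with different results that better satisfy their search intent.

Personalized search is one of the popular research directions in the search engine field today. In this thesis, we study how to achieve personalized search jointly by using user feedback to improve search quality, by filtering and re-ranking search results with a user interest model, and by capturing the user's latent query intent through query expansion. User feedback reflects the user's interests explicitly or implicitly. The user interest model, after learning from the documents obtained through user feedback that reflect the user's preferences, is used to re-rank the initially retrieved documents so that the results are personalized. Query expansion means that the system expands the user's query keywords, filters the expansions through the user interest model, and proactively offers the user query keywords that potentially match the query intent.

The research carried out in this thesis mainly includes:
proposing an architecture design for a text-based personalized TV program search system that combines user feedback, a user interest model, and query expansion;
proposing algorithms for building and dynamically updating a multi-interest user interest model;
proposing two methods of expanding user query keywords: using a semantic corpus, and using IDF filtering of search logs based on string similarity;
proposing an efficient variable-length index compression algorithm.

Keywords: personalized search, user interest model, query expansion, user feedback

RESEARCH ON PERSONALIZED TV PROGRAMS SEARCH

ABSTRACT

Watching TV programs is the most popular way of spending free time in China. However, with the rapid development of TV technology, there are now so many kinds of TV programs that it is difficult to choose the programs we like in a short time. Since most TV programs have subtitles, searching TV programs can be converted into searching their subtitles. To help people solve this problem, the Ubiquitous Digital lab of Shanghai Jiao Tong University and Hitachi carried out a joint research project whose aim is to help users find their favorite TV programs rapidly. A small personalized search system prototype for TV programs was developed to demonstrate one of the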
applications of the technique presented in this paper. At the end of this paper, we show that such personalization algorithms can significantly improve search results.

Nowadays search engines are widely used by people all over the world. While current web search engines do a good job of retrieving results that satisfy some people's needs for a given query, they do not do a very good job of discerning an individual's search goals. As a result, users can get thousands of search results, but few of them meet the user's needs. The reason is that the keywords we use to search are not always an appropriate means of locating the information in which a user is interested. If the system knows enough about a user's personal preferences, it goes without saying that it can provide much better search results. Since different people have different interpretations of what is relevant, personalized search tries to show, for the same search, different results that are most likely relevant to different people.

To provide personalized service for users, we combine three methods: user profiles, user feedback, and query expansion. We mainly employ user profiles to
do the personalized search. The system maintains profiles of its users that represent each user's current multiple interests. These interests are learned through positive and negative user feedback. According to these profiles, a set of relevant documents retrieved from the document repository
is recommended to its users. Moreover, our research suggests that query expansion is also of great help to personalized search, because query expansion lets us approximate the user's query representation.

To summarize, my main research work in this thesis is as follows:
Design a personalized search architecture for TV programs that combines user profiles, user feedback, and query expansion.
Design a user profile structure that can represent multiple interest categories for a single user, and develop a dynamic learning algorithm that adapts to the user's changing interests.
Design two schemes to perform query expansion: one through a corpus, and one through IDF analysis of log words similar to the user query.
Design a highly efficient index compression algorithm.

KEY WORDS: personalized, user profile, query expansion, user feedback

Shanghai Jiao Tong University Declaration of Originality of the Thesis

I solemnly declare that the thesis submitted is the result of research I carried out independently under the guidance of my supervisor. Except for
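content already cited and marked as such in the text, this thesis contains no work that has been published or written by any other individual or group. Individuals and groups who made important contributions to the research in this thesis are clearly identified in the text. I am fully aware that I bear the legal consequences of this declaration.

Signature of thesis author: 宋懿
Date: January 18, 2023

Shanghai Jiao Tong University Thesis Copyright Authorization

The author of this thesis fully understands the university's regulations on retaining and using theses, agrees that the university may retain copies and electronic versions of the thesis and submit them to the relevant national departments or institutions, and allows the thesis to be consulted and borrowed. I authorize Shanghai Jiao Tong University to include all or part of this thesis in relevant databases for retrieval, and to preserve and compile it by photocopying, reduced-size printing, scanning, or other means of reproduction.

This thesis is: confidential, in which case this authorization applies after declassification in ___ years; not confidential.
(Please tick the appropriate box above.)

Signature of thesis author: 宋懿
Signature of supervisor: 傅育熙
Date: January 18, 2023
Date: January 18, 2023

Chapter 1 Introduction

1.1 Research Background

Watching TV programs is a common form of entertainment. However, with the rapid development of TV technology, TV programs have become increasingly abundant, and people now find themselves immersed in a sea of TV programs. TV programs contain text subtitles, so searching TV programs can be indirectly converted into searching TV text. This thesis mainly studies text-based personalized TV program search.

Search engines have developed along with the Internet. As the Internet has become an indispensable platform for study, work, and daily life, searching has become an essential skill for Internet users. In the 21st century, with the rapid development of Internet technologies and applications, search engines have become the basic tool for finding needed information in the vast body of Web information resources. According to iResearch (艾瑞)'s published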



