Research on Automatic Question Answering Technology for Library Linked Data


OU Shiyan, TANG Zhengui
School of Information Management, Nanjing University

Abstract: Early question answering over the Semantic Web mainly targeted a single RDF dataset. With the rapid growth of interlinked datasets on the Web, there is an urgent need to extend question answering to multiple RDF datasets, which in turn makes semantic annotation and answer integration more difficult and challenging. This paper proposes a new question answering method over library Linked Data that converts natural language questions into structured SPARQL queries in order to extract specific answers from five interlinked RDF datasets in the library domain. The novelty of the method is that it divides questions into simple questions, which involve one dataset, and complex questions, which involve multiple datasets, and processes the two separately. Simple questions are further divided into two categories, those querying attributes and those querying instances, with a separate set of SPARQL query construction rules for each; complex questions are decomposed into several simple questions, which facilitates SPARQL query construction and answer integration. In an experimental evaluation, answer precision on 100 questions reached 91%, showing that the method is effective and of significance for promoting the application of Linked Data in libraries.

Keywords: question answering; Linked Data; RDF dataset; SPARQL query; semantic annotation; ontology

Author profile: OU Shiyan; ORCID: 0000-0001-8617-6987
Received: 2015-06-24
Funding: One of the research outcomes of the National Social Science Fund of China project "Research on the Construction and Application of an SOA-Based Terminology Registry and Service System" (No. 11BT0023).

A Question Answering Method over Library Linked Data
OU Shiyan, TANG Zhengui

Abstract: Since the advent of Linked Data, more and more structured data have been published on the Web in Linked Data format, including a large amount of bibliographic data, academic information, and controlled vocabularies from libraries and other related institutions. How to access these interlinked RDF data effectively has therefore become a crucial issue. SPARQL provides a standard way to query RDF data; however, it is very difficult for ordinary users to construct SPARQL queries. Question answering, which provides an easy-to-use natural language interface, is undoubtedly an ideal solution. Earlier question answering research on the Semantic Web was oriented to a single RDF dataset. With the growth of interlinked RDF datasets on the Web, there is an urgent need to extend question answering from a single RDF dataset to multiple RDF datasets, which raises additional problems and challenges in semantic annotation and answer integration. This paper proposes a novel question answering method over Library Linked Data, which transforms a natural language question into a structured SPARQL query to retrieve answers from five interlinked RDF datasets in libraries: bibliographic data, thesauri, events, people/organizations, and locations. The question answering procedure includes three main steps:
1) Index construction: extract instance names (i.e., named entities) from the RDF data and the lexical labels of ontology classes and properties from the OWL files, and build two indexes offline (one for named entities and one for ontology terms) using the open-source information retrieval toolkit Lucene;
2) Question preprocessing: perform Chinese word segmentation, named entity recognition, and semantic annotation based on the two indexes; classify questions into simple questions involving a single RDF dataset and complex questions involving multiple RDF datasets, according to the number of ontologies involved and the number of classes and their relationships; and further classify simple questions into type A, which queries attributes, and type B, which queries instance names;
3) Question answering: for a simple question, construct a SPARQL query based on predefined rules; for a complex question, decompose it into several simple sub-questions, process each sub-question with the simple-question method, and then combine the sub-question results to construct a SPARQL query for the whole complex question.
The innovation of this method lies in transforming question answering over multiple RDF datasets into question answering over a single RDF dataset: a complex question is decomposed into several simple questions based on its dependency parse, which facilitates SPARQL query construction and answer integration. Experimental results show that this is an effective question answering method: it greatly simplifies the processing of complex questions and achieves an answer accuracy of 88% on complex questions and 91% over simple and complex questions combined. However, the method can only answer questions whose answers are stated explicitly in the RDF datasets; it cannot answer questions that require reasoning or computation, for example, those containing comparatives and superlatives such as "more" and "the most". Question answering provides a straightforward, easy-to-use way of accessing Linked Data and is a key step toward applying Linked Data in the real world. The research in this paper is therefore of considerable value for promoting the application of Linked Data in libraries. It is one of the early studies of Chinese question answering over Linked Data, and also one of the early studies focusing on Library Linked Data.

0 Introduction

Since Tim Berners-Lee first proposed "Linked Data" in 2006, Linked Data, as a way of publishing structured data on the Web, has attracted great attention from academia and industry, and more and more institutions have begun publishing their own Linked Data sets on the Web. The Linked Open Data Cloud has grown explosively, from 12 RDF datasets in 2007 to nearly 600 today. Libraries own, and continuously generate, large amounts of high-quality structured data, which makes them natural practitioners and providers of Linked Data. Library data has already become an important source of the Linked Data cloud: large volumes of bibliographic data (e.g., LIBRIS, WorldCat), vocabulary data (e.g., LCSH, AGROVOC), and scholarly publication data (e.g., DBLP, CiteSeer) have been published as Linked Data, accounting for about 9.5% of the entire Linked Data cloud. As Linked Data on the Web keeps growing, how to query and access these structured data directly and effectively has become an urgent problem. Linked Data, being based on the RDF data model, must be queried with the SPARQL language, which requires users both to understand the vocabularies and structures used to describe the underlying data and to be able to construct complex SPARQL queries; this is extremely difficult for ordinary users.
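The simple-question path summarized in the abstract (index lookup, semantic annotation, simple/complex classification, then a rule-based SPARQL query for type A or type B) can be illustrated with a small sketch. This is not the authors' implementation: two in-memory dictionaries stand in for the Lucene indexes, tokens are assumed to be already segmented, and the classification heuristic and SPARQL templates are simplified assumptions. All index entries, the `ex:` vocabulary at `http://example.org/`, and the names `annotate`, `classify`, `first`, and `build_sparql` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the paper's two Lucene indexes: one for named
# entities (instance names), one for labels of ontology classes/properties.
# Each entry records the (hypothetical) dataset/ontology it belongs to.
PREFIX = "PREFIX ex: <http://example.org/>\n"
ENTITY_INDEX = {
    "Nanjing University": ("ex:NanjingUniversity", "org"),
    "Linked Data": ("ex:LinkedData", "thesaurus"),
}
TERM_INDEX = {
    "book": ("ex:Book", "class", "bib"),
    "subject": ("ex:subject", "property", "bib"),
    "location": ("ex:location", "property", "org"),
}


@dataclass
class Annotation:
    surface: str  # token as it appears in the question
    uri: str      # matched prefixed name
    kind: str     # "entity", "class" or "property"
    dataset: str  # dataset/ontology the match comes from


def annotate(tokens):
    """Semantic annotation by exact lookup of pre-segmented tokens."""
    found = []
    for tok in tokens:
        if tok in ENTITY_INDEX:
            uri, ds = ENTITY_INDEX[tok]
            found.append(Annotation(tok, uri, "entity", ds))
        elif tok.lower() in TERM_INDEX:
            uri, kind, ds = TERM_INDEX[tok.lower()]
            found.append(Annotation(tok, uri, kind, ds))
    return found


def classify(annotations):
    """Simplified rule: class/property terms from more than one ontology => complex."""
    # Entities are not counted, since instances are often linked across datasets.
    ontologies = {a.dataset for a in annotations if a.kind != "entity"}
    if len(ontologies) > 1:
        return "complex"
    # Assumed heuristic: a class term signals type B (asking for instance
    # names); otherwise type A (asking for an attribute value).
    return "simple-B" if any(a.kind == "class" for a in annotations) else "simple-A"


def first(annotations, kind):
    """Return the first annotation of the given kind or raise ValueError."""
    for a in annotations:
        if a.kind == kind:
            return a
    raise ValueError(f"no {kind} recognized in the question")


def build_sparql(annotations):
    """Fill an illustrative per-type SPARQL template."""
    qtype = classify(annotations)
    if qtype == "complex":
        raise ValueError("complex questions must be decomposed first")
    entity, prop = first(annotations, "entity"), first(annotations, "property")
    if qtype == "simple-A":
        return PREFIX + f"SELECT ?value WHERE {{ {entity.uri} {prop.uri} ?value . }}"
    cls = first(annotations, "class")
    return PREFIX + f"SELECT ?x WHERE {{ ?x a {cls.uri} . ?x {prop.uri} {entity.uri} . }}"


tokens = ["What", "is", "the", "location", "of", "Nanjing University"]
print(build_sparql(annotate(tokens)))
# PREFIX ex: <http://example.org/>
# SELECT ?value WHERE { ex:NanjingUniversity ex:location ?value . }
```

For a type-B question, `annotate(["Which", "book", "has", "subject", "Linked Data"])` produces a query for instances of `ex:Book` whose `ex:subject` is `ex:LinkedData`. A real system would replace the exact dictionary lookup with index queries that also tolerate partial matches.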

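For complex questions, the abstract describes decomposing the question into simple sub-questions using its dependency parse, answering each with the simple-question method, and combining the results into one SPARQL query. The sketch below covers only the combining step and assumes the decomposition has already produced SPARQL triple patterns for each sub-question; joining sub-queries through shared variables is one plausible combination rule, not necessarily the authors'. It also assumes the interlinked datasets are queried through one endpoint and share entity URIs; real datasets may instead require following `owl:sameAs` links. The `SubQuestion` type, the `combine` function, and the `ex:` vocabulary are hypothetical.

```python
import re
from dataclasses import dataclass

VAR = re.compile(r"\?\w+")  # matches SPARQL variables such as ?book


@dataclass
class SubQuestion:
    """A simple sub-question already translated into SPARQL triple patterns."""
    text: str
    patterns: list
    answer_var: str


def combine(sub_questions, prefixes):
    """Merge sub-question patterns into one query joined on shared variables.

    Assumption of this sketch: the last sub-question's answer variable is the
    answer to the whole complex question."""
    if not sub_questions:
        raise ValueError("nothing to combine")
    bound, merged = set(), []
    for i, sq in enumerate(sub_questions):
        used = set(VAR.findall(" ".join(sq.patterns)))
        # A sub-question sharing no variable with earlier ones would turn the
        # combined query into a cross product of unrelated results.
        if i > 0 and not (used & bound):
            raise ValueError(f"sub-question {sq.text!r} is not connected")
        bound |= used
        for p in sq.patterns:
            if p not in merged:  # drop exact duplicate patterns
                merged.append(p)
    header = "".join(f"PREFIX {k}: <{v}>\n" for k, v in prefixes.items())
    body = "".join(f"  {p}\n" for p in merged)
    return f"{header}SELECT DISTINCT {sub_questions[-1].answer_var} WHERE {{\n{body}}}"


# Hypothetical decomposition of "Which books were written by people born in Nanjing?"
sq1 = SubQuestion("Who was born in Nanjing?", ["?person ex:birthPlace ex:Nanjing ."], "?person")
sq2 = SubQuestion("Which books did ?person write?",
                  ["?book a ex:Book .", "?book ex:author ?person ."], "?book")
print(combine([sq1, sq2], {"ex": "http://example.org/"}))
```

The shared variable `?person` carries the answer of the first sub-question (from a people dataset) into the second (from a bibliographic dataset), so the two simple queries become a single query whose answer is bound to `?book`.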