Development Practice of a Genealogy Linked Data Service Platform

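XIA Cuijuan, LIU Wei, CHEN Tao, ZHANG Lei
Systems and Network Center, Shanghai Library; Shanghai Library; Shanghai Information Center for Life Sciences, Chinese Academy of Sciences

Abstract: Digital libraries describe their collections with traditional description standards such as MARC, which focus mainly on the bibliographic features of documents and can hardly meet readers' need to query the knowledge content of documents directly. By building semantic ontologies with explicitly defined relationships, linked data technology supports description, navigation and retrieval based on the knowledge content of documents. By reusing open data and interlinking with external data, it enriches the relationships among data, extends the scenarios in which data can be used, releases the potential of data, and provides an infrastructure for Internet-based data services. This is what knowledge services in future digital libraries should offer. Taking genealogy data as a starting point, Shanghai Library has attempted to use linked open data technology to reorganize traditional library resources and build a historical document data service platform. Through ontology design based on BIBFRAME, data conversion from RDB to RDF, system design based on the four principles of linked data, and system development based on a semantic technology framework, the platform supports bibliographic control for the World Wide Web and provides root-seeking search services for general users and data mining services for professionals.

Keywords: genealogy; data service; linked data; open data

About the author: XIA Cuijuan, ORCID: 0000-0002-1859-6979

Received: 2016-03-14

Funding: One of the research outputs of the National Social Science Fund of China Youth Project "Application of the W3C RDB2RDF Standard Specifications in Building Linked Data Services" (No. 13CTQ008).

A Genealogy Data Service Platform Implemented with Linked Data Technology
XIA Cuijuan, LIU Wei, CHEN Tao, ZHANG Lei

Abstract: Over the past twenty years, the description of digital library resources has followed traditional standards such as MARC. Information such as title, author, publication information and carrier information has been well described. However, this approach makes it difficult to directly meet queries about the knowledge contained in the content. By building relationships among resources, linked data technologies provide a better way to organize, describe, navigate and retrieve knowledge. By reusing and connecting with open data, they help enrich the relationships among data, expand the scenarios in which data is used, release the potential of the data, and build an architecture for data services on the Web. Shanghai Library is trying to use linked data technologies to reorganize traditional library resources in order to meet the requirements of data sharing, reuse and bibliographic control in the Internet environment, and at the same time to build a historical data service platform that can meet the differentiated service needs of users.

First, we designed an ontology based on the Bibliographic Framework (BIBFRAME). Second, we extracted surname, person, place, time, event and other entities from the metadata records according to the ontology. Third, we cleaned the data by merging, disambiguation and standardization, and supplemented information for some important properties (e.g., the origins of surnames and GIS information for places). Then we assigned an HTTP URI to each entity and described the entities based on the RDF abstract data model. Using RDB2RDF data conversion tools that support the W3C R2RML standard, together with the data processing tool OpenRefine, we transformed the data from RDB to RDF and loaded the RDF data into the Virtuoso RDF store. Finally, we designed the system based on the four principles of linked data and developed it with semantic technologies such as Jena and SPARQL, together with data visualization tools.

As a result, the system supports bibliographic control in the Internet environment: users can find the holding locations of genealogy documents across nearly 600 organizations around the world. Open access to all RDF data for machines is based on simple technologies such as content negotiation and a RESTful API. There are easy-to-use search services for those who just want to learn the stories of a surname and family, and advanced search services for those who want professional data mining and knowledge discovery. Most importantly, the platform allows authenticated users to contribute content by submitting comments and suggestions, or by modifying data directly. After other experts confirm them, the modifications are published openly. All comments and modifications are recorded automatically.

Linked genealogy data is the first project in the library field in China to provide open data services based on linked open data technologies. It is innovative in its implementation methodology, its development process and its use of technological tools. But it is just a starting point for Shanghai Library. Much work remains on the authority data, which is still insufficient. More external data sets, such as GeoNames, DBpedia and VIAF, need to be mashed up with the local data. Finally, some problems remain unresolved, such as authority control of geographical names from a historical perspective.

Keywords: Genealogy; Data service; Linked data; Open data

0 Introduction

Open data is a new trend in the development of the Internet, and a worldwide consensus is gradually forming that data is an extremely important resource. In the open data tide, governments and public institutions hold the most public data and are the vanguard of the open data movement [1]. In 2009, Data.gov was officially launched in the United States, sounding the call for the open data movement; Australia's Data.gov.au and the United Kingdom's Data.gov.uk followed closely. By November 2010, the European Commission had first proposed the "EU Open Data Strategy", pushing the open data movement to a climax [2]. Media organizations such as The New York Times and the BBC have successively put open data into practice, and the library community is also an active supporter of the movement: the national libraries of Sweden, the United States, Hungary, the United Kingdom, Germany, Spain, South Korea and Japan, as well as OCLC, have successively published their bibliographic or authority data as linked data, and the Library of Congress has also taken the lead in converting bibliographic data format standards to linked data.

Shanghai Library has long followed the open data movement closely. It began tracking, studying and experimenting with the relevant technologies early on, and regards them as a new opportunity to bring digital libraries into the next-generation Internet characterized by data technologies. Shanghai Library holds a large body of historical documents, such as ancient books, genealogies, letters, modern documents, Republican-era documents, archives, photographs, notes, manuscripts and tabloids. Digitization from paper documents to electronic files has been under way all along, and basic document retrieval services have been provided throughout. To better meet readers' needs, however, the knowledge content these documents contain must be described and served on the Internet with new technologies, so that more people, while using the system, can take part in its optimization, iteration and content building. This would turn the system into a platform and make it an essential place for readers engaged in related study, exchange and research. Chinese genealogies are the most important of Shanghai Library's distinctive …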
