Crowdsourcing tasks in Linked Data management

Elena Simperl¹, Barry Norton², and Denny Vrandecic³
¹,³ Institute AIFB, Karlsruhe Institute of Technology, Germany
² Ontotext AD, Bulgaria
¹ elena.simperl@kit.edu, ³ denny.vrandecic@kit.edu

Abstract. Many aspects of Linked Data management including exposing legacy
data and applications to semantic formats, designing vocabularies to describe RDF data, identifying links between entities, query processing, and data curation are necessarily tackled through the combination of human effort with algorithmic techniques. In the literature on traditional data management, the theoretical and technical groundwork for realizing and managing such combinations is being established. In this paper we build upon and extend these ideas to propose a framework in which human and computational intelligence can co-exist, by augmenting existing Linked Data and Linked Service technology with crowdsourcing functionality. Starting from a motivational scenario, we introduce a set of generic tasks that may feasibly be approached using crowdsourcing platforms such as Amazon's Mechanical Turk (MTurk), explain how these tasks can be decomposed and translated into MTurk projects, and roadmap the extensions to SPARQL, D2RQ/R2R and Linked Data browsing that are required to achieve this vision.

1 Introduction

One of the basic design principles of Linked Data is that its use in applications should be amenable to a high level of automation. Standardized interfaces should allow applications to load data directly from the Web, resolve descriptions of unknown resources, and automatically integrate data sets published by different parties according to various vocabularies. However, actual experience in developing applications that consume Linked Data soon reveals that for many components of a Linked Data application this is hardly the case, and that many aspects of Linked Data management remain, for principled or technical reasons, heavily reliant on human intervention. These include exposing legacy data and applications to semantic formats, designing vocabularies to describe RDF data, identifying links between entities, vocabulary mapping, query processing over distributed data sets, and data curation, to name only a few of the most prominent examples. In all of these areas, human abilities are indispensable for resolving those tasks that are acknowledged to be hard to approach in a systematic, engineering-driven fashion; and also, though to a lesser extent, for those tasks that have been subject to a wide array of techniques attempting to perform them automatically, but that still require human input to produce training data and validate results.

In previous work
of ours we have discussed at length the importance of combining human and computational intelligence to handle such inherently human-driven tasks, which, abstracting from their technical flavor in the context of Linked Data, tend to be highly contextual and often knowledge-intensive, and are thus challenging to automate fully through algorithmic approaches [2, 19]. Instead of aiming at such fully automated solutions, which often do not reach the level of quality required to create useful results and applications,¹ we propose a framework in which human computation becomes an integral part of existing Linked Data and Linked Service technology, as crowdsourcing functionality exposed via platforms such as Amazon's Mechanical Turk.² We argue that the types of tasks that are crucial to running a Linked Data application can largely be decomposed in a uniform way, and that a formal, declarative description of
the domain, scope and purpose of the application can form the basis for the automatic design and seamless operation of crowdsourcing features that overcome the limitations of, and complement, computational methods and techniques. As a next step, we explain how these tasks can be decomposed and translated into MTurk projects, and roadmap the extensions to SPARQL, D2RQ/R2R and Linked Data browsing that are required to turn access to human intelligence in the context of specific applications into a commodity.

2 Human intelligence tasks in Linked Data management

Two of the primary advantages claimed for exposing data sets as Linked Data are improvements in, and uniformity of, data discovery and data integration at Web scale. In the former case, a follow-your-nose approach is enabled, wherein links between data sets facilitate browsing through the Web of Data. On the technical
level, previously undiscovered data is aggregated and, by virtue of RDF's uniform data model, enriches the semantics of known resources (ad hoc integration). True integration across this Web of Data, however, is hampered by the "publish first, refine later" philosophy encouraged by the Linking Open Data movement. While this has resulted in an impressive amount of Linked Data online, the quality of the actual data and of the links connecting data sets is som
