Crowdsourcing for Social Multimedia at MediaEval 2013: Challenges, Data set, and Evaluation

Babak Loni¹, Martha Larson¹, Alessandro Bozzon¹, Luke Gottlieb²
¹ Delft University of Technology, Netherlands
² International Computer Science Institute, Berkeley, CA, USA
{b.loni, m.a.larson, a.bozzon}@tudelft.nl, luke@icsi.berkeley.edu

Copyright is held by the author/owner(s). MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain.

ABSTRACT
This paper provides an overview of the Crowdsourcing for Multimedia Task at the MediaEval 2013 multimedia benchmarking initiative. The main goal of this task is to assess the potential of hybrid human/conventional computation techniques to generate accurate labels for social multimedia content. The task data are fashion-related images, collected from the Web-based photo-sharing platform Flickr. Each image is accompanied by a) its metadata (e.g., title, description, and tags), and b) a set of basic human labels collected from human annotators using a microtask with a basic quality control mechanism that was run on the Amazon Mechanical Turk crowdsourcing platform. The labels reflect whether or not the image depicts fashion, and whether or not the image matches its category (i.e., the fashion-related query that returned the image from Flickr). The basic human labels were collected such that their noise levels would be characteristic of data gathered from crowdsourcing workers without using highly sophisticated quality control. The task asks participants to predict high-quality labels, either by aggregating the basic human labels or by combining them with the context (i.e., the metadata) and/or the content (i.e., visual features) of the image.

1. INTRODUCTION
Creating accurate labels for multimedia content is conventionally a tedious, time-consuming, and potentially high-cost process. Recently, however, commercial crowdsourcing platforms such as Amazon Mechanical Turk (AMT) have opened up new possibilities for collecting labels that describe multimedia from human annotators. The challenge of effectively exploiting such platforms lies in deriving one reliable label from multiple noisy annotations contributed by the crowdsourcing workers.

The annotations may be noisy because workers are unserious, because the task is difficult, or because of natural variation in the judgments of the worker population. The creation of a single accurate label from noisy annotations is far from being a trivial task. Simple aggregation algorithms like majority voting can, to some extent, filter noisy annotations [3]. These require several annotations per object to create acceptable quality, incurring relatively high costs. Ipeirotis et al. [1] developed a quality management method which assigns a scalar value to each worker that reflects the quality of the worker's answers. This score can be used as a weight for a single label, allowing more accurate estimation of the final aggregated label.
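To make these two aggregation strategies concrete, here is a minimal Python sketch of plain majority voting and a quality-weighted variant. It is an illustration, not the method of [1]: the per-worker quality scores are assumed to be supplied from elsewhere (e.g., estimated against gold questions), and all identifiers are hypothetical.

```python
def majority_vote(annotations):
    """Aggregate noisy binary annotations [(worker_id, label), ...]
    by simple majority; ties fall back to the positive label."""
    votes = sum(1 if label else -1 for _, label in annotations)
    return votes >= 0

def weighted_vote(annotations, worker_quality):
    """Weight each worker's vote by a scalar quality score in [0, 1],
    assumed to be estimated elsewhere (e.g., from gold questions)."""
    score = 0.0
    for worker, label in annotations:
        w = worker_quality.get(worker, 0.5)  # neutral weight for unknown workers
        score += w if label else -w
    return score >= 0

# Toy example: three workers judge whether an image depicts fashion.
annotations = [("w1", True), ("w2", False), ("w3", True)]
quality = {"w1": 0.9, "w2": 0.4, "w3": 0.7}
print(majority_vote(annotations))           # True (2 of 3 positive)
print(weighted_vote(annotations, quality))  # True (0.9 - 0.4 + 0.7 > 0)
```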

Hybrid human/conventional computing approaches combine human-contributed annotations with automatically generated annotations in order to achieve a better overall result. Although the Crowdsourcing Task does allow for investigation of techniques that rely only on information from human labels, its main goal is to investigate the potential of intelligently combining human effort with conventional computation. In the following sections we present an overview of the task, and describe the dataset, ground truth, and evaluation method it uses.
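As a rough sketch of such a hybrid scheme (an assumption for illustration, not any particular participant's approach), the aggregated human vote can be treated as one probability estimate and blended with the posterior of a conventional classifier trained on image content or metadata; the mixing weight `alpha` below is a hypothetical tuning parameter.

```python
def human_label_confidence(annotations):
    """Fraction of positive votes among the basic human labels,
    used as a crude probability that the true label is positive."""
    if not annotations:
        return 0.5
    return sum(1 for _, label in annotations if label) / len(annotations)

def hybrid_predict(annotations, content_prob, alpha=0.6):
    """Blend the human-vote estimate with the posterior of a conventional
    classifier (e.g., one trained on visual features or metadata).
    alpha is a hypothetical mixing weight, tuned on held-out data."""
    p = alpha * human_label_confidence(annotations) + (1 - alpha) * content_prob
    return p >= 0.5

# Toy usage: two of three workers say "fashion"; the classifier is unsure.
votes = [("w1", True), ("w2", True), ("w3", False)]
print(hybrid_predict(votes, content_prob=0.4))  # True: 0.6*0.667 + 0.4*0.4 = 0.56
```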

2. TASK OVERVIEW
The task requires participants to predict labels for a set of fashion-related images retrieved from the Web photo-sharing platform Flickr. Each image belongs to a given fashion category (e.g., dress, trousers, tuxedo). The name of the fashion category of the image is the fashion-related query that was used to retrieve the image from Flickr at the time that the data set was collected. The process is described in further detail below. For each image listed in the test set, participants predict two binary labels. Label1 indicates whether or not the image is fashion-related, and Label2 indicates whether or not the fashion category of the image correctly characterizes its depicted content.

Three sources of information can be exploited to infer the correct label of an image: a) a set of basic human labels, which are annotations collected from crowdworkers using an AMT microtask with a basic quality control mechanism; b) the metadata of the image (such as title, description, comments, geo-tags, notes and context); c) the visual content of the image. Participants in the task were encouraged to use visual content analysis methods to infer useful information from the image. They were also allowed to collect labels by designing their own microtask (including the quality control mechanism) and running it on a crowdsourcing platform.
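As a small example of exploiting source b), one weak context-based signal for Label2 is whether the fashion category name occurs in the image's textual metadata. The sketch below is a hypothetical baseline under assumed metadata field names; it is not part of the task definition.

```python
def metadata_matches_category(category, metadata):
    """Weak signal for Label2: does the fashion category (the Flickr
    query term, e.g., "tuxedo") occur in the image's textual metadata?
    The metadata field names used here are hypothetical."""
    text = " ".join([
        metadata.get("title", ""),
        metadata.get("description", ""),
        " ".join(metadata.get("tags", [])),
    ]).lower()
    return category.lower() in text

# Toy usage with assumed fields.
meta = {"title": "My new tuxedo", "tags": ["fashion", "suit"]}
print(metadata_matches_category("tuxedo", meta))  # True
```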

3. TASK DATASET
The dataset for the MediaEval 2013 Crowdsourcing Task consists of two collections of images. Both collections …