Ranking annotators for crowdsourced labeling tasks

Vikas C. Raykar
Siemens Healthcare, Malvern, PA, USA

Shipeng Yu
Siemens Healthcare, Malvern, PA, USA

Abstract

With the advent of crowdsourcing services it has become quite cheap and reasonably effective to get a dataset labeled by multiple annotators in a short amount of time. Various methods have been proposed to estimate the consensus labels by correcting for the bias of annotators with different kinds of expertise. Often we have low quality annotators or spammers: annotators who assign labels randomly (e.g., without actually looking at the instance). Spammers can make the cost of acquiring labels very expensive and can potentially degrade the quality of the consensus labels. In this paper we formalize the notion of a spammer and define a score which can be used to rank the annotators, with the spammers having a score close to zero and the good annotators having a high score close to one.

1 Spammers in crowdsourced labeling tasks

Annotating an unlabeled dataset is one of the bottlenecks in using supervised learning to build good predictive models. Getting a dataset labeled by experts can be expensive and time consuming. With the advent of crowdsourcing services (Amazon's Mechanical Turk being a prime example) it has become quite easy and inexpensive to acquire labels from a large number of annotators in a short amount of time (see [8, 10, 11] for some computer vision and natural language processing case studies). One drawback of most crowdsourcing services is that we do not have tight control over the quality of the annotators. The annotators can come from a diverse pool including genuine experts, novices, biased annotators, malicious annotators, and spammers. Hence, in order to get good quality labels, requestors typically get each instance labeled by multiple annotators, and these multiple annotations are then consolidated either using simple majority voting or more sophisticated methods that model and correct for the annotator biases [3, 9, 6, 7, 14] and/or task complexity [2, 13, 12]. In this paper we are interested in ranking annotators based on how spammer-like each annotator is.
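The simplest consolidation strategy mentioned above, majority voting, can be sketched as follows. This is a minimal illustration, not code from the paper; the function name, data layout, and the tie-breaking rule are our own choices.

```python
from collections import Counter

def majority_vote(annotations):
    """Consensus label per instance from multiple annotators.

    annotations: dict mapping instance id -> list of binary labels,
    one entry per annotator who labeled that instance.
    Ties are broken in favor of the negative label; this tie-breaking
    rule is an arbitrary choice made for the illustration.
    """
    consensus = {}
    for instance_id, labels in annotations.items():
        counts = Counter(labels)
        consensus[instance_id] = 1 if counts[1] > counts[0] else 0
    return consensus

votes = {"a": [1, 1, 0], "b": [0, 0, 1], "c": [1, 0]}
print(majority_vote(votes))  # {'a': 1, 'b': 0, 'c': 0}
```

Majority voting implicitly treats all annotators as equally reliable, which is exactly the assumption the bias-correcting methods cited above relax.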
In our context a spammer is a low quality annotator who assigns random labels (maybe because the annotator does not understand the labeling criteria, does not look at the instances when labeling, or maybe is a bot pretending to be a human annotator). Spammers can significantly increase the cost of acquiring annotations (since they need to be paid) and at the same time decrease the accuracy of the final consensus labels. A mechanism to detect and eliminate spammers is a desirable feature for any crowdsourcing marketplace. For example, one can give monetary bonuses to good annotators and deny payments to spammers.

The main contribution of this paper is to formalize the notion of a spammer for binary, categorical, and ordinal labeling tasks. More specifically, we define a scalar metric which can be used to rank the annotators, with the spammers having a score close to zero and the good annotators having a score close to one (see Figure 4). We summarize the multiple parameters corresponding to each annotator into a single score indicative of how spammer-like the annotator is. While this spammer score was implicit for binary labels in earlier works [3, 9, 2, 6],
the extension to categorical and ordinal labels is novel and is quite different from the accuracy computed from the confusion rate matrix. An attempt to quantify the quality of the workers based on the confusion matrix was recently made by [4], where they transformed the observed labels into posterior soft labels based on the estimated confusion matrix. While we obtain somewhat similar annotator rankings, we differ from this work in that our score is directly defined in terms of the annotator parameters (see §5 for more details).

The rest of the paper is organized as follows. For ease of exposition we start with binary labels (§2) and later extend it to categorical (§3) and ordinal labels (§4). We first specify the annotator model used, formalize the notion of a spammer, and propose an appropriate score in terms of the annotator model parameters. We do not dwell too much on the estimation of the annotator model parameters. These parameters can either be estimated directly using a known gold standard [1] or via iterative algorithms that estimate the annotator model parameters without actually knowing the gold standard [3, 9, 2, 6, 7]. In the experimental section (§6) we obtain rankings for the annotators using the proposed spammer scores on some publicly available data from different domains.
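When a gold standard is available, the binary-case annotator parameters used in the next section (sensitivity and specificity) reduce to simple empirical rates. A minimal sketch of that direct estimation; the function and variable names are ours, not the paper's:

```python
def estimate_sensitivity_specificity(y_annotator, y_true):
    """Empirical sensitivity/specificity of one annotator against
    gold-standard labels (binary case).

    y_annotator, y_true: equal-length sequences of 0/1 labels.
    Assumes at least one positive and one negative gold label.
    """
    pos = [a for a, t in zip(y_annotator, y_true) if t == 1]
    neg = [a for a, t in zip(y_annotator, y_true) if t == 0]
    alpha = sum(pos) / len(pos)              # Pr[label 1 | true 1]
    beta = (len(neg) - sum(neg)) / len(neg)  # Pr[label 0 | true 0]
    return alpha, beta

# an annotator who misses one of four positives
alpha, beta = estimate_sensitivity_specificity(
    [1, 1, 0, 0, 1, 0], [1, 1, 1, 0, 1, 0])
print(alpha, beta)  # 0.75 1.0
```

Without a gold standard, the iterative (EM-style) algorithms cited above alternate between estimating consensus labels and re-estimating these same per-annotator rates.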
2 Spammer score for crowdsourced binary labels

Annotator model. Let y_i^j ∈ {0, 1} be the label assigned to the ith instance by the jth annotator, and let y_i ∈ {0, 1} be the actual (unobserved) binary label. We model the accuracy of the annotator separately on the positive and the negative examples. If the true label is one, the sensitivity (true positive rate) of the jth annotator is the probability that the annotator also labels it as one, i.e., α^j = Pr[y_i^j = 1 | y_i = 1].
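Under this model, an annotator who assigns labels independently of the truth satisfies Pr[y_i^j = 1 | y_i = 1] = Pr[y_i^j = 1 | y_i = 0], i.e., sensitivity α and specificity β are tied by α = 1 − β. One scalar with the desiderata stated in the introduction (zero for such spammers, one for perfect annotators) is |α + β − 1|; we sketch it here as an illustration of the idea, not as the paper's exact definition:

```python
def spammer_score(alpha, beta):
    """Illustrative spammer score for a binary-task annotator.

    alpha: sensitivity Pr[label 1 | true 1]
    beta:  specificity Pr[label 0 | true 0]
    A spammer satisfies alpha = 1 - beta (labels independent of the
    truth), giving a score of 0; a perfect annotator (alpha = beta = 1)
    scores 1. |alpha + beta - 1| is one metric with these properties,
    used here as an assumed/sketched form.
    """
    return abs(alpha + beta - 1.0)

print(spammer_score(1.0, 1.0))  # 1.0  perfect annotator
print(spammer_score(0.5, 0.5))  # 0.0  spammer: alpha = 1 - beta
print(spammer_score(0.0, 0.0))  # 1.0  adversarial annotator who always flips
```

Note the last case: an annotator who always flips the label also scores one, since its labels carry full information once inverted.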