Optimizing Task Assignment for Crowdsourcing

资源描述

《Optimizing Task Assignment for Crowdsourcing》由会员分享，可在线阅读，更多相关《Optimizing Task Assignment for Crowdsourcing（14页珍藏版）》请在金锄头文库上搜索。

1、Optimizing Task Assignment for CrowdsourcingEnvironmentsLuyi Moy, Reynold Chengy, Xuan S. Yangy, Chenghui Reny,Siyu Leiy, Eric Loz, Ben Kaoy, David W. Cheungyy University of Hong Kong, Pokfulam Road, Hong Kongz Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kongyflymo, ckcheng, xyang2, ch

2、ren, sylei, kao, dcheunggcs.hku.hk z ericlocomp.polyu.edu.hkABSTRACTIn a crowdsourcing system, Human Intelligence Tasks (HIT-s) (e.g., sentence translation, photo matching, providingkeywords for videos) can be conveniently speci ed. TheseHITs are available to a large pool of workers, who are paidupo

3、n completing their selected HITs. Since these workersmay have di erent capabilities, some di cult HITs may notbe satisfactorily performed. If more workers are employed toperform these HITs, their result quality could be statisticallyimproved. In this paper, we address the important problemof decidin

4、g the number of workers (or plurality) for a givenset of HITs. We propose a theoretically optimal algorithmgiven a xed budget (i.e., the amount of money paid forperforming HITs). This solution determines the best plural-ity of each HIT. We also observe that for a HIT that involvesmultiple-choice que

5、stions, its quality increases monotonical-ly with the number of workers, albeit in a decreasing rate.This leads us to develop an e cient greedy algorithm withprovably close-to-optimal e ectiveness. We further proposethe early-stop strategy, which avoids the budget to be fullyexhausted, yet still ach

6、ieves a high quality with statisticalguarantees. Extensive experiments on synthetic and realdatasets are performed to validate our approaches.1. INTRODUCTIONRecently, there has been a rise of interest on crowdsourc-ing systems, such as the Amazon Mechanical Turk (AMT) 1and CrowdFlower 2. These Inter

7、net-based systems harnesshuman e ort to solve problems that are easy for humanusers but di cult for computers 3, 68. Typically, a re-quester announces a set of Human Intelligent Tasks (or HIT-s). Through a user interface, a worker selects her preferredHIT(s) and submits her results to the system. If

8、 the re-quester is satis ed about the results, the worker is duly re-warded. Example HITs include question answering 3,8,15,1https:/2http:/entity resolution 6, 17, 18, sorting and join 7, 9, data l-tering 14, and image tagging 19.For these HIT results to be useful, the workers involvedneed to perfor

9、m well. In practice, workers could be casualInternet users and their results are hardly perfect 3, 68;they may make careless mistakes, or misinterpret the HITrequirements. Moreover, a HIT result may only describesome of the many aspects of the true answer. Consider a HITthat inquires: What is the UR

10、L http:/ (Google Earth homepage) about? A worker may givean answer map, navigation, another worker may say thatit is about scenery, photos, while a third one may give ananswer weather. Thus, a single result may not capture allaspects of an answer.To combat the above problems, a requester is suggeste

11、dto assign a su cient number of workers to a HIT 2,3,68.In AMT, for instance, a requester is asked to specify theplurality of a HIT i.e., the number of workers required toperform that HIT. As advised by AMT 2, the value ofplurality should be big enough. This allows a requester toamass a su cient amo

12、unt of information from workers, andmake a statistically correct decision 3,68. In the previousexample, if 95 out of 100 workers give the answer map,navigation, then there is a high chance that the answeris correct. In practice, plurality is bounded, because: 1)a HIT is associated with a cost for pa

13、ying the worker andthe crowdsourcing system; 2) a requester may only have alimited budget for rewarding workers; and 3) a requesterhas to spend some time to verify the HIT results. Thesefactors should be taken account during the assignment ofHIT plurality values. Due to the increase in popularity in

14、crowdsourcing systems in recent years, the plurality-settingproblem has attracted plenty of attention (e.g., 3,7,8).Our goal is to develop algorithms for deciding HIT plural-ity. For requesters who have to manage a large number ofHITs, these solutions could be very useful. We found that on28 October

15、 2012, requesters submitting the 10 highest num-ber of HITs to AMT own an average of 9,000 HITs; the 30requesters with the largest number of HITs contribute over3,600 HITs. Hence, it is not uncommon for a requester to begiven the task of handling thousands of HITs. Our goal isto design solutions to

16、alleviate their burden of setting HITplurality values.Speci cally, given a budget provided by a requester, we s-tudy the problem of e ectively assigning plurality values to aset of HITs. As we will discuss, a good plurality assignmentsolution should consider factors like the capability of work-ers, the amount of reward, and the nature of the HIT. We1tackle this intricate problem for an important type

展开阅读全文