

Modeling Participation Behaviors in Design Crowdsourcing Using A Bipartite Network-Based Approach

Zhenghui Sha∗
Assistant Professor
Department of Mechanical Engineering
University of Arkansas
Fayetteville, Arkansas 72701

Ashish M. Chaudhari
Graduate Research Assistant
School of Mechanical Engineering
Purdue University
West Lafayette, Indiana 47907

Jitesh H. Panchal
Associate Professor
School of Mechanical Engineering
Purdue University
West Lafayette, Indiana 47907

Abstract This paper analyzes participation behaviors in design crowdsourcing by modeling interactions between participants and design contests as a bipartite network. Such a network consists of two types of nodes, participant nodes and design contest nodes, and links that indicate participation decisions. Exponential random graph models (ERGMs) are used to test the interdependence between participants’ decisions. ERGMs enable the use of different network configurations (e.g., stars and triangles) to characterize different forms of dependencies and to identify the factors that influence link formation. A case study of an online design crowdsourcing platform is carried out. Our results indicate that designer, contest, and incentive factors, as well as factors of dependent relations, have significant effects on participation in online contests. The results reveal some unique features of the effects of incentives, e.g., the fraction of the total prize allocated to the first prize negatively influences participation. Further, we observe that contest popularity, modeled by the alternating k-star network statistic, has a significant influence on participation, whereas associations between participants, modeled by the alternating 2-path network statistic, do not. These insights are useful to system designers for initiating effective crowdsourcing mechanisms to support product design and development. The approach is validated by applying the estimated ERGMs to predict participants’ decisions and comparing the predictions with their actual decisions.

∗Corresponding author: [email protected]

1 INTRODUCTION

In design crowdsourcing, a contest sponsor specifies a design problem to a crowd of engineers or designers. Participants decide how much effort and resources to expend working on solutions, and submit their solutions to the contest sponsor. The contest sponsor evaluates all solutions and picks the best ones. The contest sponsor awards monetary prizes or gifts of comparable value to the participants with winning solutions. Such an execution of design crowdsourcing is called a crowdsourcing contest.

For the effective use of crowdsourcing contests in engineering design applications, it is necessary to understand how the characteristics of the design problem and payment structure influence participants’ decisions. The existing game-theoretical studies of crowdsourcing contests predict that contest-related factors such as the number of prizes and entry fee may increase or decrease participant effort and the quality of solutions depending upon participants’ risk preferences and homogeneity in their problem-specific knowledge [1–4]. The results of our prior studies on design crowdsourcing contests indicate that participants’ decisions depend not only on contest-related factors, such as the number of prizes, payment structure, and prize amounts, but also on problem-related factors, such as design problem, problem description, cost of resources and evaluation criteria [5–8].

Although existing studies are useful in predicting the effects of various factors on contest outcomes, there is a lack of knowledge about how inherent relations between participants or between contests influence participation decisions. On online crowdsourcing platforms such as GrabCAD [9] and Amazon Mechanical Turk [10], multiple crowdsourcing contests are open for participation at the same time, and a participant also makes decisions about which contests to participate in. In such cases, the external connections between participants and the stated characteristics of contests are likely to generate dependencies. If two participants know each other through a common workplace, their individual participation decisions may be dependent (peer effect). If two participants are aware of each other’s submissions in past contests, their decisions to participate in the same contest in the future may be related (association effect). A participant may decide to participate in a contest because many other participants are participating in the same contest (popularity effect). Two contests with the same sponsor, similar design problems, or similar payment structure may attract the same set of participants (homophily effect).

The primary objective of this study is to understand whether and to what extent the aforementioned participant and contest dependencies influence participation in design crowdsourcing contests. Understanding the role of dependencies in participation will help in identifying the most critical factors that drive participation decisions, establishing better predictive models of participation, and creating design guidelines for future contests. We also evaluate the effects of problem- and contest-related factors on participation with and without incorporating dependencies. Although there are various ways to evaluate contest outcomes [11], we focus on participation because the diversity and the quality of solutions depend on the number of people participating in a contest. For instance, a larger number of participants implies a greater number of solutions and a potential for greater diversity. For design problems where specific designs need to be improved, a greater number of participants means a higher chance of achieving a better solution.

In this study, we propose a bipartite network-based approach to modeling participation decisions in crowdsourcing contests between multiple participants and multiple contests. Each link in a network represents an instance of participation between a participant and a contest. For a scenario with a single contest and multiple participants, a star network is a natural representation, where the focal node denotes a contest while the connected nodes denote its participants [12]. On the other hand, a bipartite network is more suitable for a design crowdsourcing scenario with multiple contests and participants. A bipartite network representation divides nodes into two groups, called modes. Each node is the focal node of a star network connecting it to a set of nodes in the other mode (see Figure 1). The bipartite network representation assumes that links exist only between nodes from different modes, so that a connection between two participants or two contests is irrelevant when representing participation decisions.
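The bipartite representation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_bipartite`, the identifiers `p1`, `c1`, and so on, and the toy link list are assumptions introduced here, not part of the paper's implementation or data.

```python
from collections import defaultdict


def build_bipartite(participations):
    """Build both star views of a bipartite participation network.

    participations: iterable of (participant, contest) pairs (hypothetical IDs).
    Returns (participant -> set of contests, contest -> set of participants).
    """
    by_participant = defaultdict(set)
    by_contest = defaultdict(set)
    for participant, contest in participations:
        # Sets ensure that a repeated pair is stored as a single link.
        by_participant[participant].add(contest)
        by_contest[contest].add(participant)
    return dict(by_participant), dict(by_contest)


# Toy data for illustration only; not taken from the GrabCAD dataset.
links = [("p1", "c1"), ("p1", "c2"), ("p2", "c1"), ("p3", "c2"), ("p2", "c1")]
by_participant, by_contest = build_bipartite(links)

# Each link has one endpoint in each mode, so both degree sums are equal.
participant_degree_sum = sum(len(c) for c in by_participant.values())
contest_degree_sum = sum(len(p) for p in by_contest.values())
```

Storing only participant-to-contest and contest-to-participant maps reflects the assumption that links within a mode are not represented.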

With a bipartite network setting, we employ the Exponential Random Graph Model (ERGM) to model participation decisions as the link formation process in a network. The ERGM models the probability of link formation in terms of the statistics of node attributes and the assumed network configurations. The coefficients associated with these statistics represent the effects of problem- and contest-related factors, and the effects of dependencies in the formation process representing participants’ decision-making in a crowdsourcing contest. This approach is demonstrated in a case study on participation decisions in GrabCAD, an online design crowdsourcing platform [9].

Fig. 1. An example of bipartite network representation for multiple contests (c1, c2, ..., cn) and multiple participants (p1, p2, ..., pm). Links joining contest nodes to participant nodes represent the respective participation.

The rest of the paper is structured as follows. Section 2 provides a review of literature on crowdsourcing studies as well as the applications of ERGM in engineering design. Section 3 presents theoretical details of ERGM and the network configurations adopted in this study. Section 4 introduces GrabCAD contests, their participants and the corresponding dataset. Section 5 discusses the implementation of ERGMs on the GrabCAD dataset, and the results of the analysis. Section 6 presents evaluation and validation of the ERGMs estimated from the GrabCAD dataset. Finally, Section 7 summarizes the paper’s findings, limitations and future work.

2 REVIEW OF RELEVANT LITERATURE

In this section, a review of the literature on crowdsourcing studies and ERGM research in engineering design is presented, and the research gaps are highlighted.

2.1 Crowdsourcing studies in engineering design

In the theoretical studies of crowdsourcing contests, game-theoretic models have been developed to model the effects of prize structure on participants’ effort and the resources (cost) they invest. The authors’ previous work summarizes such models and the insights from them [6, 8]. The results of these models depend on the assumptions placed on participant attributes. For example, the models by Szymanski and Valletti [13] and Clark and Riis [14] show that a single prize motivates greater effort than multiple prizes of the same total amount if all participants are homogeneous, i.e., they incur the same cost for the same effort and have risk-neutral preferences. However, if participants have non-homogeneous costs, then, according to Sheremeta [4] and Sisak [3], multiple prizes are better than a single prize of the same amount. Other models have been developed to compare fixed-prize contests and auctions where participants bid on a prize that they want to be paid for their submissions [2, 15, 16].

Due to the current lack of models to account for all the factors influencing participant behaviors in crowdsourcing, many studies employ empirical data. The empirical studies [17–23] investigate the effects of intrinsic motivation, such as enjoyment, curiosity, and developing individual skills, and extrinsic motivation, such as monetary rewards, improving job prospects, and gaining peer recognition, on participation and effort in contests. Apart from the effects of motivation, the empirical studies also evaluate the effects of problem-specific factors such as task granularity, problem complexity, specificity, and solvability on solution quality [24–26], as well as sponsor-specific factors such as reputation, trust in sponsor and sponsor feedback on participation and solution quality [27–29]. Some studies [30–32] also investigate how participants’ expertise, experience, and familiarity with the problem affect contest outcomes. The domains of contest problems in these studies range from graphic design, logo design, CAD design, product design, and software engineering to e-commerce.

Despite a growing interest and various research efforts, the existing crowdsourcing studies analyze participation and effort decisions based on the characteristics of a single contest, or, when analyzing multiple contests, they assume that participation decisions are independent. We address this gap by quantifying dependencies using network structures and explicitly modeling the effects of dependencies on participation decisions in crowdsourcing using a network-based approach.

2.2 ERGM in engineering design

The exponential random graph model (ERGM) has been a popular approach in social science for modeling social network formation [33–35] due to its capability to model complex social relations as network topologies. A rich literature exists on modeling conditional dependence between network links using ERGM specifications such as the dyadic independence assumption [36], the Markov assumption [37], and the social circuit assumptions [38, 39], both in bipartite settings [40] and in multilevel settings [41].

In recent years, there has been a trend towards using ERGMs in engineering design literature for modeling complex sociotechnical systems and understanding customer decision-making behaviors. The extension of ERGMs to engineering design is natural because it enables incorporating social interactions and human factors in the design process to better model system architectures. In addition, the unique strength of ERGMs in modeling cross-level network interconnections [40, 41] has made them a powerful tool for studying hierarchical structures and multilevel relations among different information layers in complex systems. ERGMs have been used in engineering design in different ways. For example, Wang et al. [42] used a multidimensional ERGM to study customer preferences in vehicle markets by considering both social interactions and product associations. Sha and Wang studied products’ co-consideration relations with a unidimensional ERGM [43], and compared its predictive performance with a network-based logistic regression model [44]. Fu et al. [45] adopted the bipartite network setting of ERGM and studied choice behaviors in support of engineering design. These studies have inspired us to further extend the capabilities of ERGM in engineering design, and to evaluate the feasibility of network configurations and the interdependence assumption in analyzing participation behaviors in design crowdsourcing contests.

3 MODELING INTERDEPENDENCE AMONG PARTICIPATION BEHAVIORS

In this section, we first provide an overview of exponential random graph models in Section 3.1. Then, the network configurations used to represent dependencies, as well as their mathematical expressions, are introduced in Section 3.2.

3.1 Overview of exponential random graph models (ERGM)

An ERGM, first introduced by Frank and Strauss [37], interprets the global network structure as a collective result of various local network effects, called network configurations. The key idea is that an ERGM considers an observed network, $\mathbf{y}$, as one specific realization from a set of possible random networks, $\mathbf{Y}$, following the distribution in Equation (1). A network $\mathbf{y}$ can be represented by an adjacency matrix of binary values $y_{ij}$, where 1 indicates the existence of a link between nodes $i$ and $j$, and 0 indicates the absence of the link. The probability that a random matrix $\mathbf{Y}$ equals the adjacency matrix of a particular network $\mathbf{y}$ is:

$$\Pr(\mathbf{Y} = \mathbf{y}) = \frac{\exp\left(\boldsymbol{\theta}' G(\mathbf{y})\right)}{\kappa(\boldsymbol{\theta})}, \tag{1}$$

where $\boldsymbol{\theta}$ is a vector of model parameters, $G(\mathbf{y})$ is a vector of statistics for network configurations and nodal attributes, and $\kappa(\boldsymbol{\theta})$ is a normalizing constant that ensures the probabilities over all possible networks sum to 1. Equation (1) states that the probability of observing any particular network is proportional to the exponential of a weighted combination of network characteristics.
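To make Equation (1) concrete, the following is a minimal sketch that enumerates every possible bipartite network on a tiny node set and normalizes the exponential weights by brute force. The two statistics used here (link count and contest 2-stars), the parameter values, and all function names are assumptions chosen for illustration; the enumeration is only feasible for toy sizes and is not how ERGMs are estimated in practice.

```python
from itertools import product
from math import exp


def edge_count(y):
    """Number of links in the biadjacency matrix y (rows: participants)."""
    return sum(sum(row) for row in y)


def contest_two_stars(y):
    """Number of participant pairs sharing a contest: sum over contests of C(d, 2)."""
    total = 0
    for c in range(len(y[0])):
        d = sum(row[c] for row in y)
        total += d * (d - 1) // 2
    return total


def ergm_distribution(n_participants, n_contests, theta):
    """Return {network: probability} under Equation (1) by full enumeration."""
    weights = {}
    for bits in product((0, 1), repeat=n_participants * n_contests):
        y = tuple(
            tuple(bits[i * n_contests:(i + 1) * n_contests])
            for i in range(n_participants)
        )
        g = (edge_count(y), contest_two_stars(y))  # G(y), illustrative choice
        weights[y] = exp(sum(t * s for t, s in zip(theta, g)))
    kappa = sum(weights.values())  # normalizing constant kappa(theta)
    return {y: w / kappa for y, w in weights.items()}


# Hypothetical parameter values, not estimates from the paper.
dist = ergm_distribution(2, 2, theta=(-0.5, 0.3))
empty = ((0, 0), (0, 0))
full = ((1, 1), (1, 1))
ratio = dist[full] / dist[empty]  # kappa cancels in probability ratios
```

Because the normalizing constant cancels, the ratio of two network probabilities depends only on the difference in their statistics, which is why the coefficients can be read as effects on link formation.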

With such a modeling framework, the objective of a researcher is to formulate an appropriate model structure that represents real-world systems through the identification of network configurations as well as the attributes of nodes and links. In ERGM, the attributes of nodes are represented through nodal covariates [46], whereas network configurations are represented through network statistics. Such a model structure provides a plausible and theoretically principled hypothesis for the network generation process. Once a model structure is determined, the parameter $\boldsymbol{\theta}$ can be estimated from training data using maximum likelihood estimation or Markov chain Monte Carlo simulation [47]. The final model structure is often reached through an iterative process of trying different combinations of network configurations, supported by domain expertise or feature selection algorithms from data mining.

Fig. 2. Two types of network configurations for modeling interdependence of participation decisions.

ERGM is used to describe different kinds of interdependence using various network configurations, and moreover, it can quantitatively assess the importance of such interdependence during the network formation process. ERGM has specifications for one-mode networks, bipartite networks, and multidimensional networks. Among these specifications, the bipartite network setting is appropriate for modeling designer-contest relationships as described in Section 1.

3.2 Representing dependencies in participation decisions with network configurations

To support the modeling of dependencies in a bipartite network, a list of network configurations available in ERGM is provided in [46]. The selection of network configurations depends on the prior knowledge about the application context and the hypotheses of interdependence to be tested. In this study, we focus on testing two types of network configurations, alternating k-star and alternating 2-path, as explained in the rest of this section. We use these configurations because of the capability they offer for modeling participant relations, contest associations, and contest popularity.

a) Alternating k-star configuration (AkS): A k-star is a set of k distinct links that all share one endpoint. In the design crowdsourcing context, the k-star network structure indicates that a link (decision made) between a participant (p1) and a contest (c1) depends on the existence of links that connect p1 to other contests, e.g., c2, c3, ..., ck. Since the network is bipartite, it comprises k-stars of participants (see Figure 2 (a)) as well as k-stars of design contests (see Figure 2 (b)). The contest k-star can be used to model the popularity effect of a design contest. The underlying reason for the formation of a k-star (why a contest is popular) is not necessarily correlated with the contest’s own attributes (e.g., the prize amount) because popularity is a social phenomenon.

The k-star statistic $S_k(\mathbf{y})$, $k = 2, \dots, n-1$, denotes the number of k-stars in the graph $\mathbf{y}$. For example, the k-star statistic for participants quantifies the number of participants who have participated in $2, 3, \dots, n-1$ contests, respectively. Recent work by Snijders et al. [39] considers a more refined model based on the k-star statistics. It is defined as the scalar-valued network statistic using the following equation:

$$g_{AkS}(\mathbf{y}; \lambda_s) = S_2(\mathbf{y}) - \frac{S_3(\mathbf{y})}{\lambda_s} + \dots + (-1)^{n-3} \frac{S_{n-1}(\mathbf{y})}{\lambda_s^{n-3}} \tag{2}$$

where $\lambda_s$ is a positive scalar quantifying the decaying effect of large $k$ on this network statistic, i.e., a larger $k$ has a lower effect on the overall k-star statistic than a smaller $k$. $\lambda_s$ can be treated as a constant defined by researchers or treated as a parameter, similar to $\boldsymbol{\theta}$ in Equation (1), estimable from empirical data. Because of the alternating signs in front of the $S_k(\mathbf{y})$ terms, Snijders et al. [39] call $g_{AkS}(\mathbf{y}; \lambda_s)$ an AkS statistic.
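As a rough illustration of Equation (2), the sketch below computes the AkS statistic from a list of node degrees in one mode. It assumes the standard combinatorial count of k-stars, in which a node of degree d is the center of C(d, k) k-stars, and it truncates the sum at the largest observed degree because all higher-order terms are zero. The function names and the toy degree list are hypothetical.

```python
from math import comb


def k_star_count(degrees, k):
    """S_k: number of k-stars, assuming a node of degree d centers C(d, k) of them."""
    return sum(comb(d, k) for d in degrees)


def alternating_k_star(degrees, lam):
    """AkS statistic of Equation (2): S_2 - S_3/lam + S_4/lam^2 - ..."""
    max_k = max(degrees, default=0)  # S_k = 0 for k above the largest degree
    return sum(
        (-1) ** k * k_star_count(degrees, k) / lam ** (k - 2)
        for k in range(2, max_k + 1)
    )


# Toy contest degrees (participants per contest); illustrative only.
contest_degrees = [3, 2, 1]
aks = alternating_k_star(contest_degrees, lam=2.0)
print(aks)  # S_2 - S_3/2 = 4 - 0.5 = 3.5
```

With a larger decay value, the higher-order stars are damped more strongly, so very popular contests contribute less extra weight per additional participant.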

b) Alternating 2-paths configuration (A2P): The A2P configuration is used to account for the interdependence arising from shared events. In design crowdsourcing, the inclusion of this effect allows us to answer the following question: if two participants have one challenge in common (meaning a shared event), are they more likely than expected by chance to have a second challenge in common, and a third one, and so on? It tests whether the distribution of observed non-random shared events is driven by pairs of participants. A possible reason for the formation of shared events is that two participants may have common interests. It is important to note that such a dependency may be implicit to participants and not explicitly known. Knowing the precise common interests, however, may not be of interest to researchers as long as the effect of that dependency is understood, which can be captured by the A2P network configuration.

A k-two-path is a set of k distinct paths of length two joining the same pair of nodes (see Figures 2 (c) and 2 (d)). A 1-two-path is the same thing as a 2-star [39]. However, a 1-two-path is usually associated with its endpoints whereas a 2-star is usually associated with its center. If the number of k-two-paths with participants as endpoints in the network $\mathbf{y}$ is $P_k(\mathbf{y})$, where $k$ may be any value from 1 to $n-2$, then the A2P statistic is [48],

$$g_{A2P}(\mathbf{y}; \lambda_p) = P_1(\mathbf{y}) - \frac{P_2(\mathbf{y})}{\lambda_p} + \dots + (-1)^{n-3} \frac{P_{n-2}(\mathbf{y})}{\lambda_p^{n-3}} \tag{3}$$

where $\lambda_p$ has an effect similar to that of $\lambda_s$ in Equation (2). This configuration can help quantify the potential effects of the relations among contests (e.g., sub-contest relations) and the relations among participants (e.g., friends) on the participation decisions.
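Equation (3) can be illustrated in the same spirit. The sketch below assumes that, for a pair of participants with m shared contests, the number of k-two-paths joining them is C(m, k), and it sums over unordered participant pairs. The function name, the data layout (participant mapped to a set of contests), and the toy data are assumptions for illustration only.

```python
from itertools import combinations
from math import comb


def alternating_two_path(by_participant, lam):
    """A2P statistic of Equation (3) with participants as endpoints.

    by_participant: dict mapping participant -> set of contests entered.
    """
    total = 0.0
    for a, b in combinations(sorted(by_participant), 2):
        shared = len(by_participant[a] & by_participant[b])
        # A pair with `shared` common contests contributes C(shared, k) k-two-paths.
        for k in range(1, shared + 1):
            total += (-1) ** (k - 1) * comb(shared, k) / lam ** (k - 1)
    return total


# Toy participation data; not from the GrabCAD dataset.
toy = {"p1": {"c1", "c2", "c3"}, "p2": {"c1", "c2"}, "p3": {"c3"}}
a2p = alternating_two_path(toy, lam=2.0)
print(a2p)  # P_1 - P_2/2 = 3 - 0.5 = 2.5
```

Here p1 and p2 share two contests and p1 and p3 share one, so the number of 1-two-paths is 3 and the number of 2-two-paths is 1.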

− 4 −

Page 5: Modeling Participation Behaviors in Design Crowdsourcing ... · In design crowdsourcing, a contest sponsor specifies a design problem to a crowd of engineers or designers. Par-ticipants

4 FIELD DATA FROM GRABCAD CONTESTS

With the ERGM and the network configurations discussed above, we demonstrate in this section our approach to modeling participation behaviors in a case study on GrabCAD, an online design crowdsourcing platform.

4.1 Introduction to GrabCAD

GrabCAD [9] is a community of individuals who contribute to an online library of computer-aided design (CAD) files. This library is open to online crowds, and CAD files are available to everyone for free. Over 4 million members have contributed more than 2 million CAD files. Members of the community hail from various regions across the world. Members’ areas of expertise in design are also wide-ranging. They include interior, graphic, industrial, and product design.

Manufacturers and GrabCAD sponsors leverage the large crowds on GrabCAD to solve problems, create new designs or promote their brands using design contests. The contests are conducted over a period of a few days, and consist of problem descriptions, solutions from community members in the form of CAD files, and prize information for top solutions. Each design challenge starts with posting a problem statement, supplementary files if the problem statement involves using or improving existing designs, prize amounts to be awarded to top solutions, and a description of the requirements against which solutions are evaluated. During the active phase of a challenge, the GrabCAD community members upload solutions as STEP or IGES files, or rendering images, along with any analysis results required by the problem statement. All submissions are accessible to the public to view and download while the challenge is active. Members, at the time of uploading their solutions, are therefore able to see the solutions that others have already submitted. At the end of a challenge, a panel of judges from the sponsor company and/or GrabCAD staff evaluates all submissions and determines the finalists whose submissions are awarded and recognized. The finalists are made public by posting web links to their profiles and to their submissions.

4.2 Data collection

We analyzed all contests hosted on the GrabCAD platform between 2011 and 2018. From this dataset, we filtered out contests that involved promotional tasks with no specific problem statement or prizes, e.g., community contests with no prizes hosted by GrabCAD to promote the crowd’s involvement. We also removed contests that have crowd submissions set to private. Such contests are dropped from our analysis because participants’ actions in them are either unknown or most likely caused by intrinsic factors which cannot be observed. Finally, we selected data from 117 contests which provide measurable information on the influencing factors and outcomes of contests. The 96 contests conducted between 2011 and 2016 were used for training ERGMs, while the remaining 21 contests were used for validation. Scrapy [49], a Python-based web crawler, was used to capture the visible data in accordance with the regulations of GrabCAD.

The dataset for each challenge includes the description of the sponsor company, the number of monetary prizes and gifts, the corresponding prize amounts (monetary value for gifts), the number of entries/submissions, the unique identity of each submission and the corresponding participant, and the identity of the winners. Based on participant identities and their submissions, the number of participants¹ for a challenge is derived after removing repetitive and blank submissions so that each participant is counted only once. The number of prizes is counted as the sum of the number of monetary prizes and the number of gifts. Based on the unique identity of every participant, we count the number of times a participant is mentioned as a finalist. The winning rate for a participant is the ratio of the number of times he/she is a finalist to the total number of contests in which he/she participated.
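The derived quantities described above can be sketched as follows. The record layout (a participant identifier paired with a blank-submission flag), the function names, and the decision to exclude participants whose only submissions are blank are assumptions made for this illustration; the paper does not specify these details.

```python
def unique_participants(submissions):
    """Return the set of participants with at least one non-blank submission.

    submissions: list of (participant_id, is_blank) tuples for one contest
    (hypothetical layout). Using a set counts each participant only once.
    """
    return {pid for pid, is_blank in submissions if not is_blank}


def winning_rate(times_finalist, contests_entered):
    """Ratio of finalist appearances to contests entered by one participant."""
    if contests_entered <= 0:
        raise ValueError("participant must have entered at least one contest")
    return times_finalist / contests_entered


# Toy records for one contest; not taken from the GrabCAD dataset.
entries = [("a", False), ("a", False), ("b", True), ("c", False)]
participants = unique_participants(entries)
rate = winning_rate(times_finalist=1, contests_entered=4)
```

In this toy contest there are four entries but only two counted participants, which mirrors the distinction between entries and unique participants.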

4.3 Descriptive analysis of GrabCAD network

With the collected GrabCAD data, we construct a bipartite network (see Figure 3). We first perform descriptive network analysis to gain an overall understanding of the network and the participation behaviors in GrabCAD contests. The degree indicates a node’s connectivity to other nodes, which is an important measure of network properties [50]. The average degree of participant nodes in the GrabCAD network is 1.87. This indicates that, on average, one participant participated in about two design contests. The average degree of contest nodes is 67.47. This means that one contest has about 67 participants on average.

Figure 4 shows the degree distribution of participant nodes and contest nodes, respectively, on a log-log scale. The results indicate certain aggregate-level participation behaviors. For example, it is observed that a few contests attracted a large number of participants: 300 participants submitted a total of 637 entries to the GE Jet Engine Bracket Challenge, 297 participants submitted 461 entries to the NASA Handrail Clamp Assembly Challenge, and 227 participants submitted 390 entries to the Ultimaker 3D Printer Toy Design Challenge. These three design contests are shown in Figure 3. Among the 96 design contests, 25 contests each attracted more than 100 participants.

In addition, we observe that most people (2661 out of 3462) participated in just one design contest. Only a few participants participated in many contests. These participants can be regarded as active GrabCAD players. For example, 159 of the 3462 people participated in 5 design contests, and only two people participated in more than 30 design contests. As of 2016, the top 3 designers had participated in GrabCAD design contests 42, 37, and 29 times, respectively.
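The degree statistics discussed above can be computed with a short sketch such as the one below; the function name and toy data are hypothetical. Since every link has exactly one participant endpoint and one contest endpoint, the two degree sums must coincide. Assuming the 3462 participants and the 96 contests refer to the same network, 3462 × 1.87 ≈ 6474 and 96 × 67.47 ≈ 6477, which agree up to rounding of the reported averages.

```python
from collections import Counter


def degree_summary(by_node):
    """Return (average degree, degree histogram) for one mode of the network.

    by_node: dict mapping a node to the set of its neighbors in the other mode.
    """
    degrees = [len(neighbors) for neighbors in by_node.values()]
    average = sum(degrees) / len(degrees)
    histogram = Counter(degrees)  # degree -> number of nodes with that degree
    return average, histogram


# Toy network for illustration only.
toy_participants = {"p1": {"c1", "c2"}, "p2": {"c1"}, "p3": {"c2"}}
toy_contests = {"c1": {"p1", "p2"}, "c2": {"p1", "p3"}}

avg_p, hist_p = degree_summary(toy_participants)
avg_c, hist_c = degree_summary(toy_contests)
```

Plotting the histogram counts against degree on logarithmic axes gives the kind of degree distribution referred to in the text.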

From the degree analysis, we observe that on GrabCAD, some of the contests are very popular, and some of the participants are very active and motivated. These characteristics indicate that the existing mechanism of GrabCAD indeed influences participants’ behaviors and the network structure observed in Figure 3. However, the descriptive analysis by itself is not sufficient to quantify how the popularity effects and other factors affect participation decisions. In the next section, we use ERGMs to establish a statistical inference model to answer these questions and provide insights on the participation behaviors on GrabCAD.

¹ The number of participants is less than the number of entries because some participants make multiple submissions, including repeated or blank ones. The numbers in the paper are the numbers of unique participants.

Fig. 3. A visualization of the GrabCAD bipartite network.

5 IMPLEMENTATION OF ERGMs

In this section, we discuss the ERGM implemented on the GrabCAD network with different settings. Specifically, in Section 5.1, we specify the assumptions for incorporating various contest and participant attributes as well as the network configurations into the ERGM. Section 5.2 presents the results and discussion.

[Figure 4: log-log plot. x-axis: degree of a node; y-axis: frequency; series: challenge nodes and designer nodes.]

Fig. 4. Degree distribution of design contests and participants.

5.1 Model settings for the GrabCAD network

Table 1 lists the attributes we consider for the ERGM implementation. Our prior analysis of the GrabCAD data identifies that task complexity, clarity of problem specification, prize amounts, and the number of prizes influence participation [8]. In this paper, with the network representation of the GrabCAD data, we evaluate the effects of additional attributes such as a participant’s past success (e.g., winning rate) and past recognition (e.g., the frequency of being a finalist), and dependencies such as contest popularity and associations.

a) Attributes of design contests. The first attribute is the type of design task in a contest. If a task requires the design of a single component, we refer to it as a part-design task. If a task involves the design of multiple components and a mechanism of interaction between them, we refer to it as a system-design task. For example, the Autodesk Robot Gripper Arm Design Challenge [51] involved the design of a robotic arm where the objective was to minimize the weight of the arm. Since the task was to design a single component, i.e., the arm, it is a part-design task. In contrast, the Dirt-bike Tire Changing Tool Challenge [52] posed a task to design a tire-changing tool with multiple parts such as the tool frame, lever, and gears. The solution required contestants to design a mechanism for interaction between the tool, its parts, and the tire. This task is therefore a system-design task. We use a variable, isPart, to indicate whether the task is a part-design task (isPart = 1) or a system-design task (isPart = 0).

Furthermore, we categorize the GrabCAD contests into ideation-based and expertise-based types according to whether the problem requirements and judging criteria are clearly specified. Following the definitions by Terwiesch and Xu [53], contests with high uncertainty in evaluation are categorized as ideation contests, whereas contests with low uncertainty in evaluation are categorized as expertise-based contests. For example, the Alcoa Airplane Bracket Bearing Design Challenge [54] asked the participants to optimize an existing bracket design. The contest sponsors used well-defined requirements, such as the ratio of ultimate strength to weight, to evaluate the submissions. Also, the contest required participants to use topology optimization methods, because of which the technical uncertainty was low. Thus, this contest was an expertise-based challenge. On the other hand, in the Robot Gripper Test Object Challenge [55], participants were asked to come up with ideas for test objects to evaluate a robot gripper, but the evaluation criteria were not well-defined. Any implementation of a test object can be verified using simulation or prototypes. However, unclear problem requirements result in uncertainty in comparing any two objects. This challenge, therefore, fits into the category of ideation-based contests. We use another variable, isIdea, to indicate whether a contest is an ideation-based design problem (isIdea = 1) or an expertise-based design problem (isIdea = 0).

We use an additional attribute, the number of entries (Ne), as an indicator of a contest’s overall attractiveness. Ne counts multiple entries (submissions) by the same participant as well as blank entries, i.e., entries that consist only of a web page with no files attached.

b) Attributes of designers. The designer attributes include two continuous variables, the winning rate (Rw) and the number of times that a designer’s solution was mentioned in the finalists of a challenge (Nh). These two variables reflect the overall skills and problem-solving ability in design crowdsourcing. The number of wins is not adopted because it is potentially correlated with the number of participations: people who participate many times may win more often. Using frequency in this situation may overestimate the skills of a participant. The adoption of a percentage is especially suitable when the occurrence likelihood of an event is low, such as the winning event in this context.

However, there is also a downside to adopting a percentage. The downside arises when the occurrence likelihood of an event is high. For example, in GrabCAD, Nh is counted as the number of times a design is ranked in the top ten. We observe from the data that there are many cases where a person participated once and was among the finalists. Expressed as a percentage, this result would suggest that the person is extremely skilled, leading to an unfair comparison with a really skilled participant who may have been a finalist in 50% of a large number of contests. Therefore, using a percentage in this situation may overestimate the skills of a participant. Other attributes pertaining to designers’ demographics may have an impact on the participation behaviors but are not considered in this paper, mainly because the data are unavailable.

c) Incentive structure. The incentive mechanism is a critical component that impacts the success of crowdsourcing. In the literature on tournament compensation [56], researchers commonly predict that status, power, or money incites competition and differentiation, which are beneficial to crowdsourcing. However, as indicated by Bloom and Michel [57], “research has yielded mixed results about what amount of pay dispersion is optimal. In some cases, more dispersed pay distributions have been positively related to performance outcomes. In other cases, greater dispersion has been negatively related to performance outcomes”. Therefore, the role of monetary incentives in design crowdsourcing is not completely understood. In this study, we create a set of variables to study the effects of monetary incentive in design crowdsourcing.

The structure of monetary incentive is often studied in terms of the difference between prizes. Harrison and Klein [58] suggest two indices to quantify the difference, the coefficient of variation (COE) and the Gini coefficient. We adopt the COE in our model. In most contests, the first prize is an important factor that influences participation [59]. To account for the effect of the first prize, we include both the fraction value (Fp1) and the absolute amount (Mp1) of the first prize in our model.

In the GrabCAD contests, prizes are awarded in two ways, through monetary prizes or through gifts (e.g., an iPad). To capture the impact of the prize type, we create a categorical variable, prize level, L = {L1, L2, L3, L4}. In this study, prize level L1 includes contests that have neither monetary prizes nor gifts; prize level L2 includes contests that have only gifts; prize level L3 includes contests that have only monetary prizes; and prize level L4 includes contests that have both monetary prizes and gifts. In summary, we use four variables, the COE, Fp1, Mp1, and prize level, to model the monetary incentives in design crowdsourcing.
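As an illustration of how the four incentive variables could be derived for a single contest, here is a minimal sketch. It assumes that the COE is the population standard deviation of the prize values divided by their mean, and that Fp1 is the first prize divided by the total prize value; the paper does not spell out these details in this section, so both are assumptions, and the function name and example prizes are hypothetical.

```python
from statistics import mean, pstdev


def incentive_variables(prize_values, has_money, has_gifts):
    """Compute (Mp1, Fp1, COE, prize_level) for one contest.

    prize_values: prize values ordered from first prize downward, with gift
    values already included (an assumption about the data layout).
    has_money / has_gifts: whether the contest offers monetary prizes / gifts.
    """
    total = sum(prize_values)
    mp1 = prize_values[0] if prize_values else 0.0
    # Guard against contests with no prize value at all (prize level L1).
    fp1 = mp1 / total if total > 0 else 0.0
    # Assumed definition: population standard deviation divided by the mean.
    coe = pstdev(prize_values) / mean(prize_values) if total > 0 else 0.0
    level = {
        (False, False): "L1",  # neither monetary prize nor gifts
        (False, True): "L2",   # gifts only
        (True, False): "L3",   # monetary prizes only
        (True, True): "L4",    # both
    }[(has_money, has_gifts)]
    return mp1, fp1, coe, level


# Hypothetical contest with three monetary prizes and no gifts.
mp1, fp1, coe, level = incentive_variables([1000, 500, 250], True, False)
print(mp1, round(fp1, 3), round(coe, 3), level)
```

With these assumed definitions, concentrating the prize pool in the first prize raises Fp1, while spreading unequal prizes over more places tends to raise the COE.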

d) Network configurations. As introduced in Section 3, we consider the alternating k-star statistics (gpAkS for participant k-stars and gcAkS for contest k-stars) and the alternating 2-path statistics (gA2P) to model interdependence between links. The former enables the assessment of the contest popularity effect (contest AkS) and the level of participant’s engagement in GrabCAD contests (participant AkS) on the formation of participation links. The latter enables the investigation of significance of shared events/partners in the GrabCAD network. Since the current ERGM does not support the two-mode analysis of A2P, we use the one-mode version to capture the overall effect of the shared events regardless of the modes of nodes. In ERGM, a network configuration, called edge, is included to control the number of links in a network to ensure the regenerated networks have the same density as the observed ones. The edge estimates the likelihood that a participant chooses a design contest randomly if no knowledge about the participant or design contest attributes is provided. It is analogous to the intercept term in a linear regression model.

Table 1. Summary of the variables and their statistics. For continuous variables, minimum and maximum along with the average and the standard deviation are reported. For categorical or binary variables, the number of instances is reported.

| Group | Variable | Description | Statistics |
|---|---|---|---|
| Contest | Number of entries (Ne) | The number of solution entries observed for a challenge (may include multiple entries by the same designer or blank entries). | Min: 2; Max: 638; Avg: 113.90; Std: 112.36 |
| Contest | Ideation-based or expertise-based (isIdea) | Whether a contest is an ideation-based design problem or an expertise-based design problem. | Ideation-based: 59; expertise-based: 37 |
| Contest | Part design or system design (isPart) | Whether a contest is a part design problem or a system design problem. | Part design: 46; System design: 50 |
| Participant | Number of times being in finalist (Nh) | The number of times a designer’s solution are mentioned in the finalists of a challenge. | Min: 0; Max: 5; Avg: 0.10; Std: 0.36 |
| Participant | Winning rate (Rw) | The percentage of wins of a participant regardless of the level of prize. | Min: 0; Max: 1; Avg: 0.03; Std: 0.15 |
| Incentive structure | 1st prize (Mp1) | The absolute total value of the first prize including the value of gifts. | Min: 0; Max: $10000; Avg: $1575.97; Std: $1649.28 |
| Incentive structure | 1st prize fraction (Fp1) | The percentage of the value of the first prize including the value of gifts. | Min: 0; Max: 1; Avg: 0.55; Std: 0.25 |
| Incentive structure | Coefficient of variation (COE) | The disparity between different levels of prize. | Min: 0; Max: 4.80; Avg: 0.68; Std: 0.75 |
| Incentive structure | Prize level L1 | Neither do the contests have monetary prize nor gifts. | # of instances: 3 |
| Incentive structure | Prize level L2 | The contests only have gifts. | # of instances: 14 |
| Incentive structure | Prize level L3 | The contests only have monetary prize. | # of instances: 65 |
| Incentive structure | Prize level L4 | The contests have both monetary prize and gifts. | # of instances: 14 |

5.2 Results and discussion

With the model settings introduced in Section 5.1, we define two ERGMs based on Equation (1). Model 1 assumes that the links are formed independently of each other and thus does not incorporate network configurations. Model 2, on the other hand, incorporates the network configurations introduced in Section 3.2 to model dependencies. Thus, Model 2 includes the variables gpAkS, gcAkS, and gA2P, whereas Model 1 excludes them. Table 2 presents the model parameter estimates and relevant performance metrics calculated using the statnet package in R [46].

As shown in Table 2, regardless of the differences in network configurations, both models agree on certain participation behaviors in the GrabCAD data. Participants are more likely to participate in design contests with part design problems than in ones with system design problems. They are also more likely to participate in ideation contests than in expertise-based contests. For example, Model 1’s results indicate that the probability of an individual participating in a part design contest is about 1.5 (i.e., e^0.407) times that in a system design contest. The probability of participating in an ideation-based design contest is 1.16 times that in an expertise-based contest.

For given design contests, links are more likely to be formed with the participants who have a higher winning rate or have been finalists more often. In other words, the participants who have won or been recognized have a higher probability of participation. Since all the attributes of design contests and participants are normalized to [0,1], the estimated parameters allow comparison of effects across different variables. Accordingly, the number of times being a finalist is the most influential among all the independent variables (excluding the network configurations).
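The multiplicative effects quoted above for isPart and isIdea follow from exponentiating the Model 1 estimates in Table 2. The short sketch below reproduces that arithmetic; the helper name `multiplicative_effect` is hypothetical. Strictly, the exponential of an ERGM coefficient is a conditional odds ratio, which is close to a ratio of probabilities only when link probabilities are small, as they are in this sparse network.

```python
import math


def multiplicative_effect(theta):
    """Exponentiate an ERGM coefficient to obtain its multiplicative effect."""
    return math.exp(theta)


# Model 1 estimates taken from Table 2.
theta_model1 = {"isPart": 0.407, "isIdea": 0.145}
for name, theta in theta_model1.items():
    print(f"{name}: exp({theta}) = {multiplicative_effect(theta):.2f}")
```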

The effect of the 1st prize amount is not statistically significant for the GrabCAD network. However, the estimated effect for the 1st prize fraction is statistically significant and negative. This indicates that the higher the fraction of the 1st prize, the lower the probability of link formation between a designer and a design contest. These two results imply that, in GrabCAD contests, participants are more likely to participate in contests that allocate a smaller share to the 1st prize. The larger the allocation to the 1st prize, the smaller the probability of winning for a participant. With a lower percentage allocated to the 1st prize, the total prize is distributed across multiple prizes, and the probability of winning a prize amount through lower places, such as the second or third prizes, is higher.

The parameter estimates for prize level indicate that the type of prize affects participation. With no prize and no gift (Prize level 1) as a baseline, the positive signs of the three prize levels in Table 2 show that participation is higher in contests with prize and/or gift. By comparing the magnitude of the estimates between Prize level = 2 (1.141) and other two prize level variables, i.e., 1.295 and 1.263, respectively, we observe that monetary prize has a larger impact on participation in GrabCAD than gift prizes.

Table 2. Estimated parameters of two ERGMs for the GrabCAD network. Model 1 assumes link independence (i.e., no network configurations are included), while Model 2 takes link dependence into account (i.e., three network configurations are included).

| Explanatory Var. | Model 1 Est. θ | Model 1 Mean Std. Err. | Model 2 Est. θ | Model 2 Mean Std. Err. |
|---|---|---|---|---|
| Attributes of GrabCAD design contests | | | | |
| Number of entries (Ne) | 2.234 *** | 0.074 | 1.962 *** | 0.159 |
| isPart = 1 | 0.407 *** | 0.038 | 0.383 *** | 0.037 |
| isIdea = 1 | 0.145 *** | 0.031 | 0.122 *** | 0.031 |
| Attributes of participants | | | | |
| Number of times being in finalists (Nh) | 3.913 *** | 0.092 | 3.840 *** | 0.119 |
| Winning rate (Rw) | 0.326 *** | 0.092 | 0.314 *** | 0.080 |
| Incentive structure | | | | |
| 1st prize amount (Mp1) | 0.130 ⋅ | 0.070 | 0.025 | 0.082 |
| 1st prize fraction (Fp1) | -0.273 *** | 0.067 | -0.185 ** | 0.067 |
| Coefficient of variation (COE) | 1.131 *** | 0.016 | 0.121 *** | 0.015 |
| Prize level L2 | 1.141 ⋅ | 0.200 | 0.234 ⋅ | 0.134 |
| Prize level L3 | 1.295 *** | 0.196 | 0.370 ** | 0.127 |
| Prize level L4 | 1.263 *** | 0.198 | 0.344 ** | 0.129 |
| Network configurations | | | | |
| edge | -6.164 *** | 0.190 | -5.187 * | 0.134 |
| Participant alternating k-star (gpAkS) | - | - | -0.080 | 0.079 |
| Contest alternating k-star (gcAkS) | - | - | -6.395 *** | 0.366 |
| Alternating 2-paths (gA2P) | - | - | 0.00092 ⋅ | 0.00048 |
| Model fit | | | | |
| Null deviance | 460738 | | 460738 | |
| BIC | 59127 | | 59064 | |

Significance codes: ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘⋅’ 0.1

From the results of Model 2, we observe that the effect of contest AkS is statistically significant, and the negative sign indicates that the popularity effect plays a role in affecting participants’ behaviors: the more participants a contest has, the more likely a participant is to join it. As mentioned in Section 3.2, in ERGM, a negative estimate reflects an increased likelihood of link formation to higher-degree nodes [39]. Therefore, a positive parameter of an AkS configuration indicates a uniform degree distribution of a network, whereas a negative parameter indicates a skewed degree distribution (see footnote 2) and often yields the phenomenon of preferential attachment [60]. The magnitude of the contest AkS estimate, −6.395, shows that the popularity effect is stronger than the effects of the edge and A2P network statistics.

We do not observe significant effects of the participant AkS on link formation in the GrabCAD network. This indicates that the number of contests an individual has participated in is less likely to influence participation in future contests. Also, the effect of the A2P network statistic is only marginally significant, at the 0.1 level of significance. Therefore, participant associations (e.g., the same pair of participants taking part in multiple contests together) and contest associations (e.g., the same pair of contests attracting the same participants) are not observed to have a significant influence on the link formation process.

The obtained results can be explained based on existing game-theoretic models of contests, as follows.

1. The preference for participating in contests with a small first prize fraction (negative coefficient of Fp1) and with a dispersed pay structure (positive coefficient of COE) is justified because, with more prizes and dispersed prize allocation, every participant has a higher chance of winning a prize and receiving more money.

2. The results also suggest that the participants are risk averse, have asymmetric skills, or both. Game-theoretic models show that input effort by risk-averse and asymmetric participants increases with the number of prizes [3].

3. The participants favor ideation contests in that such contests require no or minimal problem-specific domain knowledge. The game-theoretic model developed by Terwiesch and Xu [53] highlights that expertise-based contests reduce participation from people who do not have the required expertise.

4. The participants likely prefer part-design tasks because of the relatively low effort involved in designing a single component. A system-design task, on the other hand, entails the layout design for assembly, which often requires more effort/cost.

5. The large magnitude of the coefficient of the contest AkS in negative direction (i.e., strong contest-popularity effect) indicates a preferential attachment process underneath the formation of the GrabCAD network. Although the exact reasons for such process in the GrabCAD are unknown, possible reasons could be related to the contest sponsor, e.g., the sponsor itself being well-recognized and popular, or the sponsor engaged in providing feedback to participants [8, 28].

2 In the actual implementation, the network statistics gwdegree (called geometrically weighted degree, equivalent to AkS) and gwdsp (called geometrically weighted dyadwise shared partner, equivalent to A2P) were used. The interpretation to the sign of the estimated coefficients is opposite. Please refer to [42] for more details.

Table 3. The confusion matrix for network evaluation.

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Observed Positive (P) | True Positive (TP): A link between a pair of nodes is observed in the real network and is predicted | False Negative (FN): A link between a pair of nodes is observed in the real network but not predicted |
| Observed Negative (N) | False Positive (FP): A link between a pair of nodes is not observed in the real network but predicted | True Negative (TN): A link between a pair of nodes is not observed in the real network and not predicted |

Precision = TP/(TP+FP)
Recall = TP/(TP+FN) = TP/P
Specificity = TN/(FP+TN) = TN/N
Accuracy = (TP+TN)/(P+N)

As to the overall model fit, Model 2 is not greatly improved compared to Model 1. This is indicated by the Bayesian information criterion (BIC), for which lower values are better (see footnote 3). Model 2’s BIC (59064) is only slightly lower than Model 1’s (59127). To better evaluate the models’ performance and validate the capabilities of the dependence settings, their predictive power is investigated further in the following section.
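For reference, the BIC comparison can be written out as a small calculation. This is a minimal sketch of the standard formula, BIC = −2 log L + log(n) p; the function name `bic` is hypothetical, and only the two BIC values reported in Table 2 are taken from the paper.

```python
import math


def bic(log_likelihood, n_obs, n_params):
    """Bayesian information criterion: -2 * log-likelihood + log(n) * p."""
    return -2.0 * log_likelihood + math.log(n_obs) * n_params


# BIC values reported in Table 2.
bic_model1, bic_model2 = 59127, 59064
difference = bic_model1 - bic_model2
print(f"BIC difference (Model 1 - Model 2): {difference}")
print(f"Relative reduction: {difference / bic_model1:.4%}")
```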

6 MODEL EVALUATION AND VALIDATION

In this section, we analyze the predictive power of Model 1 and Model 2 for predicting links in the GrabCAD network. We evaluate model predictions using the training data of 96 contests, and cross-validate using the test data of 21 contests.

We follow a two-step approach for model evaluation and validation. First, the model estimates given in Table 2, the data of the explanatory variables, and the observed GrabCAD network are used together with the ERGM model given in Equation (1) to calculate the probability of whether a link will be formed. Based on the probabilities of individual link formation, synthetic networks are generated. Second, the regenerated networks are compared with the observed links in the real network, and the prediction accuracy can be quantified by various measurements such as Precision, Recall (a.k.a. Sensitivity), Specificity, and Accuracy. The confusion matrix [61] in Table 3 summarizes these measurements.
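The first step can be sketched for the independence case (Model 1). The sketch relies on an assumption: an ERGM that contains only the edge term and nodal attribute terms reduces to a logistic model, so the link probability is the logistic function of the weighted sum of the attributes. Equation (1) is not reproduced in this section, and Model 2’s dependence terms would additionally require change statistics computed on the rest of the network, which this sketch does not cover. The function names and the example attribute values are hypothetical; the coefficients are the Model 1 estimates from Table 2.

```python
import math

# Model 1 estimates from Table 2 (independence model).
THETA_MODEL1 = {
    "edge": -6.164, "Ne": 2.234, "isPart": 0.407, "isIdea": 0.145,
    "Nh": 3.913, "Rw": 0.326, "Mp1": 0.130, "Fp1": -0.273, "COE": 1.131,
    "L2": 1.141, "L3": 1.295, "L4": 1.263,
}


def link_probability(features, theta=THETA_MODEL1):
    """Logistic link probability for one participant-contest pair.

    features maps a variable name to its (normalized) value; the edge term
    is always included with value 1, like an intercept.
    """
    score = theta["edge"] + sum(theta[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def predict_links(pair_features, cutoff=0.5):
    """Return the set of pairs whose link probability reaches the cutoff."""
    return {pair for pair, feats in pair_features.items()
            if link_probability(feats) >= cutoff}


# Hypothetical normalized attributes for two participant-contest pairs.
pairs = {
    ("p1", "c1"): {"Ne": 0.5, "isPart": 1, "isIdea": 1, "Nh": 0.2, "Rw": 0.1,
                   "Mp1": 0.16, "Fp1": 0.55, "COE": 0.14, "L3": 1},
    ("p2", "c1"): {"Ne": 0.5, "isPart": 1, "isIdea": 1, "Nh": 1.0, "Rw": 1.0,
                   "Mp1": 0.16, "Fp1": 0.55, "COE": 0.14, "L3": 1},
}
for pair, feats in pairs.items():
    print(pair, round(link_probability(feats), 4))
print(predict_links(pairs, cutoff=0.5))
```

Lowering `cutoff` until the number of predicted links is closest to the observed number corresponds to the "closest network density" setting.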

We mainly use Precision and Recall to assess the models’ predictive performance, because the GrabCAD network is imbalanced. The number of all possible links between the participants and the contests is 3462×96 (i.e., 332352), out of which 6432 links are observed and 325920 links are negatives. Since a model is more likely to predict negative cases under the imbalanced scenario (the random probability is about 98%), counting the predicted positive cases using Precision and Recall better reflects the predictive capability of the model [62].

3 BIC is a commonly used statistical metric for evaluating model fit, calculated by the formula BIC = −2 log L + log(n) p, where log L is the log likelihood, p is the number of explanatory variables, and n is the sample size.

Table 4. The predictive performance of Model 1 and Model 2.

| Phase | Cutoff setting | Measure | Model 1 | Model 2 | Random Model |
|---|---|---|---|---|---|
| Evaluation | Fixed prob. of 0.5 | Precision | 0.0037 | 0.0044 | 0.00031 |
| Evaluation | Fixed prob. of 0.5 | Recall | 0.36 | 0.43 | 0.056 |
| Evaluation | Closest network density | Precision | 0.10 | 0.10 | 0.045 |
| Evaluation | Closest network density | Recall | 0.10 | 0.10 | 0.045 |
| Validation | Fixed prob. of 0.5 | Precision | 0.004 | N/A | N/A |
| Validation | Fixed prob. of 0.5 | Recall | 0.38 | N/A | N/A |
| Validation | Closest network density | Precision | 0.28 | 0.38 | 0.20 |
| Validation | Closest network density | Recall | 0.083 | 0.11 | 0.066 |

Table 4 shows the results of model evaluation and prediction. With each model, 200 synthetic networks are generated in the first step. In the second step, at the probability threshold of 0.5, Model 1 yields a Precision and Recall of 0.0037 and 0.36, respectively, and Model 2 yields a Precision and Recall of 0.0044 and 0.43, respectively. The results show that among the observed 6432 cases of participation, Model 1 predicts 36% of them while Model 2 successfully predicts 43% of them, indicating that Model 2 has a better predictive performance than Model 1. Note that the Precision values in both models are very small, because at the threshold of 0.5, only 67 and 65 links are predicted to exist in the network with the two models, respectively. Among these predicted links, Model 1 correctly predicts 24 cases, whereas Model 2 correctly predicts 28 cases. That is also why Model 2 has a higher Recall value than Model 1. As a baseline for reference, the random network model, generated by randomly forming links at the threshold of 0.5, has a Precision and Recall of 0.00031 and 0.056, respectively.
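As a worked check, the counts quoted above can be plugged into the Table 3 definitions. The sketch below uses only numbers stated in the text (24 of 67 predicted links correct for Model 1, 28 of 65 for Model 2, and 6432 observed links); the function name is hypothetical. Under those definitions the ratio TP/(TP+FP) comes out near 0.36 and 0.43, and TP/P near 0.0037 and 0.0044, so readers may want to check which label each value carries in Table 4.

```python
def precision_recall(true_positives, predicted_positives, observed_positives):
    """Return (TP/(TP+FP), TP/P) following the Table 3 definitions."""
    precision = (true_positives / predicted_positives
                 if predicted_positives else float("nan"))
    recall = (true_positives / observed_positives
              if observed_positives else float("nan"))
    return precision, recall


OBSERVED_LINKS = 6432  # observed participation links in the training network

# (model name, correctly predicted links, links predicted at threshold 0.5)
counts = [("Model 1", 24, 67), ("Model 2", 28, 65)]
results = {}
for name, tp, predicted in counts:
    results[name] = precision_recall(tp, predicted, OBSERVED_LINKS)
    p, r = results[name]
    print(f"{name}: TP/(TP+FP) = {p:.4f}, TP/P = {r:.4f}")
```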

Alternatively, we can vary the threshold for the linking probability such that the number of links in a predicted network is closest to that of the real network, i.e., 6432 links (see footnote 4). With this setting, Model 1 has both Precision and Recall of 0.1, Model 2 delivers the same performance as Model 1, and the random network model produces both Precision and Recall of 0.045.

In addition to the evaluation on the regenerated GrabCAD networks, we perform a cross-validation using the test network of 21 GrabCAD contests that were conducted between July 2016 and August 2018. The test data include a total of 1486 participants and 1950 participation instances (i.e., the number of links in the network is 1950). The results of cross-validation are presented in Table 4. The Precision and Recall are not obtainable for Model 2 and the random network model with the fixed probability threshold of 0.5, because the link formation probabilities are very small and no links are predicted. In the other scenario, where the number of links is kept closest to that of the real network, Model 2 outperforms both Model 1 and the random network model.

4 Model 1 produces 6474 links with the probability threshold of 0.085. Model 2 produces 6537 links with the threshold of 0.085. And, the random network model produces 6429 links with the threshold of 0.185.

The predictive performance of both models still has room for improvement. The limitation mainly comes from two sources. First, the network is highly imbalanced, meaning that there are significantly more negative cases than positive cases, so predicting the positive cases (i.e., the existing links) is naturally challenging. This leads to a small number of correctly predicted links and thus a low precision. Second, the models developed in this study consider a small number of factors to explain the participants’ decisions. Many other explanatory variables need to be explored further. For example, other factors of dependent relations, such as social ties between more than two participants, could play a role, but the lack of appropriate network configurations makes the modeling of those relations challenging at this stage. Despite its limitations, this work demonstrates how network structures can be used to represent different forms of dependencies and to study their effects on participation decisions in crowdsourcing contests. Further, the purpose of the prediction analysis using the models is not to support the forecast of participation decisions per se. It is to compare the performance of the model with dependencies assumed (Model 2) against the benchmark model without dependence settings (Model 1), and against the random network model. The results support our hypothesis in this regard.

7 CONCLUSIONS AND FUTURE WORK

In this paper, we study the participation behaviors in online design crowdsourcing contests, assuming that there exist dependencies among participants’ decisions to participate. We propose a network-based approach which uses the ERGM in a bipartite setting to represent participation decisions. This approach enables a novel combination of advanced network modeling techniques and real crowdsourcing data for behavioral analysis. The results indicate that contest popularity, modeled by alternating k-star network statistics, significantly influences whether a participant would choose a design contest or not. Associations among participants or among contests, modeled by alternating 2-path network statistics, do not have a significant influence on participation.

In addition to the findings related to the network configurations, we obtain the following insights into the design of crowdsourcing contests:

1. Participants are more likely to participate in contests related to part design than in those related to system design. Therefore, sponsor companies may consider first decomposing a system design problem into several integrated part design problems, and then using GrabCAD to solicit potential solutions for each component.

2. Participants are more inclined to participate in ideation-based design contests, and, therefore, companies may analyze the level of expertise required by their contests before posting them online to estimate the number of participants.

3. The significant effects of participant attributes (e.g., the winning rate) indicate that participants are significantly motivated by recognition. The greater the number of wins, the higher the probability of participation in future contests. Therefore, contest designers should implement a certain accomplishment or reputation mechanism, such as a level-based or badge-based recommendation system, to attract more participation.

4. Contest designers should avoid allocating a high percentage of the total prize amount to the 1st prize. Instead, maintaining a certain variation among different prizes will likely improve overall participation. Also, having multiple forms of prizes, such as both a monetary award and a gift in a contest, will likely attract greater participation.

The results indicate that the model which accounts for dependencies (Model 2) is a better predictor of participation decisions than the model without dependencies incorporated (Model 1), as manifested in a higher prediction accuracy measured by precision and recall. The imbalanced nature of the network data poses challenges in achieving a successful rate of prediction. Despite this, the ERGM with dependencies is capable of achieving the prediction accuracy of 43%, which is significantly higher than 5.6% of random predictions.

In future work, more experiments are required to collect datasets to validate the conclusions. Also, other types of dependent relations need to be modeled using different network configurations and tested on crowdsourcing data. For example, if data about the social relations between participants are available, the ERGM for multidimensional bipartite networks can be used to probe into the underlying factors influencing interdependent decisions. With the estimated ERGMs in Table 2, further prediction of how potential participants make decisions given the attributes of design contests can be carried out. This is useful for contest designers in planning decisions and anticipating the possible number of solutions that can be generated from a contest. Such a prediction at the aggregate level would enable the forecast of the evolution of the entire crowdsourcing system, which may help crowdsourcing system designers develop new incentives to attract more participants.

Acknowledgements

The authors gratefully acknowledge the financial support from the U.S. National Science Foundation (NSF) CMMI through grant # 1400050. An earlier version with preliminary results was presented at the ASME 2018 Computers and Information in Engineering Conference in Quebec City, Canada [63].


References[1] Taylor, C., 1995. “Digging for golden carrots: An anal-

ysis of research tournaments”. The American EconomicReview, 85(4), pp. 872–890.

[2] Che, Y.-K., and Gale, I., 2003. “Optimal design ofresearch contests”. The American Economic Review,93(3), pp. 646–671.

[3] Sisak, D., 2009. “Multiple-prize contests- the optimalallocation of prizes”. Journal of Economic Surveys,23(1), pp. 82–114.

[4] Sheremata, R. M., 2011. “Contest design: An ex-perimental investigation”. Economic Inquiry, 49(2),pp. 573–590.

[5] Sha, Z., Kannan, K. N., and Panchal, J. H., 2015. “Be-havioral experimentation and game theory in engineer-ing systems design”. ASME Journal of Mechanical De-sign, 137, pp. 051405–051405–10.

[6] Chaudhari, A., Thekinen, J., and Panchal, J., 2016.“Using contests for engineering systems design: Astudy of auctions and fixed-prize tournaments”. InDS 84: Proceedings of the DESIGN 2016 14th In-ternational Design Conference, Durbovnik, Croatia,pp. 947–956.

[7] Panchal, J. H., Sha, Z., and Kannan, K. N., 2017. “Un-derstanding design decisions under competition usinggames with information acquisition and a behavioralexperiment”. Journal of Mechanical Design, 139(9),pp. 091402–091402–12.

[8] Chaudhari, A. M., Sha, Z., and Panchal, J. H., 2018. “Analyzing participant behaviors in design crowdsourcing contests using causal inference on field data”. Journal of Mechanical Design, 140(9), pp. 091401–091401–12.

[9] GrabCAD, 2016. Grabcad challenges. https://grabcad.com/challenges. Accessed: 2018-11-25.

[10] Amazon, 2016. Amazon mechanical turk. https://www.mturk.com. Accessed: 2018-11-25.

[11] Panchal, J., 2015. “Using crowds in engineering design – towards a holistic framework”. In International Conference on Engineering Design, Design Society, pp. 27–30.

[12] Simula, H., and Ahola, T., 2014. “A network perspective on idea and innovation crowdsourcing in industrial firms”. Industrial Marketing Management, 43(3), pp. 400–408.

[13] Szymanski, S., and Valletti, T. M., 2005. “Incentive effects of second prizes”. European Journal of Political Economy, 21, pp. 467–481.

[14] Clark, D. J., and Riis, C., 1998. “Influence and the discretionary allocation of several prizes”. European Journal of Political Economy, 14(4), pp. 605–625.

[15] Fullerton, R., Linster, B., McKee, M., and Slate, S., 2002. “Using auctions to reward tournament winners: theory and experimental investigation”. RAND Journal of Economics, 33(1), pp. 62–84.

[16] Schottner, A., 2008. “Fixed-prize tournaments versus first-price auctions in innovation contests”. Economic Theory, 35, pp. 55–71.

[17] Zhao, Y. C., and Zhu, Q., 2014. “Effects of extrinsic and intrinsic motivation on participation in crowdsourcing contest”. Online Information Review, 38(7), p. 896.

[18] Zheng, H., Li, D., and Hou, W., 2011. “Task design, motivation, and participation in crowdsourcing contests”. International Journal of Electronic Commerce, 15(4), pp. 57–88.

[19] Roberts, J. A., Hann, I.-H., and Slaughter, S. A., 2006. “Understanding the motivations, participation, and performance of open source software developers: A longitudinal study of the apache projects”. Management science, 52(7), pp. 984–999.

[20] Fuller, J., 2010. “Refining virtual co-creation from a consumer perspective”. California management review, 52(2), pp. 98–122.

[21] Leimeister, J. M., Huber, M., Bretschneider, U., and Krcmar, H., 2009. “Leveraging crowdsourcing: activation-supporting components for it-based ideas competition”. Journal of management information systems, 26(1), pp. 197–224.

[22] Ke, W., and Zhang, P., 2009. “Motivations in open source software communities: The mediating role of effort intensity and goal commitment”. International Journal of Electronic Commerce, 13(4), pp. 39–66.

[23] Rogstadius, J., Kostakos, V., Kittur, A., Smus, B., Laredo, J., and Vukovic, M., 2011. “An assessment of intrinsic and extrinsic motivation on task performance in crowdsourcing markets.”. ICWSM, 11, pp. 17–21.

[24] Archak, N., 2010. “Money, glory and cheap talk: analyzing strategic behavior of contestants in simultaneous crowdsourcing contests on topcoder.com”. In Proceedings of the 19th international conference on World wide web, ACM, pp. 21–30.

[25] Aker, A., El-Haj, M., Albakour, M.-D., and Kruschwitz, U., 2012. “Assessing crowdsourcing quality through objective tasks.”. In LREC, Istanbul, Turkey, ELRA, pp. 1456–1461.

[26] Li, K., Xiao, J., Wang, Y., and Wang, Q., 2013. “Analysis of the key factors for software quality in crowdsourcing development: An empirical study on topcoder.com”. In 2013 IEEE 37th Annual Computer Software and Applications Conference, IEEE, pp. 812–817.

[27] Kim, D. J., Ferrin, D. L., and Rao, H. R., 2008. “A trust-based consumer decision-making model in electronic commerce: The role of trust, perceived risk, and their antecedents”. Decision support systems, 44(2), pp. 544–564.

[28] Zheng, H., Xie, Z., Hou, W., and Li, D., 2014. “Antecedents of solution quality in crowdsourcing: The sponsor’s perspective”. Journal of Electronic Commerce Research, 15(3), p. 212.

[29] Algesheimer, R., Borle, S., Dholakia, U. M., and Singh, S. S., 2010. “The impact of customer community participation on customer behaviors: An empirical investigation”. Marketing Science, 29(4), pp. 756–769.

[30] Kazai, G., 2011. “In search of quality in crowdsourcing for search engine evaluation”. In European Conference on Information Retrieval, Springer, pp. 165–176.

[31] Jeppesen, L. B., and Lakhani, K. R., 2010. “Marginality and problem-solving effectiveness in broadcast search”. Organization Science, 21(5), pp. 1016–1033.

[32] Burnap, A., Ren, Y., Gerth, R., Papazoglou, G., Gonzalez, R., and Papalambros, P. Y., 2015. “When crowdsourcing fails: A study of expertise on crowdsourced design evaluation”. Journal of Mechanical Design, 137(3), pp. 031101–031101–9.

[33] Valente, T. W., Fujimoto, K., Chou, C.-P., and Spruijt-Metz, D., 2009. “Adolescent affiliations and adiposity: A social network analysis of friendships and obesity”. The Journal of adolescent health, 45(2), 08, pp. 202–204.

[34] Lusher, D., Robins, G., and Kremer, P., 2010. “The application of social network analysis to team sports”. Measurement in Physical Education and Exercise Science, 14(4), pp. 211–224.

[35] Contractor, N. S., DeChurch, L. A., Carson, J., Carter, D. R., and Keegan, B., 2012. “The topology of collective leadership”. The Leadership Quarterly, 23(6), pp. 994–1011.

[36] Holland, P. W., and Leinhardt, S., 1981. “An exponential family of probability distributions for directed graphs”. Journal of the american Statistical association, 76(373), pp. 33–50.

[37] Frank, O., and Strauss, D., 1986. “Markov graphs”. Journal of the american Statistical association, 81(395), pp. 832–842.

[38] Pattison, P., and Robins, G., 2002. “Neighborhood–based models for social networks”. Sociological Methodology, 32(1), pp. 301–337.

[39] Snijders, T. A., Pattison, P. E., Robins, G. L., and Handcock, M. S., 2006. “New specifications for exponential random graph models”. Sociological methodology, 36(1), pp. 99–153.

[40] Wang, P., Pattison, P., and Robins, G., 2013. “Exponential random graph model specifications for bipartite networks – a dependence hierarchy”. Social networks, 35(2), pp. 211–222.

[41] Wang, P., Robins, G., Pattison, P., and Lazega, E., 2013. “Exponential random graph models for multilevel networks”. Social Networks, 35(1), pp. 96–115.

[42] Wang, M., Chen, W., Huang, Y., Contractor, N. S., and Fu, Y., 2016. “Modeling customer preferences using multidimensional network analysis in engineering design”. Design Science, 2, p. e11.

[43] Wang, M., Sha, Z., Huang, Y., Contractor, N., Fu, Y., and Chen, W., 2018. “Predicting product co-consideration and market competitions for technology-driven product design: a network-based approach”. Design Science, 4, p. e9.

[44] Sha, Z., Huang, Y., Fu, J. S., Wang, M., Fu, Y., Contractor, N. S., and Chen, W., 2018. “A network-based approach to modeling and predicting product coconsideration relations”. Complexity, 2018, p. 2753638: 14.

[45] Fu, J. S., Sha, Z., Huang, Y., Wang, M., Fu, Y., and Chen, W., 2017. “Two-stage modeling of customer choice preferences in engineering design using bipartite network analysis”. In ASME 2017 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, ASME, p. V02AT03A039.

[46] Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., and Morris, M., 2008. “statnet: Software tools for the representation, visualization, analysis and simulation of network data”. Journal of statistical software, 24(1), p. 1548.

[47] Snijders, T. A. B., 2002. “Markov chain monte carlo estimation of exponential random graph models”. Journal of Social Structure, 3(2), pp. 1–40.

[48] Hunter, D. R., 2007. “Curved exponential family models for social networks”. Social networks, 29(2), pp. 216–230.

[49] Scrapy, 2016. Scrapy documentation. https://doc.scrapy.org. Accessed: 2018-11-25.

[50] Albert, R., and Barabasi, A.-L., 2002. “Statistical mechanics of complex networks”. Reviews of modern physics, 74(1), p. 47.

[51] Autodesk, 2013. Autodesk robot gripper arm design challenge. https://grabcad.com/challenges/autodesk-robot-gripper-arm-design-challenge. Accessed: 2018-11-25.

[52] Rabaconda, 2012. Dirtbike tire changing tool challenges. https://grabcad.com/challenges/dirtbike-tire-changing-tool-challenge. Accessed: 2018-11-25.

[53] Terwiesch, C., and Xu, Y., 2008. “Innovation contests, open innovation, and multiagent problem solving”. Manage. Sci., 54(9), pp. 1529–1543.

[54] Alcoa, 2016. Airplane bearing bracket challenge. https://grabcad.com/challenges/airplane-bearing-bracket-challenge. Accessed: 2018-11-25.

[55] Autodesk, 2013. Robot gripper test object challenge. https://grabcad.com/challenges/robot-gripper-test-object-challenge/entries. Accessed: 2018-11-25.

[56] Lazear, E. P., and Rosen, S., 1981. “Rank-order tournaments as optimum labor contracts”. Journal of political Economy, 89(5), pp. 841–864.

[57] Bloom, M., and Michel, J. G., 2002. “The relationships among organizational context, pay dispersion, and among managerial turnover”. Academy of Management Journal, 45(1), pp. 33–42.

[58] Harrison, D. A., and Klein, K. J., 2007. “What’s the difference? diversity constructs as separation, variety, or disparity in organizations”. Academy of management review, 32(4), pp. 1199–1228.

[59] Hvide, H. K., 2002. “Tournament rewards and risk taking”. Journal of Labor Economics, 20(4), pp. 877–898.

[60] Barabasi, A.-L., and Albert, R., 1999. “Emergence of scaling in random networks”. science, 286(5439), pp. 509–512.


[61] Fawcett, T., 2006. “An introduction to roc analysis”. Pattern Recognition Letters, 27(8), pp. 861–874. ROC Analysis in Pattern Recognition.

[62] Saito, T., and Rehmsmeier, M., 2015. “The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets”. PLOS ONE, 10(3), 03, pp. 1–21.

[63] Sha, Z., Chaudhari, A. M., and Panchal, J. H., 2018. “Modeling participation behaviors in design crowdsourcing using a bipartite network-based approach”. In ASME 2018 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. Volume 1A: 38th Computers and Information in Engineering Conference, p. V01AT02A031.
