
A Participant Recruitment Framework for Crowdsourcing based Software Requirement Acquisition

Hao Wang1,2, Yasha Wang1,3*, Jiangtao Wang1,2

1 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China

2 School of Electronics Engineering and Computer Science, Peking University, China

3 National Engineering Research Center of Software Engineering, Peking University, China

{wanghao13, wangys, wangjt10}@sei.pku.edu.cn

Abstract

The opportunity to leverage a crowdsourcing-based model to facilitate software requirements acquisition has been recognized as a way to exploit the diversity of talents and expertise available within the crowd. Identifying well-suited participants is a common issue in crowdsourcing systems, and requirements acquisition tasks call for participants with particular kinds of domain knowledge. However, current crowdsourcing systems fail to provide this kind of identification among participants. We observed that participants with a particular kind of domain knowledge often tend to cluster in particular spatiotemporal spaces. Based on this observation, we propose a novel opportunistic participant recruitment framework that enables organizers to recruit participants with the desired kind of domain knowledge more efficiently. We analyzed the feasibility of our opportunistic approach through both a theoretical study of an analytical model and simulated experiments based on real-world mobility models. The results show that our approach is feasible.

Keywords—Crowdsourcing, Opportunistic Network.

I. INTRODUCTION

Software engineering is more than just programming. Software requirements acquisition is a communication activity among stakeholders for acquiring requirements [1]. With the penetration of the Internet and mobile devices and the process of globalization, many software products are designed for a mass audience dispersed across different cultures, time zones, and continents, rather than for a small group of stakeholders. In traditional collocated project environments, the requirements analyst proactively engages stakeholders in a series of face-to-face activities [5]. However, when software products are designed for a mass audience around the globe, the traditional requirements acquisition process may not work. The dispersion of users across geographical locations is not the only difficulty; another is finding a way to recruit enough representative users for requirements acquisition. The power of the crowd lies in the diversity of talents and expertise available within it. A crowdsourcing-based model is therefore considered a promising method for software requirements acquisition, and its potential has been recognized [2].

Many crowdsourcing tasks published on the web have requirements that individuals must meet before they can take part [3], and requirements acquisition tasks are no exception.

* Corresponding Author

Requirements acquisition tasks call for participants with particular kinds of domain knowledge [4, 5, 6]. For example, a child-monitoring application may need to hear more from parents who are currently raising young children when prioritizing requirements, whereas high school students may not be suited to this kind of task.

Distributing tasks to individuals is not necessarily a daunting process. However, current crowdsourcing-based labor markets, such as InnoCentive, iStockPhoto, and Amazon Mechanical Turk, fail to provide this kind of identification among participants. Different software products focus on different user groups with various features, and these features often reveal particular kinds of domain knowledge relevant to requirements acquisition. Current labor markets capture such information only simplistically, through a registration process that invites participants to enumerate their characteristics, which is clearly not enough. (For example, people may consider it necessary to provide information about their occupational skills, but few will tell the system that they need to pick up a child from kindergarten after work or that they like fishing.) Yet this neglected information is often important for organizers to tell whether a participant possesses the domain knowledge required for requirements acquisition. Enabling organizers to identify well-suited participants with the required domain knowledge therefore remains an unsolved problem for current crowdsourcing systems, which should make more effort to understand participants in order to determine the domain knowledge each individual has.

Fortunately, the patterns of urban life can reveal many latent characteristics of people who cluster in particular spatiotemporal (ST) spaces [7, 8, 9, 10]. We observed that participants with a particular kind of domain knowledge often tend to cluster in a particular spatiotemporal space, i.e., the density of participants with that domain knowledge is high in particular spatiotemporal spaces and gradually decreases with deviation along the spatiotemporal dimensions. For example, commuters can be identified at transportation stations during rush hour; the aforementioned child-monitoring software may identify well-suited participants who appear at a kindergarten after school hours; and participants at Beijing's Financial Street (where many financial companies are clustered) at rush hour seem more likely to be financial practitioners than those at the Xi Erqi subway station (where many IT companies are located) at the same time.

2014 IEEE 9th International Conference on Global Software Engineering

978-1-4799-4360-9/14 $31.00 © 2014 IEEE

DOI 10.1109/ICGSE.2014.26

With the ubiquity of interactive mobile devices that provide location awareness and network connectivity, we expect this trend to accelerate. People carry their phones with them all day, which gives them the opportunity to contribute at any time [11] and allows their spatiotemporal information to be captured. However, building a mobile crowdsourcing system that identifies well-suited participants based on our observation poses several challenges:

• Location privacy concerns [11, 12] arise because the ST information of participants would have to be traced in real time so as not to miss the chance to capture participants in the target ST space for task distribution.

• Energy consumption issues [13, 14, 29] arise because overly frequent ST information gathering and uploading will drain the batteries of mobile devices quickly.

• Connectivity to the Internet via infrastructure (usually the cellular network) becomes a prerequisite for ST information uploading and task distribution, yet such connectivity is often limited, especially for low-income participants who may be sensitive to cellular charges [15, 16]. Moreover, the availability of the cellular network cannot be guaranteed, given that it is often overloaded at rush hour [17].

In this work, we focus on the recruitment issue, considered one of the four key challenges [18]: how to enable organizers to identify well-suited participants for software requirements acquisition based on spatiotemporal availability. To overcome the foregoing challenges, it is of utmost importance to reduce the dependence on real-time ST information gathering and uploading by individuals, as well as the dependence on the cellular network for task distribution. Inspired by Floating Content [19, 20, 21], a recently published approach in pervasive computing, our approach adopts its opportunistic philosophy and adapts it to our recruitment framework. By employing an opportunistic network for task distribution among participants, we expect the number of participants who obtain tasks directly from the server to be orders of magnitude smaller than the total number of participants who obtain the tasks. Furthermore, the crowdsourcing system can monitor the whole distribution process to evaluate the performance of the recruitment framework. We expect such a recruitment framework to help development teams distribute requirements tasks to target participants with particular domain knowledge anywhere in the world, and to support more sophisticated crowdsourcing tasks (e.g., software testing) on top of it, as long as well-suited participants can be identified based on this simple observation.

II. RELATED WORK

As far as we know, this is the first work that addresses the recruitment issue by leveraging the tendency of participants to cluster intrinsically in particular ST spaces. In this section, we detail related work on the two main aspects of the paper. First, we summarize the recruitment methods in current crowdsourcing systems. Second, we review recruitment methods based on opportunistic networks.

Crowdsourcing sites such as Amazon Mechanical Turk and GURU.com often recruit participants passively, i.e., organizers publish tasks on a web page and wait for participants to find them. In Amazon Mechanical Turk, work done by a participant is evaluated in terms of whether it was accepted or rejected by requesters, so identification can be performed by analyzing the participant's history of completed tasks. Many qualification tests are provided to participants and organizers, and organizers can specify qualification requirements that participants must meet before taking a task. In GURU.com, the technical skill, creativity, timeliness, and communication capabilities of a worker are recorded through a star-based rating system based on feedback from organizers [22]. Mobile crowdsourcing systems often leverage the spatiotemporal availability of individuals to recruit participants. Crowdphysics [23] focuses on how to crowdsource tasks that require participants to collaborate and synchronize in both time and physical space. CrowdPark [24] leverages ST information to enable individuals to buy parking spots before leaving home. The recruitment methods in current crowdsourcing systems are thus not suited to identifying participants with particular domain knowledge for software requirements acquisition. Furthermore, our work differs from these mobile crowdsourcing systems in that we leverage ST information to understand the characteristics of participants rather than to serve geographically bound task distribution, which makes our work more general.

To the best of our knowledge, the only work that leverages opportunistic networks for participant recruitment is [25], which shares a very similar idea with our approach. It was proposed to support participatory and opportunistic sensing: based on a behavior-oriented protocol, it spreads messages to groups of nodes matching a certain target behavioral profile in opportunistic networks. Unlike [25], our approach does not need to store participants' behavioral profiles (namely, their historical ST information), which addresses privacy concerns. Moreover, we evaluate our approach under different real-world mobility models, since the mobility model plays an important role in evaluating the feasibility of an opportunistic network.

III. SYSTEM DESIGN

We assume that all users are mobile participants and that there is no supporting infrastructure. Participants use mobile phones or other mobile devices to receive crowdsourcing tasks. According to our foregoing observation, participants with a particular kind of domain knowledge often tend to cluster in a particular ST space. Assuming that organizers can specify ST spaces based on this observation, in this section we detail the system design for distributing tasks to participants in these specified ST spaces and for monitoring the whole distribution process.

We are designing a crowdsourcing system that can air-drop a software requirements acquisition task to only a few participants clustering in an ST space, from where the task spreads by itself to the whole group.


There are three roles involved in the system. Organizers publish crowdsourcing tasks and are responsible for specifying the ST features of target participants. According to this specification, the crowdsourcing platform distributes tasks to participants who match the ST features and evaluates the performance of the whole distribution process. Participants upload ST information to the crowdsourcing platform periodically and may be identified as suitable participants for particular tasks published by organizers. The details are as follows:

A. Organizers

Organizers are software teams searching for suitable participants to complete requirements acquisition tasks. A task item I contains two separate parts: the Task Content (TC) and the Spatial Temporal Vector (STV).

TC includes the necessary task description and task body, as in a common task item published in a mobile crowdsourcing system.

STV includes the following control parameters: a lifetime (TTL), an effect time (EFT), and an anchor zone (AZ) defined by its center P and spread radius R. R defines the replication range within which participants always try to replicate I to other participants they encounter; the EFT and TTL pair defines the replication time window within which participants can replicate I to others. Organizers should therefore know the ST features of target participants, from common sense or field observation, when they set the STV. For example, suppose they want to collect requirements from football fans when designing a relevant mobile application. They have to know when and where football fans cluster, e.g., a football match held from 6:00 to 8:00 pm at the Capital Gymnasium. The longitude and latitude of the gymnasium then become the value of P, and an appropriate R should be specified to cover the intended AZ. The time of the match determines the values of EFT and TTL.
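To make the STV parameters concrete, here is a minimal Python sketch of a task item for the football example. The class and function names (`STV`, `TaskItem`, `stv_for_event`) are hypothetical, the date is arbitrary, and the coordinates and 500 m radius are placeholders rather than the gymnasium's surveyed position. The sketch assumes EFT is the match start and TTL is the match duration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class STV:
    """Spatial Temporal Vector of a task item (illustrative field names)."""
    center_lat: float   # P: latitude of the anchor-zone center, in degrees
    center_lon: float   # P: longitude of the anchor-zone center, in degrees
    radius_m: float     # R: replication range around P, in meters
    eft: datetime       # EFT: start of the replication time window
    ttl: timedelta      # TTL: length of the replication time window

    @property
    def window_end(self) -> datetime:
        """End of the window in which replication is allowed (EFT + TTL)."""
        return self.eft + self.ttl


@dataclass(frozen=True)
class TaskItem:
    content: str        # TC: task description and task body
    stv: STV


def stv_for_event(lat: float, lon: float, radius_m: float,
                  start: datetime, end: datetime) -> STV:
    """Derive an STV from an event at which the target participants gather."""
    if end <= start:
        raise ValueError("the event must end after it starts")
    return STV(lat, lon, radius_m, eft=start, ttl=end - start)


# Football match from 6:00 to 8:00 pm; date, coordinates and radius are placeholders.
match_stv = stv_for_event(lat=39.94, lon=116.33, radius_m=500.0,
                          start=datetime(2014, 6, 1, 18, 0),
                          end=datetime(2014, 6, 1, 20, 0))
task = TaskItem("Which features would you want in a match-day companion app?", match_stv)
```

Keeping the STV immutable mirrors the idea that the organizer fixes it once at publication time and every replica of I carries the same window and zone.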

B. Crowdsourcing Platform

Here we emphasize two major functions of the platform.

The first major function is participant identification and task distribution. On receiving a task item I, the platform assigns a unique serial number (SN) to I and resolves its STV. Based on the values of EFT, TTL, and AZ, the platform pushes I to participants who "happen to" upload ST information that matches the requirements and who do not have I yet. ST information contains a time stamp (T) and location (L) information. If EFT ≤ T ≤ EFT + TTL and L ∈ AZ, the ST information matches the STV. These participants are identified as super nodes in the target group. The platform expects super nodes to replicate I to the other participants (common nodes) they encounter in the same ST space, thereby collecting far more responses than the number of super nodes it pushes to directly. The platform decides whether a participant already has I by comparing the SNs uploaded by that participant, so a task is never pushed to participants who already have it.
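The matching rule and the push decision can be sketched as follows, assuming the AZ is a disc of radius R around P measured with the haversine great-circle distance, and that each upload carries T, L, and the set of SNs the device already holds. `STReport`, `matches`, and `should_push` are hypothetical names, not part of any published API.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass(frozen=True)
class STReport:
    """ST information uploaded by a device (hypothetical field names)."""
    timestamp: datetime     # T
    lat: float              # L, latitude in degrees
    lon: float              # L, longitude in degrees
    held_sns: frozenset     # SNs of task items the device already holds


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def matches(report: STReport, center_lat: float, center_lon: float,
            radius_m: float, eft: datetime, ttl: timedelta) -> bool:
    """EFT <= T <= EFT + TTL and L inside the anchor zone (disc of radius R around P)."""
    in_window = eft <= report.timestamp <= eft + ttl
    in_zone = haversine_m(report.lat, report.lon, center_lat, center_lon) <= radius_m
    return in_window and in_zone


def should_push(report: STReport, sn: int, center_lat: float, center_lon: float,
                radius_m: float, eft: datetime, ttl: timedelta) -> bool:
    """Push I only to matching uploaders that do not hold it yet (they become super nodes)."""
    if sn in report.held_sns:
        return False
    return matches(report, center_lat, center_lon, radius_m, eft, ttl)
```

Checking the SN list first is a cheap early exit; the distance computation only runs for uploaders that could actually become super nodes.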

The second major function is to monitor and evaluate the whole distribution process. Three key indicators are calculated: (1) the fraction of target participants in the AZ that have obtained I; (2) the total number of participants that have obtained I during the distribution process; and (3) the number of participants that obtained I directly from the platform. These are discussed in detail in Section IV.

C. Mobile Devices of Participants

Participants naturally have no prior knowledge of tasks before they are published. They neither need to search for tasks nor make any effort to obtain them; the only thing they have to do is make sure the crowdsourcing client is running in the background on their phones. For privacy, energy, and cellular traffic cost reasons, participants upload their ST information at a time interval (TI) that they select as acceptable. Participants can obtain tasks in two ways: (a) as super nodes, through the platform's push operation; or (b) as common nodes, from nearby participants via Bluetooth or other wireless interfaces. Participants spread a given task I according to its STV in a peer-to-peer manner whenever two nodes are within transmission range, regardless of where they obtained the task. The device periodically obtains its location, wraps it as ST information, and uploads it to the platform at the selected time interval. The SNs of all tasks already delivered to the device are uploaded along with the ST information, enabling the platform to tell whether the participant has task I. To minimize energy consumption, ST information can also be shared collaboratively, i.e., obtained from nearby devices within an acceptable accuracy range.

The spread protocol between nodes is derived from the Floating Content protocol and follows its so-called 4-phase protocol for exchanging floating content messages. In short, the task item "spreads" itself through the whole group but stays confined to a specific geographic range and lifetime; [19] gives a detailed description.
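The peer-to-peer step can be illustrated with a deliberately simplified encounter rule. This is not the Floating Content 4-phase protocol described in [19]; it assumes planar coordinates in meters, numeric time in minutes, a single task, and that replication happens only when both nodes are inside the anchor zone, within transmission range, and inside the EFT/TTL window. `Node` and `on_encounter` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A participant device in a simplified contact model (hypothetical)."""
    node_id: int
    x: float                                   # position in meters (local planar frame)
    y: float
    tasks: set = field(default_factory=set)    # SNs of task items held by this device


def in_anchor_zone(node: Node, cx: float, cy: float, radius: float) -> bool:
    """True if the node lies within the disc of radius R around P = (cx, cy)."""
    return (node.x - cx) ** 2 + (node.y - cy) ** 2 <= radius ** 2


def on_encounter(a: Node, b: Node, sn: int, now: float, eft: float, ttl: float,
                 cx: float, cy: float, radius: float, tx_range: float) -> bool:
    """Copy task `sn` from the node that holds it to the one that does not.

    Returns True if a replication happened. Simplified one-shot exchange;
    the real 4-phase protocol in [19] is not modelled.
    """
    if not (eft <= now <= eft + ttl):
        return False        # outside the replication time window
    if (a.x - b.x) ** 2 + (a.y - b.y) ** 2 > tx_range ** 2:
        return False        # the two devices are not within transmission range
    if not (in_anchor_zone(a, cx, cy, radius) and in_anchor_zone(b, cx, cy, radius)):
        return False        # replicate only inside the anchor zone
    holders = [n for n in (a, b) if sn in n.tasks]
    if len(holders) != 1:
        return False        # neither or both already hold the task
    a.tasks.add(sn)
    b.tasks.add(sn)
    return True
```

Note that the rule is symmetric: it does not matter whether the holder is `a` or `b`, matching the statement that participants spread I regardless of where they obtained it.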

IV. ANALYTICAL MODEL

In this section, we analyze our system quantitatively. Floating Content (FC) [19] has been studied analytically with the so-called non-spatial black-box model. Our approach requires additional considerations compared with the classic FC model: we must account for the time interval at which participants upload ST information (for privacy, cost, and energy reasons), and we must evaluate the distribution process.

A. Impact of different values of TI

In [19], the analytical model shows that, given a particular group of participants in a particular ST space, the distribution process reaches a steady state. In that state, the penetration of the information among target participants remains steady; we call the time needed to reach this state from the beginning the Convergence Time (CT). Here we revisit the model, taking TI into consideration, to see how different values of TI affect the steady-state penetration and the Convergence Time.

Consider the anchor zone as a locale that nodes enter, spend some time in, and finally exit. Participants who cluster intrinsically in a given ST space for any reason, e.g., parents waiting for their children at a school gate at 6:00 pm or tourists wandering around Tian'An Men Square, can be regarded as being in a certain anchor zone at a certain time. The number of participants in this anchor zone, denoted by N, is large enough and/or the density is stable enough that random fluctuations in N can be neglected. The mean sojourn time of participants is 1/μ, the mean time interval of uploading ST information to the platform is τ, and λ denotes the frequency with which a pair of participants come in contact, i.e., come within each other's transmission range, during the sojourn time. The number of participants uploading ST information per unit time is therefore N/τ. The contact process is further assumed to be a Markov process, and the fraction of participants carrying the task item I is denoted by p. There are then (1/2)N(N−1) ≈ (1/2)N² pairs, and the total rate of encounters is (1/2)λN². A fraction 2p(1−p) of these encounters are between a participant with I and one without it, in which case the participant without I obtains it, so the total rate of such events is λp(1−p)N². Since the total exit rate of participants is μN, the exit rate of participants carrying the task item is μNp. Moreover, according to the system design described in the previous section, the platform pushes I to participants who upload matching ST information but do not have I yet, so super nodes are created at rate N(1−p)/τ. The penetration p = p(t) therefore satisfies the following differential equation:

$$N \frac{dp}{dt} = \lambda N^{2} p(1-p) - \mu N p + \frac{N(1-p)}{\tau} \qquad (1)$$

The first two terms on the right-hand side are the same as in the typical Floating Content model [19], and the third term is the rate at which uploading participants without the task item obtain it directly from the platform. At a steady state, dp/dt = 0.
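Setting dp/dt = 0 in (1) and dividing by N yields a quadratic in the steady-state penetration p*. The short derivation below is our own sketch, assuming λ, μ, and τ stay constant during the distribution process. The right-hand side of (1) divided by N equals 1/τ > 0 at p = 0 and −μ < 0 at p = 1, so exactly one root lies in (0, 1), namely the one with the positive square root. Letting τ → ∞ recovers the FC steady state; with N = 100, μ = 1, and λ = 1/20 that limit is 0.8, matching the left panel of Fig. 1.

```latex
\begin{aligned}
0 &= \lambda N p^{*}(1-p^{*}) - \mu p^{*} + \frac{1-p^{*}}{\tau} \\
0 &= \lambda N (p^{*})^{2} - \Bigl(\lambda N - \mu - \tfrac{1}{\tau}\Bigr) p^{*} - \tfrac{1}{\tau} \\
p^{*} &= \frac{\Bigl(\lambda N - \mu - \tfrac{1}{\tau}\Bigr)
        + \sqrt{\Bigl(\lambda N - \mu - \tfrac{1}{\tau}\Bigr)^{2} + \tfrac{4\lambda N}{\tau}}}{2\lambda N} \\
\lim_{\tau \to \infty} p^{*} &= \max\Bigl(0,\; 1 - \frac{\mu}{\lambda N}\Bigr)
\end{aligned}
```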

Suppose that N = 100, the mean sojourn time 1/μ is normalized to one, and the pairwise contact rate λ and the time interval τ are free parameters. For simplicity, we use the same values of λ as in [26] for comparison, to check that our approach is never worse than the FC model regardless of the value of τ. Fig. 1 depicts solutions for the penetration p(t) as a function of time t and time interval τ with initial value p(0) = 1/N (the first participant obtains the task from the platform at the beginning). In the left panel, λ is chosen as in [26] so that p(t) → 0.8 as t → ∞, i.e., on average 80% of the participants have the task, and the CT is about 2 minutes. In the right panel, 20% of the nodes acquire the task and the CT is about 8 minutes. Clearly, as the time interval tends to infinity, the penetration and convergence time approach those of the FC model. When the contact frequency λ is relatively large, i.e., 1/20 in the left panel, the time interval does not play an important role in the performance. However, when λ is relatively small, i.e., 1/80 in the right panel, a small time interval dramatically decreases the CT and raises the penetration level. Based on Fig. 1, we suggest that the time intervals offered to participants should be around 50 minutes, and that some incentive policy should be employed to encourage participants who configure shorter intervals.
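The closed-form steady state can be evaluated for the Fig. 1 settings with a few lines of Python. The function name `steady_state` and the sampled τ values are illustrative assumptions. Passing `tau=math.inf` drops the platform-push term, so the function also reproduces the plain FC steady state 1 − μ/(λN) when that quantity is positive, and 0 otherwise.

```python
import math


def steady_state(lam: float, n: int, mu: float, tau: float = math.inf) -> float:
    """Steady-state penetration p* of eq. (1): the root of
    lam*n*p*(1 - p) - mu*p + (1 - p)/tau = 0 that lies in [0, 1].

    tau = math.inf removes the platform-push term (plain FC model).
    """
    a = lam * n
    push = 0.0 if math.isinf(tau) else 1.0 / tau
    b = a - mu - push
    return (b + math.sqrt(b * b + 4.0 * a * push)) / (2.0 * a)


# Fig. 1 settings: N = 100, mu = 1, lambda = 1/20 (left panel) and 1/80 (right panel).
for lam in (1 / 20, 1 / 80):
    for tau in (10, 50, 200, math.inf):
        print(f"lambda={lam:.4f}  tau={tau}:  p* = {steady_state(lam, 100, 1.0, tau):.3f}")
```

The printed table should show the effect described above: shortening τ barely moves p* when λ = 1/20 but raises it noticeably when λ = 1/80.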

B. Evaluating the distribution process

In the FC model [19], publishers have no means to monitor the distribution process, because the value of p(t) cannot be measured. Our approach can capture this information by recording the data that participants upload every time interval. The platform can calculate the mean time interval τ, because each device can upload its configured interval every time it reports ST information to the platform. We assume that the time intervals follow a normal distribution with mean TI, and further that, if appropriately designed, the system can lead participants to choose time intervals around a particular value, which becomes the mean of that distribution.

Several indicators help the platform and organizers monitor the whole distribution process:

• Penetration p(t), which estimates the fraction of participants currently in the target ST space who have the task item.

• The number of participants (S) who have obtained the task item during the whole distribution process.

• The number of participants (L) who obtained the task item directly from the platform during the whole distribution process.

Fig. 1. Solutions to (1) with different time intervals

Assume the platform has calculated the mean time interval τ. By recording the number of participants (n) uploading ST information from the target ST space per unit time, the number of participants (N) in the target ST space can be estimated as:

$$N = n \cdot \tau \qquad (2)$$

Assuming that N in this anchor zone is large enough and/or the density is stable enough that random fluctuations in N can be neglected, p(t) can be calculated by recording the number of participants (s) who upload ST information from this anchor zone per unit time and already have the task item:

$$p(t) = \frac{s}{n} \qquad (3)$$

As shown in Section IV-A, p(t) reaches a steady state, so CT can be derived. If TTL is much longer than CT (e.g., TTL is 10 minutes and CT is about 2 minutes, as in the left panel of Fig. 1), S can be roughly calculated by:

$$S = N \cdot p(CT) \cdot t \cdot \mu \qquad (4)$$

where p(CT) is the value of p(t) at the steady state, i.e., at t = CT. The model parameters can be obtained in two ways: from field observation or by derivation from a mobility model [5, 6, 7].
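Equations (2) to (4) translate directly into platform-side estimators. The sketch assumes the platform aggregates uploads from the target ST space over a fixed observation window; `UploadWindow` and the function names are hypothetical. Because n and s in (3) are both per-unit-time counts over the same window, their ratio equals the ratio of the raw counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadWindow:
    """Uploads aggregated from the target ST space over one window (hypothetical)."""
    duration: float   # length of the observation window, in minutes
    uploads: int      # uploads received from the target ST space in the window
    holders: int      # of those, uploads whose SN list already contains the task


def estimate_population(w: UploadWindow, mean_interval: float) -> float:
    """Eq. (2): N = n * tau, with n the upload rate per unit time."""
    n = w.uploads / w.duration
    return n * mean_interval


def estimate_penetration(w: UploadWindow) -> float:
    """Eq. (3): p(t) = s / n, the share of uploaders already holding the task."""
    if w.uploads == 0:
        raise ValueError("no uploads in the window; p(t) cannot be estimated")
    return w.holders / w.uploads


def steady_state_reach(n_pop: float, p_ct: float, mu: float, t: float) -> float:
    """Eq. (4): S = N * p(CT) * t * mu, reasonable only when TTL >> CT."""
    return n_pop * p_ct * t * mu
```

For instance, with the Xi Erqi values used in Section V (N = 112, p(CT) = 0.85, μ = 1/3), `steady_state_reach(112, 0.85, 1 / 3, 31.5)` gives roughly 1000 participants.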

However, when it takes a relatively long time to reach the steady state and TTL is not much longer than CT (e.g., TTL is 10 minutes but CT is about 8 minutes, as in the right panel of Fig. 1), equation (4) becomes inaccurate. In that case, S should be calculated by the following integral:

$$S = N \cdot \mu \cdot \int_{0}^{t} p(u)\,du \qquad (5)$$

Fig. 2. Number of participants that obtained the task

p(t) can be obtained by solving the differential equation (1). Fig. 2 takes the setting of the right panel of Fig. 1 as an example to illustrate the number of participants who obtained the task item during the whole distribution process, with the time interval set to 50 minutes. According to formula (5), S is about 138 at 10 minutes, whereas the simplified equation (4) accounts for only 44 (138 − 94) participants, because the 94 participants who obtained the task during the convergence procedure are neglected by equation (4), making its result inaccurate.

So if the organizer expects X participants to be involved, the critical condition is:

$$S = N \cdot \mu \cdot \int_{0}^{TTL} p(t)\,dt \ge X \qquad (6)$$

The number of participants who obtain the task item directly from the platform during the whole distribution process is:

$$L = \int_{0}^{TTL} \frac{N}{\tau}\,\bigl(1 - p(t)\bigr)\,dt \le \frac{N}{\tau} \cdot TTL \qquad (7)$$
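Because (1) has constant coefficients here, p(t) and the integrals in (5) and (7) are easy to compute numerically. The following sketch uses a fixed-step fourth-order Runge-Kutta scheme with trapezoidal accumulation; this is our choice of method, not necessarily the one used to draw Fig. 2, and `simulate` and its step size are assumptions. For the Fig. 2 setting it should give S close to the 138 read off the figure at t = 10, with L well below the bound in (7).

```python
def simulate(n_pop, mu, lam, tau, t_end, dt=0.001):
    """Integrate eq. (1) from p(0) = 1/N with RK4 and accumulate eqs. (5) and (7).

    Returns (p, S, L) at t_end, where S = N * mu * int p dt and
    L = (N / tau) * int (1 - p) dt, both integrated with the trapezoid rule.
    """
    def dpdt(p):
        # eq. (1) divided by N
        return lam * n_pop * p * (1 - p) - mu * p + (1 - p) / tau

    p = 1.0 / n_pop
    integral_p = 0.0        # running integral of p(t)
    integral_not_p = 0.0    # running integral of 1 - p(t)
    for _ in range(round(t_end / dt)):
        k1 = dpdt(p)
        k2 = dpdt(p + 0.5 * dt * k1)
        k3 = dpdt(p + 0.5 * dt * k2)
        k4 = dpdt(p + dt * k3)
        p_next = p + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        integral_p += 0.5 * dt * (p + p_next)
        integral_not_p += 0.5 * dt * ((1 - p) + (1 - p_next))
        p = p_next
    s_total = n_pop * mu * integral_p            # eq. (5)
    direct = (n_pop / tau) * integral_not_p      # eq. (7)
    return p, s_total, direct


# Fig. 2 setting: N = 100, mu = 1, lambda = 1/80, tau = 50, evaluated at t = 10.
p10, s10, l10 = simulate(n_pop=100, mu=1.0, lam=1 / 80, tau=50, t_end=10)
```

Checking (6) for a target X then reduces to comparing the returned S at t = TTL with X.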

V. SIMULATED EXPERIMENT

In this section, we present two experiments using real-world mobility data collected at Xi Erqi subway station and Tian Anmen square to evaluate the feasibility of our approach. Furthermore, we validate our recruitment framework by comparing it with a simple random task distribution approach.

We chose Xi Erqi subway station and Tian Anmen square as experimental environments to give a comprehensive evaluation of the feasibility of our approach, because they represent two well-known mobility models. The area of Xi Erqi subway station is relatively small, but the density of people clustered there is relatively high. People waiting to enter the station often follow the direct mobility model (DMM) [26], in which all directions are equally likely. Furthermore, many IT companies are clustered around Xi Erqi, and we observed that most passengers waiting for trains at rush hour are IT-relevant workers. In contrast, the area of Tian Anmen square is large, but the density of people there is relatively low. People wandering in the square often follow the random waypoint model (RWP) [27]. People in Tian Anmen square are often tourists, so if you are developing navigation applications for tourists, or other relevant apps, these people may be good candidates.

To validate our recruitment framework, we compare our approach with a random distribution framework. The latter pushes task items to randomly selected participants, i.e., the platform ignores the participants' ST information and tasks are not spread among participants. The following indicators are evaluated for comparison:

• Complete Time: the time spent distributing tasks to a given number of target participants at a given distribution rate of the platform.

• Direct connectivity number: the number of participants who obtain the task item directly from the platform during the whole distribution process. It can be estimated by formula (7).

[Fig. 2 plot: number of participants S that received the task versus time t (0 to 10); analytical model with N = n·τ = 100, μ = 1, λ = 1/80, τ = 50; marked points at t = 8.014 (S = 94.64) and t = 9.977 (S = 138).]

The Complete Time is used to evaluate whether our approach accelerates participant recruitment. The direct connectivity number reflects the degree of dependence on connectivity between the mobile devices and the platform (connectivity consumes cellular traffic, which raises energy and traffic-cost concerns).

A. Subway Station

For simplicity, we model this setting by focusing only on the flow of people waiting to enter the station. Fig. 3 shows the people flow at about 17:30 at Xi Erqi station, when workers are waiting for trains to go back home.

We observed the flow at the entrance. The target anchor zone is then confined to a region of approximately A = 600 m² with side lengths of 10 m × 60 m. The number of participants N in this area is about 112, and the transmission range d is 10 m. The mean sojourn time is about 3 minutes, and people move at a constant speed of v = 0.5 m/s. According to [26], the frequency λ with which a pair of participants come in contact can be derived as:

$$\lambda = \frac{8 d v}{\pi A} = \frac{8 \times 10 \times 0.5}{\pi \times 600} \approx 0.0212 \approx 0.02$$
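The contact-rate formula is simple enough to check directly. `pairwise_contact_rate` is a hypothetical helper that just evaluates λ = 8dv/(πA) with the units stated in the text; it performs no unit conversion. The same call with d = 10, v = 1, and A = 100,000 gives the Tian Anmen value of about 0.00025 used later in this section.

```python
import math


def pairwise_contact_rate(d: float, v: float, area: float) -> float:
    """lambda = 8 d v / (pi A), the pairwise contact frequency used in the text (after [26]).

    d: transmission range (m), v: walking speed (m/s), area: anchor-zone area (m^2).
    """
    return 8.0 * d * v / (math.pi * area)


# Xi Erqi entrance: d = 10 m, v = 0.5 m/s, A = 600 m^2 (the text rounds the result to 0.02).
xi_erqi = pairwise_contact_rate(d=10.0, v=0.5, area=600.0)
```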

Fig. 4. Evaluation at Xi Erqi subway station

Fig. 4 shows the simulated performance of the system at Xi Erqi subway station. The Convergence Time is about 4 minutes (assuming a time interval of about 50 minutes), and the steady-state penetration is about 0.85, i.e., p(CT) = 0.85. Equation (4) is therefore adequate for estimating S. For example, if the organizer wants to know when the task item will have been distributed to about 1000 participants in that anchor zone, the time can be derived as:

$$t = \frac{X}{N \cdot p(CT) \cdot \mu} = \frac{1000}{112 \times 0.85 \times \frac{1}{3}} \approx 31.5$$

After 31.5 minutes, more than 1000 participants have obtained the task item.
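Inverting equation (4) gives the time needed to reach X participants. The helper name `time_to_reach` is hypothetical, and the inversion is only reasonable here because CT (about 4 minutes) is much shorter than the resulting time.

```python
def time_to_reach(target: float, n_pop: float, p_ct: float, mu: float) -> float:
    """Invert eq. (4): t = X / (N * p(CT) * mu). Valid only when TTL >> CT."""
    return target / (n_pop * p_ct * mu)


# Xi Erqi station: X = 1000, N = 112, p(CT) = 0.85, mu = 1/3 per minute.
t_xi_erqi = time_to_reach(target=1000, n_pop=112, p_ct=0.85, mu=1 / 3)
```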

Suppose that the density of target participants (IT-relevant workers) under random selection is a, and the density of target participants in the particular ST space (Xi Erqi station) is a + Δ. Then the Complete Time T1 of our approach is:

$$T_1 = \frac{X}{N \cdot p(CT) \cdot \mu \cdot (a + \Delta)}$$

With the random selection based approach, the Complete Time T2 is:

$$T_2 = \frac{X}{a} \div \frac{N}{\tau} = \frac{X \cdot \tau}{N \cdot a}$$

Here N/τ is the same distribution rate of the platform for both approaches; it is the number of participants to which the platform pushes the task item per unit time. Therefore:

$$\frac{T_2}{T_1} = p(CT) \cdot \mu \cdot \tau \cdot \frac{a + \Delta}{a} = 0.85 \times \frac{1}{3} \times 50 \times \frac{a + \Delta}{a} \approx 14 \cdot \frac{a + \Delta}{a} \ge 14$$

This means that, to recruit a given number of target participants, our approach needs a Complete Time at least 14 times shorter than the random selection based approach in this situation. We surveyed 100 passengers entering the station at rush hour and found that almost 100% of them work in IT-relevant companies, and 78% of them are technical practitioners (people in IT companies may also hold non-technical jobs). The Beijing Municipal Bureau of Statistics has reported [28] that the number of IT practitioners in Beijing in 2013 was about 700,000 out of about 11,073,000 practitioners in total, i.e., about 6% of practitioners in Beijing are IT practitioners. We can then roughly estimate that (a + Δ)/a ≈ 78%/6% ≈ 13, and

$$\frac{T_2}{T_1} \approx 13 \times 14 = 182$$

That means our approach can significantly accelerate the recruitment of target participants in this situation.

[Fig. 4 plot: penetration p versus time t and time interval τ at Xi Erqi subway station, with N = 112, μ = 1/3, and λ = 0.02.]

Fig. 3. People flow at Xi Erqi station
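The speedup ratio is plain arithmetic on the quantities above. The sketch below assumes, as in the text, a + Δ ≈ 78% from the survey and a ≈ 6% from the statistics bureau figure; `speedup` is a hypothetical helper. Without rounding the two factors, the product comes out slightly above the 182 quoted in the text.

```python
def speedup(p_ct: float, mu: float, tau: float, enrichment: float) -> float:
    """T2 / T1 = p(CT) * mu * tau * (a + delta) / a.

    enrichment is (a + delta) / a: target density inside the ST space divided by
    the target density under random selection.
    """
    return p_ct * mu * tau * enrichment


# Lower bound (enrichment = 1) and the survey-based estimate for Xi Erqi station.
base = speedup(p_ct=0.85, mu=1 / 3, tau=50, enrichment=1.0)
with_survey = speedup(p_ct=0.85, mu=1 / 3, tau=50, enrichment=0.78 / 0.06)
```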

Furthermore, the direct connectivity number D of our approach during the whole distribution process (until 1000 participants in that anchor zone have obtained the task item) is:

$$D = \int_{0}^{t} \frac{N}{\tau}\,\bigl(1 - p(u)\bigr)\,du \le \frac{N}{\tau} \cdot t \approx 70$$

Hence no more than 7% (70/1000) of the participants obtained the task item directly from the platform, and at least 93% obtained it from surrounding peers. Our approach therefore removes the strong dependence on the cellular network by leveraging the opportunistic network.
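The bound on D follows from (7) by taking p(t) = 0 throughout, which can only overestimate the number of direct pushes. `direct_connectivity_bound` is a hypothetical helper; t = 31.5 minutes is the time needed to reach 1000 participants at Xi Erqi station.

```python
def direct_connectivity_bound(n_pop: float, tau: float, t: float) -> float:
    """Upper bound from eq. (7): D <= (N / tau) * t, attained only if p stayed at 0."""
    return n_pop / tau * t


d_bound = direct_connectivity_bound(n_pop=112, tau=50, t=31.5)   # about 70 direct pushes
share = d_bound / 1000                                             # fraction of the 1000 recipients
```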

B. City Square

In this case, we want to evaluate whether the system remains feasible at a large scale with a low density of people. Moreover, we are interested in the performance when λ (the frequency with which a pair of participants come in contact) is very low, because the analytical model in Section IV suggests that a very low λ leads to a long convergence time and a small steady-state penetration. Fig. 5 shows the people flow at Tian Anmen square.

The whole area of Tian Anmen square is very large. For simplicity, we limit ourselves to a region of approximately A = 100,000 m² in front of the Tian Anmen gate tower, with side lengths of 200 m × 500 m. The number of participants is N ≈ 500, and the transmission range is 10 m. The sojourn time we observed is more than 10 minutes (we traced several groups of people and individuals and found that most of them spend more than 10 minutes in the anchor zone). We choose 10 minutes as the sojourn time because we want to see whether our approach still works well under the worst situation and to evaluate its performance there. People here move at a speed of v = 1 m/s. We choose the City Square model [26] based on the observation that people in Tian Anmen square follow the random waypoint mobility model. The frequency with which a pair of participants come in contact can be derived in the same way as for Xi Erqi station:

$$\beta = \frac{8\,r\,v}{\pi A} = \frac{8 \times 10 \times 1}{\pi \times 100000} \approx 0.00025$$
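The arithmetic above can be checked with a short Python sketch. The function name `pairwise_contact_rate` is hypothetical; the formula is simply the one stated above, with r the transmission range, v the walking speed and A the area. The unrounded value is about 0.000255, which the text rounds to 0.00025.

```python
import math


def pairwise_contact_rate(transmission_range_m: float, speed_m_per_s: float,
                          area_m2: float) -> float:
    """Pairwise contact-rate approximation beta = 8 * r * v / (pi * A)."""
    return 8 * transmission_range_m * speed_m_per_s / (math.pi * area_m2)


# Tian Anmen Square parameters from the text: r = 10 m, v = 1 m/s, A = 100,000 m^2.
beta = pairwise_contact_rate(10, 1, 100_000)
print(f"{beta:.6f}")  # 0.000255
```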

Fig. 6. Evaluation at Tian Anmen square

Fig. 6 shows the simulated performance of the system at Tian Anmen Square. With a time interval of about 50 minutes, the Convergence Time is about 45 minutes and the penetration at steady state is about 0.4, i.e., p(CT) ≈ 0.4.

Fig. 7. Number of participants who obtained the task

Because the Convergence Time is relatively long, Equation (4) is not accurate for estimating S when the TTL is not very long.

[Fig. 6 plot: penetration p versus time t and versus time interval τ; Tian Anmen Square: N = 500, μ = 1/10, β = 0.00025]

[Fig. 7 plot: number of participants who received the task, S, versus time t; Tian Anmen Square: N = 500, μ = 1/10, β = 0.00025, τ = 50]

Fig. 5. People flow at Tian Anmen square


Using the integral formula (5), Fig. 7 illustrates how the distribution process evolves over time.

After 60 minutes, the distribution process has spread the task item to about 1000 participants.

Here we evaluate our approach in the same way as for the Xi Erqi subway station. Suppose that the density of target participants (tourists) under random selection is a, and the density of target participants in the particular ST space (Tian Anmen Square) is a + Δ. Then the Complete Time T1 of our approach is:

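$$T_1 = \frac{60\,K}{S\,(a+\Delta)}$$

where K is the given number of target participants to be recruited and S ≈ 1000 is the number of participants who have received the task item after 60 minutes.

With the random-selection-based approach, the Complete Time T2 is:

$$T_2 = \frac{K}{a} \div \frac{N}{\tau} = \frac{K\,\tau}{N\,a}$$

where τ = 50 minutes is the time interval. Then

$$\frac{T_2}{T_1} = \frac{S\,\tau}{60\,N}\cdot\frac{a+\Delta}{a} = \frac{1000 \times 50}{500 \times 60}\cdot\frac{a+\Delta}{a} \approx 1.67\cdot\frac{a+\Delta}{a} \geq 1.67$$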

This means that, to recruit a given number of target participants in this situation, the random-selection-based approach needs at least 1.67 times the Complete Time of our approach. We surveyed 100 people at Tian Anmen Square and found that 87% of them (a + Δ ≈ 0.87) were tourists, but it is difficult to estimate the mean density a of target participants (tourists) in the whole city. However, it is reasonable to assume that the mean density a is much smaller than 87%, so that (a + Δ)/a ≫ 1 and T2/T1 ≫ 1.67.
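The ratio T2/T1 = Sτ(a + Δ)/(60 N a) is easy to tabulate for different assumed densities. The sketch below is a hypothetical helper, not part of the framework; since the paper does not estimate the city-wide tourist density a, the value a = 0.1 in the second usage line is purely illustrative.

```python
def complete_time_ratio(received: int, interval_min: float, crowd_size: int,
                        spread_min: float, density_random: float,
                        density_zone: float) -> float:
    """Ratio T2 / T1 = (S * tau) / (N * spread_min) * (a + delta) / a.

    received        S, participants reached by the opportunistic spread
    interval_min    tau, the time interval in minutes
    crowd_size      N, participants present in the anchor zone
    spread_min      minutes the spread took to reach S participants
    density_random  a, target density under random selection
    density_zone    a + delta, target density inside the anchor zone
    """
    if density_random <= 0 or density_zone < density_random:
        raise ValueError("need 0 < a <= a + delta")
    return (received * interval_min) / (crowd_size * spread_min) * (density_zone / density_random)


# Lower bound when a + delta == a (no clustering advantage).
print(round(complete_time_ratio(1000, 50, 500, 60, 0.87, 0.87), 2))  # 1.67
# Hypothetical city-wide tourist density a = 0.1, purely for illustration.
print(round(complete_time_ratio(1000, 50, 500, 60, 0.1, 0.87), 2))   # 14.5
```

Because the ratio grows linearly in (a + Δ)/a, any estimate of the city-wide density a can be plugged in directly to see how far above the 1.67 lower bound the speed-up lies.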

Furthermore, the Direct connectivity number D of our approach over the whole distribution process (until 1000 participants in the anchor zone have obtained the task item) is:

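$$D = \int_{0}^{T} \lambda\,\bigl(1 - p(t)\bigr)\,dt = 10 \int_{0}^{T} \bigl(1 - p(t)\bigr)\,dt \approx 400$$

where T ≈ 60 minutes is the end of the distribution process.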

Thus about 40% (400/1000) of the participants obtained the task item directly from the platform, and the remaining 60% obtained it from surrounding peers.

VI. LIMITATIONS AND FUTURE WORK

This study has the following limitations. Our recruitment framework is inevitably biased by the availability of participants' spatiotemporal information. Although we showed that our approach removes the dependence on participants' real-time spatiotemporal information (by allowing participants to configure a time interval) and leverages the opportunistic network to distribute tasks among participants, a quantitative analysis of energy cost still needs to be carried out. Moreover, we evaluated our approach with a highly abstract model that captures only the essential elements. The buffer size of smartphones, the size of task items suitable for distribution, and many other details are not considered in this study. Our future work will address these details in order to build a system based on our approach.

VII. APPLICATIONS

Much published work has focused on a global RE scenario in which software is designed for a particular group of stakeholders, with RE teams and stakeholders geographically distributed. Such scenarios can roughly be regarded as outsourced RE. However, with the penetration of the Internet and mobile devices and the process of globalization, many software products are designed for a mass audience dispersed across different cultures, time zones, and continents. To understand users' demands comprehensively, it is necessary to investigate enough users from all over the world for requirements acquisition (e.g., to account for localization and customization needs). In this section, we give an application scenario to illustrate how our recruitment framework can be used for requirements acquisition when software products are designed for a mass audience around the world.

Suppose that a development team is designing gaming software for football betting. Software localization and customization are needed because of differences in local policy and culture, or even competition from similar local software products. It is therefore necessary to investigate enough football fans from different regions for requirements acquisition. Obviously, it is not cost-efficient for the development team to fly to different regions all over the world to survey football fans. Furthermore, current online crowdsourcing systems such as Amazon Mechanical Turk can help distribute tasks to participants all over the world but fail to identify participants with the required domain knowledge. With the recruitment framework proposed in this work, the development team can distribute tasks to football fans by telling the system when and where target participants are clustering (e.g., an important football match will be held at a particular time and place). From the development team's point of view, they can thus distribute requirements tasks to clustered target participants from different regions all over the world in an extremely cost-efficient way.

By enabling developers to identify participants with the required domain knowledge, we expect that the crowdsourcing-based RE model can take a first step toward inviting non-programmers all over the world to contribute to software engineering.

VIII. CONCLUSION

This work addresses the recruitment issue of how to enable organizers to identify well-suited participants for software requirements acquisition based on the spatiotemporal availability of participants. Requirements acquisition tasks call for participants with a particular kind of domain knowledge. This work builds on the observation that participants with a particular kind of domain knowledge often have the opportunity to cluster in particular spatiotemporal spaces. Based on this observation, we propose a novel opportunistic participant recruitment framework to enable organizers to recruit


participants with the desired kind of domain knowledge more efficiently.

To evaluate the feasibility of our approach, we conducted both a theoretical study based on an analytical model and a simulation experiment based on a real-world mobility model. In the theoretical study, we built an analytical model of our approach to compare its performance with the original floating content approach. Using exactly the same parameter values as the original approach, we showed that our approach performs better in both Convergence Time and Penetration regardless of the time interval. Furthermore, we illustrated how the time interval affects performance and proposed a way to evaluate the whole distribution process. In the simulation study, we placed our recruitment framework in real-world settings. We evaluated our approach in two scenarios (with different mobility models and crowd densities), validating the feasibility of the underlying opportunistic approach of our recruitment framework. Furthermore, compared with the simple random selection approach (distributing tasks to randomly selected participants), our approach significantly shortens the Complete Time needed to recruit target participants and, as measured by the Direct connectivity number, reduces the dependence on the cellular infrastructure.

Identifying well-suited participants is a common issue in crowdsourcing systems. We expect that such a recruitment framework can help development teams distribute requirements tasks to target participants with particular domain knowledge in any corner of the world, and can support more sophisticated crowdsourcing tasks (e.g., software testing) built on top of it, as long as well-suited participants can be identified based on the observation made in this work.

ACKNOWLEDGMENT

This work is funded by the National High Technology Research and Development Program of China (863) under Grant No. 2013AA01A605, the National Basic Research Program of China (973) under Grant No. 2011CB302604 and the National Natural Science Foundation of China under Grant No. 61121063.

REFERENCES

[1] Maiden, N. A. M., & Rugg, G. (1996). ACRE: selecting methods for requirements acquisition. Software Engineering Journal, 11(3), 183-192.

[2] Adepetu, Adedamola, Khaja Altaf Ahmed, Yousif Al Abd, Aaesha Al Zaabi, and Davor Svetinovic. "CrowdREquire: A Requirements Engineering Crowdsourcing Platform." In 2012 AAAI Spring Symposium Series. 2012.

[3] Howe, J. (2006). The rise of crowdsourcing. Wired magazine, 14(6), 1-4.

[4] Gottesdiener E. Requirements by Collaboration: Workshops for Defining Needs. Boston: Addison-Wesley, 2002.

[5] Wiegers K E. Software Requirements. 2nd ed. Redmond: Microsoft Press, 2003.

[6] Robertson S, Robertson J C. Mastering the Requirements Process. 2nd ed. Boston: Addison-Wesley, 2006.

[7] Moreira, Waldir, Paulo Mendes, and Susana Sargento. "Opportunistic routing based on daily routines." In World of wireless, mobile and multimedia networks (WoWMoM), 2012 IEEE international symposium on a, pp. 1-6. IEEE, 2012.

[8] Song, Chaoming, Zehui Qu, Nicholas Blumm, and Albert-László Barabási. "Limits of predictability in human mobility." Science 327, no. 5968 (2010): 1018-1021.

[9] Brockmann, Dirk, Lars Hufnagel, and Theo Geisel. "The scaling laws of human travel." Nature 439, no. 7075 (2006): 462-465.

[10] Ahas, Rein, Anto Aasa, Siiri Silm, and Margus Tiru. "Daily rhythms of suburban commuters’ movements in the Tallinn metropolitan area: case study with mobile positioning data." Transportation Research Part C: Emerging Technologies 18, no. 1 (2010): 45-54.

[11] Alt, F., Shirazi, A. S., Schmidt, A., Kramer, U., & Nawaz, Z. (2010, October). Location-based crowdsourcing: extending crowdsourcing to the real world. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (pp. 13-22). ACM.

[12] Wang, Yang, Yun Huang, and Claudia Louis. "Respecting User Privacy in Mobile Crowdsourcing." SCIENCE 2, no. 2 (2013): pp-50.

[13] Chatzimilioudis, Georgios, Andreas Konstantinidis, Christos Laoudias, and Demetrios Zeinalipour-Yazti. "Crowdsourcing with smartphones." Internet Computing, IEEE 16, no. 5 (2012): 36-44.

[14] Yang, Dejun, Guoliang Xue, Xi Fang, and Jian Tang. "Crowdsourcing to smartphones: incentive mechanism design for mobile phone sensing." In Proceedings of the 18th annual international conference on Mobile computing and networking, pp. 173-184. ACM, 2012.

[15] Gupta, Aakar, William Thies, Edward Cutrell, and Ravin Balakrishnan. "mClerk: enabling mobile crowdsourcing in developing regions." In Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems, pp. 1843-1852. ACM, 2012.

[16] Narula, Prayag, Philipp Gutheim, David Rolnitzky, Anand Kulkarni, and Bjoern Hartmann. "MobileWorks: A Mobile Crowdsourcing Platform for Workers at the Bottom of the Pyramid." In Human Computation. 2011.

[17] Bao, Xuan, Yin Lin, Uichin Lee, Ivica Rimac, and Romit Roy Choudhury. "DataSpotting: Exploiting naturally clustered mobile devices to offload cellular traffic." In INFOCOM, 2013 Proceedings IEEE, pp. 420-424.

[18] Doan, A., Ramakrishnan, R., & Halevy, A. Y. Crowdsourcing systems on the world-wide web. Communications of the ACM, 54(4), 86-96.

[19] Ott, J., Hyytia, E., Lassila, P., Vaegs, T., & Kangasharju, J. (2011, March). Floating content: Information sharing in urban areas. In IEEE International Conference on Pervasive Computing and Communications (PerCom), 136-146.

[20] Hyytia, E., Virtamo, J., Lassila, P., Kangasharju, J., & Ott, J. When does content float? Characterizing availability of anchored information in opportunistic content sharing. In INFOCOM, 2011 (pp. 3137-3145)

[21] J. Ott, Aalto University. Floating Content. Retrieved on 18th Jan, 2014 from: http://www.floating-content.net/

[22] Reddy, Sasank, Deborah Estrin, and Mani Srivastava. "Recruitment framework for participatory sensing data collections." In Pervasive Computing, pp. 138-155. Springer Berlin Heidelberg, 2010.

[23] Sadilek, A., Krumm, J. & Horvitz, E. (2013). Crowdphysics: Planned and Opportunistic Crowdsourcing for Physical Tasks. In E. Kiciman, N. B. Ellison, B. Hogan, P. Resnick & I. Soboroff (eds.), ICWSM.

[24] Yan, T., Hoh, B., Ganesan, D., Tracton, K., Iwuchukwu, T., & Lee, J. S. (2011). CrowdPark: A crowdsourcing-based parking reservation system for mobile phones. University of Massachusetts at Amherst Tech. Report.

[25] Tuncay, G. S., Benincasa, G., & Helmy, A. (2013). Autonomous and distributed recruitment and data collection framework for opportunistic sensing. ACM SIGMOBILE Mobile Computing and Communications Review, 16(4), 50-53.

[26] Desta, Michael Solomon, Esa Hyytia, Jorg Ott, and Jussi Kangasharju. "Characterizing content sharing properties for mobile users in open city squares." In Wireless On-demand Network Systems and Services (WONS), 2013 10th Annual Conference on, pp. 147-154. IEEE, 2013.

[27] C. Bettstetter, H. Hartenstein, and X. Pérez-Costa, “Stochastic properties of the random waypoint mobility model,” ACM/Kluwer Wireless Networks, vol. 10, no. 5, Sep. 2004.

[28] Beijing Municipal Bureau of statistics. Beijing Statistical Yearbook. Retrieved on 18th Jan, 2014 from: http://www.bjstats.gov.cn/nj/main/2013-tjnj/index.htm

[29] Ganti, R. K., Ye, F., & Lei, H. (2011). Mobile crowdsensing: Current state and future challenges. Communications Magazine, IEEE, 49(11), 32-39


