arXiv:1802.01096v2 [cs.SE] 7 Feb 2018
Software Engineers vs. Machine Learning Algorithms: An Empirical Study Assessing Performance and Reuse Tasks

Nathalia Nascimento, Carlos Lucena, Paulo Alencar and Donald Cowan

• Laboratory of Software Engineering (LES), Department of Informatics, Pontifical Catholic University of Rio de Janeiro, Brazil. CAPES scholarship/Program 194/Process: 88881.134630/2016-01. E-mail: nnascimento,[email protected]. See http://www.inf.puc-rio.br/~nnascimento/
• David R. Cheriton School of Computer Science, University of Waterloo, Canada. E-mail: palencar,[email protected]

Abstract—Several papers have recently contained reports on applying machine learning (ML) to the automation of software engineering (SE) tasks, such as project management, modeling and development. However, there appear to be no approaches comparing how software engineers fare against machine-learning algorithms as applied to specific software development tasks. Such a comparison is essential to gain insight into which tasks are better performed by humans and which by machine learning, and how cooperative work or human-in-the-loop processes can be implemented more effectively. In this paper, we present an empirical study that compares how software engineers and machine-learning algorithms perform and reuse tasks. The empirical study involves the synthesis of the control structure of an autonomous streetlight application. Our approach consists of four steps. First, we solved the problem using machine learning to determine specific performance and reuse tasks. Second, we asked software engineers with different domain knowledge levels to provide a solution to the same tasks. Third, we compared how software engineers fare against machine-learning algorithms when accomplishing the performance and reuse tasks based on criteria such as energy consumption and safety. Finally, we analyzed the results to understand which tasks are better performed by either humans or algorithms so that they can work together more effectively. Such an understanding and the resulting human-in-the-loop approaches, which take into account the strengths and weaknesses of humans and machine-learning algorithms, are fundamental not only to provide a basis for cooperative work in support of software engineering, but also in other areas.

Index Terms—Machine learning, human-in-the-loop, software engineer, automatic software engineering, internet of things, empirical study


1 INTRODUCTION

SOFTWARE engineering processes can be very complex, costly and time-consuming [1]. They typically consist of a collection of related tasks [2] such as designing, implementing, maintaining, testing and reusing software applications [3]. In addition, as software has become embedded in systems of all kinds, millions of computer programs have to be corrected, adapted, and enhanced [2]. As a result, the field of software engineering requires millions of skilled Information Technology (IT) professionals to create millions of lines of code, which must be installed, configured, tuned, and maintained. According to Kephart (2005) [4], in the near future, it will be extremely challenging to manage IT environments, even for the most skilled IT professionals.

Several researchers have proposed the use of artificial intelligence, especially machine-learning (ML) techniques, to automate different software engineering (SE) tasks [3], [5]–[20]. For example, Zhang has extensively studied this theme recently and in [3] he stated that:

“The field of software engineering turns out to be a fertile ground where many software development and maintenance tasks could be formulated as learning problems and approached in terms of learning algorithms.”

However, there is a lack of approaches that compare how software engineers fare against machine-learning algorithms on specific software development tasks. This comparison is critical in order to evaluate which SE tasks are better performed by automation and which require human involvement or human-in-the-loop approaches [21], [22]. In practice, because there are no explicit comparisons between the tasks performed by engineers and automated procedures, including machine learning, it is often not clear when to use automation in a specific setting. For example, a Brazilian company acquired a software system to select petroleum exploration models automatically, but the engineers decided they could provide a better solution manually. However, when the manual solution was compared with the one provided automatically by the system, it became clear that the automated solution was better. This illustrates that a lack of comparisons makes choosing a manual solution, an automated solution or a combined human-in-the-loop approach difficult.

This paper contains an empirical study [23] that compares how software engineers and machine-learning algorithms achieve performance and reuse tasks. The empirical study uses a case study involving the creation of a control structure for an autonomous streetlight application. The approach consists of four steps. First, the problem was solved using machine learning to achieve specific performance and reuse tasks. Second, we asked software engineers with different domain-knowledge levels to provide a solution to achieve the same tasks. Third, we compared how software engineers fare against machine-learning algorithms when accomplishing the performance and reuse tasks based on criteria such as energy consumption and safety. Finally, the results were analyzed to understand which tasks are better performed by either humans or algorithms so that they can work together more effectively.

Such an understanding is essential for realizing novel human-in-the-loop approaches in which machine-learning procedures assist software developers in achieving tasks. Such human-in-the-loop approaches, which take into account the strengths and weaknesses of humans and machine-learning algorithms, are fundamental not only to provide a basis for cooperative work in software engineering, but also in other application areas.

This paper is organized as follows: Section 2 presents the empirical study, describing the research questions, hypotheses and the objective of the study. Section 3 presents the method selected to collect our empirical data. Sections 4 and 5 present the experimental results. Section 6 discusses the results. Section 7 presents the threats to the validity of our experiment. Section 8 presents the related work. The paper ends with concluding remarks and suggestions for future work.

1.1 Motivation

The theme of this paper, namely whether artificial intelligence, such as machine learning, can benefit software engineering, has been investigated since 1986, when Herbert A. Simon published a paper entitled “Whether software engineering needs to be artificially intelligent” [24]. In this paper, Simon discussed “the roles that humans now play versus the roles that could be taken over by artificial intelligence in developing computer systems.” Notwithstanding, in 1993, Ian Sommerville raised the following question [25]: “What of the future - can Artificial Intelligence make a contribution to system engineering?” In this paper [25], Sommerville performed a literature review of applications of artificial intelligence to software engineering, and concluded that:

“the contribution of AI will be in supporting...activities that are characterized by solutions to problems which are neither right nor wrong but which are more or less appropriate for a particular situation...For example, requirements specification and analysis which involves extensive consultation with domain experts and in project management.”

Several papers have since investigated the use of machine learning (ML) [26] in solving different software engineering (SE) tasks [5]–[20], [27]–[103]. These investigations include approaches to: i) project management [27]–[49], dealing with problems related to cost, time, quality prediction, and resource management; ii) defect prediction [50]–[84]; iii) requirements management, focusing on problems of classifying or representing requirements [85]–[88], or generating requirements [89]; and iv) software development, such as code generation [20], [68], [90]–[96], synthesis [97]–[101], and code evaluation [102], [103].


Most of these papers present successful applications of machine learning in software engineering, showing that ML techniques can provide correct automatic solutions to some SE problems. However, very few papers discuss whether or not a domain expert could propose a manual solution more appropriate for the particular situation. “More appropriate” means a solution that provides better performance or increases another quality that is important to a particular application scenario, such as user preference [104]. For example, in the medical and aviation engineering fields, trust [105] in a solution provided to the end-user is an important factor to consider for a solution to be more appropriate. However, although many authors [106]–[109] have been promoting the use of neural networks [110] in medicine, Abbas et al. [105] and Castelvecchi [111] are among the few authors who have questioned: “what is the trustworthiness of a prediction made by an artificial neural network?”

In other application scenarios, such as many of those related to the Internet of Things (IoT) [112], [113], numerous authors [93], [95], [101], [114] consider the reuse of a solution to be an important quality. They agree that to achieve the goal of billions of things connected to the Internet over the next few years [112], it is necessary to find ways to reduce time to market. For example, it is desirable that the solution, or parts of the solution, used to design autonomous streetlights [96] for a specific scenario could be reused to design streetlights for another scenario.

In particular, the Internet of Things has considerably increased the number of approaches that propose the use of machine learning to automate software development [93]–[96], [101], [115]. None of this research contains a comparison of its results to experiments designed by IoT experts. For example, do Nascimento and Lucena [101], [116] developed a hybrid framework that uses learning-based and manual program synthesis for the Internet of Things (FIoT). They generated four instances of the framework [101], [108], [117], [118] and used learning techniques to synthesize the control structure automatically. These authors stated that the use of machine learning made the development of these applications feasible. However, they did not present any experiment without using learning techniques. In contrast, most of the solutions released for the Internet of Things, such as Apple's HomeKit approach [119] and Samsung SmartThings [120], rely on a software developer synthesizing the control structure for each thing manually.

1.2 Objective

In this context, we decided to ask the following question: “How do software engineers compare with machine-learning algorithms?” To explore this question, we selected the Internet of Things as our application domain and then compared a solution provided by a skilled IoT professional with a solution provided by a learning algorithm with respect to performance and reuse tasks. In short, Figure 1 depicts the theory [121] that we investigate in this paper. According to the theory, the variables that we intend to isolate and measure are the performance and reusability achieved from three kinds of solutions: i) solutions provided by learning techniques; ii) solutions provided by software engineers with IoT skills; and iii) solutions provided by software engineers without IoT skills.

To evaluate the relationship among these variables, we performed an empirical study using FIoT [101]. As shown in Figure 1, we raised four research questions (RQx) to investigate our theory's propositions (i.e., hypotheses (H-RQx)). We present these questions and hypotheses in Section 2. To collect and analyze our empirical data, we performed a controlled experiment. To perform this experiment, we reproduced the problem of synthesizing the control structure of autonomous streetlights using neuroevolution (i.e., “a learning algorithm which uses genetic algorithms to train neural networks” [122]), as presented in [118]. Then, we invited 14 software engineers to provide a solution for the same problem using the same architecture and environment. Lastly, we compared the solution provided by the learning algorithm against the solutions provided by the software engineers. In this application of autonomous streetlights, we consider a “more appropriate” solution to be one that presents better performance in the main scenario [118] or can be satisfactorily reused in a new scenario, based on criteria such as minimal energy consumption and safety (that is, maximum visual comfort in illuminated areas).


Fig. 1. Theory [121]: machine learning can create solutions more appropriate than software engineers in the context of the Internet of Things. The figure relates the actors (machine-learning techniques; software engineers with and without IoT skills), the SE task (synthesize and reuse the control structure of autonomous things) and the measured variables (performance and reusability) through hypotheses H-RQ1 to H-RQ4.

2 HOW DO SOFTWARE ENGINEERS COMPARE WITH MACHINE-LEARNING ALGORITHMS? AN EMPIRICAL STUDY ADDRESSING PERFORMANCE AND REUSE IN THE IOT DOMAIN

The experimental goal, based on the Goal-Question-Metric (GQM) method [123], is to use the Framework for the Internet of Things (FIoT) for the purpose of comparing the use of an automated approach against a manual approach when synthesizing the control of autonomous things with respect to their performance and reuse.

For this purpose, we asked four research questions (RQs) and performed a controlled experiment [23] (Section 3) to investigate them.

2.1 Questions

In terms of synthesizing the control structure of autonomous things, how does the result from a machine learning-based solution differ from solutions provided by...

RQ1. ...software engineers with IoT skills with respect to their performance?

RQ2. ...software engineers with IoT skills with respect to their reusability?

RQ3. ...software engineers without IoT skills with respect to their performance?

RQ4. ...software engineers without IoT skills with respect to their reusability?

2.2 Hypotheses

Each RQ is based on one or more hypotheses, which are described next.

H - RQ1.

• H0. An ML-based approach does not improve the performance of autonomous things compared to solutions provided by IoT expert software engineers.

• HA. An ML-based approach improves the performance of autonomous things compared to solutions provided by IoT expert software engineers.

H - RQ2.

• H0. An ML-based approach does not increase the reuse of autonomous things compared to solutions provided by IoT expert software engineers.

• HA. An ML-based approach increases the reuse of autonomous things compared to solutions provided by IoT expert software engineers.

H - RQ3.

• H0. An ML-based approach does not improve the performance of autonomous things compared to solutions provided by software engineers without experience in IoT development.

• HA. An ML-based approach improves the performance of autonomous things compared to solutions provided by software engineers without experience in IoT development.

H - RQ4.

• H0. An ML-based approach does not increase the reuse of autonomous things compared to solutions provided by software engineers without experience in IoT development.

• HA. An ML-based approach increases the reuse of autonomous things compared to solutions provided by software engineers without experience in IoT development.

2.3 The object of the study: The Framework for the Internet of Things (FIoT)

The Framework for the Internet of Things (FIoT) [101] is a hybrid software framework that allows the developer to generate controller structures for autonomous things through learning or procedural algorithms.

If a researcher develops an application using FIoT, the application will contain a Java software component already equipped with modules for detecting autonomous things in an environment, assigning a controller to a specific thing, creating software agents, collecting data from devices and supporting the communication structure among agents and devices.

Some features are variable and may be selected or developed according to the application type, as follows: (i) a control module, such as an “if-else” structure, a neural network or a finite state machine; (ii) an adaptive technique to synthesize the controller at design time, such as reinforcement learning [124] or a genetic algorithm; and (iii) an evaluation process to assess the behavior of autonomous things that are making decisions based on the controller.
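The paper does not show FIoT's API, so, purely as an illustration, the three flexible points could be shaped as Java interfaces along these lines (all names are our assumptions):

/** Flexible point (i): the control module that maps sensor inputs to actions. */
interface Controller {
    double[] decide(double[] sensorInputs);
}

/** Flexible point (iii): the process that scores the behavior of autonomous things. */
interface EvaluationProcess {
    double evaluate(Controller candidate);
}

/** Flexible point (ii): the design-time adaptation technique (e.g., a genetic algorithm). */
interface ControllerAdapter {
    Controller synthesize(EvaluationProcess evaluator);
}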

For example, Table 1 summarizes how the “Streetlight Control” application adheres to the proposed framework using a machine learning-based approach, while extending the FIoT flexible points.

Table 2 summarizes how the “Streetlight Control” application adheres to the proposed framework using a solution provided by a software engineer, while extending the FIoT flexible points.

Our goal is to provide both solutions to the same application and compare the results based on the same evaluation process.

TABLE 1
Implementing FIoT flexible points to synthesize streetlight controllers using an ML-based approach.

FIoT Framework                   Light Control Application
Controller                       Three-layer neural network
Making Evaluation                Collective fitness evaluation: the solution is
                                 evaluated based on the energy consumption, the
                                 number of people that finished their routes after
                                 the simulation ends, and the total time spent by
                                 people to move during their trip
Controller Adaptation            Evolutionary algorithm: generate a pool of
at design time                   candidates to represent the neural network
                                 parameters

TABLE 2
Implementing FIoT flexible points to synthesize streetlight controllers using a solution provided by a software engineer.

FIoT Framework                   Light Control Application
Controller                       If-else module provided by a software engineer
Making Evaluation                Collective fitness evaluation: the solution is
                                 evaluated based on the energy consumption, the
                                 number of people that finished their routes after
                                 the simulation ends, and the total time spent by
                                 people to move during their trip
Controller Adaptation            None
at design time

3 CONTROLLED EXPERIMENT

The first step of the experiment was to reproduce the experiment presented in [118] using a non-supervised learning approach. Then, we invited 14 software engineers to provide a solution for the same problem. Finally, we compared the solution provided through the learning algorithm against the solutions provided by the participants.

3.1 Participant Analysis

As we have described previously, knowledge of the application domain is an important variable in our empirical study. Therefore, before performing the controlled experiment, we asked participants to describe their experience with the development of applications based on the Internet of Things, that is, developing distributed systems with embedded characteristics, such as providing each element of the system with sensors and actuators. As shown in Figure 2, 43% of participants have never developed an application based on the Internet of Things and 57% have developed at least one application.

Fig. 2. Experience of participants in developing applications based on the Internet of Things: High 22%, Medium 14%, Low 21%, None 43%.

3.2 Experiment: Streetlight Application

In short, our experiment involves developing autonomous streetlights. The overall goal of this application is to reduce energy consumption while maintaining appropriate visibility in illuminated areas [118]. For this purpose, we provided each streetlight with ambient brightness and motion sensors, and an actuator to control light intensity. In addition, we also provided streetlights with wireless communicators, as shown in Figure 3. Therefore, the streetlights are able to cooperate with each other to establish the most likely routes for passers-by and thus achieve the goal of minimizing energy consumption.

Fig. 3. Variables collected and set by streetlights. Inputs: lighting (dark/dim/light), presence (yes/no), data collected from the closest streetlight (0.0/0.5/1.0) and the previous listening decision (yes/no). Outputs: light decision (OFF/DIM/ON), wireless transmitter (0.0/0.5/1.0) and listening decision (yes/no).

Each streetlight in the simulation has a microcontroller that is used to detect the proximity of a person and control the closest streetlight. A streetlight can change the status of its light to ON, OFF or DIM.

Each streetlight has to execute three tasks every second: data collection, decision-making and action enforcement. The first task consists of receiving data related to people flow, ambient brightness, data from the neighboring streetlights and the current light status (activation level of sensors and the previous output value of listeningDecision). The second task consists of analyzing the collected data and making decisions about the actions to be enforced. The last task is action enforcement, which consists of setting the value of three output variables: (i) listeningDecision, which enables the streetlight to receive signals from neighboring streetlights in the next cycle; (ii) wirelessTransmitter, a signal value to be transmitted to neighboring streetlights; and (iii) lightDecision, which activates the light's OFF/DIM/ON functions.
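As a rough illustration, this cycle could be organized as follows in Java; the types and field names are our assumptions (the framework's code is not shown in the paper), with value domains taken from Figure 3:

enum LightStatus { OFF, DIM, ON }

class Streetlight {
    // Task 1 - data collection (inputs).
    double lightingSensor;      // ambient brightness: 0.0 (dark), 0.5 (dim), 1.0 (light)
    boolean detectedPerson;     // motion sensor
    double dataCollected;       // signal received from the closest streetlight: 0.0/0.5/1.0
    boolean previousListening;  // previous cycle's value of listeningDecision

    // Task 3 - action enforcement (outputs).
    boolean listeningDecision;  // receive neighbor signals in the next cycle?
    double wirelessTransmitter; // value broadcast to neighbors: 0.0/0.5/1.0
    LightStatus lightDecision;  // OFF / DIM / ON

    /** Task 2 - decision-making: the part each participant had to provide. */
    void makeDecision() {
        // one branch per combination of the four inputs (see Figure 5)
    }
}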

The interested reader may consult a more extensive paper about the application scenario [118] (see footnote 1).

3.2.1 The Challenge

As we explained to the participants, the tasks of collecting data and enforcing actions had already been implemented. The challenge was to provide a solution for the task of making decisions, as depicted in Figure 4.

Fig. 4. The challenge: how does a streetlight make decisions based on collected data?

We provided pseudocode (see footnote 2) that considered all possible combinations of input variables. Then, participants decided how to set the output variables according to the collected data. Part of this pseudocode is depicted in Figure 5.

Each participant provided a different solution. Therefore, we conducted the experiment using each one.

1. All documents that we prepared to explain this application scenario to participants are available at http://www.inf.puc-rio.br/~nnascimento/projects.html

2. The pseudocode that we provided to participants is available at http://www.inf.puc-rio.br/~nnascimento/projects.html


Fig. 5. Small portion of the pseudocode of the decision module that was filled in by participants.

if (lighting_sensor = Medium AND detected_person = NO AND
    data_collected = 0.0 AND previous_listening_decision = YES) then {
    light_decision       = [X] OFF   [ ] DIM   [ ] ON
    wireless_transmitter = [X] 0.0   [ ] 0.5   [ ] 1.0
    listening_decision   = [X] YES   [ ] NO
}
if (lighting_sensor = Medium AND detected_person = YES AND
    data_collected = 0.0 AND previous_listening_decision = YES) then {
    light_decision       = [ ] OFF   [X] DIM   [ ] ON
    wireless_transmitter = [X] 0.0   [ ] 0.5   [ ] 1.0
    listening_decision   = [X] YES   [ ] NO
}
if (lighting_sensor = Dark AND detected_person = NO AND
    data_collected = 0.0 AND previous_listening_decision = NO) then {
    light_decision       = [X] OFF   [ ] DIM   [ ] ON
    wireless_transmitter = [ ] 0.0   [X] 0.5   [ ] 1.0
    listening_decision   = [ ] YES   [X] NO
}

Fig. 6. Small portion of the decision rules that were synthesized according to the learning-based approach.

In addition, we also considered a “zeroed” solution, which always sets all values to zero. This zeroed solution is expected to be the worst solution, since streetlights will always switch their lights OFF.

3.2.2 The solution generated by a machine-learning algorithm

We compared the results from all of these approaches to the result produced using the machine-learning approach. As do Nascimento and Lucena explain in [118], the learning approach uses a three-layer feedforward neural network combined with an evolutionary algorithm to generate decision rules automatically. Figure 6 depicts some of the rules that were generated by the evolved neural network. The interested reader can consult more extensive papers [101], [118] or read Nascimento's dissertation [116] (chap. ii, sec. iii).

Based on the generated rules and the system execution, we observe that, using the solution provided by the neural network, only the streetlights with broken lamps emit “0.5” from their wireless transmitters.

In addition, we also observed that a streetlight that is not broken switches its lamp ON if it detects a person's proximity or receives “0.5” from a wireless transmitter.
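These two observations can be condensed into a compact rule. The sketch below is our reading of the evolved behavior, not the authors' code; lampBroken is a hypothetical stand-in for whatever input pattern the network reacts to when a lamp has failed (for example, sensing darkness while commanding the lamp ON):

// Condensed reading of the evolved rules, as a possible body for the
// makeDecision() sketch shown earlier. 'lampBroken' is hypothetical.
void makeDecision() {
    if (lampBroken) {
        wirelessTransmitter = 0.5;       // only broken lamps emit "0.5"
        lightDecision = LightStatus.OFF;
    } else if (detectedPerson || dataCollected == 0.5) {
        wirelessTransmitter = 0.0;
        lightDecision = LightStatus.ON;  // person nearby, or a neighbor signaled a broken lamp
    } else {
        wirelessTransmitter = 0.0;
        lightDecision = LightStatus.OFF; // no demand: save energy
    }
}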

3.2.3 Scenario constraints

Before starting a solution, each participant should consider the following constraints:

• Do not take light numbering into account, since your solution may be used in different scenarios (see an example of a scenario in Figure 7).

• Three streetlights will go dark during the simulation.

• People walk along different paths starting at random departure points. Their role is to complete their routes by reaching a destination point. The number of people that finished their routes after the simulation ends, and the total time spent by people moving during their trip, are the most important factors for a good solution.

• A person can only move if his current and next positions are not completely dark. In addition, we also consider that people walk slowly if the place is partially devoid of light.

• The energy consumption also influences the solution evaluation.

• The energy consumption is proportional to the light status (OFF/DIM/ON).

• We also consider the use of the wireless transmitter to calculate energy consumption (if the streetlight emits something different from “0.0”, it consumes 0.1 of energy).

Therefore, each solution is evaluated after the simulation ends based on the energy consumption, the number of people that finished their routes after the simulation ends, and the total time spent by people moving during their trip.

\[
pPeople = \frac{completedPeople \times 100}{totalPeople} \tag{1}
\]
\[
pEnergy = \frac{totalEnergy \times 100}{\frac{11 \times (timeSimulation \times totalSmartLights)}{10}} \tag{2}
\]
\[
pTrip = \frac{totalTimeTrip \times 100}{\frac{3 \times timeSimulation}{2} \times totalPeople} \tag{3}
\]
\[
fitness = 1.0 \times pPeople - 0.6 \times pTrip - 0.4 \times pEnergy \tag{4}
\]

Equations (1) through (4) show the values to be calculated for the evaluation, in which pPeople is the percentage of people that completed their routes before the end of the simulation out of the total number of people in the simulation; pEnergy is the percentage of energy that was consumed by streetlights out of the maximum energy value that could be consumed during the simulation (we also considered the use of the wireless transmitter to calculate energy consumption); pTrip is the percentage of the total duration of people's trips out of the maximum time their trips could take; and fitness is the fitness of each candidate that encodes the proposed solution.
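The equations translate directly into code. The sketch below mirrors them one-to-one; reading Equation (2)'s denominator as 1.1 times the simulation time times the number of streetlights is our interpretation, consistent with the energy constraints above (1.0 for a lamp at ON plus 0.1 for an active transmitter):

class FitnessEvaluation {
    static double fitness(double completedPeople, double totalPeople,
                          double totalEnergy, double timeSimulation,
                          double totalSmartLights, double totalTimeTrip) {
        double pPeople = (completedPeople * 100) / totalPeople;               // Eq. (1)
        double maxEnergy = 11.0 * (timeSimulation * totalSmartLights) / 10.0;
        double pEnergy = (totalEnergy * 100) / maxEnergy;                     // Eq. (2)
        double maxTripTime = (3.0 * timeSimulation / 2.0) * totalPeople;
        double pTrip = (totalTimeTrip * 100) / maxTripTime;                   // Eq. (3)
        return 1.0 * pPeople - 0.6 * pTrip - 0.4 * pEnergy;                   // Eq. (4)
    }
}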

3.2.4 Example - Simulating the environment

We showed participants the same simulated neighborhood scenario that was used by the genetic algorithm to evolve the neural network. Figure 7 depicts the elements that are part of the application, namely streetlights, people, nodes and edges.

Fig. 7. Simulated neighborhood. Legend: light status ON/DIM/OFF; broken lamps; departure and target points. Execution: 12 seconds; a person moves from one point to another in one to one and a half seconds; streetlights execute cycles of 1 second.

Nascimento and Lucena [118] modeled the scenario as a graph, in which a node represents a streetlight position and an edge represents the smallest distance between two streetlights. The graph representing the streetlight network consists of 18 nodes and 34 edges. Each node represents a streetlight. In the graph, the yellow, gray, black and red triangles represent the streetlight status (ON/DIM/OFF/broken lamp). Each edge is two-way and links two nodes. In addition, each edge has a light intensity parameter that is the sum of the environmental light and the brightness from the streetlights at its nodes. The goal is to simulate different lighting in different neighborhood areas.
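A minimal sketch of this graph model, with illustrative names of our own, is:

// Nodes are streetlight positions; each two-way edge carries a light
// intensity that sums the environmental light and the brightness
// contributed by the streetlights at its two endpoints.
class ScenarioEdge {
    int nodeA, nodeB; // indices of the two streetlights this edge links

    double lightIntensity(double environmentalLight,
                          double brightnessA, double brightnessB) {
        return environmentalLight + brightnessA + brightnessB;
    }
}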

Fig. 8. Person moving in the simulated neighborhood (a person moves from one point to another in 1 to 1.5 seconds).

As depicted in Figure 8, only one person was placed in the scenario that we showed to participants. For instance, the person starting at point 0 has point 11 as a target. We asked participants to provide a solution for the streetlights that ensures this person concludes his route before the simulation ends after 12 seconds.

3.2.5 New Scenario: Unknown environment

The second step of the experiment consists of executing the solutions from the participants and the learning approach in a new scenario, but with the same constraints. This scenario, which is depicted in Figure 9, was not used by the learning algorithm and was not presented to participants.

The goal of this new part of the experiment is to verify whether the decision module that was designed to control streetlights in the first scenario can be reused in another scenario.

Fig. 9. Simulating a new neighborhood.

In this new scenario, we also started only one person, who has point 18 (the yellow point) as departure and point 8 as target. As the scenario is larger, we established a simulation time of 30 seconds.

4 EXPERIMENT - PART 1 - RESULTS

We executed the experiment 16 times, changing only the decision solution of the autonomous streetlights. In the first instance, we set all outputs to zero (the zeroed solution) during the whole simulation, which is expected to be the worst solution: streetlights never switch their lights ON. In the second instance, we executed the experiment using the best solution that was found by the learning algorithm, according to the experiment presented in [118]. Then, we executed the simulation for the solution provided by each one of the 14 participants (see footnote 3).

To provide a controlled experiment and be able to compare the different solutions, we started with only one person in the scenario and manually set the parameters that were supposed to be randomly selected, such as departure and target points and broken lamps.

Each experiment execution consists of executing the simulated scenario three times: (i) night (environmental light equal to 0.0); (ii) late afternoon (environmental light equal to 0.5); and (iii) morning (environmental light equal to 1.0). The main idea is to determine how the solution behaves during different parts of the day. Figure 10 depicts the percentage of energy that was spent according to the environmental light for each one of the 16 different solutions. As we described previously, we also considered the use of the wireless transmitter to calculate energy consumption. As expected, since streetlights using the zeroed decision never switch their lights ON and never emit any signal, the energy consumed using this solution is always zero. It is possible to observe that only the solutions provided by the learning algorithm and by the 5th and 11th participants do not expend energy when the environmental light is at its maximum. In fact, according to the proposed scenario, there is no reason to turn streetlights ON during the period of the day with maximum illumination.

Fig. 10. Scenario 1: percentage of energy spent in different parts of the day according to the participant solutions.

Figure 11 depicts the percentage of time that was spent by the single person in each one of the simulations. As shown, the largest difference between solutions occurs at night. If the time is 100%, it means that the person did not complete the route; thus, the solution did not work.

3. All files that were generated during the development of this work, such as executable files and the results of participants' solutions, are available at http://www.inf.puc-rio.br/~nnascimento/projects.html



Fig. 11. Scenario 1: percentage of time spent by the person to conclude his route in different parts of the day, according to the participant solutions.

Besides presenting the results of the different solutions in different parts of the day, the best solution must be the one that presents the best result for the whole day. Thus, we calculated the average of each one of the parameters (energy, people, trip and fitness) achieved by the solutions in different parts of the day. Figure 12 depicts a simple average. We also calculated a weighted average, taking into account the duration of the parts of the day (we considered 12 hours for the night period, 3 hours for dim and 9 hours for the morning), but the results were very similar.
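Concretely, with those durations, the weighted average of a parameter x over the day amounts to:

\[
\bar{x}_{weighted} = \frac{12 \times x_{night} + 3 \times x_{dim} + 9 \times x_{morning}}{12 + 3 + 9}
\]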

Fig. 12. Scenario 1: average energy, trip and fitness calculated for the different parts of the day according to the participant solutions (the fitness averages also appear in Table 3).

As shown in Figure 12, based on the fitness average, three participants, namely 3, 4 and 10, provided a solution slightly better than the solution provided by the learning algorithm. Five other participants provided a solution that works, and the remaining six provided a solution that does not work. As explained earlier, we consider an incorrect solution to be one in which the person did not finish the route before the simulation ended; even increasing the simulation time did not allow the person to finish the route.

4.1 Discussion: Participants' Knowledge of IoT Versus Results

After executing the solution proposed by each participant, we related that solution's results to the participant's knowledge of the IoT domain, as shown in Table 3.

TABLE 3
Correlation between participants' expertise in the Internet of Things and their solution results.

Software    Experience with IoT      Solution Performance   Does the
Engineer    Development              (Fitness Average)      solution work?
1           High                     55.48                  Y
2           None                     26.99                  N
3           High                     62.88                  Y
4           Low                      62.49                  Y
5           None                     30.50                  N
6           Low                      51.09                  Y
7           Medium                   54.37                  Y
8           None                     16.59                  N
9           High                     28.62                  N
10          None                     61.60                  Y
11          None                     29.67                  N
12          Medium                   47.81                  Y
13          None                     30.32                  N
14          Low                      56.91                  Y
Learning    -                        59.53                  Y
zeroed      -                        28.33                  N

We observe a significant difference between the results from software engineers with any experience in IoT development and the results from software engineers without such experience. Participant 10 is the only individual without knowledge of IoT who provided a solution that works, and participant 9 is the only individual with any knowledge of IoT who did not provide a working solution.

4.2 Hypothesis Testing

In this section, we investigate the hypotheses related to the solutions' performance evaluation (i.e., H-RQ1 and H-RQ3), as presented in Subsection 2.2. Thus, we performed statistical analyses, as described by Peck and Devore [125], of the measures presented in Table 3.

As shown in Table 4, we separated the results of the experiments into two groups: i) software engineers with IoT knowledge and ii) software engineers without IoT knowledge. Then, we calculated the mean and the standard deviation of the results achieved by each group performing the experiment and compared each result against the value achieved using the ML-based solution.

TABLE 4
Data used to perform the test statistic.

Variable                   n        Highest   Mean    Median   Std. dev.   Degrees of      Critical t-value
                           samples  value     (x̄)              (σ)         freedom (n-1)   (99%)
Software Engineers         14       62.88     43.95   49.45    16.00       13              2.65
Software Engineers
  with IoT knowledge       8        62.88     52.46   54.92    10.91       7               3.00
Software Engineers
  without IoT knowledge    6        61.60     32.61   30.00    15.15       5               3.37
Machine-learning-based
  approach                 1        59.53     -       -        -           -               -

4.2.1 How does the evaluation result from a machine learning-based solution differ from solutions provided by IoT expert software engineers with respect to their performance?

H - RQ1.

• H0. An ML-based approach does not improve the performance of autonomous things compared to solutions provided by IoT expert software engineers.

• HA. An ML-based approach improves the performance of autonomous things compared to solutions provided by IoT expert software engineers.

The first null hypothesis H0 claims that there is no difference between the mean performance of IoT expert software engineers' solutions and that of the ML-based approach solution. The alternative hypothesis claims that the ML-based approach solution improves the performance of the application in comparison to IoT expert software engineers' solutions. Thus, the claim is that the true mean of the IoT expert software engineers' solutions is below the performance achieved by the ML-based approach, that is, 59.53.

Therefore, we used the ML performance as our hypothesized value to test the following one-sided hypothesis:

\[
H_0: \mu_{se} = 59.53
\]
\[
H_1: \mu_{se} < 59.53
\]

where \(\mu_{se}\) denotes the true mean performance for all IoT expert software engineers' solutions.

We restricted this population sample to the software engineers who confirmed having experience with developing applications for the Internet of Things. As shown in Table 4, the performance mean (x̄) of the IoT expert software engineers' solutions is 52.46 and the standard deviation (σ) is 10.91. To verify whether the data that we have is sufficient to accept the alternative hypothesis, we need to verify the probability of rejecting the null hypothesis [125]. Assuming that H0 is true, and using a statistical significance level [125] of 0.01 (a chance of one in 100 of making an error), we computed the test statistic (t-statistic) as follows [125]:

\[
t = \frac{\bar{x} - \text{hypothesized value}}{\sigma / \sqrt{n}} \tag{5}
\]
\[
t = \frac{52.46 - 59.53}{10.91 / \sqrt{8}} = -1.83 \tag{6}
\]

According to t-statistic theory, we can safely reject our null hypothesis if the t-statistic value is below the negative critical t-value (threshold) [125]. This negative critical t-value bounds the rejection region of a t-distribution, as shown in Figure 13. In our experiment, as we specified a statistical significance level of 0.01, the probability of getting a t-value less than or equal to the negative critical t-value is 1%. We calculated the critical t-value of this t-distribution according to the t-table presented in Peck and Devore (2011, p. 791) [125]. Accordingly, for a distribution with 7 degrees of freedom (see Table 4) and a confidence level of 99%, the negative critical t-value is -3.00. As depicted in Figure 13, the test statistic of our sample is higher than the critical t-value.


Fig. 13. Hypothesis H-RQ1 test graph.

As the test statistic does not fall in the critical region, we cannot safely reject this null hypothesis. Based on a t-value of -1.83 and 7 degrees of freedom, we could reject our null hypothesis only if we had reduced the precision of our experiment to 85%. Thus, we fail to reject the null hypothesis and do not accept the alternative hypothesis. Therefore, we cannot state that an ML-based approach improves the performance of autonomous things compared to solutions provided by IoT expert software engineers.
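For reproducibility, Equation (5) is a standard one-sample, one-sided t-test. The small sketch below (names ours) reproduces, up to rounding, the test statistic reported here and the one in Section 4.2.2, using the sample statistics from Table 4:

class TTest {
    static double tStatistic(double sampleMean, double hypothesizedValue,
                             double stdDev, int n) {
        return (sampleMean - hypothesizedValue) / (stdDev / Math.sqrt(n));
    }
    // tStatistic(52.46, 59.53, 10.91, 8) is about -1.83  (IoT experts, H-RQ1)
    // tStatistic(32.61, 59.53, 15.15, 6) is about -4.35  (non-experts, H-RQ3)
}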

4.2.2 How does the evaluation result from a machine learning-based solution differ from solutions provided by software engineers without IoT skills with respect to their performance?

H - RQ3.

• H0. An ML-based approach does not improve the performance of autonomous things compared to solutions provided by software engineers without experience in IoT development.

• HA. An ML-based approach improves the performance of autonomous things compared to solutions provided by software engineers without experience in IoT development.

We restricted this population sample to the software engineers who confirmed not having experience with developing applications for the Internet of Things. As shown in Table 4, the performance mean (x̄) of the solutions from software engineers without experience in IoT development is 32.61 and the standard deviation (σ) is 15.15. Thus:

\[
t = \frac{32.61 - 59.53}{15.15 / \sqrt{6}} = -4.35 \tag{7}
\]

As shown in Table 4, this t-distribution has 5 degrees of freedom. Thus, for a confidence level of 99%, the negative critical t-value is -3.37. As depicted in Figure 14, the test statistic of our sample is below the critical t-value.


Fig. 14. Hypothesis H-RQ3 test graph.

As the t-statistic value is below the negative critical t-value (-4.35 < -3.37), we can safely reject the null hypothesis, ensuring that the chance of making an error is lower than 1%. Therefore, we accepted the alternative hypothesis: an ML-based approach improves the performance of autonomous things compared to solutions provided by software engineers with no experience in IoT development.

5 EXPERIMENT - PART 2 - RESULTS

As explained previously, the second part of the experiment consists of transferring the solutions provided by machine learning and by the participants to an unknown environment. In this second part of the experiment, we also executed the simulation 16 times: once for each of the participants' solutions, once for the machine-learning solution and once for the zeroed solution.

Table 5 shows the results that were achieved by the different solutions at night in a simulation of 30 seconds. As shown, most of the solutions did not work: the person in these simulations did not finish the route even when we increased the simulation time. Only the solutions provided by the machine-learning algorithm and by participant 12 worked. Remember that this scenario was not used by the machine-learning algorithm during the training process: the solution was synthesized by machine learning for the first scenario and simply reused in this new scenario. In other words, we did not restart the machine-learning process.

We selected only those solutions that worked and verified their results for the other periods of the day (morning and late afternoon). As shown in Table 6, when considering the whole day, the machine-learning approach presented the best result. Although the average trip time was a little higher using the machine-learning approach, the difference in energy consumption between the two solutions is considerably larger in the machine-learning solution's favor.

TABLE 5
Using the same solution in a different environment - only at night.

Software
Engineer    Energy%   People%   Trip%    Fitness
1           6.50      0         100      -42.60
2           2.77      0         100      -41.11
3           6.62      0         100      -42.65
4           4.30      0         100      -41.72
5           2.58      0         100      -41.03
6           6.88      0         100      -42.75
7           8.33      0         100      -43.33
8           2.33      0         100      -60.26
9           3.77      0         100      -41.51
10          3.78      0         100      -60.18
11          11.36     0         100      -44.54
12          50.56     100       42.22    54.43
13          2.77      0         100      -41.11
14          4.50      0         100      -41.80
Learning    24.44     100       61.11    53.55
zeroed      0         0         100      -40.00

TABLE 6
Using the same solution in a different environment - day average.

                          Energy%   People%   Trip%    Fitness
Average Participant 12    50.52     100       38.14    56.90
Average Learning          8.46      100       46.29    68.83

5.1 Hypothesis Testing

In this section, we investigate the hypotheses related to the solutions' reuse evaluation, that is, H-RQ2 and H-RQ4, as presented in Subsection 2.2. Their alternative hypotheses state that an ML-based approach increases the reuse of autonomous things compared to solutions provided by software engineers with and without experience in IoT development, respectively. We planned to perform a statistical analysis to evaluate these hypotheses. However, as depicted in Figure 15, in the new scenario, 0% of participants provided a result better than the result provided by the machine-learning solution. In addition, from the group of 14 engineers, only one participant, who has experience with IoT development, provided a solution that worked.

Fig. 15. Participants' solution results in the second scenario: 93% of the solutions did not work, 7% worked, and 0% worked better than the ML solution.

Therefore, we can safely reject the null hypotheses and accept both alternative hypotheses:

1) H-RQ2: HA: An ML-based approach increases the reuse of autonomous things compared to solutions provided by IoT expert software engineers.

2) H-RQ4: HA: An ML-based approach increases the reuse of autonomous things compared to solutions provided by software engineers with no experience in IoT development.

6 DISCUSSION

In this section, we analyze the empirical experimental results to understand which tasks are better performed by humans and which by algorithms. This is important for deciding whether software engineers or machine learning can accomplish a specific task better.

In our empirical study, in which we assessed performance and reuse tasks, we accepted three alternative hypotheses and rejected one:

Accepted:

1) An ML-based approach improves the performance of autonomous things compared to solutions provided by software engineers without experience in IoT development.

2) An ML-based approach increases the reuse of autonomous things compared to solutions provided by IoT expert software engineers.

3) An ML-based approach increases the reuse of autonomous things compared to solutions provided by software engineers without experience in IoT development.


Rejected:

1) An ML-based approach improves the performance of autonomous things compared to solutions provided by IoT expert software engineers.

Based on these results, we have found evidence that machine-learning techniques can perform some SE tasks better than software engineers, considering solutions that improve performance and increase reuse. As illustrated in the experimental results, only one of the 14 software engineers provided a solution that could be reused in a new scenario. Further, none of those software engineers provided a solution that works better than the ML solution in this new scenario. If the flexibility of the application is the most important factor, based on our results, we can safely recommend the use of machine learning.

However, if we had considered performance as the only important factor to evaluate the quality of these solutions, we would have found evidence that software engineers can perform SE tasks better than machine learning, considering "better" as a solution that improves performance. As described in our experiments, we cannot state that ML improves the performance of an application in comparison to solutions provided by domain-expert software engineers. This is also an interesting result, as many researchers, especially in the IoT domain, have strictly focused on automating software development.

In brief, our experiment indicates that in some cases software engineers outperform machine-learning algorithms, whereas in other cases they do not. The evidence shows that it is important to know which one performs better in different situations in order to determine ways for software engineers to work cooperatively and effectively with automated machine-learning procedures.

7 THREATS TO VALIDITY

Although we have designed and conducted the experiments carefully, there are always factors that can challenge an experiment's validity. Some threats to validity, as described in [123], could indeed limit the legitimacy of our results. In this section, we present the actions taken to mitigate the impact of these factors on the research results.

As Oizumi et al. (2017) report in [126], the number of participants in a study can be a threat to validity. In addition, Fernandes et al. (2016) [127] report the diversity of participants as another possible threat. Therefore, in our study, we needed to be aware of at least two threats to validity, namely: we selected a sample of only 14 participants, which may not be enough to achieve conclusive results; and our sample consisted of only graduate students from two Brazilian universities. Such a group may not be representative of all software engineers, who may have substantially more professional experience and background.

To mitigate the problems of the number of participants and their diversity, we selected our participants carefully. All of them have at least two years of experience with software development. In addition, we allowed participants to solve the problem by manipulating a pseudocode version, thereby avoiding gaps in the participants' knowledge, such as experience with a particular programming language or architecture. Note that a survey was used to select participants and they all indicated a level of experience with pseudocode. The pseudocode provided by each participant was carefully translated into Java, as this is the language supported by the Framework for the Internet of Things.

Oizumi et al. (2017) reported a third threat to validity in [126], namely, possible misunderstandings during the study. To mitigate this problem, we asked all participants to write about their understanding of the problem both before and after applying the solution. All participants indicated that they understood the task completely. We also asked them about their confidence in their proposed solution. Most of them evaluated their own solution with the highest grade, allowing us to increase our confidence in the experimental results. In addition, we assisted the participants during the entire study, making sure they understood the experimental task.

8 RELATED WORK

Comparing intelligent machines to the ability of a person to solve a particular problem is not a new approach. This kind of discussion has been promoted since the beginning of Artificial Intelligence. For example, an important moment in the history of technology occurred in 1997 with Garry Kasparov's chess match against the IBM supercomputer Deep Blue [128].

Recently, Silver et al. (2016, 2017) [129], [130] published papers in the journal Nature comparing the performance of an ML technique against the results achieved by the world champion in the game of Go. In [130], Silver et al. (2017) state that their program "achieved superhuman performance."

Whiteson et al. [122] indirectly performed this comparison by evaluating the use of three different approaches to the neuroevolution learning algorithm to solve the same tasks: (i) coevolution, which is mostly unassisted by human knowledge; (ii) layered learning, which is highly assisted; and (iii) concurrent layered learning, which is a mixed approach. The authors state that their results "demonstrate that the appropriate level of human assistance depends critically on the difficulty of the problem."

Furthermore, there is also a new approach in machine learning, called Automatic Machine Learning (Auto-ML) [100], which uses learning to set the parameters of a learning algorithm automatically. In a traditional approach, a software engineer with machine-learning skills is responsible for finding a good configuration of the algorithm's parameters. Zoph and Le [100] present an Auto-ML-based approach to designing a neural network that classifies images from a specific dataset. In addition, they compared their results with the previous state-of-the-art model, which was designed by an expert ML engineer. According to Zoph and Le [100], their Auto-ML-based approach “can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy.” In effect, Zoph and Le showed that a machine-learning technique is capable of beating a software engineer with ML skills at a specific software engineering task, although the authors do not discuss this point in their paper.
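To make the Auto-ML idea concrete, the sketch below shows its simplest possible form: an outer loop that proposes hyperparameter configurations and keeps the one with the best validation score, so that the parameters are set by a search procedure rather than by an engineer. We use plain random search with a stubbed evaluation function purely for illustration; Zoph and Le's actual approach trains a recurrent-network controller with reinforcement learning, which is far more sophisticated.

    import java.util.Random;

    // Minimal sketch of the Auto-ML idea: a search loop, not an engineer,
    // chooses the hyperparameters. Random search stands in for Zoph and Le's
    // reinforcement-learning controller; evaluate() is a stub.
    public class AutoMlSketch {

        // A candidate configuration for the learning algorithm.
        record Config(int hiddenNeurons, double learningRate) {}

        public static void main(String[] args) {
            Random rng = new Random(42);
            Config best = null;
            double bestScore = Double.NEGATIVE_INFINITY;

            for (int trial = 0; trial < 50; trial++) {
                // Propose a configuration from the search space.
                Config candidate = new Config(
                        4 + rng.nextInt(60),                      // 4..63 hidden neurons
                        Math.pow(10, -1 - 3 * rng.nextDouble())); // rate in [1e-4, 1e-1]

                double score = evaluate(candidate);               // validation score
                if (score > bestScore) {
                    bestScore = score;
                    best = candidate;
                }
            }
            System.out.printf("Best config: %s (score %.3f)%n", best, bestScore);
        }

        // Stub for "train a model with this configuration and return its
        // validation score"; a real system would train a network here.
        static double evaluate(Config c) {
            double sizePenalty = Math.abs(c.hiddenNeurons() - 32) / 32.0;
            double ratePenalty = Math.abs(Math.log10(c.learningRate()) + 2);
            return 1.0 - 0.5 * sizePenalty - 0.25 * ratePenalty;
        }
    }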

Our paper appears to be the first to provide an empirical study that investigates the use of a machine-learning technique to solve a problem in the field of Software Engineering by comparing the solution provided by an ML-based approach against solutions provided by software engineers.

9 CONCLUSION AND FUTURE WORK

Several researchers have proposed the use of machine-learning techniques to automate software engineering tasks. However, most of these approaches do not direct efforts toward asking whether ML-based procedures have higher success rates than current standard and manual practices. A relevant question in this potential line of investigation is: “Could a software engineer solve a specific development task better than an ML algorithm?” Indeed, it is fundamental to evaluate which tasks are better performed by engineers or ML procedures so that they can work together more effectively, and also to provide more insight into novel human-in-the-loop machine-learning approaches to support SE tasks.

This paper appears to be the first to provide an empirical study comparing how software engineers and machine-learning algorithms achieve performance and reuse tasks. In brief, as a result of our experiment, we have found evidence that in some cases software engineers outperform machine-learning algorithms, and in other cases they do not. Further, as is typical in experimental studies, although we have designed and conducted the experiment carefully, there are always factors that can threaten the experiment's validity. For example, some threats include the number and diversity of the software engineers involved in our experiment.

Understanding how software engineers fare against ML algorithms is essential to support new methodologies for developing human-in-the-loop approaches in which machine-learning automated procedures assist software developers in achieving their tasks. One example is methodologies that define which agent (engineer or automated ML procedure) should execute a specific task in a software development setting. Based on this understanding, these methodologies can provide a basis for software engineers and machine-learning algorithms to cooperate more effectively in Software Engineering development.

Future work to extend the proposed experiment includes: (i) conducting further empirical studies to assess other SE tasks, such as design, maintenance and testing; (ii) experimenting with other machine-learning algorithms, such as reinforcement learning and backpropagation; and (iii) using different criteria to evaluate task execution.

Possible tasks that could be investigated (refer to (i)) include programming tasks, in which case tasks performed by software development teams and by ML algorithms are compared. For example, we could invite software developers from the team with the highest score in the last ACM International Collegiate Programming Contest [131], which is one of the most important programming championships in the world, to be involved in this comparison. This competition evaluates the capability of software engineers to solve complex software problems. Software engineers are classified according to the number of correct solutions, the performance of the solutions, and development time.

Another line of investigation could address the use of different qualitative or quantitative methodologies. For example, the task execution comparison could rely on reference performances, such as the performance of highly successful performers [100], [129], [130]. This research work can also be extended by proposing, based on the comparison between the performance of engineers and ML algorithms, a methodology for more effective task allocation. This methodology could, in principle, lead to more effective ways to allocate tasks such as software development in cooperative work involving humans and automated procedures. Such human-in-the-loop approaches, which take into account the strengths and weaknesses of humans and machine-learning algorithms, are fundamental to provide a basis for cooperative work in software engineering and possibly in other areas.

ACKNOWLEDGMENTS

This work has been supported by the Laboratory of Software Engineering (LES) at PUC-Rio. Our thanks to CAPES, CNPq, FAPERJ and PUC-Rio for their support through scholarships and fellowships. We would also like to thank the software engineers who participated in our experiment.

REFERENCES

[1] F. Brooks and H. Kugler, No silver bullet. April, 1987.

[2] R. S. Pressman, Software engineering: a practitioner's approach. Palgrave Macmillan, 2005.

[3] Q. Zhang, “Software developments,” Engineering Automation for Reliable Software, p. 292, 2000.

[4] J. O. Kephart, “Research challenges of autonomic computing,” in Software Engineering, 2005. ICSE 2005. Proceedings. 27th International Conference on. IEEE, 2005, pp. 15–22.

[5] J. Mostow, “Foreword: What is AI? And what does it have to do with software engineering?” IEEE Transactions on Software Engineering, vol. 11, no. 11, p. 1253, 1985.

[6] D. Barstow, “Artificial intelligence and software engineering,” in Proceedings of the 9th International Conference on Software Engineering. IEEE Computer Society Press, 1987, pp. 200–211.

[7] D. Partridge, “Artificial intelligence and software engineering: a survey of possibilities,” Information and Software Technology, vol. 30, no. 3, pp. 146–152, 1988.

[8] L. C. Cheung, S. Ip, and T. Holden, “Survey of artificial intelligence impacts on information systems engineering,” Information and Software Technology, vol. 33, no. 7, pp. 499–508, 1991.

[9] D. Partridge, Artificial Intelligence in Software Engineering. Wiley Online Library, 1998.

[10] A. Van Lamsweerde and L. Willemet, “Inferring declarative requirements specifications from operational scenarios,” IEEE Transactions on Software Engineering, vol. 24, no. 12, pp. 1089–1114, 1998.

[11] G. D. Boetticher, “Using machine learning to predict project effort: Empirical case studies in data-starved domains,” in Model Based Requirements Workshop. Citeseer, 2001, pp. 17–24.

[12] F. Padberg, T. Ragg, and R. Schoknecht, “Using machine learning for estimating the defect content after an inspection,” IEEE Transactions on Software Engineering, vol. 30, no. 1, pp. 17–28, 2004.

[13] D. Zhang, “Applying machine learning algorithms in software development,” in The Proceedings of 2000 Monterey Workshop on Modeling Software System Structures, 2000, pp. 275–285.

[14] ——, “Machine learning in value-based software test data generation,” in Tools with Artificial Intelligence, 2006. ICTAI'06. 18th IEEE International Conference on. IEEE, 2006, pp. 732–736.

[15] D. Zhang and J. J. Tsai, Machine learning applications in software engineering. World Scientific, 2005, vol. 16.

[16] D. Zhang, “Machine learning and value-based software engineering: a research agenda,” in SEKE, 2008, pp. 285–290.

[17] T. M. Khoshgoftaar, “Introduction to the special issue on quality engineering with computational intelligence,” 2003.

[18] D. Zhang, “Machine learning and value-based software engineering,” in Software Applications: Concepts, Methodologies, Tools, and Applications. IGI Global, 2009, pp. 3325–3339.


[19] D. Zhang and J. J. Tsai, “Machine learning and software engineering,” in Tools with Artificial Intelligence, 2002. (ICTAI 2002). Proceedings. 14th IEEE International Conference on. IEEE, 2002, pp. 22–29.

[20] M. D. Kramer and D. Zhang, “Gaps: a genetic programming system,” in Computer Software and Applications Conference, 2000. COMPSAC 2000. The 24th Annual International. IEEE, 2000, pp. 614–619.

[21] A. Holzinger, M. Plass, K. Holzinger, G. C. Crisan, C.-M. Pintea, and V. Palade, “Towards interactive machine learning (iml): applying ant colony algorithms to solve the traveling salesman problem with the human-in-the-loop approach,” in International Conference on Availability, Reliability, and Security. Springer, 2016, pp. 81–95.

[22] A. Holzinger, “Interactive machine learning for health informatics: when do we need the human-in-the-loop?” Brain Informatics, vol. 3, no. 2, pp. 119–131, 2016.

[23] S. Easterbrook, J. Singer, M.-A. Storey, and D. Damian, “Selecting empirical methods for software engineering research,” Guide to advanced empirical software engineering, pp. 285–311, 2008.

[24] H. A. Simon, “Whether software engineering needs to be artificially intelligent,” IEEE Transactions on Software Engineering, no. 7, pp. 726–732, 1986.

[25] I. Sommerville, “Artificial intelligence and systems engineering,” Prospects for Artificial Intelligence: Proceedings of AISB'93, 29 March–2 April 1993, Birmingham, UK, vol. 17, p. 48, 1993.

[26] R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine learning: An artificial intelligence approach. Springer Science & Business Media, 2013.

[27] A. Marchetto and A. Trentini, “Evaluating web applications testability by combining metrics and analogies,” in Information and Communications Technology, 2005. Enabling Technologies for the New Knowledge Society: ITI 3rd International Conference on. IEEE, 2005, pp. 751–779.

[28] S. Bouktif, F. Ahmed, I. Khalil, and G. Antoniol, “A novel composite model approach to improve quality prediction,” Information and Software Technology, vol. 52, no. 12, pp. 1298–1311, 2010.

[29] L. Radlinski, “A survey of bayesian net models for software development effort prediction,” International Journal of Software Engineering and Computing, vol. 2, no. 2, pp. 95–109, 2010.

[30] W. Zhang, Y. Yang, and Q. Wang, “Handling missing data in software effort prediction with naive bayes and em algorithm,” in Proceedings of the 7th International Conference on Predictive Models in Software Engineering. ACM, 2011, p. 4.

[31] Ł. Radlinski, “A framework for integrated software quality prediction using bayesian nets,” Computational Science and Its Applications-ICCSA 2011, pp. 310–325, 2011.

[32] P. O. O. Sack, M. Bouneffa, Y. Maweed, and H. Basson, “On building an integrated and generic platform for software quality evaluation,” in Information and Communication Technologies, 2006. ICTTA'06. 2nd, vol. 2. IEEE, 2006, pp. 2872–2877.

[33] M. Reformat and D. Zhang, “Introduction to the special issue on: “software quality improvements and estimations with intelligence-based methods”,” Software Quality Journal, vol. 15, no. 3, pp. 237–240, 2007.

[34] B. Twala, M. Cartwright, and M. Shepperd, “Applying rule induction in software prediction,” in Advances in Machine Learning Applications in Software Engineering. IGI Global, 2007, pp. 265–286.

[35] V. U. Challagulla, F. B. Bastani, and I.-L. Yen, “High-confidence compositional reliability assessment of soa-based systems using machine learning techniques,” in Machine Learning in Cyber Trust. Springer, 2009, pp. 279–322.

[36] R. C. Veras, S. R. Meira, A. L. Oliveira, and B. J. Melo, “Comparative study of clustering techniques for the organization of software repositories,” in Hybrid Intelligent Systems, 2007. HIS 2007. 7th International Conference on. IEEE, 2007, pp. 372–377.

[37] I. Birzniece and M. Kirikova, “Interactive inductive learning service for indirect analysis of study subject compatibility,” in Proceedings of the BeneLearn, 2010, pp. 1–6.

[38] D. B. Hanchate, “Analysis, mathematical modeling and algorithm for software project scheduling using bcga,” in Intelligent Computing and Intelligent Systems (ICIS), 2010 IEEE International Conference on, vol. 3. IEEE, 2010, pp. 1–7.

[39] Z. Xu and B. Song, “A machine learning application for human resource data mining problem,” Advances in Knowledge Discovery and Data Mining, pp. 847–856, 2006.

[40] J. Wen, S. Li, Z. Lin, Y. Hu, and C. Huang, “Systematic literature review of machine learning based software development effort estimation models,” Information and Software Technology, vol. 54, no. 1, pp. 41–59, 2012.

[41] E. Rashid, S. Patnayak, and V. Bhattacherjee, “A survey in the area of machine learning and its application for software quality prediction,” ACM SIGSOFT Software Engineering Notes, vol. 37, no. 5, pp. 1–7, 2012.

[42] H. A. Al-Jamimi and M. Ahmed, “Machine learning-based software quality prediction models: state of the art,” in Information Science and Applications (ICISA), 2013 International Conference on. IEEE, 2013, pp. 1–4.

[43] Ł. Radlinski, “Enhancing bayesian network model for integrated software quality prediction,” in Proc. Fourth International Conference on Information, Process, and Knowledge Management, Valencia. Citeseer, 2012, pp. 144–149.

[44] F. Pinel, P. Bouvry, B. Dorronsoro, and S. U. Khan, “Savant: Automatic parallelization of a scheduling heuristic with machine learning,” in Nature and Biologically Inspired Computing (NaBIC), 2013 World Congress on. IEEE, 2013, pp. 52–57.

[45] D. Novitasari, I. Cholissodin, and W. F. Mahmudy, “Optimizing svr using local best pso for software effort estimation,” Journal of Information Technology and Computer Science, vol. 1, no. 1, 2016.

[46] Ł. Radlinski, “Towards expert-based modelling of integrated software quality,” Journal of Theoretical and Applied Computer Science, vol. 6, no. 2, pp. 13–26, 2012.

[47] T. Rongfa, “Defect classification method for software management quality control based on decision tree learning,” in Advanced Technology in Teaching-Proceedings of the 2009 3rd International Conference on Teaching and Computational Science (WTCS 2009). Springer, 2012, pp. 721–728.

[48] R. Rana and M. Staron, “Machine learning approach for quality assessment and prediction in large software organizations,” in Software Engineering and Service Science (ICSESS), 2015 6th IEEE International Conference on. IEEE, 2015, pp. 1098–1101.

[49] H. Wang, M. Kessentini, W. Grosky, and H. Meddeb, “On the use of time series and search based software engineering for refactoring recommendation,” in Proceedings of the 7th International Conference on Management of computational and collective intElligence in Digital EcoSystems. ACM, 2015, pp. 35–42.

[50] V. U. Challagulla, F. B. Bastani, I.-L. Yen, and R. A. Paul, “Empirical assessment of machine learning based software defect prediction techniques,” in Object-Oriented Real-Time Dependable Systems, 2005. WORDS 2005. 10th IEEE International Workshop on. IEEE, 2005, pp. 263–270.

[51] K. Kaminsky and G. Boetticher, “Building a genetically engineerable evolvable program (geep) using breadth-based explicit knowledge for predicting software defects,” in Fuzzy Information, 2004. Processing NAFIPS'04. IEEE Annual Meeting of the, vol. 1. IEEE, 2004, pp. 10–15.

[52] ——, “How to predict more with less, defect prediction using machine learners in an implicitly data starved domain,” in The 8th World Multiconference on Systemics, Cybernetics and Informatics, Orlando, FL. Citeseer, 2004.

[53] K. Kaminsky and G. D. Boetticher, “Better software defect prediction using equalized learning with machine learners,” Knowledge Sharing and Collaborative Engineering, 2004.

[54] O. Kutlubay and A. Bener, “A machine learning based model for software defect prediction,” working paper, Bogazici University, Computer Engineering Department, 2005.

[55] X. Ren, “Learn to predict “affecting changes” in software engineering,” 2003.

[56] E. Ceylan, F. O. Kutlubay, and A. B. Bener, “Software defect identification using machine learning techniques,” in Software Engineering and Advanced Applications, 2006. SEAA'06. 32nd EUROMICRO Conference on. IEEE, 2006, pp. 240–247.

[57] Y. Kastro and A. B. Bener, “A defect prediction method for software versioning,” Software Quality Journal, vol. 16, no. 4, pp. 543–562, 2008.

[58] O. Kutlubay, B. Turhan, and A. B. Bener, “A two-step model for defect density estimation,” in Software Engineering and Advanced Applications, 2007. 33rd EUROMICRO Conference on. IEEE, 2007, pp. 322–332.

[59] A. S. Namin and M. Sridharan, “Bayesian reasoning for software testing,” in Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research. ACM, 2010, pp. 349–354.

[60] C. Murphy and G. Kaiser, “Metamorphic runtime checking of non-testable programs,” Columbia University Dept of Computer Science Tech Report cucs-042-09, p. 9293, 2009.

[61] W. Afzal, R. Torkar, R. Feldt, and T. Gorschek, “Genetic programming for cross-release fault count predictions in large and complex software projects,” Evolutionary Computation and Optimization Algorithms in Software Engineering, pp. 94–126, 2010.

[62] C. Murphy et al., “Using metamorphic testing at runtime to detect defects in applications without test oracles,” 2008.

[63] D. Qiu, S. Fang, and Y. Li, “A framework to discover potential deviation between program and requirement through mining object graph,” in Computer Application and System Modeling (ICCASM), 2010 International Conference on, vol. 4. IEEE, 2010, pp. V4–110.

[64] C. Murphy, G. E. Kaiser et al., “Automatic detection of defects in applications without test oracles,” Dept. Comput. Sci., Columbia Univ., New York, NY, USA, Tech. Rep. CUCS-027-10, 2010.

[65] W. Afzal, “Search-based approaches to software fault prediction and software testing,” Ph.D. dissertation, Blekinge Institute of Technology, 2009.

[66] M. K. Taghi, B. Cukic, and N. Seliya, “An empirical assessment on program module-order models,” Quality Technology & Quantitative Management, vol. 4, no. 2, pp. 171–190, 2007.

[67] J. H. Wang, N. Bouguila, and T. Bdiri, “Empirical evaluation of selected algorithms for complexity-based classification of software modules and a new model,” in Intelligent Systems: From Theory to Practice. Springer, 2010, pp. 99–131.

[68] H. Jin, Y. Wang, N.-W. Chen, Z.-J. Gou, and S. Wang, “Artificial neural network for automatic test oracles generation,” in Computer Science and Software Engineering, 2008 International Conference on, vol. 2. IEEE, 2008, pp. 727–730.

[69] J. Ferzund, S. N. Ahsan, and F. Wotawa, “Automated classification of faults in programs using machine learning techniques,” in Artificial Intelligence Techniques in Software Engineering Workshop, 2008.

[70] O. Maqbool and H. Babri, “Bayesian learning for software architecture recovery,” in Electrical Engineering, 2007. ICEE'07. International Conference on. IEEE, 2007, pp. 1–6.

[71] A. Okutan, “Software defect prediction using bayesian networks and kernel methods,” Ph.D. dissertation, Isik University, 2012.

[72] D. Cotroneo, R. Pietrantuono, and S. Russo, “A learning-based method for combining testing techniques,” in Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013, pp. 142–151.

[73] D. Zhang, “A value-based framework for software evolutionary testing,” in Advances in Abstract Intelligence and Soft Computing. IGI Global, 2013, pp. 355–373.

[74] A. Okutan and O. T. Yıldız, “Software defect prediction using bayesian networks,” Empirical Software Engineering, vol. 19, no. 1, pp. 154–181, 2014.

[75] S. Agarwal and D. Tomar, “A feature selection based model for software defect prediction,” assessment, vol. 65, 2014.

[76] G. Abaei and A. Selamat, “Important issues in software fault prediction: A road map,” in Handbook of Research on Emerging Advancements and Technologies in Software Engineering. IGI Global, 2014, pp. 510–539.

[77] A. Okutan and O. T. Yildiz, “A novel kernel to predict software defectiveness,” Journal of Systems and Software, vol. 119, pp. 109–121, 2016.

[78] X.-d. Mu, R.-h. Chang, and L. Zhang, “Software defect prediction based on competitive organization coevolutionary algorithm,” Journal of Convergence Information Technology (JCIT), vol. 7, no. 5, 2012.

[79] J. Cahill, J. M. Hogan, and R. Thomas, “Predicting fault-prone software modules with rank sum classification,” in Software Engineering Conference (ASWEC), 2013 22nd Australian. IEEE, 2013, pp. 211–219.

[80] R. Rana, M. Staron, C. Berger, J. Hansson, M. Nilsson, and W. Meding, “The adoption of machine learning techniques for software defect prediction: An initial industrial validation,” in Joint Conference on Knowledge-Based Software Engineering. Springer, 2014, pp. 270–285.

[81] T. Schulz, Ł. Radlinski, T. Gorges, and W. Rosenstiel, “Predicting the flow of defect correction effort using a bayesian network model,” Empirical Software Engineering, vol. 18, no. 3, pp. 435–477, 2013.

[82] E. Rashid, “R4 model for case-based reasoning and its application for software fault prediction,” International Journal of Software Science and Computational Intelligence (IJSSCI), vol. 8, no. 3, pp. 19–38, 2016.

[83] ——, “Improvisation of case-based reasoning and its application for software fault prediction,” International Journal of Services Technology and Management, vol. 21, no. 4-6, pp. 214–227, 2015.

[84] J. K. Chhabra and A. Parashar, “Prediction of changeability for object oriented classes and packages by mining change history,” in Electrical and Computer Engineering (CCECE), 2014 IEEE 27th Canadian Conference on. IEEE, 2014, pp. 1–6.

[85] G. Spanoudakis, A. S. d. Garcez, and A. Zisman, “Revising rules to capture requirements traceability relations: A machine learning approach,” in SEKE, 2003, pp. 570–577.

[86] M. Shin and A. Goel, “Modeling software component criticality using a machine learning approach,” Artificial Intelligence and Simulation, pp. 440–448, 2005.

[87] J. S. Shirabad, “Predictive techniques in software engineering,” in Encyclopedia of Machine Learning. Springer, 2011, pp. 782–789.

[88] A. A. Araujo, M. Paixao, I. Yeltsin, A. Dantas, and J. Souza, “An architecture based on interactive optimization and machine learning applied to the next release problem,” Automated Software Engineering, pp. 1–49, 2016.

[89] T. Tourwe, J. Brichau, A. Kellens, and K. Gybels, “Induced intentional software views,” Computer Languages, Systems & Structures, vol. 30, no. 1, pp. 35–47, 2004.

[90] J. S. Di Stefano and T. Menzies, “Machine learning for software engineering: Case studies in software reuse,” in Tools with Artificial Intelligence, 2002. (ICTAI 2002). Proceedings. 14th IEEE International Conference on. IEEE, 2002, pp. 246–251.

[91] J. Fu, F. B. Bastani, and I.-L. Yen, “Automated ai planning and code pattern based code synthesis,” in Tools with Artificial Intelligence, 2006. ICTAI'06. 18th IEEE International Conference on. IEEE, 2006, pp. 540–546.

[92] J. Fu, F. B. Bastani, I.-L. Yen et al., “Semantic-driven component-based automated code synthesis,” Semantic Computing, pp. 249–283, 2010.

[93] A. Katasonov, O. Kaykova, O. Khriyenko, S. Nikitin, and V. Y. Terziyan, “Smart semantic middleware for the internet of things,” ICINCO-ICSO, vol. 8, pp. 169–178, 2008.

[94] L. Baresi, S. Guinea, and A. Shahzada, “Short paper: Harmonizing heterogeneous components in sesame,” in Internet of Things (WF-IoT), 2014 IEEE World Forum on. IEEE, 2014, pp. 197–198.

[95] L. Zhu, H. Cai, and L. Jiang, “Minson: A business process self-adaptive framework for smart office based on multi-agent,” in e-Business Engineering (ICEBE), 2014 IEEE 11th International Conference on. IEEE, 2014, pp. 31–37.

[96] J. F. De Paz, J. Bajo, S. Rodríguez, G. Villarrubia, and J. M. Corchado, “Intelligent system for lighting control in smart cities,” Information Sciences, vol. 372, pp. 241–255, 2016.

[97] I. Birzniece, “The use of inductive learning in information systems,” in Proceedings of the 16th International Conference on Information and Software Technologies, 2010, pp. 95–101.

[98] D. Alrajeh, A. Russo, and S. Uchitel, “Inferring operational requirements from scenarios and goal models using inductive learning,” in Proceedings of the 2006 International Workshop on Scenarios and State Machines: Models, Algorithms, and Tools. ACM, 2006, pp. 29–36.

[99] A. M. Sharifloo, A. Metzger, C. Quinton, L. Baresi, and K. Pohl, “Learning and evolution in dynamic software product lines,” in Proceedings of the 11th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. ACM, 2016, pp. 158–164.

[100] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” arXiv preprint arXiv:1611.01578, 2016.

[101] N. M. do Nascimento and C. J. P. de Lucena, “Fiot: An agent-based framework for self-adaptive and self-organizing applications based on the internet of things,” Information Sciences, vol. 378, pp. 161–176, 2017.

[102] F. Jacob and R. Tairas, “Code template inference using language models,” in Proceedings of the 48th Annual Southeast Regional Conference. ACM, 2010, p. 104.

[103] B. Amal, M. Kessentini, S. Bechikh, J. Dea, and L. B. Said, “On the use of machine learning and search-based software engineering for ill-defined fitness function: a case study on software refactoring,” in International Symposium on Search Based Software Engineering. Springer, 2014, pp. 31–45.

[104] Y. Peng, G. Wang, and H. Wang, “User preferences based software defect detection algorithms selection using mcdm,” Information Sciences, vol. 191, pp. 3–13, 2012.

[105] H. A. Abbass, E. Petraki, K. Merrick, J. Harvey, and M. Barlow, “Trusted autonomy and cognitive cyber symbiosis: Open challenges,” Cognitive Computation, vol. 8, no. 3, pp. 385–408, 2016.

[106] W. G. Baxt, “Use of an artificial neural network for the diagnosis of myocardial infarction,” Annals of Internal Medicine, vol. 115, no. 11, pp. 843–848, 1991.

[107] M. A. Mazurowski, P. A. Habas, J. M. Zurada, J. Y. Lo, J. A. Baker, and G. D. Tourassi, “Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance,” Neural Networks, vol. 21, no. 2, pp. 427–436, 2008.

[108] N. M. do Nascimento, M. L. Viana, and C. J. P. de Lucena, “An iot-based tool for human gas monitoring,” in IXV Congresso Brasileiro de Informática em Saúde (CBIS), vol. 1. SBIS, 2016, pp. 96–98.

[109] R. Morejon, M. Viana, and C. Lucena, “Generating software agents for data mining: An example for the health data area,” in International Conference on Software Engineering & Knowledge Engineering (SEKE), 2017.

[110] S. Haykin, Neural Networks: A Comprehensive Foundation. Macmillan, 1994. [Online]. Available: http://books.google.com.br/books?id=PSAPAQAAMAAJ

[111] D. Castelvecchi, “Can we open the black box of ai?” Nature News, vol. 538, no. 7623, p. 20, 2016.

[112] L. Atzori, A. Iera, and G. Morabito, “The internet of things: A survey,” Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.

[113] M. A. Salahuddin, A. Al-Fuqaha, M. Guizani, K. Shuaib, and F. Sallabi, “Softwarization of internet of things infrastructure for secure and smart healthcare,” Computer, vol. 50, no. 7, pp. 74–79, 2017.

[114] I. Ayala, M. Amor, L. Fuentes, and J. M. Troya, “A software product line process to develop agents for the iot,” Sensors, vol. 15, no. 7, pp. 15640–15660, 2015.

[115] J.-P. Briot, N. M. de Nascimento, and C. J. P. de Lucena, “A multi-agent architecture for quantified fruits: Design and experience,” in 28th International Conference on Software Engineering & Knowledge Engineering (SEKE'2016). SEKE/Knowledge Systems Institute, PA, USA, 2016, pp. 369–374.

[116] N. M. Nascimento, “FIoT: An agent-based framework for self-adaptive and self-organizing internet of things applications,” Master's thesis, PUC-Rio, Rio de Janeiro, Brazil, August 2015.

[117] N. M. d. Nascimento, C. J. P. d. Lucena, and H. Fuks, “Modeling quantified things using a multi-agent system,” in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), vol. 1. IEEE, 2015, pp. 26–32.

[118] N. M. Nascimento and C. J. P. Lucena, “Engineering cooperative smart things based on embodied cognition,” in NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2017). IEEE, 2017.

[119] Apple, “Homekit,” https://developer.apple.com/homekit/, March 2017.

[120] Samsung, “Samsung smart things,” https://www.smartthings.com, March 2017.

[121] D. I. Sjøberg, T. Dyba, B. C. Anda, and J. E. Hannay, “Building theories in software engineering,” Guide to advanced empirical software engineering, pp. 312–336, 2008.

[122] S. Whiteson, N. Kohl, R. Miikkulainen, and P. Stone, “Evolving soccer keepaway players through task decomposition,” Machine Learning, vol. 59, no. 1-2, pp. 5–30, 2005.

[123] C. Wohlin, P. Runeson, M. Host, M. C. Ohlsson, B. Regnell, and A. Wesslen, Experimentation in software engineering. Springer Science & Business Media, 2012.

[124] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, Cambridge, 1998, vol. 1, no. 1.

[125] R. Peck and J. Devore, Statistics: The Exploration & Analysis of Data. Nelson Education, 2011.

[126] W. Oizumi, L. Sousa, A. Garcia, R. Oliveira, A. Oliveira, O. Agbachi, and C. Lucena, “Revealing design problems in stinky code: a mixed-method study,” in Proceedings of the 11th Brazilian Symposium on Software Components, Architectures, and Reuse. ACM, 2017, p. 5.

[127] E. Fernandes, F. Ferreira, J. A. Netto, and E. Figueiredo, “Information systems development with pair programming: An academic quasi-experiment,” in Proceedings of the XII Brazilian Symposium on Information Systems on Brazilian Symposium on Information Systems: Information Systems in the Cloud Computing Era-Volume 1. Brazilian Computer Society, 2016, p. 64.

[128] G. Kasparov, Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins. Hachette UK, 2017.

[129] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.

[130] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.

[131] A. Trotman and C. Handley, “Programming contest strategy,” Computers & Education, vol. 50, no. 3, pp. 821–837, 2008.