

Inferring Work Task Automatability from AI Expert Evidence

Paul Duckworth*
University of Oxford, Oxford, United Kingdom
[email protected]

Logan Graham*
University of Oxford, Oxford, United Kingdom
[email protected]

Michael Osborne
University of Oxford, Oxford, United Kingdom
[email protected]

ABSTRACT

Despite growing alarm about machine learning technologies automating jobs, there is little good evidence on what activities can be automated using such technologies. We contribute the first dataset of its kind by surveying over 150 top academics and industry experts in machine learning, robotics and AI, receiving over 4,500 ratings of how automatable specific tasks are today. We present a probabilistic machine learning model to learn the patterns connecting expert estimates of task automatability and the skills, knowledge and abilities required to perform those tasks. Our model infers the automatability of over 2,000 work activities, and we show how automation differs across types of activities and types of occupations. Sensitivity analysis identifies the specific skills, knowledge and abilities of activities that drive higher or lower automatability. We provide quantitative evidence of what is perceived to be automatable using the state-of-the-art in machine learning technology. We consider the societal impacts of these results and of task-level approaches.

CCS CONCEPTS

• Applied computing → Economics; • Computing methodologies → Machine learning approaches; Gaussian processes; • Social and professional topics → Government technology policy.

KEYWORDS

automation, labor economics, Bayesian machine learning,interpretable machine learning, open datasets

ACM Reference Format:
Paul Duckworth, Logan Graham, and Michael Osborne. 2019. Inferring Work Task Automatability from AI Expert Evidence. In AAAI/ACM Conference on AI, Ethics, and Society (AIES '19), January 27–28, 2019, Honolulu, HI, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3306618.3314247

*Both authors contributed equally to this research.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

AIES ’19, January 27–28, 2019, Honolulu, HI, USA

© 2019 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-6324-2/19/01.
https://doi.org/10.1145/3306618.3314247

INTRODUCTION

Machine learning (ML), in combination with complementary technologies such as robotics and software-based standardization, has rapidly become a real substitute for and complement to human labor. This work aims to better understand that automation and its effects on work. One example is Amazon Go, a recently opened grocery store that uses computer vision to replace cashiers, of which over 3.5 million are employed in the United States [1, 11]. Further, the 500,000 designers in the US are beginning to use constraint-based generative design to automate creative designs of buildings, industrial components, and more [1, 4]. As a result, we as researchers in these fields often confront examples of media and public concern about the technologies we develop. What remains uncertain is the magnitude and direction of the impact of machine learning technologies on employment. While recent advances in technology seem able to automate intelligent work, we lack good data on the scope of such automation.

We collected a detailed task-based survey of 150+ machine learning, robotics, and automation researchers. This is the first dataset of its kind, with over 4,500 datapoints about what specific tasks are automatable according to current technology. In this "nowcasting" exercise, technologists provide knowledge of the extent to which a task can or cannot be automated with technology that exists today. We use a probabilistic model to infer the automatability of thousands of activities for which it would be prohibitively difficult to collect reliable data. By collecting task-specific data to model automatability, we believe we can develop richer, more accurate frameworks about what can be automated by the current state-of-the-art in intelligent technology. We believe this allows society to better understand and prepare for automation.

The contributions of this paper are a novel dataset of the automatability of workplace activities; a probabilistic method for inferring the automatability of unmeasured activities; activity-level analysis of automatability; and consequent patterns across work types, occupations, income, and education. We use more detailed numeric attributes and incorporate more expert knowledge than previous studies.

We demonstrate that we can accurately model expert opinions regarding the current state of automation, and introduce a methodology that goes beyond the limitations of the literature. We call for more and better measurement of automation at the activity level. We discuss how this is needed to better prepare governments, employees, and businesses for the effects of automation.



RELATED WORK

Traditionally, approaches to predicting what is automatable have developed frameworks based on the "types" of occupations and the skills they require [2, 5, 9]. One popular framework places occupations on a manual-cognitive spectrum and a standardizable-dynamic spectrum. These frameworks are broad and do not reflect that tasks (or groups of tasks), not entire occupations, are the unit of automation. As a result, public perception is conflicted about the true effect of automation on work [24].

Recent work assumes that occupations are better analyzed as evolving combinations of detailed tasks, skills, and/or environments [2, 3, 14]. With increasingly granular job data available from sources such as O*NET [20], which breaks down occupations into hundreds of continuously updated numerical components, we believe there is an opportunity to evaluate occupations by focusing on their tasks, skills, and environments. Two recent reports use a task-first approach. A recent report by McKinsey [14] uses an unclear approach to model opinions about automation potential from an unknown number of industry-based, potentially non-technical experts. Another recent analysis by the OECD [3] derives high-level task-level estimates from occupation-level estimates, and uses worker (not task) characteristics in its inference procedure.

We differ from previous approaches in three key aspects: first, we seek expert knowledge at the most granular task level, similar to [14] and [10]. Second, we ask what is automatable today and do not make speculative assumptions about future developments or uptake of future technological advancements. Third, we present a robust and (soon) openly available probabilistic methodology and dataset.

DATA REPRESENTATION

Expert Survey. We conducted an online survey of 156 academic and industry experts in machine learning, robotics and intelligent systems about how automatable specific tasks are using technology available today. Each expert was presented with 5 occupations and their 5 "most important" tasks, taken from the Occupational Information Network (O*NET) 2016 database [20]. The complete list of 70 occupations whose tasks are annotated is shown in Table 2 in Appendix A, with occupations chosen to be representative of the feature space, with an emphasis on high-employment and hence familiar occupations. Five sample occupations and their surveyed tasks are displayed in Table 3 in Appendix A.

Each expert answered the following question: "Do you believe that technology exists today that could automate these tasks?", then labeled each task as either: Not automatable today (score of 1.0), Mostly not automatable today (human does most of it) (2.0), Could be mostly automated today (human still needed) (3.0), Completely automatable today (4.0), or Unsure. Respondents also reported overall confidence in their answers (distribution shown in Appendix B).

Our dataset contains 4,599 task-level responses from 156 academic and industrial experts from around the world, and across various scientific and industrial fields. Due to the mix of fields, we broadly label these as experts in artificial intelligence and recognize that experts offer varying perspectives.

We combine each task's multiple expert labels using Independent Bayesian Classifier Combination (IBCC), a principled Bayesian approach to combining multiple classifications [13, 23]. IBCC creates a posterior over labels that reflects the individual labellers' tendencies to agree with other labellers over the ultimately chosen label values. We averaged IBCC task scores into their task's work activity (described below). Labels concentrate around whole and half values and we round the final values to the closest 0.5 (a half-class).
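As a concrete illustration of this aggregation step, the sketch below uses the classical Dawid-Skene EM estimator, a non-Bayesian analogue of IBCC that likewise learns per-expert confusion matrices to weight ratings. It is a simplified stand-in, not the authors' IBCC implementation, and the array layout and function name are our own assumptions.

```python
import numpy as np

def dawid_skene(ratings, n_classes, n_iter=50):
    """Estimate a posterior over each task's true label from multiple expert
    ratings by learning per-expert confusion matrices via EM (Dawid-Skene).

    ratings: iterable of (task_id, expert_id, label) integer triples, with
             task_id in [0, n_tasks), expert_id in [0, n_experts),
             and label in [0, n_classes).
    """
    ratings = np.asarray(ratings, dtype=int)
    n_tasks = ratings[:, 0].max() + 1
    n_experts = ratings[:, 1].max() + 1

    # Initialise each task's label posterior with its empirical vote shares.
    post = np.zeros((n_tasks, n_classes))
    for t, e, l in ratings:
        post[t, l] += 1.0
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and per-expert confusion matrices (true x reported).
        prior = post.mean(axis=0) + 1e-6
        prior /= prior.sum()
        conf = np.full((n_experts, n_classes, n_classes), 1e-6)
        for t, e, l in ratings:
            conf[e, :, l] += post[t]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: update each task's label posterior given all of its ratings.
        log_post = np.tile(np.log(prior), (n_tasks, 1))
        for t, e, l in ratings:
            log_post[t] += np.log(conf[e, :, l])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post  # shape (n_tasks, n_classes)
```

The resulting posterior over label levels can then be converted to an expected score and rounded to the nearest 0.5, mirroring the half-class rounding described above.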

We believe a survey of many experts, combined using IBCC, is a transparent and reliable method of obtaining data about the current state of task automation. It requires no forecasting or prediction by the participants. The distribution of task-level expert responses, and the IBCC-combined task-label distribution, are shown in Table 5 in Appendix B. The distributions of field-relevant academic experience and geographic location are shown in Figure 7 in Appendix B. 97% of participants had field-relevant academic experience, with most participants coming from computer science, ML, robotics or AI backgrounds; 52% of responses came from the US, UK and Germany, with the rest from 30 other countries. This aggregate dataset will be made openly available.

Figure 1: O*NET database taxonomy, including occupations o, work activities w, tasks t, and high-level groupings (major occupation groups and activity groups).

Occupations, Activities and Tasks. An occupation o is represented as a set of discrete tasks an employee may be required to perform, o = {t1, t2, . . . , tn}. Each occupation may have a different number of tasks, and all tasks are detailed enough to be unique to one occupation.

Similar tasks are grouped into a work activity w such that w = {ti, . . . , tj}. While a task is occupation specific, work activities are generic activities performed across multiple occupations. We further aggregate results to major occupational groups and high-level activity groups. A diagram demonstrating this hierarchy is shown in Figure 1.
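To make the hierarchy concrete, here is a minimal sketch of how these objects might be represented; the class and field names are illustrative assumptions, not the paper's data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    title: str          # occupation-specific task statement
    occupation: str     # the single occupation this task belongs to
    importance: float   # task importance I_t supplied by O*NET

@dataclass
class Occupation:
    title: str
    tasks: List[Task] = field(default_factory=list)   # o = {t1, ..., tn}

@dataclass
class WorkActivity:
    title: str                                         # e.g. "Cut fabrics"
    tasks: List[Task] = field(default_factory=list)    # similar tasks drawn from many occupations
```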

Session 8: AI for Social Good AIES’19, January 27–28, 2019, Honolulu, HI, USA

486

Page 3: Inferring Work Task Automatability from AI Expert Evidencemosb/public/pdf/5172/Duckworth et al… · Inferring Work Task Automatability from AI Expert Evidence Paul Duckworth University

Automation by Work Activity. To create activity-specific feature vectors, we aggregate the roughly 20,000 tasks into their 2,067 work activities (called Detailed Work Activities in O*NET) to create a feature vector xw for each work activity w. Each occupation is represented as a vector xo comprising the numerical ratings of its skills (x^s_o), knowledge (x^k_o) and abilities (x^a_o), i.e. xo = [x^s_o, x^k_o, x^a_o]. We represent each task t of occupation o with the feature vector of occupation o, because we assume that an occupation's skills, knowledge, and abilities inform those needed to perform its constituent tasks. These features are measured quantitatively on a 1 to 5 scale by dozens of employees and experts in the O*NET database.

The activity feature vector is a weighted average of its constituent task vectors: xw = ∑_{t∈w} w(t,w) xo, where w(t,w) is a normalized weight of the task's relative importance to its occupation and work activity, i.e. w(t,w) = I(t,o)I(t,w) / ∑_{t∈w} I(t,o)I(t,w). The relative importance of the task to its occupation is calculated as I(t,o) = It / ∑_{t∈o} It, while the relative importance of a task to its work activity is I(t,w) = It / ∑_{t∈w} It. Task importance It is a numeric measure also supplied by O*NET.
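The weighting scheme above can be sketched as follows; the data layout and names are our own assumptions rather than the released pipeline.

```python
import numpy as np

def activity_feature_vector(tasks, occupation_features,
                            occupation_importance_sums, activity_importance_sum):
    """Importance-weighted average of an activity's constituent task vectors.

    tasks: list of dicts with keys 'occupation' and 'importance' (I_t) for the
           tasks t in work activity w.
    occupation_features: dict mapping occupation id -> feature vector x_o
                         (skills, knowledge, abilities concatenated).
    occupation_importance_sums: dict mapping occupation id -> sum of I_t over
                                all tasks of that occupation.
    activity_importance_sum: sum of I_t over all tasks in this activity.
    """
    weights, vectors = [], []
    for t in tasks:
        I_to = t['importance'] / occupation_importance_sums[t['occupation']]  # I(t,o)
        I_tw = t['importance'] / activity_importance_sum                      # I(t,w)
        weights.append(I_to * I_tw)
        vectors.append(occupation_features[t['occupation']])  # x_o stands in for the task vector
    weights = np.array(weights) / np.sum(weights)             # normalised w(t,w)
    return weights @ np.vstack(vectors)                       # x_w
```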

Automation by Occupation. We also explore what the automatability of activities implies about the automatability of the occupations that perform them. First, we infer automation scores yw for all work activities (including unlabeled ones), as will be described in the next section. We construct an occupation automation score yo for occupation o using the importance-weighted average of its constituent work activities: yo = ∑_{w∈Ωo} I(w,o) yw, where Ωo is the set of all work activities performed by occupation o, and I(w,o) is the importance score of work activity w normalized over this set.

We present automatability over major occupation groups, which are the highest-level occupation categorisation provided by the Bureau of Labor Statistics' SOC system. We represent a major occupation group G as a set of occupations {o1, o2, . . . } and construct the automation score yG by taking the employment-weighted average of the automatability scores of its constituent occupations, yo for all occupations o ∈ G.
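A minimal sketch of the two aggregation steps just described, assuming simple dictionaries of scores, importances, and employment counts (names are illustrative):

```python
import numpy as np

def occupation_score(activity_scores, activity_importance):
    """y_o: importance-weighted average of an occupation's activity scores.

    activity_scores: dict activity -> inferred automatability y_w
    activity_importance: dict activity -> importance of w to this occupation
    """
    acts = list(activity_scores)
    imp = np.array([activity_importance[w] for w in acts], dtype=float)
    imp /= imp.sum()                                  # normalise I(w, o) over the set
    return float(imp @ np.array([activity_scores[w] for w in acts]))

def group_score(occupation_scores, employment):
    """y_G: employment-weighted average of occupation scores within a group."""
    occs = list(occupation_scores)
    emp = np.array([employment[o] for o in occs], dtype=float)
    emp /= emp.sum()
    return float(emp @ np.array([occupation_scores[o] for o in occs]))
```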

On Using IBCC on Training Data. We believe using IBCC to obtain a single rating from multiple experts is advantageous both because it is fully Bayesian and because it offers a higher chance of accurately recovering the true automatability label of a task in an environment of uncertainty and subjectivity. The main idea is to use the agreement of responses to learn a belief over the correctness of each individual classifier (each human expert) in order to weight their responses. The alternative of averaging task scores, we believe, misrepresents the ordinal classification task as a continuous regression task. A side-effect of this approach is that the ground truth labels are more polarised at the extreme values (one and four) than when simply (mean-)aggregating the task-level responses. A complication is that, when comparing two models based on their Root Mean Squared Error (RMSE), lower absolute error is achieved by not using the (more polarized) IBCC-combined labels. However, we believe they are more representative when combining multiple (semi-)reliable expert sources. In future work, we would like to explore modeling at the individual user-task label level.

The interpretation of the learning task and the social scientific nature of the data influence the correct choice of metric and model. This is often the case in computational social science. We save deeper rationalizations of our data, model, and task setup choices for further work.

MODEL COMPARISON AND VALIDATION

We seek a flexible function estimator capable of modeling complex, non-linear relationships between the features (skills, abilities, knowledge) and (perceived) automatability in high-dimensional space. Given the social scientific nature of the study, we also desire a measure of model uncertainty. We compare models based on their "tolerance accuracy" score: the percent of posterior prediction means ŷw that are within 0.5 of the ground truth post-IBCC survey value yw. This is a sensible score for our task, and allows more flexibility in our multiclass ordinal setting than strict accuracy or average error. This score also takes into account that we compare models with heterogeneous output types: some output discrete labels, and some output continuous values. We optimised the hyperparameters of all models using 10-fold cross-validation.
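A short sketch of the tolerance accuracy metric as defined above (function and argument names are our own):

```python
import numpy as np

def tolerance_accuracy(y_true, y_pred_mean, tol=0.5):
    """Fraction of posterior prediction means within `tol` of the
    post-IBCC ground-truth automatability scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred_mean = np.asarray(y_pred_mean, dtype=float)
    return float(np.mean(np.abs(y_pred_mean - y_true) <= tol))

# Example on the 1.0-4.0 automatability scale.
print(tolerance_accuracy([2.0, 3.5, 1.0], [2.4, 2.9, 1.2]))  # -> 0.666...
```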

Our first candidate model class is that of Gaussian Processes (GPs) [22], which have previously been applied to occupation-based data in [9] and [7]. We specifically use the ordinal likelihood function introduced in [8] to reflect the nature of having discrete labels with an ordinal interpretation (not at all to completely automatable). We use the squared exponential, or RBF, kernel, as it consistently performed well compared to other kernels and was less likely to overfit. We optimize the kernel hyperparameters by minimizing the negative marginal log likelihood, −log p(yw | xw), as described in [22], using the open source software GPflow [16].
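The following is a minimal sketch of an ordinal GP of this kind, assuming GPflow 2.x; the feature shapes, bin edges, and integer label encoding are illustrative assumptions, and this is not the authors' released code.

```python
import numpy as np
import gpflow

# Illustrative shapes: 70 labeled work activities, ~120 skill/knowledge/ability features.
X = np.random.rand(70, 120)                                    # activity feature vectors x_w
Y = np.random.randint(0, 7, size=(70, 1)).astype(float)        # 7 ordinal half-class labels as integer bins

# Ordinal likelihood (Chu & Ghahramani style) over 7 bins uses 6 internal bin edges.
likelihood = gpflow.likelihoods.Ordinal(np.arange(1, 7, dtype=float))
kernel = gpflow.kernels.SquaredExponential(lengthscales=np.ones(X.shape[1]))

model = gpflow.models.VGP((X, Y), kernel=kernel, likelihood=likelihood)

# Fit kernel and variational parameters by minimising the negative likelihood bound.
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Posterior mean and variance of the latent automatability score for new activities.
mean, var = model.predict_f(np.random.rand(5, 120))
```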

Table 1: Model tolerance accuracy and negative log likelihood.

Model                 Accuracy (std)   −Log-likelihood
Ordinal RBF GP        0.645 (0.028)    385.6
Gaussian RBF GP       0.643 (0.102)    382.3
Ordinal DNN           0.604 (0.071)    –
Random Forest         0.517 (0.081)    –
Ordinal Regression    0.451 (0.036)    –
ŷw = avg(yw)          0.575 (0.065)    –
Proportional Random   0.374 (0.049)    –

For other candidate models, we consider ordinal logistic regression [21], a random forest, and a neural network with an ordinal loss function [12], with a 4-layer (120-60-120-7) fully connected architecture and 10% layer-wise dropout. A proportional random assignment and a constant mean predictor are compared as lower baselines on predictive performance.



Figure 2: (left) Amount of employment affected across automatability scores, by 9 high-level activity groups. (right) Employment affected across automatability scores, by 12 major occupation groups.

Notably, just predicting with the value of the mean achieves fourth-best performance. This is due to the concentration of the values, even though the mean-predictor offers no useful information. While we believe our metric is the best for assessing models, this highlights the challenges of modelling subjective data with the ordinal classification interpretation we have taken. Results for each model are displayed in Table 1 (standard deviation in brackets). The ordinal GP model consistently outperforms comparative methods at predicting posterior mean values of automatability over the space of work activities. While the non-ordinal GP model and the optimised deep neural network perform on average similarly to the ordinal GP model, they do so much less reliably.

EXPERIMENTS AND RESULTS

Question 1: What is automatable?

Using the best performing GP model to infer the automatability score for all 2,067 work activities (including the training set), we examine examples of automatable and not-automatable activities. Table 4 in Appendix C presents a sample of work activities with the highest and lowest automatability from the unlabeled data (with uncertainties).

We observe that activities such as "route mail to correct destinations" (3.82) or "cut fabrics" (3.93) have high automation potential (and indeed, are widely automated). However, we also notice that white collar activities are highly automatable with current technology: "Operate digital imaging equipment" (3.63), "Send information, materials or documentation" (3.39), and "Advise others on ways to improve processes or products" (3.38). Insights such as these suggest likely future areas of automation, where automation could be achieved in the real world with relatively little further attention to the underlying technology. The mean of automatability scores is 2.65, indicating that the model, learned from expert estimates, believes that tasks are on average more likely to be automatable than not.

In lieu of profiling the long list of activities by their automatability here, we consider instead what groups of activities are automatable, and the implications for occupations when using an activity-first approach. In Figure 2 (left) we plot the automatability of activities by the number of currently-employed individuals who perform them, classified into nine high-level activity groupings. It becomes evident that while most activities fall between mostly and mostly not automatable, work tends to lie closer to "mostly automatable". Eight times as much work lies between "mostly" and "completely" automatable as between "mostly not" and "not at all" automatable, when weighted by employment. Activities classified as "reasoning and decision making" and "coordinating, developing, managing, and advising" are less likely than others to be automatable. However, "administering", "information and data processing" and (perhaps surprisingly) "performing complex and technical activities" are more likely to be automatable.

Additionally, we average an occupation's activity automatability scores to create an occupation-level automatability score. (How to properly aggregate activities into an occupation-level score is a subject of further research, but we use this as a preliminary exploration.)



We classify occupations into 12 high-level "Major Occupation Groups", as in Figure 2 (right). We see that the model predicts very high automation potential in office and administrative support (orange) and sales occupations (red), which together employ about 38 million people in the United States. This stands in contrast to the popular emphasis on the automation of physical processes such as production (yellow), farming, fishing and forestry (dark orange), and transportation and material moving (teal), which employ about 20 million people in total. In contrast, two Major Occupation Groups appear robust to automation: education, legal, community service, arts, and media occupations (light green), as well as management, business, and financial occupations (light blue).

Trends across Income and Education. Using our occupation-level mappings, we present a preliminary analysis of how automatable activities cluster across income and education, to understand the likely impact on employees. We use median annual income data from the Occupational Employment Statistics from the Bureau of Labor Statistics [1], and the average expert estimate that one requires at least a bachelor's degree to perform an occupation from O*NET [20].

The impact is perhaps expected. The highest paid, most educated occupations tend to be the least automatable. They also tend to be much smaller occupations. However, it is worth noting that even being paid well and having a bachelor's degree does not guarantee that an occupation's activities are not automatable. "Air Traffic Controllers" make about $125,000 a year, yet are deemed mostly automatable (2.93). "Cytogenetic Technologists", for example, require a Bachelor's degree (with a 100% likelihood from O*NET), with an estimated occupation automatability of 3.03.

Nor is the opposite always true. "Preschool Teachers" and "Teacher Assistants" make just under $30,000 a year, yet are mostly non-automatable (1.71 and 1.87, respectively). O*NET experts estimate a 5% chance of needing at least a bachelor's degree to perform successfully as a "Heating and Air Conditioning Mechanic and Installer", an occupation which is also predicted to be mostly not automatable (2.38).

Model Disagreements with Ground Truth. It is useful to consider where our model disagrees most with our ground truth labels. While, as discussed, it is difficult to confirm which estimate is true, when the model disagrees with the ground truth it is using all information learned from all other ratings. Table 7 in Appendix C lists the 50 tasks with the most disagreement between ground truth and model.

In the cases where the model overpredicts relative to the ground truth, we see common themes. Some activities, such as "connect electrical components or equipment" (yw: 1.0; ŷw: 2.69), are characterised by dextrous physical action (the difficulty of automating which is famously speculated on as "Moravec's paradox" in [18]). Others involve information exchange in a situation with a clear goal, such as "communicate with customers to resolve complaints or ensure satisfaction" (yw: 1.0; ŷw: 2.31). There are also cases involving complex process monitoring.

Figure 3: Occupation-level automatability scores by annual median income.

Figure 4: Occupation-level automatability scores by educational attainment.

Across these activities, it seems conceivable that while intelligent technology may not automate the entire activity, some combination of activity simplification, custom data gathering, and intelligent technology could automate a non-trivial amount of the activity.

The cases where the model underpredicts are more varied in interpretation. Some model underpredictions could be a case of downweighting unreasonable expert beliefs. For example, "position construction forms or molds" (yw: 4.0; ŷw: 2.80) is simple in theory, but influenced by enough dynamic environmental factors to render it complex in practice. Other cases of underprediction might reflect a lack of data around the activity in feature space, leading to limited model learning. For example, there are some activities which are clearly automatable (and already automated) for which the model predicts low automatability (e.g. "Process customer bills or payments" and "Create electronic data backup to prevent loss of information"). Alternatively, the more questionable predictions may be artifacts of the O*NET taxonomy, in which the work activity title does not accurately reflect a critical nuance of its constituent tasks.



Question 2: What makes work automatable?

We now consider what increases or decreases the automatability of some activities. We compute the average derivative of automatability with respect to each feature, as described in [6], over the space of work activities. For the nth feature, this is AD(n) := E(∂m(x)/∂xn), where m(x) is the posterior mean function. This measures the expected increase in automatability for a unit increase in the feature. Table 8 in Appendix C shows the highest and lowest average derivatives of the posterior mean function per feature.
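Given a fitted GPflow model such as the hypothetical one sketched earlier, one way to approximate AD(n) is to differentiate the posterior mean with automatic differentiation and average over a matrix of activity feature vectors; the helper below is an assumption-laden sketch, not the exact procedure of [6].

```python
import numpy as np
import tensorflow as tf

def average_derivatives(model, X_all):
    """Approximate AD(n) = E[d m(x) / d x_n] by averaging the gradient of the
    GP posterior mean over a sample of activity feature vectors."""
    X = tf.convert_to_tensor(np.asarray(X_all, dtype=np.float64))
    with tf.GradientTape() as tape:
        tape.watch(X)
        mean, _ = model.predict_f(X)   # posterior mean m(x) at each row of X
        total = tf.reduce_sum(mean)    # summing makes tape.gradient return per-row gradients
    grads = tape.gradient(total, X)    # shape (n_points, n_features)
    return tf.reduce_mean(grads, axis=0).numpy()   # one average derivative per feature
```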

These gradients seem to reflect what intelligent technology increasingly offers: work that is clerical, repetitive, precise, and perceptual can increasingly be automated. Increases in the features Clerical, Number Facility, Depth Perception, Control Precision and Production and Processing tend to increase an activity's automatability. Perhaps surprisingly, increases in Economics and Accounting and Sales and Marketing knowledge and ability also increase automatability, which reflects that we saw business-oriented sales & administrative work types having higher than average automatability.

On the other hand, work that is more creative, dynamic, and human oriented tends to be less automatable. While variable, the three strongest features driving decreased activity automatability are Installation, Programming and Technology Design. That is to say, the experts who answered our survey are relatively safe, or misperceive themselves to be.

The gradients might, for example, be used by employers/employees and policy makers to skill themselves differently, proactively change the characteristics of work activities, or set policy to incentivize the development of particular skills, knowledge, abilities, or occupations.

Uncovering the Drivers of Automation. In our own analysis, we found that using activity-level ratings allows us to hypothesize richer frameworks about the drivers of automation than the previous occupation-level frameworks. For example, one particularly useful analysis is to examine clusters of similar activities which score significantly high in a subset of features but vary substantially in inferred automatability. Some of these activity clusters, across features such as Dynamic Strength, Persuasion, Critical Thinking, and Production and Processing, seemed to imply the predictive importance of (a) task standardization versus dynamism; (b) a well-specified performance metric versus open-ended thinking; (c) single-party versus multi-party goal satisfaction; and (d) active thinking versus active physical interaction. We reserve final conclusions from this approach for further expanded and validated research. In summary, we believe this suggests that more activity-level data likely allows us to find richer frameworks for automatability prediction.

SOCIETAL IMPACT

Automation is a notably data-sparse yet opinion-heavy area of study [17, 19]. We need more clarity about what can be automated, at the actual level of automation (tasks), and more granular frameworks based on more granular data.

Policymakers would be able to design better, more targeted policy responses, such as incentives to preemptively modify occupations or programs to reskill workers. Workers would be able to upskill or retrain in a more targeted way (towards certain tasks or away from certain tasks) to be robust to automation, instead of abandoning entire occupations. Further, we note the large psychological burden of fear, uncertainty, and doubt that comes with uncertain predictions; we hope to replace that with the optimism, clarity, and confidence that comes from every worker having better predictions to enable more effective responses. Last, we would be able to better spot ethically-challenging cases of activity automation, especially in healthcare and social services, before they happen, so that we can hold preparatory, informed ethical discussions.

We believe this necessitates collecting more data on automation. Governments, researchers, businesses, and employees would all benefit from uniting to do so. Despite automation being one of the dominant themes of work in the coming decades, and a perhaps irreversible shift in how work is done in the future, we are only just taking our first steps towards granular, activity-level measurement. In this paper we offer our dataset, consider the value of activity-level data, and present preliminary results that reinforce previous ones and provide more fidelity and opportunities for deeper research.

CONCLUSION & FUTURE WORK

By using a more granular approach to "nowcasting" task-level automation, we can unlock more nuanced frameworks about what actually can be automated. Using task- and activity-level data, we can likely go further to better understand the drivers of automatability.

However, our approach is a first step. We propose to the community six important research gaps that our dataset and approach should be useful for answering. Real-world validation: Does our model accurately identify activities that are already automated? Surprising automation: Which activities are more, or less, automatable than previous models would have predicted? Automatable vs. automated: Why are some automatable activities not automated while others are? Multivariate interaction patterns: How does one feature modify a different feature's effect on a task's automatability? Economic value: What is the monetary value of automation potential for highly automatable activities? (See [15].) Employee characteristics: What are the patterns of demographics, industry, technology use, and other characteristics of employees performing activities with low and high automatability?

ACKNOWLEDGEMENTS

We thank Carl Benedikt Frey, Jonathan Downing, Katja Grace, Allan Dafoe, John Salvatier, Matthew Willis, The Health Foundation (award #7559), the Oxford Martin Programme on Technology and Employment, and the Rhodes Trust.



REFERENCES
[1] 2017. Occupational Employment Statistics. Bureau of Labor Statistics. www.bls.gov/oes/
[2] Daron Acemoglu and Pascual Restrepo. 2016. Artificial Intelligence, Automation and Work. https://doi.org/10.3386/w24196
[3] Melanie Arntz, Terry Gregory, and Ulrich Zierahn. 2016. The Risk of Automation for Jobs in OECD Countries: A Comparative Analysis. OECD Social, Employment and Migration Working Papers 2, 189 (2016), 47–54. https://doi.org/10.1787/5jlz9h56dvq7-en
[4] Autodesk. 2017. What Is Generative Design. https://www.autodesk.com/solutions/generative-design
[5] David H. Autor. 2013. The "task approach" to labor markets: an overview. Journal for Labour Market Research 46, 3 (2013), 185–199. https://doi.org/10.1007/s12651-013-0128-z
[6] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. 2010. How to Explain Individual Classification Decisions. Journal of Machine Learning Research 11 (2010), 1803–1831. arXiv:0912.1128
[7] Hasan Bakhshi, Jonathan M. Downing, Michael A. Osborne, and Philippe Schneider. 2017. The Future of Skills: Employment in 2030.
[8] Wei Chu and Zoubin Ghahramani. 2005. Gaussian processes for ordinal regression. Journal of Machine Learning Research 6 (2005), 1019–1041.
[9] Carl Benedikt Frey and Michael A. Osborne. 2017. The future of employment: how susceptible are jobs to computerisation? Technological Forecasting and Social Change 114 (2017), 254–280.
[10] Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, and Owain Evans. 2017. When Will AI Exceed Human Performance? Evidence from AI Experts. arXiv:1705.08807
[11] Dhruv Grewal, Anne L. Roggeveen, and Jens Nordfält. 2017. The Future of Retailing. Journal of Retailing 93, 1 (2017), 1–6. https://doi.org/10.1016/j.jretai.2016.12.008
[12] Jordan Hart. 2017. Keras Ordinal Categorical Crossentropy. https://github.com/JHart96/keras_ordinal_categorical_crossentropy
[13] Hyun-Chul Kim and Zoubin Ghahramani. 2012. Bayesian classifier combination. In Artificial Intelligence and Statistics. 619–627.
[14] James Manyika, Michael Chui, Mehdi Miremadi, Jacques Bughin, Katy George, Paul Willmott, and Martin Dewhurst. 2017. A Future that Works: Automation, Employment, and Productivity. McKinsey Global Institute (2017).
[15] James Manyika, Michael Chui, Mehdi Miremadi, Jacques Bughin, Katy George, Paul Willmott, and Martin Dewhurst. 2017. Harnessing Automation for a Future that Works. McKinsey Global Institute (2017).
[16] Alexander G. de G. Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo Leon-Villagra, Zoubin Ghahramani, and James Hensman. 2017. GPflow: A Gaussian process library using TensorFlow. Journal of Machine Learning Research 18, 40 (2017), 1–6. http://jmlr.org/papers/v18/16-537.html
[17] Tom Mitchell and Erik Brynjolfsson. 2017. Track how technology is transforming work. Nature 544, 7650 (2017), 290–292.
[18] Hans Moravec. 1990. Mind Children: The Future of Robot and Human Intelligence. Harvard University Press.
[19] National Academies of Sciences, Engineering, and Medicine. 2017. National Academies Press, Washington, D.C. https://doi.org/10.17226/24649
[20] National Center for O*NET Development. [n.d.]. O*NET OnLine. https://www.onetonline.org/
[21] Fabian Pedregosa-Izquierdo. 2015. Feature extraction and supervised learning on fMRI: from practice to theory. Ph.D. thesis. Université Pierre et Marie Curie - Paris VI. https://tel.archives-ouvertes.fr/tel-01100921
[22] Carl Edward Rasmussen and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. MIT Press.
[23] Edwin Simpson, Stephen Roberts, Ioannis Psorakis, and Arfon Smith. 2013. Dynamic Bayesian combination of multiple imperfect classifiers. In Decision Making and Imperfection. Springer, 1–35.
[24] Aaron Smith. 2016. Public Predictions for the Future of Workforce Automation: Full Report. Technical Report. Pew Research Center.
