Measuring activity in big data: new estimates of big data employment in the
UK market sector
Omar Chebli, Peter Goodridge, Jonathan Haskel
Discussion Paper 2015/04
July 2015
Measuring activity in Big Data: New estimates of Big Data
employment in the UK market sector*
Omar Chebli
Imperial College Business School
Peter Goodridge
Imperial College Business School
Jonathan Haskel
Imperial College Business School; CEPR and IZA
May 2015
Abstract
Statements around the growth in data and associated analytical activity are widespread but metrics are
rare. In the UK, exceptions to this include estimates of employment in the field of big data. We
document those studies and produce our own estimates using a novel dataset. We find that in
2010, estimated ‘big data employment’ in the UK market sector was 190,000. We show how this
estimate relates to official measures of employment in other knowledge creation activities, such as
own-account (in-house) production of software and R&D performed by businesses.
*Contacts: Peter Goodridge, Jonathan Haskel, Omar Chebli, Imperial College Business School, Imperial
College, London. SW7 2AZ. [email protected] [email protected], [email protected]. We are very
grateful for financial support for this research from EPSRC (EP/K039504/1 and EP/I038837/1). We also thank
e-skills UK, TechUK and industry participants at a TechUK forum for helpful discussions. This work contains
statistical data from ONS which is Crown copyright and reproduced with the permission of the controller of
HMSO and Queen's Printer for Scotland. The use of these data does not imply the endorsement of the data
owner or the UK Data Service at the UK Data Archive in relation to the interpretation or analysis of the data.
This work uses research datasets which may not exactly reproduce National Statistics aggregates. All errors and opinions
are of course our own.
1. Introduction
To date, much of what has been published on ‘Big Data’ and data analytics has focused on the sheer
volume, or growth in volume, of data available to firms, and how it is being, or could be, put to use.
On volume, Google’s Eric Schmidt is commonly quoted as stating that as much data/information is
being created every two days as was created from the dawn of civilisation to 2003 (Wong 2012).
However, aside from broad statements, in the UK at least, few hard metrics are available on the scale
or volume of big data activity.
Exceptions to this include work that has sought to produce labour market statistics on big data. In a
series of reports, e-skills UK (2013a; 2013b; 2014), the Information Technology Sector Skills Council
for the UK, document estimates of big data employment in UK firms, associated salaries, as well as
current and future estimates of demand for big data staff (vacancies). In other work, Mandel has
estimated big data employment for the US (2012; 2013) and, in joint work with NESTA, for the UK
(Mandel and Scherer 2014).
There is however little consensus on numbers. E-skills UK estimate UK big data employment of
31,000 in 2013, whilst Mandel estimates 294,000 in 2014. To give some sense of scale, according to
the Business Expenditure on Research and Development (BERD) survey,1 in 2013 UK firms
employed 178,000 workers engaged in R&D and, according to the Annual Survey of Hours and
Earnings (ASHE), in 2010, 749,000 workers engaged in the writing of software.2 In light of this, in
this paper we produce our own estimate of big data employment for the UK market sector using a new
data source, namely the publicly available profiles of workers registered on an employment-based
social media network. We show how that estimate relates to the ONS measurement of employment in
occupations that produce own-account software (see Chamberlin, Clayton et al. (2007)). This is a
natural place to focus since, although mathematics and statistics are the foundations of data
analytics, both data-building and data analytics require software programming skills, so
employment in these activities is closely related. In future work we will show how to relate our
estimate of employment to standard national accounting procedures for measuring investment in
intangible assets. This paper is therefore a first step to documenting the contribution that data and
data-based assets are making to UK growth.
The plan of the rest of this paper is as follows. Section two sets out an informal model of the activity
we are seeking to measure. Section three presents measures of big data employment documented in
1 Data available at http://www.ons.gov.uk/ons/rel/rdit1/bus-ent-res-and-dev/2013/index.html
2 Authors' own estimates constructed from ASHE microdata held at the UK Data Archive (Office for National
Statistics).
other studies. Section four presents our new estimate of big data employment in the UK market sector
and compares it with estimates for other knowledge-based employment such as that in software and
R&D. Finally, section five concludes.
2. An informal model of big data activity
Before setting out various measures of big data employment, it is first worth describing exactly what
activity we are seeking to measure. Figure 1 presents a simplified exposition of the big data process,
shown in three stages. Note that although represented linearly, various feedbacks likely exist between
stages. Also note that the three stages can exist either in-house, that is within the same vertically
integrated firm, or within distinct specialist firms.3 The employment estimates that follow are designed to
incorporate both types of activity, i.e. outsourced and in-house.
2.1.1. Data-Building (Transformation) (D)
Starting at the top of the diagram, we first consider the data-building or transformation (D) process,
which transforms raw records into data/information of a format ready for analysis. Raw records are
raw data from any source that require transformation into an analytical format. Data building may
involve digitising, structuring, formatting, and/or cleaning data. This process is sometimes referred to
as “data management”, “data acquisition” or “data warehousing”. The literature on data warehousing
and data analytics commonly describes this as the ETL process, an acronym for ‘Extract, Transform,
Load’. ‘Extract’ refers to the extraction of raw records; ‘Transform’ to the transformation of raw
records into data, often of improved quality, of a format ready for analysis; and ‘Load’ to the loading
of the data into the database or data warehouse. The linking, matching and aggregation of datasets
may take place in this stage, or later in the knowledge creation stage.
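The three ETL steps can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not a description of any particular firm's pipeline: the sample records, the cleaning rules and the `sales` table are hypothetical, and an in-memory SQLite database (from the Python standard library) stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw records (e.g. exported point-of-sale lines): inconsistent
# spacing and capitalisation, plus one malformed and one incomplete row.
RAW_RECORDS = [
    " 2015-03-01 , Flights ,  120.50 ",
    "2015-03-01,hotel accommodation,89.99",
    "not a valid row",
    "2015-03-02,Flights,",
]


def extract(source):
    """Extract: read raw records from the source, one at a time."""
    yield from source


def transform(rows):
    """Transform: structure and clean each record, dropping ones that fail."""
    for row in rows:
        parts = [field.strip() for field in row.split(",")]
        if len(parts) != 3 or not parts[2]:
            continue  # wrong number of fields or missing amount
        date, product, amount = parts
        try:
            value = float(amount)
        except ValueError:
            continue  # amount is not numeric
        yield date, product.lower(), value


def load(rows, conn):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite stands in for the data warehouse
load(transform(extract(RAW_RECORDS)), conn)
print(conn.execute("SELECT product, amount FROM sales ORDER BY product").fetchall())
# [('flights', 120.5), ('hotel accommodation', 89.99)]
```

Keeping each step a separate function mirrors the point made in the text: linking, matching or aggregation could be added inside `transform`, or deferred to the knowledge creation stage.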
3 Currently it is expected that the three stages predominantly exist in-house. However, as the field develops, it is
likely that more companies will specialise at different points in the chain/process (i.e. provision of raw records,
producers of information, producers of data-based knowledge, etc.). As an example, Google are a case where
all three stages exist in-house. As a by-product of providing search services, Google automatically generate raw
records on the search histories of users. They then employ labour and capital to manage, clean and transform
those data into an analytical format, producing information. Google then use that transformed data (i.e. it rents
from the Google stock of transformed data) to produce commercial knowledge. As a trivial example, this may
be the knowledge that users that search for product X (say, flights) also consume product Z (say, hotel
accommodation). In the downstream, Google sell advertising services to other firms. In doing so Google rents
from its stock of commercial knowledge (including data and algorithms) to sell advertising that can be targeted
at specific consumers e.g. in this example, hotels in a region advertise to those searching for aeroplane flights to
that area. Alternatively, consider a firm such as Experian. They operate in the knowledge creation stage,
buying or acquiring transformed data from numerous sources, and using that information to produce data-based
knowledge which they sell to other firms. The credit scores they sell to banks are just one example of the data-
based knowledge services they provide.
Figure 1: The Big Data Production Chain
Note to figure: Commercialisation is the embodiment of knowledge into the output of goods and services, which
may be sold for profit or made freely available. We therefore use the term commercialisation as our focus is on
the market sector, but note that the framework can also be applied to the non-market sector.
2.1.2. Knowledge creation (N)
The next stage is the knowledge creation process (N), more commonly referred to as “data analytics”.
This stage takes the output of the data-building stage, and uses that data/information to conduct
analysis. That analysis could take a number of forms. It will include activities commonly referred to
in the literature as “data science”, “data/text mining”, “knowledge recovery”, “business intelligence”
and “machine learning”, with the latter referring to the use of artificial intelligence to discover
correlations in data. Whatever the method, the output of the analytics process is a piece of
commercial knowledge formed from the analysis of information, and used to construct advice to be
implemented in the final production of goods and services.
2.1.3. Downstream production of final goods and services (Y)
The final stage incorporates the application of knowledge in the production of final goods and
services, in the downstream production (or operations) sector (Y). We emphasise that the
downstream is a pure operations stage, which does not conduct any activity in the creation of
information or data-based knowledge but rather just employs/rents labour and (tangible and
intangible) capital, including data-based knowledge, to deliver final goods and services.
We stress that we are seeking to measure employment in the first two stages presented in Figure 1,
that is, employment in the transformation or building of data and in the extraction of data-based
knowledge. In the final stage some workers will be involved in the implementation/use of data-based
knowledge, as well as the implementation of other forms of knowledge such as that from R&D,
market research etc. For instance, the data-based insight implemented in the downstream could be the
knowledge that the cross-promotion of goods results in increased sales, or alternatively the knowledge
to re-optimise downstream processes and improve productivity, derived from data emitted by sensors
embedded in machines (the “Internet of Things”). Similarly data and data-based knowledge may be
used in the generation of other types of knowledge, such as that created in the conduct of R&D or
market research. We are not seeking to measure activity in implementation here, and such “users” of
data-based knowledge are not intended to be included in the employment estimates that follow.
Also note that we do not seek to measure activity or employment in the generation of raw records. A
feature of ‘big data’ is that raw records are typically generated as a by-product of some other process,
for instance where data comes as exhaust data.4 Workers involved in the production of raw records
would therefore include those employees that work at the point where raw records are created,
including workers at the point of sale such as cashiers in supermarkets. We do not attempt to measure
this part of the process.
Rather we are seeking to measure employment in the transformation/building of data (information)
and in the use of that data to extract knowledge, thus including the kinds of occupations that are
receiving more and more attention, such as “data scientists”, “data engineers” and “business
intelligence analysts”. In the data-building stage, we would expect to find occupations that include
“data administrators”, “data managers”, “data engineers” and workers in “data control”. The
knowledge creation stage is more likely to contain workers with job titles that include “data
scientists”, “business intelligence” and “data/statistical analysts”.5 In practice, the roles of some
workers/occupations could include some aspects of both data-building and knowledge creation.
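As a rough illustration of how job titles might be allocated to the two stages we measure, the Python sketch below matches titles against the example occupations listed above. The keyword lists are taken from the text, but the matching rule (case-insensitive substring search) and the function name `classify_title` are illustrative assumptions. A title matching both lists is assigned to both stages, reflecting roles that mix data-building and knowledge creation; a title matching neither (for example a downstream "user" of data-based knowledge) is left out.

```python
# Example occupations per stage, as listed in the text. The substring-matching
# rule applied to them is an illustrative assumption.
STAGE_KEYWORDS = {
    "D": ("data administrator", "data manager", "data engineer", "data control"),
    "N": ("data scientist", "business intelligence", "data analyst", "statistical analyst"),
}


def classify_title(title: str) -> set[str]:
    """Return the stages ("D" = data-building, "N" = knowledge creation) a title may belong to.

    An empty set means no keyword matched, i.e. the job is outside the two stages measured.
    """
    text = title.lower()
    return {
        stage
        for stage, keywords in STAGE_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


for title in ("Senior Data Engineer", "Business Intelligence Analyst",
              "Data Manager and Data Analyst", "Cashier"):
    print(title, "->", sorted(classify_title(title)))
```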
4 Typically unstructured data generated as a by-product of some online or digital process.
5 In a following section we document work by e-skills UK (2013b) which estimates employment in the
following occupations: “data engineers”, “data administrators”, “data analysts” and “data scientists”.
2.2. Big and Little data
It is worth making one other definitional point. Readers may have noticed that we have not attempted
to formally define “big data”. Commonly used definitions of big data typically refer to the “3 V’s”,
that is the large volume, variety and velocity of data that is being created, largely as a result of the
spread of the digital economy. But in this paper, and future work, we are primarily concerned with
measurement of activity in data and data analytics that generates knowledge to be used in final
production. The volume, source, variety and type of data employed, or the speed with which it is
generated, is less of a concern. It therefore does not seem helpful to introduce a distinction between
“big” and “little” data; after all, both are based on the same foundations, that is mathematics,
statistics, computer science etc.
Further, data and data analytics have been making contributions to final production long before the
term “big data” became so widespread, even if some of the techniques, tools, technologies and
approaches are new. For example, the major supermarket chains have been collecting data on their
customers' purchasing patterns and preferences for some time. That activity has just been made easier
and richer by the new types of data that are becoming available and to which they can link.
The same is true of insurance companies, which seek to create risk profiles of actual or potential
customers, and of banks, which use credit scores to assess customer applications for their products. We therefore see the
emergence of the field of big data analytics as growth in an activity that has long existed. The 3 V’s
mean that many more raw records are available, and that much more information can be created,
facilitating growth in the knowledge creation sector. Therefore in our measurement, we will not seek
to specifically exclude types of data and data analytics activity that do not meet particular strict
definitions of big data in terms of data type or the size of datasets, although we continue to refer to
“big data” for reasons of simplicity/shorthand.
2.3. Initial estimates
This framework suggests we can make an initial guess at the scale of UK big data employment. First,
we know from our discussions with industry and from the empirical work that follows that there is
some overlap with software. From the ASHE, we know that in 2010 there were around 289,000
workers in the UK market sector recorded under the occupation of “software professionals”. More
broadly, there were around 749,000 workers in IT occupations that the ONS consider are involved in
the writing of software (Office for National Statistics). Some proportion of these workers will be
engaged in the building of data and data-based assets. Second, there is also a potential overlap with
another category of knowledge workers, namely those in R&D. According to BERD, in 2010 there
were 154,000 workers engaged in R&D in UK firms, with 22,000 of those engaged in R&D in the
product field “Computer programming and information service activities”. Some proportion of these
workers may also be considered to be working in the sphere of big data.
3. Big data employment
How are we to measure big data employment? Looking at the diagram in Figure 1, were the stages in
this chain served by separate industries, then we could look at official industry data. The problems
with this approach are that, first, to the extent that these activities are provided by specialist industries,
the Standard Industrial Classification (SIC) is not currently detailed enough to separately identify
these firms. Second, much of the activity detailed in Figure 1 actually takes place in-house in
industries not classified as ‘data industries’, be that manufacturing, retail etc. Therefore we go to the
data on occupations. The obvious sources for data on employment by occupation are the Labour
Force Survey (LFS) or Annual Survey of Hours and Earnings (ASHE), categorised according to the
Standard Occupational Classification (SOC). However, inspection of the SOC shows that official
occupational classifications have also not kept pace with the new occupations emerging in and around
data and data analytics. This may not be surprising, with many of the job titles associated with this
field having emerged relatively recently. Whilst it is possible to identify the codes where workers in
data-building and data analytics are likely allocated, the codes are not exclusive so such workers are
mixed in with other occupations in rather broad groups. Therefore in our work and other studies,
some other source must be used instead.
3.1.1. Survey data: e-skills UK
Few studies have produced metrics of the resources devoted to data-building or data analytics.
Exceptions to this include a series of reports by e-skills UK (2013a; 2013b; 2014) which document
UK employment in big data activity. In conjunction with SAS, e-skills UK ran a survey of larger
market organisations asking firms about their adoption/use of data analytics, and questions on the
number of “big data staff” employed. They found that in 2013, 14% of firms with more than 100
employees had adopted big data analytics, and that 31,000 employees work in big data positions, with
32% (10,000) in IT-focused roles, 55% (17,000) in data-focused roles and 13% (4,000) in other roles.
E-skills estimates of big data employment are presented below in Table 1 .6
We note the following from Table 1. Of the data-focused roles, from our discussions with e-skills
UK, we consider the 3,000 Data Engineers and 1,000 Data Administrators to be likely employed in
the data-building (D) stage; and the 8,000 Data Analysts and 1,000 Data Scientists to be likely
employed in the knowledge creation (N) stage. With another 4,000 in undefined “other data-focused”
roles, these estimates suggest employment of 17,000 in the two stages combined, with 4,000-8,000 in
data-building/transformation, and 9,000-13,000 in knowledge creation. Alternatively we could use a
6 E-skills UK also conducted a survey of smaller organisations in conjunction with Experian. Of the 541 SMEs
they contacted, they concluded none had implemented big data analytics, suggesting the proportion of SMEs in
the UK population that had implemented is less than 0.2%.
broader definition that incorporates supporting IT-focused staff, implying big data employment of
31,000 in UK firms in 2013.
Table 1: Big data employment, 2013 (e-skills UK 2013b)
                              2012*     2013      2014*     2015*
Big data employment           20,000    31,000    39,000    47,000

Breakdown, 2013:
  IT-focused roles                          10,000
    Strategy/planning/design                 2,000
    Development/Implementation               3,000
    Administration/operations                1,000
    Support                                  1,000
    Other IT-focused                         2,000
  Data-focused roles                        17,000
    Data Engineers                           3,000
    Data Administrators                      1,000
    Data Analysts                            8,000
    Data Scientists                          1,000
    Other data-focused                       4,000
  Other roles                                4,000
Source: Table 1 and Figure 9 in e-skills UK (2013b)
Note to table: Survey-based employment numbers for 2013. Numbers for 2012, and 2014-15 are
estimates/forecasts from e-skills UK/Experian (Figure 9 in e-skills UK (2013b)). The 2013 survey was of firms, so
estimates relate to the UK market sector. IT-focused roles defined as “enabling roles focused on the design,
development, implementation, administration, maintenance and support of big data related systems and
applications”. Data-focused roles defined as “analytical roles focused on identifying, acquiring, managing,
manipulating, analysing, understanding, utilising and presenting big data and related inferences/propositions”.
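The allocation of the e-skills UK data-focused roles to stages, discussed above, can be reproduced from the 2013 figures in Table 1. The Python sketch below simply restates that arithmetic: assigning the 4,000 “other data-focused” workers wholly to one stage or the other gives the lower and upper bounds for each stage. It also checks that the reported role shares (32%, 55% and 13%) are consistent with the 31,000 total after rounding to the nearest thousand. Variable names are illustrative.

```python
# 2013 figures from Table 1 (e-skills UK 2013b); UK market sector.
TOTAL = 31_000
DATA_FOCUSED = {
    "Data Engineers": 3_000,
    "Data Administrators": 1_000,
    "Data Analysts": 8_000,
    "Data Scientists": 1_000,
    "Other data-focused": 4_000,
}
# Stage allocation discussed in the text; "Other data-focused" is unallocated.
D_ROLES = ("Data Engineers", "Data Administrators")
N_ROLES = ("Data Analysts", "Data Scientists")

narrow_total = sum(DATA_FOCUSED.values())
other = DATA_FOCUSED["Other data-focused"]
d_low = sum(DATA_FOCUSED[r] for r in D_ROLES)
n_low = sum(DATA_FOCUSED[r] for r in N_ROLES)
# Bounds: the unallocated workers go wholly to one stage or the other.
d_range = (d_low, d_low + other)
n_range = (n_low, n_low + other)

# Reported role shares applied to the total, rounded to the nearest thousand.
SHARES = {"IT-focused": 0.32, "Data-focused": 0.55, "Other": 0.13}
by_share = {role: int(round(TOTAL * s, -3)) for role, s in SHARES.items()}

print(narrow_total, d_range, n_range)  # 17000 (4000, 8000) (9000, 13000)
print(by_share)  # {'IT-focused': 10000, 'Data-focused': 17000, 'Other': 4000}
```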
3.2. NESTA: Mandel and Scherer (2014)
In work in conjunction with NESTA, Mandel and Scherer (2014) produce alternative estimates of UK
big data employment. They too note that the Standard Occupational Classification has not kept pace
with new and changing occupations in growing, innovative fields such as big data. They therefore
turn to data on the number of jobs advertised on the job aggregator website Indeed.co.uk as a
means of measuring employment. This approach assumes that the number of job ads proxies the
number of gross hires, and that the number of gross hires is strongly correlated with employment;
they present evidence to support both assumptions. We note that job ads may have referred to jobs in
both the public and private sectors, so that final estimates reflect the UK whole economy rather than
just the market sector.
Specifically, the job descriptions and skill content contained within job ads are searched using a list of
14 keywords or phrases that include program names such as “Hadoop”, “MapReduce” and “Python”,
or job titles such as “data scientist”, “data engineer” or “data analyst”. In April 2014 such a search
returned 18,720 big data job advertisements. Then, in order to transform that estimate to an
employment number, it is multiplied by a “job/want ad multiplier” (of 15.7), based on the ratio of jobs
to ads in general IT occupations, derived using official employment data based on the SOC. This
calculation translates to an estimate for UK (whole economy) big data employment of 294,000 in
2014, as summarised below in Table 2.7
Table 2: UK big data employment (Mandel and Scherer 2014)
                Big data job ads    Job/want ad multiplier    Big data employment
2014 (April)    18,720              15.7                      294,000
Note to table: Data from Mandel and Scherer (2014). Snapshot for April 2014. Column 1 is number of big data
job ads identified from Indeed.co.uk. Column 2 is the ratio of jobs to job advertisements for general IT
occupations. Column 3 is estimated UK big data employment, calculated as column 1 times column 2.
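The Mandel and Scherer (2014) method combines a keyword screen of job ads with a scaling step. The Python sketch below illustrates both under stated assumptions: only six of their 14 search terms are named in the text, so the keyword list is an illustrative subset; the sample ads are hypothetical; and simple case-insensitive substring matching is our assumption rather than their documented procedure. The scaling step uses the figures in Table 2.

```python
# Search terms named in the text: a subset of the 14 used by Mandel and Scherer (2014).
KEYWORDS = ("hadoop", "mapreduce", "python", "data scientist", "data engineer", "data analyst")


def is_big_data_ad(ad_text: str) -> bool:
    """Flag an ad if any keyword appears in it (substring matching is an assumption)."""
    text = ad_text.lower()
    return any(keyword in text for keyword in KEYWORDS)


def estimate_employment(n_ads: int, multiplier: float) -> float:
    """Scale a count of job ads to an employment level via a jobs-to-ads multiplier."""
    return n_ads * multiplier


# Hypothetical ads, used only to show the screening step.
sample_ads = [
    "Data Scientist - retail analytics, Python and SQL",
    "Hadoop administrator, London",
    "Warehouse operative, night shifts",
]
n_flagged = sum(is_big_data_ad(ad) for ad in sample_ads)

# Scaling step with the April 2014 figures in Table 2.
estimate = estimate_employment(18_720, 15.7)
print(f"{n_flagged} of {len(sample_ads)} sample ads flagged")
print(f"Estimated big data employment: {estimate:,.0f}")  # 293,904, reported as 294,000
```

Because the final figure is a simple product, any error in the multiplier passes through proportionally to the employment estimate.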
We note that this estimate is far larger than that produced by e-skills UK, which ranged from 17,000
to 31,000 depending on the definition used (e-skills estimates also referred to 2013 rather than 2014,
but project a figure of 39,000 for 2014 using their broad definition), although we note that the
focus of the Mandel/Scherer study is on the geographical distribution of innovative employment
activity rather than on absolute employment numbers.
We consider there to be three predominant reasons for the large divergence between the estimates
produced by e-skills UK and Mandel/Scherer. First, as noted above, e-skills UK ran a survey of firms
so their estimates refer to the UK market sector, whereas results in Mandel/Scherer are based on an
aggregation of all job vacancies and so refer to the whole economy.
Second, there is the heavy dependence of the Mandel/Scherer result on the “job/want ad multiplier” of 15.7.
The multiplier used is based on the ratio of jobs to vacancies in general IT occupations. However, the
big data arena is one that is relatively new, and so we may not expect the ratio of jobs:vacancies to be
as high as in general IT. Further, from a survey of 45 data-focused companies, Bakhshi, Mateos-
Garcia et al. (2014) report that 80% of firms are struggling to hire the skilled labour they require,
stating that the supply of data skills is insufficient for current (let alone future) demand. In their
labour market assessments, e-skills UK (2013a; 2013b; 2014) similarly report that firms are struggling
to fill vacancies in this area, giving us extra reason to suspect that the appropriate multiplier for big
data is lower. In fact, based on the (narrowly-defined) employment estimates constructed by e-skills,
the true multiplier may actually be of the order of one (17,000 in employment compared to the 18,720
job ads identified by Mandel/Scherer). Alternatively, using the broader e-skills definition, e-skills UK
(2013b) reports a vacancy estimate of 3,790 in 2012, compared to employment of 20,000 in the same
year, which would suggest a multiplier of around 5, rather than the 15.7 in Mandel/Scherer.
7 For information, according to ONS data, in April-June 2014 the jobs to vacancies ratio for all vacancies was
41.67.
Third, e-skills UK, when contacting firms, restricted the definition of big data workers, whereas
estimates from Mandel/Scherer potentially include forms of data/analytics or business intelligence
activity that do not meet the stricter definition employed by e-skills UK.
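The sensitivity of the Mandel/Scherer estimate to the multiplier, discussed under the second reason above, can be illustrated by recomputing the implied multipliers from the e-skills figures and re-applying each to the 18,720 ads. Combining the two sources this way is purely illustrative, since they differ in coverage (market sector versus whole economy), year and definition; the labels in the sketch are our own.

```python
ADS = 18_720  # Mandel/Scherer big data job ads, April 2014


def implied_multiplier(employment: float, vacancies: float) -> float:
    """Jobs-to-vacancies ratio implied by an employment estimate and a vacancy count."""
    return employment / vacancies


multipliers = {
    "Mandel/Scherer (general IT)": 15.7,
    "e-skills broad (2012)": implied_multiplier(20_000, 3_790),  # about 5.28
    "e-skills narrow vs. ads": implied_multiplier(17_000, ADS),  # about 0.91
}

# Re-apply each multiplier to the same count of job ads.
for label, m in multipliers.items():
    print(f"{label:28s} multiplier {m:5.2f} -> employment {ADS * m:>9,.0f}")
```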
To summarise, estimates of UK big data employment taken from work by e-skills UK and NESTA lie
in the rather large range of 17,000 to 294,000. In order to validate our final estimate, we must
therefore turn to some other source of information, which we do in the next section.
4. New estimates of big data employment: social media data
In this section we present our own estimates of big data employment derived from a novel dataset
built from the publicly available profiles of members of an employment-based social media network.
Before describing our method and results, it is first worth setting out some detail on estimates of
employment in a related investment activity in the national accounts, namely software or
‘computerised information’.
4.1. Employment in related occupations: “computerised information”
The System of National Accounts (SNA) (United Nations 2008) recommends the capitalisation of
expenditures on ‘computerised information’, comprising software and databases, both purchased
and own-account (in-house). This means that statistical authorities gather data on employment in
software-writing occupations, in order to estimate in-house investments in creating software. From
Chamberlin, Clayton et al. (2007), the list of occupations used by the ONS is presented below in
Table 3: columns 1 and 2 present the seven occupational codes used in measurement (based on
SOC 2000), column 3 shows the approximate mapping to SOC 2010, column 4 provides typical
responsibilities and column 5 lists job titles related to each code. Some of the job titles considered
most relevant to data-building and data-based knowledge creation are highlighted in red. In column 6
we conjecture at which stage in the big data production chain these occupations are likely engaged.
From reading the associated responsibilities and related job titles in columns 4 and 5 it is clear that
workers allocated to these codes will include workers involved in the upstream stages shown in Figure
1,8 in particular software professionals, which includes job titles such as “analyst-programmer”,
“systems analyst” and “data communications analyst” and whom we would expect to find in the
knowledge creation stage of our framework. Other workers with job titles such as “data processing
manager”, “data entry clerk” and “data processor” are more likely involved in the data-building stage.
8 The methodology for estimating investment in computerised information is based on a past vintage of the SOC
(2000). However, inspection of the latest revision to the SOC (2010) shows that the occupational coding is still
not sufficiently granular to separately identify the workers we are seeking to measure.
In practice some of these workers are likely involved in a mix of data-building and data analytics
activity.
Table 3: Occupations used in estimation of UK investment in OACI (own-account computerised information)
SOC
(2000) Occupation Where in SOC (2010)? Responsibilities Related job titles included (SOC00 and SOC10):
Stage in Big Data
Production Chain
1136: Information technology and
telecommunications directors
2133: IT Specialist Managers
2131 IT strategy and
planning professionals
2134: IT project and programme
managers
Providing advice on the effective
utilisation of information technology
in order to solve business problems
or to enhance the effectiveness of
business functions.
computer consultant, software consultant, IT
consultant, implementation manager (computing), IT
project manager, programme manager (computing),
project leader (software design)D/N
2135: IT Business Analysts,
Architects and Systems Designers
2136: Programmers and software
development professionals
2137: Web design & development
professionals
3131: IT operations technicians
3132IT user support
technicians3132: IT user support technicians
Providing technical support, advice
and guidance for customers or IT
users within an organisation, either
directly or by telephone, e-mail or
other network interaction.
helpdesk operator, helpline operator (computing), IT
helpline support officer, support technician
(computing), systems support officer D/N
4136Database
assistants/clerks4131: Records clerks and assistants
Creating, maintaining, preserving
and updating information held in
electronic databases, computer files,
voice mailboxes and e-mail systems.
computer clerk, data entry clerk, data processor, VDU
operator.
D
5245
Computer
engineers,installation
and maintenance
5245: IT engineers
Installing, maintaining and repairing
personal computers, mainframe and
other computer hardware.
computer engineer, computer maintenance manager,
computer service engineer, computer service
technician, computer repairer, hardware engineer
(computer), maintenance engineer (computer servicing)D/N
3131IT operations
technicians
The day-to-day running of computer
systems and networks, including the
preparation of back-up systems, and
performing regular checks to ensure
the smooth functioning of such
systems.
computer operator, database manager, IT technician,
network technician, systems administrator, web
master, database administrator
1136
Information and
communication
technology managers
2132 Software professionals
computer manager, computer operations manager,
data processing manager, IT manager, systems
manager, telecom manager, IT director, technical
director (computer services), telecommunications
director, data centre manager, IT support manager,
network operations manager (computer services),
service delivery manager
analyst-programmer, computer programmer, software
engineer, systems analyst, systems designer, business
analyst (computing), data communications analyst,
database developer, games programmer
Planning, organising and directing
work necessary to operate and
provide ICT services, maintaining
and developing associated network
facilities and providing software and
hardware support.
All aspects of the design application
and development and operation of
software systems.
D
N
D
Source: Table 1 and Table 6 of Chamberlin, Clayton et al. (2007), modified with mapping to SOC 2010 and to the big data production chain.
Notes to table: Column 1 is the official occupational code used to identify workers that produce assets in computerised information, and column 2 the occupational title for that code. Since the methodology is based on SOC 2000, column 3 maps to the latest revision of the SOC (2010). Column 4 lists typical responsibilities in the role. Column 5 shows other job titles typically used for that occupation, taken from documentation for SOC 2000 and SOC 2010. Job titles most relevant to data-based activity are highlighted in red. Column 6 shows the stage of the Big Data production chain in which these workers are likely to be engaged.
As can be seen from column 2, one of the occupations used in the measurement of own-account computerised information (OACI) is “database assistants/clerks” (SOC00 4136). Whilst we might expect such workers to be engaged in the data-building (D) stage of our framework, the detailed job description and tasks in the SOC documentation are actually a better fit with administrative roles than with occupations typically associated with big data and data analytics. Indeed, in the latest revision to the SOC (SOC 2010), this occupational group maps to secretarial/administrative occupations that are not associated with IT, or with components of IT such as software, data and data analytics.
Figure 2 shows how market sector employment in each of the occupations in Table 3 has changed
over time.
Figure 2: UK market sector employment in own-account software occupations, by occupation code
Note to figure: Each line represents the number of people employed in the UK market sector in one of the occupational codes in Table 3. The market sector is defined as the UK economy excluding public administration and defence (O), education (P) and health (Q). Constructed from ASHE microdata held in the Secure Data Service at the UK Data Archive (Office for National Statistics).
From Figure 2 the largest employment group among these occupations is “Software professionals”
(2132) which included 290,000 workers in 2011. The next largest group is “Information and
communication technology managers” (1136) with 154,000 workers in the same year, followed by
“IT strategy and planning professionals” (2131) with 103,000 workers, “IT operations technicians”
(3131) at 83,000, “IT user support technicians” (3132) at 55,000, “Database assistants/clerks” (4136)
at 28,000 and “Computer engineers, installation and maintenance” (5245) at 13,000.
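As a quick check on the relative sizes quoted above, the short Python sketch below tabulates the 2011 figures exactly as rounded in the text (thousands, read from Figure 2) and computes each occupation's share of the seven-occupation total. The dictionary layout is purely illustrative and is not part of the original analysis.

```python
# 2011 UK market sector employment by SOC 2000 code, in thousands,
# as quoted in the text (read from Figure 2, so values are rounded).
employment_2011 = {
    "1136": 154,  # Information and communication technology managers
    "2131": 103,  # IT strategy and planning professionals
    "2132": 290,  # Software professionals
    "3131": 83,   # IT operations technicians
    "3132": 55,   # IT user support technicians
    "4136": 28,   # Database assistants/clerks
    "5245": 13,   # Computer engineers, installation and maintenance
}

total = sum(employment_2011.values())  # total across the seven codes

# Share of each code in the total, printed largest first.
shares = {code: n / total for code, n in employment_2011.items()}
for code, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code}: {share:.1%}")
```

On these rounded figures, software professionals account for roughly two-fifths of employment across the seven OACI occupations.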
It is worth noting the steady decline in the number of “Database assistants/clerks”, further supporting
the idea that this occupational group does not include the types of workers we are searching for. We
also note the decline in the number of “IT operations technicians” (3131), which from Table 3
includes “database manager” as a related job title.
[Figure 2 chart: UK market sector employment (vertical axis, 0 to 350,000) by year, 1997 to 2011, one line per occupation: 1136 Information and communication technology managers; 2131 IT strategy and planning professionals; 2132 Software professionals; 3131 IT operations technicians; 3132 IT user support technicians; 4136 Database assistants/clerks; 5245 Computer engineers, installation and maintenance]
Much of the growth in employment in OACI occupations is driven by growth in the number of “Software professionals” (2132). As outlined above, we suspect that some proportion of these workers work in the D and N stages of the data supply chain, either in-house or in specialist firms. However, these occupations do not exclusively include D and N workers; further, D and N workers also reside in occupational codes outside this list, which we explore below.
4.2. Other D and N workers in the Standard Occupational Classification (SOC)
As well as the occupations used by the ONS in measuring investment in OACI, inspection of the SOC
reveals additional occupations that will include workers involved in D and N stage activity. Table 4 is
laid out in the same format as Table 3 above, and highlighted in column 5 are job titles considered
most relevant to the activity we are seeking to measure.
Other occupational codes that may include workers in data-building and/or data-based knowledge creation include: research professionals (232);[9] management consultants, actuaries, economists and statisticians (2423); and business and related associate professionals n.e.c. (3539). Other occupational codes that we speculate may include big data workers are research and development managers (1137) and science professionals (211). However, even if these occupations include those working on big data, we still do not know what fraction are working on big data, and we cannot know that until the official occupational codes become narrow enough to enumerate them separately (e.g. data scientist). To take the next step we therefore have to turn to some other data source.
[9] Since we are focusing on the market sector, we exclude workers in the education sector from our estimation. Thus researchers working in universities are excluded, but researchers in market organisations are included.
Table 4: Occupations outside official software occupations that potentially include workers in data-building and/or data-based knowledge creation
SOC (2000) | Occupation | Where in SOC (2010)? | Responsibilities/tasks include | Related job titles included (SOC00 and SOC10)
1137 | Research and development managers | 2150: Research and development managers | Plan, organise, coordinate and direct resources to undertake the systematic investigation necessary for the development of new, or to enhance the performance of existing, products and services. | director of research, laboratory manager, research manager, creative manager (research and development), design manager, market research manager, research manager (broadcasting)
211 | Science professionals | 211: Natural and Social Science Professionals | Planning, directing and undertaking research and development, providing technical, advisory and consultancy services in the fields of chemistry, biological sciences, physics, geology and meteorology. | analytical chemist, chemist, development chemist, biomedical scientist, geologist, anthropologist, archaeologist, criminologist, epidemiologist, geographer, historian, political scientist, social scientist, geophysicist, medical physicist, meteorologist, oceanographer, physicist, seismologist, forensic scientist, horticulturist, microbiologist, pathologist, industrial chemist, physical chemist, research chemist, biochemist, biologist, botanist, medical laboratory scientific officer, zoologist, mathematician, bioinformatician, research scientist
232 | Research professionals | 211: Natural and Social Science Professionals; 2426: Business and related research professionals | Planning, directing and undertaking scientific, qualitative and quantitative research through the application of theoretical principles and practical techniques in order to address a research objective. | research assistant, research associate, researcher, university research fellow, crime analyst (police force), fellow (research), games researcher (broadcasting), inventor, postdoctoral researcher
2423 | Management consultants, actuaries, economists and statisticians | 2423: Management consultants and business analysts; 2425: Actuaries, economists and statisticians | Advise industrial, commercial and other establishments on a variety of management, personnel, computing and technical matters, and apply theoretical principles and practical techniques to analyse/interpret data used to assist in the formulation of financial, business and economic policies. | actuary, business analyst, economist, management consultant, management services officer, statistician, business adviser, business consultant, business continuity manager, financial risk analyst, actuarial consultant, statistical analyst
3539 | Business and related associate professionals n.e.c. | 3539: Business and related associate professionals n.e.c. | Studies a particular department or problem area and assesses its interrelationships with other activities; studies work methods and procedures by measuring the work involved and computing standard times for specified activities, and produces reports detailing suggestions for increasing efficiency and lowering costs. | business systems analyst, data analyst, marine consultant, planning assistant, project administrator, project coordinator, conference coordinator, exhibition officer, management information officer, work study engineer, work study officer
Note to table: Other occupational codes that may include workers in data-building and data-based knowledge
creation. Columns 1 and 2 are occupational groups in SOC 2000 and column 3 maps to occupations from SOC
2010. Column 4 summarises typical responsibilities or tasks. Column 5 shows related job titles.
4.3. Estimating UK big data employment using social media data
If we are to isolate those workers in the big data sphere (either those currently classified in computerised information occupations, or those not so classified, e.g. economists/statisticians), we need to decide how to allocate them. We might, for example, undertake a detailed work study of the occupations and allocate them in this fashion. This is prohibitively expensive, so we proceeded using social media data. We gathered data on UK (market sector) employees in 2010 using a snapshot of publicly available information on job titles/descriptions and employee skills from an employment-based social media network, with the dataset constructed in 2011.[10]
First, we classified employees according to their occupations, mapping the job titles that workers report on the platform to the SOC. Second, we computed the fraction of workers in each occupation with big data skills: for example, those who can use Hadoop, Python etc., or who report an application of skills or a job description related to big data (e.g. data/text mining, data visualisation, predictive analytics etc.). Our method therefore has similarities with that used by Mandel and Scherer (2014). That is, we construct a list of keywords and search the profiles of members to estimate the number of workers with skills in the production of (transformed) data and/or data-based knowledge.
Our list of keywords is provided in the Appendix. We believe the list to be relatively comprehensive,
although there will obviously be some terms/words we haven’t included. However, as noted by
Mandel and Scherer (2014), only one matching word is required to extract a relevant profile, meaning
that there are diminishing returns to an ever expanding list of keywords.
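The profile search can be sketched as a simple keyword match. This is a minimal illustration under stated assumptions, not the authors' code: the keyword list is a small hypothetical subset drawn from the examples in the text (the full list is in Appendix Table A1, not reproduced here), profiles are assumed to be plain-text strings, and matching is a case-insensitive whole-word search. As noted above, one matching keyword is enough to flag a profile.

```python
import re

# Hypothetical subset of keywords; the paper's full list is in Appendix Table A1.
KEYWORDS = ["hadoop", "python", "data mining", "text mining",
            "data visualisation", "predictive analytics"]

# A single case-insensitive pattern. The \b word boundaries stop a keyword
# matching inside a longer word (e.g. "python" is not matched in "pythonic").
PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
                     re.IGNORECASE)

def is_big_data_profile(profile_text: str) -> bool:
    """Flag a profile if at least one keyword appears anywhere in it."""
    return PATTERN.search(profile_text) is not None

profiles = [
    "Systems analyst. Skills: Hadoop, SQL",
    "Helpdesk operator, customer service",
    "Marketing manager with interest in predictive analytics",
]
flags = [is_big_data_profile(p) for p in profiles]
print(flags)  # [True, False, True]
```

Because a single hit is sufficient, each extra keyword can only add profiles not already caught by the others, which is the diminishing-returns point made above.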
Third, we need a method to convert our sample into estimates for the population. We proceed as follows. For each occupation, we take the share of members with big data skills, described above, and apply that share to grossed-up estimates of employment (by SOC) from ASHE (Office for National Statistics). In particular, we benchmark to the occupations used in the measurement of own-account computerised information as detailed in Table 3, as well as other occupations where such workers may reside as detailed in Table 4, and some additional occupations that we found reporting big data skills in our preliminary analysis of the data.
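For a single occupation, the grossing-up step multiplies the sample share of members with big data skills by ASHE employment in that occupation. A minimal Python sketch, using the figures reported in Table 5 for IT strategy and planning professionals (SOC 2000 2131); the function name `gross_up` is ours, not the paper's.

```python
def gross_up(members: int, big_data_members: int, ashe_employment: int) -> float:
    """Scale a sample share to the population: (B / A) * D, as in Table 5."""
    share = big_data_members / members   # (C) = (B)/(A)
    return share * ashe_employment       # (E) = (C)*(D)

# Table 5, SOC 2000 2131: 11,936 members, 659 with big data skills,
# 99,387 employees in the UK market sector (ASHE, 2010).
estimate = gross_up(11_936, 659, 99_387)
print(round(estimate))  # 5487
```

Applying the same function to every occupation and summing the results gives the subtotals and total in column (E) of Table 5.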
There are of course a number of issues in this procedure. First, we necessarily assume that workers
with (big) data skills work in (big) data occupations. We note however that those registered with such
networks are very aware of the growing interest in such skills among employers, and so there may be
some bias to our estimates if members have enhanced or exaggerated their skill profile in response, or
if they are simply advertising their skills but not currently working in (big) data related roles.
Second, our dataset is a snapshot of member profiles in 2010, providing data on the job titles, job descriptions, industry and skills of 43.6m members worldwide. Of those 43.6m, 3.6m are based in the UK.[11] Of those 3.6m UK members, around 2.4m report a job title, and of those, 1.5m work in the UK market sector.[12] Of those market sector members that report a job title, 0.46m report at least one skill on their profile. We are therefore working with a sample of employment, but note that the sample may not be representative. In particular, workers in data transformation and data analytics, as well as in other professional and technical occupations/industries, are on average more likely to be represented on such networks, as are younger workers, who may also be more likely to possess such skills.
[10] Since we are estimating employment in the UK market sector, we exclude workers whose self-reported industry maps to public administration and defence (O), education (P) or health (Q) in SIC07.
[11] ONS Labour Market Statistics, released in June 2014, show that in February to April 2014, UK employment was 30.54m. The corresponding figure for February to April 2010 was 28.84m.
[12] We focus on the UK data but note that the larger worldwide sample shares very similar characteristics.
Third, the number of market sector members who report a job title but do not list a skill is large: around 1m, or 69% of the 1.5m. An obvious concern is that this may introduce bias into our analysis if (big) data workers are more or less likely to report a skill than the average member.
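The sample sizes quoted above form a simple funnel. The Python sketch below uses the rounded figures from the text (in millions) to reproduce the roughly 69% of market sector members with a job title who list no skill; the variable names are illustrative.

```python
# Rounded member counts from the text, in millions (2010 snapshot).
worldwide_members = 43.6
uk_members = 3.6
uk_with_job_title = 2.4
market_sector_with_job_title = 1.5
market_sector_with_skill = 0.46

# Market sector members with a job title but no listed skill.
no_skill = market_sector_with_job_title - market_sector_with_skill
no_skill_share = no_skill / market_sector_with_job_title

print(f"UK share of worldwide members: {uk_members / worldwide_members:.1%}")
print(f"UK members reporting a job title: {uk_with_job_title / uk_members:.0%}")
print(f"market sector members with a job title but no skill: "
      f"{no_skill:.2f}m ({no_skill_share:.0%})")
```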
4.3.1. Results
We find that, in total, 12,548 UK (market sector) members report a big data skill, competency or job description (from those listed in Appendix Table A1). The results by occupation are set out in Table 5. The table is split into two panels: panel 1 presents data for occupations used to estimate own-account investment in computerised information, and panel 2 for other occupations outside that list. Columns 1 and 2 are the occupational code and title from SOC 2000, and column 3 the corresponding occupations in SOC 2010. Column 4 (A) reports the number of members that fall under each occupation. Column 5 (B) reports the number identified for each occupation that also report (big) data skills, competencies or job descriptions. Column 6 (C) reports the ratio of column 5 to column 4. Column 7 (D) reports market sector employment for each occupation from ASHE (Office for National Statistics). Finally, column 8 (E) reports our estimate of big data employment by occupation, calculated as column 6 times column 7.
Using our list of keywords, we identify 12,548 instances of ‘big data workers’ in the UK market
sector. Summing down column 5 of panels 1 and 2 shows that we allocate 9,942 (79%) of those to
occupations in the SOC, leaving 2,606 (21%) unallocated. Inspection of those 2,606 unallocated job
titles shows that the majority are: undergraduate or postgraduate students (particularly PhD students);
members that report themselves as owners, co-owners or founders; or members that report themselves
as freelance with no additional information. We therefore assume that students are not employed, and
we exclude owners/founders/freelancers as we are benchmarking to ASHE, a survey of employees
which does not include the self-employed.
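The allocation shares follow directly from the counts above, and the exclusion rule can be sketched as a title filter. The counts are from the text; the regular-expression patterns for students, owners/founders and freelancers are hypothetical illustrations of the rule described, not the authors' actual classification.

```python
import re

identified = 12_548   # members reporting a big data skill, competency or description
allocated = 9_942     # mapped to a SOC occupation (sum of column (B) in Table 5)
unallocated = identified - allocated

print(f"allocated {allocated / identified:.0%}, "
      f"unallocated {unallocated / identified:.0%}")

# Hypothetical patterns for the unallocated groups described in the text.
# Students are treated as not employed; owners/founders/freelancers are
# dropped because ASHE covers employees only, not the self-employed.
EXCLUDE = re.compile(
    r"\b(student|phd|undergraduate|postgraduate|owner|co-owner|founder|freelance)\b",
    re.IGNORECASE,
)

def excluded(job_title: str) -> bool:
    """True if a job title falls into one of the excluded groups."""
    return EXCLUDE.search(job_title) is not None

print([excluded(t) for t in ["PhD student", "Co-founder", "Data analyst"]])
```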
Table 5: Social media data: Big data employment by SOC
Data building and data analytics: official occupation groups and job titles related with those occupations (Panel 1: ONS software occupations; Panel 2: Other occupations). Columns (A) to (C): social media data (2010); column (D): ASHE data (2010); column (E): final estimates.
SOC (2000) | Occupation | SOC (2010) | (A) Number of people for each SOC | (B) Number of "Big Data workers" by SOC | (C) Ratio of "Big Data workers" to SOC = (B)/(A) | (D) ASHE employment (UK market sector, 2010) | (E) Big Data employment (scaled, UK market sector) = (C)*(D)
Panel 1: ONS software occupations
1136 | Information and communication technology managers | 1136: Information technology and telecommunications directors; 2133: IT Specialist Managers | 24,143 | 955 | 4.0% | 159,974 | 6,328
2131 | IT strategy and planning professionals | 2134: IT project and programme managers | 11,936 | 659 | 5.5% | 99,387 | 5,487
2132 | Software professionals | 2135: IT Business Analysts, Architects and Systems Designers; 2136: Programmers and software development professionals; 2137: Web design & development professionals | 13,889 | 3,636 | 26.2% | 289,823 | 75,873
3131 | IT operations technicians | 3131: IT operations technicians | 7,595 | 2,119 | 27.9% | 93,430 | 26,067
3132 | IT user support technicians | 3132: IT user support technicians | 2,186 | 241 | 11.0% | 61,860 | 6,820
4136 | Database assistants/clerks | 4131: Records clerks and assistants | 248 | 10 | 4.0% | 30,796 | 1,242
5245 | Computer engineers, installation and maintenance | 5245: IT engineers | 637 | 49 | 7.7% | 13,499 | 1,038
Subtotal: all "software occupations" | | | 60,634 | 7,669 | 12.6% | 748,769 | 122,855
Panel 2: Other occupations
1132 | Marketing and sales managers | 3545: Sales accounts and business development managers | 20,192 | 332 | 1.6% | 514,489 | 8,459
1134 | Advertising and public relations managers | 2472: Public relations professionals; 2473: Advertising accounts managers and creative directors | 2,991 | 71 | 2.4% | 30,252 | 718
1137 | Research and development managers | 2150: Research and development managers | 1,001 | 36 | 3.6% | 40,848 | 1,469
211 | Science professionals | 211: Natural and Social Science Professionals | 845 | 87 | 10.3% | 51,703 | 5,323
212 | Engineering professionals | 212: Engineering professionals | 7,412 | 413 | 5.6% | 396,375 | 22,086
232 | Research professionals | 211: Natural and Social Science Professionals; 2426: Business and related research professionals | 723 | 81 | 11.2% | 103,455 | 11,590
2423 | Management consultants, actuaries, economists and statisticians | 2423: Management consultants and business analysts; 2425: Actuaries, economists and statisticians | 17,095 | 1,144 | 6.7% | 120,311 | 8,051
342 | Design associate professionals | 342: Design occupations | 498 | 27 | 5.4% | 59,625 | 3,233
3539 | Business and related associate professionals n.e.c. | 3539: Business and related associate professionals n.e.c. | 860 | 82 | 9.5% | 67,338 | 6,421
Subtotal: other (non-software) occupations | | | 51,617 | 2,273 | 4.4% | 1,384,396 | 67,351
Total: all occupations | | | 112,251 | 9,942 | 8.9% | 2,133,165 | 190,206
Notes to table: Column 1 is the SOC code for the occupations used in the estimation of own-account investment in computerised information, as well as some additional occupations in which D and N workers reside. Column 2 is the occupation title for that code. Column 3 shows the mapping to SOC 2010. Column 4 is the number of social network members identified for that occupation. Column 5 is a subset of column 4: the number of members in that occupation who report big data skill(s). Column 6 is the ratio of column 5 to column 4. Column 7 is the number of UK jobs in those occupations in 2010, constructed from ASHE microdata held in the Secure Data Service at the UK Data Archive (Office for National Statistics). Column 8 is estimated big data employment, derived by applying the ratios in column 6 to ASHE employment in column 7.
Looking at estimates by occupation, in the top line, for example, we see that 4% of workers whose occupation is “Information and communication technology managers” (1136) report having big data skills. For “IT strategy and planning professionals” (2131) we find the number to be 6%. For “Software professionals” (2132) it is higher, at 26%, and for “IT operations technicians” (3131) it is higher still, at 28%. The fractions for “IT user support technicians” (3132), “Database assistants/clerks” (4136) and “Computer engineers” (5245) are 11%, 4% and 8% respectively. The bulk of such workers therefore seem to be in occupations like “Software professionals” and “IT operations technicians”. Looking at the OACI occupations as a group, 12.6% of sampled members in these software-related occupations report data and data analytics skills. Applying the occupation-specific ratios to ASHE employment and summing across occupations gives 122,855 workers engaged in data and data analytics activity in the UK market sector, about 16% of employment in these occupations.
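Panel 1 of Table 5 supports two different group-level figures, which are easy to conflate. The Python sketch below recomputes both from columns (A), (B) and (D): the pooled sample ratio, sum of (B) over sum of (A), and the sum of the occupation-level estimates (E) = (B)/(A) × (D), expressed as a share of ASHE employment. The variable names are ours. The two ratios differ because the weights differ: software professionals (26% with big data skills) are about 23% of sampled members but about 39% of ASHE employment, whereas ICT managers (4%) are about 40% of the sample but only about 21% of ASHE employment.

```python
# Panel 1 of Table 5: SOC 2000 code -> (A) members, (B) big data members,
# (D) ASHE employment in the UK market sector, 2010.
panel1 = {
    "1136": (24_143, 955, 159_974),
    "2131": (11_936, 659, 99_387),
    "2132": (13_889, 3_636, 289_823),
    "3131": (7_595, 2_119, 93_430),
    "3132": (2_186, 241, 61_860),
    "4136": (248, 10, 30_796),
    "5245": (637, 49, 13_499),
}

sum_a = sum(a for a, _, _ in panel1.values())
sum_b = sum(b for _, b, _ in panel1.values())
sum_d = sum(d for _, _, d in panel1.values())

# Pooled ratio: weights each occupation by its number of sampled members.
pooled_ratio = sum_b / sum_a

# Sum of occupation-level estimates (E) = (B/A) * D; dividing by total ASHE
# employment weights each occupation's ratio by its ASHE employment instead.
sum_e = sum(b / a * d for a, b, d in panel1.values())
employment_weighted_ratio = sum_e / sum_d

print(f"pooled ratio {pooled_ratio:.1%}, summed estimate {sum_e:,.0f}, "
      f"share of ASHE employment {employment_weighted_ratio:.1%}")
```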
In the second group, which lies outside these IT occupations, we find that 11% of “Research professionals” (232), 10% of “Science professionals” (211), 10% of “Business and related associate professionals n.e.c.” (3539), 7% of “Management consultants, actuaries, economists and statisticians” (2423), 6% of “Engineering professionals” (212), 5% of “Design associate professionals” (342), 4% of “Research and development managers” (1137), 2% of “Advertising and public relations managers” (1134) and 2% of “Marketing and sales managers” (1132) are identified as having big data skills. Grossed up to the UK market sector population, these estimates imply an additional 67,351 workers not already counted in the measurement of OACI. Taken together, the results in panels 1 and 2 provide an estimate of UK (market sector) big data employment of 190,000, of which about two-thirds are already counted in the measurement of OACI.
There are two ways to interpret our results. The first is that the professional and technical occupations we are looking to identify are so well represented on this social media network that we are effectively capturing the universe, or close to it, of UK D and N workers. Note that the 12,548 workers identified are relatively close to the 17,000 estimate for data-focused roles collected in the e-skills survey, and to the 18,720 job ads identified in Mandel and Scherer (2014). The second is that the identified workers are only a sample and do not represent the universe of workers with big data skills. We take the second view and gross up our results to the UK population. Alternatively, we may consider the estimates as lower and upper bounds, at 12,548 and 190,206 respectively, with the latter lying between the estimates from e-skills UK (2013b) and Mandel and Scherer (2014). We note, however, that the Mandel and Scherer estimates are for 2014 and the whole economy, compared to ours for 2010 and the market sector. Allowing for the non-market sector and some growth in activity between those dates, the two estimates are fairly consistent.
From Table 5 we have that 122,855 of the 748,769 recorded in official software (and database) employment have big data skills, about 16%. In future work we will show that the link between employment in (big) data and software production has an important implication for measurement. For now we just note that our results suggest that, of the 190,206 identified big data workers, 65% are already accounted for in the measurement of employment in OACI. The remaining 35% are employed in other occupations.
We therefore find that the majority of data workers are recorded in IT occupations. Similarly, Hawk,
Powers et al. (2015) find that nearly two-thirds of employment in data occupations is in the broad
categories of business/financial occupations and computer/mathematical occupations (34 percent and
31 percent, respectively), including management and market research analysts, software application
developers, computer user support specialists and computer systems analysts.
We note that our objective was to measure employment in the two upstream stages of Figure 1. However, our second panel includes some occupations whose members, although they report big data skills, may be more likely to be involved either in the implementation of (big) data-based knowledge downstream, or in the use of big data insights upstream for the production of other forms of knowledge-based capital such as R&D, branding or design, e.g. marketing and sales managers (1132), advertising and public relations managers (1134) and design associate professionals (342). Excluding these three occupations results in an estimate of big data employment of 177,796, as opposed to 190,206 with them included.
4.3.2. Potential overlap with employment in R&D
As outlined in Figure 1, we are seeking to measure employment in data-based knowledge creation as well as in the production/transformation of data that feeds into that process. Therefore, as well as with software, there appears to be a clear link with employment in another knowledge creation activity, namely R&D. The national accounts definition of R&D is taken directly from the Frascati Manual (OECD 2002), which defines R&D as: “creative work undertaken on a systematic basis in order to increase the stock of knowledge, including knowledge of man, culture and society, and the use of this stock of knowledge to devise new applications”. Clearly the analysis of data and the creation of data-based knowledge would appear to meet this rather broad definition.
Data from the Business Expenditure on R&D (BERD) survey[13] report R&D employment by product field, with one of those products being “computer programming and information service activities”.
[13] Available at http://www.ons.gov.uk/ons/publications/re-reference-tables.html?edition=tcm%3A77-329762. Accessed on 18 September 2014.
Employment data are presented below in Table 6, and show 27,000 full-time equivalents (FTEs)
working on R&D in this product field in 2013, compared to 178,000 performing Business R&D in
total. Of those 27,000, 14,000 are scientists and engineers. We conjecture that some of those will
include workers deployed in the extraction of data-based knowledge, although some other part will be
working on the development of new or improved software. Where data is used in the production of
knowledge to be applied to some other good/process, such employment may also be recorded
elsewhere in the BERD data, for instance to general R&D or to the primary product of that industry
e.g. pharma. From our discussions with the ONS, we are aware that they consider that BERD data
will potentially include activity in data analytics provided it meets the Frascati definition above. We
also note that R&D employment in this product field grew strongly in 2003, and then remained stable
until 2009, before growing again by 35% between the years 2009 and 2013, possibly reflecting
growth in data analytics activity.
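The growth figures quoted above follow from the product-field totals in Table 6 (FTEs, thousands). A short Python check, which also shows the size of the 2003 jump and the product field's share of all BERD employment in 2013; the helper name `growth` is ours.

```python
# Table 6, product field "Computer programming and information service
# activities": total R&D employment, FTEs in thousands.
product_fte = {2002: 13, 2003: 19, 2009: 20, 2010: 22, 2013: 27}
berd_total_2013 = 178  # all Business R&D employment in 2013, FTEs (000s)

def growth(start: int, end: int) -> float:
    """Percentage change in product-field FTEs between two years."""
    return (product_fte[end] / product_fte[start] - 1) * 100

print(f"2009-2013 growth: {growth(2009, 2013):.0f}%")  # 35%
print(f"2002-2003 growth: {growth(2002, 2003):.0f}%")  # 46%
print(f"share of all BERD FTEs, 2013: {product_fte[2013] / berd_total_2013:.0%}")  # 15%
```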
Table 6: UK BERD employment (FTEs, 000s)
Year | UK BERD total: Total | Of which: scientists and engineers | Of which: technicians, laboratory assistants and draughtsmen | Of which: administrative, clerical, industrial and other staff | Product “Computer programming and information service activities”: Total | Of which: scientists and engineers | Of which: technicians, laboratory assistants and draughtsmen | Of which: administrative, clerical, industrial and other staff
2000 | 145 | 86 | 30 | 30 | 10 | 5 | 1 | 4
2001 | 152 | 93 | 28 | 31 | 11 | 6 | 1 | 3
2002 | 158 | 96 | 31 | 31 | 13 | 7 | 2 | 4
2003 | 155 | 99 | 28 | 29 | 19 | 12 | 2 | 5
2004 | 150 | 94 | 27 | 29 | 19 | 11 | 3 | 5
2005 | 146 | 94 | 25 | 26 | 19 | 12 | 4 | 4
2006 | 147 | 92 | 27 | 28 | 20 | 13 | 4 | 4
2007 | 158 | 90 | 35 | 33 | 21 | 13 | 5 | 3
2008 | 151 | 86 | 37 | 28 | 20 | 11 | 6 | 3
2009 | 151 | 85 | 40 | 26 | 20 | 11 | 7 | 2
2010 | 154 | 87 | 41 | 27 | 22 | 11 | 8 | 3
2011 | 159 | 90 | 42 | 27 | 23 | 12 | 9 | 2
2012 | 161 | 91 | 44 | 27 | 24 | 13 | 9 | 2
2013 | 178 | 98 | 52 | 28 | 27 | 14 | 10 | 2
Source: archives of BERD data
Unfortunately, however, there is no official information on how much data/analytics activity may be
included in the BERD data. The guidance notes to the BERD survey do state that "consumer surveys,
advertising and market research" and "general purpose or routine data collection" are to be excluded
from the R&D figures provided, but the potential for some data analytics activity to be included
remains.
Table 5 shows that 40,000 (or 21%) of our identified D and N workers are recorded under
the occupational codes science professionals (211), engineering professionals (212), research
professionals (232) and research and development managers (1137). Therefore, from total estimated
big data employment of 190,000, we estimate that 123,000 (65%) are already recorded in software
occupations, and 40,000 (21%) may be recorded in the measurement of R&D. Note that the 40,000
identified is higher than the 22,000 employed in R&D for “computer programming and information
service activities" in 2010, as reported in Table 6. We also conjectured above that a further 12,000 (6%)
may be involved in the creation of other forms of knowledge-based capital such as advertising, market
research or design (based on the 12,000 employed as marketing and sales managers (1132),
advertising and public relations managers (1134) and design associate professionals (342) in Table 5).
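The percentages above are simple shares of the 190,000 total. A minimal Python sketch that recomputes them; the variable names are illustrative. The 12,000 in other knowledge-based capital occupations is a subset of the 27,000 who are in neither software nor R&D occupations:

```python
TOTAL = 190_000      # estimated UK big data employment, 2010
SOFTWARE = 123_000   # already recorded in software occupations
RND = 40_000         # SOC 211, 212, 232, 1137: potentially captured in BERD
OTHER_KBC = 12_000   # SOC 1132, 1134, 342: advertising, market research, design

def share(n: int, total: int = TOTAL) -> float:
    """Percentage of total big data employment."""
    return 100 * n / total

# Workers in neither software nor R&D occupations (includes OTHER_KBC).
outside_sw_rnd = TOTAL - SOFTWARE - RND

for label, n in [("software", SOFTWARE), ("R&D", RND),
                 ("other KBC", OTHER_KBC), ("not software or R&D", outside_sw_rnd)]:
    print(f"{label:>20}: {n:>7,} ({share(n):.0f}%)")
```

The output gives 65%, 21%, 6% and 14% respectively; the last figure matches the share reported for "other occupations" in the conclusions.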
It is also worth making a broader point about the comparison between these knowledge creation
activities. BERD results state that in 2010, UK R&D employment was 154,000, consisting of 87,000
scientists and engineers, 41,000 technicians and 27,000 administrative staff. If we take the sum of
the first two of those occupations, then the estimates from e-skills UK and our own estimates suggest,
respectively, that big data employment lies in the range of (31,000/150,000=)21%¹⁴ to
(190,000/128,000=)148% of UK R&D employment. If we also include administrative and clerical staff,
the range is (31,000/178,000=)17% to (190,000/154,000=)123%. Considering the attention devoted to
R&D, these are clearly significant estimates. We do note, however, that traditional R&D is largely
concentrated in manufacturing,¹⁵ whilst data activities are likely to be more dispersed across
industries, potentially being a feature of any firm or industry that generates, or has access to, raw
records or information.
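Because the two estimates refer to different years, each is paired with BERD employment for its own year (footnote 14). A short Python check of the four ratios quoted above, using the Table 6 figures; the function name `ratio_to_rd` is illustrative:

```python
# BERD employment (FTEs) from Table 6, by year.
BERD = {
    2010: {"sci_eng": 87_000, "technicians": 41_000, "all_staff": 154_000},
    2013: {"sci_eng": 98_000, "technicians": 52_000, "all_staff": 178_000},
}

def ratio_to_rd(big_data: int, year: int) -> tuple[float, float]:
    """Big data employment as a percentage of (i) scientists, engineers and
    technicians and (ii) all R&D staff, using BERD figures for the same year."""
    b = BERD[year]
    core = b["sci_eng"] + b["technicians"]
    return 100 * big_data / core, 100 * big_data / b["all_staff"]

for label, big_data, year in [("e-skills UK (wide), 2013", 31_000, 2013),
                              ("this paper, 2010", 190_000, 2010)]:
    core_pct, all_pct = ratio_to_rd(big_data, year)
    print(f"{label}: {core_pct:.0f}% of core R&D staff, {all_pct:.0f}% of all R&D staff")
```

This reproduces the ranges of 21% to 148% (scientists, engineers and technicians only) and 17% to 123% (all R&D staff).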
4.4. Alternative data sources: Employment in the D and N industries
So far we have presented total estimates of employment of D and N workers. As discussed, some of
those workers will be employed in specialist D and N firms in the D/N industry, and some will be
employed in-house in outside industries. Unfortunately, just as with the Standard Occupational
Classification (SOC), the Standard Industrial Classification (SIC) has not kept pace with this
emerging field, and is not yet sufficiently granular to separately identify economic activity in such
firms. Inspection of the 2007 SIC reveals two industries of particular interest, whose activities are
potentially relevant to data-building (transformation) or data analytics (knowledge creation).
Table 7 provides detail on the economic activities of two industries: Business and domestic software
development (62012) and Data processing, hosting and related activities (63110). The third column
lists the activities included in each industry, and the final column indicates those we consider
potentially part of either data-building (D) or knowledge creation (N).
¹⁴ E-skills UK estimates are for 2013, so we use the 2013 estimate of R&D employment of 150,000
(98,000 scientists and engineers and 52,000 technicians) from Table 6.
¹⁵ Table 27 of the BERD release shows that, in 2012, of the £12.4bn of R&D that occurred outside of
the R&D industry, £6.7bn (54%) took place in manufacturing.
Unfortunately, data are not available for the individual activities listed in column 3 of Table 7.
Instead, the five-digit level of the SIC (column 1) is the lowest level of aggregation available. From
Table 7, we can assume that some part of the sales of industry 63110 relates to data-building. We can
also assume that some part of industry 62012 relates to knowledge creation, and another part to
data-building. Industry data for SICs 62012 and 63110 are presented in Table 8.
Table 7: SIC07 industries whose activities might include data-building and/or data analytics

| SIC (2007) | Industry | Activity | Where in our framework? |
|---|---|---|---|
| 62012 | Business and domestic software development | Business and domestic software development; Custom software development; Data analysis consultancy services; Database structure and content design; Designing of structure and content of business and domestic software database; Made-to-order software; Programming services; Software house; Software systems maintenance services; System maintenance and support services; Systems analysis (computer); Web page design | Knowledge Creation (N) sector: Data analysis consultancy services; Systems analysis (computer) |
| 63110 | Data processing, hosting and related activities | Batch processing; Data conversion; Data preparation services; Data processing; Data storage services; Database running activities; Tabulating service; Time sharing services (computer); Web hosting | Data-building (D) sector |

Note to table: Excerpt from the 2007 Standard Industrial Classification
Table 8: Annual Business Survey (ABS) data

| Standard Industrial Classification (Revised 2007): Section/Division/Group/Class/Subclass | Description | Year | Number of enterprises | Total turnover (£ million) | Approximate gross value added at basic prices (£ million) | Total purchases (£ million) | Total employment, average during the year (1) (thousand) | Total employment costs (£ million) | Total net capital expenditure (£ million) |
|---|---|---|---|---|---|---|---|---|---|
| 62.01/2 | Business and domestic software development | 2008 | 18,323 | 13,681 | 6,712 | 6,978 | 107 | 4,614 | 220 |
| | | 2009 | 11,197 | 12,899 | 6,928 | 6,126 | 82 | 4,128 | 97 |
| | | 2010 | 15,653 | 12,859 | 7,355 | 5,558 | 102 | 3,888 | 158 |
| | | 2011 | 22,085 | 14,889 | 8,877 | 6,017 | 108 | 4,399 | 209 |
| | | 2012 | 27,147 | 15,562 | 9,319 | 6,329 | 109 | 4,771 | 273 |
| 63.110 | Data processing, hosting and related activities | 2008 | 2,856 | 5,059 | 3,270 | 1,789 | 38 | 1,633 | 191 |
| | | 2009 | 2,850 | 4,876 | 3,447 | 1,433 | * | 1,409 | * |
| | | 2010 | 2,783 | 5,640 | 3,711 | 1,882 | * | 1,618 | * |
| | | 2011 | 2,996 | 6,437 | 4,220 | 2,168 | * | 1,621 | 228 |
| | | 2012 | 3,038 | 6,676 | 4,221 | 2,438 | * | 1,643 | * |

Source: ONS Annual Business Survey (ABS)
* indicates disclosive
Official data therefore do not give us a precise estimate of employment in the D and N industries.
We do, however, have an additional piece of information, namely the estimate in e-skills UK (2013b)
that 89% of big data activities are conducted in-house and 11% are purchased. Using that estimate
allows us to split the employment numbers from various sources into the part that we estimate works in
the specialist D and N industries and the part that operates in-house in outside industries. Of those
employed in-house in outside industries, e-skills UK (2014) suggests that the primary employers are
in financial services, games, retail and marketing.
Estimates of employment in the data industry, compared with in-house employment in outside industries,
are summarised below in Table 9. In the final rows we present various memo items, including
employment in the wider industry as defined by the SIC, big data vacancies, software employment
and R&D employment.
How do our estimates compare with the employment numbers for industries 62012 and 63110 reported in
Table 8? Industry 62012 (Business and domestic software development) employed 102,000 people in
2010. The figure for 63110 (Data processing, hosting and related activities) is disclosive for 2010
and other years, but that industry employed 38,000 in 2008. Employment costs for 63110 in 2010
(£1,618 million) are very similar to those in 2008 (£1,633 million), implying that the employment
level is also similar. Therefore, in total, employment in these two industries in 2010 stood at around
140,000. From Table 9 we estimate that around 20,900, or (20,900/140,000=)15%, of those workers are
in the D and N industries. The remainder will be employed in the production of either pre-packaged or
custom software,¹⁶ maintenance, consultancy, support, web page design and/or web hosting.
Alternatively, using the estimates from e-skills UK (wide definition) would imply a figure of around
(3,410/140,000=)2.4%, and those from Mandel/Scherer a figure of (32,340/140,000=)23%.
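The 140,000 total and the implied shares can be reproduced as follows. A Python sketch that, like the text, treats 2008 employment in 63110 as a stand-in for 2010; note that the Mandel/Scherer figure refers to 2014 while the industry total is for 2010, so that share mixes years:

```python
# Table 8 (ONS ABS): employment and employment costs
emp_62012_2010 = 102_000
emp_63110_2008 = 38_000                    # 2010 employment is disclosive
costs_63110 = {2008: 1_633, 2010: 1_618}   # employment costs, £ million

# Employment costs barely moved, so 2008 employment is used as a proxy for 2010.
cost_change = 100 * (costs_63110[2010] - costs_63110[2008]) / costs_63110[2008]
industry_total = emp_62012_2010 + emp_63110_2008

# Implied D and N employment located in the D and N industries (Table 9)
implied = {"this paper": 20_900, "e-skills UK (wide)": 3_410, "Mandel/Scherer": 32_340}
shares = {src: 100 * n / industry_total for src, n in implied.items()}

print(f"63110 employment costs, 2008-2010: {cost_change:+.1f}%")
print(f"SIC 62012 + 63110 employment, 2010 (approx.): {industry_total:,}")
for src, s in shares.items():
    print(f"{src}: {s:.1f}% of SIC 62012 + 63110 employment")
```

The cost change is about -0.9%, the industry total is 140,000, and the shares come out at roughly 14.9%, 2.4% and 23.1%.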
¹⁶ Note there will be an element of crossover here in the sense that software provision now includes the
sale of software and business solutions that have analytics tools built in. In our framework, such
software is capital that is used in the D and N stages of the data supply chain, but the labour that
produces that software is not directly employed in D and N activity, i.e. in data transformation or
data-based knowledge creation.
Table 9: Estimated workers in D and N industries and in-house in outside industries

| Source | Year | Estimated D and N employment | Implied D and N employment in D and N industries | In-house D and N employment in outside industries |
|---|---|---|---|---|
| e-skills UK, narrow definition | 2013 | 17,000 | 1,870 | 15,130 |
| e-skills UK, wide definition | 2013 | 31,000 | 3,410 | 27,590 |
| Mandel and Scherer / NESTA | 2014 | 294,000 | 32,340 | 261,660 |
| Social media data (this paper) | 2010 | 190,000 | 20,900 | 169,100 |

Memo items:

| Memo item | Year | Number |
|---|---|---|
| ABI (ONS): SIC 62012 & 63110 (employment in D, N and wider industry) | 2010* | 140,000 |
| Big data vacancies (Mandel/Scherer) | 2014 | 18,720 |
| Software employment | 2010 | 748,769 |
| R&D employment | 2010 | 154,000 |
| Of which: programming & info services | 2010 | 22,000 |
Note to table: Estimates of D and N workers located in D and N industries, and in outside industries, based on
the information that 11% of big data activities are outsourced/purchased (e-skills UK 2013b). Thus column 1 is
estimated employment, column 2 is 11% of estimated employment which we allocate to the D and N industries.
Column 3 is the remainder of employment, corresponding to in-house/own-account activity, and is column 1
minus column 2. Memo items include estimates of employment in wider industry that includes the D and N
industries as defined by the SIC. *Data for 2010 is partially disclosive, so employment partly based on 2008
data, but employment costs in 2010 similar to 2008 suggesting employment is also similar. Other memo items
are: big data vacancies as estimated in Mandel and Scherer (2014), employment in software occupations, and
R&D employment in general and also in the product field “computer programming and information service
activities”.
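The allocation described in the note can be written as a small function. A Python sketch that applies the e-skills UK (2013b) 11% purchased share uniformly to every source, as Table 9 does; the function name `split_employment` is illustrative, and rounding to whole workers is our assumption:

```python
PURCHASED_SHARE = 0.11  # e-skills UK (2013b): 11% of big data activity is purchased

def split_employment(total: int, purchased_share: float = PURCHASED_SHARE) -> tuple[int, int]:
    """Split total D and N employment into (D and N industries, in-house elsewhere)."""
    in_dn_industries = round(total * purchased_share)  # column 2
    in_house = total - in_dn_industries                 # column 3 = column 1 - column 2
    return in_dn_industries, in_house

rows = [("e-skills UK, narrow definition", 2013, 17_000),
        ("e-skills UK, wide definition", 2013, 31_000),
        ("Mandel and Scherer / NESTA", 2014, 294_000),
        ("Social media data (this paper)", 2010, 190_000)]

for source, year, total in rows:
    dn, in_house = split_employment(total)
    print(f"{source} ({year}): {total:,} -> {dn:,} in D/N industries, {in_house:,} in-house")
```

Rounding guards against floating-point noise (for example, 17,000 × 0.11 is not exactly 1,870 in binary floating point), so the printed values match the table.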
5. Conclusions
Much has been published on the volume, and growth in volume, of data that is available to firms and
used to generate new knowledge via analytics. However, aside from broad statements, in the UK at
least, few hard metrics are available on the scale or volume of big data activity. In this paper we
document various estimates of UK big data employment and produce our own estimates using a novel
dataset. We estimate that in 2010, UK employment in the big data sphere stood at 190,000. Of those
190,000, 65% are captured in official measures of employment in the own-account (in-house)
production of computerised information, 21% are potentially included in the measurement of
business R&D, and 14% are employed in other occupations. Among those other occupations, we note the
potential overlap with other measures of knowledge-based capital such as advertising, market research
and design. In future work we will show how to relate our estimates of employment to standard
national accounting procedures for measuring investment in intangible assets. This paper is therefore
a first step to documenting the contribution that data and data-based assets are making to UK growth.
References
Bakhshi, H., J. Mateos-Garcia, et al. (2014). "Model Workers: how leading companies are recruiting and managing their data talent."
Chamberlin, G., T. Clayton, et al. (2007). "New measures of UK private sector software investment." Economic and Labour Market Review 1(5): 17-28.
e-skills UK (2013a). "Big Data Analytics: An assessment of demand for labour and skills, 2012-2017." Report for SAS.
e-skills UK (2013b). "Big Data Analytics: Adoption and Employment Trends, 2012-2017."
e-skills UK (2014). "Big Data Analytics: Assessment of Demand for Labour and Skills 2013-2020." T. Partnership.
Hawk, W., R. Powers, et al. (2015). "The Importance of Data Occupations in the U.S. Economy." US Department of Commerce, Economics and Statistics Administration.
Mandel, M. (2012). "Where the jobs are: The app economy." South Mountain Economics, LLC. Retrieved June 28, 2012.
Mandel, M. (2013). "Building a Digital City: The Growth and Impact of New York City's Tech/Information Sector." South Mountain Economics, LLC. Prepared for the Bloomberg Technology Summit (September 30, 2013).
Mandel, M. and J. Scherer (2014). "Using Want-Ad Data for Mapping of Jobs and Economic Activity Related to Innovative Technologies." Study funded by NESTA.
OECD (2002). Frascati Manual 2002: Proposed Standard Practice for Surveys on Research and Experimental Development. Paris: OECD.
Office for National Statistics. "Annual Survey of Hours and Earnings, 1997-2011: Secure Data Service Access [computer file]." Colchester, Essex: UK Data Archive [distributor], April 2013. SN: 6689.
United Nations (2008). "System of National Accounts 2008."
Wong, D. (2012). Data is the Next Frontier, Analytics the New Tool. London: Big Innovation Centre, November. Available at: http://www.biginnovationcentre.com/Publications/21/Data-is-the-nextfrontier-Analytics-the-new-tool.
Appendix 1
Table A1: ‘Big Data’ keywords
"big data",
"sparql",
"mongodb",
"neo4j",
"elasticsearch",
"lucene",
"nosql",
"cassandra",
"couchdb",
"node.js",
"scala",
"graph databases",
"titan",
"machine learning",
"mlaas",
"data mininig",
"text mining",
"text analytics",
"hbase",
"mapreduce",
"pig",
"web scale architecture",
"hadoop",
"hdfs",
"zookeeper",
"impala",
"datameer",
"riak",
"redis",
"couchbase",
"memcached",
"mysql",
"data science",
"python",
"ruby",
"rest",
"rdf",
"owl",
"semantic web",
"web ontology",
"pattern recognition",
"natural language processing",
"nlp",
"sentiment analysis",
"data visualization",
"predictive analytics",
"computational linguistics",
"informatica",
"predictive modeling",
"semantic technologies",
"hive",
"recommender systems",
"nodejs",
"grid computing",
"sentiment analysis",
"velocity",
"data warehouse architecture"
This paper has been produced by the Department of Management at Imperial College Business School
Copyright © the authors 2014 All rights reserved
ISSN: 1744-6783
Imperial College Business School
Tanaka Building South Kensington Campus London SW7 2AZ United Kingdom
T: +44 (0)20 7589 5111 F: +44 (0)20 7594 9184
www.imperial.ac.uk/business-school
This work is licensed under a Creative Commons Attribution 4.0 International License.