ferraro dissertation

Upload: andresferraro

Post on 30-May-2018

  • 8/14/2019 Ferraro Dissertation

    1/153

    Conducting Marketing Research with

    Amazon's Mechanical Turk

    By Enrique Andres Ferraro

    A DISSERTATION

    Submitted to

    The University of Liverpool

    in partial fulfillment of the requirements for the degree of

    MASTER OF BUSINESS ADMINISTRATION

    2008

    A Dissertation entitled

    Conducting Marketing Research with Amazon's Mechanical Turk

    By

    Enrique Andres Ferraro

    We hereby certify that this Dissertation submitted by Enrique Andres Ferraro

    conforms to acceptable standards, and as such is fully adequate in scope and

    quality. It is therefore approved as the fulfillment of the Dissertation

    requirements for the degree of Master of Business Administration.

    Approved:

    Dissertation Advisor Lisa Harris, Ph.D. ___________

    The University of Liverpool

    2008

    CERTIFICATION STATEMENT

    I hereby certify that this paper constitutes my own product, that where the

    language of others is set forth, quotation marks so indicate, and that appropriate

    credit is given where I have used the language, ideas, expressions or writings of

    another.

    Signed

    Enrique Andres Ferraro

    Abstract

    Conducting Marketing Research with Amazon's Mechanical Turk

    by

    Enrique Andres Ferraro

    The viability of market research online enabled the current wave of research-based

    management decision-making. Tapping an always-on, ever-ready, cost-effective

    community presents itself as the next evolutionary step in accelerating market

    research based decisions. Packaging a community as a Web Service accessible via

    an open API (Application Programming Interface) appears as the ultimate enabler

    whereby this human cloud can be leveraged programmatically and integrated into

    management decision systems. This human cloud wrapped by an API is what

    Amazon Inc. introduced as the Amazon Mechanical Turk in 2005.

    The Amazon Mechanical Turk is composed of thousands of workers that complete

    context-free tasks in response to work requests submitted into the system. The

    majority of work requests treat the Amazon Mechanical Turk as a service factory,

    processing various types of data through it. However, this massive workforce can be

    leveraged for market research by creating work requests that gather information from

    the worker directly by means of surveys or polls. We conducted an analysis of

    workers' demographic characteristics with a sample of 1428 workers, which we

    contrasted and compared with the US Census of 2000.

    Our study reveals that the Amazon Mechanical Turk attracts workers from all

    segments of a population, largely in proportions matching the population from which

    they were drawn, and that the system represents a portal through which market

    research can be conducted easily and cost-effectively without incurring some of the

    sacrifices in validity inherent in captive panels or indirect online research.

    Acknowledgements

    The participation of each and every Turker is acknowledged.

    The constructive feedback of Lisa Harris, Ph.D., acting as dissertation advisor, is

    kindly acknowledged.

    The guidance over these past three years from Chiona Balfoussia, Ph.D., Prof. Debra

    Black, Prof. Roger Bradburn, Prof. Nicola Caramia, Arlene Hiss, Ph.D., Sultan

    Kermally, Ph.D., and Prof. Christiane Prange, among others, is acknowledged.

    The brainstorming on current digital marketing issues of the peer team at PC-HOST,

    as well as Edward Castronova, Ph.D., is gratefully acknowledged.

    The understanding and support of friends and family during these years has been

    invaluable and is sincerely appreciated.

    -Andres Ferraro.

    Table of Contents

    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    TABLE OF TABLES
    TABLE OF FIGURES
    AIMS OF THE DISSERTATION
    REVIEW OF THE LITERATURE
        Role of Research
        The Online Difference
        Future of Online Research
        A Return to Direct Market Research
        Validity of Future Online Research
        Summary
    METHODOLOGY
        Online Survey Platform and Process
        Survey Instrument Creation Process
            Stage One - Digital Replica
            Stage Two - Mapping and Scaling to Desired Output
            Stage Three - Bottom-up Analysis
            Stage Four - Minimization
        Survey Execution
            Preparation
            Sample Size Determination
        Participants and Sites
        Role of the Researcher
        Data Gathering
        Data Analysis
        Trustworthiness of the Method
            External Validity
            Face, Content and Construct Validity
            Internal Validity
            Reliability
    RESULTS AND ANALYSIS OF DATA
        1. Sex
        2. Age
        3. Country and State
        4. Race
        5. Relationships and Households
        6. Education and School Enrollment
        7. Nativity and Citizenship
        8. Language Spoken at Home
        9. Ancestry
        10. US Military Duty
        11. Disabilities
        12. Employment
        13. Transportation to Work
        14. Occupation
        15. Income
        16. Special Sources of Income
        17. Housing
        Summary
    CONCLUSIONS
    REFERENCES
    OTHER WORKS CONSULTED
    APPENDICES
        Appendix A - Survey Instrument
            Questionnaire
            Skip Logic
        Appendix B - Participant Instructions
        Appendix C - International Data Base Information
        Appendix D - Calculation Tables

    Table of Tables

    Table 1 - Online questionnaire design challenges
    Table 2 - Gender of survey respondents
    Table 3 - Age distribution analysis - case counts
    Table 4 - Age distribution analysis
    Table 5 - Top countries of survey respondents
    Table 6 - Top states of survey respondents
    Table 7 - Race percentages for Amazon Mechanical Turk respondents
    Table 8 - Hispanic/Latino frequency
    Table 9 - Marital status
    Table 10 - Unmarried partner summary
    Table 11 - Relationships and households summary
    Table 12 - Relationships and households comparative summary
    Table 13 - Student levels
    Table 14 - Educational attainment
    Table 15 - Educational attainment comparison
    Table 16 - Geographical region of birth
    Table 17 - Top 10 countries at birth
    Table 18 - Top 10 US states at birth
    Table 19 - US citizenship
    Table 20 - Non-US citizen respondent living in the US
    Table 21 - Language other than English at home
    Table 22 - English as a second language
    Table 23 - Top 10 countries of ancestry
    Table 24 - US military duty
    Table 25 - Employment and disability by age group cross-tabulation
    Table 26 - Disability and employment comparison I
    Table 27 - Disability and employment comparison II
    Table 28 - Currently employed respondents 16 years or older
    Table 29 - Ability to work of respondents 16 years or older
    Table 30 - Military duty of labor force 16 years or older
    Table 31 - Employment of the civilian respondents 16 years or older
    Table 32 - Military duty of respondents 16 years or older
    Table 33 - Sex of respondents 16 years or older
    Table 34 - Military service of females 16 years or older
    Table 35 - Employment of female civilians 16 years or older
    Table 36 - Respondents with children six years of age or younger
    Table 37 - Respondents in the labor force w/children six years of age or younger
    Table 38 - Employment status summary
    Table 39 - Employment status comparative summary
    Table 40 - Means of commuting to work
    Table 41 - Carpooling
    Table 42 - Length of commute in minutes summary
    Table 43 - Commute method summary
    Table 44 - Occupation class
    Table 45 - Industry
    Table 46 - Class of worker
    Table 47 - Income range
    Table 48 - Special sources of income frequency analysis
    Table 49 - House value
    Table 50 - House population raw summary
    Table 51 - House population weighted by number of occupants
    Table 52 - Rooms in house
    Table 53 - Occupants per room summary statistics
    Table 54 - Occupants per room grouping
    Table 55 - Occupants per room comparison
    Table 56 - Year house built and year respondent moved into house
    Table 57 - Residence ownership
    Table 58 - International data base information - US age distribution
    Table 59 - Chi square of sex preparation
    Table 60 - Chi square of sex results
    Table 61 - Age group Chi square preparation
    Table 62 - Chi square calculation comparing age groups
    Table 63 - Race comparison - Chi Square preparation
    Table 64 - Chi square calculation for races between sample and US Census
    Table 65 - Hispanic/Latino Chi square preparation
    Table 66 - Hispanic/Latino Chi square test
    Table 67 - Marital status Chi-square test preparation
    Table 68 - Marital status Chi square test results
    Table 69 - Educational attainment Chi-Square preparation
    Table 70 - Chi-square goodness-of-fit for educational attainment
    Table 71 - Commute means Chi-square test preparation
    Table 72 - Commute means Chi-square test results
    Table 73 - Occupation class Chi-square
    Table 74 - Industry Chi-square
    Table 75 - Income range Chi-square preparation
    Table 76 - Income range Chi-square
    Table 77 - Special sources of income case summary
    Table 78 - Occupants per room Chi-square preparation
    Table 79 - Occupants per room Chi-square result
    Table 80 - Residence ownership Chi-square preparation
    Table 81 - Residence ownership Chi-square

    Table of Figures

    Figure 1 - Amazon Mechanical Turk diagram
    Figure 2 - Tagcow.com: Crowdsourced service on the Amazon Mechanical Turk
    Figure 3 - Hamlin model: the research decision model after the Internet
    Figure 4 - Survey execution process
    Figure 5 - Interplay of elements shaping the creation of our survey instrument
    Figure 6 - Relationships critical to face, content and construct validity
    Figure 7 - US Census population contrasted to sampled population
    Figure 8 - Age distribution stacked on gender with normal curve overlay
    Figure 9 - Population pyramid with normal curve overlay
    Figure 10 - Age distribution comparison
    Figure 11 - Race comparison
    Figure 12 - Student level graph
    Figure 13 - Educational attainment graph
    Figure 14 - Educational attainment comparison graph
    Figure 15 - Top 20 states at birth
    Figure 16 - Histogram of the length of commute in minutes
    Figure 17 - Income range histogram
    Figure 18 - Income comparison bar chart
    Figure 19 - House value histogram
    Figure 20 - House build year histogram
    Figure 21 - House move-in year histogram

    Aims of the Dissertation

    This dissertation aims to expose the potential validity of using the Amazon

    Mechanical Turk as a market research platform by expanding on our knowledge of its

    population, incidence rates, salient features and any significant distortions introduced

    by the system and its workers. The Amazon Mechanical Turk is a distributed

    workforce available for hire in piecemeal fashion through the internet, and one of the

    many tasks it can perform is responding to market research surveys.

    Computers excel at many tasks, but there is a wide range of tasks that, while

    astonishingly simple for humans, are extremely difficult if not impossible with current

    technology: for example, distinguishing whether it is day or night in a picture,

    creating a new question for a trivia game or successfully translating a speech. One of

    these tasks is examining the description of two items in a catalog and ascertaining

    whether the description pertains to the same item. To assist in classifying and

    eliminating duplicates in their massive inventories, Amazon Inc. created a web site

    that would link its internal workforce with its massive catalog and allow this internal

    workforce to flag duplicate items coming up in searches (Pontin, 2005). Amazon

    recognized that this method of labor distribution was extremely efficient for the

    company and thus likely valuable to the broader market outside the company. For a

    few years now, Amazon has been creating and exposing to the public what are known

    as Web Services: relatively small components that perform a service using

    industry-standard XML interfaces. These web services can be strung together to create

    larger applications. Amazon offers data storage as a service, computing power as a

    service and several others. Among these services lies a web service that provides

    human intelligence in snippets termed Human Intelligence Tasks or HITs for short.

    These HITs and this web service link people on one end and a computer interface

    on the other end, but invert the traditional computing paradigm in which the human

    directs and consumes the output of the computer: in a HIT, the computer

    instructs the human to perform a task and the computing system consumes the

    results. In exposing this system, Amazon Inc. created a data processing service that

    is powered by thousands of distributed workers performing simple tasks still beyond

    the reach of current computers. Massively distributed ad-hoc work structures are

    relatively new, and "Crowdsourcing" is the buzzword now used to refer to them:

    crowdsourcing turns over tasks traditionally performed by employees to

    the internet multitude (Libert, Spector, 2008). Wikipedia.org is an example of

    Crowdsourcing, where thousands of writers create an ever-changing encyclopedia.

    The service Amazon created goes one step further in enabling any business or

    individual to harness the power of ad-hoc workers and Crowdsourcing by tying an

    internet back-end to workers, then a web-based as well as an XML interface to

    businesses and gluing the system with a business-to-worker micropayment system.

    The company calls this consolidated web service the Amazon Mechanical Turk;

    Figure 1 shows a diagram of how this web service encapsulates a human workforce

    into an electronic resource.
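
    The inversion described above, where a program issues the instruction and then

    consumes what people return, can be sketched in a few lines. This is only an

    illustration under stated assumptions: the Hit class, the run_hit function and the

    simulated workers are hypothetical names invented here, and none of it reflects

    Amazon's actual interface or data format.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Hit:
    """Hypothetical stand-in for a Human Intelligence Task (not Amazon's schema)."""
    question: str        # instruction the system shows to a worker
    reward_usd: float    # micropayment offered for one completed answer
    answers: List[str] = field(default_factory=list)


def run_hit(hit: Hit, workers: List[Callable[[str], str]]) -> List[str]:
    """The program issues the task; each worker returns one answer."""
    for worker in workers:
        hit.answers.append(worker(hit.question))
    # The collected answers are consumed by the requesting program.
    return hit.answers


# Simulated workers; in the real service these would be people.
workers = [
    lambda q: "same item",
    lambda q: "same item",
    lambda q: "different items",
]

hit = Hit("Do these two catalog descriptions refer to the same item?", 0.01)
answers = run_hit(hit, workers)

# One simple way a requester might combine answers: take the most common one.
majority = max(set(answers), key=answers.count)
print(majority)
```

    Taking the most common answer is just one possible way for a requester to combine

    responses; the text does not say how requesters aggregate results.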

    Figure 1 - Amazon Mechanical Turk diagram

    At the time of writing, several popular services relied on the Amazon Mechanical Turk

    as the core workforce and virtual processing unit. The audio transcription service

    CastingWords (2008) was using the workers to transcribe audio programs and

    podcasts into text. Another company, Tagcow, was using open APIs to pull the

    picture collections of subscribers from photo sharing site Flickr (and others in the

    future), subsequently constructing and submitting work units requesting workers at

    the Amazon Mechanical Turk to describe photos or identify persons in the photo (by

    comparing them to a known set the customer provided), and then funneling the results

    back into Flickr in the form of tags associated with each picture, thus making the

    user's collection electronically searchable. Figure 2 illustrates how Tagcow was

    leveraging the open APIs of photo sharing sites as well as that of the Amazon

    Mechanical Turk to provide their photo-tagging service.

    Figure 2 - Tagcow.com: Crowdsourced service on the Amazon Mechanical Turk

    Diagram courtesy of Timothy Wright (2008) and Tagcow.com

    The Amazon Mechanical Turk is powered by thousands of workers that use their

    spare time to complete tasks and receive small monetary incentives at the

    completion of each task. These workers are also individuals in the general population

    and represent a significant pool of people that can be harnessed for market research.

    Tasks (HITs) can be crafted to survey the workers' demographics, attitudes and

    behaviors. The range of remuneration for a HIT at the Amazon Mechanical Turk

    starts at one cent of a US dollar, and while there is no upper limit, the apparent

    majority of HITs as of March 2008 seem to be below $0.20. These ranges of

    expenses for completed surveys make the Amazon Mechanical Turk an incredibly

    inexpensive tool for market research that, when coupled with the large size of the

    active worker population, creates a unique platform for rapid and affordable market

    research. While it would appear that the key difference between current online panels

    run by market research firms and the Amazon Mechanical Turk is one of cost, validity

    is also a factor when considering direct online market research (Furrer, Sudharshan,

    2001). The Amazon Mechanical Turk, and Crowdsourcing in general, is a new form of

    online research that has had little or no scientific exploration because so little

    market research has been carried out to date using such systems.
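
    As a rough illustration of the arithmetic behind the cost claim, the sketch below

    multiplies a number of completed surveys by the reward range quoted above ($0.01 to

    $0.20 per HIT). The function name is hypothetical, the response count of 1428 is

    borrowed from the sample size given in the abstract, and the result is not a cost

    reported by this study: the reward actually paid is not stated here, and any fee the

    platform may charge is ignored.

```python
def survey_cost_range(responses: int, low_cents: int = 1, high_cents: int = 20):
    """Return (low, high) total worker payments in US dollars.

    Assumption: every response is paid the same flat reward, and no
    platform fee is added. Rewards are given in whole cents.
    """
    return responses * low_cents / 100, responses * high_cents / 100


low, high = survey_cost_range(1428)
print(f"${low:.2f} to ${high:.2f}")  # prints "$14.28 to $285.60"
```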

    The range of market research that can be conducted using the Amazon Mechanical

    Turk is very broad but has limitations. Market research surveys using polls, graphics,

    video, sound, interactivity, and other rich media can be conducted using the Amazon

    Mechanical Turk, and responses can be recorded using the same mechanism that

    virtually any other online platform is able to provide. Continuous market research is

    another avenue of research that can be conducted using the Amazon Mechanical

    Turk; this includes tracking attitudes towards a brand, real-time information

    dispersion, and marketing message penetration over time. Moreover, market research

    can be carried out as a continuous process and integrated into enterprise systems for

    continuously updated decision support or planning. As noted, some market research

    tasks are impractical if not impossible to conduct using this system. More specifically,

    market research that requires either real-time or iterative collaboration among

    participants, such as an online focus group, and market research tasks that call for a

    follow-up to respondents are unlikely to be viable at the Amazon Mechanical Turk

    due to the self-contained nature of the tasks and the virtual impossibility of locating

    particular workers for a follow-up.

    While we could not ascertain the validity of using the Amazon Mechanical Turk for all

    future foreseeable market research, we endeavored to identify inherent biases and

    incidence rates on demographic parameters that will assist future researchers when

    evaluating whether to conduct market research using the Amazon Mechanical Turk.

    In order to carry out the aims of our research we structured our project around the

    analysis of primary demographic research we would carry out, working our way

    backwards from the type of conclusions we sought to explore, to how we planned to

    analyze information, and what instrument we would use in collecting our information.

    During an initial stage, we collected information from the US Census Bureau to gain

    an understanding of what subjects we could broach with our research, what

    information would be available for comparison purposes once the data collection

    itself had taken place, and what conclusions we might be able to draw from these

    comparisons. Before fully committing to the project, we explored the literature

    surrounding the topic of online research to gain an understanding and validate where

    our research would fill a void and how it would fit with the present state of knowledge.

    Once committed to the project we created our survey instrument using a process we

    have documented in this paper. After executing our survey continuously over a period

    of two months, we analyzed the information and created the present document.
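
    The list of tables mentions chi-square calculations comparing the sample with the US

    Census. A comparison of that kind could be sketched as a chi-square goodness-of-fit

    statistic, as below. The observed counts and reference proportions are placeholders

    chosen for illustration, not survey or Census figures, and the function name is

    hypothetical; the only fixed fact used is the 5% critical value for one degree of

    freedom, 3.841.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E.

    observed       -- counts per category in the sample
    expected_props -- reference-population proportion per category (sums to 1)
    """
    total = sum(observed)
    stat = 0.0
    for obs, prop in zip(observed, expected_props):
        expected = total * prop  # count expected if the sample matched the reference
        stat += (obs - expected) ** 2 / expected
    return stat


# Placeholder data (hypothetical, NOT taken from the survey or the Census):
observed = [520, 480]      # e.g. respondents in two categories
reference = [0.49, 0.51]   # e.g. reference-population proportions

stat = chi_square_gof(observed, reference)

# Two categories give one degree of freedom; 3.841 is the 5% critical value.
CRITICAL_95_DF1 = 3.841
differs = stat > CRITICAL_95_DF1
print(round(stat, 3), differs)
```

    A statistic below the critical value means the sample's proportions are not

    distinguishable from the reference at that significance level; it does not by

    itself establish that the sample is representative.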

    Review of the Literature

    Role of Research

    Our key goal in reviewing the present state of knowledge surrounding our area of

    research is primarily to ensure we are advancing the state of available knowledge in

    a significant manner. This implies that we add an element of originality to our

    research, whether this is by entering a new field with existing research methods,

    applying new methods to an existing area, or any other means by which we can fill a

    gap or extend present knowledge - thus helping us decide whether research in an

    area is necessary and what type of research is most appropriate. In our case, we

    found a gap of knowledge surrounding the Amazon Mechanical Turk as a market

    research tool. When we explored potential reasons for this gap we found that the

    core idea exposed by Amazon with regards to the Amazon Mechanical Turk is a

    service factory powered by human intelligence, where workers manipulate

    information in various ways. However, the aspect of the workers themselves being of

    value for their intrinsic characteristics was absent, thus contributing to the creation of

    this knowledge gap and validating that we were really confronting an area where

    there was a need for research.

    In this chapter we present our analysis of relevant literature covering several angles

    of our own research and advancing through several stages: First understanding in

    which ways the specific type of research that can be conducted at the Amazon

    Mechanical Turk differs from other research. Secondly, the potential for the future of

    online research using this system and thus how relevant our research might be going

    forward. Thirdly, we explore the increasing relevance of direct market research online

    in the wake of a proliferation of indirect research alternatives. Lastly, we look at

    research focused on the validity of online research. This review of prior research

    helps us narrow down the nature -and even some specifics- of the research needed,

    points the way to potential pitfalls, and constitutes the epistemological context of our

    research.

    The Online Difference

    To uncover differences between traditional and online surveys, Adam and McDonald

    (2003) took a list of club members and sent half of those selected randomly a

    questionnaire by mail and the other half a survey by e-mail. They analyzed the results

    and discovered several large differences in response rates, demographics, and

    research question opinions, thus indicating that the segment of the population that

    responds to an online survey does not overlap smoothly with those that are willing to

    answer postal polls. The Amazon Mechanical Turk is a completely online tool, and

    we expected its segment to differ even from the ones Adam and McDonald

    found. At the Amazon Mechanical Turk, the respondents actively partake in a system

    where they are remunerated for their contribution. Thus not only are they an active

    and willing part of the research, as opposed to a passive recipient, but there is an

    expectation of receiving a small incentive that mediates the interaction. Further

    differentiating it from a market research firm's panels, the main purpose of the system

    is not research itself; thus we can expect fewer of the professional respondents that

    plague traditional online panels (Gonier, Stafford, 2007).

    Participants in online market research studies can come from a number of sources,

    but generally these can be a firm's own current customers who might have consented

    to participate in research, or respondents mediated by a market research firm

    (Laskey, Wilson, 2003). The market research firm acts as an aggregator, but at the

    same time as an intermediary and thus raises the barrier to entry for conducting

    market research. In their paper titled "A new research medium, new research

    populations and seven deadly sins for Internet researchers", Brace, Nancarrow and

    Pallister (2001) present a diagram created by Charlie Hamlin of Insight Express

    which explains that the arrival of the internet enables market research to be carried

    out for smaller decisions when considered against their importance/risk to the

    company, as well as by smaller firms overall - see Figure 3 below. The addition of a

    ready, publicly available population for market research studies, sitting behind a self-

    service interface, further expands this category by offering the ultimate in inexpensive

    market research. Nevertheless, there is a caveat: as Adam and McDonald (2003)

    showed above, the segments that respond to research via different mediums are

    intrinsically different. We expected respondents at the Amazon Mechanical Turk to be

    a different segment than panelists at online market firms or postal respondents

    altogether.

    Figure 3 - Hamlin model: the research decision model after the Internet

    (Nancarrow and Pallister, 2001)

    The study by Laskey and Wilson (2003) previously cited paints a contrastingly

    bleaker picture of internet market research: they gathered information from 120

    market research firms and concluded that the phenomenal internet research boom

    did not happen as predicted, and that firms currently use internet market research for

    limited types of research where the audience is more likely to exist online or be

    provided by the company wishing to conduct the research. The companies surveyed

    cited concerns over sampling, attrition of panelists and response rates; only 7% of

    surveyed companies expected large growth in their internet-based research. The

    paper's most relevant conclusion to our own research is that care must be taken to

    ensure online research samples are representative of the desired population.

    Understanding just what segment of the population Amazon Mechanical Turk

    respondents represent is at the heart of enabling its use as a market research tool

    and the research we conducted. Laskey and Wilson's 2003 study builds on the initial

    understanding that online respondents are different from postal respondents, confirms the use of

    internet research mainly for smaller and less risky decision-making, but offers a counterpoint to

    Brace, Nancarrow and Pallister (2001). Brace, Nancarrow and Pallister (2001)

    present online research as a new frontier that expands the capabilities of market

    research into more routine decisions through its lower costs, while Laskey and Wilson

    present the argument that internet market research is literally locked into these

    smaller decisions due to operational factors that have not yet been overcome.

    Future of Online Research

    Malhotra and Peterson (2004) took an even more futuristic approach than Brace,

    Nancarrow and Pallister (2001) in reviewing the current trends, and emphasize an

    increase in qualitative research conducted online whether directly through online

    versions of focus groups or by analyzing the actions and writings of users as well as

    competitors online. They further conclude that samples obtained from the internet will

    over time better approximate larger populations of interest as use of the internet

    rises. Malhotra and Peterson are in a sense stating that the problems faced by the internet

    research firms that Laskey and Wilson (2003) surveyed will be alleviated by the influx

    of people onto the internet and not by any actions on the part of market research firms

    themselves. Laskey and Wilson (2003) did not point to solutions to the current

    problems in internet market research, but rather mapped out the issues that are part

    of the territory and stated them as unavoidable realities. We do not expect use of the

    Amazon Mechanical Turk to grow in the same dimensions or with the same speed as

    use of the internet itself, given the more focused appeal the site has; thus the

    sampled demographic is likely to remain more stable over time when compared to

    internet users in general.

    Detaching from survey-based research and attacking the operational issues that

    Laskey and Wilson (2003) considered nearly insurmountable, Agrawal et al. (2004)

    published in the IBM Journal of Research and Development a live action-based

    market research paradigm that alters the behavior of an internet site dynamically,

    using the behavior of users online to provide market research data back into an

    experiment engine - the paper concludes that this system is not yet a reality. Part

    of the problem in turning such a live analysis engine into a reality is the number of

    consumers needed to participate in the site in order to collect relevant data. The

    problem becomes a catch-22, or self-referential problem, when we realize that we

    need good market intelligence in order to create promotions that drive traffic to a site

    in the first place, thus placing such live action analysis outside the reach of entities

    with small and medium pre-existing footprints on the internet. Using such live-action

    market research concepts becomes a reality once a small business can use the

    Amazon Mechanical Turk to funnel considerable traffic affordably while avoiding many

    of the ethical hurdles Agrawal et al. (2004) note.

    The motives of participants in online panels and surveys can be expected to vary

    depending on the type of panel, the research, and any recognition, whether monetary

    or otherwise, as researched by Daugherty et al. (2005). Daugherty conducted

    research on panelists' motivations for participation; results showed that the attitudinal

    factors respondents used were evenly distributed among five identified clusters.

    Critically, though, Daugherty's study was conducted on an established panel owned

    by a university and the study itself notes that this is likely to bias the study. We can

    expect a unique attitudinal landscape shaping participation at the Amazon

    Mechanical Turk, which carries with it the unique demographics we explored.

    A Return to Direct Market Research

    The relevance of direct market research (as opposed to indirect or observation-based

    research) is highlighted when the consumer market becomes cluttered with offerings

    designed to block the collection of market intelligence information by indirect means.

    These offerings take several forms, the most common being privacy filters as part of

    software in a computer designed to hide a person's online tracks, and likely soon a

    proliferation of other information interdictors, such as RFID blockers. Agrawal et al.

    discuss the privacy and ethical issues surrounding indirect collection of information,

    but since they take the viewpoint of the owner of a website and focus on a single site,

    they are concerned more with the ethical implications of collecting this information.

    Consumers, on the other hand, seem to have voted with their computers and have

    layers upon layers of privacy-enhancing software that removes tracking objects such

    as cookies, prevents cross-site actions, blocks referrer information, selectively displays

    graphics, refuses scripted content, etc. Joukhadar (2005) cites a study by Jupiter

    Research that reports a dramatic decline in the accuracy of cookie-based information

    since, according to the group's research, over 58% of users were regularly deleting

    this information in 2004. Joukhadar (2005) writes that cookies are one of the primary

    tools websites use to track market campaigns; without this information, and

    with growing concerns over privacy on the part of consumers, the validity of any information

    gathered by this method is dubious at best. Self-service and do-it-yourself market

    research based on a site's visitors is becoming increasingly inaccurate, but site

    operators still need to understand their users and market. Our research is

    aimed at uncovering an affordable alternate means of obtaining needed information,

    and supplanting the declining quality of indirect methods. Two years after Joukhadar

    (2005), Gonier and Stafford (2007) wrote about just such an alternate method and

    termed it portal-based research.

    Validity of Future Online Research

    Gonier and Stafford (2007) argue that in present-day online studies validity is at risk

    because firms push for cheap and fast solutions, and argue that using captive panels

    or sending unsolicited e-mails to collect responses in order to accelerate results

    leads to sampling sacrifices: "The desirable characteristic that is sacrificed for speed

    and economy is inevitably the quality upon which scientific principles of validity

    reside" (Gonier, Stafford, 2007). Their advice to managers is to avoid decisions

    based on poor sampling, which takes enticingly less time and money. The suggestions

    to researchers include moving to portal-based research where larger populations are

    tapped and artifacts such as professional respondents are minimized. Curiously, the

    Amazon Mechanical Turk acts as just such a portal, albeit not as visited as Google or

    Yahoo. Research based on the system we explored is expected to have lower

    incidence rates of the undesirable artifacts Gonier and Stafford mention.

    The validity of market research conducted using Amazon Mechanical Turk is at the

    core of our research, and there have already been studies indicating that consumers

    respond differently to online surveys vs. telephone surveys (Miller, 2001). Research

    cited by Miller (2001) uses propensity scoring to adjust for differences in

    demographic groups responding online and thus arrive at comparable results; the

    propensity score used represents the probability that a respondent in one survey

    method would be present in another. This research method provides a guideline into how

    actual market research based on the Amazon Mechanical Turk can be adjusted into

    a target market demographic. However, the researchers also mention that some

    groups oppose normalizing online research into other media, arguing that online

    media should be used to predict or measure what it can predict and measure, and

    not be shoehorned into another medium's standards. The critical analysis carried out

    by Miller (2001) will be very useful to future researchers building on top of our

    research.
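
    As a rough illustration of the kind of propensity adjustment Miller (2001) describes, the sketch below estimates, for each demographic cell, the probability that a respondent belongs to the online sample rather than to a reference sample, and converts that probability into a weight. The cell labels, sample counts and function name are hypothetical, and this simplified cell-based approach is an assumption made for illustration, not the method used in the studies Miller cites.

```python
from collections import Counter


def propensity_weights(online_cells, reference_cells):
    """Return a weight per demographic cell for online respondents.

    The propensity p of a cell is the share of its respondents (pooled
    across both samples) who came from the online sample. Weighting online
    respondents by (1 - p) / p makes the weighted online sample resemble
    the reference sample on the cell variable.
    """
    online = Counter(online_cells)
    reference = Counter(reference_cells)
    weights = {}
    for cell, n_online in online.items():
        n_reference = reference.get(cell, 0)
        p = n_online / (n_online + n_reference)
        weights[cell] = (1 - p) / p
    return weights


# Hypothetical samples: one age-band label per respondent.
online_sample = ["18-34"] * 60 + ["35-54"] * 30 + ["55+"] * 10
reference_sample = ["18-34"] * 30 + ["35-54"] * 40 + ["55+"] * 30

weights = propensity_weights(online_sample, reference_sample)
```

    With these made-up counts, over-represented cells receive weights below one and under-represented cells receive weights above one, so the weighted online totals line up with the reference sample.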

    Summary

    Key issues uncovered by our literature review are firstly a significant difference in the

    makeup of respondents between online and traditional research, and a further

    divergence with portal-based online panels, a category into which the Amazon Mechanical Turk

    would fall. Secondly, a backlash of privacy concerns is choking the

    avenues of indirect research for smaller firms, which face either not researching or

    pushing the limits of validity in order to afford direct research. Thirdly, we found a

    need to understand the sampling frame of the sampled population used in direct

    research. These key issues form the context that shaped the goals as well as

    methods of our research.

    Methodology

    Understanding the demographic makeup of Amazon Mechanical Turk workers

    ("Turkers", as they call themselves) in isolation is useful in and of itself; however, doing

    so in a way that makes this information easily comparable to known studies provides

    significantly greater value. This use of secondary data presents a challenge: how

    will we know whether the observed differences between the secondary data source

    and our own research are due to actual differences in the measured phenomena and

    not due to systematic errors introduced by using a different measurement

    instrument? This problem will be present whenever complex information is obtained

    from secondary sources and contrasted to primary research. Our intent is not only to

    gather primary data, but also to contrast it with existing information. Thus, we would

    be best served by a secondary data set that is publicly available, with public data

    capture instruments and known methodologies applied.

    The choice of data set and instruments to compare against was extremely broad,

    with a myriad of entities providing demographic data sets. For our comparison of

    these workers against a target population, we sought to focus on a sizeable but

    circumscribed market that was also likely to have a significant representation within

    the Amazon Mechanical Turk workforce. We chose the US population for this

    benchmark. The US Census Bureau provides a rich data set and instruments to the

    public; much of it can be accessed online at the US Census Bureau

    website (US Census Bureau, 2008a).

    We chose to carry out primary research in the form of quantitative research obtained

    from questionnaires presented to Amazon Mechanical Turk participants, and

    secondary research into the broader US Census data, subsequently comparing and

    contrasting the latter with the information gathered from Amazon Mechanical

    Turk respondents. The results of these comparisons were expected to yield vital

    information about the similarities and differences between the populations and

    specifics about those responding at Amazon's Mechanical Turk.

    Online Survey Platform and Process

    The Amazon Mechanical Turk tasks are all accomplished over the internet. Amazon

    provides a basic web interface to complete tasks (used by the workforce), an XML

    API used to request tasks programmatically for completion, as well as a simple web

    interface for those who wish to submit tasks manually. A full analysis of these

    interfaces is beyond the scope of our research. The Amazon Mechanical Turk

    interface is capable of displaying and collecting data in ways that would suffice for

    very simple surveys, but its facilities are general-purpose and would provide a poor

    platform for our research, with nothing in the way of input validation or skip patterns.

    Thus we had to use an online survey platform external to the Amazon Mechanical

    Turk and devise a way for workers to prove that they had indeed completed the

    survey so they could be rewarded with the monetary incentive attached to the survey.

    One way to do this correlation is to use a platform that provides some form of survey

    ID or validation code after the survey is completed. We asked Turkers to take the

    survey via a hyperlink and, once they had completed the survey, to type into the text

    box at the Amazon Mechanical Turk the validation code the online survey software

    generated. Ultimately, we chose Questionpro (Questionpro, 2008) to host our survey,

    given that their features matched our requirements closely. Questionpro can also

    silently detect and flag duplicate surveys by storing a cookie in the user's machine.

    The system also provides an incremental counter as a survey response ID,

    while the Amazon Mechanical Turk web interface for requesters displays

    submitted results in chronological order; the combination of these two allowed us to

    manually approve dozens of survey responses by simply studying the numerical

    progression as reported by respondents and noting any major discrepancies. We

    should mention that in over a thousand completed surveys there was only one

    instance of a user entering an invalid tracking number. While it may be desirable for

    future researchers to perform similar tracking, our experience indicates there is little

    cause for concern.
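
    The manual check described above can be sketched in a few lines. The sketch assumes that submissions are listed in chronological order and that each carries the response ID typed in by the worker; the function name, the `max_jump` tolerance and the sample values are hypothetical and are not part of Questionpro or the Amazon Mechanical Turk.

```python
def flag_suspect_codes(reported_ids, max_jump=5):
    """Return indexes of submissions whose code breaks the expected progression.

    reported_ids: response IDs as typed by workers, in the chronological
    order shown to the requester. Because the survey platform issues IDs
    from an incremental counter, each ID should be a number close to the
    last accepted one and must not repeat.
    """
    suspects = []
    seen = set()
    last_valid = None
    for index, raw in enumerate(reported_ids):
        text = str(raw).strip()
        if not text.isdecimal():
            suspects.append(index)  # not a number at all
            continue
        value = int(text)
        if value in seen:
            suspects.append(index)  # same code submitted twice
            continue
        if last_valid is not None and abs(value - last_valid) > max_jump:
            suspects.append(index)  # far from the expected progression
            continue
        seen.add(value)
        last_valid = value
    return suspects


# Hypothetical batch: the fourth entry is far outside the progression.
batch = ["1041", "1042", "1044", "9999", "1043", "1045"]
suspects = flag_suspect_codes(batch)
```

    The tolerance allows for workers who finish the survey in one order and enter their codes in another; flagged entries would still be reviewed by hand rather than rejected automatically.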

    The process by which we executed our research is summarized in Figure 4. When

    submitting a survey into the Amazon Mechanical Turk, the desired number of

    responses can be specified; thus submission to the workforce would only take

    place once, and not once per respondent.

    Figure 4 - Survey execution process

    [Flowchart with three lanes: Researcher at Amazon Mechanical Turk, Turker at

    Amazon Mechanical Turk, and Turker at Questionpro. Steps shown: Start: Enter

    Survey into Amazon; Read Instructions; Accept Task; Activate Hyperlink; Complete

    Survey; Receive Code; Enter Code; Verify Code; Confirm Payout; Receive Payment;

    Reject Code; End.]

    Survey Instrument Creation Process

    While we cannot increase the validity of the instrument used to capture our

    secondary data, we can increase the validity of our comparison and approximate the

    validity of our secondary data source by modeling our own survey instrument as

    closely as possible to the original instrument. Leveraging a known instrument

    increases the face validity of our comparisons. Our initial intent was to have a nearly

    identical instrument to that used in the US Census of 2000; however, there were a

    number of significant differences between the survey we wanted to conduct and the

    US Census. Our survey aimed at obtaining results that could be compared to the

    results of the census, thus we needed to look at the results along with the instrument

    if we were going to create an instrument that, while having a different scope, would

    deliver comparable results and remain as close as possible to the original. The

    interplay of these six elements is illustrated in Figure 5.

    Figure 5 - Interplay of elements shaping the creation of our survey instrument

    [Diagram relating six elements: US Census Form, US Census Results, Research

    Scope, Comparison, New Data Set, and New Survey Instrument.]

    The US Census actually uses 18 different data collection forms (US Census Bureau,

    2008b). These 18 forms can be divided into two major groups: "long form" and

    "short form" questionnaires. The short form is administered to 100% of the population

    while the long form uses sampling to obtain representative data. The next major

    division of forms is between standard forms and individual forms: individual forms

    gather information only from the respondent, while standard forms gather information from

    up to six individuals, including their relationship to the respondent filling out the

    questionnaire. If there are more than six individuals for a location, the census may

    use phone interviews to retrieve information from them. The forms also break down

    into particular geographical areas with minor modifications tailored to specific

    requirements, including persons in military service and non-continental US residents.

    While there are some intriguing differences in the forms for specific territories, the

    questions were not applicable to our survey, and thus only of anecdotal value.

    In order to arrive at the basis of our survey instrument we analyzed our needs on

    several levels. The major influence on our instrument design was that we needed

    more than basic demographics, so we would use the long form as a basis.

    Requesting demographic information about all the members of the household a

    Turker lives in had to be considered carefully. We wanted to understand the Turker

    and his or her environment, thus asking about the employment status, race, age and

    other information about other members of the household was deemed relevant only if

    it had a direct impact on the demographic of the Turker and this impact was actually

    measured by the US Census - thus making the information relevant to our

    comparative analysis. However, the questions used to capture this information would

    have to be different from the long form itself, as we would only need highly relevant

    data points and not the entire data set.

    Stage One - Digital Replica

    At this stage, we created a web-based digital replica of the US Census long form,

    thus assuring we were starting our instrument design with the greatest possible

    fidelity to the original US Census instrument. On a second forward pass at the

    creation of the questionnaire, we analyzed the questions that would need to be

    added or reworded and the skip pattern alterations that would be required in order to

    reconcile the international nature of the target population with the US origin of the

    questionnaire. This also necessitated the addition of external information into our

    questionnaire in the form of a list of countries, for which we used the ISO 3166 list of

    246 official elements (ISO, 2008).

    The second major influence into our instrument design was a careful evaluation of

    the summarized results from the US Census, which we would later use for

    comparisons. The US Census Bureau provides a wealth of data as well as statistics

    on the data collected and its methods. The statistical data we chose for our

    comparison were the main metrics from the "US Summary: 2000" report (US Census

    Bureau, 2002) and its tables "DP-1. Profile of General Demographic

    Characteristics: 2000", "DP-2. Profile of Selected Social Characteristics: 2000",

    "DP-3. Profile of Selected Economic Characteristics: 2000", and "DP-4. Profile of

    Selected Housing Characteristics: 2000".

    Stage Two - Mapping and Scaling to Desired Output

    As a second stage in our instrument design we used the above-mentioned US

    Summary: 2000 report (US Census Bureau, 2002) to conduct a bottom-up pass at

    the questionnaire revision in an iterative fashion to ensure we would be arriving at a

    comparable data set. First, we mapped each metric from the US Census to the

    survey elements designed to gather this information and further explored whether the

    scales in our survey would yield a data set useful during comparative analysis.

    Proceeding in this manner we were able to identify the components of our survey that

    would ultimately be used, and obtain a listing of data that the US Census profiles

    provide that had no correspondence in our instrument. The most notable scale we

    had to adjust was the income scale. While the US Census carries force of law and

    was incorporated into the US Constitution in 1787, our survey is not law, and survey

    respondents are typically not open to revealing their exact income (Eisenhauer,

    2001). Thus we altered the open-ended question of exact income into a scaled

    response with income ranges defined by the groupings used in the resulting report

    from the US Census. In this same area, the US Census long form asks about the

    specific amount of income from different sources such as Social Security, Retirement

    and other forms of assistance. However, the income amounts are not revealed in the

    Census results, but rather the number of people using different types of assistance.

    Thus, our survey does not ask for specific amounts of diverse income, but asks

    whether the person was using these sources of income.
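
    The conversion from an exact income to a range can be sketched as follows. The bracket boundaries below are illustrative assumptions modeled on grouped income reporting; the actual groupings should be taken from the Census report being compared against. All names in the sketch are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of each income range, in US dollars (illustrative assumption).
BRACKET_FLOORS = [0, 10_000, 15_000, 25_000, 35_000, 50_000,
                  75_000, 100_000, 150_000, 200_000]


def bracket_label(index):
    """Build a human-readable label for the range at the given position."""
    if index == 0:
        return f"Less than ${BRACKET_FLOORS[1]:,}"
    floor = BRACKET_FLOORS[index]
    if index == len(BRACKET_FLOORS) - 1:
        return f"${floor:,} or more"
    ceiling = BRACKET_FLOORS[index + 1] - 1
    return f"${floor:,} to ${ceiling:,}"


def income_to_bracket(income):
    """Map an exact, non-negative income to the label of its range."""
    if income < 0:
        raise ValueError("income must be non-negative")
    index = bisect_right(BRACKET_FLOORS, income) - 1
    return bracket_label(index)


# The survey question would offer these labels as its ordinal choices.
choices = [bracket_label(i) for i in range(len(BRACKET_FLOORS))]
```

    Offering the labels directly as answer choices means no conversion is needed at analysis time; the mapping function is only useful when exact figures from another source must be placed on the same scale.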

    Stage Three - Bottom-up Analysis

    At this stage we eliminated from the process those metrics where the US Census

    was using information we could not reasonably gather, such as number of

    unoccupied houses, number of housing units in a structure, information not collected

    on the US mainland individual long form, such as availability of indoor plumbing,

    kitchens, house heating and telephone service, and metrics which were deemed

    exceedingly burdensome for respondents, such as calculating the annual

    expenditures on water in the household. We also ascertained the difficulty of

    computing the statistics needed to carry out the comparison from the data set that

    our survey would generate, to ensure that our results would be directly comparable

    and thus avoid significant transformations. Part of the work done to ensure matching

    scales were used assisted in simplifying this analysis; however, the bulk of this

    analysis entails conducting the following key steps for every metric on the US

    Summary: 2000 report (US Census Bureau, 2002):

    • Reverse-engineering the information presented into its basic required

      components

    • Scanning the questionnaire to ensure the relevant information is being

      gathered in such a way the resulting data can be filtered, sorted or calculated

        o Noting missing data elements

        o Noting elements that are collected but rendered useless without the

          proper skip-pattern. Documenting the needed skip pattern.

    After understanding the gaps between the current instrument and the instrument that

    would be needed to arrive at a comparable data set, we modified the survey

    instrument to include any missing data elements, using vocabulary as close as possible to that of the

    original instrument. We also refined the skip pattern and added a number of simple

    dichotomous questions that would allow us to make the necessary connections

    between disparate pieces of information and to solve the problem described earlier of

    lacking information that the US Census infers from the relationships uncovered by

    requesting detailed demographic information from all members of a household. To

    illustrate overcoming this problem with an example, the US Census forms do not ask

    whether a person lives with children, but rather calculate this information from the

    responses given when filling out the sections on other people in the household, when

    one or more are marked as children of the respondent. As noted earlier we did not

    want to survey the entire household of a Turker, but we still wanted to obtain the

    same information when relevant; in that particular case we added a direct question

    to find out if the person lives with their children.
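
    A skip pattern of the kind described here can be sketched as a list of questions, each carrying an optional condition on earlier answers. The question identifiers, the answer values and the specific conditions are hypothetical and chosen only for illustration; they are not taken from the survey instrument itself.

```python
# Each question lists a condition on earlier answers; None means "always ask".
QUESTIONS = [
    {"id": "country", "ask_if": None},
    # Direct question replacing information the Census infers from
    # household relationships.
    {"id": "lives_with_own_children", "ask_if": None},
    {"id": "us_military_service",
     "ask_if": lambda answers: answers.get("country") == "US"},
    {"id": "us_veteran_status",
     "ask_if": lambda answers: answers.get("us_military_service") == "yes"},
]


def questions_to_ask(answers):
    """Return the IDs of questions a respondent should see, in order.

    answers: mapping of question ID to the respondent's answer so far.
    """
    shown = []
    for question in QUESTIONS:
        condition = question["ask_if"]
        if condition is None or condition(answers):
            shown.append(question["id"])
    return shown


us_path = questions_to_ask({"country": "US", "us_military_service": "yes"})
non_us_path = questions_to_ask({"country": "AR"})
```

    Keeping the conditions next to the questions makes it easy to document which data elements depend on which earlier answers.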

    Stage Four - Minimization

    As a final step in our survey instrument design, we again mapped the questions and

    skip pattern properties that would yield the desired data, noting and eliminating any

    redundancies or elements that no longer served a purpose.
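
    One way to picture this final pass is as a comparison between the list of survey questions and a mapping from each comparison metric to the questions that feed it. The sketch below is a simplified assumption about how such a check could be automated; the question and metric names are hypothetical.

```python
def unused_questions(all_questions, metric_to_questions):
    """Return questions that feed no comparison metric, in original order."""
    needed = set()
    for questions in metric_to_questions.values():
        needed.update(questions)
    return [q for q in all_questions if q not in needed]


def unmapped_metrics(metric_to_questions):
    """Return metrics that no survey question can supply."""
    return sorted(m for m, qs in metric_to_questions.items() if not qs)


# Hypothetical instrument and mapping.
survey = ["age", "sex", "country", "income_range", "water_expense"]
mapping = {
    "median_age": ["age"],
    "sex_ratio": ["sex"],
    "household_income_distribution": ["income_range", "country"],
    "vacant_housing_units": [],
}

to_drop = unused_questions(survey, mapping)
to_exclude = unmapped_metrics(mapping)
```

    Questions in `to_drop` are candidates for elimination, while metrics in `to_exclude` correspond to Census figures that the instrument cannot reproduce and that would be left out of the comparison.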

    Table 1 summarizes the more significant challenges faced while executing the above

    process to design our survey instrument. The final survey instrument is reproduced in

    full in Appendix A.

    US Census | Our research | How we handled

    Took place in 2000 | Takes place in 2008, eight years later | Adjusted dates in questions

    Paper-based | Internet based | Created digital replica

    Requests personally identifiable information | Is anonymous; Amazon policies require anonymity | Eliminated personally identifiable questions, but tracked duplicates with non-personally identifiable methods

    Assumes the respondent is located within the territory of the United States of America | Respondents are dispersed worldwide | Adjusted questions, added questions on country and adjusted questionnaire flow

    Derives meaningful relationship data by associating the responses of multiple respondents living in the same house, building or area | Is only concerned with Turkers and needs to minimize survey elements employed | Added targeted elements and adjusted skip pattern to obtain information directly

    Asks information about US military service and US veteran status | Non-US respondents must be skipped to make statistical comparisons valid in US military related questions | Adjusted skip pattern based on country of residence

    Asks very detailed income questions to the dollar | Cannot expect a high response rate to detailed questions about income | Altered income response scale from metric to ordinal

    Requests detailed annualized expenses of the household | Need to avoid burdensome calculations about expenses | Eliminated expense categories

    Uses open-ended responses for questions about employment industries | Needs to compare results to US Census results and simplify coding | Altered question into close-ended with nominal scale based on US Census reports

    Table 1 - Online questionnaire design challenges

    Survey Execution

    Preparation

    Taking our survey into the Amazon Mechanical Turk system involved a few

    preliminary steps that are relevant to the research itself, as improper handling of

    these tasks can lead to poor response rates or increased response bias.

    First, we had to define the incentive amount for each completed survey. With little in

    the way of guidelines, we decided to start at the lower range of incentives and move

    upwards, observing what effect this had on response rates over time, in order to

    optimize our resources.

    We also had to design instructions for workers on how to complete the survey. To

    remain ethically grounded (Berry, 2004), our instructions had to accurately portray the

    purpose of our research and the data gathered by the survey. We also deemed it essential

    to provide an estimate of the time needed to complete the survey, so as not to mislead

    participants and to allow them to gauge the task and incentives. These instructions

    needed to guide the workers in submitting the return code for validation after the

    survey was completed. The complete text of these instructions is reproduced in

    appendix B.

    To ensure smooth interactions with the community that has formed behind the

    Amazon Mechanical Turk, we opened an account and posted our intentions to the

    online forum Turker Nation (Turker Nation, 2008). This allowed us to present our

    research to the community in an open forum where questions or concerns could be

    addressed publicly.

    After a few hours of the survey being active, we revised our instructions to include the

    average time that workers were actually taking to complete the survey, as provided

    by our online survey management platform (QuestionPro).

    Sample Size Determination

    We conducted our research using simple random sampling. Our analysis aimed to

    produce a confidence interval of +/-5%; however, the sample size needed to

    achieve this confidence interval varies for each of the 50+ questions, depending on the

    type of data and analysis sought, because of the skip pattern and the complex constructs

    measured. Given the spread of research questions tackled, our main

    constraints were time and resources, with confidence levels computed after the fact

    for each statistic.
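
    As a rough illustration of how sample size relates to the +/-5% interval mentioned above, the following sketch computes the margin of error of a proportion under simple random sampling. It assumes a 95% confidence level (z = 1.96) and the most conservative proportion p = 0.5; neither value is stated in this section, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # assumed 95% confidence level; not stated in the text


def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the confidence interval for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, z=Z_95):
    """Smallest n whose margin of error does not exceed `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Respondents needed for +/-5% on a question that every respondent answers.
print(required_sample_size(0.05))

# Margin of error when only 100 respondents reach a question through the skip pattern.
print(round(margin_of_error(100), 3))
```

    Because the skip pattern changes how many respondents reach each question, the same calculation gives a different margin for each statistic, which is consistent with computing the intervals after the fact.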

    Participants and Sites

    The participants of our study were people completing tasks at the Amazon

    Mechanical Turk over a period of 60 days. They were presented with our survey and

    instructions as one more compensated task they could decide to undertake.

    Participation was strictly voluntary. Incentives were paid out within 24 hours.

    Role of the Researcher

    Our posture as researchers was neutral, objective and exploratory towards the

    research subjects. A very limited number of participants contacted us by e-mail to provide

    feedback beyond that collected as part of the survey. The feedback

    collected as part of the survey (125 instances) consisted primarily of thank-you notes

    and encouragement. We also decided to make the results of the research available

    to participants who wished to receive a copy of the research once approved; over

    233 respondents provided their e-mail addresses expressing interest in the results of

    our research.

    Data Gathering

    The sample was obtained by our survey instrument posted as an Amazon

    Mechanical Turk task. The survey ran continuously between March 20, 2008 and

    May 19, 2008, for a total of 60 days. In the 60 days that the survey was active, it

    collected 1292 complete responses. Incentives for survey completion ranged from

    US$0.02 to US$0.25. Participants were not allowed to complete more

    than a single survey. The origin and purpose of the survey were explained before the

    survey started, and a commitment was made to report research results only in

    aggregate form, thus maintaining confidentiality; furthermore, no personally

    identifiable information was requested. Our data gathering system collected

    completed questionnaires and retained information gathered from partially completed

    ones; thus our complete data set consists of 1428 cases. Once the data had been

    collected at our survey website, it was transferred to SPSS 16 for analysis as

    described below.

    Data Analysis

    The type of information we collected dictated the main data analysis methods

    chosen. Descriptive statistics were used to explore the data gathered by our survey.

    Comparisons against US Census data were carried out using two main statistical

    methods. Single-sample t-tests were used to compute the significance of differences in

    means when the standard deviation of the population we were comparing against was

    not known; an example of this is the standard deviation of the ages of the US

    population. The other main statistical analysis we used was the Chi-square test for

    goodness-of-fit. We employed this test to determine the statistical significance of

    differences between our survey results and the results of the US Census. The Chi-

    square test for goodness of fit is most useful when applied to comparisons of

    categorical information with multiple categories. However, it was still useful and thus

    computed for relevant cases of dichotomous categorical information (such as sex or

    employment status), testing for fit with the percentages reported by the US Census.

    We should stress that the Chi-square tests for goodness-of-fit we conducted only

    allow us to express how free of sampling error a difference encountered is likely to

    be. It says nothing about how relevant that difference might be for any given decision.
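
    To make the goodness-of-fit calculation concrete, the sketch below applies it to the dichotomous sex variable, using the counts reported later in Table 2 (806 female, 543 male) and the US Census shares quoted with that table (50.9% female, 49.1% male). The analysis in this study was run in SPSS; this standard-library version is only an illustration, its function names are hypothetical, and the p value formula used is valid for one degree of freedom only.

```python
import math


def chi_square_gof(observed, expected_props):
    """Chi-square statistic for observed counts against expected proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail p value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [806, 543]          # female, male respondents (Table 2)
census_props = [0.509, 0.491]  # US Census female, male shares

chi2 = chi_square_gof(observed, census_props)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.2g}, significant at 0.05: {p < 0.05}")
```

    A small p value here only indicates that the difference is unlikely to be due to sampling error; as noted above, it does not measure how much the difference matters for a given decision.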

    We should also note that non-sampling errors might influence both the differences we

    encountered and their statistical significance. These errors include non-response

    errors where, for example, respondents that do not answer a particular question are

    significantly different from those that do, and systematic errors where the wording of

    our survey influences the likelihood of recording a particular response to a question.

    The computer tools used were SPSS version 16, for statistical analysis and graphs,

    and Microsoft Excel 2003 for graphing and table layouts.

    The statistical tests for which we could compute a p value were deemed to present

    statistical significance when the p value was below our threshold of 0.05.

    Trustworthiness of the Method

    External Validity

    The main threat to external validity our study faced came from a potential self-

    selection bias. Amazon Mechanical Turk workers had to find our survey among

    hundreds of other tasks, and they were given the option of accepting or rejecting our

    survey based on our description of the task as well as the incentive amount; these

    factors could have acted as pre-screening filters over which we had marginal control.

    We attempted to overcome the potential for the incentive amount to be a threat to

    external validity by altering the incentive amounts between $0.02 and $0.25 per

    survey completed; we did not find a significant impact from altering the incentive

    amounts beyond the $0.12 mark. We indirectly attempted to overcome the filter

    of our survey being buried under thousands of other available tasks by re-posting our

    survey with every change of survey incentive. This did have a significant impact on

    the number of surveys received, as during the initial 12-24 hours after a re-posting

    the survey would be completed significantly more often than before - regardless of

    whether the new incentive level was higher or lower. However, analysis of the data

    does not immediately reveal a significant difference between those answering the

    survey shortly after it was re-posted and those doing so after the first 24 hours. The

    remaining potential pre-screening factor affecting our research validity was the

    content of the survey itself. We do not know whether workers who read our

    instructions and decided to exclude themselves were significantly different from our

    sample. We do however know the dropout rate from our survey system. Our survey

    landing page received 1724 visits, 1423 users (83% of 1724) initiated the survey, and

    1292 (91% of 1423) users completed the survey by arriving at the final question.

    These numbers reflect 131 drop-outs (less than 10%). The average time users took

    to complete the survey was 6 minutes.
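
    The dropout figures quoted above follow from simple arithmetic on the three counts reported by the survey system. The sketch below reproduces them; the variable names are illustrative only.

```python
visits = 1724     # landing page visits
started = 1423    # users who initiated the survey
completed = 1292  # users who reached the final question

start_rate = started / visits          # share of visitors who began the survey
completion_rate = completed / started  # share of starters who finished
dropouts = started - completed         # starters who did not finish
dropout_rate = dropouts / started

print(f"started:   {start_rate:.0%} of visits")
print(f"completed: {completion_rate:.0%} of those who started")
print(f"dropouts:  {dropouts} ({dropout_rate:.1%} of those who started)")
```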

    We expected a certain level of respondent bias where a hypothetical respondent

    would have second-guessed their answers and told us what they believed we might

    be looking for. Since we stated up-front that we would be comparing the results to the

    US Census, there could have been cases where the respondent indicated they were

    answering from the US when they were not. Our online survey platform was able to

    identify the country of origin of respondents by their internet address. This method of

    double-checking is not foolproof given the complexity of the internet and does not

    account for respondents traveling; however, we found that fewer than 2% of respondents

    selected a country of residence that did not match the country their internet

    communication was originating from.

    The dynamic nature of the Amazon Mechanical Turk workforce is another factor that

    affected the external validity of our study. Worker turnover rates are unknown and

    factors such as Amazon adding or removing the ability for users of different countries

    to use the system would affect the overall worker population in manners that could

    not be accounted for by our study.

    Face, Content and Construct Validity

    In our applied context, assessing face, content and construct validity helps us

    understand to what degree the pattern of thought and ideas that we as researchers

    have, has been reliably mapped into our instrument, how it springs from our

    instrument into the people who participated in our study, and finally how it becomes a

    mental representation in the readers of this research. Any discrepancy in this three-

    way communication would undermine the validity of our research. Figure 6 illustrates

    the relationship between the entities.

    Figure 6 - Relationships critical to face, content and construct validity

    [Diagram: relationships between Researcher, Census Instrument, Survey Instrument, Survey Participants, Research results and External Reader]

    We attempted to maximize face and content validity by minimizing the subjective

    viewpoint of the researcher role; we accomplished this by closely modeling our

    instrument after a well-known instrument believed to have strong face validity, in our

    case the US Census forms of the year 2000.

    The constructs our survey uses are relatively simple when compared to other types

    of research. As an example of a more complex construct used, the word

    "institutionalized", when we asked participants to tell us if they were institutionalized

    or not, required us to expand on the wording and include the US Census definition of

    the institutionalized population. However, our survey was strictly demographic and

    thus did not include highly subjective constructs such as self-esteem or

    trustworthiness that would have necessitated a significantly different approach to

    construct validity.

    Internal Validity

    The exploratory nature of our research did not carry with it the burden of proving the

    validity of causal relationships. Our study instead explores the demographics of a

    population sample, which places focus on external validity.

    Reliability

    The reliability of our survey instrument rests primarily with a design derived from a

    known instrument with significant levels of reliability. The US Census constitutes a

    longitudinal study, whereas our research is a cross-sectional observation using a

    similar instrument. This implies that test-retest reliability of our source instrument is

    extremely difficult to ascertain with samples being conducted only every ten years;

    this factor does bring into question the reliability of our own instrument. Due to the

    nature and construction of our research we were unable to carry out test-retest,

    equivalent forms or split-half techniques; thus the reliability of our measurements was

    not directly quantified. However, the reliability of our source instrument was deemed

    appropriate for our particular task.

    Results and Analysis of Data

    Executing our 57-question survey continuously over a period of 60 days allowed us to

    gather a rich data set. We divided the analysis in logical groupings based on

    relatedness of the data analyzed as well as the categories explored by the US

    Census against which we were comparing. While there are literally countless ways in

    which the data may be analyzed and segmented, we attempted to strike a balance

    with a report that is concise even in the face of the exhaustiveness of our survey,

    thorough in the most relevant metrics, and retains the confidentiality of respondents.

    Care has been taken to avoid calculating statistics where the skip pattern and the

    incidence rate produced a sample of fewer than 100 cases reaching the particular

    question, and to halt the progression of analysis where segmentation would have

    been performed on a sample with fewer than 100 cases. The following sections

    analyze the information from our survey and, where relevant and possible,

    contrast it with the US Census demographic profile of 2000 (US Census Bureau,

    2002).

    1. Sex

    Table 2 summarizes the findings regarding the gender of Amazon Mechanical Turk

    participants.

    Sex

                     | Frequency | Percent | Valid Percent | Cumulative Percent
    Valid: Female    |       806 |    56.4 |          59.7 |               59.7
    Valid: Male      |       543 |      38 |          40.3 |                100
    Valid: Total     |      1349 |    94.5 |           100 |
    Missing          |        79 |     5.5 |               |
    Total            |      1428 |     100 |               |

    Table 2 - Gender of survey respondents

    Survey respondents were 59.7% female. In contrast, US Census population is 50.9%

    Female and 49.1% Male. Figure 7 illustrates the difference graphically.

    [Bar chart: Sex. Percent (0 to 100) of Male and Female, for Census and Turkers]

    Figure 7 - US Census population contrasted to sampled population

    Conducting a Chi-square goodness-of-fit analysis on gender using the US Census

    percentages reveals that the difference in gender between the population sampled by

    the US Census and that of our Amazon Mechanical Turk sample is statistically

    significantly (p

    Descriptives for Age

    Age                                            | Statistic | Std. Error
    Mean                                           |     33.61 |      0.288
    95% Confidence Interval for Mean, Lower Bound  |     33.04 |
    95% Confidence Interval for Mean, Upper Bound  |     34.17 |
    5% Trimmed Mean                                |     32.96 |
    Median                                         |        31 |
    Variance                                       |   111.118 |
    Std. Deviation                                 |    10.541 |
    Minimum                                        |        12 |
    Maximum                                        |        72 |
    Range                                          |        60 |
    Interquartile Range                            |        13 |
    Skewness                                       |      0.94 |      0.067
    Kurtosis                                       |     0.382 |      0.133

    Table 4 - Age distribution analysis
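
    The figures in Table 4 are linked by standard formulas, which the sketch below uses as a consistency check. It takes only the mean, standard deviation and standard error from the table. The number of valid age responses is not reported in this section, so it is estimated here from the standard error, and the interval uses the normal approximation (z = 1.96) rather than the exact t distribution; both are assumptions of this illustration.

```python
mean = 33.61      # mean age (Table 4)
std_dev = 10.541  # standard deviation (Table 4)
std_error = 0.288 # standard error of the mean (Table 4)

# Std. error = std_dev / sqrt(n), so n can be estimated from the two values.
# The result is approximate because the table values are rounded.
n_estimate = (std_dev / std_error) ** 2

# 95% confidence interval for the mean (normal approximation, assumed).
z = 1.96
lower = mean - z * std_error
upper = mean + z * std_error

# The variance is the square of the standard deviation.
variance = std_dev ** 2

print(f"estimated n = {n_estimate:.0f}")
print(f"95% CI for the mean = {lower:.2f} to {upper:.2f}")
print(f"variance = {variance:.3f}")
```

    The results agree with the table to within rounding of the published values.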

    Figure 8 shows the distribution of ages as percentages of total count in one-year

    increments using gender as bar stacking, with a normal curve overlay. This allows us

    to represent graphically the dispersion of ages in the sample. To examine graphically

    the relationship between male and female respondents by age, Figure 9 shows a

    population pyramid using our sample; in this figure, we can more readily see that

    male respondents tended to be younger than female respondents.

    Figure 8 - Age distribution stacked on gender with normal curve overlay

    Figure 9 - Population pyramid with normal curve overlay

    In order to compare the sample population to the US Census population we retrieved

    data with finer binning than that provided by the Profile of General Demographic

    Characteristics (US Census Bureau, 2002). The information in the aforementioned

    report has irregular binning and presents only 13 ranges. The information we

    retrieved from the US Census Bureau International Database (IDB, 2008) is grouped

    into 18 ranges of 5 years each plus a 90+ range; furthermore it uses regular interval

    binning, thus allowing us to compare the information as an interval variable (as

    opposed to strictly nominal). The information retrieved from the US Census Bureau

    International Database can be found in appendix C. Figure 10 shows percentage of

    US Census respondents that fall into the 19 age ranges defined by the Census report,

    and visually compares these to the percentages calculated from our sample using the

    same binning.

    [Chart: Age Distribution Comparison. Percent (0% to 30%) by Age in five-year ranges from 0-4 to 90+; series: Percentage Census, Percentage Turkers]

    Figure 10 - Age distribution comparison

    The differences in the age groups are striking, and while we cannot compute a

    goodness-of-fit Chi square analysis using empty categories, we decided to conduct a

    Chi square test for goodness of fit on the range of ages where we do have

    representations in both our sample population and the US Census. These are the

    age ranges from 10-14 to 70-74 years of age. The difference between the

    groups was found to be statistically significant (p

    incidence rates. Also notable is that Amazon's Conditions of Use document (Amazon,

    2008) requires participants to certify they are over the age of 18.

    3. Country and State

    Participants in the Amazon Mechanical Turk come to the system from multiple

    countries, thus we asked respondents the name of the country where they spent

    most of their time. Table 5 summarizes the top ten countries by number of

    respondents. The United States dominates this ranking with 78.2% of responses,

    followed by India with 7.9% of respondents. This implies that market

    researchers leveraging the system have access to a mostly US-based population.

    Should researchers wish to focus on non-US markets, the dismal incidence rate of

    respondents from any other country (except perhaps India) renders this platform

    unviable.

    Country         | Frequency | Percent | Valid Percent | Cumulative Percent
    UNITED STATES   |      1055 |    73.9 |          78.2 |               78.2
    INDIA           |       107 |     7.5 |           7.9 |               86.1
    CANADA          |        28 |       2 |           2.1 |               88.2
    UNITED KINGDOM  |        24 |     1.7 |           1.8 |                 90
    PHILIPPINES     |        19 |     1.3 |           1.4 |               91.4
    ITALY           |        11 |     0.8 |           0.8 |               92.2
    GERMANY         |         8 |     0.6 |           0.6 |               92.8
    ARGENTINA       |         6 |     0.4 |           0.4 |               93.3
    AUSTRALIA       |         6 |     0.4 |           0.4 |               93.7
    POLAND          |         5 |     0.4 |           0.4 |               94.1
    Other           |        80 |     5.5 |             6 |                100
    Total (valid)   |      1349 |    94.5 |           100 |
    Missing         |        79 |     5.5 |               |
    Total           |      1428 |     100 |               |

    Table 5 - Top countries of survey respondents

    Survey respondents were asked to provide their state of residence only if they

    indicated they were living in the US in a prior question. Table 6 summarizes the top

    ten states selected ranked by frequency.

    State           | Frequency | Percent | Valid Percent | Cumulative Percent
    California      |        84 |    5.88 |          8.55 |               8.55
    Pennsylvania    |        62 |    4.34 |          6.31 |              14.87
    Texas           |        60 |    4.20 |          6.11 |              20.98
    Florida         |        57 |    3.99 |          5.80 |              26.78
    New York        |        47 |    3.29 |          4.79 |              31.57
    Massachusetts   |        39 |    2.73 |          3.97 |              35.54
    Virginia        |        39 |    2.73 |          3.97 |              39.51
    Illinois        |        38 |    2.66 |          3.87 |              43.38
    Ohio            |        38 |    2.66 |          3.87 |              47.25
    New Jersey      |        36 |    2.52 |          3.67 |              50.92
    Other states    |       482 |   33.75 |         49.08 |             100.00
    Total (valid)   |       982 |   68.77 |           100 |
    Missing         |       446 |   31.23 |               |
    Total           |      1428 |     100 |               |

    Table 6 - Top states of survey respondents

    4. Race

    Our survey requested self-reporting of race for respondents in the same categories

    as the US Census of 2000. We also included in our survey a section about the Latino

    population. Table 7 presents the distribution of races from our survey sample.

    Race Frequencies

    Race                              | Responses: N | Responses: Percent | Percent of Cases
    American Indian or Alaska Native  |           29 |              2.10% |            2.20%
    Asian Indian                      |          118 |              8.60% |            9.10%
    Black, African Am., or Negro      |           52 |              3.80% |            4.00%
    Chinese                           |           38 |              2.80% |            2.90%
    Filipino                          |           31 |              2.30% |            2.40%
    Guamanian or Chamorro             |            2 |              0.10% |            0.20%
    Japanese                          |           11 |              0.80% |            0.80%
    Korean                            |           10 |              0.70% |            0.80%
    Native Hawaiian                   |            6 |              0.40% |            0.50%
    Other Pacific Islander            |            9 |              0.70% |            0.70%
    Samoan                            |            3 |              0.20% |            0.20%
    Vietnamese                        |            4 |              0.30% |            0.30%
    White                             |         1058 |             77.20% |           81.60%
    Total                             |         1371 |            100.00% |          105.80%

    Table 7 - Race percentages for Amazon Mechanical Turk respondents

    The Amazon Mechanical Turk sample, as well as US Census results, were

    dominated by the White category (US Census Bureau, 2002). The greatest

    difference between these two lies in the African-American and Asian Indian

    categories, which are almost reversed. This reversal is likely due to the countries of

    origin of respondents: as shown in Table 5, 78.2% were from the US and 7.9% from

    India. Figure 11 shows the comparison of races between the survey sample and US

    Census. The difference using Chi square for goodness of fit was found to be

    statistically significant (p

    [Bar chart: Race Percentages. Percent (0.00% to 100.00%) for White; Asian Indian; Black, African Am., or Negro; Chinese; Filipino; American Indian or Alaska Native; Japanese; Korean; Other Pacific Islander; Native Hawaiian; Vietnamese; Samoan; Guamanian or Chamorro. Series: Percent 'Turkers', Percent US Census]

    Figure 11 - Race comparison

    The US Census found 12.5% of the population to be Hispanic/Latino of various origins;

    our survey, in contrast, found only 5.4% Latinos (all origins combined). Table 8