Think big about data: Archaeology and the Big Data challenge

Think big about data. Archaeology and the Big Data challenge. Gabriele Gattiglia, University of Pisa – MAPPA Lab

Upload: ariadnenetwork

Post on 26-Jan-2015


DESCRIPTION

Presentation by Gabriele Gattiglia, University of Pisa – MAPPA Lab. EAA 2014 session: Open Access and Open Data in Archaeology. Istanbul, Turkey, 13 September 2013.

TRANSCRIPT

Page 1: Think Big about Data: Archaeology and the Big Data Challenge

Think big about data. Archaeology and the Big Data challenge. Gabriele Gattiglia, University of Pisa – MAPPA Lab

Page 2: Think Big about Data: Archaeology and the Big Data Challenge

MAPPA Lab

MAPPA Lab is a research lab of the University of Pisa in which archaeologists, mathematicians and geologists work on:

• mathematical models for archaeology (www.mappaproject.org)
• open data (www.mappaproject.org/mod)

They are now starting to explore the risks and potential of a Big Data approach to archaeology.

Page 3: Think Big about Data: Archaeology and the Big Data Challenge

Index

In this presentation we will illustrate:

1. What is generally meant by Big Data
2. What kinds of risk are connected with a Big Data approach
3. Whether a Big Data approach can be applied to archaeology
4. The BAD project

Page 4: Think Big about Data: Archaeology and the Big Data Challenge

Introduction

• Data are what economists call a “non-rivalrous” good: data can be processed again and again and their value does not diminish.

• The value of data results from what they reveal in the aggregate, i.e. we can do innovative things by commingling data in new ways.

• If the data are open, their value increases. Unfortunately, archaeological data are sometimes kept closed in what we could call “data tombs”.

We are not simply talking about Big Data: we are dreaming about Big Open Data.

Page 5: Think Big about Data: Archaeology and the Big Data Challenge

Big Data paradigm

Big Data are usually defined as high-volume, high-velocity and/or high-variety data. Indeed:

• Big Data let us learn things that we could not comprehend using smaller amounts of data, thanks to more powerful software, hardware and algorithms;
• Big Data are about prediction, i.e. about applying mathematics to huge quantities of data in order to infer probabilities;
• Big Data are about seeing and understanding the relations within and among pieces of information;
• Big Data mean having the full (or close to the full) dataset, which gives much more freedom to explore, to look at the data from different angles, or to look more closely at certain aspects of them.
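As a minimal sketch of what "applying mathematics to data in order to infer probabilities" can look like, the Python example below estimates a conditional probability as a simple relative frequency. The terrain categories, the counts and the function name `conditional_probability` are all invented for illustration; they are not taken from the presentation or from any MAPPA dataset.

```python
from collections import Counter

def conditional_probability(observations, given):
    """Estimate P(site present | terrain == given) as a relative frequency."""
    counts = Counter(present for terrain, present in observations if terrain == given)
    total = counts[True] + counts[False]
    if total == 0:
        raise ValueError(f"no observations for terrain {given!r}")
    return counts[True] / total

# Hypothetical survey cells (terrain, site present?), invented for illustration.
observations = (
    [("river terrace", True)] * 30 + [("river terrace", False)] * 70
    + [("marsh", True)] * 5 + [("marsh", False)] * 95
)

for terrain in ("river terrace", "marsh"):
    print(terrain, conditional_probability(observations, terrain))
```

With more observations per category the estimate becomes more stable, which is one reason the size of the dataset matters for prediction.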

Page 6: Think Big about Data: Archaeology and the Big Data Challenge

N=all

The concept of sampling no longer makes much sense when we can harness a huge amount of data.

Using all the available data makes it possible to spot connections that are otherwise cloaked in the vastness of information.

Big is not meant in absolute terms but in a relative sense: relative to the comprehensive dataset.

This lets us explore more details and reach many levels of granularity.
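The N=all idea can be illustrated with a small Python sketch. It assumes a set of invented records in which a rare association (10 cases in 10,000) is always visible when every record is counted, while a 1% random sample may contain few or none of those cases. The record fields and values are hypothetical.

```python
import random
from collections import Counter

def cooccurrence_counts(records):
    """Count how often each (find_type, context) pair occurs."""
    return Counter((r["find_type"], r["context"]) for r in records)

def build_records():
    # Hypothetical records, invented for illustration only.
    common = [{"find_type": "amphora", "context": "harbour"}] * 9_990
    rare = [{"find_type": "glass", "context": "workshop"}] * 10
    return common + rare

records = build_records()
full = cooccurrence_counts(records)

# A 1% random sample; the seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
sample = cooccurrence_counts(rng.sample(records, 100))

rare_pair = ("glass", "workshop")
print("all records:", full[rare_pair])
print("1% sample:  ", sample[rare_pair])
```

The full count always reports the 10 rare cases; how many of them the sample contains depends on the draw.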

Page 7: Think Big about Data: Archaeology and the Big Data Challenge

Messiness

The obsession with exactness is an artifact of the information-deprived analog era, so we must accept messiness. Messiness is created by:

• adding more data;
• combining different sources;
• the inconsistency of formatting;
• the extraction and the transformation of data.
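A minimal Python sketch of how messiness from inconsistent formatting might be handled when combining sources: labels are normalised through a small lookup table, and unknown labels are kept rather than discarded. The spelling variants, the two sources and the counts are invented for illustration.

```python
from collections import Counter

# Hypothetical spelling variants mapped to one label; invented for illustration.
VARIANTS = {
    "terra sigillata": "terra sigillata",
    "t. sigillata": "terra sigillata",
    "sigillata": "terra sigillata",
    "african red slip": "african red slip",
    "ars": "african red slip",
}

def normalise(label):
    """Return a canonical label, or the cleaned label itself if unknown."""
    cleaned = " ".join(label.strip().lower().split())
    return VARIANTS.get(cleaned, cleaned)

def merge_sources(*sources):
    """Combine (label, count) rows from several sources into one tally."""
    totals = Counter()
    for rows in sources:
        for label, count in rows:
            totals[normalise(label)] += count
    return totals

source_a = [("Terra Sigillata", 120), ("ARS", 45)]
source_b = [("t.  sigillata ", 30), ("African red slip", 5), ("coarse ware?", 12)]

totals = merge_sources(source_a, source_b)
for label, count in sorted(totals.items()):
    print(label, count)
```

Keeping the unmatched "coarse ware?" row, instead of rejecting it, is the design choice that reflects accepting messiness.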

Page 8: Think Big about Data: Archaeology and the Big Data Challenge

Datafication

Datafication is not digitisation: datafication means transforming a phenomenon into a quantified format so that it can be tabulated and analysed. This allows us to use the information in new ways, such as predictive analysis.

Datafication permits more sophisticated analyses that identify non-linear relationships among data.

The rise of the algorithmists
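To make the difference between digitisation and datafication concrete, here is a minimal Python sketch that turns free-text field notes (already digitised) into tabular rows with counts and coordinates. The note format, the regular expression, the trench names and the coordinates are assumptions made up for this example.

```python
import csv
import io
import re

# Pattern for notes such as "12 amphora sherds"; the note format is an assumption.
NOTE = re.compile(r"(?P<count>\d+)\s+(?P<kind>[a-z ]+?)\s+sherds?", re.IGNORECASE)

def datafy(notes):
    """Turn (trench, lat, lon, note) tuples into one row per counted find type."""
    rows = []
    for trench, lat, lon, note in notes:
        for match in NOTE.finditer(note):
            rows.append({
                "trench": trench,
                "lat": lat,
                "lon": lon,
                "kind": match["kind"].lower(),
                "count": int(match["count"]),
            })
    return rows

def to_csv(rows):
    """Serialise the rows as CSV text so they can be tabulated elsewhere."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["trench", "lat", "lon", "kind", "count"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical field notes and coordinates, invented for illustration.
notes = [
    ("T1", 43.716, 10.396, "12 amphora sherds, 3 fine ware sherds"),
    ("T2", 43.717, 10.401, "1 amphora sherd"),
]
rows = datafy(notes)
print(to_csv(rows))
```

Once the notes are in this quantified form, they can be aggregated, mapped or fed into a predictive analysis, which a scanned page of the same notes would not allow.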

Page 9: Think Big about Data: Archaeology and the Big Data Challenge

Data-driven vs hypothesis-driven

In place of the hypothesis-driven approach, we can use a data-driven one.

From causation to correlation

In a Big Data world we won’t have to be fixated on causality: instead we can discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening.
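As a minimal sketch of discovering a correlation without explaining it, the Python function below computes the Pearson correlation coefficient between two series. The two variables and their values are invented for illustration; a strong coefficient here would only flag a pattern worth interpreting, not say why it exists.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical values per site, invented for illustration.
distance_to_river_km = [0.2, 0.5, 1.0, 2.0, 4.0]
imported_sherds = [90, 70, 55, 30, 10]

r = pearson(distance_to_river_km, imported_sherds)
print(f"r = {r:.2f}")
```

Pearson's coefficient only measures linear association; other measures would be needed for the non-linear relationships mentioned earlier in the presentation.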

Page 10: Think Big about Data: Archaeology and the Big Data Challenge

Risks

We must be aware of the power but also of the limitations of Big Data. The risks include:

• treating Big Data as just a ‘cool’ topic;
• assuming that data = truth;
• believing that Big Data will spell the end of theory;
• assuming that everything is permitted.

Big Data will always need to be contextualized, so we must decide whether a Big Data approach is useful for our purpose.

Big Data, no matter how comprehensive or well analyzed, need to be complemented by big judgment, so Big Data do not mean the end of archaeologists.

Intellectual property and ethical issues must be respected.

Page 11: Think Big about Data: Archaeology and the Big Data Challenge


Are archaeologists ready for a Big Data approach?

Page 12: Think Big about Data: Archaeology and the Big Data Challenge


The BAD (Big Archaeological Data) project

A Big Archaeological Data approach requires a new theoretical approach, which mainly means a counterintuitive approach to archaeology.

New archaeological approach

Page 13: Think Big about Data: Archaeology and the Big Data Challenge


Can archaeology theoretically fit a Big Data approach?

The more data from different disciplines are available, the better we can describe the general pattern of a phenomenon.

Normally archaeology deals with the complexity of large datasets, fragmentary data, and data from a variety of sources and disciplines, rarely in the same format or scale. We can say that archaeological data are perfect for a Big Data approach because they are messy and difficult to structure.

In many cases archaeology is easily datafiable, as with tabular data from pottery quantification or with geolocation.

Correlations are more useful for archaeological interpretation because they allow us to reject the deterministic dualism of cause and effect. Big Data inform rather than explain: they can expose patterns for archaeological interpretation.

Page 14: Think Big about Data: Archaeology and the Big Data Challenge

Is it technologically possible to use a Big Data approach?

• Data capture: API, web scraping
• Data storage: Hadoop framework / NoSQL databases
• Data analysis: Pig, Hive, Mahout, Giraph, R
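Hadoop is built around the MapReduce programming model. The Python sketch below imitates that model in a single process to show the idea: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. This is not Hadoop's API, and the records and function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    site, kind, count = record
    yield (kind, count)

def shuffle(pairs):
    """Group all values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical records (site, find type, count), invented for illustration.
records = [
    ("site_a", "amphora", 12),
    ("site_b", "amphora", 7),
    ("site_a", "lamp", 2),
]

pairs = (pair for record in records for pair in map_phase(record))
totals = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(totals)
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the grouping between them.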

Page 15: Think Big about Data: Archaeology and the Big Data Challenge


Big Data analysis

• historical/archaeological analysis (Roman Mediterranean)
• predictive models (archaeological potential)
• perception of archaeology (sentiment analysis)

Page 16: Think Big about Data: Archaeology and the Big Data Challenge

Thank you

Mappa Lab [email protected]

Gabriele Gattiglia [email protected]

@g_gattiglia

http://pisa.academia.edu/GabrieleGattiglia

More info

@MappaProject

http://www.mappaproject.org
