Highly Organised, Disruptive Big Data Science in CIAT

In a nutshell…

• Laying a foundation: highly organized data and data culture
• Three exceptionally powerful examples
• Key messages looking forwards – for discussion

• Data-omics (agr, phen, gen)

• Spatial

• Socio-economics

Site-Specific Agriculture (Agricultura Específica por Sitio, AEPS): Big Data for agronomy

What do we propose?

Climate + Soil + Crop management (including varieties) = Productivity per hectare

?% + ?% + ?% = 100% of the variation to explain

A complementary bottom-up approach: Information from commercial fields - Taking advantage of modern information technologies

Empirical modelling approaches, mostly based on machine learning techniques, aim to identify the combinations of factors that lead to either high or low productivity: data-driven agronomy to optimize productivity in agricultural systems!

Crop response

Tremendous Analytical Challenges

Machine Learning

Artificial neural nets

Random Forest

Multiple linear regression

Kohonen self-organizing maps

Conditional Forest

Factorial analysis

Generalised linear models

Mixed models

Multivariate analysis for Saldaña (research station, Andean zone): cropping events (2007 to 2012) – Irrigated rice – Technique: C-Forest

Fedearroz 733: 34% of productivity variation explained

Cimarron Barinas: 56% of productivity variation explained
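A "% of productivity variation explained" figure can be read as a coefficient of determination (R²) expressed as a percentage. The sketch below assumes that reading, since the slides do not state the exact metric. The yields and predictions are invented for illustration, and `variation_explained` is a hypothetical helper, not part of the C-Forest analysis.

```python
def variation_explained(observed, predicted):
    """Return R^2 as a percentage: 100 * (1 - SS_residual / SS_total)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed yields around their mean.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model predictions.
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values have no variation to explain")
    return 100.0 * (1.0 - ss_resid / ss_total)


# Invented yields (t/ha) and model predictions, for illustration only.
observed_yield = [4.0, 5.0, 6.0, 7.0, 8.0]
predicted_yield = [5.0, 5.0, 6.0, 7.0, 7.0]

pct = variation_explained(observed_yield, predicted_yield)
print(f"{pct:.0f}% of yield variation explained")
```

Read this way, the remaining share is the variation that the measured climate, soil and management factors do not account for.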

Varieties perform differently under identical climatic conditions

Our findings:

Cimarron Barinas (N=78)

Fedearroz 733 (N=267)

[Figure: productivity (t/ha) over the years, from 19XX to 2014. Labels: imported technology, broadly adapted technology, regional adapted agronomy, data-driven agronomy]

Some of the reasons why this is so exciting!!!

Open call for “Agronomicians”

Worldwide recognition for the joint work of MADR, CIAT and FEDEARROZ

Genomics – CASSAVA BIG DATA

[Figure: groups labelled 1 to 8. Panel labels:]

• > 18,000 RAD-seq
• 93 Fluidigm SNPY chip
• 93 SNPY alleles, RAD database

Mutations in restriction sites and current analytical approaches cause a reduction in estimates of observed heterozygosity.

RAD may currently only be suitable for establishing shallow relationships in population genetics studies.
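One way a restriction-site mutation can lower observed heterozygosity is allele dropout: the allele linked to the mutated site is never sequenced, so a true heterozygote is called as a homozygote. The sketch below illustrates that mechanism on invented genotypes. The allele names, the choice of "B" as the dropped allele and both helper functions are assumptions for illustration, not the analysis behind the slide.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called individuals whose two alleles differ at a locus."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotypes were called")
    return sum(1 for a, b in called if a != b) / len(called)


def apply_allele_dropout(genotypes, dropped_allele):
    """Simulate a mutated restriction site: the linked allele is never read.

    A heterozygote carrying the dropped allele is called as a homozygote
    for its other allele; a homozygote for the dropped allele gets no call.
    """
    result = []
    for a, b in genotypes:
        kept = [x for x in (a, b) if x != dropped_allele]
        if not kept:
            result.append(None)                # nothing sequenced at this locus
        elif len(kept) == 1:
            result.append((kept[0], kept[0]))  # apparent homozygote
        else:
            result.append((a, b))              # genotype unaffected
    return result


# Invented genotypes at one locus, for illustration only.
true_genotypes = [("A", "B"), ("A", "B"), ("A", "A"),
                  ("A", "C"), ("B", "B"), ("A", "A")]

true_ho = observed_heterozygosity(true_genotypes)
seen_ho = observed_heterozygosity(apply_allele_dropout(true_genotypes, "B"))
print(f"true Ho = {true_ho:.2f}, Ho with dropout = {seen_ho:.2f}")
```

In this toy case the estimate falls because two of the three true heterozygotes are miscalled as homozygotes.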

355 LAC Landraces analyzed

10 to 15 TB of genomic data collected for 1,500 landraces, wild and improved materials

Near-real-time pan-tropical monitoring system for detecting natural vegetation conversion

Methods to detect deforestation only worked for dense humid forests.

Forest monitoring

In 2006, only one country located in the tropics monitored deforestation: Brazil

There was no consistent estimation of deforestation trends in the world (figures based on statistics provided by the governments)

Vegetation identification and monitoring

• 2 satellites (MODIS Aqua and Terra)
• Take a picture of the globe daily
• With a 250 m spatial resolution (6.25 ha)
• We use 16-day composite images to reduce the effect of clouds
• 390 billion individual values were analyzed
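The figures above can be checked with a little arithmetic: a 250 m square pixel covers 250 × 250 m² = 6.25 ha. The sketch below also shows one common way to build a multi-day composite, keeping the highest cloud-free value per pixel. The slides do not say how the 16-day composites are built, so the maximum-value rule, the `None` cloud marker and the sample values are all assumptions.

```python
def pixel_area_ha(resolution_m):
    """Area of a square pixel in hectares (1 ha = 10,000 m^2)."""
    return resolution_m ** 2 / 10_000


def max_value_composite(daily_images):
    """Per-pixel maximum over a stack of daily images.

    Each image is a flat list of vegetation-index values; None marks a
    cloudy pixel. A pixel that is cloudy on every day stays None.
    """
    if not daily_images:
        raise ValueError("need at least one daily image")
    n_pixels = len(daily_images[0])
    composite = []
    for i in range(n_pixels):
        clear = [img[i] for img in daily_images if img[i] is not None]
        composite.append(max(clear) if clear else None)
    return composite


# Three invented daily images of four pixels each (a real stack has 16 days).
day1 = [0.8, None, 0.3, None]
day2 = [None, 0.6, 0.4, None]
day3 = [0.7, 0.5, None, None]

composite = max_value_composite([day1, day2, day3])
print(pixel_area_ha(250), composite)
```

The more days go into the composite, the fewer pixels stay cloud-covered, at the cost of a coarser time step.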

[Figure: detections, January 2004 to October 2012]


We generate a new map of deforestation every 16 days, with a resolution of 250 m, for all of Latin America

Impact

• Data used for a publication in Science
• Data used by independent media and platforms such as Global Forest Watch
• www.terra-i.org
  • +1900 users
  • +250 organizations
• Terra-i Peru is now the official alert system used by the Peruvian government.

http://www.terra-i.org/

Big Data: A behavior change

• YES big data requires large amounts of data and therefore big servers, BUT it is much more than that:

• REUSING the data: Extracting embedded knowledge from existing datasets to answer questions that don’t have to do with the initial purpose for which the data was captured.

• COMBINING datasets that were never meant to meet makes it possible to relate more variables and uncover useful correlations.

• ANALYZING with CREATIVITY: data scientists need to be innovative in how they use the data. Who would have guessed that Google search queries could help fight flu?
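Combining datasets usually comes down to finding a shared key. The sketch below joins harvest records with weather records on a station and season. Every field name and value is invented for illustration, and `combine` is a hypothetical helper rather than part of any CIAT pipeline.

```python
# Invented harvest records, originally collected for production accounting.
yield_records = [
    {"field": "F1", "station": "S1", "season": "2011A", "yield_t_ha": 6.2},
    {"field": "F2", "station": "S1", "season": "2011B", "yield_t_ha": 5.1},
    {"field": "F3", "station": "S2", "season": "2011A", "yield_t_ha": 7.0},
]

# Invented weather summaries, originally collected for climate monitoring.
weather_records = [
    {"station": "S1", "season": "2011A", "rain_mm": 410},
    {"station": "S1", "season": "2011B", "rain_mm": 530},
]


def combine(yields, weather):
    """Inner join on (station, season): keep harvests with matching weather."""
    index = {(w["station"], w["season"]): w for w in weather}
    combined = []
    for y in yields:
        w = index.get((y["station"], y["season"]))
        if w is not None:
            combined.append({**y, **w})  # one row carrying both sets of variables
    return combined


combined = combine(yield_records, weather_records)
for row in combined:
    print(row["field"], row["yield_t_ha"], row["rain_mm"])
```

Field F3 drops out because station S2 has no weather record. In practice, tracking such unmatched rows matters as much as the join itself.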

Big Data in Ag: Greater reach

• Open ag. science to NON-EXPERIMENTAL DATA: low quality can be compensated for by quantity. Results are always tied to an uncertainty level: welcome to the world of fuzzy logic (more complexity for more exactness).

• TO OBSERVE, not to EXPLAIN: Big data is about identifying patterns and correlations that tell you that when you do A, B will occur. Even if you don’t know the reason why, this is of great help to:

  • Make tactical decisions on a farm
  • Characterize the impact of specific climate patterns on crops
  • Prioritize the allocation of research funds (specific zones, genotypes)
  • Design plant breeding strategies
  • Monitor crops in near real time (or other products, as in industry)

• It’s not only about PREDICTING; we can use Big Data to UNDERSTAND a phenomenon by pulling the pieces together: soil data, weather records, management practice information, price series, remote sensing products, UAV data, genomics…

Democratizing Big Data…

About the CGIAR mission: propose ANOTHER BUSINESS MODEL for the use of these techniques.

Google, Monsanto and John Deere have all entered the business of big data in ag, but with the same business model: subscription services for commercial farmers. Smallholders also have much to gain from Big Data, but can’t always pay for the service.

How do we close equity gaps instead of widening them?

Looking forwards: Data continues to support CIAT science

• Partnerships upstream: Analytics, data science, infrastructure
• Looking left and right: data commodity players, ICT development
• Partnerships downstream: Farmers and their organisations, local organisations (private and public) etc.
