Highly Organised, Disruptive Big Data Science in CIAT
In a nutshell…
• Laying a foundation: highly organized data and data culture
• Three exceptionally powerful examples
• Key messages looking forwards – for discussion
• Data-omics (agronomic, phenotypic, genomic)
• Spatial
• Socio-economics
Site-Specific Agriculture (Agricultura Específica por Sitio, AEPS): Big Data for agronomy
What do we propose?
Climate (?%) + Soil (?%) + Crop management, including varieties (?%) = Productivity per ha (100% of the variation to explain)
A complementary bottom-up approach: information from commercial fields, taking advantage of modern information technologies
Empirical modelling approaches aimed at identifying the combinations of factors that lead to either high or low productivity (mostly based on machine learning techniques): data-driven agronomy to optimize productivity in agricultural systems!!!
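Tree-based methods such as Random Forest and conditional forests work by repeatedly splitting cropping events on a factor threshold so that each resulting group has more homogeneous yields. A minimal sketch of one such split, assuming plain Python lists and squared error as the split criterion. The function names and the temperature/yield numbers are hypothetical, and conditional forests actually choose splits with permutation-based significance tests, so this is a simplification rather than the method used in the study.

```python
from typing import List, Tuple


def sum_squared_error(values: List[float]) -> float:
    """Squared error of values around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(factor: List[float], yields: List[float]) -> Tuple[float, float]:
    """Find the factor threshold that best separates high- from low-yield events.

    Candidate thresholds lie midway between consecutive distinct factor values;
    the chosen one minimises the total squared error of yield when each side
    is summarised by its own mean. Returns (threshold, sse).
    """
    pairs = sorted(zip(factor, yields))
    best_threshold, best_sse = float("nan"), float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical factor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if sse < best_sse:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse = sse
    return best_threshold, best_sse


# Hypothetical cropping events: minimum temperature (°C) vs yield (t/ha)
tmin = [20.1, 21.0, 21.5, 22.8, 23.4, 24.0]
yield_t_ha = [7.8, 7.5, 7.6, 6.1, 5.9, 6.0]
threshold, sse = best_split(tmin, yield_t_ha)
print(f"split at {threshold:.2f} °C, SSE = {sse:.4f}")
```

In the hypothetical data the split falls between 21.5 and 22.8 °C. A forest repeats this search over many factors and many resampled datasets, growing each tree recursively.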
Crop response
Tremendous Analytical Challenges
Machine Learning
Artificial neural nets
Random Forest
Multiple linear regression
Kohonen self-organizing maps
Conditional Forest
Factorial analysis
Generalised linear models
Mixed models
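Of the techniques listed, multiple linear regression is the simplest to show end to end. A minimal sketch, assuming small in-memory data and plain Python (in practice a statistics package would be used); it solves the least-squares normal equations with Gauss-Jordan elimination. The function name and the example data are hypothetical.

```python
from typing import List


def ols_fit(X: List[List[float]], y: List[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Builds the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of ones for the intercept, and solves them by Gauss-Jordan
    elimination with partial pivoting. Returns [b0, b1, ..., bk]. Perfectly
    collinear predictors make the system singular and the result meaningless.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y], one row per coefficient
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # M is now diagonal: divide each right-hand side by its pivot
    return [M[i][p] / M[i][i] for i in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefficients = ols_fit(X, y)
print([round(b, 6) for b in coefficients])
```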
FEDEARROZ 733: 34% of productivity variation explained
Multivariate analysis for Saldaña (research station, Andean zone): irrigated rice cropping events, 2007 to 2012. Technique: C-Forest (conditional forest)
Cimarron Barinas: 56% of productivity variation explained
Varieties perform differently under identical climatic conditions
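A "% of productivity variation explained" figure is most naturally read as a coefficient of determination, R² = 1 − SS_res / SS_tot, expressed as a percentage. A minimal sketch, assuming that definition; for forest models it is usually computed on out-of-bag or held-out predictions, and exactly how the 34% and 56% figures were computed is not stated on the slides. The function name and the numbers in the example are hypothetical.

```python
from typing import List


def percent_variation_explained(observed: List[float], predicted: List[float]) -> float:
    """Share (in %) of the variation in observed yields captured by predictions.

    Computes 100 * (1 - SS_res / SS_tot). On held-out data the value can be
    negative when the model predicts worse than the overall mean.
    """
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 100.0 * (1.0 - ss_res / ss_tot)


# Hypothetical yields (t/ha) and model predictions for three cropping events
observed = [2.0, 4.0, 6.0]
predicted = [3.0, 4.0, 5.0]
print(percent_variation_explained(observed, predicted))
```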
Our findings:
Cimarron Barinas (N=78)
Fedearroz 733 (N=267)
[Figure: yield (t/ha) over the years, 19XX to 2014, with stages labelled imported technology, regionally adapted agronomy, data-driven agronomy and broadly adapted technology]
Some of the reasons why this is so exciting!!!
Open call for “Agronomicians”
Worldwide recognition for the joint MADR-CIAT-FEDEARROZ work
Genomics – CASSAVA BIG DATA
[Figure: three panels, each showing groups numbered 1 to 8, based on >18,000 RAD-seq markers, on 93 Fluidigm SNPY chip markers, and on the same 93 SNPY alleles taken from the RAD database]
Mutations in restriction sites, together with current analytical approaches, cause a reduction in estimates of observed heterozygosity.
RAD-seq may currently only be suitable for establishing shallow relationships in population genetics studies.
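Why a mutated restriction site lowers observed heterozygosity can be shown with a toy model: if one allele's haplotype has lost the cut site, that allele yields no reads, so true heterozygotes are called as homozygotes for the other allele. A minimal sketch under that simplified assumption (real pipelines may also filter or flag such loci); the function names and genotypes are hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]


def call_with_dropout(true_genotype: Genotype, dropped_allele: str) -> Optional[Genotype]:
    """Genotype call when `dropped_allele` sits on a haplotype lacking the cut site.

    Only alleles that still produce reads are observed: a heterozygote carrying
    the dropped allele looks homozygous, and a homozygote for it has no data.
    """
    seen = [a for a in true_genotype if a != dropped_allele]
    if not seen:
        return None  # no reads at all: missing call
    if len(seen) == 1:
        return (seen[0], seen[0])  # heterozygote miscalled as homozygote
    return true_genotype


def observed_heterozygosity(calls: List[Optional[Genotype]]) -> float:
    """Proportion of called individuals whose two alleles differ (Ho)."""
    called = [g for g in calls if g is not None]
    return sum(1 for a, b in called if a != b) / len(called)


# Hypothetical locus in five individuals; the "G" haplotype lost its cut site
truth = [("A", "G"), ("A", "G"), ("A", "A"), ("G", "G"), ("A", "G")]
rad_calls = [call_with_dropout(g, "G") for g in truth]
print(observed_heterozygosity(truth), observed_heterozygosity(rad_calls))
```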
355 LAC (Latin America and the Caribbean) landraces analyzed
10 to 15 TB of genomic data collected for 1,500 landraces, wild and improved materials
Near-real-time pan-tropical monitoring system for detecting natural vegetation conversion
Methods to detect deforestation only worked for dense humid forests.
Forest monitoring
In 2006, only one country located in the tropics monitored deforestation: Brazil
There was no consistent estimation of deforestation trends worldwide (figures were based on statistics provided by governments)
Vegetation identification and monitoring
• Two satellites (Aqua and Terra), each carrying a MODIS sensor
• Together they image the globe daily
• At 250 m spatial resolution (6.25 ha per pixel)
• We use 16-day composite images to reduce the effect of clouds
• 390 billion individual values were analyzed
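Two of these numbers can be made concrete. A 250 m pixel covers 250 × 250 = 62,500 m², i.e. 6.25 ha. The 16-day compositing is sketched here as a maximum-value composite, a common way to build cloud-reduced NDVI images because clouds depress NDVI; whether Terra-i builds its composites exactly this way or relies on standard MODIS 16-day products is an assumption not covered by the slides. Function names and values are hypothetical.

```python
from typing import List, Optional


def pixel_area_ha(resolution_m: float) -> float:
    """Area of one square pixel in hectares (1 ha = 10,000 m^2)."""
    return resolution_m * resolution_m / 10_000


def max_value_composite(daily: List[List[Optional[float]]]) -> List[Optional[float]]:
    """Merge daily NDVI images (daily[day][pixel]) into one composite image.

    None marks a missing or rejected observation. Clouds lower NDVI, so keeping
    each pixel's highest value in the window tends to keep its clearest view.
    """
    composite: List[Optional[float]] = []
    for p in range(len(daily[0])):
        values = [day[p] for day in daily if day[p] is not None]
        composite.append(max(values) if values else None)
    return composite


# Hypothetical 3-day window for three pixels (a real window is 16 days)
window = [
    [0.21, None, None],  # day 1: pixel 1 under cloud, low NDVI
    [0.83, 0.12, None],
    [0.79, None, None],
]
print(pixel_area_ha(250), max_value_composite(window))
```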
[Figure: detections, Jan 2004 to Oct 2012]
We generate a new deforestation map every 16 days, at 250 m resolution, for all of Latin America
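Producing a map every 16 days means deciding, for each pixel and each new composite, whether greenness has dropped more than expected. The slides do not describe Terra-i's detection model, so this minimal sketch substitutes a simple statistical rule (an assumption): flag a pixel when its new NDVI falls more than k standard deviations below the mean of its own past composites. The function name, the threshold k = 3 and the values are hypothetical, and a real detector must also account for seasonal cycles and rainfall, which this rule ignores.

```python
import statistics
from typing import List


def flag_vegetation_loss(history: List[float], current: float, k: float = 3.0) -> bool:
    """Return True if `current` NDVI is anomalously low for this pixel.

    history: NDVI of the same pixel in earlier composites (at least two values).
    A pixel is flagged when current < mean(history) - k * stdev(history).
    """
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return current < mean - k * sd


# Hypothetical pixel: stable forest greenness, then a sharp drop
past = [0.80, 0.82, 0.78, 0.81, 0.79]
print(flag_vegetation_loss(past, 0.79), flag_vegetation_loss(past, 0.35))
```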
Impact
• Data used for a publication in Science
• Data used by independent media and platforms such as Global Forest Watch
• www.terra-i.org
• More than 1,900 users
• More than 250 organizations
• Terra-i Peru is now the official alerts system used by the Peruvian government.
Big Data: A behavior change
• YES big data requires large amounts of data and therefore big servers, BUT it is much more than that:
• REUSING the data: extracting embedded knowledge from existing datasets to answer questions unrelated to the purpose for which the data were originally captured.
• COMBINING datasets that were never meant to meet makes it possible to relate more variables and uncover useful correlations.
• ANALYZING with CREATIVITY: the data scientist needs to be innovative in how they use the data. Who would have guessed that Google search queries could help fight the flu?
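COMBINING datasets usually comes down to joining them on a shared key. A minimal sketch, assuming each source is a dictionary keyed by a hypothetical (field_id, season) pair; only keys found in every source are kept (an inner join). Field identifiers, variable names and values are made up for illustration.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # hypothetical join key: (field_id, season)


def combine(*sources: Dict[Key, dict]) -> List[dict]:
    """Inner-join several datasets on their keys into one record per key."""
    shared = set(sources[0]).intersection(*sources[1:])
    merged = []
    for field_id, season in sorted(shared):
        record = {"field_id": field_id, "season": season}
        for source in sources:
            record.update(source[(field_id, season)])
        merged.append(record)
    return merged


# Hypothetical datasets captured for different original purposes
yields = {("F01", "2012A"): {"yield_t_ha": 6.8}, ("F02", "2012A"): {"yield_t_ha": 5.1}}
weather = {
    ("F01", "2012A"): {"tmin_c": 21.3},
    ("F02", "2012A"): {"tmin_c": 23.0},
    ("F03", "2012A"): {"tmin_c": 22.1},  # no yield record: dropped by the join
}
soil = {("F01", "2012A"): {"ph": 5.6}, ("F02", "2012A"): {"ph": 6.2}}
rows = combine(yields, weather, soil)
for row in rows:
    print(row)
```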
Big Data in Ag: Greater reach
• Open agricultural science to NON-EXPERIMENTAL DATA: low quality can be compensated for by quantity. Results are always tied to an uncertainty level: welcome to the world of fuzzy logic (more complexity for more exactitude).
• TO OBSERVE, not to EXPLAIN: Big data is about identifying patterns and correlations that tell you that when you do A, B will occur. Even if you don’t know why, this is of great help to:
• Make tactical decisions on a farm
• Characterize the impact of specific climate patterns on crops
• Prioritize fund allocation in research (specific zones, genotypes)
• Design plant breeding strategies
• Monitor crops in near real time (or other products, as in industry)
• It’s not only about PREDICTING: we can also use Big Data to UNDERSTAND a phenomenon, by pulling pieces together: soil data, weather records, management practice information, price series, remote sensing products, UAV data, genomics…
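The claim that low quality can be compensated for by quantity rests on the standard error of a mean, sd / √n, which shrinks as observations accumulate. A minimal sketch with hypothetical numbers: 20 carefully measured trial plots (sd 0.5 t/ha) versus 2,000 noisier farmer-reported yields (sd 2.0 t/ha). This only holds for random error; a systematic bias in non-experimental data does not average out.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean of n independent observations."""
    return sd / math.sqrt(n)


# Hypothetical comparison of precise-but-few vs noisy-but-many yield data
trial = standard_error(0.5, 20)      # about 0.112 t/ha
farms = standard_error(2.0, 2_000)   # about 0.045 t/ha
print(f"trial plots: {trial:.3f} t/ha, farmer reports: {farms:.3f} t/ha")
```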
Democratizing Big Data…
The CGIAR mission: propose ANOTHER BUSINESS MODEL for the use of these techniques.
Google, Monsanto and John Deere have all entered the business of big data in agriculture, but with the same business model: a subscription service for commercial farmers. Smallholders also have much to gain from big data, but cannot always pay for the service.
How do we close equity gaps instead of widening them?
Looking forwards: Data continues to support CIAT science
• Partnerships upstream: analytics, data science, infrastructure
• Looking left and right: data commodity players, ICT development
• Partnerships downstream: farmers and their organisations, local organisations (private and public), etc.