Who am I?
● Postdoc Energy & Industry, TBM, TU Delft
● Focus on Industrial Ecology, Open Data, Collaborative Software, Modeling, Visualization, Analytics, etc.
Motivations
● Energy and sustainability are some of the most important topics of the 21st century
● Need both aggregated and fine-grained data
● Research can be data intensive
● There's a lot out there, but connecting it is tedious
● Researchers often duplicate effort
● It would be great to revolutionize how we deal with this data
● The energy sector is only slowly embracing the ICT & Open Data revolutions
“Information wants to be free”
Information wants to be free because it has become so cheap
to distribute, copy, and recombine - too cheap to meter.
Stewart Brand
There's a Tension...
It wants to be expensive because it can be
immeasurably valuable to the recipient.
Stewart Brand
That tension will not go away. It leads to endless wrenching debate
about price, copyright, “intellectual property,” and the moral rightness of casual distribution,
because each round of new devices makes the tension worse, not better.
Stewart Brand
If you cling blindly to the expensive part
of the paradox, you miss all the action
going on in the free part.
The pressure of the paradox forces information
to explore incessantly.
Stewart Brand
Enipedia.tudelft.nl
A tale of one (or four?) power stations and seven data sets
How the European Commission manages data
Large Combustion Plants Directive
http://ec.europa.eu/environment/air/pollutants/stationary/lcp/legislation.htm
Coupling of Power Production to Water Consumption
"Water becoming a serious constraint for power generation" Aditi Nigam, The Hindu, July 18, 2012
Transparency?
Copyright
Unless specifically prohibited by a notice published on any page, you may make a print copy of such parts of the Website as you may reasonably require for your use provided that any copy has attached to it any relevant proprietary notices and/or disclaimers. You agree not for yourself or through or by way of assistance to any third party to distribute, decompile, reverse engineer, disassemble or otherwise deal in or with the Website or materials therein or otherwise commercially exploit such material or content otherwise than as permitted by law.
[...]
Costs of Access
You shall be responsible for obtaining access to the internet in order to make use of the Website and shall pay any service fees, telephone charges or other costs associated with such access.
Data for further analysing purposes are to be downloaded using the means available at the website. The use of crawlers, robots or similar tools will be seen as offensive and will lead to a temporarily or permanent disclosure of a user/company from the website.
[...]
The downloads shall be based on fair use, unproportional downloads (7.5 times more than the average user of the same category) may lead to a withdrawal of the user rights without prior notice.
http://www.gas-roads.eu/gte_tp/html/termsandconditions
http://www.entsoe.net/res/disclaimer.pdf
Officially Curated vs. Crowdsourced data
● Crowdsourcing generally OK for easily verifiable data
● Officially curated data needed for comprehensive, hard to verify data, small specialized communities
● Crowdsourced data is only possible because of revision control.
● Crowdsourced data needs an incentive
● General interests, hobbies, gamification
Data Quality as a Product vs. Data Quality as a Process
How to Measure Data Quality?
Data Quality = Researcher Skill/Experience × # Viewers/Editors × Ease of Independent Verification
Low Editor Diversity
High Editor Diversity
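The multiplicative relation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `data_quality_score`, the normalization of every factor to the range 0 to 1, and all numeric values are assumptions that do not come from the slides.

```python
def data_quality_score(researcher_skill: float,
                       viewers_editors: float,
                       ease_of_verification: float) -> float:
    """Multiply the three factors named on the slide.

    Assumption: each factor has already been normalized to [0, 1].
    The slide does not say how the factors would be measured.
    """
    factors = {
        "researcher_skill": researcher_skill,
        "viewers_editors": viewers_editors,
        "ease_of_verification": ease_of_verification,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return researcher_skill * viewers_editors * ease_of_verification


# Hypothetical cases, values invented for illustration only.
expert_only = data_quality_score(0.9, 0.1, 0.5)    # skilled curator, few eyes
crowd = data_quality_score(0.5, 0.9, 0.9)          # many eyes, easy to check
unverifiable = data_quality_score(0.9, 0.9, 0.0)   # nobody can check the data

print(expert_only, crowd, unverifiable)
```

Because the factors multiply rather than add, a factor close to zero cannot be compensated by the other two, which is one way to read the slide.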
● Eric Raymond: “Given enough eyeballs, all bugs are shallow”
● But... not all eyes are evenly distributed
Distributed Air Quality Sensors
http://airqualityegg.wikispaces.com/
Two Long Tails
Contributors
Data
Diminishing Marginal Returns
Diversity of User Knowledge
Club of 27?
Loosely Coupled
Open Data?
Linked Open Data?
enipedia.tudelft.nl/maps
http://skytruth.org/viirs/
Big Data?
Big Data
http://uncyclopedia.wikia.com/wiki/Rocket_Propelled_Chainsaw
Back to Basics
● API
● REST
● GET
● POST
● CSV
● XML
● JSON
● CC BY-SA
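As a rough illustration of how little tooling these basic formats need, the sketch below reads a small CSV table and re-serializes it as JSON using only the Python standard library. The column names, plant names, and capacities are invented for the example and are not taken from Enipedia or any other data set mentioned in these slides.

```python
import csv
import io
import json

# Hypothetical CSV export; names and values are made up for illustration.
CSV_TEXT = """name,country,capacity_mw
Plant A,NL,600
Plant B,DE,1200
"""


def csv_to_records(text: str) -> list:
    """Parse CSV text into a list of dicts, converting capacity to a number."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        row["capacity_mw"] = float(row["capacity_mw"])
        records.append(row)
    return records


records = csv_to_records(CSV_TEXT)
json_text = json.dumps(records, indent=2)
print(json_text)
```

The same records could just as easily be returned by a GET request to a REST API; the point is that plain CSV and JSON can be consumed without any special client software.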
Conclusions
● Data is embedded in a socio-technical system
● Co-evolution of Data, Platforms, and Communities
● Official data needs crowdsourced data & vice versa
● Data Quality as a product vs. a process