Presentation on http://loted.eu at the EKAW 2010 workshop on knowledge injection and extraction from linked data.
LOTED: Exploiting Linked Data in Analyzing European Procurement Notices
Francesco Valle, Mathieu d’Aquin, Tommaso Di Noia and Enrico Motta
Technical University of Bari, Electrical and Electronics Engineering Department, Information Systems Research Group
[email protected], [email protected]
Knowledge Media Institute, The Open University, Milton Keynes, UK
{m.daquin, e.motta}@open.ac.uk
TED: European eProcurement
A portal with daily updates about tenders in
– 27 European countries
– 14 sectors
All available in a collection of RSS feeds
TED
…UK_Trans CZ_Comp DE_Agfo SE_Educ
Every day: Updates from RSS feeds
New tender documents
RDF Extractor
LOTED Ontology
RDF representation of tenders
GeoNames
DBpedia
Linker / Entity Reconciliation
Enriched RDF repr. of tenders
SPARQL Endpoint
http://loted.eu
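The linking step in the pipeline above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the town is read from a tender title of the form "UK-Chesterfield:_trolleys" and resolved against a local lookup table; the slides do not say how LOTED actually performs entity reconciliation. GAZETTEER, split_title and link_town are hypothetical names, and the only entry in the table is the GeoNames URI shown in the slides.

```python
GEONAMES = "http://sws.geonames.org/"

# Hypothetical local gazetteer standing in for a real GeoNames lookup.
# The single entry is the Chesterfield URI that appears in the slides.
GAZETTEER = {("UK", "chesterfield"): GEONAMES + "2653225/"}


def split_title(title):
    """Split a title such as 'UK-Chesterfield:_trolleys' into
    (country, town, subject)."""
    place, _, subject = title.partition(":")
    country, _, town = place.partition("-")
    return country.strip(), town.strip(), subject.replace("_", " ").strip()


def link_town(title):
    """Return a GeoNames URI for the town named in the title,
    or None when the gazetteer has no match."""
    country, town, _ = split_title(title)
    return GAZETTEER.get((country, town.lower()))


linked = link_town("UK-Chesterfield:_trolleys")
unlinked = link_town("DE-Nowhere:_chairs")
```

Returning None for unmatched towns keeps the gap visible, which matters later when assessing coverage.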
<rdf:Description rdf:about="http://loted.eu/data/tender/295984-2010">
  <rdf:type rdf:resource="http://loted.eu/ontology#Tender"/>
  <loted:OJ rdf:resource="http://loted.eu/data/officialJournal/194-2010"/>
  <loted:ND>295984-2010</loted:ND>
  <loted:hasSector rdf:resource="http://loted.eu/data/sector/tran"/>
  <loted:PD>2010-10-06T00:00:00</loted:PD>
  <loted:hasSector rdf:resource="http://loted.eu/data/sector/teeq"/>
  <loted:CY rdf:resource="http://loted.eu/data/country/UK"/>
  <loted:TW rdf:resource="http://sws.geonames.org/2653225/"/>
  <loted:AU rdf:resource="http://loted.eu/data/authorityName/Royal_Mail_Group_Limited"/>
  <loted:PR rdf:resource="http://loted.eu/data/procedure/2_-_Restricted_procedure"/>
  <loted:OL rdf:resource="http://loted.eu/data/language/EN"/>
  <loted:TD rdf:resource="http://loted.eu/data/document/7_-_Contract_award"/>
  <loted:PC>34911100_-_Trolleys</loted:PC>
  <loted:hasSector rdf:resource="http://loted.eu/data/sector/mapr"/>
  <loted:AC rdf:resource="http://loted.eu/data/awardCriteria/2_-_The_most_economic_tender"/>
  <loted:TY rdf:resource="http://loted.eu/data/typeOfBid/9_-_Not_applicable"/>
  <loted:DS>2010-10-04T00:00:00</loted:DS>
  <loted:NC rdf:resource="http://loted.eu/data/contract/2_-_Supply_contract"/>
  <loted:HD>Member_states_-_Supply_contract_-_Contract_award_-_Restricted_procedure</loted:HD>
  <loted:TI>UK-Chesterfield:_trolleys</loted:TI>
  <loted:OC>34911100_-_Trolleys</loted:OC>
  <loted:RP rdf:resource="http://loted.eu/data/regulation/4_-_European_Communities"/>
</rdf:Description>
<rdf:Description rdf:about="http://loted.eu/data/authorityName/Royal_Mail_Group_Limited">
  <loted:IA>http://www.royalmailgroup.com/portal/rmg/jump1?catId=23200531&amp;mediaId=23300561</loted:IA>
  <loted:IA>www.royalmailgroup.com</loted:IA>
  <loted:IA>www.royalmail.com</loted:IA>
  <loted:IA>http://www.royalmailgroup.com</loted:IA>
  <loted:IA>http://www.royalmail.com</loted:IA>
  <rdfs:label>Royal Mail Group Limited</rdfs:label>
  <rdf:type rdf:resource="http://loted.eu/ontology#4_-_Utilities"/>
  <rdf:type rdf:resource="http://loted.eu/ontology#6_-_Body_governed_by_public_law"/>
  <rdf:type rdf:resource="http://loted.eu/ontology#8_-_Other"/>
</rdf:Description>
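A description like the one above can be read into subject, predicate, object triples with the Python standard library alone. The sketch below is illustrative only. The fragment carries no namespace declarations, so the code wraps it in an rdf:RDF element and binds the "loted:" prefix to http://loted.eu/ontology#, which is an assumption (the slides never show that binding). FRAGMENT is a shortened copy of the tender description, and fragment_to_triples is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the namespace bound to "loted:" is not shown in the slides.
LOTED_NS = "http://loted.eu/ontology#"

# Shortened copy of the tender description from the slides.
FRAGMENT = """
<rdf:Description rdf:about="http://loted.eu/data/tender/295984-2010">
  <rdf:type rdf:resource="http://loted.eu/ontology#Tender"/>
  <loted:ND>295984-2010</loted:ND>
  <loted:CY rdf:resource="http://loted.eu/data/country/UK"/>
  <loted:TW rdf:resource="http://sws.geonames.org/2653225/"/>
  <loted:hasSector rdf:resource="http://loted.eu/data/sector/tran"/>
  <loted:hasSector rdf:resource="http://loted.eu/data/sector/teeq"/>
</rdf:Description>
"""


def fragment_to_triples(fragment):
    """Return (subject, predicate, object) tuples for each rdf:Description."""
    # Wrap the fragment so that both prefixes are declared.
    wrapped = (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:loted="{LOTED_NS}">'
        f"{fragment}</rdf:RDF>"
    )
    root = ET.fromstring(wrapped)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; drop the braces.
            predicate = prop.tag.replace("{", "").replace("}", "")
            # Use rdf:resource when present, otherwise the literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


triples = fragment_to_triples(FRAGMENT)
sectors = [o for _, p, o in triples if p == LOTED_NS + "hasSector"]
```

A full RDF library would also handle datatypes, blank nodes and nested descriptions; this sketch only covers the flat shape used in the example.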
Some Details
• Website:
– http://loted.eu
• SPARQL endpoint:
– http://loted.eu:8081/LOTED1Rep/sparqlpage.jsp
• URI scheme:
– http://loted.eu/<data|ontology>/<type>/<ID>
– http://loted.eu/data/tender/295984-2010
– http://loted.eu/ontology#Tender
– http://loted.eu/data/authorityName/Royal_Mail_Group_Limited
– http://loted.eu/data/country/UK
– http://sws.geonames.org/2653225/ (Chesterfield, UK)
• Triple store and query engine: Jena with TDB persistent storage.
• Updated every day
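The URI scheme listed above can be sketched as a small Python helper. The function names (loted_uri, authority_uri, tenders_query) are hypothetical, and the rule of replacing spaces with underscores is inferred from the Royal Mail example rather than stated in the slides. The query string is only built, never sent, and it relies on nothing beyond the class URI that the slides show.

```python
BASE = "http://loted.eu"


def loted_uri(kind, type_name, identifier=None):
    """Build a URI in the style of the examples on the slide."""
    if kind == "data":
        if identifier is None:
            raise ValueError("data URIs need an identifier")
        return f"{BASE}/data/{type_name}/{identifier}"
    if kind == "ontology":
        # The ontology example uses a fragment: http://loted.eu/ontology#Tender
        return f"{BASE}/ontology#{type_name}"
    raise ValueError(f"unknown kind: {kind!r}")


def authority_uri(label):
    """Assumed rule: spaces in the authority name become underscores."""
    return loted_uri("data", "authorityName", label.replace(" ", "_"))


def tenders_query(limit=10):
    """SPARQL text listing resources typed as Tender (built, not executed)."""
    return (
        "SELECT ?tender WHERE { "
        f"?tender a <{loted_uri('ontology', 'Tender')}> "
        f"}} LIMIT {int(limit)}"
    )


tender = loted_uri("data", "tender", "295984-2010")
authority = authority_uri("Royal Mail Group Limited")
query = tenders_query(5)
```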
But…
• This is just another interface to the data
• We could mostly have done the same with a database and some geolocation
• It is not so useful in terms of data analysis
• We have not learned much; we have no new knowledge
• We have not really used the links
So…
• Try to mine Data+Links+LOD
• Discover knowledge in the connection between the local data and LOD datasets
• A first step: a visual interface for data analysis based on “dimensions” coming both from the local data and from external data
Tender profiles
Generating data overviews
Ranking criteria
Distribution of the data
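One plausible reading of a tender profile is the distribution of tenders over the values of a chosen dimension, ranked by frequency. The Python sketch below shows that reading only; it is not the interface's actual implementation. The records and field names are made up for the example, and tender_profile is a hypothetical function name.

```python
from collections import Counter

# Made-up tender records; the field names are illustrative only.
TENDERS = [
    {"country": "UK", "procedure": "Restricted"},
    {"country": "UK", "procedure": "Open"},
    {"country": "DE", "procedure": "Open"},
    {"country": "UK", "procedure": "Open"},
]


def tender_profile(tenders, dimension):
    """Return (value, count, share) rows for one dimension,
    ranked from most to least frequent."""
    counts = Counter(t[dimension] for t in tenders if dimension in t)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


by_country = tender_profile(TENDERS, "country")
by_procedure = tender_profile(TENDERS, "procedure")
```

Ranking by frequency is one criterion among several; any other ordering of the values can be applied to the same rows.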
Using the links…
• Tender profiles that depend on a DBpedia property of the tender's city
• 2 examples
• A general approach
Using the region from DBpedia
Manual ranking is also possible (e.g., north to south, east to west)
Using the political party from DBpedia
It becomes crucial to assess the bias introduced by incomplete data or lack of coverage
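Grouping tenders by a property fetched from an external dataset, and measuring how many tenders that property actually covers, can be sketched in Python as follows. Everything here is illustrative: the town keys, the region values and the function name profile_by_external_property are made up, and the lookup table stands in for data that would come from DBpedia.

```python
from collections import Counter

# Made-up tenders, each pointing at a town identifier.
TENDERS = [
    {"id": "t1", "town": "town-A"},
    {"id": "t2", "town": "town-B"},
    {"id": "t3", "town": "town-A"},
    {"id": "t4", "town": "town-C"},
]

# Made-up external property (e.g. region) per town; town-C has no value,
# which mimics incomplete coverage in the external dataset.
TOWN_REGION = {"town-A": "region-1", "town-B": "region-2"}


def profile_by_external_property(tenders, town_property):
    """Count tenders per external property value and report coverage,
    i.e. the share of tenders whose town has a value at all."""
    counts = Counter()
    missing = 0
    for tender in tenders:
        value = town_property.get(tender["town"])
        if value is None:
            missing += 1
        else:
            counts[value] += 1
    coverage = (len(tenders) - missing) / len(tenders) if tenders else 0.0
    return counts, coverage


counts, coverage = profile_by_external_property(TENDERS, TOWN_REGION)
```

Reporting coverage next to the counts makes the potential bias explicit: a profile built on a small covered share says little about the whole dataset.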
Lessons Learned – Linked Data
• Extracting new data from the connection with external linked datasets is feasible
• And valuable
• But it is hard because
– The “Linked Data Infrastructure” is not ready: entity reconciliation, linking, basic sameAs reasoning…
– It is still difficult to find “exploitable” data, and this is only the first step of the challenge
Lessons Learned – Extracting knowledge from linked data
• New challenges:
– You don’t know what you will get
– You don’t know how much you will get
– You don’t know if what you get is good
• How do we match this to user needs?
• How can we reduce the effort spent finding and extracting something which might not be useful?
• How can we discover what needs to be discovered?
Next Steps
• More advanced knowledge discovery techniques
– Detecting trends
– Automatically identifying the relevant dimensions
• Using more links
• Using the links more!
• Investigate the specific challenges of Knowledge Discovery from Linked Data
Thank You!
[email protected]
@mdaquin