loted: exploiting linked data in analyzing european procurement notices

LOTED: Exploiting Linked Data in Analyzing European Procurement Notices Francesco Valle, Mathieu d’Aquin, Tommaso Di Noia and Enrico Motta Technical University of Bari, Electrical and Electronics Engineering Department Information Systems Research Group [email protected], [email protected] Knowledge Media Institute, The Open University, Milton Keynes, UK {m.daquin, e.motta}@open.ac.uk

Upload: mathieu-daquin

Post on 31-Oct-2014


DESCRIPTION

Presentation at the EKAW 2010 workshop on knowledge injection and extraction from linked data, about http://loted.eu.

TRANSCRIPT

Page 1: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Francesco Valle, Mathieu d’Aquin, Tommaso Di Noia and Enrico Motta

Technical University of Bari, Electrical and Electronics Engineering Department, Information Systems Research Group

[email protected], [email protected]

Knowledge Media Institute, The Open University, Milton Keynes, UK

{m.daquin, e.motta}@open.ac.uk

Page 2: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

TED: European eProcurement

A portal with daily updates about tenders in
– 27 European countries
– 14 sectors

All available in a collection of RSS feeds

Page 3: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices
Page 4: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

[Architecture diagram] TED RSS feeds (… UK_Trans, CZ_Comp, DE_Agfo, SE_Educ) → every day: updates from the RSS feeds → new tender documents → RDF Extractor (using the LOTED Ontology) → RDF representation of tenders → Linker / Entity Reconciliation (against GeoNames and DBPedia) → enriched RDF representation of tenders → SPARQL endpoint
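The extraction and linking steps of this pipeline can be sketched roughly as follows. This is a minimal illustration, not the LOTED implementation: the dict layout of a feed item, the function names, the `gazetteer` lookup table, and the ontology namespace `http://loted.eu/ontology#` are all assumptions (the namespace is inferred from the URIs shown on the slides).

```python
# Namespaces are assumptions inferred from the URIs shown in the slides.
LOTED_DATA = "http://loted.eu/data/"
LOTED_ONT = "http://loted.eu/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def tender_to_triples(item):
    """Extraction step: turn one feed item (a hypothetical dict) into triples."""
    subject = LOTED_DATA + "tender/" + item["id"]
    triples = [
        (subject, RDF_TYPE, LOTED_ONT + "Tender"),
        (subject, LOTED_ONT + "ND", item["id"]),
        (subject, LOTED_ONT + "TI", item["title"].replace(" ", "_")),
        (subject, LOTED_ONT + "CY", LOTED_DATA + "country/" + item["country"]),
    ]
    for sector in item["sectors"]:
        triples.append(
            (subject, LOTED_ONT + "hasSector", LOTED_DATA + "sector/" + sector)
        )
    return triples


def link_town(triples, subject, town, gazetteer):
    """Linking step: add a town triple when the name is found in a lookup table."""
    uri = gazetteer.get(town)
    if uri is not None:
        triples.append((subject, LOTED_ONT + "TW", uri))
    return triples


# Illustrative input mirroring the tender shown later in the slides.
item = {
    "id": "295984-2010",
    "title": "UK-Chesterfield: trolleys",
    "country": "UK",
    "sectors": ["tran", "teeq", "mapr"],
    "town": "Chesterfield",
}
# Hypothetical stand-in for a real entity reconciliation service.
gazetteer = {"Chesterfield": "http://sws.geonames.org/2653225/"}

subject = LOTED_DATA + "tender/" + item["id"]
triples = link_town(tender_to_triples(item), subject, item["town"], gazetteer)
```

A dictionary lookup is of course a far simpler stand-in for entity reconciliation than what a real linker needs; it only marks where that step sits in the flow.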

Page 5: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

http://loted.eu

Page 6: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

<rdf:Description rdf:about="http://loted.eu/data/tender/295984-2010">
  <rdf:type rdf:resource="http://loted.eu/ontology#Tender"/>
  <loted:OJ rdf:resource="http://loted.eu/data/officialJournal/194-2010"/>
  <loted:ND>295984-2010</loted:ND>
  <loted:hasSector rdf:resource="http://loted.eu/data/sector/tran"/>
  <loted:PD>2010-10-06T00:00:00</loted:PD>
  <loted:hasSector rdf:resource="http://loted.eu/data/sector/teeq"/>
  <loted:CY rdf:resource="http://loted.eu/data/country/UK"/>
  <loted:TW rdf:resource="http://sws.geonames.org/2653225/"/>
  <loted:AU rdf:resource="http://loted.eu/data/authorityName/Royal_Mail_Group_Limited"/>
  <loted:PR rdf:resource="http://loted.eu/data/procedure/2_-_Restricted_procedure"/>
  <loted:OL rdf:resource="http://loted.eu/data/language/EN"/>
  <loted:TD rdf:resource="http://loted.eu/data/document/7_-_Contract_award"/>
  <loted:PC>34911100_-_Trolleys</loted:PC>
  <loted:hasSector rdf:resource="http://loted.eu/data/sector/mapr"/>
  <loted:AC rdf:resource="http://loted.eu/data/awardCriteria/2_-_The_most_economic_tender"/>
  <loted:TY rdf:resource="http://loted.eu/data/typeOfBid/9_-_Not_applicable"/>
  <loted:DS>2010-10-04T00:00:00</loted:DS>
  <loted:NC rdf:resource="http://loted.eu/data/contract/2_-_Supply_contract"/>
  <loted:HD>Member_states_-_Supply_contract_-_Contract_award_-_Restricted_procedure</loted:HD>
  <loted:TI>UK-Chesterfield:_trolleys</loted:TI>
  <loted:OC>34911100_-_Trolleys</loted:OC>
  <loted:RP rdf:resource="http://loted.eu/data/regulation/4_-_European_Communities"/>
</rdf:Description>
<rdf:Description rdf:about="http://loted.eu/data/authorityName/Royal_Mail_Group_Limited">
  <loted:IA>http://www.royalmailgroup.com/portal/rmg/jump1?catId=23200531&amp;mediaId=23300561</loted:IA>
  <loted:IA>www.royalmailgroup.com</loted:IA>
  <loted:IA>www.royalmail.com</loted:IA>
  <loted:IA>http://www.royalmailgroup.com</loted:IA>
  <loted:IA>http://www.royalmail.com</loted:IA>
  <rdfs:label>Royal Mail Group Limited</rdfs:label>
  <rdf:type rdf:resource="http://loted.eu/ontology#4_-_Utilities"/>
  <rdf:type rdf:resource="http://loted.eu/ontology#6_-_Body_governed_by_public_law"/>
  <rdf:type rdf:resource="http://loted.eu/ontology#8_-_Other"/>
</rdf:Description>
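To make the structure of this listing concrete, here is a small sketch that reads a trimmed copy of it back into (subject, predicate, object) triples using only the Python standard library. The enclosing `rdf:RDF` element and the `loted` namespace URI are assumptions added so the fragment parses on its own, and the reader only handles the flat `rdf:Description` pattern seen above, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the listing; the rdf:RDF wrapper and the loted namespace
# declaration are assumptions, since the slide does not show them.
SNIPPET = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:loted="http://loted.eu/ontology#">
  <rdf:Description rdf:about="http://loted.eu/data/tender/295984-2010">
    <rdf:type rdf:resource="http://loted.eu/ontology#Tender"/>
    <loted:ND>295984-2010</loted:ND>
    <loted:CY rdf:resource="http://loted.eu/data/country/UK"/>
    <loted:TW rdf:resource="http://sws.geonames.org/2653225/"/>
  </rdf:Description>
</rdf:RDF>"""


def read_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; drop the braces
            # to obtain the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.get("{%s}resource" % RDF_NS)
            if obj is None:
                # No rdf:resource attribute, so the value is a literal.
                obj = (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


triples = read_triples(SNIPPET)
for s, p, o in triples:
    print(s, p, o)
```

The `loted:TW` triple is the one that leaves the local dataset: its object is a GeoNames URI rather than a `loted.eu` one.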

Page 7: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Some Details

• Website:
– http://loted.eu
• SPARQL endpoint:
– http://loted.eu:8081/LOTED1Rep/sparqlpage.jsp
• URI scheme:
– http://loted.eu/<data|ontology>/<type>/<ID>
– http://loted.eu/data/tender/295984-2010
– http://loted.eu/ontology#Tender
– http://loted.eu/data/authorityName/Royal_Mail_Group_Limited
– http://loted.eu/data/country/UK
– http://sws.geonames.org/2653225/ (Chesterfield, UK)
• Triple store and query engine: Jena with TDB persistent storage.
• Updated every day
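The URI scheme can be illustrated with a pair of small helpers. This is a sketch based only on the example URIs listed above: the function names are made up, the space-to-underscore rule is inferred from `Royal_Mail_Group_Limited`, and ontology terms are built with `#` because that is what the `Tender` example shows. The query string assumes the `loted:` prefix maps to `http://loted.eu/ontology#`.

```python
BASE = "http://loted.eu"


def data_uri(kind, identifier):
    """Build a data URI such as http://loted.eu/data/tender/295984-2010."""
    return "%s/data/%s/%s" % (BASE, kind, identifier.strip().replace(" ", "_"))


def ontology_uri(term):
    """Build an ontology term URI such as http://loted.eu/ontology#Tender."""
    return "%s/ontology#%s" % (BASE, term)


# A query one might submit to the SPARQL endpoint (not executed here):
# titles of tenders published for the UK.
QUERY = """PREFIX loted: <%s>
SELECT ?tender ?title WHERE {
  ?tender a loted:Tender ;
          loted:CY <%s> ;
          loted:TI ?title .
} LIMIT 10""" % (ontology_uri(""), data_uri("country", "UK"))

print(data_uri("authorityName", "Royal Mail Group Limited"))
print(QUERY)
```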

Page 8: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

But…

• This is just another interface to the data
• We could mostly have done the same with a database and some geolocation
• It is not so useful in terms of data analysis
• We have not learned much, we have no new knowledge
• We have not really used the links

Page 9: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

So…

• Try to mine Data + Links + LOD
• Discover knowledge in the connection between the local data and LOD datasets
• A first step: visual interface for data analysis based on “dimensions” coming both from the local data and from external data
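One way to read the idea of “dimensions” is as a grouping key that can come either from the tender itself or from a linked dataset. The sketch below computes, for each value of a dimension, the share of tenders per sector. The sample tenders, the `region_of_town` lookup (a stand-in for a property fetched from an external dataset), and the function name are all invented for illustration.

```python
from collections import Counter, defaultdict


def profile_by_dimension(tenders, dimension_of):
    """Group tenders by a dimension value and return sector shares per group."""
    counts = defaultdict(Counter)
    for tender in tenders:
        value = dimension_of(tender)
        if value is None:
            # No value for this dimension (e.g. missing external data): skip.
            continue
        for sector in tender["sectors"]:
            counts[value][sector] += 1
    profiles = {}
    for value, sector_counts in counts.items():
        total = sum(sector_counts.values())
        profiles[value] = {s: n / total for s, n in sector_counts.items()}
    return profiles


# Made-up sample data, not taken from the LOTED repository.
tenders = [
    {"country": "UK", "town": "Chesterfield", "sectors": ["tran", "teeq"]},
    {"country": "UK", "town": "Milton Keynes", "sectors": ["tran"]},
    {"country": "IT", "town": "Bari", "sectors": ["educ"]},
]
# Hypothetical external lookup: town -> region.
region_of_town = {"Chesterfield": "East Midlands", "Bari": "Apulia"}

# Dimension from the local data.
by_country = profile_by_dimension(tenders, lambda t: t["country"])
# Dimension from external data, reached through the town.
by_region = profile_by_dimension(tenders, lambda t: region_of_town.get(t["town"]))
```

Note that the Milton Keynes tender silently drops out of `by_region` because the lookup has no entry for it, which is one source of bias when a dimension comes from an incomplete external dataset.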

Page 10: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Tender profiles

Page 11: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Generating data overviews

Ranking criteria

Distribution of the data

Page 12: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Using the links…

• Tender profiles dependent on a DBPedia property for the city in which the tender is located
• 2 examples
• A general approach

Page 13: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Using the region from DBPedia

Can also do manual ranking (e.g., north to south, east to west)

Page 14: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Using the political party from DBPedia

Becomes crucial to assess the bias introduced by incomplete data/lack of coverage
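A simple first check on that bias is to measure how many tenders actually have a value for the external property, overall and per group. The sketch below does this with made-up data; the `party_of_town` lookup, its placeholder values, and the function names are hypothetical.

```python
def coverage(tenders, lookup, key="town"):
    """Fraction of tenders whose `key` value has an entry in the external lookup."""
    if not tenders:
        return 0.0
    covered = sum(1 for t in tenders if lookup.get(t[key]) is not None)
    return covered / len(tenders)


def coverage_by_group(tenders, lookup, group_key, key="town"):
    """Coverage computed separately for each value of `group_key`."""
    groups = {}
    for t in tenders:
        groups.setdefault(t[group_key], []).append(t)
    return {g: coverage(ts, lookup, key) for g, ts in groups.items()}


# Made-up sample data with placeholder party names.
tenders = [
    {"country": "UK", "town": "Town A"},
    {"country": "UK", "town": "Town B"},
    {"country": "IT", "town": "Town C"},
    {"country": "IT", "town": "Town D"},
]
party_of_town = {"Town A": "Party X", "Town B": "Party Y"}

overall = coverage(tenders, party_of_town)
per_country = coverage_by_group(tenders, party_of_town, "country")
print(overall, per_country)
```

In this toy example the overall coverage of one half hides the fact that one country is fully covered and the other not at all, so any profile by party would describe only the first.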

Page 15: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Lessons Learned – Linked Data

• Extracting new data from the connection with external linked datasets is feasible
• And valuable
• But it is hard because
– The “Linked Data Infrastructure” is not ready: entity reconciliation, linking, basic sameAs reasoning…
– It is still difficult to find “exploitable” data, and this is only the first step of the challenge

Page 16: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Lessons Learned – Extracting knowledge from linked data

• New challenges:
– You don’t know what you will get
– You don’t know how much you will get
– You don’t know if what you get is good
• How do we match to user needs?
• How can we reduce the effort in finding and extracting something which might not be useful?
• How can we discover what needs to be discovered?

Page 17: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Next Steps

• More advanced knowledge discovery techniques
– Detecting trends
– Automatically identifying the relevant dimensions
• Using more links
• Using the links more!
• Investigate the specific challenges of Knowledge Discovery from Linked Data

Page 18: LOTED: Exploiting Linked Data in Analyzing European Procurement Notices

Thank You!

[email protected]

@mdaquin