
ICT FOR SAFE DIGITAL CITIES – Inclusive e-services – Bologna 29.06.07

Improving communication in e-democracy using NLP and semantic tools

Michele Carenini


Summary

• Where Does Natural Language Belong?
• Natural Language (Processing) in very few words
• Why put semantics into the Web and…
• … how to do it
• EDEN: the Gap between Us and Them
• Good and Bad Lessons
• What now?


Where Does Natural Language Belong?

Artificial Language vs. Natural Language


NL In NLP

NL is the theoretical set of well-formed phrases/sentences of human languages.

• NLP deals with the possibility of making computers process NL;

• By definition, computers can process only computable objects;

• There are at least two main features of NL that are (or theoretically can be) computable: morphology and syntax;

• Well-formedness is a prerequisite for computing (morphology and) syntax.

[Figure: axes "Evolution in NLP" and "Complexity"; levels shown: morphology, syntax, semantics, pragmatics]


Well Formedness

((α → β) → (¬β → ¬ α))

vs.

*((α → β) → (ββ))α))

John eats the cake

vs.

*John are eaten one cakes
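For the artificial-language pair above, well-formedness can be decided mechanically. The following is a minimal sketch, assuming a toy grammar that is not taken from the slides: a formula is an atom (α or β), a negation ¬F, or a fully parenthesised implication (F → F). The function name `is_well_formed` is illustrative only.

```python
def is_well_formed(text: str) -> bool:
    """Check a string against the assumed toy grammar
    F ::= α | β | ¬F | (F → F). Whitespace is ignored."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0  # index of the next unread token

    def parse_formula() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            return False
        tok = tokens[pos]
        if tok in ("α", "β"):          # atom
            pos += 1
            return True
        if tok == "¬":                 # negation
            pos += 1
            return parse_formula()
        if tok == "(":                 # parenthesised implication
            pos += 1
            if not parse_formula():
                return False
            if pos >= len(tokens) or tokens[pos] != "→":
                return False
            pos += 1
            if not parse_formula():
                return False
            if pos >= len(tokens) or tokens[pos] != ")":
                return False
            pos += 1
            return True
        return False

    # Well-formed only if one formula consumes the whole input.
    return parse_formula() and pos == len(tokens)
```

Under this grammar the first formula on the slide is accepted and the starred one is rejected. The natural-language pair is harder precisely because no such compact grammar is available.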


Putting Some Semantics Into The Web

The Web: a system of interlinked, hypertext documents accessed via the Internet. With a Web browser, a user views Web pages that may contain text, images, and other multimedia and navigates between them using hyperlinks.


Putting Some Semantics Into The Web

WHY: 42,600,000?!?

HOW:

[Figure labels: Intelligent search; CAR, TRUCK, MOVING, DRIVING; AND/OR]


Putting Some Semantics Into The Web

Web 2.0

• The transition of web sites from isolated information silos to sources of content and functionality
• A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use
• Enhanced organization and categorization of content, emphasizing deep linking


Putting Some Semantics Into The Web

Semantic Web

Some elements of the semantic web are expressed in formal specifications, including:
• Resource Description Framework (RDF)
• Data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)
• Notations such as RDF Schema (RDFS)
• The Web Ontology Language (OWL)
all of which are intended to formally describe concepts, terms, and relationships within a given knowledge domain.
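To make the idea of an RDF statement concrete, here is a minimal sketch that serialises subject, predicate, object statements as N-Triples lines using only the Python standard library. The `http://example.org/` identifiers and the names `Triple` and `to_ntriples` are hypothetical, and only IRIs and plain string literals are handled (no datatypes, language tags or blank nodes).

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    subject: str                  # IRI
    predicate: str                # IRI
    obj: str                      # IRI, or literal text if obj_is_literal
    obj_is_literal: bool = False


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Render triples one per line in N-Triples syntax (simplified)."""
    lines = []
    for t in triples:
        if t.obj_is_literal:
            # Escape the characters that cannot appear raw in a literal.
            escaped = (t.obj.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n")
                            .replace("\r", "\\r"))
            obj = f'"{escaped}"'
        else:
            obj = f"<{t.obj}>"
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


RDFS = "http://www.w3.org/2000/01/rdf-schema#"
statements = [
    # Hypothetical domain: cars and trucks are kinds of vehicle.
    Triple("http://example.org/Car", RDFS + "subClassOf", "http://example.org/Vehicle"),
    Triple("http://example.org/Truck", RDFS + "subClassOf", "http://example.org/Vehicle"),
    Triple("http://example.org/Car", RDFS + "label", "car", obj_is_literal=True),
]
print(to_ntriples(statements))
```

A search engine that can read such statements can relate a query about cars to documents about vehicles, which plain keyword matching cannot do.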


Putting Some Semantics Into The Web

Web 3.0
• Ubiquitous Connectivity, broadband adoption, mobile Internet access and mobile devices
• Network computing, software-as-a-service business models, Web services interoperability, distributed computing, grid computing and cloud computing
• Open technologies, Open APIs and protocols, open data formats, open-source software platforms and open data (e.g. Creative Commons, Open Data License)
• Open identity, OpenID, open reputation, roaming portable identity and personal data
• The intelligent web, Semantic web technologies such as RDF, OWL, SWRL, SPARQL, Semantic application platforms, and statement-based datastores
• Distributed databases, the "World Wide Database"
• Intelligent applications, natural language processing, machine learning, machine reasoning, autonomous agents


UNDER CONSTRUCTION


EDEN: Where It All Began (at least some of it)


EDEN: The Gap Between Us And Them

Us: the technicians

Them: the PAs (public administrations)

End-Users: the Citizens


General Objective Of The NLP Tools (in the eDemocracy framework)

Interacting with (CHI) or through (CMI) an artificial system...

... in order to get information that makes participation in the decision-making process more effective.


One Overall Problem

Main problem: the mutual understanding of different fields of interest and expertise.

Users: find it difficult to deal with the very notion of Natural Language. Lost on Bad-Language World

Technicians: find it difficult to deal with a less-than-perfect NL definition. Lost on NLP Planet


Good Lessons…


… And Bad Ones


Good Lessons (1)

1. Linguistic Resource Re-use: the main purpose of the grammar(s) developed within EDEN is information extraction, not (full) linguistic analysis. Most of the effort therefore went into covering the main “information-bearing” constituents, such as (complex) Noun Phrases and the main Verb-Noun and Verb-Adjective relations. -> Easy replication to different (Western) languages:

– the four linguistic analysers made available to the project (Dutch, English, German and Italian) have been deployed with the same development tool (Yap4NL);

– consequently, they all share the same approach to linguistic analysis (rule-based, full-path parsing with a post-parsing procedure that simulates a shallow parser);

– finally, from the point of view of software design, localising the modules required no major changes or significant integration work.


Good Lessons (2)

2. Fast Prototyping: the Dutch grammar was developed completely from scratch in less than one person-year. Fast prototyping was made possible mainly by:

– the availability of an advanced, dedicated tool for grammar development;

– the simplicity of the approach to linguistic processing.

Interesting outcome: the output format is a flat list of “triples” (no structure, no hierarchy, no explicit internal links).
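The slides do not show what an EDEN triple contains, so the following is only a sketch of the general idea of a flat triple list. It assumes a hypothetical (relation, head, dependent) layout and a hypothetical nested analysis of "John eats the cake", and flattens that analysis into a list with no nesting.

```python
from typing import Dict, List, Tuple

# Assumed layout, not EDEN's documented format.
Triple = Tuple[str, str, str]  # (relation, head, dependent)


def flatten(node: Dict) -> List[Triple]:
    """Walk a nested head/dependents tree and emit one flat list of triples."""
    triples: List[Triple] = []
    for relation, child in node.get("deps", []):
        triples.append((relation, node["head"], child["head"]))
        triples.extend(flatten(child))  # hierarchy is discarded in the output
    return triples


# Hypothetical nested analysis of "John eats the cake".
tree = {
    "head": "eat",
    "deps": [
        ("subject", {"head": "John"}),
        ("object", {"head": "cake", "deps": [("determiner", {"head": "the"})]}),
    ],
}

for triple in flatten(tree):
    print(triple)
```

Each triple can be indexed or matched independently of the others, which is convenient for information extraction even though the tree structure is lost.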


Good Lessons (3)

3. Grammar Re-usability: in the grammar format used in EDEN, each grammar rule has a syntagmatic part, which corresponds to the reduction rule, and a set of “actions” which independently build the feature structure of each syntactic phrasal constituent. This led to two interesting results:

– the same linguistic analyser has been embedded in several different modules; and

– an interesting experiment in grammar re-use (from Dutch to German) has been carried out, with encouraging results.
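The separation between a rule's syntagmatic part and its actions can be sketched as follows. This is not the Yap4NL rule format, which the slides do not show; the `Rule` class, the action functions and the feature names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

FeatureStructure = Dict[str, object]
Action = Callable[[List[FeatureStructure], FeatureStructure], None]


@dataclass
class Rule:
    lhs: str                      # category produced by the reduction
    rhs: List[str]                # categories consumed (syntagmatic part)
    actions: List[Action] = field(default_factory=list)

    def reduce(self, children: List[FeatureStructure]) -> FeatureStructure:
        """Apply the reduction, then let each action fill in features."""
        cats = [child["cat"] for child in children]
        if cats != self.rhs:
            raise ValueError(f"cannot reduce {cats} with {self.lhs} -> {self.rhs}")
        result: FeatureStructure = {"cat": self.lhs}
        for action in self.actions:
            action(children, result)
        return result


def copy_head(children, result):
    # Assumption: the last child (the noun) is the head of the phrase.
    result["head"] = children[-1]["head"]


def copy_number(children, result):
    result["number"] = children[-1]["number"]


np_rule = Rule("NP", ["Det", "N"], [copy_head, copy_number])
noun_phrase = np_rule.reduce([
    {"cat": "Det", "head": "the"},
    {"cat": "N", "head": "cake", "number": "sg"},
])
print(noun_phrase)
```

Because the actions are attached to the rule and not to the parser, the same reduction can be kept and only the actions changed, which is one plausible reading of why re-use from Dutch to German was feasible.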


Bad Lessons (1)

1. NLP Exploitation in a Specific Domain: the first problem concerned the very notion of Natural Language:

– Traditional NLP definition: “Natural Language is the theoretical set of all well-formed sentences used by humans to communicate”. For instance, John eats the cake is a sentence belonging to NL, while *John are eaten one cakes is not.

• -> Reason: there is a “minimum threshold” of well-formedness that input must meet for an artificial system to behave properly (i.e., to assign a structure).

– First EDEN definition by users: “Natural Language is any string produced by citizens, possibly including misspellings, non-existent words and bad syntactic structures”. Therefore, any juxtaposition of strings, once it has been typed in by a citizen, belongs to NL.

• -> Reason: in communication (and especially in e-mail communication) many mistakes occur; the system must be able to deal with them too.
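One generic way to narrow the gap between the two definitions is to normalise noisy input before it reaches the grammar. The sketch below is not EDEN's method. It uses the standard-library `difflib.get_close_matches` to map each token to the closest entry of a hypothetical lexicon, and leaves a token untouched when nothing is close enough.

```python
import difflib
from typing import List

# Hypothetical lexicon; a real system would use the analyser's own word list.
LEXICON = ["john", "eats", "the", "cake"]


def normalise(text: str, lexicon: List[str] = LEXICON, cutoff: float = 0.6) -> List[str]:
    """Lower-case and split the input, then replace each unknown token
    with its closest lexicon entry if one scores at least `cutoff`."""
    tokens = []
    for word in text.lower().split():
        if word in lexicon:
            tokens.append(word)
            continue
        matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        tokens.append(matches[0] if matches else word)
    return tokens


print(normalise("Jhon eats teh caek"))
```

This only addresses misspellings. Non-existent words and bad syntactic structures, which the users' definition also includes, need different treatment.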


Bad Lessons (2)

2. End Users’ Expectations: EDEN modules are (of course) designed to manipulate symbols in order to make some information accessible. Citizens, however, sometimes expected the system to “understand” what they typed in.

– They expected the system to handle trans-phrasal phenomena (such as pronoun resolution – “I need a garage for my car; where can I find one?”);

– they even expected the system to manage pragmatic phenomena (such as plan inference, over-answering, etc. – “What time is the train leaving for Rome?”).


What We Learned (1)

• Dealing with DNLP (“Dirty NLP”):

[…] must be well accepted by the system


What We Learned (2)

• Hiding technology:

“Bringing technology to the people”

must become

“Bringing technology to the people without letting them know”


What Now?

The Future

• Standard data interchange formats
• Interoperability
• Grid Computing
• Distributed systems
• Standard Notation Schemes
• Standard Ontologies Accessible from Different Perspectives
• Adaptive Filtering
• Advanced Multimodal Interfaces
• Remotely Accessible Applications
• Privacy and Security Standards and Tools

• Real AI (Knowledge Representation, Decision Support Systems, Machine Learning, Autonomous Agents, NLP)