ICT FOR SAFE DIGITAL CITIES – Inclusive e-services – Bologna 29.06.07
Improving communication in e-democracy using NLP and semantic tools
Michele Carenini
Summary
• Where Does Natural Language Belong?
• Natural Language (Processing) in very few words
• Why Putting Semantics into the Web and…
• … how to do it
• EDEN: the Gap between Us and Them
• Good and Bad Lessons
• What now?
Where Does Natural Language Belong?
Artificial Language vs. Natural Language
NL In NLP
NL is the theoretical set of well-formed phrases/sentences of human languages.
• NLP deals with the possibility of making computers process NL;
• By definition, computers can process only computable objects;
• There are at least two main features of NL that are (or theoretically can be) computable: morphology and syntax;
• Well-formedness is a prerequisite for computing (morphology and) syntax.
[Figure: Evolution in NLP vs. Complexity, with stages morphology, syntax, semantics, pragmatics]
Well-Formedness
((α → β) → (¬β → ¬ α))
vs.
*((α → β) → (ββ))α))
John eats the cake
vs.
*John are eaten one cakes
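The contrast between the two formulas can be made concrete with a small checker. This is a minimal sketch, assuming a toy grammar (F ::= atom | ¬F | (F → F), with single-letter atoms) that is not stated on the slide; the function name `is_well_formed` is hypothetical.

```python
def is_well_formed(text: str) -> bool:
    """Check a string against the assumed grammar F ::= atom | ¬F | (F → F)."""
    s = "".join(text.split())  # whitespace carries no meaning here

    def parse(i: int) -> int:
        """Parse one formula starting at index i.

        Returns the index just after the formula, or -1 on failure.
        """
        if i >= len(s):
            return -1
        if s[i] == "¬":                      # negation: ¬F
            return parse(i + 1)
        if s[i] == "(":                      # implication: (F → F)
            j = parse(i + 1)
            if j == -1 or j >= len(s) or s[j] != "→":
                return -1
            k = parse(j + 1)
            if k == -1 or k >= len(s) or s[k] != ")":
                return -1
            return k + 1
        if s[i].isalpha():                   # atom: a single letter such as α
            return i + 1
        return -1

    # Well-formed only if exactly one formula spans the whole string.
    return parse(0) == len(s)


print(is_well_formed("((α → β) → (¬β → ¬ α))"))  # True
print(is_well_formed("((α → β) → (ββ))α))"))     # False
```

For an artificial language the check is a yes/no decision of this kind; the natural-language pair (John eats the cake vs. *John are eaten one cakes) needs a morphological and syntactic grammar to make the same decision.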
Putting Some Semantics Into The Web
The Web: a system of interlinked, hypertext documents accessed via the Internet. With a Web browser, a user views Web pages that may contain text, images, and other multimedia and navigates between them using hyperlinks.
Putting Some Semantics Into The Web
WHY:
Intelligent search: 42,600,000?!?
[Figure: search terms CAR, TRUCK, MOVING and DRIVING combined with AND/OR]
HOW:
Putting Some Semantics Into The Web
Web 2.0
• The transition of web sites from isolated information silos to sources of content and functionality
• A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use
• Enhanced organization and categorization of content, emphasizing deep linking
Putting Some Semantics Into The Web
Semantic Web
Some elements of the semantic web are expressed in formal specifications, including:
• Resource Description Framework (RDF)
• Data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)
• Notations such as RDF Schema (RDFS)
• The Web Ontology Language (OWL)
all of which are intended to formally describe concepts, terms, and relationships within a given knowledge domain.
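To give an idea of what such a formal description looks like, here is a minimal sketch that writes a few statements in N-Triples syntax using plain Python. The `http://example.org/` namespace, the Car/Truck/Vehicle concepts and the helper names `term` and `to_ntriples` are assumptions made for illustration; only the RDFS property URIs are real.

```python
from typing import Iterable, Tuple

# Real RDFS vocabulary URIs.
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespace for the example concepts.
EX = "http://example.org/"


def term(value: str) -> str:
    """Render a URI as <...> and anything else as a quoted plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialise (subject, predicate, object) statements, one per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


statements = [
    (EX + "Car", RDFS_SUBCLASS, EX + "Vehicle"),
    (EX + "Truck", RDFS_SUBCLASS, EX + "Vehicle"),
    (EX + "Car", RDFS_LABEL, "car"),
]

print(to_ntriples(statements))
```

With statements of this kind, a search for "vehicle" could in principle also retrieve pages about cars and trucks, which plain keyword matching cannot do.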
Putting Some Semantics Into The Web
Web 3.0
• Ubiquitous Connectivity, broadband adoption, mobile Internet access and mobile devices
• Network computing, software-as-a-service business models, Web services interoperability, distributed computing, grid computing and cloud computing
• Open technologies, Open APIs and protocols, open data formats, open-source software platforms and open data (e.g. Creative Commons, Open Data License)
• Open identity, OpenID, open reputation, roaming portable identity and personal data
• The intelligent web, Semantic web technologies such as RDF, OWL, SWRL, SPARQL, Semantic application platforms, and statement-based datastores
• Distributed databases, the "World Wide Database"
• Intelligent applications, natural language processing, machine learning, machine reasoning, autonomous agents
UNDER CONSTRUCTION
EDEN: Where It All Began (at least some of it)
EDEN: The Gap Between Us And Them
Us: the technicians
Them: the PAs (public administrations)
End-Users: the Citizens
General Objective Of The NLP Tools (in the eDemocracy framework)
Interacting with (CHI) or through (CMI) an artificial system...
... in order to get information that makes participation in the decision-making process more effective.
One Overall Problem
Main problem: the mutual understanding of different fields of interest and expertise.
Users: find it difficult to deal with the very notion of Natural Language. Lost on Bad-Language World.
Technicians: find it difficult to deal with a less-than-perfect NL definition. Lost on NLP Planet.
Good Lessons…
… And Bad Ones
Good Lessons (1)
1. Linguistic Resource Re-use: the main purpose of the grammar(s) developed within EDEN is information extraction, not (full) linguistic analysis. Most effort was therefore devoted to covering the main “information-bearing” constituents, such as (complex) Noun Phrases and the main Verb-Noun and Verb-Adjective relations. -> Easy replication across different (Western) languages:
– the four linguistic analysers made available to the project (Dutch, English, German and Italian) were built with the same development tool (Yap4NL);
– consequently, they all share the same approach to linguistic analysis (rule-based, full-path parsing with a post-parsing procedure that simulates a shallow parser);
– finally, no major changes or significant integration work were needed to localise the modules, from the point of view of software design.
Good Lessons (2)
2. Fast Prototyping: the Dutch grammar was developed completely from scratch in less than one person-year. Fast prototyping was made possible mainly by:
– the availability of an advanced dedicated tool for grammar development;
– the simplicity of the approach to linguistic processing.
Interesting outcome: an output format consisting of flat (no structure, no hierarchy, no explicit internal links) lists of “triples”.
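As an illustration of a flat list of “triples”, the sketch below shows one possible encoding for John eats the cake. The (relation, head, dependent) layout, the relation names and the `find` helper are assumptions made for this example; the slides do not specify the actual EDEN output format.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (relation, head, dependent).
Triple = Tuple[str, str, str]

# Flat output for "John eats the cake": no nesting, no links between entries.
triples: List[Triple] = [
    ("subject", "eat", "John"),
    ("object", "eat", "cake"),
    ("determiner", "cake", "the"),
]


def find(items: List[Triple],
         relation: Optional[str] = None,
         head: Optional[str] = None,
         dependent: Optional[str] = None) -> List[Triple]:
    """Return the triples matching every field that was given."""
    return [
        t for t in items
        if (relation is None or t[0] == relation)
        and (head is None or t[1] == head)
        and (dependent is None or t[2] == dependent)
    ]


print(find(triples, relation="object"))  # [('object', 'eat', 'cake')]
print(find(triples, head="eat"))         # subject and object of "eat"
```

Because the list is flat, information extraction reduces to filtering; no tree traversal is needed.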
Good Lessons (3)
3. Grammar Re-usability: in the grammar format used in EDEN, each grammar rule has a syntagmatic part, which corresponds to the reduction rule, and a set of “actions” which independently build the feature structure of each syntactic phrasal constituent. This led to two interesting results:
– the same linguistic analyser has been embedded in several different modules; and
– an interesting experiment of grammar re-use (from Dutch to German) has been carried out, with encouraging results.
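The split between a syntagmatic part and a set of “actions” can be sketched as follows. This is not the Yap4NL rule format, which the slides do not show; the `Rule` class, the `copy_feature` helper and the feature names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, str]
# An action fills the parent's features from the children's features.
Action = Callable[[Features, List[Features]], None]


@dataclass
class Rule:
    lhs: str                       # category produced by the reduction
    rhs: List[str]                 # syntagmatic part: categories to reduce
    actions: List[Action] = field(default_factory=list)

    def reduce(self, children: List[Tuple[str, Features]]
               ) -> Optional[Tuple[str, Features]]:
        """Apply the rule to (category, features) pairs, or return None."""
        if [cat for cat, _ in children] != self.rhs:
            return None
        parent: Features = {}
        for action in self.actions:   # each action works independently
            action(parent, [feats for _, feats in children])
        return self.lhs, parent


def copy_feature(name: str, child_index: int) -> Action:
    """Build an action that copies one feature from a child to the parent."""
    def action(parent: Features, kids: List[Features]) -> None:
        if name in kids[child_index]:
            parent[name] = kids[child_index][name]
    return action


# NP -> Det N, with actions that assemble the NP's feature structure.
np_rule = Rule("NP", ["Det", "N"], [
    copy_feature("head", 1),
    copy_feature("number", 1),
    copy_feature("definite", 0),
])

result = np_rule.reduce([
    ("Det", {"definite": "yes"}),
    ("N", {"head": "cake", "number": "sg"}),
])
print(result)
```

Under this design, re-using a grammar for a related language would mean keeping the actions and adjusting the syntagmatic parts where word order or categories differ.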
Bad Lessons (1)
1. NLP Exploitation in a Specific Domain: the first problem concerned the very notion of Natural Language:
– Traditional NLP definition: “Natural Language is the theoretical set of all well-formed sentences used by humans to communicate”. For instance, John eats the cake is a sentence belonging to NL, while *John are eaten one cakes is not.
• -> Reason: a “minimum threshold” of well-formedness must be respected for an artificial system to behave properly (i.e., to assign a structure).
– First EDEN definition by users: “Natural Language is any string typed by citizens, possibly including misspellings, non-existent words and bad syntactic structures”. Therefore, any juxtaposition of strings, once it has been typed in by a citizen, belongs to NL.
• -> Reason: in communication (and especially in e-mail communication) many mistakes occur; the system must be able to deal with them too.
Bad Lessons (2)
2. Final Users’ Expectations: EDEN modules are (of course) aimed at manipulating symbols in order to make some information accessible. Citizens, however, sometimes expected the system to “understand” what they typed in.
– They expected the system to handle trans-phrasal phenomena (such as pronoun resolution – “I need a garage for my car; where can I find one?”);
– they even expected the system to manage pragmatic phenomena (such as plan inference, over-answering, etc. – “What time is the train leaving for Rome?”).
What We Learned (1)
• Dealing with DNLP (“Dirty NLP”): must be well accepted by the system
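One small part of accepting “dirty” input is tolerating misspellings. The sketch below maps unknown tokens to the closest entry of a lexicon using Python's standard `difflib` module. It is not the EDEN approach; the `normalise` function, the tiny `LEXICON` and the 0.75 cutoff are assumptions chosen for illustration.

```python
import difflib
from typing import List

# Hypothetical mini-lexicon; a real system would use a full word list.
LEXICON: List[str] = ["where", "can", "i", "find", "a",
                      "garage", "for", "my", "car"]


def normalise(text: str, lexicon: List[str], cutoff: float = 0.75) -> str:
    """Replace each unknown token with its closest lexicon entry, if any."""
    out = []
    for token in text.lower().split():
        if token in lexicon:
            out.append(token)
            continue
        # Best candidate whose similarity ratio is at least `cutoff`.
        match = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        out.append(match[0] if match else token)  # keep unknown words as typed
    return " ".join(out)


print(normalise("Where can I find a garaeg for my car", LEXICON))
# where can i find a garage for my car
```

Such a step only covers misspellings; non-existent words and bad syntactic structures still have to be handled by the parser itself.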
What We Learned (2)
• Hiding technology: “Bringing technology to the people” must become “Bringing technology to the people without letting them know”
What Now?
The Future
• Standard data interchange formats
• Interoperability
• Grid Computing
• Distributed systems
• Standard Notation Schemes
• Standard Ontologies Accessible from Different Perspectives
• Adaptive Filtering
• Advanced Multimodal Interfaces
• Remotely Accessible Applications
• Privacy and Security Standards and Tools
• Real AI (Knowledge Representation, Decision Support Systems, Machine Learning, Autonomous Agents, NLP)