![Page 1: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/1.jpg)
Pharos Summer School Fundamentals
of Social Applications
June 2009
Avaré Stewart
http://www.l3s.uni-hannover.de/~stewart/pharos/
![Page 2: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/2.jpg)
Roadmap
• Part I: Overview of Social Applications
  – current shortcomings, solutions
• Part II: Information Extraction (IE)
  – tasks, techniques, tools
• Part III: Evaluation
• Part IV: IE & IR Applications in Context
![Page 3: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/3.jpg)
Overview of Social Applications
![Page 4: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/4.jpg)
The Social Applications Phenomenon
The social application phenomenon today is driven by social media.
Social Media:
• information content of the “citizen journalist”: user-generated content
• a popular way for people to connect in the online world, for personal & business relationships
20. April 2023Avaré Stewart4
![Page 5: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/5.jpg)
What's the Social Media Hype?
• Coverage:
  – Reach small or large audiences
  – Breaks publication barriers
• Business / Advertisement:
  – Repeated visiting: with the best links, readers will come back
• Information Gathering / Sharing:
  – Cut the time you spend looking
  – Link economy is real… Give some, get some
  – Dynamic content: not the endpoint of the conversation, but the beginning…
• Social Intervention / Detection:
  – Rumors, fads, infectious disease
Capitalize on social processes: diffusion / cascade
Source: The core concepts of social media, Espoo, April 2007
![Page 6: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/6.jpg)
The Many Faces of Social Applications
Domain:
• Music, politics, cycling, medicine
Media Type:
• Video: YouTube, Daily Motion
• Facebook
Services:
• meeting people
• expressing a point of view
• serendipitous discovery
![Page 7: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/7.jpg)
What Are Some Limitations of Social Applications?
![Page 8: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/8.jpg)
Social Networking Divide
Where's the “Social” Web?
The so-called Social Web is, ironically, divided.
Social sites intentionally seek distinction.
Problem: the sheer number of sites leads to redundancy and overlap in:
• types of media, resources
• topics
Overlaps exist, but remain untapped to the benefit of those who actually constitute the social networking ecosystem.
![Page 9: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/9.jpg)
Open Social Networking (OSN)
Aspects of an Open Social Network
• Unified Data Spaces
• Personal Identity Unification
• Unified Applications
![Page 10: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/10.jpg)
Unified Data Spaces: Linking Open Data Cloud
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
![Page 11: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/11.jpg)
Personal Identity Unification
• OpenID: a single digital
• Retaggr: social media profile card
• Geek Chart: graphical profile (pie chart)
• DandyID: collect online profiles in one place
• FriendFeed: real-time aggregator, consolidates the updates from sites
![Page 12: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/12.jpg)
Unified Applications
Multi-Site APIs: a common API for social applications across multiple websites
– OpenSocial
– Data Portability Project
Single-Site APIs: partner / interact programmatically
– YouTube Data API: videos
– Spinn3r: indexing the blogosphere
– etc.
![Page 13: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/13.jpg)
Social Network Divide: Pharos Scenario
Bloggers who don't tag ??? Taggers who don't blog
![Page 14: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/14.jpg)
Missing Link: Cross-Tagging
Avaré Bonaparte Stewart
Exploit the tag assertions made by users of one social site to personalize the experience for users in another, comparable site.
![Page 15: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/15.jpg)
Overview: Cross Tagging
• Better Recommendations
• Better Browsing
• Better Search
Ref: Cross-Tagging for Personalized Open Social Networking, Stewart, Diaz, Balby Marinho 2008
![Page 16: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/16.jpg)
What More Can We Do with Social
Applications?
![Page 17: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/17.jpg)
Social Media Communities & Content
Espoo, April 2007
Social media: examined primarily for its popularity in connecting people
In Pharos: examine blogs for improved, personalized information access
![Page 18: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/18.jpg)
Complex Information Needs & Social Media Search
• Polarity, opinion
• Memes and themes
• Related, multi-lingual resources
• Entities: people, organizations, etc.
• Relationships between entities
• Events: who, what, where, when, how
![Page 19: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/19.jpg)
Events ? ... Momentum is Shifting
• Industry:
  – Complex Event Processing (CEP)
  – Event correlation:
    • Event Filtering, Event Aggregation
    • Event Masking, Root Cause Analysis
• Research:
  – Event detection
  – Associations
  – De-duplication
Humans think in terms of events and entities
Events: a natural abstraction of the real world
![Page 20: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/20.jpg)
Information Retrieval, Meet Information Extraction ... from Blogs
• Information Extraction (IE):
  – a subarea of Natural Language Processing (NLP)
  – needed to solve complex (event-driven) information needs
  – hard, because natural language is complex, vague and ambiguous, i.e., unstructured
    • potentially harder for blogs & informal sources
[Diagram: IE, IR, Social Media]
![Page 21: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/21.jpg)
Anatomy of a Blog
Tag
Content
Permalink
Timestamp
Title
Feed
Blogroll
Comment
Trackback
Archive
Author
Rich Source for Personalized Information
![Page 22: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/22.jpg)
Part II: Information Extraction
Tasks, Techniques and Tools
![Page 23: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/23.jpg)
What is Information Extraction ?
![Page 24: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/24.jpg)
Unstructured Data
• Encoded in a way that makes it difficult for computers to interpret immediately
• Multiple languages, across multiple documents
![Page 25: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/25.jpg)
Why Information Extraction?
• Large amount of unstructured or semi-structured information
  – Web pages, email, news articles, call-center text records, business reports, annotations, spreadsheets, research papers, blogs, tags, instant messages (IM), …
• High-impact applications
  – Business intelligence, personal information management, Web communities, Web search and advertising, scientific data management, e-government, medical records management, …
• Open-ended and growing rapidly
• Information Extraction:
  – Superimpose formal meaning on unstructured information
  – Elicit facts and relationships
  – Feed a database / knowledge base
![Page 26: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/26.jpg)
Why? ... Information is Locked Away...
Inaccessible data ... growing, and sophisticated needs ... growing
Events, Facts, Relationships
![Page 27: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/27.jpg)
What is Information Extraction (IE) ?
• ... isolates relevant text fragments, extracts relevant information from the fragments, and pieces together the targeted information in a coherent framework
• ... build systems that find and link relevant information while ignoring extraneous and irrelevant information
  (Cowie and Lehnert, 1996, p. 81)
IE is used to pull specific, targeted information out of unstructured data
![Page 28: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/28.jpg)
Information Extraction: e.g., Disaster
Unstructured Text → Information Extraction (IE) System → Structured Text
![Page 29: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/29.jpg)
Information Extraction: Major Tasks
• Segmentation
  – Tokenization, Sentence Splitting
• Classification
  – POS Tagging, Lemmatization, Disambiguation, …
  – Entity Detection
• Association
  – Noun Phrase Chunking
  – Parsing
  – Relationship Detection
• Normalization & Deduplication
  – Anaphora Resolution
  – Normalization of Formats, Schema
  – Record Linkage, Record Deduplication
  – Mention Tracking
![Page 30: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/30.jpg)
What are the Components and Tasks
of an Information Extraction
System?
![Page 31: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/31.jpg)
General View of IE System
[Figure: IE system architecture with a training phase and a deployment phase. Components: INPUT: training corpus; INPUT: source text; Preprocessing (in both phases); Acquisition / Learning; Extraction Grammar; Extraction; Feedback; External Knowledge (Thesaurus, Ontology, Knowledge Base); OUTPUT: structured information.]
Source: Information Extraction, Moens 06
![Page 32: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/32.jpg)
Common IE Tasks: Preprocessing & Recognition
Pre-Processing Tasks
Normalization
Sentence Splitting
Tokenization
POS Tagging
Chunking
Parsing
Sense Disambiguation
Recognition Tasks
Named Entity (NE)
Co-reference Resolution (CO)
Template Element Construction (TE)
Template Relation Construction (TR)
Scenario Template (ST)
Semantic Role
Timex Line Recognition
![Page 33: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/33.jpg)
Ex: Text Normalization
AVIAN INFLUENZA, HUMAN (101): EGYPT, 79TH, 80TH CASES
*****************************************************
A ProMED-mail post
<http://www.promedmail.org>
ProMED-mail is a program of the
International Society for Infectious Diseases
http://www.isid.org
Date: Mon 8 Jun 2009
Source: Egyptian Chronicles [edited]
<http://egyptianchronicles.blogspot.com/2009/06/h5n1-follow-up-no80.html>

Clean junk formatting
• Text is transformed to make it consistent
• Performed before the text is processed
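A minimal sketch of such a cleanup step, assuming that "junk formatting" here means decorative asterisk rules, angle brackets around URLs, and irregular whitespace. The function name `normalize` and the chosen rules are illustrative assumptions, not the normalizer used in the lecture.

```python
import re

def normalize(raw: str) -> str:
    """Remove decorative separators, unwrap <URL> markers, collapse whitespace."""
    text = re.sub(r"\*{3,}", " ", raw)                   # runs of asterisks used as rules
    text = re.sub(r"<(https?://[^>\s]+)>", r"\1", text)  # <http://...> becomes http://...
    text = re.sub(r"\s+", " ", text)                     # newlines and repeated spaces
    return text.strip()

sample = (
    "EGYPT, 79TH, 80TH CASES\n"
    "*****\n"
    "A ProMED-mail post\n"
    "<http://www.promedmail.org>"
)
normalized = normalize(sample)
print(normalized)
```

Which rules belong in the normalizer depends on the source; a blog feed and a mailing-list post will need different ones.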
![Page 34: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/34.jpg)
Sentence Splitting
• Segments text into sentences
• Required for the tagger
• Domain- and application-independent
He called Mr. White at 4p.m. in Washington, D.C. Mr. Green responded.
The computer must tell which of the dots denote an actual sentence boundary
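One hedged way to illustrate the problem is a small rule-based splitter. It assumes two heuristics that are not from the slides: personal titles such as "Mr." never end a sentence, and any other token ending in a dot ends a sentence only when the next token starts with a capital letter. The `TITLES` set is a hypothetical stand-in for a real abbreviation lexicon.

```python
# Hypothetical list of titles that should never close a sentence.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr."}

def split_sentences(text):
    """Split whitespace-separated text into sentences using two simple heuristics."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith((".", "!", "?")):
            continue                      # not a boundary candidate
        if token in TITLES:
            continue                      # "Mr." is followed by a name, not a new sentence
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        if following == "" or following[0].isupper():
            sentences.append(" ".join(current))
            current = []
    if current:                           # keep any trailing fragment
        sentences.append(" ".join(current))
    return sentences

example = "He called Mr. White at 4p.m. in Washington, D.C. Mr. Green responded."
sentences = split_sentences(example)
print(sentences)
```

The heuristics work on this example but fail easily elsewhere (for instance, an abbreviation followed by a proper noun inside one sentence), which is why splitters are often trained as classifiers instead.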
![Page 35: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/35.jpg)
Tokenization
• Tokenization / Word Segmentation:
  – Numbers, punctuation, symbols
  – Is a word a string of contiguous alphanumeric characters with space on either side?
Words are not always surrounded by whitespace:
  Abbreviations such as etc. and Calif.
  A text-based medium.
Whitespace does not always indicate a word break:
  San Francisco
  Ditto: in spite of
  Phone: 0171 378 0647
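The cases above can be sketched with a small tokenizer. This is an illustration under stated assumptions: `MULTIWORD_UNITS` and `ABBREVIATIONS` are tiny hypothetical lexicons, and the underscore trick would mangle tokens that already contain underscores.

```python
import re

# Hypothetical lexicons; real systems use much larger resources.
MULTIWORD_UNITS = ["San Francisco", "in spite of"]
ABBREVIATIONS = {"etc.", "Calif."}

def tokenize(text):
    """Tokenize text, keeping abbreviations, hyphenated words and multiword units whole."""
    # Protect multiword units by joining their parts with underscores.
    for unit in MULTIWORD_UNITS:
        text = text.replace(unit, unit.replace(" ", "_"))
    tokens = []
    for chunk in text.split():
        if chunk in ABBREVIATIONS:
            tokens.append(chunk)          # keep the trailing dot with the abbreviation
            continue
        # A word (optionally hyphenated) or a single punctuation mark.
        tokens.extend(re.findall(r"\w+(?:-\w+)*|[^\w\s]", chunk))
    return [token.replace("_", " ") for token in tokens]

tokens = tokenize("He moved to San Francisco, Calif. in spite of a text-based medium.")
print(tokens)
```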
![Page 36: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/36.jpg)
Parts of Speech (POS)
• POS: category / class
• Words in the same class have similar syntactic behavior
• Ex: nouns name a person, place, thing, or animal
• Ex: verbs express action
![Page 37: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/37.jpg)
Ex: Penn Treebank POS Tagset

| Tag | Description | Example |
|---|---|---|
| CC | Coordinating conjunction | and, but, or |
| CD | Cardinal number | one, two |
| DT | Determiner | a, the |
| EX | Existential there | There |
| FW | Foreign word | Mea culpa |
| IN | Preposition / subordinating conjunction | of, in, by |
| JJ | Adjective | Yellow |
| JJR | Adjective, comparative | Bigger |
| JJS | Adjective, superlative | Wildest |
| LS | List item marker | 1, 2, One |
| MD | Modal | Can, should |
| NN | Noun, singular | Dog |
| NNS | Noun, plural | dogs |
| NNP | Proper noun, singular | IBM |
| NNPS | Proper noun, plural | West Indies |
| PDT | Predeterminer | All, both |
| POS | Possessive ending | 's |
| PRP | Personal pronoun | I, you, he |
| RB | Adverb | Quickly, never |
| RBR | Adverb, comparative | faster |
| RBS | Adverb, superlative | fastest |
| RP | Particle | Up, off |
| SYM | Symbol | +, %, & |
| TO | To | to |
| UH | Interjection | Ah, oops |
| VB | Verb, base form | eat |
| VBD | Verb, past tense | ate |
| VBG | Verb, gerund | eating |
| VBN | Verb, past participle | Eaten |
| VBP | Verb, non-3rd person | eat |
| VBZ | Verb, 3rd person | eats |
| WDT | Wh-determiner | Which, that |
| WP | Wh-pronoun | What, who |
| WP$ | Possessive wh-pronoun | whose |
| WRB | Wh-adverb | How, where |

Punctuation and symbol tags: $, #, (, )
![Page 38: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/38.jpg)
Chunking
• Words are organized into groups
• Phrases: word groupings, clumped as a unit
![Page 39: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/39.jpg)
Parsing
• Labeled syntactic tree corresponding to the interpretation of the sentence
• Resolution of syntactic ambiguities
![Page 40: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/40.jpg)
Sense Disambiguation
Fruit flies like a banana
Time flies like an arrow
![Page 41: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/41.jpg)
What are Some Basic Recognition Tasks?
![Page 42: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/42.jpg)
IE Recognition Tasks
MUC Recognition Tasks
Named Entity (NE)
Co-reference Resolution (CO)
Template Element Construction (TE)
Template Relation Construction (TR)
Scenario Template (ST)
ACE Recognition Tasks
Entity detection and tracking (EDT)
Relation detection and characterization (RDC)
Event detection and characterization (EDC)
Temporal expression detection (TERN)
[Timeline figure, event by year: MUC-1 (1987), MUC-2 (1989), MUC-3 (1991), MUC-4 (1992), MUC-5 (1993), MUC-6 (1995), MUC-7 (1998), ACE Pilot (1999), ACE (2002 ...), ACE + Text Analysis Conference (TAC) (2009)]
![Page 43: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/43.jpg)
Named Entity Recognition (NE)
• recognition of entity names:
  – people, organizations
  – place names
  – temporal expressions & numerical expressions
![Page 44: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/44.jpg)
Co-reference Resolution (CO)
• Identify chains of noun phrases that refer to the same object
• Scope:
  – Within a document
  – Across documents
• Example: John saw Mary. The girl was very beautiful; she wore a new red dress.
• Types:
  – Pronominal: ‘they’, ‘it’, ‘he’, ‘hers’, ‘themselves’, etc., which resolve to proper nouns, common nouns, or other pronouns
![Page 45: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/45.jpg)
Proper Noun Coreference
• Names of people, places, products and companies are referred to in many different variations:
  – Minnesota Mining and Manufacturing, 3M Corp., 3M
  – New York, New York City, NYC, N.Y.C
Ref: Coreference as a Foundation for Link Analysis over Free Text
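As a rough illustration of why name variants are hard, the sketch below matches two names when they are identical after removing dots, or when one is the initial-letter acronym of the other. This acronym heuristic is an assumption chosen for the example, not the method of the cited reference, and it deliberately shows a failure case.

```python
def acronym(name):
    """Build an acronym from the capitalized words of a name."""
    return "".join(word[0] for word in name.split() if word[0].isalpha() and word[0].isupper())

def is_alias(first, second):
    """Heuristic alias test: equal after removing dots, or acronym of the other."""
    a = first.replace(".", "").strip()
    b = second.replace(".", "").strip()
    return a == b or acronym(a) == b or acronym(b) == a

print(is_alias("New York City", "N.Y.C"))                    # acronym match
print(is_alias("NYC", "New York City"))                      # order does not matter
print(is_alias("3M", "Minnesota Mining and Manufacturing"))  # heuristic fails here
```

The last pair is a true alias that string heuristics cannot recover ("MMM" is not "3M"), which is where external knowledge such as a dictionary of company names is needed.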
![Page 46: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/46.jpg)
Other Coreference Types
• Apposition: noun phrases side by side, where one defines or modifies the other
  – John Smith, chairman of General Electric, resigned yesterday.
• Predicate nominal: a noun phrase is the main predicate of the sentence; the subject and the predicate nominal are connected by a linking verb (copula)
  – John is the finest juggler in the world.
![Page 47: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/47.jpg)
Template Element Construction (TE)
• Specified classes and attributes of entities:
  – person: name (name variants), title, nationality
  – description in the text
  – subtype
![Page 48: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/48.jpg)
Template Relation Construction (TR)
• Two-slot template representing a binary relation:
– e.g., employee_of, product_of, location_of
– pointers to template elements
Fei-Yu Xu 08
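A two-slot relation template with pointers to template elements can be sketched as a pair of small data classes. The class and field names (`TemplateElement`, `TemplateRelation`, `element_id`, and so on) are hypothetical and chosen only for this illustration; MUC defines its own template formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TemplateElement:
    """An entity with a few attributes (illustrative subset)."""
    element_id: str
    entity_type: str
    name: str

@dataclass(frozen=True)
class TemplateRelation:
    """A binary relation whose two slots point to template elements."""
    relation: str
    arg1: TemplateElement
    arg2: TemplateElement

person = TemplateElement("PERSON-1", "person", "John Smith")
company = TemplateElement("ORGANIZATION-1", "organization", "General Electric")
employment = TemplateRelation("employee_of", person, company)

print(f"{employment.relation}({employment.arg1.name}, {employment.arg2.name})")
```

Keeping the slots as references rather than copied strings means that a later correction to an element (for example, a name variant) is visible in every relation that points to it.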
![Page 49: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/49.jpg)
Scenario Template Production (ST)
• information involving several relations or events:
– Joint venture
– Partners
– Products
– Profits
Fei-Yu Xu 08
![Page 50: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/50.jpg)
Can We Extract Temporal Expressions?
![Page 51: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/51.jpg)
Temporal expression detection (TERN)
• Time Expression Recognition and Normalization
  – recognize and normalize expressions that refer to date and time
  – Timestamp of events
  – Meaning of temporal expressions
  – Conditions associating time with a relation / event
• TIMEX2 Standard
  – XML tags + time
  – second generation of TIMEX
![Page 52: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/52.jpg)
Some Examples: TIMEX2 Time
Thursday, July 15, 1999 (reference date)

Precise Time:
I was sick <TIMEX2 VAL="1999-07-14"> yesterday </TIMEX2>.
The contractor submitted a proposal on <TIMEX2 VAL="1999-07-13"> Tuesday </TIMEX2>.

Duration:
I will be on vacation for <TIMEX2 VAL="P3W" ANCHOR_DIR="AFTER" ANCHOR_VAL="1999-07-15"> three weeks </TIMEX2>.

Pronouns:
<TIMEX2 VAL="1999-07-14"> The day after <TIMEX2 VAL="1999-07-13"> that </TIMEX2> </TIMEX2>, the contract was awarded.
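The normalization step for a few of these expressions can be sketched as below. This is a toy under stated assumptions: only "yesterday", bare weekday names (read as the most recent such day before the reference date) and simple "number unit" durations are handled, and the function name `timex_val` and its lookup tables are hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}

def timex_val(expression, reference):
    """Return a VAL-style string for a temporal expression, or None if unsupported."""
    text = expression.lower().strip()
    if text == "yesterday":
        return (reference - timedelta(days=1)).isoformat()
    if text in WEEKDAYS:
        # Assumption: a bare weekday refers to the most recent one before the reference date.
        days_back = (reference.weekday() - WEEKDAYS.index(text)) % 7 or 7
        return (reference - timedelta(days=days_back)).isoformat()
    words = text.split()
    if len(words) == 2 and words[0] in NUMBER_WORDS:
        unit = UNIT_CODES.get(words[1].rstrip("s"))
        if unit:
            return f"P{NUMBER_WORDS[words[0]]}{unit}"   # ISO 8601 style duration
    return None

reference_date = date(1999, 7, 15)
print(timex_val("yesterday", reference_date))
print(timex_val("Tuesday", reference_date))
print(timex_val("three weeks", reference_date))
```

Anaphoric expressions such as "that" cannot be resolved from the expression alone; they need the value of an earlier expression in the discourse, so the sketch returns `None` for them.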
![Page 53: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/53.jpg)
State of the Art Performance
• Named entity recognition
  – Person, Location, Organization, …
  – F1 in the high 80s or low to mid 90s
• Binary relation extraction
  – Contained-in (Location1, Location2), Member-of (Person1, Organization1)
  – F1 in the 60s, 70s or 80s
• N-ary relation extraction, event detection
  – Much lower: errors accumulate!
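A back-of-the-envelope sketch of why errors accumulate, assuming (as a simplification not stated on the slide) that an event needs several extraction steps to all be correct and that their errors are independent, so the per-step accuracies multiply.

```python
def chained_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Probability that all steps are correct, assuming independent errors."""
    return per_step_accuracy ** steps

# With 90% accuracy per step, the combined accuracy drops quickly.
for steps in (1, 2, 4):
    print(steps, round(chained_accuracy(0.9, steps), 4))
```

Under this assumption, four steps at 0.9 each already fall to roughly 0.66, which is in line with the lower scores reported for relation and event extraction.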
![Page 54: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/54.jpg)
How Can Information Extraction Be Performed?
![Page 55: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/55.jpg)
Common IE Techniques
• Knowledge Engineering
• Corpus Based / Machine Learning
![Page 56: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/56.jpg)
Classification for IE
• Many problems in IE can be reformulated as classification problems
• Features: object description, context
• Class: the category to which the object belongs
• Input: training data
• Classifier: learning algorithm
• Output: a hypothesis that fits the data
![Page 57: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/57.jpg)
Classification Scheme
• The class / semantic distinction that we want to assign to an information unit:
  – Named Entity: protein, drug, disease
  – Semantic Role: e.g., for a verb: agent
  – Grammatical Role: object, subject
  – Domain Independent: person, organization
  – Sentence boundary: {!, ., -}
![Page 58: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/58.jpg)
Ex: Features for Semantic Role Recognition

| Feature | Value |
|---|---|
| Phrase type | Noun / verb phrase, determined by the POS tag of the syntactic head |
| Syntactic head | Word that composes the syntactic head of the phrase that represents i |
| Voice | Active or passive |
| Named entity class | Class (person, organization, …) of the syntactic head |

The actual set of features used is determined by a feature selection strategy, specific to the problem at hand (Moens06).
![Page 59: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/59.jpg)
Ex: Features for Coreference Resolution (CO)

| Feature | Value |
|---|---|
| Number agreement | True if i and j agree in number |
| Gender agreement | True if i and j agree in gender |
| Alias | True if i is an alias of j, or vice versa |
| Pronoun i (j) | True if i (j) is a pronoun |
| Appositive | True if j is an appositive of i |
| Definiteness | True if j is preceded by „the“ or a demonstrative pronoun |
| Grammatical role | True if the grammatical roles of i and j match, e.g., subject, direct / indirect object |
| Proper name | True if both are proper names |
| Named entity class | True if both have the same semantic class |
| Discourse distance | Number of sentences or words that i and j are apart |

Moens06
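A minimal sketch of how a few of these pairwise features could be computed for two mentions i and j before being passed to a classifier. The `Mention` fields and the `pair_features` function are hypothetical; in practice the attribute values would come from earlier pipeline stages (POS tagging, NER, parsing).

```python
from dataclasses import dataclass

@dataclass
class Mention:
    """A candidate mention with attributes assumed to come from preprocessing."""
    text: str
    number: str          # "sg" or "pl"
    gender: str          # "m", "f", "n" or "unknown"
    is_pronoun: bool
    is_proper: bool
    sentence_index: int

DEFINITE_MARKERS = ("the ", "this ", "that ", "these ", "those ")

def pair_features(i: Mention, j: Mention) -> dict:
    """Compute a subset of the coreference features for the mention pair (i, j)."""
    return {
        "number_agreement": i.number == j.number,
        "gender_agreement": i.gender == j.gender,
        "i_pronoun": i.is_pronoun,
        "j_pronoun": j.is_pronoun,
        "definiteness": j.text.lower().startswith(DEFINITE_MARKERS),
        "proper_name": i.is_proper and j.is_proper,
        "sentence_distance": abs(j.sentence_index - i.sentence_index),
    }

mary = Mention("Mary", "sg", "f", False, True, 0)
girl = Mention("The girl", "sg", "f", False, False, 1)
features = pair_features(mary, girl)
print(features)
```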
![Page 60: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/60.jpg)
Do It Yourself: IE Task
• A sample of text from the Wall Street Journal is given, together with a template
• The task is to fill the template with information about succession events extracted from the text
• There are six events in total, although complete information is not available for all of them
Text:
New York Times Co. named Russell T. Lewis,
45, president and general manager of its
flagship New York Times newspaper,
responsible for all business-side
activities.
He was executive vice president and deputy
general manager. He succeeds Lance R.
Primis, who in September was named president
and chief operating officer of the parent.
Template:
<ORGANIZATION-1>
NAME : "New York Times Co."
<ORGANIZATION-2>
NAME : "New York Times"
<PERSON-1>
NAME : "Russell T. Lewis"
<PERSON-2>
NAME : "Lance R. Primis"
http://gate.ac.uk/ie/ie_example.html
![Page 61: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/61.jpg)
Some Techniques : At a Glance
![Page 62: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/62.jpg)
What Tools Can I Use to Perform Information Extraction?
![Page 63: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/63.jpg)
An IE Toolkit: Lexical Resources
Machine-readable corpora, dictionaries, etc., and tools for processing them
• Resource types: Ontology, Treebank, Dictionary
• Resources: Brown, Penn Treebank, WordNet, VerbNet, Comlex, UMLS, GENIA, BCO, Open Biomedical Ontology, Linguistic Data Consortium (LDC)
• Tools: Parser, NER Tagger, GATE, UIMA
![Page 64: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/64.jpg)
Part III: Evaluation in Information Extraction
![Page 65: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/65.jpg)
Evaluation
• We evaluate our systems to:
  – See how they behave w.r.t. a gold standard
  – Compare them with other systems
• Types of evaluation:
  – Intrinsic: specific to the extraction task
  – Extrinsic: on a task that relies on extraction, e.g., an Information Retrieval task
![Page 66: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/66.jpg)
Evaluation: Precision / Recall

|  | Expert Yes | Expert No |
|---|---|---|
| System Yes | TP | FP |
| System No | FN | TN |

• Recall = TP / (TP + FN): fraction of correct/relevant answers which are predicted
• Precision = TP / (TP + FP): fraction of predictions which are correct/relevant
• Fall-out = FP / (FP + TN): proportion of incorrect class members that the system predicts anyway, out of all incorrect class members (i.e., Expert No)
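The three ratios can be computed directly from the four cells of the table. The counts used below are made-up numbers for illustration only, and returning 0.0 for an empty denominator is a convention chosen for this sketch.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of expert-approved items that the system found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp: int, fp: int) -> float:
    """Fraction of system predictions that the expert approves."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def fall_out(fp: int, tn: int) -> float:
    """Fraction of expert-rejected items that the system predicted anyway."""
    return fp / (fp + tn) if (fp + tn) else 0.0

# Hypothetical counts: TP=8, FP=2, FN=4, TN=86
print(precision(8, 2), recall(8, 4), fall_out(2, 86))
```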
![Page 67: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/67.jpg)
F Measure
Combined measure of precision and recall:

$$F = \frac{(B^2 + 1)\,P\,R}{B^2 P + R}$$

where P = precision, R = recall, and B = a factor that indicates the relative importance of recall and precision.
When B = 1, recall and precision are of equal importance, and F is their harmonic mean (the F1-measure).
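A short sketch of the F measure as a function of precision, recall and the weighting factor B. The parameter name `b` and the sample values are illustrative; returning 0.0 when both inputs are zero is a convention chosen here.

```python
def f_measure(p: float, r: float, b: float = 1.0) -> float:
    """F = (B^2 + 1) * P * R / (B^2 * P + R); with b = 1 this is the harmonic mean."""
    denominator = b * b * p + r
    return (b * b + 1) * p * r / denominator if denominator else 0.0

print(f_measure(0.8, 0.5))         # F1: harmonic mean of 0.8 and 0.5
print(f_measure(0.8, 0.5, b=2.0))  # b > 1 weights recall more heavily
```

With b = 2 the score moves toward the (lower) recall value, which shows how the factor shifts the balance between the two measures.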
![Page 68: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/68.jpg)
What Other Types of Metrics Exist Besides Precision and Recall?
![Page 69: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/69.jpg)
Vilain Metric: Pronominal Coreference
Example: John saw Mary. He thought she was a very beautiful girl and she wore a new red dress.
• Equivalence class evaluation
  – Groups built by the system are compared against the gold standard (Key)
  – Compare the equivalence classes defined by the links in the Key and by the computed values (Response)
Coreference chains: {Mary, girl, she}, {he, John}
Ref: A Model-Theoretic Coreference Scoring Scheme, Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, Lynette Hirschman
![Page 70: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/70.jpg)
Vilain Recall: Concepts
Key links: <A-B, B-C>
Response links: { (A-C) }
• S: an equivalence class relative to the Key; S = {A, B, C}, where |S| = 3
• p(S): the partition of S induced by the Response
  – intersections of S with the Response classes
  – elements in the Key but not in the Response (as singletons)
  – p(S) = { (A-C), (B) }, so |p(S)| = 2
• c(S): minimal number of "correct links" needed to generate S; c(S) = (|S| - 1) = 2
• m(S): number of "missing" Response links; m(S) = (|p(S)| - 1) = 1
![Page 71: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/71.jpg)
Vilain: Recall / Precision
[Figure: Key equivalence classes compared with Response equivalence classes, for Recall and for Precision]
Precision: links added to the Key
Recall: links added to the Response
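Using the quantities c(S) and m(S) from the concepts slide, recall can be written as the sum of c(S) - m(S) over all Key classes divided by the sum of c(S), which simplifies to (|S| - |p(S)|) / (|S| - 1) per class. The sketch below computes it for sets of mentions; precision is obtained by swapping the roles of Key and Response, as in the Vilain et al. scheme. Function names are illustrative.

```python
def partition_size(key_class, response_classes):
    """|p(S)|: number of parts the Response splits the Key class S into."""
    parts = set()
    for mention in key_class:
        owner = next(
            (index for index, cls in enumerate(response_classes) if mention in cls),
            None,
        )
        # Mentions missing from the Response count as singleton parts.
        parts.add(("singleton", mention) if owner is None else ("class", owner))
    return len(parts)

def vilain_recall(key_classes, response_classes):
    """Sum of (|S| - |p(S)|) over Key classes, divided by the sum of (|S| - 1)."""
    numerator = sum(len(s) - partition_size(s, response_classes) for s in key_classes)
    denominator = sum(len(s) - 1 for s in key_classes)
    return numerator / denominator if denominator else 0.0

def vilain_precision(key_classes, response_classes):
    """Precision is recall with the roles of Key and Response exchanged."""
    return vilain_recall(response_classes, key_classes)

key = [{"A", "B", "C"}]      # Key links A-B, B-C
response = [{"A", "C"}]      # Response link A-C
recall_value = vilain_recall(key, response)
precision_value = vilain_precision(key, response)
print(recall_value, precision_value)
```

For the slide's example this gives a recall of (3 - 2) / (3 - 1) = 0.5: one link is missing from the Response. The single Response link is consistent with the Key, so precision is 1.0.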
![Page 72: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/72.jpg)
Do It Yourself: Vilain Metric
![Page 73: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/73.jpg)
Part IV: Exploiting Information Extraction with IR in Social
Applications
![Page 74: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/74.jpg)
IE in Context
[Figure: Document collection → Spider → Filter by relevance → IE (Segment, Classify, Associate, Cluster) → Load DB → Database → Query, Search; Data mine. Supporting steps: Create ontology; Label training data; Train extraction models.]
![Page 75: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/75.jpg)
What does an Entity Extraction Scenario
Look Like?
![Page 76: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/76.jpg)
Scenario I: OKKAM tackling the Flood of Identifiers
http://en.wikipedia.org/wiki/Barack_obama
http://dbpedia.org/resource/Barack_Obama
http://www.linkedin.com/in/barackobama
http://farm4.static.flickr.com/3193/2437394249_824e76ed76.jpg?v=0
http://current.com/index.php/items/89822170/obama_to_sign_stimulus_bill_today_in_denver.htm
http://www.facebook.com/home.php#/barackobama?ref=s
http://www.reuters.com/news/globalcoverage/barackobama
http://www.OPENCALAIS.com/watch?v=z4W2_raF_iw
??
OKKAM & Information Extraction 79
![Page 77: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/77.jpg)
Information Extraction & OKKAMization
• NER: detect the named entity and decide on its type (e.g. Person)
• Send an ID request to the OKKAM ENS (based on entity name, type + context information)
• OKKAM ENS returns an OKKAM ID (or a list of candidates)
• Attach the ID to the entity reference in the text:
http://www.okkam.org/ens/idb3016709-b9e1-42c0-ac5f-6383d2e5b235
=> prepare for information integration, entity-centric search, semantic infusion (attachment of information about the entity)
http://www.okkam.org/
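The NER-to-ID flow on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the OKKAM API: `FAKE_ENS`, `request_ids`, `okkamize`, and the `okkam:example-id-…` identifiers are hypothetical stand-ins, and the lookup table replaces the real request to the ENS. The mentions are assumed to come from an earlier NER step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Mention:
    """An entity reference found in text by a preceding NER step."""
    text: str
    etype: str                      # entity type decided by NER, e.g. "Person"
    okkam_id: Optional[str] = None  # filled in when the lookup is unambiguous
    candidates: List[str] = field(default_factory=list)


# Hypothetical stand-in for the ENS: (lowercased name, type) -> candidate ids.
FAKE_ENS: Dict[Tuple[str, str], List[str]] = {
    ("barack obama", "Person"): ["okkam:example-id-1"],
    ("paris", "Location"): ["okkam:example-id-2", "okkam:example-id-3"],
}


def request_ids(name: str, etype: str, context: str = "") -> List[str]:
    """Simulated ID request based on entity name, type and context.

    The context is accepted but ignored by this stub; a real service could
    use it to rank or filter candidates.
    """
    return list(FAKE_ENS.get((name.lower(), etype), []))


def okkamize(mentions: List[Mention], context: str = "") -> List[Mention]:
    """Attach an id when exactly one comes back, else keep the candidate list."""
    for m in mentions:
        found = request_ids(m.text, m.etype, context)
        if len(found) == 1:
            m.okkam_id = found[0]
        else:
            m.candidates = found
    return mentions


mentions = okkamize([
    Mention("Barack Obama", "Person"),
    Mention("Paris", "Location"),
    Mention("Denver", "Location"),
])
for m in mentions:
    print(m.text, m.okkam_id, m.candidates)
```

Keeping ambiguous results as a candidate list, rather than picking the first one, mirrors the "(or list of candidates)" branch on the slide and leaves disambiguation to a later step.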
![Page 78: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/78.jpg)
What Does an Event Extraction Scenario
Look Like?
![Page 79: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/79.jpg)
Scenario II: Epidemic Intelligence
Goal: early identification of potential health threats:
• verification, assessment, investigation
State of the Art: Event-Based
• web data
• NLP, Data Mining, Machine Learning techniques
• extract epidemic events from unstructured text
• News, domain-specific reports, blogs
online news
![Page 80: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/80.jpg)
Event Mining for Early Detection, Rapid Response ...
![Page 81: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/81.jpg)
How Can Events Be Used in Pharos Audio-
Visual Search?
![Page 82: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/82.jpg)
Scenario III: Facets in Pharos
• Event-Centric Search / Browsing
– Document representation no longer Bag-of-Words
– Events => N-ary relations between entities or classes
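The contrast between a bag-of-words document and an event as an n-ary relation can be illustrated with a small data-structure sketch. Everything here is an assumption made for illustration: the class names (`Entity`, `Event`), the role labels (`agent`, `object`, `place`) and the sample sentence are not part of Pharos.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    name: str
    etype: str  # entity class, e.g. "Person" or "Location"


@dataclass
class Event:
    """An event as an n-ary relation: one event type, any number of roles."""
    etype: str
    args: Dict[str, Entity] = field(default_factory=dict)  # role -> entity


def bag_of_words(text: str) -> Counter:
    """Classic representation: term counts only, no roles or relations."""
    return Counter(text.lower().split())


def filter_events(events: List[Event], role: str, name: str) -> List[Event]:
    """Facet-style filter: keep events whose given role is filled by `name`."""
    return [e for e in events if role in e.args and e.args[role].name == name]


text = "Obama signs stimulus bill in Denver"
bow = bag_of_words(text)

events = [
    Event("Signing", {
        "agent": Entity("Obama", "Person"),
        "object": Entity("stimulus bill", "Document"),
        "place": Entity("Denver", "Location"),
    }),
    Event("Outbreak", {
        "disease": Entity("influenza", "Disease"),
        "place": Entity("Hannover", "Location"),
    }),
]

in_denver = filter_events(events, "place", "Denver")
print(bow["denver"], [e.etype for e in in_denver])
```

The bag-of-words view can only say that "denver" occurs in the text, while the event view can answer which events have Denver in the place role, which is what a facet in event-centric browsing needs.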
![Page 83: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/83.jpg)
Scenario III: Extraction from Informal Text
• Transcribed Speech
– Discourse structure of “Speech Text” differs from written text
– Transcription errors
– Missing orthographic features
• Sentence boundaries difficult to detect
• Automatic Speech Recognition (ASR) vocabulary problem
• Blogs
– Affective, opinionated
– Topic fluctuating, prose
– Many authors, different styles
– Inconsistent capitalization patterns
– Malformed sentences & phrases, slang, ...
![Page 84: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/84.jpg)
• Part V: Wrap Up & Conclusion
![Page 85: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/85.jpg)
What Considerations Do I Need to Make for
My Information Extraction System?
![Page 86: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/86.jpg)
Considerations for an IE System

| Description | Dimension |
|---|---|
| document structure of the input text | • free text • semi-structured |
| richness of the natural language processing (NLP) | • shallow NLP • deep NLP |
| complexity of the pattern rules | • single slot • multiple slots |
| data size | • training data • application data |
| degree of automation | • supervised • semi-supervised • unsupervised |
| type of evaluation | • gold standard corpus? • evaluation measures used? • evaluation of machine learning |
![Page 87: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/87.jpg)
What Are Some Important Directions
in Information Extraction?
![Page 88: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/88.jpg)
Research Trends in IE
| Concept | Description |
|---|---|
| [1] Semi-/Unsupervised, Self-Learning | Supervised methods assume: • annotated documents • broad coverage • sufficient data redundancy |
| [2] Open Information Extraction | • Target relations not known in advance |
| [3] Web-Scale Systems | • Number of relations is large |
![Page 89: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/89.jpg)
Research Trends in IE
• Self-supervised Information Extraction at Web Scale
– KnowItAll: Extracting closed set of relations [Etzioni 2005]
– TextRunner: Extracting open set of relations [Banko 2007]
– Open IE: The Tradeoffs Between Open and Traditional Relation Extraction [Banko 2008]
– SRES [Feldman 2006], LEILA [Suchanek 2006]: Extracting closed relation set with more elaborate linguistic preprocessing
Scalability:
• Large set of seed relations (e.g. entire IMDB)
• Open-ended corpora
Noise: Incorrect seed interpretations
![Page 90: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/90.jpg)
In Summary ....
![Page 91: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/91.jpg)
Information is No Longer Locked Away...
Events, Facts, Relationships, Opinions
Social Application Integration
![Page 92: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/92.jpg)
IR and IE Tradeoffs
• IE needs more CPU power than IR; a suitable tradeoff is needed between data size, analysis depth, complexity, time, etc.
• Deeper analysis and complex template structures consume more time than shallow analysis, simple named entity recognition or binary relation extraction
• Ease of use needs improvement
![Page 93: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/93.jpg)
… Lighting the Way …
IE is acknowledged as an urgently needed information technology in a constantly growing digitized world
Winners in the globalized information society?
… Those who outstrip competitors through comprehensive, integrated and precise access to digital information for decision-making processes!
![Page 94: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/94.jpg)
Thank You
![Page 95: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/95.jpg)
Useful Tools
• ANNIE: Information Extraction System
– http://gate.ac.uk/ie/annie.html
• Stanford Parser
– http://nlp.stanford.edu:8080/parser/
• What's Wrong With My NLP?
– http://code.google.com/p/whatswrong/
• LingPipe
– http://alias-i.com/lingpipe/html
– http://www-nlp.stanford.edu/downloads/
![Page 96: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/96.jpg)
Useful Links
• Software Tools for NLP
– http://www-a2k.is.tokushima-u.ac.jp/member/kita/NLP/nlp_tools.html
• Statistical NLP / corpus-based computational linguistics resources
– http://nlp.stanford.edu/links/statnlp.html
• Stanford NLP Group
– http://www-nlp.stanford.edu/downloads/
• Linguist List - Language and Resources
– http://www.linguistlist.org/langres/index.html
![Page 97: Pharos Summer School Fundamentals of Social Applications](https://reader034.vdocuments.mx/reader034/viewer/2022042703/56813fce550346895daaaad3/html5/thumbnails/97.jpg)
Selected References
• Foundations of Statistical Natural Language Processing, Manning and Schütze
• Information Extraction, Moens
• Text Mining Handbook, Feldman, Sanger
• Maximum Entropy Model for NLP, Ratnaparkhi