TDT 2003 Evaluation Workshop, NIST, November 17-18, 2003
Creating the Annotated TDT-4 Y2003 Evaluation Corpus
Stephanie Strassel, Meghan Glenn
Linguistic Data Consortium - University of Pennsylvania
{strassel, mlglenn}@ldc.upenn.edu
Data Collection/Preparation

Collection: multiple sources and languages, October 2000 – July 2001

TDT-4 Corpus V1.0 (Arabic, Chinese, English only; October 2000 – January 2001)
Collection subsampled for annotation
• Goal: reduce licensing, transcription, and segmentation costs
• Broadcast sources: select 4 of 7 or 3 of 5 days, staggering the selection to maximize coverage by day
• Newswire sources: sampling consistent with previous years; no down-sampling of Arabic newswire
Reference transcripts
• Closed-caption text where available; commercial transcription agencies otherwise
• Spell-check names in English commercial transcripts
• Provide initial story boundaries and timestamps
ASR output and machine translation also provided

TDT-4 Corpus V1.1
• Incorporates patches to Mandarin ASR data to fix encoding; removes empty files
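A staggered select-4-of-7-days scheme like the broadcast subsampling described above can be sketched as follows. The slides do not specify the exact rotation rule, so the per-source, per-week offset here is an assumed round-robin scheme, and all names are illustrative:

```python
def staggered_days(sources, weeks, keep=4, cycle=7):
    """Pick `keep` of `cycle` broadcast days per source and week,
    rotating the window start across sources and weeks so that,
    taken together, the sources cover as many days as possible.
    (Assumed round-robin rule; the actual LDC schedule may differ.)"""
    schedule = {}
    for i, src in enumerate(sources):
        weekly = []
        for w in range(weeks):
            start = (i + w * keep) % cycle  # stagger by source index and week
            weekly.append(sorted((start + d) % cycle for d in range(keep)))
        schedule[src] = weekly
    return schedule

sched = staggered_days(["CNN", "ABC", "NBC"], weeks=1)
# e.g. CNN keeps days [0, 1, 2, 3], ABC [1, 2, 3, 4], NBC [2, 3, 4, 5]
```

With three sources staggered this way, six of the seven weekdays get coverage from at least one source, which is the point of offsetting the windows.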
TDT-4 Corpus Overview
Arabic
  source      type         # docs   audio (hrs)   reference transcripts
  AFA         Newswire      19126   ---           ---
  ALH         Newswire      10656   ---           ---
  ANN         Newswire       9682   ---           ---
  VAR         Radio          2378   68            commercial
  NTV (Web)   Television      871   20.5          commercial

English
  APW         Newswire      10268   ---           ---
  NYT         Newswire       4842   ---           ---
  VOA         Radio          2694   70            commercial + spell check
  PRI         Radio          1965   62            commercial + spell check
  CNN         Television     4698   64.5          closed-caption
  ABC         Television     1692   38.5          closed-caption
  NBC         Television     1234   35            closed-caption
  MNB         Television      997   43            closed-caption

Mandarin
  XIN         Newswire       9837   ---           ---
  ZBN         Newswire       8114   ---           ---
  VOM         Radio          1780   64            commercial
  CNR (Web)   Radio          2259   43            commercial
  CTV         Television     1483   32.5          commercial
  CTS (Web)   Television     2221   44            commercial
  CBS (Web)   Television     1451   34            commercial

ASR and machine translation providers
  language    ASR      machine translation
  Arabic      BBN      IBM TJ Watson Research Center
  English     LIMSI    N/A
  Mandarin    BBN      Systran (run at LDC)
TDT Concepts

STORY
• In TDT2, a story is "a section containing at least two independent declarative clauses on the same topic"
• In TDT3, the definition was modified to capture annotators' intuitions about what constitutes a story, distinguishing a "preview/teaser" from a complete news story
• TDT4 preserves this content-based story definition, with greater emphasis on consistent application of the definition among the annotation crew

EVENT
• A specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences

TOPIC
• An event or activity along with all directly related events and activities
Topics for 2003

40 new topics selected, defined, and annotated for the 2003 evaluation
• 20 from Arabic seed stories; 10 each from Mandarin and English
Topic selection strategy same as in 2002
Arabic topics are somewhat different
• Despite the same selection strategy
• First time we've had Arabic seed stories
The "topic well" is running dry
• 80 news topics with a high likelihood of cross-language hits, all drawn from a 4-month span!
Selection Strategy

Team leaders examine a randomly-selected seed story
• Potential seeds balanced across the corpus (source/date/language)
Identify a TDT-style seminal event within the story
Apply a rule of interpretation to convert the event to a topic
• 13 rules state, for each type of seminal event, what other types of events should be considered related
No requirement that selected topics have cross-language hits
• But team leaders use knowledge of the corpus to select stories likely to produce hits in other-language sources
Handful of "easily confusable" topics
Rules of Interpretation
1. Elections, e.g., 30030: Taipei Mayoral Elections
Seminal events include: a specific political campaign, election day coverage, inauguration, voter turnouts, election results, protests, reaction.
Topic includes: the entire process, from announcements of a candidate's intention to run through the campaign, nominations, election process and through the inauguration and formation of a newly-elected official's cabinet or government.
2. Scandals/Hearings, e.g., 30038: Olympic Bribery Scandal
3. Legal/Criminal Cases, e.g., 30003: Pinochet Trial
4. Natural Disasters, e.g., 30002: Hurricane Mitch
5. Accidents, e.g., 30014: Nigerian Gas Line Fire
6. Acts of Violence or War, e.g., 30034: Indonesia/East Timor Conflict
7. Science and Discovery News, e.g., 31019: AIDS Vaccine Testing Begins
8. Financial News, e.g., 30033: Euro Introduced
9. New Laws, e.g., 30009: Anti-Doping Proposals
10. Sports News, e.g., 31016: ATP Tennis Tournament
11. Political and Diplomatic Meetings, e.g., 30018: Tony Blair Visits China
12. Celebrity/Human Interest News, e.g., 31036: Joe DiMaggio Illness
13. Miscellaneous News, e.g., 31024: South Africa to Buy $5 Billion in Weapons
Topic Research

• Provides context
• Annotators specialize in particular topics (of their choosing)
• Includes timelines, maps, keywords, named entities, and links to online resources for each topic
• Feeds into annotation queries
Topic Definition

• Fixed format to enhance consistency
• Seminal event lists the basic facts: who/what/when/where
• Topic explication spells out the scope of the topic and potential difficulties
• Rule of interpretation link
• Link to additional resources
• Feeds directly into topic annotation
Annotation Strategy Overview

Search-guided complete annotation
• Work with one topic at a time
• Multiple stages for each topic; multiple iterations of each stage
Two-way topic labeling decision
• YES: story discusses the topic in a substantial way
• NO: story does not discuss the topic at all, or only mentions it in passing without giving any information about it
• No BRIEF label in TDT-4
"Not Easy" label for tricky decisions
• Triggers additional QC
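The two-way labeling scheme with its "Not Easy" QC flag might be represented as follows; this is only a sketch, and the field names and story-ID format are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class TopicLabel:
    """One two-way topic judgment. TDT-4 drops the BRIEF label;
    `not_easy` flags tricky decisions for additional QC."""
    story_id: str       # hypothetical ID format
    topic_id: int
    on_topic: bool      # True = YES, False = NO
    not_easy: bool = False

def needs_extra_qc(label: TopicLabel) -> bool:
    """'Not Easy' judgments trigger the additional QC pass."""
    return label.not_easy
```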
Annotation Search Stages

Stage 1: Initial query
• Submit the seed story or keywords as a query to the search engine
• Read through the resulting relevance-ranked list, labeling each story YES/NO
• Stop after finding 5-10 on-topic stories, or after reaching the "off-topic threshold":
  at least 2 off-topic stories for every 1 on-topic story read AND the last 10 consecutive stories all off-topic
Stage 2: Improved query using on-topic stories from Stage 1
• Issue a new query using the concatenation of all known on-topic stories
• Read and annotate stories in the resulting relevance-ranked list until reaching the off-topic threshold
Stage 3: Text-based queries
• Issue a new query drawn from the topic research and topic definition documents, plus any additional relevant text
• Read and annotate stories in the resulting relevance-ranked list until reaching the off-topic threshold
Stage 4: Creative searching
• Annotators are instructed to use specialized knowledge and think creatively to find novel ways of identifying additional on-topic stories
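A minimal sketch of the Stage 1 stopping rule in Python (assuming the lower bound of the 5-10 on-topic range as the cutoff; the parameter names are illustrative, not from the annotation guidelines):

```python
def should_stop(labels, min_on_topic=5, ratio=2, tail=10):
    """Stage 1 stopping rule. `labels` holds the YES/NO judgments
    made so far down the relevance-ranked list (True = on-topic).
    Stop after finding enough on-topic stories, or once the
    off-topic threshold is met: at least `ratio` off-topic stories
    per on-topic story read AND the last `tail` consecutive
    stories all off-topic."""
    on_topic = sum(labels)
    off_topic = len(labels) - on_topic
    if on_topic >= min_on_topic:
        return True
    ratio_met = off_topic >= ratio * on_topic
    tail_met = len(labels) >= tail and not any(labels[-tail:])
    return ratio_met and tail_met
```

For example, `should_stop([True, False, False])` is False (the ratio is met but the ten-story off-topic tail is not), while ten straight NO judgments do stop the search.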
Additional Annotation & QC

Top-Ranked Off-Topic Stories (TROTS)
• Define the search epoch: the first 4 on-topic stories, chronologically sorted
• Find two highly-ranked off-topic documents for each topic-language pair
Precision
• All on-topic (YES) stories reviewed by a senior annotator to identify false alarms
• All "not easy" off-topic stories reviewed
Adjudication
• Review pooled site results and adjudicate cases of disagreement with LDC annotators' judgments
• Pooled 3 sites' tracking results; reviewed all purported LDC false alarms
• For purported LDC misses:
  • English and Arabic: reviewed cases where all 3 sites disagreed with LDC
  • Mandarin: reviewed cases where 2 or more sites disagreed with LDC
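The per-language miss-review thresholds can be expressed as a small predicate; this is a sketch of the decision rule only, and the function and parameter names are illustrative:

```python
def review_miss(language, sites_disagreeing, total_sites=3):
    """Should a purported LDC miss be manually adjudicated?
    English/Arabic: only when all pooled sites disagreed with LDC;
    Mandarin: when two or more sites disagreed."""
    if language in ("English", "Arabic"):
        return sites_disagreeing == total_sites
    if language == "Mandarin":
        return sites_disagreeing >= 2
    raise ValueError(f"unexpected language: {language!r}")
```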