Cutting Edge Technology
used in ePADD
RBMS - June 25, 2015
Peter Chan, Digital Archivist
5 Cutting Edge Technologies in ePADD
1. Intellectual Content Appraisal
2. Lexicon Search
3. Query Generator
4. Attachment Browsing
5. Entity Resolution
Intellectual Content Appraisal
Common practices
• Physical attributes
– File count
– File size
– File listing
– File format
ePADD information extraction
• Intellectual content
– Personal name
– Organizational name
– Location name
• Physical attributes listed under Common practices
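The contrast between the two kinds of appraisal can be sketched in a few lines of Python. This is only an illustration, not ePADD's implementation: `physical_summary` and `candidate_names` are hypothetical helpers, the sample data is made up, and the regular expression is a crude stand-in for real named entity recognition.

```python
import re
from collections import Counter
from pathlib import PurePosixPath


def physical_summary(files):
    """Summarize physical attributes from (path, size_in_bytes) pairs."""
    formats = Counter(PurePosixPath(path).suffix.lower() or "(none)" for path, _ in files)
    return {
        "file_count": len(files),
        "total_bytes": sum(size for _, size in files),
        "formats": dict(formats),
    }


# Assumption: runs of two or more capitalized words approximate names.
# A real system would use a trained named entity recognizer instead.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def candidate_names(text):
    """Count candidate personal, organizational, and location names in text."""
    return Counter(CAPITALIZED_RUN.findall(text))


# Made-up sample data for illustration only.
files = [("a/report.pdf", 1200), ("b/photo.JPG", 3400), ("notes", 10)]
summary = physical_summary(files)
names = candidate_names("Met with Jane Doe at Example University. Jane Doe replied.")
```

The first function answers "how much and what format"; the second hints at "who and what the material is about", which is the part that common appraisal practice leaves out.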
Lexicon Search
Regular search
• One query at a time
• Exact terms
• Search terms cannot be saved for later use
• Search terms not grouped
ePADD Lexicon search
• Multiple queries at a time
• Stemming search
• Search terms saved for future use
• Search term groupings saved for future use
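A minimal sketch of the idea, assuming a lexicon is simply a set of named term groups: all groups are run in one pass, terms match on a stem rather than the exact word, and the lexicon is saved as JSON for reuse. The names `crude_stem` and `lexicon_search` are hypothetical, and the suffix stripping is a deliberately naive stand-in for a real stemmer.

```python
import json
import re


def crude_stem(word):
    """Strip a few common English suffixes (naive stand-in for a real stemmer)."""
    word = word.lower()
    for suffix in ("s", "ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
    return word


def lexicon_search(lexicon, messages):
    """Return, per lexicon category, the indexes of messages matching any term."""
    hits = {category: [] for category in lexicon}
    for index, text in enumerate(messages):
        stems = {crude_stem(word) for word in re.findall(r"[A-Za-z]+", text)}
        for category, terms in lexicon.items():
            if any(crude_stem(term) in stems for term in terms):
                hits[category].append(index)
    return hits


# Made-up lexicon and messages for illustration only.
lexicon = {"family": ["wedding", "cousin"], "money": ["loan", "invoice"]}
messages = [
    "Two cousins came to the weddings.",
    "The invoices were paid.",
    "See you at lunch.",
]

saved = json.dumps(lexicon)  # the grouped terms can be stored and reloaded later
hits = lexicon_search(json.loads(saved), messages)
```

Here "weddings" matches the saved term "wedding" because both reduce to the same stem, which an exact-term search would miss.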
Query Generator
Regular search
• Users enter search terms
ePADD Query Generator
• System generates search terms from the text provided
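One way to picture a query generator is a function that takes a block of text and returns the terms worth searching for. The sketch below uses plain word frequency with a small stopword list; this is an assumption made for illustration, and the slides do not say which method ePADD uses. `generate_queries` and `STOPWORDS` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list is enough for the illustration.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "on", "for",
    "with", "was", "is", "at", "by", "from", "that", "this",
}


def generate_queries(text, limit=5):
    """Return up to `limit` frequent non-stopword terms found in `text`."""
    words = [word.lower() for word in re.findall(r"[A-Za-z]{3,}", text)]
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [term for term, _ in counts.most_common(limit)]


# Made-up text for illustration only.
text = "The archive holds poetry drafts. Poetry readings and poetry letters fill the archive."
queries = generate_queries(text, limit=2)
```

The user supplies the text once; the generated terms can then be run against the archive without the user typing each query.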
Attachment Browsing
Inside the email message
• Email client – Gmail, Hotmail, Yahoo Mail, etc.
• Email archiving software – MailStore
ePADD Attachment Browsing
• Consolidate all attached images in one place and link images back to originating messages
• Consolidate all document attachments in one place for easy download
• Consolidate all other attachments in one place for easy download
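The consolidation described above can be sketched with Python's standard `email` package: walk every message, sort each attachment into an image, document, or other bucket, and keep the originating Message-ID so each item links back to its message. The bucket names, `DOCUMENT_SUBTYPES`, and the helper functions are assumptions for illustration, not ePADD's own categories or code.

```python
from collections import defaultdict
from email.message import EmailMessage

# Assumption: these application subtypes count as "documents".
DOCUMENT_SUBTYPES = {"pdf", "msword", "rtf"}


def bucket_for(part):
    """Classify one attachment part as images, documents, or other."""
    if part.get_content_maintype() == "image":
        return "images"
    if part.get_content_maintype() == "text" or part.get_content_subtype() in DOCUMENT_SUBTYPES:
        return "documents"
    return "other"


def consolidate_attachments(messages):
    """Group (filename, message_id) pairs by attachment bucket."""
    buckets = defaultdict(list)
    for message in messages:
        message_id = str(message["Message-ID"])
        for part in message.iter_attachments():
            buckets[bucket_for(part)].append((part.get_filename(), message_id))
    return dict(buckets)


# Made-up messages for illustration only.
first = EmailMessage()
first["Message-ID"] = "<msg-1@example.org>"
first.set_content("Photo and draft attached.")
first.add_attachment(b"\x89PNG", maintype="image", subtype="png", filename="cover.png")
first.add_attachment(b"%PDF", maintype="application", subtype="pdf", filename="draft.pdf")

second = EmailMessage()
second["Message-ID"] = "<msg-2@example.org>"
second.set_content("Backup attached.")
second.add_attachment(b"PK", maintype="application", subtype="zip", filename="backup.zip")

buckets = consolidate_attachments([first, second])
```

Storing the Message-ID next to each filename is what lets a gallery of images point back to the message each one came from.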
Entity Resolution
Not seen elsewhere
ePADD Disambiguation
• External resolution
– ePADD resolves entities against FAST for personal names and against Freebase for locations and organizations.
• Internal resolution
– ePADD generates a list of internal “authority” records consisting of all recognized entities and multi-word address book names.
– When the user hovers over such names or acronyms in an email, possible resolutions to internal authority records are shown in decreasing order of confidence.
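Internal resolution can be pictured as a lookup from a short mention to a ranked list of full authority names. In the sketch below, a mention matches a record if it equals one word of the name or the name's initials, and confidence is the record's share of occurrences among the matches. Both the matching rule and the confidence score are assumptions for illustration; the slides do not describe how ePADD computes them. `acronym` and `resolve` are hypothetical helpers.

```python
def acronym(name):
    """Build the initials of a multi-word name, e.g. for acronym matching."""
    return "".join(word[0] for word in name.split()).upper()


def resolve(mention, authority_counts):
    """Return (name, confidence) pairs for a mention, most confident first.

    authority_counts maps each internal authority name to how often it occurs.
    """
    key = mention.strip().lower()
    matches = {
        name: count
        for name, count in authority_counts.items()
        if key in name.lower().split() or mention.strip().upper() == acronym(name)
    }
    total = sum(matches.values())
    ranked = sorted(matches.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count / total) for name, count in ranked]


# Made-up authority records for illustration only.
authority = {"John Smith": 6, "John Example": 2, "Special Collections Library": 4}
by_name = resolve("John", authority)
by_acronym = resolve("SCL", authority)
```

A hover display would then list the candidates in the order returned, with the most likely resolution first.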