textual report generation from email utilizing temporal topic analysis · 2019-01-02 · input...
TRANSCRIPT
● Use doc2vec for topic calculus
● Use model trained on Wikipedia articles for topics
● Extract topic labels by comparing email vectors and cluster keyword sets to topic vectors
● Choose a set of topics that together best describe an email
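The label-extraction step above can be sketched as a nearest-topic lookup under cosine similarity. This is a minimal illustration, not the authors' implementation: it assumes the email and topic vectors have already been inferred by a doc2vec model, and the toy vectors, topic names, and the `top_k` parameter are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def label_email(email_vec, topic_vecs, top_k=2):
    """Return the names of the top_k topics most similar to the email vector."""
    scored = [(cosine_similarity(email_vec, vec), name)
              for name, vec in topic_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]

# Toy 3-dimensional vectors standing in for doc2vec output (made up).
TOPIC_VECS = {
    "Energy trading": [0.9, 0.1, 0.0],
    "Accounting": [0.1, 0.9, 0.1],
    "Travel": [0.0, 0.1, 0.9],
}
EMAIL_VEC = [0.8, 0.3, 0.05]

labels = label_email(EMAIL_VEC, TOPIC_VECS)
print(labels)
```

Real doc2vec vectors typically have a few hundred dimensions, but the comparison works the same way.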
Topic Analysis
Input
Communication groups
Temporal Chains
Textual Report Generation from Email utilizing Temporal Topic Analysis
● Two email datasets: Enron and Avocado
● Enron contains ~500K emails from 150 employees
● Avocado Research Email Collection contains ~1M emails from 282 accounts
● Group people into clusters based on communication frequency
● Draw a graph of communications, weighting edges by email count
● Extract topics for each cluster
● Use clusters to determine communication patterns & anomalies
● Resulting components represent communication groups
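One way to read the steps above is as connected components over a weighted communication graph. The sketch below is only an illustration under stated assumptions: the `min_count` threshold for keeping an edge is hypothetical (the poster does not say how communication frequency is turned into clusters), and the sender/recipient pairs are made up.

```python
from collections import Counter, defaultdict

def communication_groups(emails, min_count=2):
    """Group people into connected components of the communication graph.

    emails: iterable of (sender, recipient) pairs.
    min_count: hypothetical threshold; pairs with fewer emails get no edge.
    """
    # Undirected edge weights: number of emails exchanged between each pair.
    weights = Counter(frozenset(pair) for pair in emails if pair[0] != pair[1])

    adjacency = defaultdict(set)
    for edge, count in weights.items():
        if count >= min_count:
            a, b = tuple(edge)
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search to collect each connected component.
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            person = stack.pop()
            if person in component:
                continue
            component.add(person)
            stack.extend(adjacency[person] - component)
        seen |= component
        groups.append(sorted(component))
    return groups

EMAILS = [
    ("alice", "bob"), ("bob", "alice"), ("alice", "bob"),
    ("bob", "carol"), ("carol", "bob"),
    ("dave", "erin"), ("erin", "dave"),
    ("alice", "dave"),  # a single email: below the threshold, so no edge
]
groups = communication_groups(EMAILS)
print(groups)
```

People whose edges all fall below the threshold are left out of every group in this sketch; a real pipeline might instead report them as singletons.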
Report Generation
Topic Ranking
● Use the hierarchical structure from the analysis (communication groups, email grouping, topic chains, anomalies, etc.)
● Select relevant details to help the user understand the context of the report, based on the chosen template (summary vs. anomalies)
● Reason over content to select a suitable organization and display style
● Supports multiple report templates, including summary- and anomaly-focused output, with modular extensibility for other styles
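The modular template idea above could be realized with a small registry of rendering functions. This is a sketch under assumptions: the `analysis` dictionary layout, the template names, and the function names are all hypothetical and are not taken from the authors' system.

```python
TEMPLATES = {}

def register_template(name):
    """Decorator that adds a rendering function to the template registry."""
    def decorator(func):
        TEMPLATES[name] = func
        return func
    return decorator

@register_template("summary")
def render_summary(analysis):
    """List each communication group with its topics."""
    lines = ["Summary report"]
    for group in analysis["groups"]:
        members = ", ".join(group["members"])
        topics = ", ".join(group["topics"])
        lines.append(f"- Group ({members}): {topics}")
    return "\n".join(lines)

@register_template("anomalies")
def render_anomalies(analysis):
    """List detected anomalies, or say that none were found."""
    lines = ["Anomaly report"]
    for anomaly in analysis["anomalies"]:
        lines.append(f"- {anomaly}")
    if len(lines) == 1:
        lines.append("- No anomalies detected")
    return "\n".join(lines)

def generate_report(analysis, template="summary"):
    """Render the analysis with the named template."""
    return TEMPLATES[template](analysis)

# Made-up analysis output, for illustration only.
ANALYSIS = {
    "groups": [{"members": ["alice", "bob"],
                "topics": ["Energy trading", "Accounting"]}],
    "anomalies": [],
}
report = generate_report(ANALYSIS, "summary")
print(report)
```

Adding a new report style then only requires registering one more function, without touching `generate_report`.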
Reply / Forward / Related
● Organize emails into topic chains by looking at replies, forwards, and by comparing topics
● Identify topic flow/change over time
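A rough sketch of the chaining step above: an email joins the chain of the message it replies to or forwards, and otherwise joins the chain whose latest email has the most similar topics. The `Email` fields, the Jaccard overlap measure, and the `min_overlap` threshold are assumptions made for illustration; the poster does not specify how topics are compared.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Email:
    msg_id: str
    timestamp: int                    # e.g. seconds since epoch
    topics: frozenset                 # topic labels assigned to the email
    parent_id: Optional[str] = None   # message replied to or forwarded, if any

def jaccard(a, b):
    """Overlap between two label sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_chains(emails, min_overlap=0.5):
    """Return a list of chains, each a list of Emails in time order."""
    chains = []
    chain_of = {}  # msg_id -> index of its chain
    for email in sorted(emails, key=lambda e: e.timestamp):
        # Explicit reply/forward link, if the parent is already known.
        target = chain_of.get(email.parent_id)
        if target is None:
            # Otherwise compare topics with the latest email of each chain.
            best = 0.0
            for idx, chain in enumerate(chains):
                score = jaccard(email.topics, chain[-1].topics)
                if score >= min_overlap and score > best:
                    best, target = score, idx
        if target is None:
            chains.append([])
            target = len(chains) - 1
        chains[target].append(email)
        chain_of[email.msg_id] = target
    return chains

EMAILS = [
    Email("m1", 1, frozenset({"budget"})),
    Email("m2", 2, frozenset({"budget", "audit"}), parent_id="m1"),
    Email("m3", 3, frozenset({"travel"})),
    Email("m4", 4, frozenset({"audit", "budget"})),  # no link, similar topics
]
chains = build_chains(EMAILS)
chain_ids = [[e.msg_id for e in chain] for chain in chains]
# Topic flow over time within the first chain.
flow = [sorted(e.topics) for e in chains[0]]
print(chain_ids, flow)
```

Reading the topic sets along a chain in time order gives a simple view of how the discussion shifts.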
Collaboration
We are proud of a successful collaboration between NC State and the LAS, including monthly meetings with excellent feedback and ideas.
● We use doc2vec to compute similarity via cosine distance
● For topic labeling, we rank topics using additional criteria:
○ PageRank
○ Coverage
○ Redundancy
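The three criteria above could be combined in a greedy selection loop, sketched below. The scoring formula, the weights, and the keyword sets are assumptions chosen for illustration; the poster does not define how the criteria are computed or combined, and the PageRank scores are taken as precomputed inputs.

```python
def select_topics(email_keywords, topic_keywords, pagerank, k=2,
                  redundancy_weight=0.5, pagerank_weight=0.1):
    """Greedily pick up to k topics that cover the email with little overlap.

    email_keywords: non-empty set of keywords extracted from the email.
    topic_keywords: dict mapping topic name -> set of keywords.
    pagerank: dict mapping topic name -> precomputed importance score.
    """
    selected, used_words = [], set()
    candidates = set(topic_keywords)

    def score(topic):
        words = topic_keywords[topic]
        # Coverage: share of email keywords newly explained by this topic.
        gain = len((words & email_keywords) - used_words) / len(email_keywords)
        # Redundancy: share of the topic's keywords already used.
        overlap = len(words & used_words) / len(words) if words else 0.0
        prior = pagerank.get(topic, 0.0)
        return gain - redundancy_weight * overlap + pagerank_weight * prior

    while candidates and len(selected) < k:
        # Sort first so that ties are broken deterministically by name.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        used_words |= topic_keywords[best]
        candidates.remove(best)
    return selected

# Made-up keywords and scores, for illustration only.
EMAIL_KEYWORDS = {"gas", "pipeline", "contract", "price"}
TOPIC_KEYWORDS = {
    "Natural gas": {"gas", "pipeline", "price"},
    "Petroleum": {"gas", "price", "oil"},
    "Contract law": {"contract", "law"},
}
PAGERANK = {"Natural gas": 0.5, "Petroleum": 0.3, "Contract law": 0.2}

chosen = select_topics(EMAIL_KEYWORDS, TOPIC_KEYWORDS, PAGERANK)
print(chosen)
```

In this toy run "Petroleum" is skipped in the second round because everything it shares with the email is already covered by "Natural gas", which is the behaviour the redundancy criterion is meant to capture.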
Colin M. Potts, NC State, [email protected]
Sean Lynch & Tracy Standafer, Laboratory for Analytic Science
[email protected] | [email protected]