IIA chapters site, Nov 5
Mining Unstructured Data (Text Data Mining)
Where We’re Going Today
1. Getting on the same page
2. Learning Latent Semantic Analysis
3. “Cottage Tools” of LSA technology and how they help in investigations.
4. Text mining tools, and tips for getting started with text mining in your investigations
5. Q&A
Growth of Unstructured Data
Why This Is Important
80% of an organization’s data is unstructured
That 80% comprises communications, both formal and informal
Text = rich source of evidence (Text is a window to the soul)
Within the next 2-3 years, text analysis will be as common as using ACL.
Types of Data Analyzed
Free-form text: 38%
Social networks: 18%
Web content: 13%
Email: 11%
Source: KDnuggets annual survey
Blends the strengths of human intelligence and artificial intelligence.
95-98% reduction in the volume of material to review, and in the time needed to find relevant content.
Draws on “Natural Language Processing” (NLP).
Latent Semantic Analysis
From: The Boss
To: Everyone
I wanted to congratulate Lisa on her really outstanding presentation at the meeting! The update was easy to follow and was very well received by the group. Great job, Lisa! Kim also gave a nice presentation. We appreciate you both.
[Figure: Lisa vs. Kim (“nice”)]
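The email above shows latent meaning as a human reader perceives it. Latent Semantic Analysis (LSA), as an algorithm, looks for latent structure statistically instead: it builds a term-document matrix, keeps only the top k singular vectors of its SVD, and compares documents in that reduced space, so documents with related vocabulary land close together. The sketch below is a minimal illustration under stated assumptions: numpy is available, the four-document corpus is made up, and raw counts are used instead of the TF-IDF or log-entropy weighting usually applied. gensim's `LsiModel` is one production implementation of the same idea.

```python
import numpy as np

# Hypothetical toy corpus: each string stands in for one email.
docs = [
    "payment to vendor for invoice",
    "vendor invoice payment approved",
    "gift car for employee",
    "employee loves the new car gift",
]

def term_doc_matrix(docs):
    """Return (vocab, counts) where counts[i, j] = occurrences of term i in doc j."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            counts[index[w], j] += 1
    return vocab, counts

def lsa_doc_vectors(counts, k=2):
    """Map each document to k latent dimensions using a truncated SVD."""
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Rows of the result are documents; columns are the top-k latent dimensions.
    return (np.diag(s[:k]) @ vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, counts = term_doc_matrix(docs)
vecs = lsa_doc_vectors(counts, k=2)
print(round(cosine(vecs[0], vecs[1]), 2))  # two payment emails
print(round(cosine(vecs[0], vecs[2]), 2))  # payment email vs. gift email
```

In this toy corpus the two payment emails come out nearly parallel in the latent space (cosine close to 1), while the first payment email and the first gift email score much lower. The number of dimensions k is a tuning choice; real collections typically keep far more than 2.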
Latent Semantics
To: Vendor Rep
From: Employee
---------------------------------------------
Thank you for the “gift” – I’m so excited! It looks great in my driveway! I can’t wait to take it out on the open road! My neighbors are soooo jealous!
Translation:
Thanks for the kickback
The kickback is a car
I’m excited about being used
Social status is more important to me than getting fired
I use way too many exclamation points!
Latent Semantics
To: Employee
From: Vendor Rep
---------------------------------------------
Think nothing of it, you deserve a treat every now and then for all you’ve done for us.
Translation:
You’re welcome, glad you like the kickback
We’re using you as a pawn.
I talk to you like I would my dog
Notice the “us”; that should unsettle you…
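The point about “us” can be made measurable by profiling pronoun use: how often a writer says “I/my” versus “we/us” versus “you.” The sketch below only counts; the pronoun grouping is an illustrative assumption, the emails are retyped with straight quotes, and reading meaning into the counts remains the investigator’s job.

```python
import re
from collections import Counter

# Illustrative pronoun groups (standard English forms; the grouping is an assumption).
PRONOUNS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself", "yourselves"},
}

def pronoun_profile(text):
    """Count whole-word pronouns per group, case-insensitively."""
    # Splitting on apostrophes means "I'm" still yields the token "i".
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for w in words:
        for group, members in PRONOUNS.items():
            if w in members:
                counts[group] += 1
    return {group: counts[group] for group in PRONOUNS}

employee = ("Thank you for the 'gift' - I'm so excited! It looks great "
            "in my driveway! I can't wait to take it out on the open road! "
            "My neighbors are soooo jealous!")
vendor = ("Think nothing of it, you deserve a treat every now and then "
          "for all you've done for us.")

print(pronoun_profile(employee))
print(pronoun_profile(vendor))
```

On these two emails the employee writes almost entirely in the first-person singular, while the vendor rep addresses “you” and closes with “us.”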
Text is a Window to the Soul…
Source: ACFE Annual Report
Underlying Functions
Topic Maps and Word Clouds
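A word cloud is essentially a picture of term frequencies: common words are dropped, the rest are counted, and font size grows with count. The sketch below computes the counts and sizes that a renderer such as Wordle, pytagcloud, or R’s wordcloud would then draw; the stopword list and the size range are assumptions. Run on the boss’s email from the Latent Semantics slide, “lisa” comes out as the largest word.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; real tools ship much longer ones.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "for", "i", "it", "my",
             "you", "is", "was", "in", "on", "so", "be", "that"}

def term_frequencies(text, top_n=10):
    """Return the top_n (word, count) pairs, lowercased, stopwords removed."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

def font_sizes(freqs, min_size=10, max_size=60):
    """Scale font size linearly with count; the most frequent word gets max_size."""
    top = freqs[0][1]
    return {w: min_size + (max_size - min_size) * c / top for w, c in freqs}

email = ("I wanted to congratulate Lisa on her really outstanding presentation "
         "at the meeting! The update was easy to follow and was very well "
         "received by the group. Great job, Lisa! Kim also gave a nice "
         "presentation. We appreciate you both.")

freqs = term_frequencies(email, top_n=5)
print(freqs)
print(font_sizes(freqs))
```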
Underlying Functions
POS Tagging in Python
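Part-of-speech (POS) tagging labels each word with its grammatical role, which lets later steps pull out, for example, all the nouns (people, places, assets) or adjectives (tone). With NLTK this is typically `nltk.pos_tag(nltk.word_tokenize(text))`, after downloading NLTK’s tokenizer and tagger data. To stay self-contained, the sketch below instead uses a tiny hypothetical lexicon plus suffix rules; it shows the shape of the output but is far less accurate than a trained tagger.

```python
import re

# Hypothetical mini-lexicon using Penn Treebank tags (the tag set nltk.pos_tag returns).
LEXICON = {
    "i": "PRP", "you": "PRP", "we": "PRP", "us": "PRP", "it": "PRP",
    "my": "PRP$", "your": "PRP$", "our": "PRP$",
    "the": "DT", "a": "DT", "an": "DT",
    "for": "IN", "in": "IN", "on": "IN", "of": "IN",
    "is": "VBZ", "are": "VBP", "was": "VBD",
    "so": "RB", "very": "RB",
}

def guess_tag(word):
    """Crude suffix-based fallback for words missing from the lexicon."""
    if word.endswith("ly"):
        return "RB"    # adverbs such as "really"
    if word.endswith(("ous", "ful")):
        return "JJ"    # adjectives such as "jealous", "grateful"
    if word.endswith("ing"):
        return "VBG"   # gerunds / present participles
    if word.endswith("ed"):
        return "VBD"   # past-tense verbs
    if word.endswith("s"):
        return "NNS"   # plural nouns
    return "NN"        # default: singular noun

def pos_tag(text):
    """Return (token, tag) pairs, preferring the lexicon over suffix rules."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return [(t, LEXICON.get(t.lower(), guess_tag(t.lower()))) for t in tokens]

print(pos_tag("My neighbors are so jealous"))
```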
How Tone Detection Works
Evasiveness
Vagueness
Tension, Nervousness
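The slide names the tones a tool looks for but not how it finds them. One simple, common approach is lexicon matching: count cue words and phrases for each tone, then normalize by document length so long and short emails are comparable. The cue lists below are hypothetical placeholders, not a validated lexicon; production tools are likely to rely on larger, tested dictionaries or trained classifiers.

```python
import re

# Hypothetical cue phrases for the three tones named on the slide.
TONE_LEXICON = {
    "evasiveness": {"don't recall", "not sure", "to my knowledge",
                    "as far as i know", "i believe"},
    "vagueness": {"something", "stuff", "sort of", "kind of", "somehow", "whatever"},
    "tension": {"worried", "nervous", "afraid", "urgent", "asap", "delete"},
}

def tone_scores(text):
    """Return cue-phrase hits per 100 words for each tone, rounded to 0.1."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    n_words = max(len(re.findall(r"[a-z']+", text)), 1)
    scores = {}
    for tone, phrases in TONE_LEXICON.items():
        # Whole-word/phrase matches only, so "stuff" does not match "stuffed".
        hits = sum(len(re.findall(r"\b" + re.escape(p) + r"\b", text)) for p in phrases)
        scores[tone] = round(100 * hits / n_words, 1)
    return scores

sample = ("I don't recall the invoice. To my knowledge it was sort of urgent, "
          "so please delete this ASAP.")
print(tone_scores(sample))
```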
Underlying Functions
Relationship Networks
$600k development at Twin Pines
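A relationship network starts from who communicates with whom. The sketch below uses Python’s standard `email` package to pull From, To, and Cc addresses out of raw messages and count directed sender-to-recipient edges; the mailbox and addresses are hypothetical. An edge list like this is the kind of data a tool such as NodeXL or Analyst’s Notebook visualizes; how those tools import it is not covered here.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

def edges_from_message(raw):
    """Yield (sender, recipient) address pairs from one raw RFC 5322 message."""
    msg = message_from_string(raw)
    senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
    recipient_fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr.lower() for _, addr in getaddresses(recipient_fields)]
    for sender in senders:
        for recipient in recipients:
            yield sender, recipient

def build_network(raw_messages):
    """Count messages on each directed (sender, recipient) edge."""
    return Counter(edge for raw in raw_messages for edge in edges_from_message(raw))

def degree(network):
    """Number of distinct correspondents per address, ignoring direction."""
    neighbors = {}
    for sender, recipient in network:
        neighbors.setdefault(sender, set()).add(recipient)
        neighbors.setdefault(recipient, set()).add(sender)
    return {addr: len(peers) for addr, peers in neighbors.items()}

# Hypothetical mailbox with placeholder addresses.
mailbox = [
    "From: Employee <employee@example.com>\nTo: rep@vendor.example\n\nThank you!",
    "From: rep@vendor.example\nTo: employee@example.com\n"
    "Cc: manager@vendor.example\n\nThink nothing of it.",
    "From: employee@example.com\nTo: Vendor Rep <rep@vendor.example>\n\nBid attached.",
]

network = build_network(mailbox)
print(network.most_common())
print(degree(network))
```

Counting edges rather than just recording them keeps message volume available, so a heavily used channel between two people stands out when the network is drawn.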
Tools for Text Analysis
Forensic tools: Forensic Toolkit ($2.5k), EnCase ($7k), XWays ($700)
Email: Aid4Mail ($300), DT Search ($199), forensic tools
Documents: DT Search, forensic tools
Word clouds: Wordle (internet), Pytagcloud (Python), Wordcloud (R)
Relationship maps: NodeXL (free), Analyst’s Notebook ($6k)
Latent semantics ($$$): Relativity, ViewPoint
Latent semantics (free): tm (R), NLTK & Gensim (Python)
Lanny
@LannyMorrow