Human Language Technology in a Big Data World
@chris_biow #HLTCon (2016)
Big Data Universe
Sooo Big!
Tooo Big!
Taming Big
Taming Too Big
Exponential Hyperbole!!
"Yer gonna die." Standard mountaineering warning
● Data is exploding without limit
● I can draw a curve on a semi-log-scale graph
● Even if that almost never happens in reality
● Buy my vision or drown in data
Wgsimon / Wikimedia Commons / Creative Commons Attribution-Share Alike 3.0 Unported
Exponential Reality
Qef / Wikimedia Commons / Public Domain
Human Language World
Exponential Sobriety
"Most growth is exponential." Chris Lindblad, MarkLogic Founder

| Measure | Power of 10 | Power of 2 | Example |
|---|---|---|---|
| Kilobyte | 3 | 10 | 12 lines of 80 characters |
| Megabyte | 6 | 20 | 500 pages, 48 hours typing |
| Gigabyte | 9 | 30 | 30 minutes Twitter text feed |
| Terabyte | 12 | 40 | 2 weeks Twitter text feed |
| Petabyte | 15 | 50 | Humanity typing for 8 hours |
| Exabyte | 18 | 60 | Humanity typing for 1 year |
| Zettabyte | 21 | 70 | Global IP traffic 2016 [Cisco 2013] |
| Yottabyte | 24 | 80 | (break glass in case of need) |

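The two exponent columns in the table above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the unit names and exponents come from the slide, while the function name `unit_rows` and the binary-to-decimal ratio column are additions that are not in the original deck.

```python
# Unit names and exponents are taken from the slide's table.
# The ratio column is an illustrative addition, not part of the original deck.
UNITS = ["Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]


def unit_rows():
    """Return (name, power_of_10, power_of_2, binary/decimal ratio) per unit."""
    rows = []
    for step, name in enumerate(UNITS, start=1):
        pow10 = 3 * step    # decimal exponent: 3, 6, 9, ...
        pow2 = 10 * step    # binary exponent: 10, 20, 30, ...
        ratio = 2 ** pow2 / 10 ** pow10
        rows.append((name, pow10, pow2, ratio))
    return rows


if __name__ == "__main__":
    for name, pow10, pow2, ratio in unit_rows():
        print(f"{name:<10} 10^{pow10:<2} 2^{pow2:<2} binary/decimal = {ratio:.4f}")
    # The kilobyte example from the slide: 12 lines of 80 characters.
    print("12 lines x 80 characters =", 12 * 80, "bytes")
```

The ratio grows with each step, so the power-of-10 and power-of-2 readings of a unit name drift further apart at larger scales.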
Distinguishing Big Data
Follow the money.
Volume: Bounded
Variety: Text and voice
Velocity: Latency
Value: Fixed % of all
Veracity: Not necessarily required
Big Data Tech
Defining Big Data
"I shall not today attempt further to define [it], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it…"
Justice Potter Stewart, 1964 (emphasis added)
Data whose volume, velocity, and variety determine your choice of software and infrastructure.
Achieving Big Data

| Year | Company | Customer | Project | Quantity (M) | Size (GB) | Project Cost ($M) |
|---|---|---|---|---|---|---|
| 2003 | Verity | TRW, DIA | WISE | 40 | 200 | 10 |
| 2006 | Veronomy | Bloomberg | News | 200 | 1,000 | 30 |
| 2009 | MarkLogic | Gov & Comm. | OSINT | 2,000 | 200,000 | 100 |
| 2014 | MongoDB | AWS | ReInvent (goo.gl/xZVgdl) | 7,000 | 1,000,000 | 0.003 |

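One way to read the table above is as cost per gigabyte. The sketch below derives that figure from the slide's own numbers. It assumes that "Size (GB)" is the total data size and that "Project Cost ($M)" is in millions of US dollars. The cost-per-GB column and the names `PROJECTS` and `cost_per_gb` are illustrative additions, not something the deck states.

```python
# (year, company, size in GB, project cost in $M), copied from the slide's table.
PROJECTS = [
    (2003, "Verity", 200, 10),
    (2006, "Veronomy", 1_000, 30),
    (2009, "MarkLogic", 200_000, 100),
    (2014, "MongoDB", 1_000_000, 0.003),
]


def cost_per_gb(size_gb, cost_million_usd):
    """Dollars per gigabyte, assuming cost is given in millions of dollars."""
    return cost_million_usd * 1_000_000 / size_gb


if __name__ == "__main__":
    for year, company, size_gb, cost_musd in PROJECTS:
        print(f"{year} {company:<10} ${cost_per_gb(size_gb, cost_musd):,.3f} per GB")
```

Under those assumptions, the derived figure falls from tens of thousands of dollars per gigabyte in the 2003 row to a fraction of a cent in the 2014 row.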
Features & Functions
Text-Ready Tech
State of the Mission in Text Analytics
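[Figure labels; the scale runs from "Gap" to "Solved", and the position of each item on it is not recoverable from the transcript]
● Entity Extraction
● Text Translation
● Relationship Extraction
● Name Translation
● Search
● Database
● Language ID
● Sentiment Analysis
● Rare, new Languages
● Name Translation
● Alerting
● Voice of the X
● Partial Parse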
What language?
bú
ana raye7 el gam3a el sa3a 3 el 3asr. el gaw 3amel eh elnaharda f eskendereya?
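The second sample above shows why language identification is hard for chat text: it is written in Latin letters, with digits such as 3 and 7 standing in for Arabic letters. The sketch below is a toy heuristic, not a real language identifier. It only flags tokens that mix letters with such digits. The digit set and the function name `mixed_tokens` are assumptions made for illustration, and English chat spellings such as "gr8" would trigger false positives.

```python
import re

# Digits assumed, for illustration, to be used as letter substitutes
# in romanized chat Arabic.
ARABIZI_DIGITS = set("2356789")


def mixed_tokens(text):
    """Return tokens that contain both a letter and one of the digits above."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [
        t for t in tokens
        if any(c in ARABIZI_DIGITS for c in t) and any(c.isalpha() for c in t)
    ]


if __name__ == "__main__":
    sample = ("ana raye7 el gam3a el sa3a 3 el 3asr. "
              "el gaw 3amel eh elnaharda f eskendereya?")
    print(mixed_tokens(sample))
```

A standalone digit, like the "3" used for the time in the sample, is not flagged because it contains no letter.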
Lessons Learned
• Requirements are wrong
• Every power of 4 will invalidate some requirements and solutions
• Agile processes fit Big HLT
• Measure to costs and to mission at each increment
• Express requirements exponentially
• Expect competence and confidence with Big Data
• Progress exponentially (powers of 4)
• Adjust requirements as you learn how they meet the mission
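The "powers of 4" bullets can be pictured as a ladder of scale targets. The sketch below reads them as multiplying the working scale by 4 at each increment, which is an interpretation and not something the slide spells out. The function name `scale_steps` is hypothetical, and the start and target values reuse the smallest and largest "Quantity (M)" figures from the Achieving Big Data slide.

```python
def scale_steps(start, target, factor=4):
    """List the scale at each increment, multiplying by `factor` until `target` is reached."""
    if start <= 0 or factor <= 1:
        raise ValueError("start must be positive and factor must exceed 1")
    steps = [start]
    while steps[-1] < target:
        steps.append(steps[-1] * factor)
    return steps


if __name__ == "__main__":
    # From 40 (M) to at least 7,000 (M), in powers of 4.
    for increment, scale in enumerate(scale_steps(40, 7_000)):
        print(f"increment {increment}: {scale:,} M")
```

Each rung is a natural point to re-measure cost and mission fit and to revisit the requirements, as the bullets above suggest.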