ii-sdv 2014 a new approach to flexible, meaning-rich document parsing (paul barba -- lexalytics,...


Upload: dr-haxel-congress-and-event-management-gmbh

Post on 11-May-2015


TRANSCRIPT

Page 1: II-SDV 2014 A New Approach to Flexible, Meaning-Rich Document Parsing (Paul Barba -- Lexalytics, USA)

Nice

And how to get what you need.

Page 2:

Lexalytics is

• A software company

• We sell the “Salience Engine”

• Salience is a Text Analytics Engine that fits into your software, services, or applications

• What we ship is a set of libraries and configuration files

© 2014 Lexalytics Inc. All rights reserved. lexalytics.com

SALIENCE 5.2

Page 3:

Market Proven IP: 11 Years of R&D

Approximately 3 billion documents/day go through Salience.

2004: Lexalytics launches first commercial text and sentiment analysis engine, Salience v1.0

10/2008: Salience 4.0 released, based on maximum entropy model for detection and labeling of novel entities

08/2010: Salience 4.3 to include custom handling of Twitter and micro-blog content

11/2010: Salience 4.4 released, includes support for first non-English language (French)

10/2011: Salience v5.0 incorporates innovative Concept Matrix functionality

02/2012: Mobile Functionality – Port the Salience engine to Android mobile devices

06/2012: Salience v5.1 released, expansion of available options and optimized sentiment analysis functionality

08/2013: Chinese language released; multi-lingual support in 6 languages

Q4/2014: Salience v5.2 released with various feature enhancements

Q4/2014: Salience v6 – new underpinnings, easier tuning, and “Intent” extraction

Page 4:

A Multi-lingual World WLOA (With Lots of Acronyms) and Context Everywhere

Lexalytics Salience Training prepared for Analytics 8

NLP

• New Labor Party

• National Landcare Program

• Network Layer Packet

• NeuroLinguistic Programming

• Wicked

• Sick

• Hack

Page 5:

Always running to catch up…

Page 6:

New Tools

Page 7:

God Bless Moore’s Law and Librarians

Page 8:

Unsupervised learning is the key

Page 9:

Meaning Matters

It’s not that I don’t like tea, I just prefer coffee

Page 10:

Meaning Matters

Jane will be joining a team already with a search expert

Page 11:

Meaning Matters

Jane will be joining a team already with some search experience

Page 12:

Episode 4: A New Hope

Pipeline diagram (labels recovered): Sentence, POS Tagger, Chunker, Rules File, Candidate Parse Terms

Page 13:

Jane and her team

<Jane will be joining a team already with search experience>

• POS tag: <Jane_NNP will_MD be_VB joining_VBP a_DT team_NN already_RB with_PP search_JJ experience_NN>

• Chunk: <Jane> <will be joining> <a team> <already with search experience>

• Extract possible links

Jane => will be joining

will be joining => a team

a team => already with search experience

will be joining => already with search experience

Jane => already with search experience
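
The three steps on this slide (POS tag, chunk, extract possible links) can be sketched in a few lines of Python. This is a toy illustration, not the Salience implementation: the tag classes, the chunking rule, and the link rule (pair each chunk with every later chunk, except noun phrase to noun phrase) are all assumptions chosen because they happen to reproduce the chunks and the five links shown above. The function names are hypothetical.

```python
# Toy sketch of the slide's three steps: POS tags -> chunks -> candidate links.
# The tag classes and both rules below are illustrative assumptions,
# not the Salience rules file.

TAGGED = ("Jane_NNP will_MD be_VB joining_VBP a_DT team_NN "
          "already_RB with_PP search_JJ experience_NN")

VERB_TAGS = {"MD", "VB", "VBP"}
NOUN_TAGS = {"DT", "JJ", "NN", "NNP"}
MODIFIER_TAGS = {"RB", "PP"}  # tags that open a modifier chunk in this example


def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("_", 1)) for token in text.split()]


def chunk(tokens):
    """Group tokens into (kind, text) chunks with a hand-written rule set."""
    chunks = []
    for word, tag in tokens:
        if tag in VERB_TAGS:
            kind = "VP"
        elif tag in MODIFIER_TAGS:
            kind = "MOD"
        elif tag in NOUN_TAGS:
            kind = "NP"
        else:
            raise ValueError(f"unhandled tag: {tag}")
        last = chunks[-1] if chunks else None
        # Noun material right after a modifier chunk is absorbed into it,
        # which keeps <already with search experience> as one chunk.
        if last and (last[0] == kind or (last[0] == "MOD" and kind == "NP")):
            last[1].append(word)
        else:
            chunks.append((kind, [word]))
    return [(kind, " ".join(words)) for kind, words in chunks]


def candidate_links(chunks):
    """Pair each chunk with every later chunk, skipping NP => NP pairs."""
    links = []
    for i, (kind_a, text_a) in enumerate(chunks):
        for kind_b, text_b in chunks[i + 1:]:
            if kind_a == "NP" and kind_b == "NP":
                continue
            links.append((text_a, text_b))
    return links


if __name__ == "__main__":
    for left, right in candidate_links(chunk(parse_tagged(TAGGED))):
        print(f"{left} => {right}")
```

The links are deliberately over-generated: a later stage would have to decide which of them are meaningful, and a real rules file would presumably be far richer than the single filter used here.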

Page 14:

Matrices of Meaning

Page 15:

Matrix Math

All noun phrases

All verb phrases
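
The slide only gives the two axis labels, so what the matrix cells hold is not stated. One plausible reading, and it is purely an assumption here, is a count matrix with one row per noun phrase and one column per verb phrase, where each cell counts how often a candidate link between the two was extracted. The sketch below builds such a matrix in Python from a handful of pairs taken from the example sentences in this deck, with one pair repeated by hand to show counting. The function name `build_matrix` is hypothetical.

```python
from collections import Counter

# (noun phrase, verb phrase) pairs taken from the deck's example sentences.
# The last pair is repeated by hand purely to show a count above 1.
PAIRS = [
    ("Jane", "will be joining"),
    ("a team", "will be joining"),
    ("you", "want"),
    ("me", "want"),
    ("I", "go"),
    ("Jane", "will be joining"),
]


def build_matrix(pairs):
    """Return (row labels, column labels, counts) for a noun x verb matrix."""
    counts = Counter(pairs)
    nouns = sorted({noun for noun, _ in pairs})
    verbs = sorted({verb for _, verb in pairs})
    matrix = [[counts[(noun, verb)] for verb in verbs] for noun in nouns]
    return nouns, verbs, matrix


if __name__ == "__main__":
    nouns, verbs, matrix = build_matrix(PAIRS)
    print("columns:", verbs)
    for noun, row in zip(nouns, matrix):
        print(noun, row)
```

With rows and columns laid out this way, comparing two noun phrases reduces to comparing two rows of numbers, which is one sense in which "matrix math" could apply to meaning.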

Page 16:

Now look at how easy it is

• <Do you want me to get anything else while I go to the store for milk?>

• POS tag and chunk it.

<Do> <you> <want> <me> <to get> <anything else> <while> <I> <go> <to the store> <for milk>

Find the possible links.

do => want

you => want

want => me

you => to get

want => to get

me => to get

to get => anything else

want => while

to get => while

while => go

I => go

go => to the store

I => to the store

get => to the store

want => to the store

to the store => for milk

go => for milk

want => for milk
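
To make the list above easier to inspect, the candidate links can be held as pairs of chunks and grouped by their left-hand chunk. The Python sketch below only stores and summarizes the links as listed on the slide; it does not try to reproduce whatever rule generated them, which the slide does not state. Splitting each link at a chunk boundary, and mapping the slide's "do" and "get" to the chunks "Do" and "to get", are assumptions of this sketch.

```python
from collections import defaultdict
from math import comb

# Chunks of the example sentence, in sentence order.
CHUNKS = ["Do", "you", "want", "me", "to get", "anything else",
          "while", "I", "go", "to the store", "for milk"]

# The candidate links listed on the slide, split at chunk boundaries.
# Assumption: the slide's "do" and "get" refer to the chunks "Do" and "to get".
LINKS = [
    ("Do", "want"), ("you", "want"), ("want", "me"),
    ("you", "to get"), ("want", "to get"), ("me", "to get"),
    ("to get", "anything else"), ("want", "while"), ("to get", "while"),
    ("while", "go"), ("I", "go"), ("go", "to the store"),
    ("I", "to the store"), ("to get", "to the store"),
    ("want", "to the store"), ("to the store", "for milk"),
    ("go", "for milk"), ("want", "for milk"),
]


def by_left_chunk(links):
    """Group right-hand chunks under the left-hand chunk they are linked from."""
    grouped = defaultdict(list)
    for left, right in links:
        grouped[left].append(right)
    return dict(grouped)


if __name__ == "__main__":
    total_pairs = comb(len(CHUNKS), 2)  # every left-to-right pair of chunks
    print(f"{len(LINKS)} candidate links out of {total_pairs} possible pairs")
    for left, rights in by_left_chunk(LINKS).items():
        print(f"{left} => {', '.join(rights)}")
```

The slide lists 18 links, while 11 chunks allow 55 left-to-right pairs, so some filtering has already taken place before this candidate list is produced.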

Page 17:

A world of new possibilities

Page 18: