
Page 1:

Natural Language Processing (NLP)

Pradnya Nimkar, ACAS, MAAA

Disclaimer: This presentation is going to be…..wordy!

Page 2:

NLP is everywhere!

Page 3:

Business cases in insurance:

● Lots of unstructured data
  ○ Data that does not follow a predefined pattern
  ○ Accident descriptions, injury descriptions, claim notes, doctor and nurse notes, policy terms, etc.
● Claim triage models
  ○ Analyze claim notes, accident descriptions, and injury descriptions to identify large losses early on
● Risk management practices
  ○ Identify and label areas that need attention
● Fraud models
  ○ Analyze settlement notes and claim notes to identify fraudulent claims
● Underwriting / policy management
  ○ Avoid costly mistakes by pointing underwriters to inconsistencies in tailor-made wordings
● Claims management
  ○ Analyze claims/complaints and direct them to the appropriate claim adjuster
  ○ Speed up the decision-making process by matching claim notes with existing claims

Page 4:

Why should an actuary care about NLP?

ASOP 38: Using Models Outside the Actuary’s Area of Expertise (Property and Casualty)

Page 5:

What is Natural Language Processing (NLP)?

● How to program computers to process and analyze large amounts of data centered around human language
● The focus is to capture the syntactic and semantic meaning of natural language

History of NLP:

● Dates back to the 1950s
● 1950-1980:
  ○ Handwritten rules, lots of if...then... statements
  ○ Hard to maintain
● 1980-now:
  ○ Corpus-based / statistical methods
● Now-future:
  ○ Deep learning methods + statistical methods

Page 6:

Typical workflow with unstructured data:

Page 7:

Preprocessing - some NLP terminology

Corpus (a collection of texts: paragraphs, papers, books). Here, three claim notes:

● WORKER slipped while carrying groceries. Worker fractured his elbow
● worker developed carpal tunnel from repetitive typing
● worker got traumatized from NLP presentation

Vocabulary (the unique list of words observed):

NLP, WORKER, carpal, carrying, developed, elbow, fractured, from, got, groceries, his, presentation, repetitive, slipped, traumatized, tunnel, typing, while, worker
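For illustration, a rough sketch of building a vocabulary from a corpus with naive whitespace tokenization (a real tokenizer would handle punctuation and casing more carefully):

```python
corpus = [
    "WORKER slipped while carrying groceries. Worker fractured his elbow",
    "worker developed carpal tunnel from repetitive typing",
    "worker got traumatized from NLP presentation",
]

# Split each note on whitespace, strip trailing periods, and collect the unique tokens
vocabulary = sorted({token.strip(".") for note in corpus for token in note.split()})
print(vocabulary)   # roughly the word list above (plus "Worker" as its own token here)
```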

Page 8:

Preprocessing - Casing (lowercasing)

Original:    WORKER slipped while carrying groceries. Worker fractured his elbow
Lowercased:  worker slipped while carrying groceries. worker fractured his elbow

Original:    worker developed carpal tunnel from repetitive typing
Lowercased:  worker developed carpal tunnel from repetitive typing

Original:    worker got traumatized from NLP presentation
Lowercased:  worker got traumatized from nlp presentation

Page 9:

Preprocessing - Lemmatization (reduce each word to its canonical form)

Original:    WORKER slipped while carrying groceries. Worker fractured his elbow
Lemmatized:  worker slip while carry grocery. worker fracture his elbow

Original:    worker developed carpal tunnel from repetitive typing
Lemmatized:  worker develop carpal tunnel from repetitive typing

Original:    worker got traumatized from NLP presentation
Lemmatized:  worker get traumatize from nlp presentation

(The individual words produced at this stage are called tokens.)
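A minimal lemmatization sketch using spaCy (assumed here for illustration; the slides do not name the library, the en_core_web_sm model must be installed, and lemmas can differ slightly between models):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

note = "WORKER slipped while carrying groceries. Worker fractured his elbow"
doc = nlp(note.lower())

# token.lemma_ is the canonical (dictionary) form of each token
print(" ".join(token.lemma_ for token in doc))
# e.g. "worker slip while carry grocery . worker fracture his elbow"
```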

Page 10:

Preprocessing - Stemming (removing affixes to get the stem of each word)

Original:  WORKER slipped while carrying groceries. Worker fractured his elbow
Stemmed:   worker slip while carri groceri. worker fractur hi elbow

Original:  worker developed carpal tunnel from repetitive typing
Stemmed:   worker develop carpal tunnel from repetit type

Original:  worker got traumatized from NLP presentation
Stemmed:   worker got traumat from nlp present
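A minimal stemming sketch with NLTK's PorterStemmer (assumed here for illustration; the slides do not name the stemmer, and Snowball or other stemmers produce slightly different stems):

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize   # requires nltk.download("punkt")

stemmer = PorterStemmer()
note = "worker developed carpal tunnel from repetitive typing"

# Stems need not be real words (e.g. "repetit")
stems = [stemmer.stem(token) for token in word_tokenize(note)]
print(" ".join(stems))
# "worker develop carpal tunnel from repetit type"
```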

Page 11:

Other preprocessing steps to be considered:

● Part-of-speech tagging
● Remove stop words such as a, an, the, in, etc.
● Remove special characters
● Expand contractions
● Deal with abbreviations and misspellings

Main take-away: strike a balance between simplification and retention of language nuance, encoding as much information as possible in the most compact way possible.

Page 12:

Time to go to space... Vector Space

● Two words: word vectors!
● Core idea: map the text to mathematical entities (vectors)
● Vector space models (VSMs) are the most common models in NLP
  ○ They translate raw text into vectors
  ○ There are many!
● Popular VSMs
  ○ Sparse representations (do not reduce the vector space):
    ■ Counts (term frequency)
    ■ Absence or presence of a word (one-hot encoding)
    ■ TF-IDF (term frequency - inverse document frequency)
  ○ Dense representations (reduce the space):
    ■ LSI / LDA (dimensionality reduction)
    ■ Word embeddings, e.g. word2vec (neural net), GloVe
    ■ Sentence and document embeddings, e.g. doc2vec, SkipThought

Page 13:

Vector Space Model I: Counts or Term Frequency

● Count the number of times each word occurs
● The order of words does not matter
● Hence the term bag of words (BOW)

Count matrix (documents x terms):

          carpal   carry   develop  elbow   worker
note_1    0        1       0        1       2
note_2    1        0       1        0       1
note_3    0        0       0        0       1

(Figure: the notes plotted on the worker / carpal / develop axes; note_1 maps to [2, 0, 0] and note_2 to [1, 1, 1] in that three-word subspace.)
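A minimal bag-of-words sketch with scikit-learn's CountVectorizer, assuming the lemmatized notes from the preprocessing slides as input:

```python
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

notes = [
    "worker slip while carry grocery worker fracture his elbow",   # note_1
    "worker develop carpal tunnel from repetitive typing",         # note_2
    "worker get traumatize from nlp presentation",                 # note_3
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(notes)          # sparse document-term count matrix

counts = pd.DataFrame(X.toarray(),
                      columns=vectorizer.get_feature_names_out(),
                      index=["note_1", "note_2", "note_3"])
print(counts[["carpal", "carry", "develop", "elbow", "worker"]])
# note_1 row: carpal=0, carry=1, develop=0, elbow=1, worker=2
```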

Page 14:

Vector Space Model II: Binary or One-Hot Encoding

● Zipf’s law for word distributions
  ○ Word counts follow a long-tailed distribution
● Record the presence or absence of a word
  ○ 1 = the term occurs at least once
  ○ 0 = the word does not occur
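The same vectorizer can produce the binary (presence/absence) representation; a minimal sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer

notes = [
    "worker slip while carry grocery worker fracture his elbow",
    "worker develop carpal tunnel from repetitive typing",
    "worker get traumatize from nlp presentation",
]

# binary=True records only presence/absence, so a repeated word counts once
binary_vectorizer = CountVectorizer(binary=True)
B = binary_vectorizer.fit_transform(notes)
print(B.toarray())   # all entries are 0/1; e.g. "worker" in note_1 is 1, not 2
```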

Page 15:

VSM III: Term Frequency - Inverse Document Frequency (TF-IDF)

● Calculates the importance of a term for a particular document:

  tfidf(t, d) = tf(t, d) * idf(t)

  ○ tf(t, d) is greater when the term is frequent in that particular document
  ○ idf(t) is greater when the term is rare across ALL the documents (the corpus)

● Different weighting schemes exist for the idf part; the most common is logarithmic:

  idf(t) = log( total number of documents / number of documents in which that term appears )

Page 16:

VSM III: TF-IDF Python implementation example

Example calculation: tf(note_1, worker) = 2, N = 3 documents, df(worker) = 3; with the smoothed logarithmic idf and L2 row normalization used by scikit-learn's TfidfVectorizer (computed over the full vocabulary of the notes), the entry for worker in note_1 works out to 0.407.

TF-IDF matrix:

          carpal   carry    develop  elbow    worker
note_1    0        0.3451   0        0.3451   0.407
note_2    0.4107   0        0.4107   0        0.2425
note_3    0        0        0        0        0.2660

Count model (for comparison):

          carpal   carry    develop  elbow    worker
note_1    0        1        0        1        2
note_2    1        0        1        0        1
note_3    0        0        0        0        1
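A minimal sketch that closely reproduces the numbers above with scikit-learn's TfidfVectorizer defaults (smoothed logarithmic idf, L2 row normalization), assuming the lemmatized notes from the preprocessing slides as input:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

notes = [
    "worker slip while carry grocery worker fracture his elbow",   # note_1
    "worker develop carpal tunnel from repetitive typing",         # note_2
    "worker get traumatize from nlp presentation",                 # note_3
]

vectorizer = TfidfVectorizer()               # defaults: smooth_idf=True, norm="l2"
X = vectorizer.fit_transform(notes)

tfidf = pd.DataFrame(X.toarray(),
                     columns=vectorizer.get_feature_names_out(),
                     index=["note_1", "note_2", "note_3"])
print(tfidf[["carpal", "carry", "develop", "elbow", "worker"]].round(4))
# values match the TF-IDF table above, up to rounding in the last digit
```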

Page 17:

VSM III: Other Considerations in TF-IDF

Min df: removes highly infrequent termsmin_df = 0.10 => ignore the terms that occur in less than 1% of the documentsmin_df = 3 => ignore the terms that occur in less than 3 documents

Max df: removes terms that occur too frequentlymax_df = 0.5 => ignore the terms that occur in more than 50% of the documentsmax_df = 5 => ignore the terms that occur in more than 5 documents

Ngrams: continuous sequence of wordstries to captures the context of the sentencebi-grams and tri-grams are common

Bi-grams example: worker developed carpal tunnel from repetitive typing => (worker developed, developed carpal, carpal tunnel, …..repetitive typing)
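A minimal sketch of these options with scikit-learn's TfidfVectorizer (the parameter values here are illustrative, not taken from the slides):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

notes = [
    "worker slip while carry grocery worker fracture his elbow",
    "worker develop carpal tunnel from repetitive typing",
    "worker get traumatize from nlp presentation",
]

vectorizer = TfidfVectorizer(
    min_df=1,             # drop terms appearing in fewer than 1 document (a no-op here)
    max_df=0.9,           # drop terms appearing in more than 90% of documents ("worker")
    ngram_range=(1, 2),   # keep unigrams and bi-grams
)
X = vectorizer.fit_transform(notes)
print(vectorizer.get_feature_names_out())   # includes bi-grams such as "carpal tunnel"
```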

Page 18:

● Advantages:
  ○ Simple but surprisingly effective
  ○ Quick
  ○ Interpretable

● Disadvantages:
  ○ Assume all words are independent or equidistant, which is not the case in the real world
  ○ Very sparse representation (sparse is bad because there are few examples to learn from)

Page 19:

Cosine Similarity

● Any text can be represented as a vector in a V-dimensional vector space
● Cosine similarity is used to measure the similarity between two vectors:
  ○ It measures the cosine of the angle between the two vectors: cos(A, B) = (A · B) / (||A|| ||B||)
  ○ The cosine is bounded by [-1, 1]: 1 being similar, 0 being dissimilar, and -1 being opposite
● Basic fraud model: rank the other claim notes by cosine similarity with respect to the note of a known fraudulent claim

(Figure: claim notes plotted in vector space; one note is associated with a known fraudulent claim, and the claim whose note lies closest to it, shown in red, is flagged for investigation.)
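A minimal sketch of ranking claim notes by cosine similarity against the note of a known fraudulent claim (the fraud note here is hypothetical, for illustration only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

notes = [
    "worker slip while carry grocery worker fracture his elbow",
    "worker develop carpal tunnel from repetitive typing",
    "worker get traumatize from nlp presentation",
]
fraud_note = "worker fracture elbow carry grocery"   # hypothetical known-fraud note

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(notes + [fraud_note])

fraud_vec = X[len(notes)]                                   # last row = the fraud note
sims = cosine_similarity(fraud_vec, X[:len(notes)]).ravel()

# Rank the other notes from most to least similar; the top ones get investigated first
order = sims.argsort()[::-1]
for i in order:
    print(f"note_{i + 1}: cosine similarity = {sims[i]:.3f}")
```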

Page 20:

Curse of Dimensionality

● As dimensionality increases, the volume of the space increases so fast that the available data become sparse
● Matrix view
  ○ Sparse: lots of zero values
  ○ The zeros do not provide any additional information
  ○ Arithmetic operations take a lot of time
  ○ The matrix takes up a lot of memory
● Distance calculations
  ○ In a high-dimensional vector space distances grow large, and when a measure such as Euclidean distance is defined using many coordinates there is little difference in the distances between different pairs of samples
● Answer: reduce the dimensions (dense representations)

Page 21:

Dense Representation:

● Use matrix factorization
  ○ Singular value decomposition (Latent Semantic Indexing)
  ○ Non-negative matrix factorization
● Use probabilistic inference
  ○ Bayesian inference / Latent Dirichlet Allocation
● Use a neural network approach
  ○ word2vec (Google)
  ○ GloVe
  ○ fastText (Facebook)
  ○ BlazingText (Amazon)
  ○ Train your own!

Page 22:

Topic Modeling:

Q) Which topics best represent the information in these documents?

● Assumptions:
  ○ Each topic consists of a collection of words
  ○ Each document consists of a mixture of topics
● Uses:
  ○ Unsupervised learning algorithms
  ○ But the topics can also be an input to other, supervised algorithms
  ○ Labels the clusters

Page 23:

Latent Semantic Analysis/Indexing:

● Performs matrix factorization on the document-term matrix
  ○ The factorization is done with singular value decomposition (SVD)
  ○ The term-document matrix is the earlier tf-idf matrix, transposed!
● Singular value decomposition, truncated to k topics:

  A (m x n term-document matrix, m terms by n documents) ≈ U (m x k, term-to-topic weights) * Σ (k x k) * V^T (k x n, the document space)

(Figure: the m x n tf-idf term-document matrix factored into the three smaller k-truncated matrices above.)

Page 24:

LSI parameter

K: the number of dimensions to reduce to

● Depends on the data size
  ○ Old standard: 300
  ○ New standard: 500-1000

Page 25:

LSI Example with k = 2:

tf-idf matrix (terms x documents):

          note_1   note_2   note_3
carpal    0        0.4107   0
carry     0.3451   0        0
develop   0        0.4107   0
elbow     0.3451   0        0
worker    0.407    0.2425   0.266

Word assignment to topics (U):

          topic_1  topic_2
carpal    0.222    -0.17
carry     0.153     0.311
develop   0.222    -0.17
elbow     0.15      0.311
worker    0.358    -0.23

Topic importance (singular values):

          topic_1  topic_2
topic_1   1.12     0
topic_2   0        0.96

Topic distribution across documents (V^T):

          note_1   note_2   note_3
topic_1   0.497    0.607    0.618
topic_2   0.865   -0.398   -0.304

Original space => reduced 2-dimensional space:
note_1 ([0, 0.3451, 0, 0.3451, ...]) ===> note_1 ([0.497, 0.865])
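A minimal LSI sketch with scikit-learn's TruncatedSVD applied to the tf-idf matrix (signs, scaling, and exact values depend on the SVD implementation and on the vocabulary used, so they will not match the slide's numbers exactly):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

notes = [
    "worker slip while carry grocery worker fracture his elbow",
    "worker develop carpal tunnel from repetitive typing",
    "worker get traumatize from nlp presentation",
]

X = TfidfVectorizer().fit_transform(notes)   # documents x terms

svd = TruncatedSVD(n_components=2)           # keep k = 2 latent topics
doc_topics = svd.fit_transform(X)            # each note as a 2-dimensional vector
print(doc_topics)                            # reduced document space (scaled by the singular values)

print(svd.singular_values_)                  # topic importance
print(svd.components_)                       # term-to-topic weights (one row per topic)
```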

Page 26:

Non-Negative Matrix Factorization (NNMF):

● Another matrix factorization method!
● Decomposes the document-term matrix into 2 matrices, instead of 3
● Main advantage over SVD:
  ○ The elements of both factor matrices are non-negative
  ○ The input matrix already has non-negative elements
● Weakness:
  ○ The factorization is not unique
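A minimal sketch with scikit-learn's NMF on the same tf-idf matrix (the number of components is illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

notes = [
    "worker slip while carry grocery worker fracture his elbow",
    "worker develop carpal tunnel from repetitive typing",
    "worker get traumatize from nlp presentation",
]

X = TfidfVectorizer().fit_transform(notes)   # non-negative document-term matrix

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)    # documents x topics, all entries >= 0
H = nmf.components_         # topics x terms,  all entries >= 0
print(W)
print(H)
```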

Page 27:

Topic Modeling I: Latent Dirichlet Allocation

● Developed in 2003

Assumptions:

● There are k latent topics according to which the documents are generated
● A distribution of words for each topic
  ○ Each topic is represented by a set of terms
  ○ The model gives the probability of the topics each word belongs to
  ○ The same word can appear in multiple topics
● A mixture of topics within each document

Page 28:

Topic Modeling I: Latent Dirichlet Allocation (example output)

Topic 1: '0.038*"injury" + 0.027*"neck" + 0.024*"whiplash" + 0.017*"sti" + 0.015*"strain" + 0.011*"spin" + 0.010*"cerv" + 0.010*"low" + 0.009*"whiplash injury" + …'

Topic 2: '0.019*"anxy" + 0.013*"disord" + 0.012*"depress" + 0.009*"ptsd" + 0.007*"stress" + 0.007*"adjust" + 0.006*"adjust disord" + 0.006*"traum" + 0.006*"post" + 0.005*"shock" + …'

Topic 3: '0.017*"bru" + 0.015*"rt" + 0.012*"left" + 0.010*"lt" + 0.010*"injury" + 0.009*"abras" + 0.009*"lac" + 0.011*"kne" + 0.007*"cut" + 0.006*"fall" + …'

Topic 4: '0.014*"rt" + 0.013*"kne" + 0.009*"left" + 0.009*"fract" + 0.007*"right" + 0.006*"ankl" + 0.005*"lt" + 0.005*"dist" + 0.005*"tib" + 0.004*"foot" + …'

Document level (a stemmed claim note):

"ip suff whiplash injury and has return to work whiplash injury of the neck musculoliga strain of the back sti of the l should adjust disord w depress mood anxy aggrav pre ex deg chang up low spin whiplash injury of the neck muscul liga strain of the back soft tissu injury of left should"

Topic mixture for this document:

Topic 1   Topic 2   Topic 3   Topic 4
0.411     0.316     0.153     0.120
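The topic strings above are in the format printed by gensim; a minimal sketch of training such a model (the toy corpus and parameter values are illustrative, not the presenter's actual pipeline):

```python
from gensim import corpora
from gensim.models import LdaModel

# Pre-tokenized (e.g. stemmed) claim notes; a real model needs far more data
texts = [
    ["whiplash", "injury", "neck", "strain"],
    ["anxy", "depress", "stress", "disord"],
    ["kne", "fract", "ankl", "left"],
]

dictionary = corpora.Dictionary(texts)                  # word <-> id mapping
bow_corpus = [dictionary.doc2bow(t) for t in texts]     # bag-of-words counts per document

lda = LdaModel(bow_corpus, num_topics=3, id2word=dictionary, passes=10, random_state=0)
print(lda.print_topics())     # strings like '0.12*"whiplash" + 0.10*"neck" + ...'
print(lda[bow_corpus[0]])     # topic mixture for the first document
```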

Page 29:

Drawbacks of topic models:

● Sensitive to pre-processing
● Training takes considerably longer and uses more memory

Page 30:

The three families of representations covered: statistical word counts => topic models / grouping words => word embeddings

Page 31:

Dense Representation

● In 2013, a team at Google led by Tomas Mikolov created word2vec
● Motivated by Harris's distributional hypothesis (1954): the intuition that similar words appear in similar contexts, so we know words by the "neighbors they keep"
● Other dense word embedding variants include GloVe, matrix factorization methods, and fastText

Page 32:

Page 33:

Dense Representation Cont.

● Similar words end up close together in the space, and you can do vector operations that "make sense" (e.g. king - man + woman ≈ queen)
● Can capture synonyms, misspellings, etc., and transfer learning can be applied
● Drawbacks: a single vector per word even though meaning is context dependent (one meaning dominates), and the approach is relatively data hungry

(Figure: word vectors for fall/fell and burn/burnt; the verb-tense relationship shows up as a roughly constant offset between each pair.)

Page 34:

Modeling Architecture

Example sentence: "worker slipped on water"

● CBOW (continuous bag of words): predict the center word ("slipped") from its surrounding context words ("worker", "on", "water")
● Skip-gram: predict each of the context words ("worker", "on", "water") from the center word ("slipped")
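A minimal sketch of training word vectors with gensim's Word2Vec (the toy corpus and parameters are illustrative; the sg flag switches between the two architectures above):

```python
from gensim.models import Word2Vec

# Tokenized claim notes; a real model needs far more text than this toy corpus
sentences = [
    ["worker", "slipped", "on", "water"],
    ["worker", "fractured", "his", "elbow"],
    ["worker", "developed", "carpal", "tunnel"],
]

# sg=0 -> CBOW (predict the center word from its context); sg=1 -> skip-gram
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["worker"][:5])                    # first 5 dimensions of the "worker" vector
print(model.wv.most_similar("worker", topn=3))   # nearest neighbours in the vector space
```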

Page 35:

FastText - Dealing with words not in Vocabulary
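fastText represents each word through its character n-grams, so it can compose a vector even for a word it never saw during training (misspellings, rare terms). A minimal gensim sketch (toy corpus, illustrative parameters):

```python
from gensim.models import FastText

sentences = [
    ["worker", "slipped", "on", "water"],
    ["worker", "fractured", "his", "elbow"],
]

# min_n / max_n control the character n-gram sizes used to build word vectors
model = FastText(sentences, vector_size=50, window=2, min_count=1,
                 min_n=3, max_n=5, epochs=50)

# "fraktured" is not in the training vocabulary, but fastText still composes a
# vector for it from its character n-grams
print(model.wv["fraktured"][:5])
print(model.wv.similarity("fractured", "fraktured"))
```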

Page 36:

Dense Representation Cont.

● You might wonder: OK, we now have vectors for each word… how does that work for sentences? Paragraphs?
● There are many answers (it is its own topic in research and practice):
● Average the word vectors, concatenate them, or use sentence embeddings / document embeddings; a sketch of the averaging baseline follows
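A common baseline, sketched here, is to average a sentence's word vectors (model is assumed to be a trained word2vec/fastText model such as the one in the earlier sketch):

```python
import numpy as np

def sentence_vector(tokens, model):
    """Average the word vectors of the tokens that are in the model's vocabulary."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    if not vectors:
        return np.zeros(model.wv.vector_size)
    return np.mean(vectors, axis=0)

# e.g. sent_vec = sentence_vector(["worker", "slipped", "on", "water"], model)
```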

Page 37:

Recap / Summary

● We saw different ways to turn words into numbers: counts, groups, embeddings

● Simplicity + speed vs. complexity + cost
● Implicit: everything is data dependent ("Conservation of Garbage": garbage in, garbage out)

Page 38:

Additional resources:

● Code for generating some of the demonstrated VSMs:
  https://github.com/pradnya-nimkar/CABA-presentation/blob/master/CABA%20NLP%20presentation-May31.ipynb

● LDA paper (Blei, Ng, Jordan):
  https://ai.stanford.edu/~ang/papers/nips01-lda.pdf

● word2vec paper (Mikolov et al.):
  https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

● Stanford NLP group:
  https://nlp.stanford.edu/

Page 39:

Thank you!