TAL: NLP tasks
Introduction to the classification of text documents
Vincent Guigue

INTRODUCTION: different tasks in textual data analysis

What is text?
A sequence of letters: l e _ c h a t _ e s t ...
A sequence of words: le chat est ...
A set of words, in alphabetical order: chat est le ...
At least one source to check: C. Manning, Stanford: https://nlp.stanford.edu/cmanning/
http://web.stanford.edu/~jurafsky/NLPCourseraSlides.html
http://web.stanford.edu/class/cs224n/

Different basic tasks
Grammatical analysis:
Part-Of-Speech (POS) tagging: noun, proper noun, determiner, verb...
NER = Named Entity Recognition: detection of proper nouns, work on co-references
SRL = Semantic Role Labeling: subject, verb, complements...
Co-reference resolution
Example: Le chat est dans le jardin, il mange un morceau de jambon.
Thematic / semantic analysis:
Building a metric between words
Thematic classification of documents, e.g. football, scientific article, political analysis
Sentiment classification: positive, negative, neutral, aggressive, ...
⇒ These tasks are highly varied and operate at different scales: words, sentences, documents

Higher-level tasks
Topic detection & tracking
Machine translation
Question Answering
Information extraction
Text generation

Illustrations
Part-of-Speech (POS) tagging
Tags: ADJECTIVE, NOUN, PREPOSITION, VERB, ADVERB, ARTICLE...
Example: Bob drank coffee at Starbucks ⇒ Bob (NOUN) drank (VERB) coffee (NOUN) at (PREPOSITION) Starbucks (NOUN).
Named Entity Recognition (NER)

Illustrations (continued)
Parsing (credit: CoreNLP)
Semantic Role Labeling (credit: Stanford NLP)
Information Extraction (credit: Dan Jurafsky), e.g. from the email
Subject: curriculum meeting, Date: January 15, 2012, To: Dan Jurafsky
"Hi Dan, we've now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. -Chris"
the system creates a new calendar entry:
Event: Curriculum mtg, Date: Jan-16-2012, Start: 10:00am, End: 11:30am, Where: Gates 159
Question answering (QA)

Illustrations (continued)
Information Extraction & Sentiment Analysis (credit: Dan Jurafsky)
Camera reviews, analysed with respect to the attribute "size and weight" (each review is marked ✓ or ✗ for that attribute):
"nice and compact to carry!"
"since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either!"
"the camera feels flimsy, is plastic and very light in weight; you have to be very delicate in the handling of this camera"
Attributes: zoom, affordability, size and weight, flash, ease of use

Illustrations (continued)
Machine translation:
align words
generate an intelligible / plausible sentence
History (rapidly evolving):
word-level translation
sequence-level translation
translation of knowledge / meaning

A quick overview (before we start!)
Language Technology (credit: Dan Jurafsky), task by task:
Mostly solved:
Spam detection ("Let's go to Agra!" ✓ vs. "Buy V1AGRA ..." ✗)
Part-of-speech (POS) tagging (Colorless green ideas sleep furiously. ⇒ ADJ ADJ NOUN VERB ADV)
Named entity recognition (NER) (Einstein met with UN officials in Princeton ⇒ PERSON, ORG, LOC)
Making good progress:
Sentiment analysis (Best roast chicken in San Francisco! / The waiter ignored us for 20 minutes.)
Coreference resolution (Carter told Mubarak he shouldn't run again.)
Word sense disambiguation (WSD) (I need new batteries for my mouse.)
Parsing (I can see Alcatraz from the window!)
Machine translation (MT) (The 13th Shanghai International Film Festival ...)
Information extraction (IE) (You're invited to our dinner party, Friday May 27 at 8:30 ⇒ add "Party, May 27" to the calendar)
Still really hard:
Question answering (QA) (Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?)
Paraphrase (XYZ acquired ABC yesterday / ABC has been taken over by XYZ)
Summarization (The Dow Jones is up / Housing prices rose / The S&P500 jumped ⇒ Economy is good)
Dialog (Where is Citizen Kane playing in SF? / Castro Theatre at 7:30. Do you want a ticket?)

A few difficulties to spice up the course
(credit: Dan Jurafsky)
Non-standard English: "Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either"
Segmentation issues: the New York-New Haven Railroad
Idioms: dark horse, get cold feet, lose face, throw in the towel
Neologisms: unfriend, retweet, bromance
Tricky entity names: Where is A Bug's Life playing ... / Let It Be was recorded ... / ... a mutation on the for gene ...
World knowledge: Mary and Sue are sisters. / Mary and Sue are mothers.
But that's what makes it fun!
Why else is natural language understanding difficult?

Text modeling(s)

Standard processing chain
1. Preprocessing: encoding (latin, utf8, ...), punctuation, stemming, lemmatisation, tokenization, lower/upper case, regex, ... (see the sketch below)
2. Formatting: construction of a dictionary (index) + inverted index (to explain the processing), vector representation, preservation of the sequences
3. Processing: classification of documents, words or sentences; semantics...; Perceptron or HMM?
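
As an illustration of stage 1, here is a minimal preprocessing sketch in Python (the regex, the naive tokenizer and the choice of a Snowball stemmer are assumptions for the example, not the pipeline prescribed by the course):

```python
import re
from nltk.stem.snowball import SnowballStemmer  # assumes nltk is installed

def preprocess(text, lang="english"):
    """Lowercase, strip punctuation, tokenize and stem a raw string."""
    text = text.lower()                      # lower/upper case normalisation
    text = re.sub(r"[^\w\s-]", " ", text)    # drop punctuation with a simple regex
    tokens = text.split()                    # naive whitespace tokenization
    stemmer = SnowballStemmer(lang)
    return [stemmer.stem(tok) for tok in tokens]  # stemming

print(preprocess("The lion does not live in the jungle!"))
# e.g. ['the', 'lion', 'doe', 'not', 'live', 'in', 'the', 'jungl']
```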

BOW: Bag Of Words

Handling textual data: the classification case
1. Big corpus ⇔ huge vocabulary ⇒ Perceptron, SVM, Naive Bayes, Boosting, Bagging... distributed & efficient algorithms
2. Sentence structure is hard to model ⇒ remove the structure...
3. Words are polymorphous (singular/plural, masculine/feminine) ⇒ several approaches (see below)
4. Machine learning + large dimensionality = problems ⇒ remove useless words

Bag of words
Sentence structure = costly to handle ⇒ eliminate it!
Thus: document = set of words + counts
Bag-of-words representation:
1. Extraction of the vocabulary V
2. Each document becomes a count vector $d \in \mathbb{N}^{|V|}$
Note: $d$ is always a sparse vector, mostly composed of 0s

Example
Set of toy documents:
documents = ['The lion does not live in the jungle',
             'Lions eat big preys',
             'In the zoo, the lion sleep',
             'Self-driving cars will be autonomous in towns',
             'The future car has no steering wheel',
             'My car already has sensors and a camera']
Dictionary (figure: one cell per word and document, green level ∝ number of occurrences)
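
A minimal sketch of how the dictionary and the count vectors can be built, here with scikit-learn's CountVectorizer (the choice of tool and its default settings are an assumption; the slides do not prescribe one):

```python
from sklearn.feature_extraction.text import CountVectorizer

documents = ['The lion does not live in the jungle',
             'Lions eat big preys',
             'In the zoo, the lion sleep',
             'Self-driving cars will be autonomous in towns',
             'The future car has no steering wheel',
             'My car already has sensors and a camera']

vectorizer = CountVectorizer()             # lowercases and tokenizes by default
X = vectorizer.fit_transform(documents)    # sparse matrix of shape (6, |V|)

print(vectorizer.get_feature_names_out())  # the dictionary (vocabulary V)
print(X.toarray())                         # one count vector per document
```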

Information coding
Counting the words appearing in 2 documents:
The lion does not live in the jungle
In the zoo, the lion sleep
Non-zero counts over the shared dictionary (document 1 / document 2):

| word | doc 1 | doc 2 |
|------|-------|-------|
| The | 1 | 0 |
| does | 1 | 0 |
| in | 1 | 0 |
| jungle | 1 | 0 |
| lion | 1 | 1 |
| live | 1 | 0 |
| not | 1 | 0 |
| the | 1 | 2 |
| In | 0 | 1 |
| sleep | 0 | 1 |
| zoo, | 0 | 1 |

All the other dictionary entries (Lions, big, eat, preys, Self-driving, autonomous, be, cars, towns, will, car, future, has, no, steering, wheel, My, a, already, and, camera, sensors) have a count of 0 in both documents.
+ We are able to vectorize textual information
− The dictionary requires preprocessing

Word representation & semantic gap
All words are orthogonal.
Consider 2 virtual documents, each made of a single word:
$$d_i = [0 \dots 0 \;\; d_{ik} > 0 \;\; 0 \dots 0], \qquad d_j = [0 \dots 0 \;\; d_{jk'} > 0 \;\; 0 \dots 0]$$
Then: $k \neq k' \Rightarrow d_i \cdot d_j = 0$
... even if $w_k$ = lion and $w_{k'}$ = lions
⇒ Definition of the semantic gap: no metric between words
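
A tiny numerical illustration of this orthogonality (a sketch; the vocabulary and the counts are made up for the example):

```python
import numpy as np

vocab = ["lion", "lions", "jungle"]   # toy vocabulary
d_i = np.array([3, 0, 0])             # document containing only "lion"
d_j = np.array([0, 2, 0])             # document containing only "lions"

print(d_i @ d_j)  # 0: "lion" and "lions" are orthogonal in a bag-of-words space
```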

Semantic issue
Understanding documents = matching relevant descriptors
Syntactic difference ⇒ orthogonality of the representation vectors
Word groups: more intrinsic semantics... but fewer matches with other documents
N-grams ⇒ dictionary size ↗
N-grams = great potential... but they require careful preprocessing
Example: This film was not interesting
Unigrams: this, film, was, not, interesting
Bigrams: this_film, film_was, was_not, not_interesting
N-grams... + combinations: e.g. 1-3 grams
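
With scikit-learn, the 1-3 gram combination is typically expressed through the ngram_range parameter (a sketch, under the same assumption as above that CountVectorizer is used):

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(ngram_range=(1, 3))  # unigrams, bigrams and trigrams
X = vectorizer.fit_transform(["This film was not interesting"])
print(vectorizer.get_feature_names_out())
# includes 'not', 'not interesting', 'was not interesting', ...
```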

Implementation issues
How many unique words in a corpus of 10k movie reviews?
Example:
Story of a man who has unnatural feelings for a pig. Starts out with a opening scene that is a terrific example of absurd comedy. A formal orchestra audience is turned into an insane, violent mob by the crazy chantings of it's singers. Unfortunately it stays absurd the WHOLE time with no general narrative eventually making it just too off putting. Even those from the era should be turned off. The cryptic dialogue would make Shakespeare seem easy to a third grader. On a technical level it's better than you might think with some good cinematography by future great Vilmos Zsigmond. Future stars Sally Kirkland and Frederic Forrest can be seen briefly.
104077 unique words...
Dense encoding: $10^4$ documents $\times\ 10^5$ words $\times\ 4$ bytes $= 4 \cdot 10^9$ bytes ⇒ 4 GB... against 100 MB of raw textual data. How to improve?
Sparse coding / hash table ⇒ the 0s are no longer encoded
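
A rough sketch of the memory argument and of the sparse alternative with scipy's CSR format (the sizes are the orders of magnitude quoted above, not measured values):

```python
import numpy as np
from scipy.sparse import csr_matrix

n_docs, vocab_size = 10_000, 104_077
dense_bytes = n_docs * vocab_size * 4        # 4-byte counts for every cell
print(dense_bytes / 1e9, "GB")               # about 4.2 GB for the dense matrix

# Sparse storage only keeps the non-zero counts (plus their indices):
row = np.array([0, 0, 1])                    # toy non-zeros: (doc, word) -> count
col = np.array([12, 7, 12])
data = np.array([2, 1, 1])
X = csr_matrix((data, (row, col)), shape=(n_docs, vocab_size))
print(X.data.nbytes + X.indices.nbytes + X.indptr.nbytes, "bytes actually stored")
```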

Implementation issues (2)
Hash table... ⇒ no linear-algebra operators!
Higher-level sparse coding = sparse matrix
Several implementations: keys on row/column indexing, cell indexing, or block matrices
Sparse matrices are rather well integrated... but take care: if your program behaves strangely (e.g. in scikit-learn), there may be an implicit conversion to a full (dense) matrix inside.
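
To make the row/column indexing concrete, here is what a CSR matrix actually stores (a sketch on a tiny matrix, nothing course-specific):

```python
from scipy.sparse import csr_matrix, issparse

X = csr_matrix([[0, 2, 0],
                [1, 0, 0]])
print(X.data)     # [2 1]    non-zero values
print(X.indices)  # [1 0]    column index of each value
print(X.indptr)   # [0 1 2]  where each row starts in data/indices

# A quick sanity check against silent densification:
print(issparse(X), issparse(X.toarray()))  # True False
```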

Text representations
Classical approach: bag of words (BoW)
+ Advantages of BoW:
rather simple, rather lightweight
fast (real-time systems, information retrieval, natural indexing...)
many possible enrichments (POS, context coding, N-grams...)
well suited to document classification
efficient existing implementations: nltk, sklearn
− Drawback(s) of BoW:
loss of the sentence/document structure
⇒ several tasks become hard to attack: NER, POS tagging, SRL, text generation

Processing sequential data

Handling sequences better
1. Enriching the vector description: N-grams, description of the word contexts... Typical use: improving document-level classification tasks
2. Sliding-window approach: making decisions at the intra-document scale
fixed size ⇒ a vector description becomes possible
classifier over a local representation of the text
signal processing (AR, ARMA...)
pattern detection (frequent itemsets, association rules)
3. Sequential models
Hidden Markov Models (HMM, in French: Modèles de Markov Cachés, MMC)
[Figure: dependencies in an HMM; credit: UPMC - M1 MQIA - T. Artières]
CRF (Conditional Random Fields): a discriminative approach

History of the approaches to POS, SRL and NER
Association-rule modeling (1980s): what are the frequent co-occurrences between a POS tag and an item in its context? ⇒ rules
Bayesian modeling: for a POS tag $i$, model the distribution of the context $p(\text{context} \mid \theta_i)$; maximum-likelihood decision: $\arg\max_i p(\text{context} \mid \theta_i)$
Structured extension (Hidden Markov Models), after 1985/90
HMM taggers are fast and achieve precision/recall scores of about 93-95%
Towards discriminative modeling (CRF), after 2001
Recurrent Neural Networks (cf. the ARF and AS courses), after 2010

NLP / ML: a lot in common
Linked funding (and not always glorious):
MUC / TREC conferences (...): IR, information extraction, document classification, sentiment, QA; multiple domains: general, medicine, patents, legal... ⇒ building datasets, centralising results, exchanges
NSA, "black boxes" (French intelligence law, 2015)
ML advances driven by NLP: HMM, CRF
Many ambitious projects:
Google Knowledge Graph
NELL: Never-Ending Language Learning (Tom Mitchell, CMU)
HMM

Sequential formalisation
Observations: Le chat est dans le salon
Labels: DET NN VBZ ...
The HMM is relevant here: it is based on
the label-to-label transitions,
the observation probabilities

Notation
The Markov chain is still composed of:
a sequence of states $S = (s_1, \dots, s_T)$, whose values are drawn from a finite set $Q = (q_1, \dots, q_N)$
the model is still defined by $\{\Pi, A\}$: $\pi_i = P(s_1 = q_i)$, $a_{ij} = p(s_{t+1} = q_j \mid s_t = q_i)$
The observations are modelled from the $s_t$:
observation sequence $X = (x_1, \dots, x_T)$
probability law $b_j(t) = p(x_t \mid s_t = q_j)$; $B$ can be discrete or continuous
HMM: $\lambda = \{\Pi, A, B\}$
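
For a discrete-emission tagger, $\lambda = \{\Pi, A, B\}$ is simply three arrays; a minimal sketch (the tag set, vocabulary and probability values are made-up toy choices):

```python
import numpy as np

states = ["DET", "NN", "VB"]       # Q, hypothetical tag set
words = ["le", "chat", "dort"]     # observation vocabulary, toy example

Pi = np.array([0.8, 0.1, 0.1])     # Pi[i]   = P(s_1 = q_i)
A = np.array([[0.1, 0.8, 0.1],     # A[i, j] = p(s_{t+1} = q_j | s_t = q_i)
              [0.2, 0.2, 0.6],
              [0.5, 0.4, 0.1]])
B = np.array([[0.90, 0.05, 0.05],  # B[j, o] = p(x_t = o | s_t = q_j)
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
lam = (Pi, A, B)                   # lambda = {Pi, A, B}
```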

What we manipulate
Observation sequence: $X = (x_1, \dots, x_T)$
State sequence (hidden = missing): $S = (s_1, \dots, s_T)$

Reminder: structure of an HMM
[Figure: trellis with the hidden states $S_1, \dots, S_T$ (values in $1..N$), the observations $X_1, \dots, X_T$, the initial distribution $\Pi$, the transitions $A$ and the emissions $B$]
First-order hypothesis: each state depends only on the previous one
Each observation depends only on the current state
The states are unknown... and the combinatorics to consider is problematic!

The three problems of HMMs (Ferguson, Rabiner)
Evaluation: given $\lambda$, compute $p(x_1^T \mid \lambda)$
Decoding: given $\lambda$, which state sequence generated the observations?
$$s_1^{T\star} = \arg\max_{s_1^T} p(x_1^T, s_1^T \mid \lambda)$$
Learning: from a set of observations, find $\lambda^\star$:
$$\lambda^\star = \{\Pi^\star, A^\star, B^\star\} = \arg\max_{s_1^T, \lambda} p(x_1^T, s_1^T \mid \lambda)$$

Problem 1: the forward algorithm (dynamic programming)
$$\alpha_t(i) = p(x_1^t, s_t = i \mid \lambda)$$
Initialisation:
$$\alpha_1(i) = p(x_1, s_1 = i \mid \lambda) = \pi_i\, b_i(x_1)$$
Iteration:
$$\alpha_t(j) = \left[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\right] b_j(x_t)$$
Termination:
$$p(x_1^T \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
Complexity linear in $T$; usually $T \gg N$
[Figure: trellis of states and observations with the transition matrix $A$]
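
A direct numpy transcription of these three steps (a sketch; it reuses the kind of toy Pi, A, B arrays introduced above):

```python
import numpy as np

def forward(Pi, A, B, obs):
    """Return p(x_1^T | lambda) for a discrete-emission HMM.
    obs is a sequence of observation indices."""
    T, N = len(obs), len(Pi)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, obs[0]]                      # initialisation
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # iteration
    return alpha[-1].sum()                            # termination

# e.g. with the toy model above: forward(Pi, A, B, obs=[0, 1, 2])
```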

Problem 2: Viterbi (summary)
$$\delta_t(i) = \max_{s_1^{t-1}} p(s_1^{t-1}, s_t = i, x_1^t \mid \lambda)$$
1. Initialisation: $\delta_1(i) = \pi_i\, b_i(x_1)$, $\Psi_1(i) = 0$
2. Recursion:
$$\delta_t(j) = \left[\max_i \delta_{t-1}(i)\, a_{ij}\right] b_j(x_t), \qquad \Psi_t(j) = \arg\max_{i \in [1, N]} \delta_{t-1}(i)\, a_{ij}$$
3. Termination: $S^\star = \max_i \delta_T(i)$
4. Path: $q_T^\star = \arg\max_i \delta_T(i)$, then $q_t^\star = \Psi_{t+1}(q_{t+1}^\star)$
[Figure: trellis of states and observations with the transition matrix $A$]
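
The corresponding decoding sketch in numpy (same toy λ as before; log-probabilities are omitted here only to stay close to the slide's notation):

```python
import numpy as np

def viterbi(Pi, A, B, obs):
    """Return the most likely state sequence for a discrete-emission HMM."""
    T, N = len(obs), len(Pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = Pi * B[:, obs[0]]                      # initialisation
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)                # best predecessor of each j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]  # recursion
    path = [int(delta[-1].argmax())]                  # termination
    for t in range(T - 1, 0, -1):                     # backtracking
        path.insert(0, int(psi[t][path[0]]))
    return path
```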

Problem 3: learning an HMM
Simplified version (hard assignment), k-means style. We already have:
evaluation: $p(x_1^T \mid \lambda)$
decoding: $s_1^{T\star} = \arg\max_{s_1^T} p(x_1^T, s_1^T \mid \lambda)$
Proposition:
Data: observations $X$, structure $N, K$
Result: $\Pi^\star, A^\star, B^\star$
Initialise $\lambda^0 = \{\Pi^0, A^0, B^0\}$, carefully if possible; $t = 0$;
while convergence is not reached do
  $S^{t+1} = \text{decode}(X, \lambda^t)$;
  $\lambda^{t+1} = \{\Pi^{t+1}, A^{t+1}, B^{t+1}\}$ obtained by counting the transitions;
  $t = t + 1$;
end
Algorithm 1: simplified Baum-Welch for learning an HMM
You already have all the pieces needed to implement this!
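
A compact sketch of this hard-assignment loop, reusing the viterbi function above (the random initialisation, the smoothing constant eps and the fixed number of iterations are arbitrary choices, not prescribed by the slide):

```python
import numpy as np

def hard_em(X, N, K, n_iter=20, eps=1e-3):
    """X: list of observation-index sequences, N states, K observation symbols."""
    rng = np.random.default_rng(0)
    Pi = np.full(N, 1 / N)
    A = rng.dirichlet(np.ones(N), size=N)       # random row-stochastic init
    B = rng.dirichlet(np.ones(K), size=N)
    for _ in range(n_iter):
        Pi_c = np.full(N, eps)                  # smoothed counters
        A_c = np.full((N, N), eps)
        B_c = np.full((N, K), eps)
        for obs in X:                           # "E" step: hard decoding
            states = viterbi(Pi, A, B, obs)
            Pi_c[states[0]] += 1
            for s, o in zip(states, obs):
                B_c[s, o] += 1
            for s1, s2 in zip(states, states[1:]):
                A_c[s1, s2] += 1
        Pi = Pi_c / Pi_c.sum()                  # "M" step: counting
        A = A_c / A_c.sum(axis=1, keepdims=True)
        B = B_c / B_c.sum(axis=1, keepdims=True)
    return Pi, A, B
```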

Learning in a supervised context
Observations: Le chat est dans le salon
Labels: DET NN VBZ ...
Much simpler (after the costly labeling task):
the matrices $A$, $B$, $\Pi$ are obtained by counting...
inference = Viterbi
Philosophy & limits: find the labeling that maximises the likelihood of the state-observation sequence... under the HMM hypotheses (observations independent given the states, first order)
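
Counting-based estimation from a tagged corpus can be sketched as follows (the corpus format, lists of (word index, tag index) pairs, and the smoothing constant are assumptions for the example):

```python
import numpy as np

def estimate_hmm(tagged_corpus, N, K, eps=1e-3):
    """tagged_corpus: list of sentences, each a list of (word_idx, tag_idx) pairs."""
    Pi = np.full(N, eps)
    A = np.full((N, N), eps)
    B = np.full((N, K), eps)
    for sentence in tagged_corpus:
        tags = [t for _, t in sentence]
        Pi[tags[0]] += 1                        # initial tag counts
        for w, t in sentence:
            B[t, w] += 1                        # emission counts
        for t1, t2 in zip(tags, tags[1:]):
            A[t1, t2] += 1                      # transition counts
    return (Pi / Pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))
```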

HMM ⇒ CRF
From the introduction to CRFs by Sutton & McCallum:
Figure 1.2: Diagram of the relationship between naive Bayes, logistic regression, HMMs, linear-chain CRFs, generative models, and general CRFs (moving to sequences turns naive Bayes into an HMM; conditioning turns naive Bayes into logistic regression and HMMs into linear-chain CRFs; general graphs give generative directed models and general CRFs).
"Furthermore, even when naive Bayes has good classification accuracy, its probability estimates tend to be poor. To understand why, imagine training naive Bayes on a data set in which all the features are repeated, that is, $\mathbf{x} = (x_1, x_1, x_2, x_2, \dots, x_K, x_K)$. This will increase the confidence of the naive Bayes probability estimates, even though no new information has been added to the data. Assumptions like naive Bayes can be especially problematic when we generalize to sequence models, because inference essentially combines evidence from different parts of the model. If probability estimates at a local level are overconfident, it might be difficult to combine them sensibly.
Actually, the difference in performance between naive Bayes and logistic regression is due only to the fact that the first is generative and the second discriminative; the two classifiers are, for discrete input, identical in all other respects. Naive Bayes and logistic regression consider the same hypothesis space, in the sense that any logistic regression classifier can be converted into a naive Bayes classifier with the same decision boundary, and vice versa. Another way of saying this is that the naive Bayes model (1.5) defines the same family of distributions as the logistic regression model (1.7), if we interpret it generatively as
$$p(y, \mathbf{x}) = \frac{\exp\left\{\sum_k \lambda_k f_k(y, \mathbf{x})\right\}}{\sum_{\tilde{y}, \tilde{\mathbf{x}}} \exp\left\{\sum_k \lambda_k f_k(\tilde{y}, \tilde{\mathbf{x}})\right\}}. \quad (1.9)$$
This means that if the naive Bayes model (1.5) is trained to maximize the conditional likelihood, we recover the same classifier as from logistic regression. Conversely, if the logistic regression model is interpreted generatively, as in (1.9), and is trained to maximize the joint likelihood $p(y, \mathbf{x})$, then we recover the same classifier as from naive Bayes. In the terminology of Ng and Jordan [2002], naive Bayes and logistic regression form a generative-discriminative pair.
The principal advantage of discriminative modeling is that it is better suited to [...]"

CRF modeling
Word sequence $\mathbf{x} = \{x_1, \dots, x_T\}$
Label sequence $\mathbf{y}$ (= POS tags)
Parametric estimation of the probabilities, based on the exponential family:
$$p(\mathbf{y}, \mathbf{x}) = \frac{1}{Z} \prod_t \Psi_t(y_t, y_{t-1}, x_t), \qquad \Psi_t(y_t, y_{t-1}, x_t) = \exp\left(\sum_k \theta_k f_k(y_t, y_{t-1}, x_t)\right)$$
The dependencies allowed in $\Psi_t$ ⇒ the shape of the HMM model
$\theta_k$: parameters to estimate (cf. logistic regression)
$f_k(y_t, y_{t-1}, x_t)$: generic expression of the features (details later)

CRF = a generalisation of HMMs
General case (previous slide):
$$p(\mathbf{y}, \mathbf{x}) = \frac{1}{Z} \prod_t \exp\left(\sum_k \theta_k f_k(y_t, y_{t-1}, x_t)\right)$$
Particular case: the $f_k$ are indicators of $(y_t, y_{t-1})$ or $(y_t, x_t)$:
$$p(\mathbf{y}, \mathbf{x}) = \frac{1}{Z} \prod_t \exp\left(\sum_{i,j \in S} \theta_{i,j}\, \mathbb{1}_{y_t = i \,\&\, y_{t-1} = j} + \sum_{i \in S,\, o \in O} \mu_{o,i}\, \mathbb{1}_{y_t = i \,\&\, x_t = o}\right)$$
With:
$\theta_{i,j} = \log p(y_t = i \mid y_{t-1} = j)$
$\mu_{o,i} = \log p(x = o \mid y_t = i)$
$Z = 1$
⇒ In this case, the features are binary (1/0)

CRF: moving to conditional probabilities
$$p(\mathbf{y}, \mathbf{x}) = \frac{1}{Z} \prod_t \exp\left(\sum_k \theta_k f_k(y_t, y_{t-1}, x_t)\right)$$
$$\Rightarrow\; p(\mathbf{y} \mid \mathbf{x}) = \frac{\prod_t \exp\left(\sum_k \theta_k f_k(y_t, y_{t-1}, x_t)\right)}{\sum_{\mathbf{y}'} \prod_t \exp\left(\sum_k \theta_k f_k(y'_t, y'_{t-1}, x_t)\right)} = \frac{1}{Z(\mathbf{x})} \prod_t \exp\left(\sum_k \theta_k f_k(y_t, y_{t-1}, x_t)\right)$$

CRF learning
$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_t \exp\left(\sum_k \theta_k f_k(y_t, y_{t-1}, x_t)\right)$$
As for logistic regression:
$$\mathcal{L}_{cond} = \sum_n \log p(\mathbf{y}^n \mid \mathbf{x}^n) = \sum_n \sum_t \sum_k \theta_k f_k(\mathbf{x}^n, y^n_t, y^n_{t-1}) - \sum_n \log Z(\mathbf{x}^n)$$
How do we optimise? Gradient ascent on $\partial \mathcal{L} / \partial \theta_k$:
$$\theta_k \leftarrow \theta_k + \sum_{n,t} f_k(\mathbf{x}^n, y^n_t, y^n_{t-1}) - \sum_{n,t} \sum_{y'_t, y'_{t-1}} f_k(\mathbf{x}^n, y'_t, y'_{t-1})\, p(y'_t, y'_{t-1} \mid \mathbf{x}^n)$$
Exact computation is possible in $O(T M^2 N)$; faster approximate solutions exist ($M$: number of labels, $T$: chain length, $N$: number of chains).
The difficulty lies essentially in the normalisation factor $Z(\mathbf{x})$.

Regularisation
What do you think of the complexity of the model? How can it be limited?
$$\mathcal{L}_{cond} = \sum_{n,k,t} \theta_k f_k(\mathbf{x}^n, y^n_t, y^n_{t-1}) - \sum_n \log Z(\mathbf{x}^n)$$
⇒ add an L2 or L1 penalty:
$$\mathcal{L}_{cond} = \sum_{n,k,t} \theta_k f_k(\mathbf{x}^n, y^n_t, y^n_{t-1}) - \sum_n \log Z(\mathbf{x}^n) - \frac{1}{2\sigma^2}\|\theta\|^2$$
$$\mathcal{L}_{cond} = \sum_{n,k,t} \theta_k f_k(\mathbf{x}^n, y^n_t, y^n_{t-1}) - \sum_n \log Z(\mathbf{x}^n) - \alpha \sum_k |\theta_k|$$

So what about the $f_k$???
$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\left\{\sum_k \sum_{t=1}^{T} \theta_k f_k(\mathbf{x}, y_t, y_{t-1})\right\}$$
By default the $f_k$ are features and are not learned. Examples:
$f_1(\mathbf{x}, y_t, y_{t-1}) = 1$ if $y_t$ = ADVERB and the word ends in "-ly"; 0 otherwise.
If the weight $\theta_1$ associated with this feature is large and positive, then this feature is essentially saying that we prefer labelings where words ending in -ly get labeled as ADVERB.
$f_2(\mathbf{x}, y_t, y_{t-1}) = 1$ if $t = 1$, $y_t$ = VERB, and the sentence ends in a question mark; 0 otherwise.
Again, if the weight $\theta_2$ associated with this feature is large and positive, then labelings that assign VERB to the first word in a question (e.g., "Is this a sentence beginning with a verb?") are preferred.
$f_3(\mathbf{x}, y_t, y_{t-1}) = 1$ if $y_{t-1}$ = ADJECTIVE and $y_t$ = NOUN; 0 otherwise.
Again, a positive weight for this feature means that adjectives tend to be followed by nouns.
It is possible to learn features from data (a small practical sketch follows below).
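
In practice such indicator features are often written as per-token feature dictionaries; a minimal sketch with the sklearn-crfsuite library (the library choice and the feature set are assumptions for illustration, not what the course mandates):

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite (assumed available)

def word2features(sent, t):
    """Features for token t of a tokenized sentence (list of words)."""
    word = sent[t]
    return {
        "word.lower": word.lower(),
        "word.ends_ly": word.endswith("ly"),   # cf. f1 above
        "is_first": t == 0,                    # cf. f2 above
        "prev_word": sent[t - 1].lower() if t > 0 else "<BOS>",
    }

X_train = [[word2features(s, t) for t in range(len(s))]
           for s in [["He", "ran", "quickly"]]]          # toy corpus
y_train = [["PRON", "VERB", "ADVERB"]]

# c1 / c2 are the L1 / L2 regularisation weights discussed above
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```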

Feature engineering
Label-observation features: the $f_k$ that can be written $f_k(\mathbf{x}, y_t) = \mathbb{1}_{y_t = c}\, q_k(\mathbf{x})$ are easier to compute (they are computed once and for all).
e.g.: if $x = \text{word}_i$, $x$ ends in -ing, $x$ starts with a capital letter... then 1, else 0
Node-observation features: even if it is less precise, try not to mix references to the observations and to the transitions:
$$f_k(\mathbf{x}, y_t, y_{t-1}) = q_k(\mathbf{x})\, \mathbb{1}_{y_t = c}\, \mathbb{1}_{y_{t-1} = c'} \;\Rightarrow\; f_k(x_t, y_t) = q_k(\mathbf{x})\, \mathbb{1}_{y_t = c}, \quad f_{k+1}(y_t, y_{t-1}) = \mathbb{1}_{y_t = c}\, \mathbb{1}_{y_{t-1} = c'}$$
Boundary labels

Feature engineering (2)
Unsupported features: generated automatically from the observations (e.g. "with" is not a city name)... but not very relevant.
Use these features to disambiguate errors in the training set.
Feature induction
Features from different time steps
Redundant features
Complex features = model outputs

Inference
Process:
1. Define the features
2. Learn the model parameters ($\theta$)
3. Label new sentences (= inference):
$$\mathbf{y}^\star = \arg\max_{\mathbf{y}} p(\mathbf{y} \mid \mathbf{x})$$
⇒ The solution is very close to the Viterbi algorithm

Inference (2)
$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_t \Psi_t(y_t, y_{t-1}, x_t), \qquad \Psi_t(y_t, y_{t-1}, x_t) = \exp\left(\sum_k \theta_k f_k(y_t, y_{t-1}, x_t)\right)$$
1. Bridge to HMMs:
$$\Psi_t(i, j, x) = p(y_t = j \mid y_{t-1} = i)\, p(x_t = x \mid y_t = j)$$
$$\alpha_t(j) = \sum_i \Psi_t(i, j, x)\, \alpha_{t-1}(i), \qquad \beta_t(i) = \sum_j \Psi_t(i, j, x)\, \beta_{t+1}(j)$$
$$\delta_t(j) = \max_i \Psi_t(i, j, x)\, \delta_{t-1}(i)$$
2. These definitions remain valid for CRFs (cf. Sutton & McCallum), with:
$$Z(\mathbf{x}) = \sum_i \alpha_T(i) = \beta_0(y_0)$$

Applications
Sentence analysis: morpho-syntactic analysis
NER: named entity recognition
... Mister George W. Bush arrived in Rome together with ...
... O name name name O O city O O ...
Moving to 2D... image analysis:
contour detection
object classification
features = spatial cohesion, usual/impossible label sequences
credit: DGM lib