Page 1: Natural Language Processing, Topic Modeling, Neural Text … › ysxu › files › 20171025_163214.pdf · 2017-10-25 · Yueshen Xu (lecturer) ysxu@xidian.edu.cn Software Engineering

Natural Language Processing, Topic Modeling and Neural Text Generation

Yueshen Xu (lecturer)

ysxu@xidian.edu.cn

Software Engineering

Xidian University

NLP & Text Mining & Machine Learning

Outline

Natural Language Processing

Language Understanding, Language Modeling and Language Generation

Topic Modeling

Basic Topic Modeling

Hierarchical Topic Modeling

Neural Text Generation

Ali Xiaomi

Supplement & Reference

Keywords: natural language processing, topic modeling, Bayesian model, neural network

Natural Language Processing

Natural Language Processing (NLP)

Language Understanding: to understand the structure, relations and constitution of linguistic elements → Computational Linguistics

Language Modeling: to find latent structures, relations and rules in a text corpus → Text Mining (not always automatic)

Language Generation: to generate different types of linguistic text → Artificial Intelligence

Related to speech, graphics and video → Artificial Intelligence, Multimedia

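Natural Language Processing: Language Understanding

Language Understanding (a few)

Stemming (lemmatization): runs, ran, running → run

Segmentation: 我是一名大学老师 → 我 / 是 / 一名 / 大学 / 老师 ("I am a university teacher" → I / am / a / university / teacher)

Part of speech (POS): I am a teacher → I (pronoun) am (copula) a (article) teacher (noun)

Dependency parsing: [figure not preserved]

Coreference: 小明和小江去吃饭,他说饭很好吃 → 他? ("Xiao Ming and Xiao Jiang went to eat, and he said the food was delicious": who is "he"?)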

Natural Language Processing: Language Modeling

Language Modeling (a few)

≈ Text Mining

Text/Document Clustering

Text/Document Classification

Topic Modeling

➢ Hierarchical topic modeling

Sentiment Classification

➢ Aspect-level sentiment classification

Entity (Relation) Extraction

…, etc.

Natural Language Processing: Language Generation

Language Generation (a few)

Machine Translation

Document Summarization

Q&A (Microsoft XiaoIce, Cortana)

Poetry Generation

News Generation

Short Text Generation (sentence, Weibo post)

…, etc.

Topic Modeling

Information Overload

Big Data, Cloud Computing, Artificial Intelligence, Deep Learning, …, etc.

We need: summarization, visualization, dimensionality reduction

Background

Dimensionality Reduction (Text)

Document Summarization

➢ What do these docs (or this doc) talk about?

Sentiment Analysis

➢ What do these consumers care about or complain about?

Short Text/Tweets Mining

➢ What are people discussing?

Basic tool

➢ Topic modeling: learn latent semantic topics from a corpus (text collection)

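Topic Modeling

Topic modeling

an example in Chinese (from my doctoral thesis), translated here

Corpus

Doc1: Continue to implement a prudent monetary policy, keep it appropriately balanced with timely anticipatory adjustments and fine-tuning, coordinate with the supply-side structure, and make combined use of quantity-based, price-based and other monetary policies

Doc2: In terms of personnel quotas, this reform goes far beyond the number of troops cut; it is a structural reform and a key step in modernizing the organizational structure of the military

Doc3: The status of the US dollar as the main international currency remains irreplaceable for the foreseeable future; the only way forward is to push global governance in a more balanced direction. IMF Managing Director Lagarde, speaking recently at the University of Maryland in the United States, called for international governance reform to recognize the reality that emerging economies are increasingly important.

Doc4: After being "weaned" from their parent universities, independent colleges may face pains in branding, enrollment and other areas. However, after national, provincial and municipal guidelines encouraging private capital to enter education were issued, some independent colleges decisively cut the "umbilical cord" to their parent universities and struck out on their own.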

Topic Modeling

After automatic topic modeling

Corpus: Doc1, Doc2, Doc3, Doc4 (the same four documents as in the example)

Learned topics (Topic1 to Topic4), each a list of words with probabilities:

➢ policy (政策) 0.082, reform (改革) 0.063, …

➢ finance (金融) 0.074, currency (货币) 0.051, …

➢ college (学院) 0.077, education (教育) 0.071, …

➢ military (军队) 0.083, organization (组织) 0.079, …

Each document is then described by its proportions of topic1, topic2, topic3 and topic4.

Topic Modeling

A topic

➢ A word cluster: a group of words with coherent semantics

➢ Not clustered randomly, but meaningfully (not semantically)

Models

Parametric models

➢ Latent Semantic Indexing (LSI)

➢ PLSI; Latent Dirichlet Allocation (LDA)

Non-parametric models (Dirichlet Process)

➢ (Nested) Chinese Restaurant Process

➢ Indian Buffet Process

Probabilistic Graphical Model

Topic Modeling: Probabilistic Latent Semantic Indexing

PLSI Model

Assumption

➢ Conditioned on z, w is generated independently of d

➢ Words in a document are exchangeable

➢ Latent topics z are independent

➢ Documents are exchangeable

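$$p(d, w) = p(w \mid d)\,p(d) = p(d) \sum_{z \in Z} p(w, z \mid d) = p(d) \sum_{z \in Z} p(w \mid z)\,p(z \mid d)$$

Probabilistic Graphical Model: d → z → w, with plate N over the words of a document and plate M over the documents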
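The PLSI factorization can be checked numerically on a toy model. The sketch below is only an illustration: the two documents, two topics, three words and all probability values are made-up assumptions, not numbers from the slides.

```python
# Hypothetical toy parameters (not from the slides): 2 documents, 2 topics, 3 words.
p_d = {"d1": 0.5, "d2": 0.5}                                           # p(d)
p_z_given_d = {"d1": {"z1": 0.9, "z2": 0.1},                           # p(z|d)
               "d2": {"z1": 0.2, "z2": 0.8}}
p_w_given_z = {"z1": {"policy": 0.6, "reform": 0.3, "college": 0.1},   # p(w|z)
               "z2": {"policy": 0.1, "reform": 0.2, "college": 0.7}}


def joint(d: str, w: str) -> float:
    """p(d, w) = p(d) * sum_z p(w|z) * p(z|d), using w independent of d given z."""
    return p_d[d] * sum(p_w_given_z[z][w] * p_z_given_d[d][z]
                        for z in p_z_given_d[d])


# Summing the joint over all (d, w) pairs should give 1.
total = sum(joint(d, w) for d in p_d for w in p_w_given_z["z1"])
print(joint("d1", "policy"), total)
```

Because each conditional table is normalized, the joint sums to one over all document and word pairs, which is a quick sanity check on the factorization.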

Topic Modeling

Latent Dirichlet Allocation (LDA)

David M. Blei, Andrew Y. Ng, Michael I. Jordan

Hierarchical Bayesian model; Bayesian pLSI

Graphical model: θ → z → w ← β, with plate N over the words of a document

Generative process of LDA

➢ Choose $N \sim \mathrm{Poisson}(\xi)$;

➢ For each document $d = \{w_1, w_2, \ldots, w_N\}$: choose $\theta \sim \mathrm{Dir}(\alpha)$; then for each of the N words $w_n$ in d:

a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$

b) Choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial distribution conditioned on $z_n$
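The generative process can be sketched in plain Python. This is a minimal illustration under stated assumptions: the vocabulary, the topic-word matrix `beta`, the prior `alpha` and the Poisson mean `xi` are hypothetical toy values, and the helper names are invented for this example.

```python
import math
import random
from typing import List


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def sample_poisson(xi: float, rng: random.Random) -> int:
    """Draw N ~ Poisson(xi) with Knuth's multiplication method (fine for small xi)."""
    limit, k, p = math.exp(-xi), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_categorical(probs: List[float], rng: random.Random) -> int:
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(alpha, beta, vocab, xi, rng) -> List[str]:
    n_words = sample_poisson(xi, rng)         # N ~ Poisson(xi)
    theta = sample_dirichlet(alpha, rng)      # theta ~ Dir(alpha)
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)    # z_n ~ Multinomial(theta)
        w = sample_categorical(beta[z], rng)  # w_n ~ p(w | z_n, beta)
        words.append(vocab[w])
    return words


# Hypothetical toy setup: two topics over a four-word vocabulary.
vocab = ["policy", "reform", "finance", "currency"]
beta = [[0.5, 0.5, 0.0, 0.0],   # topic 0 only emits "policy" and "reform"
        [0.0, 0.0, 0.5, 0.5]]   # topic 1 only emits "finance" and "currency"
print(generate_document([0.5, 0.5], beta, vocab, xi=8.0, rng=random.Random(0)))
```

Inference runs this process in reverse: given only the words, it estimates θ and β, which is what the parameter estimation methods address.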

Topic Modeling

Parameter Estimation

Variational Inference: complex but efficient

➢ ‘I want to know a distribution but do not know it yet, so I find a similar distribution (a tight upper or lower bound)’

➢ K-L divergence

Gibbs Sampling (MCMC, Markov Chain Monte Carlo)

➢ ‘I want to know a distribution but do not know it yet, so I find a way to generate samples from it’

➢ Not complex but relatively slow

Stationary Distribution

$$\lim_{n \to \infty} P^n = \begin{pmatrix} \pi(1) & \cdots & \pi(|S|) \\ \vdots & \ddots & \vdots \\ \pi(1) & \cdots & \pi(|S|) \end{pmatrix}, \qquad \lim_{n \to \infty} \pi_0 P^n = \pi, \qquad \pi = \{\pi(1), \pi(2), \ldots, \pi(j), \ldots, \pi(|S|)\}$$
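The stationary distribution is the property that MCMC methods such as Gibbs sampling rely on: whatever the starting distribution, repeatedly applying the transition matrix converges to the same π. The sketch below illustrates only this convergence on a hypothetical 2-state chain; it is not a Gibbs sampler for LDA, and the matrix `P` is an assumed toy example.

```python
from typing import List


def step(pi: List[float], P: List[List[float]]) -> List[float]:
    """One transition: returns the row vector pi * P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(pi0: List[float], P: List[List[float]], iters: int = 200) -> List[float]:
    """Approximate lim pi0 * P^n by applying the transition `iters` times."""
    pi = pi0
    for _ in range(iters):
        pi = step(pi, P)
    return pi


# Hypothetical 2-state transition matrix (each row sums to 1).
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Two different starting distributions end up at the same limit.
print(stationary([1.0, 0.0], P))
print(stationary([0.0, 1.0], P))
```

For this matrix, solving π = πP by hand gives π = (5/6, 1/6), and both starting points approach it.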

Hierarchical Topic Modeling

Topic modeling is not enough

Hierarchical Structure

Hierarchical Topic Modeling

Chinese Restaurant Process (Dirichlet Process)

A restaurant has an infinite number of tables (topics), and customers (words) enter the restaurant sequentially. The ith customer ($\theta_i$) sits at a table ($\phi_k$) according to the probability [formula not preserved]
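The seating rule can be simulated in a few lines. The sketch assumes the standard form of the Chinese Restaurant Process, since the formula on the slide did not survive extraction: the ith customer joins an occupied table k with probability n_k / (i − 1 + α) and opens a new table with probability α / (i − 1 + α), where n_k is the number of customers already at table k. The concentration parameter name `alpha` and the function name are choices made for this example.

```python
import random
from typing import List, Tuple


def chinese_restaurant_process(num_customers: int, alpha: float,
                               rng: random.Random) -> Tuple[List[int], List[int]]:
    """Seat customers one by one; returns (table index per customer, table sizes)."""
    counts: List[int] = []    # counts[k] = customers currently at table k
    seating: List[int] = []
    for _ in range(num_customers):
        # Occupied tables are weighted by their size, a new table by alpha.
        # rng.choices normalizes the weights, giving n_k / (i - 1 + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1     # join an existing table
        seating.append(k)
    return seating, counts


seating, counts = chinese_restaurant_process(20, alpha=1.0, rng=random.Random(0))
print(seating, counts)
```

Large tables attract more customers (the rich get richer), while α controls how often a new table, that is a new topic, is opened, so the number of topics does not have to be fixed in advance.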

Hierarchical Topic Modeling

The generative process of HTM

1. Sample a path assignment $c_m = \{c_{m,l}\}$ for each document m

2. Sample a level l (a topic $z_{m,n}$) along the path for $w_{m,n}$, the nth word in m

[Figure: a tree of tables (topics); a path $c_1, c_2, c_3, \ldots$ through the tree; the words $w_{m,1}, w_{m,2}, w_{m,3}, w_{m,4}, \ldots$ of document m are the customers seated at tables along that path]
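The two steps can be sketched with a nested Chinese Restaurant Process over a tree of fixed depth. Several things here are assumptions rather than the author's method: the `Node` class, the parameter name `gamma`, the fixed depth, and in particular the uniform choice of level in step 2 (hierarchical topic models normally use a per-document distribution over levels).

```python
import random
from typing import List


class Node:
    """A topic in the tree; count = number of documents whose path uses it."""

    def __init__(self) -> None:
        self.count = 0
        self.children: List["Node"] = []


def sample_path(root: Node, depth: int, gamma: float, rng: random.Random) -> List[Node]:
    """Step 1: pick a root-to-leaf path, applying a CRP choice at every level."""
    root.count += 1
    path, node = [root], root
    for _ in range(depth - 1):
        # Existing children are weighted by popularity, a new child by gamma.
        weights = [child.count for child in node.children] + [gamma]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(node.children):
            node.children.append(Node())
        node = node.children[k]
        node.count += 1
        path.append(node)
    return path


def sample_levels(num_words: int, depth: int, rng: random.Random) -> List[int]:
    """Step 2 (simplified assumption): a uniform level along the path per word."""
    return [rng.randrange(depth) for _ in range(num_words)]


rng = random.Random(0)
root = Node()
for m in range(5):
    path = sample_path(root, depth=3, gamma=1.0, rng=rng)
    levels = sample_levels(num_words=6, depth=3, rng=rng)
    print(f"document {m}: path length {len(path)}, word levels {levels}")
```

Documents that share a prefix of their paths share the more general topics near the root, which is what produces the hierarchy.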

Hierarchical Topic Modeling

Examples

[Figure: a learned topic hierarchy; the tree edges are not preserved. Each topic is shown by its top five words.]

root topic: analysis, obtain, base, system, concentration

➢ thermal, polymer, acid, property, diamine

➢ activity, compound, acid, derivative, active

➢ compound, ligand, group, investigate, synergistic

➢ reaction, derivative, yield, synthesis, microwave

➢ assay, food, quality, content, analysis

➢ decoction, component, radix, quality, constituent

➢ compound, activity, synthesize, salt, derivative

➢ antioxidant, activity, extract, inhibitory, flavonoid

➢ interaction, cation, metal, energy, solution

Neural Text Generation

Neural text generation

Topic Modeling: word/phrase-level

➢ The results are semantically coherent, but still need reorganization, and cannot be read directly by a human

Text Clustering/Classification: document-level

➢ We only get a sketch of the whole corpus

Text generation: language-level

➢ The results can be read by a human directly, which means true AI

➢ Methods based on (deep) neural networks are prevalent

➢ Recurrent Neural Network (RNN)

Neural Text Generation

RNN

RNN is a natural tool for NLP

RNN is closely related to sequences

➢ $x_t$: input; $h_t$: output; A: neural network model

➢ Typically, $x_t$ and $h_t$ are both vectors with distributed representations
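The recurrence can be written as a short loop. The slide names only the input, the output and the cell A; the sketch below assumes the common tanh cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b), and the weight names and toy values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(x_t: Vector, h_prev: Vector, W_x: Matrix, W_h: Matrix, b: Vector) -> Vector:
    """The cell A: combine the current input with the previous output."""
    from_input = matvec(W_x, x_t)
    from_state = matvec(W_h, h_prev)
    return [math.tanh(a + r + c) for a, r, c in zip(from_input, from_state, b)]


def run_rnn(xs: List[Vector], h0: Vector, W_x: Matrix, W_h: Matrix, b: Vector) -> List[Vector]:
    """Apply the same cell at every time step, carrying h forward."""
    h, outputs = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        outputs.append(h)
    return outputs


# Hypothetical toy sizes: 3-dimensional inputs, 2-dimensional hidden state.
W_x = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
W_h = [[0.2, 0.0], [-0.3, 0.4]]
b = [0.0, 0.1]
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(run_rnn(xs, [0.0, 0.0], W_x, W_h, b))
```

The same weights are reused at every position, which is why the model handles sequences of any length.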

Neural Text Generation

Distributed representation ≈ word embedding

Traditional representation: one-hot representation

Distributed representation: more semantic, more expressive, more flexible

[Figure: representations of the example word "moon"]
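The difference between the two representations shows up when word similarity is measured. In the sketch below, the three-word vocabulary and the dense vectors are hand-picked hypothetical values, not learned embeddings; they only illustrate that one-hot vectors make all distinct words equally unrelated, while dense vectors can place related words close together.

```python
import math
from typing import List

vocab = ["moon", "sun", "teacher"]


def one_hot(word: str) -> List[float]:
    """One dimension per vocabulary word, with a single 1."""
    return [1.0 if w == word else 0.0 for w in vocab]


# Hypothetical hand-picked 3-dimensional embeddings (NOT learned from data).
dense = {
    "moon": [0.9, 0.8, 0.1],
    "sun": [0.8, 0.9, 0.2],
    "teacher": [0.1, 0.0, 0.9],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print("one-hot moon vs sun:  ", cosine(one_hot("moon"), one_hot("sun")))
print("dense moon vs sun:    ", cosine(dense["moon"], dense["sun"]))
print("dense moon vs teacher:", cosine(dense["moon"], dense["teacher"]))
```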

Neural Text Generation

Distributed representation

One prevailing tool: word2vec (2 methods)

➢ Continuous Bag-of-Words (CBOW)

➢ Continuous Skip-gram (Skip-gram)
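The two methods differ in what is predicted from what: CBOW predicts the center word from its context, while Skip-gram predicts each context word from the center word. The sketch below only builds the training examples for a given window size; it does not train a model, and the function names are invented for this illustration.

```python
from typing import List, Tuple


def context_of(tokens: List[str], i: int, window: int) -> List[str]:
    """Words within `window` positions of index i, excluding position i itself."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram examples: (center word, one context word to predict)."""
    return [(center, ctx)
            for i, center in enumerate(tokens)
            for ctx in context_of(tokens, i, window)]


def cbow_pairs(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW examples: (all context words, center word to predict)."""
    pairs = []
    for i, center in enumerate(tokens):
        context = context_of(tokens, i, window)
        if context:
            pairs.append((context, center))
    return pairs


tokens = ["snow", "falls", "from", "the", "sky"]
print(skipgram_pairs(tokens, window=1))
print(cbow_pairs(tokens, window=1))
```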

Neural Text Generation

Long-term dependencies

short-term dependency

➢ Snow falls from the sky (predicting "sky" needs only the nearby words "snow falls")

long-term dependency

➢ I was born in China, …, I can speak Mandarin

Neural Text Generation

Long Short-Term Memory networks (LSTM)

Three different gates

➢ Each gate is a sigmoid neural net layer followed by a pointwise multiplication operation
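One LSTM step with its three gates can be sketched as follows. The slide does not give the equations, so the code assumes the common formulation with forget, input and output gates acting on the concatenation of h_{t-1} and x_t; the parameter names (`W_f`, `W_i`, `W_o`, `W_c` and the biases) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(a: float) -> float:
    """Numerically stable logistic function; output lies in (0, 1)."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def affine(W: List[Vector], b: Vector, v: Vector) -> Vector:
    """Computes W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    v = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], v)]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], v)]    # input gate
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], v)]    # output gate
    g = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], v)]  # candidate cell values
    # Gates act by pointwise multiplication: keep part of the old cell state,
    # add part of the candidate, then expose part of the result as output.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Hypothetical toy sizes: 2-dimensional input, 1-dimensional hidden state.
params = {
    "W_f": [[0.1, 0.2, -0.1]], "b_f": [0.0],
    "W_i": [[0.3, -0.2, 0.1]], "b_i": [0.0],
    "W_o": [[-0.1, 0.4, 0.2]], "b_o": [0.0],
    "W_c": [[0.2, 0.1, 0.5]], "b_c": [0.0],
}
print(lstm_step([1.0, -1.0], [0.3], [2.0], params))
```

Because the cell state is updated additively and gated, information can be carried across many steps, which is how the LSTM addresses long-term dependencies.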

Neural Text Generation

LSTM (a usage example)

review-level sentiment classification

Bi-directional LSTM

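Neural Text Generation

Attention Mechanisms

Focus on some more important terms

$$e_{tj} = v_a^{\top} \tanh(W_a s_{t-1} + U_a h_j)$$

$$a_{tj} = \frac{\exp(e_{tj})}{\sum_{k=1}^{T_h} \exp(e_{tk})}$$

$$c_t = \sum_{j=1}^{T_h} a_{tj} h_j$$

$v_a$, $W_a$ and $U_a$ are the parameters (a weight vector and two weight matrices) that need to be optimized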
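The three attention equations translate directly into code: score each hidden state against the previous decoder state, normalize the scores with a softmax, and take the weighted sum. This is a minimal sketch in which the parameter values and vector sizes are hypothetical and nothing is trained.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def attention(s_prev: Vector, hs: List[Vector], v_a: Vector,
              W_a: Matrix, U_a: Matrix) -> Tuple[Vector, Vector]:
    """Returns (attention weights a_tj, context vector c_t)."""
    Ws = matvec(W_a, s_prev)
    # e_tj = v_a^T tanh(W_a s_{t-1} + U_a h_j)
    scores = []
    for h_j in hs:
        Uh = matvec(U_a, h_j)
        scores.append(sum(v * math.tanh(a + b) for v, a, b in zip(v_a, Ws, Uh)))
    # a_tj = softmax over j (subtracting the max keeps exp() well-behaved)
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # c_t = sum_j a_tj h_j
    context = [sum(w * h_j[d] for w, h_j in zip(weights, hs))
               for d in range(len(hs[0]))]
    return weights, context


# Hypothetical toy values: three 2-dimensional hidden states.
hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention(
    s_prev=[0.5, -0.5], hs=hs, v_a=[1.0, -1.0],
    W_a=[[0.2, 0.1], [0.0, 0.3]], U_a=[[0.5, -0.5], [0.1, 0.4]],
)
print(weights, context)
```

The weights sum to one, so the context vector is a convex combination of the hidden states, dominated by the positions with the highest scores.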

Neural Text Generation

Generative Adversarial Network (GAN)

➢ Generator $G(\cdot)$: produces generated (fake) data, with distribution $p_g(x)$;

➢ Discriminator $D(\cdot)$: tells generated data apart from real data drawn from $p_{data}(x)$
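The interplay between the two networks is usually expressed through a single value function. The slide does not show it, so the sketch below assumes the standard minimax objective from the original GAN formulation, V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The inputs are hypothetical discriminator outputs; no networks are trained here.

```python
import math
from typing import List


def gan_value(d_real: List[float], d_fake: List[float]) -> float:
    """Monte Carlo estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
print(gan_value([0.5, 0.5], [0.5, 0.5]))
# A discriminator that separates them well obtains a higher value.
print(gan_value([0.9, 0.8], [0.1, 0.2]))
```

When the discriminator outputs 0.5 for every sample, the value equals −log 4, the point reached when the generated distribution $p_g$ matches $p_{data}$.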

Reference

• David M. Blei et al. Latent Dirichlet Allocation. JMLR, 2003

• Yee Whye Teh. Dirichlet Processes: Tutorial and Practical Course, 2007

• Yee Whye Teh, Michael I. Jordan, et al. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 2006

• David M. Blei. Probabilistic Topic Models. Communications of the ACM, 2012

• David M. Blei et al. The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies. Journal of the ACM, 2010

• Gregor Heinrich. Parameter Estimation for Text Analysis, 2008

• T. S. Ferguson. A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1973

• Martin J. Wainwright. Graphical Models, Exponential Families, and Variational Inference

• Christopher Bishop. Pattern Recognition and Machine Learning, 2007

• Vasilis Vryniotis. DatumBox: The Dirichlet Process Mixture Model, 2014

• J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. PNAS, pp. 2554–2558, 1982

• Y. Bengio, R. Ducharme, P. Vincent. A neural probabilistic language model. JMLR, 2003

• T. Mikolov, W. T. Yih, G. Zweig. Linguistic Regularities in Continuous Space Word Representations. NAACL, 2013

• Tomas Mikolov, Kai Chen, et al. Efficient Estimation of Word Representations in Vector Space. ICLR Workshop, 2013

• Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9 (8): 1735–1780, 1997

• Felix A. Gers, Jürgen Schmidhuber and Fred Cummins. Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12 (10), 2000

• Yang Hu. https://zhuanlan.zhihu.com/p/29168803, 2017

• Christopher Olah. Understanding LSTM Networks, 2017

• Manish Chablani. https://medium.com/towards-data-science/sentiment-analysis-using-rnns-lstm-60871fa6aeba, 2017

• Ashish Vaswani, Noam Shazeer, et al. Attention Is All You Need. https://arxiv.org/abs/1706.03762

• Architecture Design and Implementation of Alibaba's Intelligent Assistant in E-commerce (阿里智能助理在电商领域的架构搭建与实现). https://yq.aliyun.com/tags/type_blog-tagid_15372-page_?

Reference

My previous tutorials/notes (ZJU/UIC/Netease/ITRZJU, as a Ph.D. student)

➢ ‘Topic modeling (an introduction)’

➢ ‘Non-parametric Bayesian learning in discrete data’

➢ ‘The research of topic modeling in text mining’

➢ ‘Matrix factorization with user generated content’

➢ …, etc

Website

You can download all of my slides

➢ http://web.xidian.edu.cn/ysxu/teach.html

➢ http://liu.cs.uic.edu/yueshenxu/

➢ http://www.slideshare.net/obamaxys2011

➢ https://www.researchgate.net/profile/Yueshen_Xu

Q&A

http://web.xidian.edu.cn/ysxu