Recurrent Convolutional Neural Networks for Text Classification


Page 1: Recurrent Convolutional Neural Networks for Text Classification

Recurrent Convolutional Neural Networks for Text Classification

Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao. AAAI 2015

Presenter: 周 双双 (Zhou Shuangshuang)

June 30, 2016

Page 2: Recurrent Convolutional Neural Networks for Text Classification

Background: Text Classification
◦ Feature representation in previous studies: the bag-of-words (BoW) model, where unigrams, bigrams, n-grams, or some exquisitely designed patterns serve as features.

◦ Disadvantage: such features "ignore the contextual information or word order in texts and remain unsatisfactory for capturing the semantics of the words."

Word embeddings and deep neural networks


Page 3: Recurrent Convolutional Neural Networks for Text Classification

Why a Recurrent Convolutional Neural Network?
◦ Recursive Neural Network (RecursiveNN)
◦ Disadvantage: "its performance heavily depends on the performance of the textual tree construction," and building such a tree exhibits a time complexity of at least O(n²), where n is the length of the text.

◦ Recurrent Neural Network (RecurrentNN)
◦ Disadvantage: it is "a biased model, where later words are more dominant than earlier words."

◦ Convolutional Neural Network (CNN)
◦ Advantages:
◦ captures the semantics of texts better than recursive or recurrent NNs;
◦ the time complexity of a CNN is O(n).

◦ Disadvantage: it is difficult to determine the window size.

Recurrent Convolutional Neural Network (RCNN): learns more contextual information than conventional window-based neural networks.


Page 4: Recurrent Convolutional Neural Networks for Text Classification

Recurrent Convolutional Neural Network

[Architecture figure: a word representation learning stage followed by a text representation learning stage]


Page 5: Recurrent Convolutional Neural Networks for Text Classification

Word Representation Learning
The representation of word w_i is the concatenation of the left-side context vector c_l(w_i), the word embedding e(w_i), and the right-side context vector c_r(w_i).

• The left-side context vector c_l(w_i) and the right-side context vector c_r(w_i) are calculated in a similar way.

• e(w_{i-1}) is the word embedding of word w_{i-1}, and c_l(w_{i-1}) is the left-side context vector of the previous word w_{i-1}.

• e(w_{i+1}) is the word embedding of word w_{i+1}, and c_r(w_{i+1}) is the right-side context vector of the next word w_{i+1}.
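The slide's equations are missing from the transcript; reconstructed from the paper's definitions (Eqs. 1-3), the context vectors are computed recurrently and then concatenated with the embedding:

    c_l(w_i) = f( W^{(l)} c_l(w_{i-1}) + W^{(sl)} e(w_{i-1}) )    (Eq. 1)
    c_r(w_i) = f( W^{(r)} c_r(w_{i+1}) + W^{(sr)} e(w_{i+1}) )    (Eq. 2)
    x_i = [ c_l(w_i) ; e(w_i) ; c_r(w_i) ]                        (Eq. 3)

Here f is a non-linear activation; W^{(l)} and W^{(r)} transform the neighboring context vectors, while W^{(sl)} and W^{(sr)} fold the neighboring word's embedding into the current context. The left scan runs from the start of the text and the right scan from the end, so every x_i sees context from both sides of the whole document.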


Page 6: Recurrent Convolutional Neural Networks for Text Classification

Recurrent Convolutional Neural Network

[Architecture figure, repeated; this slide focuses on the text representation learning stage]

• Apply a linear transformation together with the tanh activation function to x_i and send the result to the next layer.

• y_i^{(2)} is a latent semantic vector that will be used to determine the most useful factors for representing the text.
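In the paper's notation (Eq. 4), this layer is:

    y_i^{(2)} = tanh( W^{(2)} x_i + b^{(2)} )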


Page 7: Recurrent Convolutional Neural Networks for Text Classification

Text Representation Learning

[Architecture figure, repeated]

• A max-pooling layer is applied over all the representations of words.

• A text of variable length is converted into a fixed-length vector.

• The k-th element of y^{(3)} is the maximum among the k-th elements of all y_i^{(2)}.

Why max pooling, not average pooling?
• The aim is to find the most important latent semantic factors in the texts.
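Element-wise, the pooling (Eq. 5 in the paper) is:

    y_k^{(3)} = max_{i = 1..n} y_{i,k}^{(2)}

so each component of y^{(3)} keeps the strongest activation of the corresponding latent factor anywhere in the text, which is why a max rather than an average suits finding the most important factors.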


Page 8: Recurrent Convolutional Neural Networks for Text Classification

Output Layer

[Architecture figure, repeated]

The softmax function is applied to y^{(4)}.
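The missing equations (Eqs. 6-7 in the paper) are:

    y^{(4)} = W^{(4)} y^{(3)} + b^{(4)}
    p_k = exp( y_k^{(4)} ) / Σ_j exp( y_j^{(4)} )

which convert the pooled representation into class probabilities.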


Page 9: Recurrent Convolutional Neural Networks for Text Classification

Training Network Parameters

They define all of the parameters to be trained as θ.

The training target of the network is to maximize the log-likelihood with respect to θ, where 𝔻 is the training document set and class_D is the correct class of document D.

Stochastic gradient descent is used to optimize the training target.
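Reconstructed from the paper (Eqs. 8-9), the objective and the SGD update with learning rate α are:

    θ → max Σ_{D ∈ 𝔻} log p(class_D | D, θ)
    θ ← θ + α · ∂ log p(class_D | D, θ) / ∂θ

To make the slides concrete, here is a minimal NumPy sketch of the forward pass described above. It is an illustration under the paper's equations, not the authors' implementation; all parameter names (Wl, Wsl, W2, ...) are mine.

    import numpy as np

    def rcnn_forward(E, params):
        """Forward pass of the RCNN for one document.
        E: (n, e) matrix of pre-trained word embeddings e(w_1)..e(w_n).
        params: dict of weights; see the illustrative initialization on the next slide."""
        n = E.shape[0]
        c = params["Wl"].shape[0]                 # context vector dimension
        f = np.tanh                               # non-linear activation

        # Left context vectors, scanned left to right (Eq. 1).
        cl = np.zeros((n, c))
        for i in range(1, n):
            cl[i] = f(params["Wl"] @ cl[i - 1] + params["Wsl"] @ E[i - 1])

        # Right context vectors, scanned right to left (Eq. 2).
        cr = np.zeros((n, c))
        for i in range(n - 2, -1, -1):
            cr[i] = f(params["Wr"] @ cr[i + 1] + params["Wsr"] @ E[i + 1])

        # Word representation x_i = [c_l(w_i); e(w_i); c_r(w_i)] (Eq. 3).
        X = np.concatenate([cl, E, cr], axis=1)   # shape (n, 2c + e)

        # Latent semantic vectors (Eq. 4).
        Y2 = np.tanh(X @ params["W2"].T + params["b2"])

        # Element-wise max pooling over all words (Eq. 5).
        y3 = Y2.max(axis=0)

        # Output layer plus numerically stable softmax (Eqs. 6-7).
        y4 = params["W4"] @ y3 + params["b4"]
        p = np.exp(y4 - y4.max())
        return p / p.sum()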


Page 10: Recurrent Convolutional Neural Networks for Text Classification

Datasets and Experiment Settings

[Dataset overview table, not preserved in the transcript. Evaluation measure: Macro-F1 for the first dataset, Accuracy for the other three.]

• Chose commonly used hyper-parameters following previous studies (Collobert et al. 2011; Turian, Ratinov, and Bengio 2010).

• Set the learning rate α to 0.01, the hidden layer size H to 100, the word embedding dimension to 50, and the context vector dimension to 50.

• Word embeddings are pre-trained on English and Chinese Wikipedia dumps, respectively, with the default parameters of word2vec using the skip-gram algorithm.
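For illustration, a hypothetical params dictionary matching these settings (e = 50, c = 50, H = 100), usable with the rcnn_forward sketch above; the number of classes K and the uniform initialization range are my assumptions, not from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    e, c, H, K = 50, 50, 100, 4       # embedding dim, context dim, hidden size, classes (K assumed)
    init = lambda *shape: rng.uniform(-0.01, 0.01, shape)
    params = {
        "Wl": init(c, c), "Wsl": init(c, e),          # left-context recurrence
        "Wr": init(c, c), "Wsr": init(c, e),          # right-context recurrence
        "W2": init(H, 2 * c + e), "b2": np.zeros(H),  # latent semantic layer
        "W4": init(K, H), "b4": np.zeros(K),          # output layer
    }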


Page 11: Recurrent Convolutional Neural Networks for Text Classification

Experiment Results: Comparison of Methods

• Bag-of-words (BoW)/Bigram + logistic regression (LR)/SVM (Wang and Manning 2012).

• Average Embedding + LR: a tf-idf weighted average of the word embeddings, with a softmax layer applied subsequently.

• LDA: ClassifyLDA-EM and Labeled-LDA.
• Tree Kernels: Post and Bergsma (2013) used various tree kernels as features. Here, they compared the context-free grammar (CFG) produced by the Berkeley parser and the re-ranking feature set of C&J.

• RecursiveNN: two recursive-based methods, the Recursive Neural Network (Socher et al. 2011a) and Recursive Neural Tensor Networks (RNTNs) (Socher et al. 2013).

• CNN: a convolution kernel that concatenates the word embeddings within a pre-defined window (Collobert et al. 2011).

• Paragraph-Vector (Le and Mikolov 2014).

Page 12: Recurrent Convolutional Neural Networks for Text Classification

Experiment Results and Discussion


• Neural network approaches (RecursiveNN, CNN, RCNN) outperform the traditional methods (e.g., BoW+LR) on all four datasets. Neural networks can capture more contextual information from features.

• CNN and RCNN outperform RecursiveNN on the SST dataset, which demonstrates the importance of the max-pooling layer and the convolutional layer. Training time: RNTN (3-5 hours) vs. RCNN (several minutes).

• Comparing RCNN with tree kernel methods: since RCNN needs no syntactic parser, it might be useful in low-resource languages.

• RCNN vs. CNN: the recurrent structure is better than the window-based structure.

Page 13: Recurrent Convolutional Neural Networks for Text Classification

RCNN vs. CNN


• They consider all odd window sizes from 1 to 19 to train and test the CNN model.

• RCNN outperforms the CNN for all window sizes, and it does not rely on a window size.

• The recurrent structure can preserve longer contextual information and introduces less noise.

Page 14: Recurrent Convolutional Neural Networks for Text Classification

Learned Keywords


• The max-pooling layer selects the most important words in texts.

• They present the center word and its neighboring trigram.

• They list a comparison with RNTN (Socher et al. 2013) on sentiment classification.

Page 15: Recurrent Convolutional Neural Networks for Text Classification

Impressions

◦ RCNN does not rely on a syntactic parser.
◦ RCNN might be useful in low-resource languages.
◦ RCNN has a lower time complexity of O(n).
◦ Limitations of RCNN?
◦ Their model can capture the key components in texts for text classification.

Capture the key components for disambiguation? They plan to apply RCNN to other tasks.


Source code: https://github.com/knok/rcnn-text-classification