Recurrent Convolutional Neural Networks for Text Classification
TRANSCRIPT
Recurrent Convolutional Neural Networks for Text Classification
Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao (AAAI 2015)
Presenter: 周 双双
Background - Text Classification
◦ Feature representation in previous studies: the bag-of-words (BoW) model, where unigrams, bigrams, n-grams, or some exquisitely designed patterns serve as features.
◦ Disadvantages: such features "ignore the contextual information or word order in texts and remain unsatisfactory for capturing the semantics of the words."
→ Word embeddings and deep neural networks
Why a Recurrent Convolutional Neural Network?
◦ Recursive Neural Network (RecursiveNN)
◦ Disadvantage: "its performance heavily depends on the performance of the textual tree" construction, which "exhibits a time complexity of at least O(n²), where n is the length of the text."
◦ Recurrent Neural Network (RecurrentNN)
◦ Disadvantage: it is "a biased model, where later words are more dominant than earlier words."
◦ Convolutional Neural Network (CNN)
◦ Advantages:
◦ captures the semantics of texts better than recursive or recurrent NNs
◦ time complexity of O(n)
◦ Disadvantage: it is difficult to determine the window size
Recurrent Convolutional Neural Network (RCNN): learns more contextual information than conventional window-based neural networks
Recurrent Convolutional Neural Network
◦ Word Representation Learning
◦ Text Representation Learning
Word Representation Learning
The representation of word w_i is the concatenation of the left-side context vector c_l(w_i), the word embedding e(w_i), and the right-side context vector c_r(w_i): x_i = [c_l(w_i); e(w_i); c_r(w_i)].
• The left-side context vector c_l(w_i) and the right-side context vector c_r(w_i) are calculated in a similar (mirrored) recurrent way; see the sketch below.
• e(w_{i-1}) is the word embedding of the previous word w_{i-1}, and c_l(w_{i-1}) is the left-side context vector of the previous word w_{i-1}.
• e(w_{i+1}) is the word embedding of the next word w_{i+1}, and c_r(w_{i+1}) is the right-side context vector of the next word w_{i+1}.
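The paper defines these recurrences as c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1})) and c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1})). Below is a minimal NumPy sketch of this step; the random weights, the use of tanh for the activation f, and the zero boundary vectors are illustrative assumptions, not the authors' code.

```python
import numpy as np

e_dim, c_dim = 50, 50                   # embedding / context sizes from the paper
rng = np.random.default_rng(0)
W_l  = rng.normal(scale=0.1, size=(c_dim, c_dim))  # W^(l): left-context transition
W_sl = rng.normal(scale=0.1, size=(c_dim, e_dim))  # W^(sl): previous word -> left context
W_r  = rng.normal(scale=0.1, size=(c_dim, c_dim))  # W^(r): right-context transition
W_sr = rng.normal(scale=0.1, size=(c_dim, e_dim))  # W^(sr): next word -> right context

def word_representations(E):
    """E: (n, e_dim) embeddings of one text; returns every x_i = [c_l(w_i); e(w_i); c_r(w_i)]."""
    n = len(E)
    c_l = np.zeros((n, c_dim))          # c_l(w_1): boundary vector (zeros here)
    c_r = np.zeros((n, c_dim))          # c_r(w_n): boundary vector (zeros here)
    for i in range(1, n):               # forward scan builds left contexts
        c_l[i] = np.tanh(W_l @ c_l[i - 1] + W_sl @ E[i - 1])
    for i in range(n - 2, -1, -1):      # backward scan builds right contexts
        c_r[i] = np.tanh(W_r @ c_r[i + 1] + W_sr @ E[i + 1])
    return np.concatenate([c_l, E, c_r], axis=1)   # (n, 2*c_dim + e_dim)
```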
Word Representation Learning (cont.)
• Apply a linear transformation together with the tanh activation function to x_i and send the result to the next layer: y_i^(2) = tanh(W^(2) x_i + b^(2)).
• y_i^(2) is a latent semantic vector and will be used to determine the most useful factor for representing the text; see the sketch below.
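Continuing the sketch above (hidden layer size H = 100 as in the experiment settings; W^(2) and b^(2) are again stand-ins for learned parameters):

```python
H = 100                                  # hidden layer size from the paper
W2 = rng.normal(scale=0.1, size=(H, 2 * c_dim + e_dim))  # W^(2)
b2 = np.zeros(H)                         # b^(2)

def latent_semantic_vectors(X):
    """X: (n, 2*c_dim + e_dim) word representations; returns every y_i^(2) = tanh(W2 x_i + b2)."""
    return np.tanh(X @ W2.T + b2)        # (n, H)
```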
Text Representation Learning
• A max-pooling layer is applied over all the representations of words.
• A variable-length text is thereby converted into a fixed-length vector.
• The k-th element of y^(3) is the maximum of the k-th elements of all the y_i^(2).
Why max pooling, not average pooling?
• The aim is to find the most important latent semantic factors in the texts (see the sketch below).
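In the running sketch, this pooling is a single element-wise maximum over word positions:

```python
def max_pool(Y2):
    """y^(3): the k-th entry is the maximum of the k-th entries of all y_i^(2)."""
    return Y2.max(axis=0)               # (H,) fixed-length text vector
```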
Output Layer
The softmax function is applied to y^(4) = W^(4) y^(3) + b^(4) to obtain the probability of each class; the full forward pass is sketched below.
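Completing the sketch (the class count of 5 is purely an example, e.g. fine-grained sentiment labels; W^(4) and b^(4) are again illustrative):

```python
n_classes = 5                            # example only: 5 fine-grained sentiment labels
W4 = rng.normal(scale=0.1, size=(n_classes, H))  # W^(4)
b4 = np.zeros(n_classes)                 # b^(4)

def predict_proba(E):
    """Full forward pass: embeddings -> x_i -> y_i^(2) -> y^(3) -> softmax(y^(4))."""
    y3 = max_pool(latent_semantic_vectors(word_representations(E)))
    y4 = W4 @ y3 + b4                    # y^(4) = W^(4) y^(3) + b^(4)
    z = np.exp(y4 - y4.max())            # numerically stable softmax
    return z / z.sum()

probs = predict_proba(rng.normal(size=(7, e_dim)))  # a random 7-"word" text
```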
Training the Network Parameters
They define all of the parameters to be trained as θ.
The training target of the network is to maximize the log-likelihood with respect to θ, where 𝔻 is the training document set and class_D is the correct class of document D.
Stochastic gradient descent (SGD) is used to optimize the training target; the objective and update are written out below.
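Written out, the objective and the per-example SGD step (with learning rate α) are:

```latex
\theta^{\ast} = \arg\max_{\theta} \sum_{D \in \mathbb{D}} \log p(\mathrm{class}_D \mid D, \theta)
\qquad
\theta \leftarrow \theta + \alpha \, \frac{\partial}{\partial \theta} \log p(\mathrm{class}_D \mid D, \theta)
```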
Dataset and Experiment Settings
• Evaluation measures for the four datasets: Macro-F1 on 20Newsgroups and accuracy on the other three (Fudan set, ACL Anthology Network, Stanford Sentiment Treebank).
• Commonly used hyper-parameters were chosen following previous studies (Collobert et al. 2011; Turian, Ratinov, and Bengio 2010).
• Learning rate α = 0.01, hidden layer size H = 100, word embedding dimension 50, context vector dimension 50.
• Word embeddings are pre-trained on English and Chinese Wikipedia dumps, respectively, with the default parameters of word2vec using the skip-gram algorithm (see the sketch below).
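For illustration, an equivalent pre-training run with gensim might look like the following; the authors used the word2vec tool itself, and "wiki_tokenized.txt" is a hypothetical one-sentence-per-line tokenized dump:

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Skip-gram (sg=1) with 50-dimensional vectors, matching the paper's settings.
model = Word2Vec(LineSentence("wiki_tokenized.txt"), vector_size=50, sg=1)
model.wv.save_word2vec_format("embeddings_50d.txt")   # reusable embedding file
```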
Experimental Results – Comparison of Methods
• Bag-of-words (BoW) / bigrams + logistic regression (LR) / SVM (Wang and Manning 2012)
• Average Embedding + LR: a tf-idf weighted average of the word embeddings, followed by a softmax layer
• LDA: ClassifyLDA-EM and Labeled-LDA
• Tree kernels: Post and Bergsma (2013) used various tree kernels as features; here they compared the context-free grammar (CFG) produced by the Berkeley parser and the re-ranking feature set of C&J
• RecursiveNN: two recursion-based methods, RecursiveNN (Socher et al. 2011a) and Recursive Neural Tensor Networks (RNTN) (Socher et al. 2013)
• CNN: a convolution kernel over the concatenated word embeddings in a pre-defined window (Collobert et al. 2011)
• Paragraph-Vector (Le and Mikolov 2014)
Experimental Results and Discussion
• The neural network approaches (RecursiveNN, CNN, RCNN) outperform the traditional methods (e.g., BoW + LR) on all four datasets: neural networks can capture more contextual information in their features.
• CNN and RCNN outperform RecursiveNN on the SST dataset, showing the importance of the max-pooling and convolutional layers. Training time: RNTN (3-5 hours) vs. RCNN (several minutes).
• Comparing RCNN with the tree-kernel methods: since it requires no handcrafted features or parser, RCNN might be useful in low-resource languages.
• RCNN vs. CNN: the recurrent structure is better than a window-based structure.
RCNN vs. CNN
• They consider every odd window size from 1 to 19 to train and test the CNN model.
• RCNN outperforms the CNN for all window sizes, and it does not rely on a window size.
• The recurrent structure can preserve longer contextual information and introduces less noise.
Learned Keywords
• The max-pooling layer selects the most important words in the texts.
• They present each center word together with its neighboring trigram.
• They list the comparison with RNTN (Socher et al. 2013) on sentiment classification.
Impressions
◦ RCNN does not rely on a syntactic parser.
◦ RCNN might be useful in low-resource languages.
◦ RCNN has a lower time complexity of O(n).
◦ Limitations of RCNN?
◦ Their model can capture the key components in texts for text classification.
Could it capture the key components for disambiguation? Plan: apply RCNN to other tasks.
Source code: https://github.com/knok/rcnn-text-classification