Recurrent Convolutional Neural Networks for Text Classification
TRANSCRIPT
Recurrent Convolutional Neural Networks for Text Classification
Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao (AAAI 2015)
Presenter: 周 双双
Background - Text Classification
◦ Feature representation in previous studies: the bag-of-words (BoW) model, where unigrams, bigrams, n-grams, or some exquisitely designed patterns serve as features.
◦ Disadvantages: such features "ignore the contextual information or word order in texts and remain unsatisfactory for capturing the semantics of the words."
→ Word embeddings and deep neural networks
Why a Recurrent Convolutional Neural Network?
◦ Recursive Neural Network (RecursiveNN)
◦ Disadvantage: "its performance heavily depends on the performance of the textual tree" construction, which "exhibits a time complexity of at least O(n²), where n is the length of the text."
◦ Recurrent Neural Network (RecurrentNN)
◦ Disadvantage: it is "a biased model, where later words are more dominant than earlier words."
◦ Convolutional Neural Network (CNN)
◦ Advantages:
◦ captures the semantics of texts better than recursive or recurrent NNs
◦ time complexity of O(n)
◦ Disadvantage: it is difficult to determine the window size
Recurrent Convolutional Neural Network (RCNN): learns more contextual information than conventional window-based neural networks
Recurrent Convolutional Neural Network
◦ Word Representation Learning
◦ Text Representation Learning
Word Representation Learning
The representation of word w_i is the concatenation of the left-side context vector c_l(w_i), the word embedding e(w_i), and the right-side context vector c_r(w_i): x_i = [c_l(w_i); e(w_i); c_r(w_i)].
• The left-side context vector c_l(w_i) and the right-side context vector c_r(w_i) are calculated in a similar (mirrored) recurrent way; see the sketch below.
• e(w_{i-1}) is the word embedding of the previous word w_{i-1}, and c_l(w_{i-1}) is the left-side context vector of the previous word w_{i-1}.
• e(w_{i+1}) is the word embedding of the next word w_{i+1}, and c_r(w_{i+1}) is the right-side context vector of the next word w_{i+1}.
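The paper defines these recurrences as c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1})) and c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1})). Below is a minimal NumPy sketch of this step; the random weights, the use of tanh for the activation f, and the zero boundary vectors are illustrative assumptions, not the authors' code.

```python
import numpy as np

e_dim, c_dim = 50, 50                   # embedding / context sizes from the paper
rng = np.random.default_rng(0)
W_l  = rng.normal(scale=0.1, size=(c_dim, c_dim))  # W^(l): left-context transition
W_sl = rng.normal(scale=0.1, size=(c_dim, e_dim))  # W^(sl): previous word -> left context
W_r  = rng.normal(scale=0.1, size=(c_dim, c_dim))  # W^(r): right-context transition
W_sr = rng.normal(scale=0.1, size=(c_dim, e_dim))  # W^(sr): next word -> right context

def word_representations(E):
    """E: (n, e_dim) embeddings of one text; returns every x_i = [c_l(w_i); e(w_i); c_r(w_i)]."""
    n = len(E)
    c_l = np.zeros((n, c_dim))          # c_l(w_1): boundary vector (zeros here)
    c_r = np.zeros((n, c_dim))          # c_r(w_n): boundary vector (zeros here)
    for i in range(1, n):               # forward scan builds left contexts
        c_l[i] = np.tanh(W_l @ c_l[i - 1] + W_sl @ E[i - 1])
    for i in range(n - 2, -1, -1):      # backward scan builds right contexts
        c_r[i] = np.tanh(W_r @ c_r[i + 1] + W_sr @ E[i + 1])
    return np.concatenate([c_l, E, c_r], axis=1)   # (n, 2*c_dim + e_dim)
```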
Word Representation Learning (cont.)
• Apply a linear transformation together with the tanh activation function to x_i and send the result to the next layer: y_i^(2) = tanh(W^(2) x_i + b^(2)).
• y_i^(2) is a latent semantic vector and will be used to determine the most useful factor for representing the text; see the sketch below.
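Continuing the sketch above (hidden layer size H = 100 as in the experiment settings; W^(2) and b^(2) are again stand-ins for learned parameters):

```python
H = 100                                  # hidden layer size from the paper
W2 = rng.normal(scale=0.1, size=(H, 2 * c_dim + e_dim))  # W^(2)
b2 = np.zeros(H)                         # b^(2)

def latent_semantic_vectors(X):
    """X: (n, 2*c_dim + e_dim) word representations; returns every y_i^(2) = tanh(W2 x_i + b2)."""
    return np.tanh(X @ W2.T + b2)        # (n, H)
```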
Text Representation Learning
• A max-pooling layer is applied over all the representations of words.
• A variable-length text is thereby converted into a fixed-length vector.
• The k-th element of y^(3) is the maximum of the k-th elements of all the y_i^(2).
Why max pooling, not average pooling?
• The aim is to find the most important latent semantic factors in the texts (see the sketch below).
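In the running sketch, this pooling is a single element-wise maximum over word positions:

```python
def max_pool(Y2):
    """y^(3): the k-th entry is the maximum of the k-th entries of all y_i^(2)."""
    return Y2.max(axis=0)               # (H,) fixed-length text vector
```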
Output Layer
The softmax function is applied to y^(4) = W^(4) y^(3) + b^(4) to obtain the probability of each class; the full forward pass is sketched below.
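Completing the sketch (the class count of 5 is purely an example, e.g. fine-grained sentiment labels; W^(4) and b^(4) are again illustrative):

```python
n_classes = 5                            # example only: 5 fine-grained sentiment labels
W4 = rng.normal(scale=0.1, size=(n_classes, H))  # W^(4)
b4 = np.zeros(n_classes)                 # b^(4)

def predict_proba(E):
    """Full forward pass: embeddings -> x_i -> y_i^(2) -> y^(3) -> softmax(y^(4))."""
    y3 = max_pool(latent_semantic_vectors(word_representations(E)))
    y4 = W4 @ y3 + b4                    # y^(4) = W^(4) y^(3) + b^(4)
    z = np.exp(y4 - y4.max())            # numerically stable softmax
    return z / z.sum()

probs = predict_proba(rng.normal(size=(7, e_dim)))  # a random 7-"word" text
```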
Training the Network Parameters
They define all of the parameters to be trained as θ.
The training target of the network is to maximize the log-likelihood with respect to θ, where 𝔻 is the training document set and class_D is the correct class of document D.
Stochastic gradient descent (SGD) is used to optimize the training target; the objective and update are written out below.
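Written out, the objective and the per-example SGD step (with learning rate α) are:

```latex
\theta^{\ast} = \arg\max_{\theta} \sum_{D \in \mathbb{D}} \log p(\mathrm{class}_D \mid D, \theta)
\qquad
\theta \leftarrow \theta + \alpha \, \frac{\partial}{\partial \theta} \log p(\mathrm{class}_D \mid D, \theta)
```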
Dataset and Experiment Settings
• Evaluation measures for the four datasets: Macro-F1 on 20Newsgroups and accuracy on the other three (Fudan set, ACL Anthology Network, Stanford Sentiment Treebank).
• Commonly used hyper-parameters were chosen following previous studies (Collobert et al. 2011; Turian, Ratinov, and Bengio 2010).
• Learning rate α = 0.01, hidden layer size H = 100, word embedding dimension 50, context vector dimension 50.
• Word embeddings are pre-trained on English and Chinese Wikipedia dumps, respectively, with the default parameters of word2vec using the skip-gram algorithm (see the sketch below).
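For illustration, an equivalent pre-training run with gensim might look like the following; the authors used the word2vec tool itself, and "wiki_tokenized.txt" is a hypothetical one-sentence-per-line tokenized dump:

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Skip-gram (sg=1) with 50-dimensional vectors, matching the paper's settings.
model = Word2Vec(LineSentence("wiki_tokenized.txt"), vector_size=50, sg=1)
model.wv.save_word2vec_format("embeddings_50d.txt")   # reusable embedding file
```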
Experimental Results – Comparison of Methods
• Bag-of-words (BoW) / bigrams + logistic regression (LR) / SVM (Wang and Manning 2012)
• Average Embedding + LR: a tf-idf weighted average of the word embeddings, followed by a softmax layer
• LDA: ClassifyLDA-EM and Labeled-LDA
• Tree kernels: Post and Bergsma (2013) used various tree kernels as features; here they compared the context-free grammar (CFG) produced by the Berkeley parser and the re-ranking feature set of C&J
• RecursiveNN: two recursion-based methods, RecursiveNN (Socher et al. 2011a) and Recursive Neural Tensor Networks (RNTN) (Socher et al. 2013)
• CNN: a convolution kernel over the concatenated word embeddings in a pre-defined window (Collobert et al. 2011)
• Paragraph-Vector (Le and Mikolov 2014)
Experimental Results and Discussion
• The neural network approaches (RecursiveNN, CNN, RCNN) outperform the traditional methods (e.g., BoW + LR) on all four datasets: neural networks can capture more contextual information in their features.
• CNN and RCNN outperform RecursiveNN on the SST dataset, showing the importance of the max-pooling and convolutional layers. Training time: RNTN (3-5 hours) vs. RCNN (several minutes).
• Comparing RCNN with the tree-kernel methods: since it requires no handcrafted features or parser, RCNN might be useful in low-resource languages.
• RCNN vs. CNN: the recurrent structure is better than a window-based structure.
RCNN vs. CNN
• They consider every odd window size from 1 to 19 to train and test the CNN model.
• RCNN outperforms the CNN for all window sizes, and it does not rely on a window size.
• The recurrent structure can preserve longer contextual information and introduces less noise.
Learned Keywords
• The max-pooling layer selects the most important words in the texts.
• They present each center word together with its neighboring trigram.
• They list the comparison with RNTN (Socher et al. 2013) on sentiment classification.
Impressions
◦ RCNN does not rely on a syntactic parser.
◦ RCNN might be useful in low-resource languages.
◦ RCNN has a lower time complexity of O(n).
◦ Limitations of RCNN?
◦ Their model can capture the key components in texts for text classification.
Could it capture the key components for disambiguation? Plan: apply RCNN to other tasks.
Source code: https://github.com/knok/rcnn-text-classification