Recurrent Convolutional Neural Networks for Text Classification


  • Recurrent Convolutional Neural Networks for Text Classification

    Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao. AAAI 2015


  • Background: text classification. Feature representation in previous studies: the bag-of-words (BoW) model, where unigrams, bigrams, n-grams, or some exquisitely designed patterns are used as features.

    Disadvantages: ignore the contextual information or word order in texts and remain unsatisfactory for capturing the semantics of the words.

    Word embeddings and deep neural networks


  • Why a Recurrent Convolutional Neural Network? Recursive Neural Network (RecursiveNN). Disadvantages: its performance heavily depends on the construction of the textual tree, and building such a tree exhibits a time complexity of at least O(n²), where n is the length of the text.

    Recurrent Neural Network (RecurrentNN). Disadvantage: a biased model, where later words are more dominant than earlier words.

    Convolutional Neural Network (CNN). Advantages: captures the semantics of texts better than recursive or recurrent NNs; the time complexity of a CNN is O(n).

    Disadvantage: it is difficult to determine the window size.

    Recurrent Convolutional Neural Network (RCNN): learns more contextual information than conventional window-based neural networks.


  • Recurrent Convolutional Neural Network

    [Architecture figure: Word Representation Learning and Text Representation Learning]


  • Word Representation Learning. The representation of word w_i is the concatenation of the left-side context vector c_l(w_i), the word embedding e(w_i), and the right-side context vector c_r(w_i): x_i = [c_l(w_i); e(w_i); c_r(w_i)].

    The left-side context vector c_l(w_i) and the right-side context vector c_r(w_i) are computed recursively in a similar way:

    c_l(w_i) is computed from e(w_{i-1}), the word embedding of the previous word w_{i-1}, and c_l(w_{i-1}), the left-side context vector of the previous word w_{i-1}.

    c_r(w_i) is computed from e(w_{i+1}), the word embedding of the next word w_{i+1}, and c_r(w_{i+1}), the right-side context vector of the next word w_{i+1}.

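    A minimal NumPy sketch of this bidirectional recurrence (tanh is assumed as the recurrent non-linearity; the parameter names W_l, W_sl, W_r, W_sr only mirror the paper's notation and this is not the authors' code):

```python
import numpy as np

def word_representations(E, W_l, W_sl, W_r, W_sr, c_init):
    """Build x_i = [c_l(w_i); e(w_i); c_r(w_i)] for every word of one sentence.

    E         : (n, e) word embeddings e(w_1) ... e(w_n)
    W_l       : (c, c) recurrent weights for the left-side context
    W_sl      : (c, e) weights folding the previous word's embedding in
    W_r, W_sr : right-side counterparts
    c_init    : (c,)   boundary context vector (shared by both directions here)
    """
    n = E.shape[0]
    c = c_init.shape[0]

    # Left-side contexts: scan the sentence from left to right.
    cl = np.zeros((n, c))
    cl[0] = c_init
    for i in range(1, n):
        cl[i] = np.tanh(W_l @ cl[i - 1] + W_sl @ E[i - 1])

    # Right-side contexts: scan the sentence from right to left.
    cr = np.zeros((n, c))
    cr[-1] = c_init
    for i in range(n - 2, -1, -1):
        cr[i] = np.tanh(W_r @ cr[i + 1] + W_sr @ E[i + 1])

    # Concatenate left context, word embedding, right context per word.
    return np.concatenate([cl, E, cr], axis=1)          # shape (n, 2c + e)

# Toy usage: a 4-word sentence with 50-dim embeddings and 50-dim context vectors.
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 50))
X = word_representations(E,
                         W_l=rng.normal(size=(50, 50)), W_sl=rng.normal(size=(50, 50)),
                         W_r=rng.normal(size=(50, 50)), W_sr=rng.normal(size=(50, 50)),
                         c_init=np.zeros(50))
print(X.shape)                                          # (4, 150)
```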

  • Recurrent Convolutional Neural Network: Word Representation Learning

    Apply a linear transformation together with the tanh activation function to x_i and send the result to the next layer.

    y_i^(2) is a latent semantic vector that will be used to determine the most useful factors for representing the text.

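    A one-line sketch of this step, continuing the snippet above (W2 and b2 are illustrative names standing for W^(2) and b^(2); H is the hidden size):

```python
import numpy as np

def latent_semantic_vectors(X, W2, b2):
    """Compute y_i^(2) = tanh(W^(2) x_i + b^(2)) for every word representation x_i.

    X  : (n, 2c + e) word representations, one row per word
    W2 : (H, 2c + e) weights of the linear transformation
    b2 : (H,)        bias
    Returns an (n, H) matrix of latent semantic vectors.
    """
    return np.tanh(X @ W2.T + b2)
```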

  • Text Representation Learning

    A max-pooling layer is applied on all the representations of the words.

    A text of variable length is thereby converted into a fixed-length vector.

    The k-th element of y^(3) is the maximum of the k-th elements of all the y_i^(2).

    Why max pooling, not average pooling? The aim is to find the most important latent semantic factors in the texts.

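    A sketch of this pooling step, again continuing the NumPy snippets above:

```python
import numpy as np

def max_pool(Y2):
    """y^(3)_k = max_i y_i^(2)_k: element-wise maximum over the word axis.

    Y2 : (n, H) latent semantic vectors, one row per word.
    Returns a fixed-length (H,) text representation regardless of the length n.
    """
    # Y2.mean(axis=0) would be average pooling; taking the max instead keeps
    # the strongest activation of each latent factor across the whole text.
    return Y2.max(axis=0)
```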

  • Output Layer

    The softmax function is applied to y^(4) to produce the class probabilities.

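    A sketch of the output layer (W4 and b4 stand in for W^(4) and b^(4); K is the number of classes):

```python
import numpy as np

def predict_proba(y3, W4, b4):
    """Compute y^(4) = W^(4) y^(3) + b^(4), then a softmax over the classes.

    y3 : (H,)   pooled text representation from max_pool()
    W4 : (K, H) output-layer weights
    b4 : (K,)   output-layer bias
    Returns a (K,) vector of class probabilities p(k | D, theta).
    """
    y4 = W4 @ y3 + b4
    z = np.exp(y4 - y4.max())      # subtract the max for numerical stability
    return z / z.sum()
```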

  • Training network parameters

    They define all of the parameters to be trained as θ.

    The training target of the network is to maximize the log-likelihood Σ_{D ∈ 𝔻} log p(class_D | D, θ) with respect to θ, where 𝔻 is the training document set and class_D is the correct class of document D.

    Stochastic gradient descent is used to optimize the training target.

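    A schematic SGD loop for this objective (grad_log_likelihood is a hypothetical helper standing in for back-propagation through the whole network; it is not part of the paper or its released code):

```python
def sgd_train(docs, labels, theta, grad_log_likelihood, lr=0.01, epochs=5):
    """Maximize sum_D log p(class_D | D, theta) by stochastic gradient steps.

    docs, labels        : training documents and their correct classes
    theta               : dict mapping parameter names to NumPy arrays
    grad_log_likelihood : hypothetical helper returning
                          {name: d log p(class_D | D, theta) / d theta[name]}
    """
    for _ in range(epochs):
        for D, class_D in zip(docs, labels):
            grads = grad_log_likelihood(D, class_D, theta)
            for name, grad in grads.items():
                theta[name] += lr * grad   # ascent step on the log-likelihood
    return theta
```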

  • Dataset and experiment settings

    [Dataset table] Evaluation measure: Macro-F1 for the first dataset and Accuracy for the other three.

    They chose commonly used hyper-parameters following previous studies (Collobert et al. 2011; Turian, Ratinov, and Bengio 2010).

    The learning rate is set to 0.01, the hidden layer size H to 100, the word embedding dimension to 50, and the context vector dimension to 50.

    Word embeddings are pre-trained on the English and Chinese Wikipedia dumps, respectively, using the skip-gram algorithm in word2vec with default parameters.

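    The embedding pre-training could look roughly like this with gensim (gensim ≥ 4.0 argument names assumed; the toy corpus below merely stands in for the Wikipedia dump):

```python
from gensim.models import Word2Vec

# Hyper-parameters reported on the slide.
LEARNING_RATE = 0.01   # SGD learning rate for the RCNN itself
HIDDEN_SIZE = 100      # H, size of y_i^(2)
EMBEDDING_DIM = 50     # dimension of e(w)
CONTEXT_DIM = 50       # dimension of c_l(w) and c_r(w)

# Skip-gram (sg=1) pre-training; in the paper this runs on a Wikipedia dump.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["dogs", "chase", "cats"]]
w2v = Word2Vec(sentences=corpus, vector_size=EMBEDDING_DIM, sg=1, min_count=1)
embedding = w2v.wv["cat"]   # 50-dimensional pre-trained embedding for "cat"
```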

  • Experiment Results: Comparison of Methods

    Bag-of-words (BoW)/Bigram + logistic regression (LR)/SVM (Wang and Manning 2012).

    Average Embedding + LR: a tf-idf weighted average of the word embeddings, followed by a softmax layer (a rough sketch appears after this list of methods).

    LDA: ClassifyLDA-EM and Labeled-LDA.

    Tree kernels: Post and Bergsma (2013) used various tree kernels as features. Here, they compare the context-free grammar (CFG) produced by the Berkeley parser and the re-ranking feature set of C&J.

    RecursiveNN: two recursive-based methods, the Recursive Neural Network (Socher et al. 2011a) and Recursive Neural Tensor Networks (RNTNs) (Socher et al. 2013).

    CNN: a convolution kernel that concatenates the word embeddings in a pre-defined window (Collobert et al. 2011).

    Paragraph-Vector: Le and Mikolov (2014).
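    As a rough illustration of the Average Embedding + LR baseline above, assuming scikit-learn and a dictionary of pre-trained embeddings (all names here are illustrative, not from the compared papers):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def tfidf_weighted_doc_vectors(texts, embeddings, dim):
    """Average each document's word embeddings, weighted by their tf-idf scores."""
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(texts)                  # (num_docs, vocab_size)
    vocab = vec.get_feature_names_out()
    doc_vecs = np.zeros((len(texts), dim))
    for d in range(len(texts)):
        row = tfidf[d].tocoo()
        total = 0.0
        for j, weight in zip(row.col, row.data):
            word = vocab[j]
            if word in embeddings:                    # skip out-of-vocabulary words
                doc_vecs[d] += weight * embeddings[word]
                total += weight
        if total > 0:
            doc_vecs[d] /= total
    return doc_vecs

# Toy usage: the embeddings would normally come from word2vec; the logistic
# regression plays the role of the softmax layer on top of the averaged vector.
texts = ["good movie", "bad movie"]
embeddings = {"good": np.ones(50), "bad": -np.ones(50), "movie": np.zeros(50)}
X = tfidf_weighted_doc_vectors(texts, embeddings, dim=50)
clf = LogisticRegression().fit(X, [1, 0])
```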

  • Experiment Results and Discussion


    Neural network approaches (RecursiveNN, CNN, RCNN) outperform the traditional methods (e.g. BoW + LR) for all four datasets. Neural networks can capture more contextual information of features.

    CNN and RCNN outperform RecursiveNN on the SST dataset, which shows the importance of the max-pooling layer and the convolutional layer. Training time: RNTN (3-5 hours) vs. RCNN (several minutes).

    Comparing RCNN with the tree-kernel methods, RCNN might be useful in low-resource languages.

    RCNN vs. CNN: the recurrent structure is better than the window-based structure.

  • RCNN vs. CNN


    They consider all odd window sizes from 1 to 19 to train and test the CNN model.

    RCNN outperforms the CNN for all window sizes, and it does not rely on the window size.

    The recurrent structure can preserve longer contextual information and introduces less noise.

  • Learned Keywords


    The max-pooling layer selects the most important words in texts.

    Present the center word and its neighboring trigram.

    List the comparison results with RNTN (Socher et al. 2013) on sentiment classification.

  • RCNN does not rely on a syntactic parser. RCNN might be useful in low-resource languages. RCNN has a lower time complexity of O(n). Limitation of RCNN? Their model can capture the key components in texts for text classification.

    Can it capture the key components for disambiguation? They plan to apply the RCNN to other tasks.


    Source code: https://github.com/knok/rcnn-text-classification
