![Page 1: Deep Learning for Sentence Representationpeople.csail.mit.edu/jrg/meetings/ibm-internship-summary...Deep Learning for Sentence Representation Internship Project Summary Yonatan Belinkov](https://reader030.vdocuments.mx/reader030/viewer/2022041109/5f0e5f3f7e708231d43eed02/html5/thumbnails/1.jpg)
Deep Learning for Sentence Representation
Internship Project Summary
Yonatan Belinkov IBM Research - Haifa Summer 2015
Goals
• Develop deep learning methods for representing natural language sentences from text
• Acquire knowledge in deep learning tools and techniques
Background
• Vector representations (embeddings) for words and sentences
• Supervised vs. unsupervised approaches
• Neural network architectures: recursive (RecNN), convolutional (CNN), recurrent (RNN)
RecNN
(Diagram: "The cat sat on the mat" composed recursively along its parse tree, combining NP, PP, VP, and S nodes into a sentence vector.)
CNN
(Diagram: convolution filters applied over the words of "The cat sat on the mat".)
RNN
(Diagram: a recurrent network reading "The cat sat on the mat" word by word.)
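In code, the recurrence behind the RNN diagram can be sketched in plain Python (the weight matrices and 2-d embeddings below are toy assumptions for illustration, not values from the project):

```python
import math

def rnn_encode(sentence, W, U, h0):
    """Run a vanilla RNN over a sequence of word vectors.

    h_t = tanh(W @ x_t + U @ h_{t-1}); the final h_t serves as the
    fixed-size sentence representation.
    """
    def matvec(M, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

    h = h0
    for x in sentence:
        pre = [a + b for a, b in zip(matvec(W, x), matvec(U, h))]
        h = [math.tanh(p) for p in pre]
    return h

# Toy 2-d embeddings for three words (hypothetical values).
sentence = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W = [[0.1, 0.2], [0.3, 0.4]]   # input-to-hidden weights (toy)
U = [[0.0, 0.1], [0.1, 0.0]]   # hidden-to-hidden weights (toy)
h = rnn_encode(sentence, W, U, [0.0, 0.0])
print(h)  # final hidden state: one vector regardless of sentence length
```

The same interface (sequence in, single vector out) is what the LSTM encoder below provides, with a gated cell replacing the plain tanh update.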
Autoencoder Formulation
• Given a sentence that is a sequence of word vectors w1...wn, each of dimension d:
  § Encode the sentence into a single vector representation
  § Decode the representation back into the sentence
• During training
  § Get feedback from the original sentence; propagate it through the network to learn parameters
• During testing
  § Compare the decoded sentence to the original one
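A toy sketch of that encode/decode/compare loop, with mean pooling standing in for the real LSTM/CNN encoder and a trivial decoder (every name and value here is an illustrative assumption, not the project's model):

```python
def encode(sentence):
    """Toy encoder: mean-pool word vectors into one sentence vector.
    (A stand-in for the LSTM/CNN encoders in the slides.)"""
    n, d = len(sentence), len(sentence[0])
    return [sum(w[i] for w in sentence) / n for i in range(d)]

def decode(code, n):
    """Toy decoder: predict the same vector at every position."""
    return [list(code) for _ in range(n)]

def reconstruction_loss(sentence, predicted):
    """Squared error between original and decoded word vectors: the
    feedback that is propagated through the network during training."""
    return sum((a - b) ** 2
               for w, p in zip(sentence, predicted)
               for a, b in zip(w, p))

sentence = [[0.2, 0.4], [0.2, 0.4]]          # two identical toy words
code = encode(sentence)                       # single vector representation
loss = reconstruction_loss(sentence, decode(code, len(sentence)))
print(loss)  # 0.0: a mean-pool code reconstructs identical words exactly
```

A real encoder/decoder pair replaces both toy functions with learned networks, but the training signal is the same shape: compare the decoded sentence to the original and push the error back.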
Basic RNN model
• LSTM encoder-decoder (from Li, 2015)
CNN Encoder
The input sentence is an n × d matrix: the time dimension runs over the n words, the word dimension over the d components of each embedding.

w11 w12 … w1d
w21 w22 … w2d
 …   …  …  …
wn1 wn2 … wnd

• Time dimension (coarse-grained): each filter spans all word embedding dimensions (Kim 2014). Torch: nn.TemporalConvolution. #params = embeddingDim * numFilters * filterWidth
• Time dimension (fine-grained): convolve each embedding dimension independently (Kalchbrenner 2014). Torch: nn.SpatialConvolution. #params = 1 * numFilters * filterWidth
• Word dimension (fine-grained): convolve each word independently (???). Torch: nn.SpatialConvolution. #params = 1 * numFilters * filterWidth
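A minimal sketch of the coarse-grained time-dimension convolution (the computation behind Torch's nn.TemporalConvolution, ignoring the bias term); the input matrix and filter weights are toy assumptions:

```python
def temporal_conv(X, filters):
    """Coarse-grained convolution over the time (word) dimension.

    Each filter spans filterWidth consecutive rows of the n x d input
    matrix and all d embedding dimensions at once.
    X: n x d sentence matrix; filters: numFilters x filterWidth x d.
    Returns an (n - filterWidth + 1) x numFilters feature map.
    """
    n, d = len(X), len(X[0])
    width = len(filters[0])
    out = []
    for t in range(n - width + 1):
        out.append([
            sum(f[i][j] * X[t + i][j] for i in range(width) for j in range(d))
            for f in filters
        ])
    return out

# Toy setup: n=4 words, embeddingDim=3, numFilters=2, filterWidth=2.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
filters = [[[1, 1, 1], [1, 1, 1]],     # filter 1 (toy weights)
           [[1, 0, 0], [0, 1, 0]]]     # filter 2 (toy weights)
fmap = temporal_conv(X, filters)
num_params = sum(len(f) * len(f[0]) for f in filters)
print(len(fmap), num_params)  # 3 12: (n - width + 1) positions, 3*2*2 weights
```

This matches the slide's count: #params = embeddingDim * numFilters * filterWidth. The fine-grained variants share one filterWidth-long filter across dimensions (or words), which is why their count drops the embeddingDim factor.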
Loss Functions
• Log-likelihood of the words predicted by the decoder
  § Penalizes every wrong word
  § Word order matters
• Cosine distance between bag-of-words representations of the gold and predicted sentences
  § Representation is the size of the vocabulary
  § Word order doesn't matter
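The second loss can be sketched directly (whitespace tokenization and the helper name are assumptions):

```python
import math
from collections import Counter

def bow_cosine_distance(gold, predicted):
    """Cosine distance between vocabulary-sized bag-of-words count
    vectors; word order has no effect on this loss."""
    g, p = Counter(gold.split()), Counter(predicted.split())
    dot = sum(g[w] * p[w] for w in g)
    norm_g = math.sqrt(sum(c * c for c in g.values()))
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    return 1.0 - dot / (norm_g * norm_p)

# Reordering the words leaves the distance at (numerically) zero:
print(bow_cosine_distance("the cat sat", "sat the cat"))  # ~0
# A substituted word raises it:
print(bow_cosine_distance("the cat sat", "the dog sat"))  # ~1/3
```

The order-invariance shown here is exactly why this loss cannot penalize scrambled decodings, unlike the word-by-word log-likelihood.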
Implementation Details
• Torch
• Minimal preprocessing of sentences
• Optimization with AdaGrad
• Dropout
• 1000 dimensions for word and sentence vectors
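For reference, the AdaGrad update used for optimization can be sketched on a toy quadratic objective (the learning rate, starting point, and function names are illustrative assumptions):

```python
import math

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    """One AdaGrad update: per-parameter learning rates shrink as
    squared gradients accumulate in `cache` (updated in place)."""
    for i, g in enumerate(grad):
        cache[i] += g * g
        w[i] -= lr * g / (math.sqrt(cache[i]) + eps)
    return w

# Minimize f(w) = w0^2 + w1^2 from a toy starting point.
w, cache = [1.0, -2.0], [0.0, 0.0]
for _ in range(200):
    grad = [2 * w[0], 2 * w[1]]
    adagrad_step(w, grad, cache)
print(w)  # both coordinates driven toward 0
```

The per-parameter scaling is what makes AdaGrad convenient here: frequent words get small effective steps, rare ones larger steps, with no per-layer tuning.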
Data
• Hotel reviews
  § “we were the only people on our floor who spoke english”
  § “first rate ! the rooms look like they have been recently renovated .”
  § “recently stayed at the colonnade .”
• Dataset sizes (# sentences)

| Train set | Validation set | Test set |
| --- | --- | --- |
| 10K-1M | 100 | 100 |
Quantitative Evaluation
Machine translation metrics: how well the decoded sentence "translates" the original sentence.

| Encoder | Train size | BLEU | Meteor | Val error | Train error |
| --- | --- | --- | --- | --- | --- |
| LSTM | 100K | 39.3 | 32.9 | 27.3 | 9.3 |
| LSTM (drop 0.1) | 1M | 55.2 | 42.6 | 12.5 | 10.4 |
| LSTM (drop 0.3) | 1M | 63.9 | 45.1 | 14.7 | 7.0 |
| CNN (word) | 100K | 0.8 | 5.3 | 62.9 | 55.2 |
| CNN (time, fine-grained) | 100K | 0.6 | 6.2 | 53.8 | 49.1 |
| CNN (time, coarse-grained) | 100K | 16.6 | 20.7 | 39.0 | 26.0 |
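BLEU and Meteor come from standard MT evaluation tools; purely as an illustration of the idea, here is modified (clipped) unigram precision, one ingredient of BLEU (the full metric also uses higher-order n-grams and a brevity penalty):

```python
from collections import Counter

def clipped_unigram_precision(reference, candidate):
    """Modified unigram precision, one ingredient of BLEU: each
    candidate word counts as correct at most as many times as it
    appears in the reference."""
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    matched = sum(min(c, ref[w]) for w, c in cand.items())
    return matched / sum(cand.values())

gold = "recently stayed at the colonnade ."
pred = "recently stayed at the conrad ."
print(clipped_unigram_precision(gold, pred))  # 5 of 6 candidate tokens match
```

Scoring the decoded sentence against the original this way treats the autoencoder as a "translator" of the sentence into itself, which is exactly the framing above.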
More observations
• Bag-of-words based loss did not help
• Preliminary results on Wikipedia are much lower
• Possible explanations
  § Open domain, larger vocabulary, longer sentences

| Model | Train size | BLEU | Meteor |
| --- | --- | --- | --- |
| LSTM-LSTM | 1M sentences | 18.5 | 21.8 |
Qualitative Evaluation
• Run the trained model on unseen sentences
• Compare original and decoded sentences

| # | Gold sentence | Predicted sentence |
| --- | --- | --- |
| 1 | we were the only people on our floor who spoke english | we were only the people who on our floor group seemed on top , |
| 2 | which was nice . the place needs updated , | which was nice . the place needs updating , |
| 3 | but it's not horrible . | but it's not horrible . |
| 4 | recently stayed at the colonnade . | recently stayed at the conrad . |
| 5 | i must say i was extremely impressed with the staff and overall appearance of the hotel . | i must say i was extremely impressed with the cleanliness and helpfulness of the staff overall . |
| 6 | i would definitely stay here again and would recommend this hotel to family and friends . | i would definitely stay here again and would recommend this hotel to friends and family |
Qualitative Evaluation
• Run the trained model on train sentences
• Create vector representations for the train sentences
• Cluster the vectors with k-means
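The clustering step above can be sketched with plain Lloyd's k-means (deterministic initialization and toy 2-d "sentence vectors" are assumptions for illustration):

```python
def kmeans(vectors, k, iters=20):
    """Lloyd's k-means: assign each vector to its nearest centroid,
    recompute each centroid as its cluster mean, and repeat."""
    centroids = [list(v) for v in vectors[:k]]  # simple deterministic init
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(v, centroids[c])))
            clusters[j].append(v)
        for j, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(v[i] for v in cl) / len(cl)
                                for i in range(len(cl[0]))]
    return centroids, clusters

# Two well-separated toy groups of "sentence vectors".
vecs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
centroids, clusters = kmeans(vecs, k=2)
print(sorted(len(c) for c in clusters))  # [2, 2]: one cluster per group
```

With 1000-dimensional sentence vectors the same loop applies unchanged; only the distance computation gets longer.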
LSTM Encoder Clusters
• Cluster 1: 'i would definitely stay here again . i love it !' 'but i would stay here again . ' 'i think i would stay here again ' 'i would ( and will ) stay here again . ' 'i would 100% stay here again . ' 'i would come here again . ' 'i would consider staying here again . ' 'would i stay here again ? '
• Cluster 2: 'our staff was friendly and very fast to help us . ' 'but the staff was very friendly and accommodating . ' 'staff in the reception was very friendly . ' 'the check in staff was very friendly and helpful . ' 'the construction was complete . the staff is very friendly and helpful ' 'the hotel staff was very friendly and open to helping make dinner ' 'the internet was free and the staff was very friendly . '
• Cluster 3: 'and the hotel is in an excellent location . ' 'hotel is in a great location - nothing wrong with the neighbourhood . ' 'the back bay hotel is in a great location ' 'the edison hotel is in a perfect location ' 'the hotel circle is in a good location i think ' 'the hotel is a fine hotel in a great area ' 'the hotel is huge and in a good downtown location . '

CNN Encoder Clusters
• Cluster 1: 'great location !' 'location !' 'location location location !' 'cute and great location !' 'great stay and fabulous location !' 'staff and location !' 'wonderful location !'
• Cluster 2: 'fresh fruit pastries etc .' 'coffee shops etc . ' 'outback steak house etc . ' 'dinner walk around etc . ' 'french toast pancakes fresh fruit etc .' 'dinner walk around etc . ' 'bread toast etc .' 'a whole foods grocery etc . '
• Cluster 3: 'the room was very clean ' 'the room was very big ' 'the room was very spacious by new york hotel standards ' 'the room i recieved was very spacious ' 'the decor of the room was very nice and modern ' 'cons : room was very small ' 'the hotel room was very nice ' 'i liked the location and the room was very nice ' 'the king room at the back of the hotel was very quiet ' 'the room as very modern '
Observations
• Clusters tend to differ by topic (hotel, location, staff)
• Certain bias towards the beginning of the sentence, especially in the pure LSTM model
• Sometimes fails to capture negation
• LSTM prefers full sentences; CNN also forms clusters of words and sentences

Qualitative Evaluation
(Figure: distances within the 10 most dense clusters.)
Future Work
• General-domain model from Wikipedia
• Improvements to LSTM implementations
  § Attention mechanism?
• Supervised tasks (question similarity, answer selection)
  § Use the autoencoder representation as fixed features
  § Add a supervised classification layer
• Better CNN models, also at fine-grained levels
  § Deal with the locality of convolution
• Combine LSTM and CNN during encoding
• Decode with CNNs → variable-length sentences?