reading comprehension on squad - stanford … · reading comprehension on squad zachary taylor,...

32
Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n Winter 2017 Codalab username: ztaylor Abstract We implement three models to answer questions in the SQuAD dataset based on their context paragraphs. We compare their results. Our first model consists of a question encoding using a bidirectional LSTM, a context encoding using an LSTM and taking the question into account, a straightforward attention mechanism, and a linear decoding. Our second model is an extension of this first model to include Dynamic Coattention. Our third model is also our simplest, and consists of separate question and context encodings using LSTMs and a dot product decoding of each context hidden state with the question representation. Initial evaluations demonstrated overfitting on the training data. The team chose to focus on debugging and understanding errors with a barebones baseline model. Ultimately, the model achieved a 24.25 F1 score and a14.01 exact match score. 1 Introduction 1.1 The Task The objective of this research is to build a question-answering machine comprehension system that performs on the SQuAD dataset. In the SQuAD task, answering a question is defined as predicting an answer span within a given context paragraph. The goal is to predict an answer span tuple {as, ae} given a question of length n, q = {q1, q2, ..., qn}, and a supporting context paragraph p = {p1, p2, ..., pm} of length m. Therefore, the model must comprehend the question and reason through the passage to identify the correct answer span. The model learns a function that, given a pair of sequences (q, p) returns a sequence of two scalar indices {as, ae} indicating the start position and end position of the answer in paragraph p, respectively. Note that in this task as ae, and 0 as, ae m. Previously, the task of machine comprehension was difficult to attempt due to datasets limited in size, but the SQuAD dataset enables researchers to now build end-to-end models. It is over double the size of other current machine comprehension datasets and all questions are hand-written by humans, requiring different forms of reasoning to answer. To approach the task, a baseline neural network model will be implemented. The architecture of the baseline model consists of feeding the context paragraph and question into an LSTM, and for each hidden state in our context representation, computing a simple dot product against the final question state to compute the probability that any index was the start or end index. After implementing the baseline model, the objective is to implement more complex models that are inspired from current state-of-the-art research. The first advanced model to implement involves utilizing a dynamic coattention network. The model also incorporates a bidirectional LSTMs and relies on linear decoding.

Upload: vuongdat

Post on 03-Sep-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

Reading Comprehension on SQuAD

Zachary Taylor, Brexton Pham, Meghana Rao

Stanford University, Department of Computer Science CS224n Winter 2017

Codalab username: ztaylor

Abstract

We implement three models to answer questions in the SQuAD dataset based on their context paragraphs. We compare their results. Our first model consists of a

question encoding using a bidirectional LSTM, a context encoding using an LSTM and taking the question into account, a straightforward attention

mechanism, and a linear decoding. Our second model is an extension of this first model to include Dynamic Coattention. Our third model is also our simplest, and

consists of separate question and context encodings using LSTMs and a dot product decoding of each context hidden state with the question representation. Initial evaluations demonstrated overfitting on the training data. The team chose

to focus on debugging and understanding errors with a barebones baseline model. Ultimately, the model achieved a 24.25 F1 score and a14.01 exact match

score.

1 Introduction 1.1 The Task

The objective of this research is to build a question-answering machine comprehension system that performs on the SQuAD dataset. In the SQuAD task, answering a question is defined as predicting an answer span within a given context paragraph. The goal is to predict an answer span tuple {as, ae} given a question of length n, q = {q1, q2, ..., qn}, and a supporting context paragraph p = {p1, p2, ..., pm} of length m. Therefore, the model must comprehend the question and reason through the passage to identify the correct answer span. The model learns a function that, given a pair of sequences (q, p) returns a sequence of two scalar indices {as, ae} indicating the start position and end position of the answer in paragraph p, respectively. Note that in this task as ≤ ae, and 0 ≤ as, ae ≤ m. Previously, the task of machine comprehension was difficult to attempt due to datasets limited in size, but the SQuAD dataset enables researchers to now build end-to-end models. It is over double the size of other current machine comprehension datasets and all questions are hand-written by humans, requiring different forms of reasoning to answer.

To approach the task, a baseline neural network model will be implemented. The architecture of the baseline model consists of feeding the context paragraph and question into an LSTM, and for each hidden state in our context representation, computing a simple dot product against the final question state to compute the probability that any index was the start or end index. After implementing the baseline model, the objective is to implement more complex models that are inspired from current state-of-the-art research. The first advanced model to implement involves utilizing a dynamic coattention network. The model also incorporates a bidirectional LSTMs and relies on linear decoding.

Page 2: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

1.2 Background/Related Work Wang et al from the IBM Watson Research Center built a Multi-Perspective Context

Matching (MPCM) model to approach the task of machine comprehension which predicts answer beginnings and endings in context paragraphs. The model multiplies each word embedding vector in the paragraph by a relevancy weight computed against a question and then uses a bi-directional LSTM to encode the question and passage. The model compares the context at each point in the paragraph against the question from multiple perspectives to find the beginning and ending point of the answer (Wang et al., 2016). A team of researchers from Salesforce Research implemented a dynamic coattention network which overcomes limitations that occur in a single-pass model. In their research, they implemented co-dependent representations of both the context paragraphs and the questions and then iterate over all potential answer spans using a dynamic pointing decoder. The co-dependent representations are computed by making an affinity matrix that contains affinity scores for each document word and each question word which they then fuse with temporal information using an LSTM. Their ensemble method achieved an F1 score of 80.4% on the SQuAD dataset (Socher, 2017). Jiang and Wang from Singapore Management University tackled the machine comprehension challenge by implementing a model that relies on a match-LSTM and an answer pointer. In the match-LSTM model, two sentences are fed in, where one is the premise and the other is the hypothesis. The model checks the words of the hypothesis against the words of the premise and creates a vector representation at each token. An attention mechanism is used to obtain the weighted vector representation which is then combined with the other previously constructed representation and fed into an LSTM. The answer pointer selects a position from the input text as an output symbol instead of picking an output token from a given fixed vocabulary. They achieved a 77% F1 score on the SQuAD dataset (Jiang, 2016).

2 Approach Data Preprocessing:

All data preprocessing was completed in train.py. The validation dataset was built by reading through the validation answers, spans, and context files. The training dataset was built by reading through and compiling the training answers, spans, context paragraphs. Each dataset also consisted of question and context paragraph masks and placeholders for the start and end indices of the answer. Padded token vectors were appended to the ends of the context paragraphs and questions that did not align in length in the dataset. Initialization of the model also occurs in train.py and is where previous saved versions of the model can be evaluated.

Baseline Model:

The baseline model was implemented in qa_model.py. In the baseline model, the questions and context are both encoded using a basic LSTM. In the encoder class, the basic LSTM cell is initialized and the outputs are created by fully dynamically unrolling the inputs using a dynamic RNN cell. The decoder class takes in a knowledge representation and outputs a probability estimation over all paragraph tokens, which helps determine which token should be the start or end of the answer spam. The knowledge representation takes in q and X, our final question hidden state and our context hidden states. The class QASystem is then initialized and is passed in the encoded question and context as well as the decoder object, embedding path and question and context masks. In the QASystem class, the placeholders for the start and end answer indices, the questions, the question masks, the context paragraphs, and the context paragraph masks are made. The probability distributions for the start and end answer are initialized.

Within QASystem, the embeddings, system, and loss are also initialized. The embeddings are initialized using the glove pretrained embeddings (we use 100 dimensional glove vectors). The loss is calculated using the softmax cross entropy loss with logits, and is the sum of the loss over the start and end placeholders. The loss is optimized using the Adam optimizer.

In the main training loop of the model, minibatches of random samples from the dataset of a preset size are assembled. Within each epoch, new minibatches are made until the entire training dataset has been covered. The inputs, and start and end answer labels are optimized. After

Page 3: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

each epoch, the loss on the validation set is calculated. To obtain the predicted start and end indices, the answer function is called, which calls a separate decode function. In decode, the probability distribution over different positions in the paragraph are returned. In the answer evaluation portion of the model, a random subsample of the validation dataset is read into the model. The F1 and exact match scores are initialized. The data is broken into segments of the size of the batch size in order to fit the placeholders initialized with the system setup.

For each sample, the string of the predicted answer is obtained by indexing into the context paragraph and grabbing the words that correspond to the predicted span of the answer. The true answer is obtained from indexing into the trained answers file. Both the predicted and true answer are passed into the F1 and EM functions and averaged.

Advnaced Model Our most advanced model incorporated dynamic coattention (see “Dynamic Coattention Networks for Question Answering”) with our previous structure of bidirectional LSTMs and co-question-context representations. We calculate attention over the question and context representations simultaneously by first calculating an affinity matrix between them, then we introduce some nonlinearities (we used softmax). We then calculate context summaries, taking into account each word of the context, by multiplying the original question representation concatenated with the combined question representation after the softmax by the post-softmax paragraph representation. We then do a similar set of calculations to get question summaries. We then use linear transformations with learnable weights to feed into our decoder. 3 Experiments To improve our model, we experimented with a wide variety of different techniques and hyperparameters. We had the goal of optimizing both our F1 score and our exact match (EM) score, which represent (roughly) how many of the words we predict as being in the answer actually are in the answer, as well as how many answers we get exactly right. When we initially implemented a baseline, we saw near-zero performance in terms of EM score on our validate set. We were able to overfit a small training sample, and even to overfit the overall training sample relatively well, but we weren’t seeing performance on the validate set even after 7 epochs:

Model F1 Score EM Score

Bidirectional Encoding w/ Attention 3.02 0.02

Partly in the attempt to fix this problem and partly in the attempt to implement a better model, we implemented a version of attention very similar to the Dynamic Coattention mechanism outlined in the paper “Dynamic Coattention Networks for Question Answering” in our references. After debugging our evaluate script and running this network on our validation set, however, we experienced similar problems:

Model F1 Score EM Score

Dynamic Coattention 3.42 0.01

Looking deeper into these results showed that our both of our models just weren’t learning anything generalizable. For example, here are some answers that our model predicted compared

Page 4: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

with the correct answers (there appears to be little correlation, and in fact, the lengths are consistently much longer than the correct answers):

Predicted Answer Correct Answer

The Sultanate of Ifat , led by the Walashma dynasty with its the Red Sea

Europe regularly for holidays . In 1889 Ireland

Highland Quichuas living in the valleys of the Sierra region . Primarily consisting

Inca

Kingdom of Great Britain and Northern Ireland and its interests and Great Britain and Northern Ireland

As the following loss chart (loss during training) for our Dynamic Coattention model shows, we were fitting our train data at least somewhat (the loss decreases a little bit), but the test performance was not good. We suspect that there was a bug in our model somewhere.

At this point in the project, we had invested incredible amounts of time but we were still seeing no results and had limited time remaining, so we decided to experiment with an extremely simple model in order to get some limited performance. To this end, we designed our simplest baseline model that fed the question and context through separate LSTMs and computed dot products as scores for both the start and the end index. This allowed us to train epochs extremely quickly compared to before (~30 minutes vs 3 hours), which in turn allowed us to debug our model more quickly and get the framework correct at a simple level. One experimental change that ended up helping us was to decrease the learning rate. By using a learning rate that started at .001 instead of .01 (the Adam Optimizer theoretically changes this, but it still effected our learning), our simple model seemed to make more progress in terms of loss after many epochs:

Page 5: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

Our simple model, as seen in the graph above, achieved a sustained lower loss than the other models (it levelled out at around 6.5). This wasn’t sure proof that we had fixed our problem, but we were delighted to find that this model actually was doing much better than our previous ones on the validation set. It achieved the following performance on the small subset we were predicting on:

Model F1 Score EM Score

Separate Encodings 22.51 9.96

We then dug deeper to find out why the model was performing better. We looked at examples such as the following:

Predicted Answer Correct Answer

hunting no natural predators

claims religious bigotry

March Bonus March

1639 1639

Catholic true apostolic succession

Toronto Toronto

system Jawed

inventor universities

The examples above show a couple answers that our models predicts correctly. Given the time crunch, we were not able to get our more advanced models to surpass this performance, but we did tune hyperparameters of our simple model (learning rate, batch size, hidden state size, etc) in order to find a good balance of speed, which allowed for more training, and flexibility. This eventually led to our FINAL TEST SET RESULTS (on codalab):

Model F1 Score EM Score

Page 6: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

Separate Encodings 24.25 14.01

In the analysis section, we will discuss what the drawbacks and the advantages of our model were, and what specific types of questions it answers poorly and well.

Evaluation:

Each model was trained for 10 epochs, and each epoch consisted of 2544 minibatches. Every 500 minibatches, the model was saved at its current timestamp in the .tmp directory in order to be evaluated. The validation cost on the validation dataset was calculated after each epoch using softmax cross entropy loss with logits. To evaluate the model, the F1 score and exact match score were calculated, two metrics introduced in the original SQuAD paper. Exact match measures the percentage of the predicted answers that match the exact ground truth answers. The F1 score is the average overlap between the ground truth and prediction, and is calculated by multiplying the precision and recall scores by two and dividing by their sum. The F1 score and exact match score were calculated as the average of the scores over the number of samples evaluated.

4 Analysis Our prediction result on dev-v1.1.json resulted in performance metrics of 23.85 and 13.556 for F1 and EM, respectively. One crucial characteristic of our model that was both a help and a hindrance to the model’s performance, based on the examples in the experiments section, was its limitation to predicting one word answers. Since we used the same decoding for the start and end labels, each start and end label was the same. This meant that the model got none of the answers correct that were more than one word, which is an obvious and large drawback. However, it also didn’t have the downside that our earlier models had of predicting spans that were too long in length. In fact, since there were a fair amount of one-word answers in the dataset, the model was able to get about 13% of the questions correct (13.556 EM score), and it also managed to sometimes pick a word that was part of the answer even if it didn’t capture the whole thing (23.85 F1). Another interesting phenomena we noticed was that our model was better at predicting certain types of one word answers than others. For example, if the correct answer was a name or a year, it was more likely to guess that answer correctly. We hypothesize that this is because the frequency of overall answers that are proper nouns or years is higher than other types of words, and the model learned to differentiate those. Also, there are usually clear identifiers in the question that would let the model know that it is looking for a certain category of noun (‘when’, ‘where’, ‘who’, etc). This probably implies that our simple model is not utilizing context as well as it could, but does take into account the likelihood based on word type. For example, consider the following visualization of the question and context paragraph that led to one of the answers from before:

Visualization

Question: Where is the Center for New Religions located ?

Context Paragraph: Robert S. Wood has argued that the United States is a model for the world in terms of how a separation of church and state—no state-run or state-established church—is good for both the church and the state , allowing a variety of religions to flourish . Speaking at the Toronto-based Center for New Religions , Wood said that the freedom of conscience and assembly allowed under such a system has led to a " remarkable religiosity " in the United States that is n't present in other industrialized nations . Wood believes that the U.S. operates on " a sort of civic religion , " which includes a generally-shared belief in a creator who " expects better of us . " Beyond that , individuals are free to decide how they want to believe and fill in their own

Page 7: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

creeds and express their conscience . He calls this approach the " genius of religious sentiment in the United States . "

Predicted Answer, Correct Answer: Toronto, Toronto

Analysis: Our suspicion is that the level that our simple model is operating at is one where it didn’t necessarily take the word “based” into account, as we were not using a bidirectional LSTM, making it hard to capture that relationship because the word “based” came later in the sentence. We would only capture forward relationships. Rather, our model probably managed to get the right answer by accurately representing the key word “located” as part of the question representation, and learning that the word “located” corresponds to location answers, such as Toronto. Being the only clear city name in the paragraph, it chose it in this case, though the model would likely not be able to differentiate very well between multiple city options if they were present (we saw this is in some other examples).

5 Conclusion

After completing this project, we have gained perspective on the challenges involved with implementing a successful neural network for the SQuAD reading comprehension task. We ultimately demonstrated a functioning baseline model after investing into implementing more complex models that had long training durations and overfit the training data. Initial successes on the training data with more complex models did not perform well on the validation set and thus catalyzed a series of simplifications and debugging procedures to scale back the model to a leaner and more iterable scale. In retrospect, we wish to have implemented the barebones model first before attempting to implement more ambitious models. Nevertheless, we at least managed to get some positive performance eventually, and we learned an incredible amount about implementing neural network models for NLP. Also, we thought that even 13% of correct answers was a worthy accomplishment on such a difficult machine comprehension task. For possible future directions, an obvious area of improvement would be changing the model to be able to predicted longer-than-one-word answers. This would involve careful experimentation and modification of the one model that we managed to get some meaningful performance out of. We could then graduate to implementing more advanced attention mechanisms. Potentially, we could learn what about our more advanced models didn’t manage to translate to the evaluation set, and fix that element. In terms of other advanced models that we are interested in, answer pointers seem like a potentially valuable next step to take if we got the other elements working. 6 Contributions Zach- Implemented the data preprocessing/reading into the model. Implemented the full baseline model, including read in, variable set up, loss, optimization, encoder/decoder, etc. Implemented the dynamic coattention model. Implemented the simple baseline model that achieved our group’s top performance. Implemented qa_answer.py to produce predictions for evaluation. Went through the codalab submission process and was responsible for producing results on the leaderboard. Wrote the “advanced models”, “experiments”, and “analysis” portions of the paper. Contributed to the poster. Meghana - Implemented evaluation answer, decode, and validation cost in baseline model. Monitored multiple training runs of the baseline model and worked on debugging the saving and reloading mechanism. Initialized validation dataset. Wrote the introduction, background, conclusion, and baseline approach of the paper and formatted the paper. Main person in charge of managing VM resources, etc. Formatted and contributed to the poster. Brexton - Also implemented evaluation answer, decode, and validation cost. Contributed to

Page 8: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

debugging the saving and reloading mechanism. Implemented methods of validating and evaluating on larger batch sizes. Spent lots of time debugging the performance of the initial models. Ran multiple tests on the virtual environment of baseline and advanced models. Contributed to poster and background information. Main person in charge of previous work research.

7 References Chen, D., Bolton, J., Manning, C. 2016. Stanford Attentive Reader. A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. arXiv preprint arXiv:1606.02858v2

Jiang, J., Wang, S. 2016. Match-LSTM with Answer-Pointer. arXiv preprint arXiv: 1608.07905

Lee et al., 2017. Recurrent Span Representation, RaSoR. arXiv preprint arXiv:1611.01436v2.

Seo et al., 2017. Bi-directional Attention Flow, BiDAF. arXiv preprint arXiv arXiv:1611.01603v5.

Socher, R., Xiong, C., Zhong, V. 2016. Dynamic Coattention Networks for Question Answering. arXiv preprint arXiv:1611.01604

Wang et al., 2016. Multi-perspective context matching for machine comprehension. arXiv preprint arXiv:1612.04211, 2016

Yang et al., 2016. Words or Characters? Fine-Grained Grating for Reading Comprehension. arXiv preprint arXiv:1611.01724v1.

Yu et al., 2016. Dynamic Chunk Reader. arXiv preprint arXiv:1610.09996v2.

8 Supplementary Materials t r a i n . p y f r o m _ _ f u t u r e _ _ i m p o r t a b s o l u t e _ i m p o r t f r o m _ _ f u t u r e _ _ i m p o r t d i v i s i o n f r o m _ _ f u t u r e _ _ i m p o r t p r i n t _ f u n c t i o n i m p o r t o s i m p o r t j s o n i m p o r t t e n s o r f l o w a s t f i m p o r t n u m p y a s n p f r o m q a _ m o d e l i m p o r t E n c o d e r , Q A S y s t e m , D e c o d e r f r o m o s . p a t h i m p o r t j o i n a s p j o i n i m p o r t l o g g i n g i m p o r t s y s l o g g i n g . b a s i c C o n f i g ( l e v e l = l o g g i n g . I N F O ) t f . a p p . f l a g s . D E F I N E _ f l o a t ( " t r a i n i n g _ c a p " , 3 2 , " T r a i n i n g c a p . " ) t f . a p p . f l a g s . D E F I N E _ f l o a t ( " l e a r n i n g _ r a t e " , 0 . 0 0 1 , " L e a r n i n g r a t e . " ) t f . a p p . f l a g s . D E F I N E _ f l o a t ( " m a x _ g r a d i e n t _ n o r m " , 2 0 . 0 , " C l i p g r a d i e n t s t o t h i s n o r m . " ) t f . a p p . f l a g s . D E F I N E _ f l o a t ( " d r o p o u t " , 0 . 1 5 , " F r a c t i o n o f u n i t s r a n d o m l y d r o p p e d o n n o n - r e c u r r e n t c o n n e c t i o n s . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " b a t c h _ s i z e " , 3 2 , " B a t c h s i z e t o u s e d u r i n g t r a i n i n g . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " e p o c h s " , 1 5 , " N u m b e r o f e p o c h s t o t r a i n . " )

Page 9: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

# t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " s a m p l e S i z e " , 4 0 , " N u m b e r o f e p o c h s t o t r a i n . " ) # t h i s i s t h e n u m b e r o f s a m p l e s f e d i n t o e v a l u a t e a n s w e r t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " s t a t e _ s i z e " , 1 2 4 , " S i z e o f e a c h m o d e l l a y e r . " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " t r a i n _ d i r " , " t r a i n " , " T r a i n i n g d i r e c t o r y ( d e f a u l t : . / t r a i n ) . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " o u t p u t _ s i z e " , 4 0 0 , " T h e o u t p u t s i z e o f y o u r m o d e l . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " e m b e d d i n g _ s i z e " , 1 0 0 , " S i z e o f t h e p r e t r a i n e d v o c a b u l a r y . " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " d a t a _ d i r " , " d a t a / s q u a d " , " S Q u A D d i r e c t o r y ( d e f a u l t . / d a t a / s q u a d ) " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " l o a d _ t r a i n _ d i r " , " t r a i n " , " T r a i n i n g d i r e c t o r y t o l o a d m o d e l p a r a m e t e r s f r o m t o r e s u m e t r a i n i n g ( d e f a u l t : { t r a i n _ d i r } ) . " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " l o g _ d i r " , " l o g " , " P a t h t o s t o r e l o g a n d f l a g f i l e s ( d e f a u l t : . / l o g ) " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " o p t i m i z e r " , " a d a m " , " a d a m / s g d " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " p r i n t _ e v e r y " , 1 , " H o w m a n y i t e r a t i o n s t o d o p e r p r i n t . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " k e e p " , 0 , " H o w m a n y c h e c k p o i n t s t o k e e p , 0 i n d i c a t e s k e e p a l l . " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " v o c a b _ p a t h " , " d a t a / s q u a d / v o c a b . d a t " , " P a t h t o v o c a b f i l e ( d e f a u l t : . / d a t a / s q u a d / v o c a b . d a t ) " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " e m b e d _ p a t h " , " " , " P a t h t o t h e t r i m m e d G L o V e e m b e d d i n g ( d e f a u l t : . / d a t a / s q u a d / g l o v e . t r i m m e d . { e m b e d d i n g _ s i z e } . n p z ) " ) F L A G S = t f . a p p . f l a g s . F L A G S d e f i n i t i a l i z e _ m o d e l ( s e s s i o n , m o d e l , t r a i n _ d i r ) : p r i n t ( ' t r a i n d i r ' , t r a i n _ d i r ) c k p t = t f . t r a i n . g e t _ c h e c k p o i n t _ s t a t e ( t r a i n _ d i r ) p r i n t ( ' c k p t ' , c k p t ) v 2 _ p a t h = c k p t . m o d e l _ c h e c k p o i n t _ p a t h + " . i n d e x " i f c k p t e l s e " " p r i n t ( ' v 2 _ p a t h ' , v 2 _ p a t h ) i f c k p t a n d ( t f . g f i l e . E x i s t s ( c k p t . m o d e l _ c h e c k p o i n t _ p a t h ) o r t f . g f i l e . E x i s t s ( v 2 _ p a t h ) ) : l o g g i n g . i n f o ( " R e a d i n g m o d e l p a r a m e t e r s f r o m % s " % c k p t . m o d e l _ c h e c k p o i n t _ p a t h ) p r i n t ( ' c k p t . m o d e l _ c h e c k p o i n t _ p a t h ' , c k p t . m o d e l _ c h e c k p o i n t _ p a t h ) m o d e l . s a v e r . r e s t o r e ( s e s s i o n , c k p t . m o d e l _ c h e c k p o i n t _ p a t h ) e l s e : l o g g i n g . i n f o ( " C r e a t e d m o d e l w i t h f r e s h p a r a m e t e r s . " ) s e s s i o n . r u n ( t f . g l o b a l _ v a r i a b l e s _ i n i t i a l i z e r ( ) ) l o g g i n g . i n f o ( ' N u m p a r a m s : % d ' % s u m ( v . g e t _ s h a p e ( ) . n u m _ e l e m e n t s ( ) f o r v i n t f . t r a i n a b l e _ v a r i a b l e s ( ) ) ) r e t u r n m o d e l d e f i n i t i a l i z e _ v o c a b ( v o c a b _ p a t h ) : i f t f . g f i l e . E x i s t s ( v o c a b _ p a t h ) : r e v _ v o c a b = [ ] w i t h t f . g f i l e . G F i l e ( v o c a b _ p a t h , m o d e = " r b " ) a s f : r e v _ v o c a b . e x t e n d ( f . r e a d l i n e s ( ) ) r e v _ v o c a b = [ l i n e . s t r i p ( ' \ n ' ) f o r l i n e i n r e v _ v o c a b ] v o c a b = d i c t ( [ ( x , y ) f o r ( y , x ) i n e n u m e r a t e ( r e v _ v o c a b ) ] ) r e t u r n v o c a b , r e v _ v o c a b e l s e : r a i s e V a l u e E r r o r ( " V o c a b u l a r y f i l e % s n o t f o u n d . " , v o c a b _ p a t h ) d e f g e t _ n o r m a l i z e d _ t r a i n _ d i r ( t r a i n _ d i r ) : " " "

Page 10: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

A d d s s y m l i n k t o { t r a i n _ d i r } f r o m / t m p / c s 2 2 4 n - s q u a d - t r a i n t o c a n o n i c a l i z e t h e f i l e p a t h s s a v e d i n t h e c h e c k p o i n t . T h i s a l l o w s t h e m o d e l t o b e r e l o a d e d e v e n i f t h e l o c a t i o n o f t h e c h e c k p o i n t f i l e s h a s m o v e d , a l l o w i n g u s a g e w i t h C o d a L a b . T h i s m u s t b e d o n e o n b o t h t r a i n . p y a n d q a _ a n s w e r . p y i n o r d e r t o w o r k . " " " # g l o b a l _ t r a i n _ d i r = ' . / t r a i n / 2 0 1 7 0 3 1 6 _ 0 0 2 0 0 2 ' # g l o b a l _ t r a i n _ d i r = ' . / t r a i n / 2 0 1 7 0 3 1 6 _ 0 0 2 0 0 2 ' # g l o b a l _ t r a i n _ d i r = ' . / t r a i n / 2 0 1 7 0 3 1 6 _ 0 0 2 0 0 2 / m o d e l . w e i g h t s ' g l o b a l _ t r a i n _ d i r = " / t m p / c s 2 2 4 n - s q u a d - t r a i n " # g l o b a l _ t r a i n _ d i r = t r a i n _ d i r # o s . r e m o v e ( g l o b a l _ t r a i n _ d i r ) i f o s . p a t h . e x i s t s ( g l o b a l _ t r a i n _ d i r ) : o s . u n l i n k ( g l o b a l _ t r a i n _ d i r ) i f n o t o s . p a t h . e x i s t s ( t r a i n _ d i r ) : o s . m a k e d i r s ( t r a i n _ d i r ) o s . s y m l i n k ( o s . p a t h . a b s p a t h ( t r a i n _ d i r ) , g l o b a l _ t r a i n _ d i r ) r e t u r n g l o b a l _ t r a i n _ d i r d e f m a i n ( _ ) : e m b e d _ p a t h = F L A G S . e m b e d _ p a t h o r p j o i n ( " d a t a " , " s q u a d " , " g l o v e . t r i m m e d . { } . n p z " . f o r m a t ( F L A G S . e m b e d d i n g _ s i z e ) ) v o c a b _ p a t h = F L A G S . v o c a b _ p a t h o r p j o i n ( F L A G S . d a t a _ d i r , " v o c a b . d a t " ) v o c a b , r e v _ v o c a b = i n i t i a l i z e _ v o c a b ( v o c a b _ p a t h ) d e f p a d d e d _ t o k e n _ v e c t o r ( v e c t o r , m a x _ l e n g t h ) : r e t u r n [ v o c a b [ v e c t o r [ i ] ] i f i < l e n ( v e c t o r ) e l s e v o c a b [ ' < p a d > ' ] f o r i i n r a n g e ( m a x _ l e n g t h ) ] # D o w h a t y o u n e e d t o l o a d d a t a s e t s f r o m F L A G S . d a t a _ d i r # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # T R A I N D A T A S E T # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # q u e s t i o n s = [ ] q u e s t i o n _ m a s k = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . q u e s t i o n " ) a s f i l e : f o r l i n e i n f i l e : q = s t r . s p l i t ( l i n e ) q u e s t i o n _ m a s k . a p p e n d ( l e n ( q ) ) q u e s t i o n s . a p p e n d ( p a d d e d _ t o k e n _ v e c t o r ( q , 6 0 ) ) o v e r f l o w _ i n d i c e s = [ ] c o n t e x t = [ ] c o n t e x t _ m a s k = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . c o n t e x t " ) a s f i l e : # i = 0 f o r i , l i n e i n e n u m e r a t e ( f i l e ) : # i f i < 3 2 : c = s t r . s p l i t ( l i n e ) i f l e n ( c ) > F L A G S . o u t p u t _ s i z e : o v e r f l o w _ i n d i c e s . a p p e n d ( i ) c o n t e x t _ m a s k . a p p e n d ( l e n ( c ) ) c o n t e x t . a p p e n d ( p a d d e d _ t o k e n _ v e c t o r ( c , F L A G S . o u t p u t _ s i z e ) ) # i + = 1 c o n t e x t W o r d s = [ ] # g e t a l l t h e w o r d s f r o m e a c h c o n t e x t

Page 11: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . c o n t e x t " ) a s f i l e : # i = 0 f o r l i n e i n f i l e : # i f i < 3 2 : n e w C o n t e x t = [ ] c = s t r . s p l i t ( l i n e ) f o r e a c h C o n t e x t i n c : w o r d = e a c h C o n t e x t . s p l i t ( " " ) n e w C o n t e x t . a p p e n d ( w o r d ) c o n t e x t W o r d s . a p p e n d ( n e w C o n t e x t ) # i + = 1 l a b e l s = [ ] l a b e l s _ s t a r t = [ ] l a b e l s _ e n d = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . s p a n " ) a s f i l e : l i n e N u m = 0 # i = 0 f o r l i n e i n f i l e : # i f i < 3 2 : s t a r t , e n d = [ i n t ( s ) f o r s i n s t r . s p l i t ( l i n e ) ] l a b e l s _ s t a r t . a p p e n d ( s t a r t ) l a b e l s _ e n d . a p p e n d ( e n d ) s t a r t _ w o r d = c o n t e x t W o r d s [ l i n e N u m ] [ s t a r t ] e n d _ w o r d = c o n t e x t W o r d s [ l i n e N u m ] [ e n d ] l i n e N u m = l i n e N u m + 1 # p r i n t ( s t a r t _ w o r d , e n d _ w o r d ) # s y s . e x i t ( ) # c r e a t e t h e s e t w o r a r r a y s s ) # i + = 1 t r a i n A n s w e r s = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . a n s w e r " ) a s f i l e : t r a i n A n s w e r s = f i l e . r e a d l i n e s ( ) t r a i n A n s w e r s = [ x . s t r i p ( ) f o r x i n t r a i n A n s w e r s ] # [ : 3 2 ] c o n t e x t P a r a g r a p h s = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . c o n t e x t " ) a s f i l e : c o n t e x t P a r a g r a p h s = f i l e . r e a d l i n e s ( ) c o n t e x t P a r a g r a p h s = [ x . s t r i p ( ) f o r x i n c o n t e x t P a r a g r a p h s ] # [ : 3 2 ] # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # V A L I D A T E D A T A S E T # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # v a l _ q u e s t i o n s = [ ] v a l _ q u e s t i o n _ m a s k = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . q u e s t i o n " ) a s f i l e : f o r l i n e i n f i l e : q = s t r . s p l i t ( l i n e ) v a l _ q u e s t i o n _ m a s k . a p p e n d ( l e n ( q ) ) v a l _ q u e s t i o n s . a p p e n d ( p a d d e d _ t o k e n _ v e c t o r ( q , 6 0 ) ) v a l _ o v e r f l o w _ i n d i c e s = [ ] v a l _ c o n t e x t = [ ] v a l _ c o n t e x t _ m a s k = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . c o n t e x t " ) a s f i l e : f o r i , l i n e i n e n u m e r a t e ( f i l e ) : c = s t r . s p l i t ( l i n e ) i f l e n ( c ) > F L A G S . o u t p u t _ s i z e : v a l _ o v e r f l o w _ i n d i c e s . a p p e n d ( i ) v a l _ c o n t e x t _ m a s k . a p p e n d ( l e n ( c ) ) v a l _ c o n t e x t . a p p e n d ( p a d d e d _ t o k e n _ v e c t o r ( c , F L A G S . o u t p u t _ s i z e ) )

Page 12: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

v a l _ c o n t e x t W o r d s = [ ] # g e t a l l t h e w o r d s f r o m e a c h c o n t e x t w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . c o n t e x t " ) a s f i l e : f o r l i n e i n f i l e : n e w C o n t e x t = [ ] c = s t r . s p l i t ( l i n e ) f o r e a c h C o n t e x t i n c : w o r d = e a c h C o n t e x t . s p l i t ( " " ) n e w C o n t e x t . a p p e n d ( w o r d ) v a l _ c o n t e x t W o r d s . a p p e n d ( n e w C o n t e x t ) v a l _ l a b e l s = [ ] v a l _ l a b e l s _ s t a r t = [ ] v a l _ l a b e l s _ e n d = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . s p a n " ) a s f i l e : l i n e N u m = 0 f o r l i n e i n f i l e : s t a r t , e n d = [ i n t ( s ) f o r s i n s t r . s p l i t ( l i n e ) ] v a l _ l a b e l s _ s t a r t . a p p e n d ( s t a r t ) v a l _ l a b e l s _ e n d . a p p e n d ( e n d ) v a l A n s w e r s = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . a n s w e r " ) a s f i l e : v a l A n s w e r s = f i l e . r e a d l i n e s ( ) v a l A n s w e r s = [ x . s t r i p ( ) f o r x i n v a l A n s w e r s ] # [ : 3 2 ] v a l P a r a g r a p h s = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . c o n t e x t " ) a s f i l e : v a l P a r a g r a p h s = f i l e . r e a d l i n e s ( ) v a l P a r a g r a p h s = [ x . s t r i p ( ) f o r x i n c o n t e x t P a r a g r a p h s ] # [ : 3 2 ] f o r i i n r e v e r s e d ( o v e r f l o w _ i n d i c e s ) : q u e s t i o n s . p o p ( i ) q u e s t i o n _ m a s k . p o p ( i ) c o n t e x t . p o p ( i ) c o n t e x t _ m a s k . p o p ( i ) c o n t e x t W o r d s . p o p ( i ) l a b e l s _ s t a r t . p o p ( i ) l a b e l s _ e n d . p o p ( i ) t r a i n A n s w e r s . p o p ( i ) c o n t e x t P a r a g r a p h s . p o p ( i ) # l a b e l s = [ ] # w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . s p a n " ) a s f i l e : # f o r l i n e i n f i l e : # s t a r t , e n d = [ i n t ( s ) f o r s i n s t r . s p l i t ( l i n e ) ] # a = [ 1 i f i > = s t a r t a n d i < = e n d e l s e 0 f o r i i n r a n g e ( 7 6 6 ) ] # l a b e l s _ s t a r t . a p p e n d ( s t a r t ) # l a b e l s _ e n d . a p p e n d ( e n d ) # c r e a t e t h e s e t w o r a r r a y s s ) i n p u t s = ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k ) v a l _ d a t a s e t = ( v a l _ q u e s t i o n s , v a l _ c o n t e x t , v a l _ q u e s t i o n _ m a s k , v a l _ c o n t e x t _ m a s k , v a l _ l a b e l s _ s t a r t , v a l _ l a b e l s _ e n d , v a l _ c o n t e x t W o r d s , v a l A n s w e r s , v a l P a r a g r a p h s ) # d a t a s e t = ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s ) d a t a s e t = ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d s , t r a i n A n s w e r s , c o n t e x t P a r a g r a p h s )

Page 13: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

q u e s t i o n _ e n c o d e r = E n c o d e r ( s i z e = F L A G S . s t a t e _ s i z e ) c o n t e x t _ e n c o d e r = E n c o d e r ( s i z e = F L A G S . s t a t e _ s i z e ) d e c o d e r = D e c o d e r ( o u t p u t _ s i z e = F L A G S . o u t p u t _ s i z e ) q a = Q A S y s t e m ( q u e s t i o n _ e n c o d e r , c o n t e x t _ e n c o d e r , d e c o d e r , e m b e d _ p a t h , q u e s t i o n _ m a s k , c o n t e x t _ m a s k ) i f n o t o s . p a t h . e x i s t s ( F L A G S . l o g _ d i r ) : o s . m a k e d i r s ( F L A G S . l o g _ d i r ) f i l e _ h a n d l e r = l o g g i n g . F i l e H a n d l e r ( p j o i n ( F L A G S . l o g _ d i r , " l o g . t x t " ) ) l o g g i n g . g e t L o g g e r ( ) . a d d H a n d l e r ( f i l e _ h a n d l e r ) p r i n t ( v a r s ( F L A G S ) ) w i t h o p e n ( o s . p a t h . j o i n ( F L A G S . l o g _ d i r , " f l a g s . j s o n " ) , ' w ' ) a s f o u t : j s o n . d u m p ( F L A G S . _ _ f l a g s , f o u t ) w i t h t f . S e s s i o n ( ) a s s e s s : # l o a d _ t r a i n _ d i r = g e t _ n o r m a l i z e d _ t r a i n _ d i r ( F L A G S . l o a d _ t r a i n _ d i r o r F L A G S . t r a i n _ d i r ) # l o a d _ t r a i n _ d i r = ' / t m p / c s 2 2 4 n - s q u a d - t r a i n / 2 0 1 7 0 3 1 9 _ 1 6 0 4 0 3 ' l o a d _ t r a i n _ d i r = ' . / t r a i n / 2 0 1 7 0 3 2 1 _ 1 1 0 7 2 3 ' i n i t i a l i z e _ m o d e l ( s e s s , q a , l o a d _ t r a i n _ d i r ) s a v e _ t r a i n _ d i r = g e t _ n o r m a l i z e d _ t r a i n _ d i r ( F L A G S . t r a i n _ d i r ) q a . t r a i n ( s e s s , d a t a s e t , s a v e _ t r a i n _ d i r , v a l _ d a t a s e t ) q a . e v a l u a t e _ a n s w e r ( s e s s , v a l _ d a t a s e t , l o g = T r u e ) i f _ _ n a m e _ _ = = " _ _ m a i n _ _ " : t f . a p p . r u n ( ) q a _ m o d e l . p y f r o m _ _ f u t u r e _ _ i m p o r t a b s o l u t e _ i m p o r t f r o m _ _ f u t u r e _ _ i m p o r t d i v i s i o n f r o m _ _ f u t u r e _ _ i m p o r t p r i n t _ f u n c t i o n i m p o r t t i m e i m p o r t p d b i m p o r t l o g g i n g i m p o r t d a t e t i m e i m p o r t o s i m p o r t r a n d o m i m p o r t s y s i m p o r t n u m p y a s n p f r o m s i x . m o v e s i m p o r t x r a n g e # p y l i n t : d i s a b l e = r e d e f i n e d - b u i l t i n i m p o r t t e n s o r f l o w a s t f f r o m t e n s o r f l o w . p y t h o n . o p s i m p o r t v a r i a b l e _ s c o p e a s v s f r o m e v a l u a t e i m p o r t e x a c t _ m a t c h _ s c o r e , f 1 _ s c o r e f r o m t e n s o r f l o w . p y t h o n . o p s . n n i m p o r t

Page 14: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

s p a r s e _ s o f t m a x _ c r o s s _ e n t r o p y _ w i t h _ l o g i t s a s s s c e l o g g i n g . b a s i c C o n f i g ( l e v e l = l o g g i n g . I N F O ) d e f g e t _ o p t i m i z e r ( o p t ) : i f o p t = = " a d a m " : o p t f n = t f . t r a i n . A d a m O p t i m i z e r e l i f o p t = = " s g d " : o p t f n = t f . t r a i n . G r a d i e n t D e s c e n t O p t i m i z e r e l s e : a s s e r t ( F a l s e ) r e t u r n o p t f n c l a s s E n c o d e r ( o b j e c t ) : d e f _ _ i n i t _ _ ( s e l f , s i z e ) : s e l f . s i z e = s i z e s e l f . F L A G S = t f . a p p . f l a g s . F L A G S d e f e n c o d e ( s e l f , i n p u t s , m a s k s , s c o p e _ n a m e , e n c o d e r _ s t a t e _ i n p u t = N o n e ) : " " " I n a g e n e r a l i z e d e n c o d e f u n c t i o n , y o u p a s s i n y o u r i n p u t s , m a s k s , a n d a n i n i t i a l h i d d e n s t a t e i n p u t i n t o t h i s f u n c t i o n . : p a r a m i n p u t s : S y m b o l i c r e p r e s e n t a t i o n s o f y o u r i n p u t : p a r a m m a s k s : t h i s i s t o m a k e s u r e t f . n n . d y n a m i c _ r n n d o e s n ' t i t e r a t e t h r o u g h m a s k e d s t e p s : p a r a m e n c o d e r _ s t a t e _ i n p u t : ( O p t i o n a l ) p a s s t h i s a s i n i t i a l h i d d e n s t a t e t o t f . n n . d y n a m i c _ r n n t o b u i l d c o n d i t i o n a l r e p r e s e n t a t i o n s : r e t u r n : a n e n c o d e d r e p r e s e n t a t i o n o f y o u r i n p u t . I t c a n b e c o n t e x t - l e v e l r e p r e s e n t a t i o n , w o r d - l e v e l r e p r e s e n t a t i o n , o r b o t h . " " " w i t h v s . v a r i a b l e _ s c o p e ( s c o p e _ n a m e ) : c e l l = t f . n n . r n n _ c e l l . B a s i c L S T M C e l l ( s e l f . s i z e ) i f e n c o d e r _ s t a t e _ i n p u t = = N o n e : i n i t i a l _ s t a t e = c e l l . z e r o _ s t a t e ( s e l f . F L A G S . b a t c h _ s i z e , t f . f l o a t 6 4 ) # o u t p u t s , ( o u t p u t _ f w , o u t p u t _ b w ) = t f . n n . b i d i r e c t i o n a l _ d y n a m i c _ r n n ( c e l l _ f w = c e l l , c e l l _ b w = c e l l , i n p u t s = i n p u t s , s e q u e n c e _ l e n g t h = m a s k s , i n i t i a l _ s t a t e _ f w = i n i t i a l _ s t a t e , i n i t i a l _ s t a t e _ b w = i n i t i a l _ s t a t e ) # c , h = ( t f . c o n c a t ( 1 , [ o u t p u t _ f w . c , o u t p u t _ b w . c ] ) , t f . c o n c a t ( 1 , [ o u t p u t _ f w . h , o u t p u t _ b w . h ] ) ) # o u t p u t = t f . n n . r n n _ c e l l . L S T M S t a t e T u p l e ( c , h ) # o u t p u t _ s t a t e s = t f . c o n c a t ( 2 , [ o u t p u t s [ 0 ] , o u t p u t s [ 1 ] ] ) o u t p u t _ s t a t e s , o u t p u t = t f . n n . d y n a m i c _ r n n ( c e l l , i n p u t s = i n p u t s , s e q u e n c e _ l e n g t h = m a s k s , i n i t i a l _ s t a t e = i n i t i a l _ s t a t e ) # o u t p u t = t f . c o n c a t ( 1 , [ o u t p u t . c , o u t p u t . h ] ) e l s e : i n i t i a l _ s t a t e = e n c o d e r _ s t a t e _ i n p u t o u t p u t _ s t a t e s , o u t p u t = t f . n n . d y n a m i c _ r n n ( c e l l , i n p u t s = i n p u t s , s e q u e n c e _ l e n g t h = m a s k s , i n i t i a l _ s t a t e = i n i t i a l _ s t a t e ) r e t u r n o u t p u t . h , o u t p u t _ s t a t e s

Page 15: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

c l a s s D e c o d e r ( o b j e c t ) : d e f _ _ i n i t _ _ ( s e l f , o u t p u t _ s i z e ) : s e l f . o u t p u t _ s i z e = o u t p u t _ s i z e s e l f . F L A G S = t f . a p p . f l a g s . F L A G S d e f d e c o d e ( s e l f , k n o w l e d g e _ r e p ) : " " " t a k e s i n a k n o w l e d g e r e p r e s e n t a t i o n a n d o u t p u t a p r o b a b i l i t y e s t i m a t i o n o v e r a l l p a r a g r a p h t o k e n s o n w h i c h t o k e n s h o u l d b e t h e s t a r t o f t h e a n s w e r s p a n , a n d w h i c h s h o u l d b e t h e e n d o f t h e a n s w e r s p a n . : p a r a m k n o w l e d g e _ r e p : i t i s a r e p r e s e n t a t i o n o f t h e p a r a g r a p h a n d q u e s t i o n , d e c i d e d b y h o w y o u c h o o s e t o i m p l e m e n t t h e e n c o d e r : r e t u r n : " " " b o u n d a r y _ m o d e l = T r u e i f b o u n d a r y _ m o d e l : # h _ q , h _ p = k n o w l e d g e _ r e p # w i t h v s . v a r i a b l e _ s c o p e ( " a n s w e r _ s t a r t " ) : # a _ s = t f . n n . r n n _ c e l l . _ l i n e a r ( [ h _ q , h _ p ] , s e l f . o u t p u t _ s i z e , T r u e , 1 . 0 ) # w i t h v s . v a r i a b l e _ s c o p e ( " a n s w e r _ e n d " ) : # a _ e = t f . n n . r n n _ c e l l . _ l i n e a r ( [ h _ q , h _ p ] , s e l f . o u t p u t _ s i z e , T r u e , 1 . 0 ) # r e t u r n ( a _ s , a _ e ) q , X = k n o w l e d g e _ r e p a = t f . s q u e e z e ( t f . m a t m u l ( X , t f . r e s h a p e ( q , [ - 1 , s e l f . F L A G S . s t a t e _ s i z e , 1 ] ) ) ) r e t u r n ( a , a ) e l s e : P = k n o w l e d g e _ r e p p r i n t ( ' P : ' , P ) h _ s i z e = s e l f . F L A G S . s t a t e _ s i z e W _ s = t f . g e t _ v a r i a b l e ( " W _ s " , s h a p e = [ 1 , h _ s i z e , 1 ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) , d t y p e = t f . f l o a t 6 4 ) t i l e d _ W _ s = t f . t i l e ( W _ s , t f . p a c k ( [ s e l f . F L A G S . b a t c h _ s i z e , 1 , 1 ] ) ) S = t f . s q u e e z e ( t f . m a t m u l ( P , t i l e d _ W _ s ) ) p r i n t ( ' S : ' , S ) c e l l = t f . n n . r n n _ c e l l . B a s i c L S T M C e l l ( h _ s i z e ) i n i t i a l _ s t a t e = c e l l . z e r o _ s t a t e ( s e l f . F L A G S . b a t c h _ s i z e , t f . f l o a t 6 4 ) P _ n e w , f i n a l _ s t a t e s = t f . n n . d y n a m i c _ r n n ( c e l l , i n p u t s = P , i n i t i a l _ s t a t e = i n i t i a l _ s t a t e ) p r i n t ( ' f i n a l L S T M o u t p u t s ' , P _ n e w ) W _ e = t f . g e t _ v a r i a b l e ( " W _ e " , s h a p e = [ 1 , h _ s i z e , 1 ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) , d t y p e = t f . f l o a t 6 4 ) t i l e d _ W _ e = t f . t i l e ( W _ e , t f . p a c k ( [ s e l f . F L A G S . b a t c h _ s i z e , 1 , 1 ] ) ) E = t f . s q u e e z e ( t f . m a t m u l ( P _ n e w , t i l e d _ W _ e ) ) p r i n t ( ' E ' , E ) r e t u r n S , E c l a s s Q A S y s t e m ( o b j e c t ) : d e f _ _ i n i t _ _ ( s e l f , q u e s t i o n _ e n c o d e r , c o n t e x t _ e n c o d e r , d e c o d e r , * a r g s ) : " " " I n i t i a l i z e s y o u r S y s t e m : p a r a m e n c o d e r : a n e n c o d e r t h a t y o u c o n s t r u c t e d i n t r a i n . p y : p a r a m d e c o d e r : a d e c o d e r t h a t y o u c o n s t r u c t e d i n t r a i n . p y : p a r a m a r g s : p a s s i n m o r e a r g u m e n t s a s n e e d e d

Page 16: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

" " " s e l f . F L A G S = t f . a p p . f l a g s . F L A G S s e l f . q u e s t i o n _ e n c o d e r = q u e s t i o n _ e n c o d e r s e l f . c o n t e x t _ e n c o d e r = c o n t e x t _ e n c o d e r s e l f . d e c o d e r = d e c o d e r s e l f . m a x _ q u e s t i o n _ l e n g t h = 6 0 # r a n s c r i p t t o f i g u r e t h i s o u t s e l f . m a x _ c o n t e x t _ l e n g t h = s e l f . F L A G S . o u t p u t _ s i z e s e l f . e m b e d _ p a t h = a r g s [ 0 ] s e l f . q u e s t i o n _ m a s k = a r g s [ 1 ] s e l f . c o n t e x t _ m a s k = a r g s [ 2 ] # = = = = s e t u p p l a c e h o l d e r t o k e n s = = = = = = = = s e l f . l a b e l s _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , s e l f . m a x _ c o n t e x t _ l e n g t h ] ) s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ s e l f . F L A G S . b a t c h _ s i z e ] ) s e l f . l a b e l s _ e n d _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ s e l f . F L A G S . b a t c h _ s i z e ] ) s e l f . q u e s t i o n s _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , s e l f . m a x _ q u e s t i o n _ l e n g t h ] ) s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , ] ) s e l f . c o n t e x t _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , s e l f . m a x _ c o n t e x t _ l e n g t h ] ) s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , ] ) s e l f . a _ s = N o n e s e l f . a _ e = N o n e s e l f . c o n t e x t W o r d s = N o n e # u s e s i m p l e r f o r m f o r n o w \ \ s e l f . a t t e n t i o n = t f . g e t _ v a r i a b l e ( " A " , s h a p e = [ ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) ) # = = = = a s s e m b l e p i e c e s = = = = w i t h t f . v a r i a b l e _ s c o p e ( " q a " , i n i t i a l i z e r = t f . u n i f o r m _ u n i t _ s c a l i n g _ i n i t i a l i z e r ( 1 . 0 ) ) : s e l f . s e t u p _ e m b e d d i n g s ( s e l f . e m b e d _ p a t h ) s e l f . s e t u p _ s y s t e m ( ) s e l f . s e t u p _ l o s s ( ) # = = = = s e t u p t r a i n i n g / u p d a t i n g p r o c e d u r e = = = = o p t f n = g e t _ o p t i m i z e r ( " a d a m " ) o p t i m i z e r = o p t f n ( s e l f . F L A G S . l e a r n i n g _ r a t e ) g r a d s _ a n d _ v a r s = o p t i m i z e r . c o m p u t e _ g r a d i e n t s ( s e l f . l o s s ) g r a d i e n t s , p a r a m e t e r s = z i p ( * g r a d s _ a n d _ v a r s ) g r a d i e n t s = t f . c l i p _ b y _ g l o b a l _ n o r m ( g r a d i e n t s , s e l f . F L A G S . m a x _ g r a d i e n t _ n o r m ) [ 0 ] s e l f . g r a d _ n o r m = t f . g l o b a l _ n o r m ( g r a d i e n t s ) n e w _ g r a d s _ a n d _ v a r s = z i p ( g r a d i e n t s , p a r a m e t e r s ) s e l f . t r a i n _ o p = o p t i m i z e r . a p p l y _ g r a d i e n t s ( n e w _ g r a d s _ a n d _ v a r s ) p a s s d e f s e t u p _ s y s t e m ( s e l f ) : " " " A f t e r y o u r m o d u l a r i z e d i m p l e m e n t a t i o n o f e n c o d e r a n d d e c o d e r y o u s h o u l d c a l l v a r i o u s f u n c t i o n s i n s i d e e n c o d e r , d e c o d e r h e r e t o a s s e m b l e y o u r r e a d i n g c o m p r e h e n s i o n s y s t e m ! : r e t u r n : " " "

Page 17: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

q u e s t i o n _ r e p , q u e s t i o n _ s t a t e s = s e l f . q u e s t i o n _ e n c o d e r . e n c o d e ( s e l f . q u e s t i o n _ e m b e d d i n g s , s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r , " e n c o d e _ q u e s t i o n " ) p r i n t ( ' q u e s t i o n r e p ' , q u e s t i o n _ r e p ) # 3 2 b y 4 0 0 c o n t e x t _ r e p , c o n t e x t _ s t a t e s = s e l f . c o n t e x t _ e n c o d e r . e n c o d e ( s e l f . c o n t e x t _ e m b e d d i n g s , s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r , " e n c o d e _ c o n t e x t " ) p r i n t ( ' c o n t e x t s t a t e s ' , c o n t e x t _ s t a t e s ) # p r i n t ( ' c o n t e x t ' , c o n t e x t _ r e p ) # # c o n t e x t r e p i s 3 2 x 4 0 0 # p r i n t ( ' q ' , q u e s t i o n _ r e p ) # r e s h a p e d _ q u e s t i o n _ r e p = t f . r e s h a p e ( q u e s t i o n _ r e p , s h a p e = [ 2 , - 1 , 1 , s e l f . F L A G S . s t a t e _ s i z e ] ) # # p r i n t ( ' r e s h a p e d q u e s t i o n ' , r e s h a p e d _ q u e s t i o n _ r e p ) # a t t e n t i o n _ v e c = t f . n n . s o f t m a x ( t f . r e d u c e _ s u m ( t f . m u l t i p l y ( c o n t e x t _ r e p , q u e s t i o n _ r e p ) , a x i s = [ 3 ] ) ) # a n s w e r _ v e c s = t f . m u l t i p l y ( c o n t e x t _ s t a t e s , t f . r e s h a p e ( a t t e n t i o n _ v e c , s h a p e = [ 2 , - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , 1 ] ) ) # # a n s w e r _ v e c s = t f . a d d _ n ( c o n t e x t _ s t a t e s , t f . r e s h a p e ( a t t e n t i o n _ v e c , s h a p e = [ 2 , - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , 1 ] ) ) # s e l f . p r o b _ d i s t r i b u t i o n = s e l f . d e c o d e r . d e c o d e ( t f . r e s h a p e ( a n s w e r _ v e c s [ 1 , : , : , : ] , s h a p e = [ - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , s e l f . F L A G S . s t a t e _ s i z e ] ) ) p r i n t ( ' c o n t e x t s t a t e s ' , c o n t e x t _ s t a t e s ) p r i n t ( ' c o n t e x t ' , c o n t e x t _ r e p ) # c o n t e x t r e p i s 3 2 x 4 0 0 p r i n t ( ' q ' , q u e s t i o n _ r e p ) b o u n d a r y _ m o d e l = T r u e i f b o u n d a r y _ m o d e l : # p r i n t ( ' r e s h a p e d q u e s t i o n ' , r e s h a p e d _ q u e s t i o n _ r e p ) # a t t e n t i o n _ v e c = t f . n n . s o f t m a x ( t f . r e d u c e _ s u m ( t f . m u l t i p l y ( c o n t e x t _ r e p , q u e s t i o n _ r e p ) , a x i s = [ 3 ] ) ) # a n s w e r _ v e c s = t f . m u l t i p l y ( c o n t e x t _ s t a t e s , t f . r e s h a p e ( a t t e n t i o n _ v e c , s h a p e = [ 2 , - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , 1 ] ) ) # a n s w e r _ v e c s = t f . a d d _ n ( c o n t e x t _ s t a t e s , t f . r e s h a p e ( a t t e n t i o n _ v e c , s h a p e = [ 2 , - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , 1 ] ) ) # s e l f . p r o b _ d i s t r i b u t i o n = s e l f . d e c o d e r . d e c o d e ( t f . r e s h a p e ( a n s w e r _ v e c s [ 1 , : , : , : ] , s h a p e = [ - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , s e l f . F L A G S . s t a t e _ s i z e ] ) ) # r e s h a p e d _ q u e s t i o n _ r e p = t f . r e s h a p e ( q u e s t i o n _ r e p , s h a p e = [ 2 , - 1 , 1 , s e l f . F L A G S . s t a t e _ s i z e ] ) # s e l f . a _ s , s e l f . a _ e = s e l f . d e c o d e r . d e c o d e ( [ q u e s t i o n _ r e p . h , c o n t e x t _ r e p ] ) s e l f . a _ s , s e l f . a _ e = s e l f . d e c o d e r . d e c o d e ( [ q u e s t i o n _ r e p , c o n t e x t _ s t a t e s ] ) e l s e : Q = q u e s t i o n _ s t a t e s D = c o n t e x t _ s t a t e s L = t f . m a t m u l ( D , t f . t r a n s p o s e ( Q , [ 0 , 2 , 1 ] ) ) A D = t f . n n . s o f t m a x ( L ) A Q = t f . n n . s o f t m a x ( t f . t r a n s p o s e ( L , [ 0 , 2 , 1 ] ) ) C Q = t f . m a t m u l ( A Q , D ) p r i n t ( ' Q : ' , Q ) p r i n t ( ' C Q : ' , C Q ) p r i n t ( ' A D : ' , A D ) C D = t f . m a t m u l ( A D , t f . c o n c a t ( 2 , [ Q , C Q ] ) )

Page 18: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

h _ s i z e = s e l f . F L A G S . s t a t e _ s i z e W = t f . g e t _ v a r i a b l e ( " W " , s h a p e = [ 1 , h _ s i z e * 6 , h _ s i z e ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) , d t y p e = t f . f l o a t 6 4 ) b = t f . g e t _ v a r i a b l e ( " b " , s h a p e = [ 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , h _ s i z e ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) , d t y p e = t f . f l o a t 6 4 ) p r i n t ( ' C D : ' , C D ) p r i n t ( ' D : ' , D ) t i l e d _ W = t f . t i l e ( W , t f . p a c k ( [ s e l f . F L A G S . b a t c h _ s i z e , 1 , 1 ] ) ) p r i n t ( ' t i l e d _ W ' , t i l e d _ W ) D = t f . m a t m u l ( t f . c o n c a t ( 2 , [ C D , D ] ) , t i l e d _ W ) + b p r i n t ( ' n e w D : ' , D ) s e l f . a _ s , s e l f . a _ e = s e l f . d e c o d e r . d e c o d e ( D ) d e f s e t u p _ l o s s ( s e l f ) : " " " S e t u p y o u r l o s s c o m p u t a t i o n h e r e : r e t u r n : " " " w i t h v s . v a r i a b l e _ s c o p e ( " l o s s " ) : l 1 = s s c e ( l a b e l s = s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r , l o g i t s = s e l f . a _ s ) l 2 = s s c e ( l a b e l s = s e l f . l a b e l s _ e n d _ p l a c e h o l d e r , l o g i t s = s e l f . a _ e ) s e l f . l o s s = l 1 + l 2 s e l f . s a v e r = t f . t r a i n . S a v e r ( ) d e f s e t u p _ e m b e d d i n g s ( s e l f , e m b e d _ p a t h ) : " " " L o a d s d i s t r i b u t e d w o r d r e p r e s e n t a t i o n s b a s e d o n p l a c e h o l d e r t o k e n s : r e t u r n : " " " w i t h v s . v a r i a b l e _ s c o p e ( " e m b e d d i n g s " ) : e m b e d d i n g _ d i c t = n p . l o a d ( e m b e d _ p a t h ) p r e t r a i n e d _ e m b e d d i n g = t f . c o n s t a n t ( e m b e d d i n g _ d i c t [ ' g l o v e ' ] ) p r i n t ( s e l f . q u e s t i o n s _ p l a c e h o l d e r ) s e l f . q u e s t i o n _ e m b e d d i n g s = t f . r e s h a p e ( t f . n n . e m b e d d i n g _ l o o k u p ( p r e t r a i n e d _ e m b e d d i n g , s e l f . q u e s t i o n s _ p l a c e h o l d e r ) , s h a p e = [ - 1 , s e l f . m a x _ q u e s t i o n _ l e n g t h , s e l f . F L A G S . e m b e d d i n g _ s i z e ] ) s e l f . c o n t e x t _ e m b e d d i n g s = t f . r e s h a p e ( t f . n n . e m b e d d i n g _ l o o k u p ( p r e t r a i n e d _ e m b e d d i n g , s e l f . c o n t e x t _ p l a c e h o l d e r ) , s h a p e = [ - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , s e l f . F L A G S . e m b e d d i n g _ s i z e ] ) p r i n t ( s e l f . q u e s t i o n _ e m b e d d i n g s ) d e f o p t i m i z e ( s e l f , s e s s i o n , t r a i n _ x , t r a i n _ y 1 , t r a i n _ y 2 ) : " " " T a k e s i n a c t u a l d a t a t o o p t i m i z e y o u r m o d e l T h i s m e t h o d i s e q u i v a l e n t t o a s t e p ( ) f u n c t i o n : r e t u r n : " " " # p r i n t ( ' t r a i n y 1 ' , t r a i n _ y 1 ) # p r i n t ( ' t r a i n y 2 ' , t r a i n _ y 2 ) i n p u t _ f e e d = { } t r a i n _ q u e s t i o n s , t r a i n _ c o n t e x t , t r a i n _ q _ m a s k , t r a i n _ c _ m a s k = t r a i n _ x i n p u t _ f e e d [ s e l f . q u e s t i o n s _ p l a c e h o l d e r ] = t r a i n _ q u e s t i o n s

Page 19: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

i n p u t _ f e e d [ s e l f . c o n t e x t _ p l a c e h o l d e r ] = t r a i n _ c o n t e x t i n p u t _ f e e d [ s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r ] = t r a i n _ q _ m a s k i n p u t _ f e e d [ s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r ] = t r a i n _ c _ m a s k i n p u t _ f e e d [ s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r ] = t r a i n _ y 1 i n p u t _ f e e d [ s e l f . l a b e l s _ e n d _ p l a c e h o l d e r ] = t r a i n _ y 2 o u t p u t _ f e e d = [ s e l f . t r a i n _ o p , s e l f . l o s s ] o u t p u t s = s e s s i o n . r u n ( o u t p u t _ f e e d , i n p u t _ f e e d ) # r e t u r n s a l l t h r e e o u t p u t f e e d v a l u e s p r i n t ( ' l o s s : ' , n p . m e a n ( o u t p u t s [ 1 ] ) ) r e t u r n o u t p u t s [ 0 ] d e f d e c o d e ( s e l f , s e s s i o n , t e s t _ x ) : " " " R e t u r n s t h e p r o b a b i l i t y d i s t r i b u t i o n o v e r d i f f e r e n t p o s i t i o n s i n t h e p a r a g r a p h s o t h a t o t h e r m e t h o d s l i k e s e l f . a n s w e r ( ) w i l l b e a b l e t o w o r k p r o p e r l y : r e t u r n : " " " i n p u t _ f e e d = { } q u e s t i o n , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d , t r a i n A n s w e r , c o n t e x t P a r a g r a p h = z i p ( * t e s t _ x ) i n p u t _ f e e d [ s e l f . q u e s t i o n s _ p l a c e h o l d e r ] = q u e s t i o n i n p u t _ f e e d [ s e l f . c o n t e x t _ p l a c e h o l d e r ] = c o n t e x t i n p u t _ f e e d [ s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r ] = q u e s t i o n _ m a s k i n p u t _ f e e d [ s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r ] = c o n t e x t _ m a s k i n p u t _ f e e d [ s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r ] = l a b e l s _ s t a r t i n p u t _ f e e d [ s e l f . l a b e l s _ e n d _ p l a c e h o l d e r ] = l a b e l s _ e n d o u t p u t _ f e e d = [ s e l f . a _ s , s e l f . a _ e ] o u t p u t s = s e s s i o n . r u n ( o u t p u t _ f e e d , i n p u t _ f e e d ) r e t u r n o u t p u t s d e f a n s w e r ( s e l f , s e s s i o n , t e s t _ x ) : y p , y p 2 = s e l f . d e c o d e ( s e s s i o n , t e s t _ x ) a _ s = n p . a r g m a x ( y p , a x i s = 1 ) a _ e = n p . a r g m a x ( y p 2 , a x i s = 1 ) r e t u r n ( a _ s , a _ e ) d e f s t a r t V a l i d a t e ( s e l f , s e s s i o n , v a l _ d a t a s e t ) : s a m p l e = 3 2 # c u r r e n t l y o n l y l o o k i n g a t 3 2 e x a m p l e s , c o u l d b e l o o k i n g a t e n t i r e d a t a s e t v a l _ q u e s t i o n s , v a l _ c o n t e x t , v a l _ q u e s t i o n _ m a s k , v a l _ c o n t e x t _ m a s k , v a l _ l a b e l s _ s t a r t , v a l _ l a b e l s _ e n d , v a l _ c o n t e x t W o r d s , v a l A n s w e r s , v a l P a r a g r a p h s = v a l _ d a t a s e t v a l _ t h i n g s = z i p ( v a l _ q u e s t i o n s , v a l _ c o n t e x t , v a l _ q u e s t i o n _ m a s k , v a l _ c o n t e x t _ m a s k , v a l _ l a b e l s _ s t a r t , v a l _ l a b e l s _ e n d ) i n d i c e s = [ r a n d o m . r a n d i n t ( 0 , l e n ( v a l _ q u e s t i o n s ) - 1 ) f o r i i n r a n g e ( s a m p l e ) ] s a m p l e D a t a = [ v a l _ t h i n g s [ i ] f o r i i n i n d i c e s ] # c a l c u l a t e t h e v a l i d a t i o n c o s t f o r t h e v a l _ d a t a s e t # d o w e d o t h i s t o 1 0 0 s a m p l e s o r t h e w h o l e v a l i d a t i o n d a t a s e t 3

Page 20: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

v a l _ c o s t = 0 . 0 v a l _ c o s t = s e l f . v a l i d a t e ( s e s s i o n , s a m p l e D a t a ) p r i n t ( ' t h i s i s v a l _ c o s t ' , v a l _ c o s t ) r e t u r n v a l _ c o s t d e f v a l i d a t e ( s e l f , s e s s , v a l _ t h i n g s ) : " " " I t e r a t e t h r o u g h t h e v a l i d a t i o n d a t a s e t a n d d e t e r m i n e w h a t t h e v a l i d a t i o n c o s t i s . T h i s m e t h o d c a l l s s e l f . t e s t ( ) w h i c h e x p l i c i t l y c a l c u l a t e s v a l i d a t i o n c o s t . H o w y o u i m p l e m e n t t h i s f u n c t i o n i s d e p e n d e n t o n h o w y o u d e s i g n y o u r d a t a i t e r a t i o n f u n c t i o n : r e t u r n : " " " v a l i d _ c o s t = 0 v a l i d _ c o s t = s e l f . t e s t ( s e s s , v a l _ t h i n g s ) v a l i d _ c o s t = ( n p . s u m ( v a l i d _ c o s t ) * 1 . 0 ) / l e n ( v a l _ t h i n g s ) r e t u r n v a l i d _ c o s t d e f t e s t ( s e l f , s e s s i o n , v a l _ t h i n g s ) : " " " i n h e r e y o u s h o u l d c o m p u t e a c o s t f o r y o u r v a l i d a t i o n s e t a n d t u n e y o u r h y p e r p a r a m e t e r s a c c o r d i n g t o t h e v a l i d a t i o n s e t p e r f o r m a n c e : r e t u r n : " " " v a l _ q u e s t i o n s , v a l _ c o n t e x t , v a l _ q u e s t i o n _ m a s k , v a l _ c o n t e x t _ m a s k , v a l _ l a b e l s _ s t a r t , v a l _ l a b e l s _ e n d = z i p ( * v a l _ t h i n g s ) i n p u t _ f e e d = { } i n p u t _ f e e d [ s e l f . q u e s t i o n s _ p l a c e h o l d e r ] = v a l _ q u e s t i o n s i n p u t _ f e e d [ s e l f . c o n t e x t _ p l a c e h o l d e r ] = v a l _ c o n t e x t i n p u t _ f e e d [ s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r ] = v a l _ q u e s t i o n _ m a s k i n p u t _ f e e d [ s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r ] = v a l _ c o n t e x t _ m a s k i n p u t _ f e e d [ s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r ] = v a l _ l a b e l s _ s t a r t i n p u t _ f e e d [ s e l f . l a b e l s _ e n d _ p l a c e h o l d e r ] = v a l _ l a b e l s _ e n d o u t p u t _ f e e d = [ s e l f . l o s s ] o u t p u t s = s e s s i o n . r u n ( o u t p u t _ f e e d , i n p u t _ f e e d ) r e t u r n o u t p u t s d e f c h u n k s ( s e l f , l , n ) : " " " Y i e l d s u c c e s s i v e n - s i z e d c h u n k s f r o m l . " " " f o r i i n r a n g e ( 0 , l e n ( l ) , n ) : y i e l d l [ i : i + n ] d e f c l i p ( s e l f , l , s i z e ) : " " " C l i p s l i s t t o e l i m i n a t e e x c e s s d a t a " " " i f l e n ( l [ - 1 ] ) ! = s i z e : r e t u r n l [ : - 1 ] e l s e : r e t u r n l d e f a u g m e n t ( s e l f , l , s i z e ) : i f l e n ( l [ - 1 ] ) ! = s i z e : a u g m e n t e d _ l a s t _ b a t c h = [ l [ - 1 ] [ i ] i f i < l e n ( l [ - 1 ] ) e l s e N o n e f o r i i n r a n g e ( s i z e ) ] l . p o p ( - 1 ) l . a p p e n d ( a u g m e n t e d _ l a s t _ b a t c h )

Page 21: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

r e t u r n l d e f p r e d i c t _ a n s w e r s ( s e l f , s e s s i o n , d a t a s e t ) : q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k = d a t a s e t z i p p e d D a t a = s e l f . c h u n k s ( z i p ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k ) , s e l f . F L A G S . b a t c h _ s i z e ) z i p p e d D a t a = s e l f . c l i p ( [ i t e m f o r i t e m i n z i p p e d D a t a ] , s e l f . F L A G S . b a t c h _ s i z e ) a _ s = [ ] a _ e = [ ] f o r c h u n k i n z i p p e d D a t a : i n p u t _ f e e d = { } q u e s t i o n , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k = z i p ( * c h u n k ) i n p u t _ f e e d [ s e l f . q u e s t i o n s _ p l a c e h o l d e r ] = q u e s t i o n i n p u t _ f e e d [ s e l f . c o n t e x t _ p l a c e h o l d e r ] = c o n t e x t i n p u t _ f e e d [ s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r ] = q u e s t i o n _ m a s k i n p u t _ f e e d [ s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r ] = c o n t e x t _ m a s k o u t p u t _ f e e d = [ s e l f . a _ s , s e l f . a _ e ] a _ s _ c h u n k , a _ e _ c h u n k = s e s s i o n . r u n ( o u t p u t _ f e e d , i n p u t _ f e e d ) a _ s _ c h u n k = n p . a r g m a x ( a _ s _ c h u n k , a x i s = 1 ) a _ e _ c h u n k = n p . a r g m a x ( a _ e _ c h u n k , a x i s = 1 ) a _ s _ c h u n k = a _ s _ c h u n k . t o l i s t ( ) a _ e _ c h u n k = a _ e _ c h u n k . t o l i s t ( ) f o r i i n r a n g e ( s e l f . F L A G S . b a t c h _ s i z e ) : a _ s . a p p e n d ( a _ s _ c h u n k [ i ] ) a _ e . a p p e n d ( a _ e _ c h u n k [ i ] ) # p r i n t ( ' a _ s ' , a _ s ) # p r i n t ( ' a _ e ' , a _ e ) r e t u r n a _ s , a _ e d e f e v a l u a t e _ a n s w e r ( s e l f , s e s s i o n , d a t a s e t , s a m p l e = 3 2 , l o g = F a l s e ) : # n e e d t o g e t t h i s t o w o r k w h e n s a m p l e s i z e i s n o t b a t c h s i z e " " " E v a l u a t e t h e m o d e l ' s p e r f o r m a n c e u s i n g t h e h a r m o n i c m e a n o f F 1 a n d E x a c t M a t c h ( E M ) w i t h t h e s e t o f t r u e a n s w e r l a b e l s T h i s s t e p a c t u a l l y t a k e s q u i t e s o m e t i m e . S o w e c a n o n l y s a m p l e 1 0 0 e x a m p l e s f r o m e i t h e r t r a i n i n g o r t e s t i n g s e t . : p a r a m s e s s i o n : s e s s i o n s h o u l d a l w a y s b e c e n t r a l l y m a n a g e d i n t r a i n . p y : p a r a m d a t a s e t : a r e p r e s e n t a t i o n o f o u r d a t a , i n s o m e i m p l e m e n t a t i o n s , y o u c a n p a s s i n m u l t i p l e c o m p o n e n t s ( a r g u m e n t s ) o f o n e d a t a s e t t o t h i s f u n c t i o n : p a r a m s a m p l e : h o w m a n y e x a m p l e s i n d a t a s e t w e l o o k a t : p a r a m l o g : w h e t h e r w e p r i n t t o s t d o u t s t r e a m : r e t u r n : " " " s a m p l e = 2 0 0 q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d s , t r a i n A n s w e r s , c o n t e x t P a r a g r a p h s = d a t a s e t i n d i c e s = [ r a n d o m . r a n d i n t ( 0 , l e n ( q u e s t i o n s ) - 1 ) f o r i i n r a n g e ( s a m p l e ) ] z i p p e d D a t a = z i p ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d s , t r a i n A n s w e r s , c o n t e x t P a r a g r a p h s ) # s a m p l e D a t a = [ z i p p e d D a t a [ i ] f o r i i n i n d i c e s ] z i p p e d D a t a = [ z i p p e d D a t a [ i ] f o r i i n i n d i c e s ] # [ : 3 2 ]

Page 22: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

f 1 = 0 e m = 0 # p r i n t ( ' z i p p e d D a t a ' , z i p p e d D a t a ) z i p p e d D a t a = s e l f . c h u n k s ( z i p p e d D a t a , s e l f . F L A G S . b a t c h _ s i z e ) z i p p e d D a t a = [ i t e m f o r i t e m i n z i p p e d D a t a ] z i p p e d D a t a = s e l f . c l i p ( z i p p e d D a t a , s e l f . F L A G S . b a t c h _ s i z e ) a _ s = [ ] a _ e = [ ] # p r i n t ( ' z i p p e d l e n g t h s ' , [ l e n ( x ) f o r x i n z i p p e d D a t a ] ) f o r c h u n k i n z i p p e d D a t a : a _ s _ c h u n k , a _ e _ c h u n k = s e l f . a n s w e r ( s e s s i o n , c h u n k ) a _ s _ c h u n k = a _ s _ c h u n k . t o l i s t ( ) a _ e _ c h u n k = a _ e _ c h u n k . t o l i s t ( ) a _ s . a p p e n d ( a _ s _ c h u n k ) a _ e . a p p e n d ( a _ e _ c h u n k ) # p r i n t ( ' a _ s l e n s ' , [ l e n ( x ) f o r x i n a _ s ] ) a _ s = s e l f . c l i p ( a _ s , s e l f . F L A G S . b a t c h _ s i z e ) a _ e = s e l f . c l i p ( a _ e , s e l f . F L A G S . b a t c h _ s i z e ) # p r i n t ( ' a _ s l e n s ' , [ l e n ( x ) f o r x i n a _ s ] ) i n d i c e s = s e l f . c h u n k s ( i n d i c e s , s e l f . F L A G S . b a t c h _ s i z e ) i n d i c e s = [ i t e m f o r i t e m i n i n d i c e s ] i n d i c e s = s e l f . c l i p ( i n d i c e s , s e l f . F L A G S . b a t c h _ s i z e ) c o n t e x t W o r d s C o u n t e r = 0 # f o r i i n r a n g e ( l e n ( z i p p e d D a t a ) ) : f o r i , k i n e n u m e r a t e ( i n d i c e s ) : f o r j , i n d e x i n e n u m e r a t e ( k ) : c u r r e n t _ z i p p e d D a t a _ b u c k e t = z i p p e d D a t a [ i ] # p r i n t ( ' c z b ' , c u r r e n t _ z i p p e d D a t a _ b u c k e t ) c u r r e n t _ a _ s _ b u c k e t = a _ s [ i ] c u r r e n t _ a _ e _ b u c k e t = a _ e [ i ] # f o r j i n r a n g e ( l e n ( c u r r e n t _ z i p p e d D a t a _ b u c k e t ) ) : p = c o n t e x t W o r d s [ i n d e x ] p = [ i t e m [ 0 ] f o r i t e m i n p ] s t a r t A n s w e r I n d e x = c u r r e n t _ a _ s _ b u c k e t [ j ] e n d A n s w e r I n d e x = c u r r e n t _ a _ e _ b u c k e t [ j ] a n s w e r = ( p [ s t a r t A n s w e r I n d e x : e n d A n s w e r I n d e x + 1 ] ) a n s w e r = " " . j o i n ( s t r ( x ) f o r x i n a n s w e r ) t r u e A n s w e r = t r a i n A n s w e r s [ i n d e x ] p r i n t ( ' t r u e a n s w e r : ' , t r u e A n s w e r ) p r i n t ( ' p r e d i c t e d a n s w e r : ' , a n s w e r ) f 1 + = f 1 _ s c o r e ( a n s w e r , t r u e A n s w e r ) e m + = e x a c t _ m a t c h _ s c o r e ( a n s w e r , t r u e A n s w e r ) c o n t e x t W o r d s C o u n t e r + = 1 l e n g t h = s u m ( [ l e n ( i ) f o r i i n i n d i c e s ] ) f 1 = ( f 1 * 1 . 0 ) / ( l e n g t h ) e m = ( e m * 1 . 0 ) / ( l e n g t h ) p r i n t ( ' f 1 ' , f 1 ) p r i n t ( ' e m ' , e m ) r e t u r n f 1 , e m d e f t r a i n ( s e l f , s e s s i o n , d a t a s e t , t r a i n _ d i r , v a l _ d a t a s e t ) : " " "

Page 23: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

I m p l e m e n t m a i n t r a i n i n g l o o p T I P S : Y o u s h o u l d a l s o i m p l e m e n t l e a r n i n g r a t e a n n e a l i n g ( l o o k i n t o t f . t r a i n . e x p o n e n t i a l _ d e c a y ) C o n s i d e r i n g t h e l o n g t i m e t o t r a i n , y o u s h o u l d s a v e y o u r m o d e l p e r e p o c h . M o r e a m b i t i o u s a p p o a r c h c a n i n c l u d e i m p l e m e n t e a r l y s t o p p i n g , o r r e l o a d p r e v i o u s m o d e l s i f t h e y h a v e h i g h e r p e r f o r m a n c e t h a n t h e c u r r e n t o n e A s s u g g e s t e d i n t h e d o c u m e n t , y o u s h o u l d e v a l u a t e y o u r t r a i n i n g p r o g r e s s b y p r i n t i n g o u t i n f o r m a t i o n e v e r y f i x e d n u m b e r o f i t e r a t i o n s . W e r e c o m m e n d y o u e v a l u a t e y o u r m o d e l p e r f o r m a n c e o n F 1 a n d E M i n s t e a d o f j u s t l o o k i n g a t t h e c o s t . : p a r a m s e s s i o n : i t s h o u l d b e p a s s e d i n f r o m t r a i n . p y : p a r a m d a t a s e t : a r e p r e s e n t a t i o n o f o u r d a t a , i n s o m e i m p l e m e n t a t i o n s , y o u c a n p a s s i n m u l t i p l e c o m p o n e n t s ( a r g u m e n t s ) o f o n e d a t a s e t t o t h i s f u n c t i o n : p a r a m t r a i n _ d i r : p a t h t o t h e d i r e c t o r y w h e r e y o u s h o u l d s a v e t h e m o d e l c h e c k p o i n t : r e t u r n : " " " # s o m e f r e e c o d e t o p r i n t o u t n u m b e r o f p a r a m e t e r s i n y o u r m o d e l # i t ' s a l w a y s g o o d t o c h e c k ! # y o u w i l l a l s o w a n t t o s a v e y o u r m o d e l p a r a m e t e r s i n t r a i n _ d i r # s o t h a t y o u c a n u s e y o u r t r a i n e d m o d e l t o m a k e p r e d i c t i o n s , o r # e v e n c o n t i n u e t r a i n i n g d e f g e t _ m i n i b a t c h e s ( d a t a , m i n i b a t c h _ s i z e , s h u f f l e = T r u e ) : " " " I t e r a t e s t h r o u g h t h e p r o v i d e d d a t a o n e m i n i b a t c h a t a t t i m e . Y o u c a n u s e t h i s f u n c t i o n t o i t e r a t e t h r o u g h d a t a i n m i n i b a t c h e s a s f o l l o w s : f o r i n p u t s _ m i n i b a t c h i n g e t _ m i n i b a t c h e s ( i n p u t s , m i n i b a t c h _ s i z e ) : . . . O r w i t h m u l t i p l e d a t a s o u r c e s : f o r i n p u t s _ m i n i b a t c h , l a b e l s _ m i n i b a t c h i n g e t _ m i n i b a t c h e s ( [ i n p u t s , l a b e l s ] , m i n i b a t c h _ s i z e ) : . . . A r g s : d a t a : t h e r e a r e t w o p o s s i b l e v a l u e s : - a l i s t o r n u m p y a r r a y - a l i s t w h e r e e a c h e l e m e n t i s e i t h e r a l i s t o r n u m p y a r r a y m i n i b a t c h _ s i z e : t h e m a x i m u m n u m b e r o f i t e m s i n a m i n i b a t c h s h u f f l e : w h e t h e r t o r a n d o m i z e t h e o r d e r o f r e t u r n e d d a t a R e t u r n s : m i n i b a t c h e s : t h e r e t u r n v a l u e d e p e n d s o n d a t a : - I f d a t a i s a l i s t / a r r a y i t y i e l d s t h e n e x t m i n i b a t c h o f d a t a .

Page 24: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

- I f d a t a a l i s t o f l i s t s / a r r a y s i t r e t u r n s t h e n e x t m i n i b a t c h o f e a c h e l e m e n t i n t h e l i s t . T h i s c a n b e u s e d t o i t e r a t e t h r o u g h m u l t i p l e d a t a s o u r c e s ( e . g . , f e a t u r e s a n d l a b e l s ) a t t h e s a m e t i m e . " " " l i s t _ d a t a = t y p e ( d a t a ) i s l i s t a n d ( t y p e ( d a t a [ 0 ] ) i s l i s t o r t y p e ( d a t a [ 0 ] ) i s n p . n d a r r a y ) d a t a _ s i z e = l e n ( d a t a [ 0 ] ) i f l i s t _ d a t a e l s e l e n ( d a t a ) i n d i c e s = n p . a r a n g e ( d a t a _ s i z e ) i f s h u f f l e : n p . r a n d o m . s h u f f l e ( i n d i c e s ) f o r m i n i b a t c h _ s t a r t i n n p . a r a n g e ( 0 , d a t a _ s i z e , m i n i b a t c h _ s i z e ) : m i n i b a t c h _ i n d i c e s = i n d i c e s [ m i n i b a t c h _ s t a r t : m i n i b a t c h _ s t a r t + m i n i b a t c h _ s i z e ] y i e l d [ m i n i b a t c h ( d , m i n i b a t c h _ i n d i c e s ) f o r d i n d a t a ] i f l i s t _ d a t a \ e l s e m i n i b a t c h ( d a t a , m i n i b a t c h _ i n d i c e s ) d e f m i n i b a t c h ( d a t a , m i n i b a t c h _ i d x ) : i f t y p e ( d a t a ) i s n p . n d a r r a y : r e t u r n d a t a [ m i n i b a t c h _ i d x ] e l s e : r e s u l t = [ ] f o r i i n m i n i b a t c h _ i d x : r e s u l t . a p p e n d ( d a t a [ i ] ) r e t u r n r e s u l t t i c = t i m e . t i m e ( ) p a r a m s = t f . t r a i n a b l e _ v a r i a b l e s ( ) n u m _ p a r a m s = s u m ( m a p ( l a m b d a t : n p . p r o d ( t f . s h a p e ( t . v a l u e ( ) ) . e v a l ( ) ) , p a r a m s ) ) t o c = t i m e . t i m e ( ) l o g g i n g . i n f o ( " N u m b e r o f p a r a m s : % d ( r e t r e i v a l t o o k % f s e c s ) " % ( n u m _ p a r a m s , t o c - t i c ) ) q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d s , t r a i n A n s w e r s , c o n t e x t P a r a g r a p h s = d a t a s e t s e l f . c o n t e x t W o r d s = c o n t e x t W o r d s f o r e p o c h i n r a n g e ( s e l f . F L A G S . e p o c h s ) : p r i n t ( " E p o c h % d o u t o f % d " % ( e p o c h + 1 , s e l f . F L A G S . e p o c h s ) ) i = 1 p r i n t ( s e l f . F L A G S . b a t c h _ s i z e ) f o r q _ m b , c _ m b , q _ m _ m b , c _ m _ m b , l a b e l s _ s t a r t _ m i n i b a t c h , l a b e l s _ e n d _ m i n i b a t c h i n g e t _ m i n i b a t c h e s ( [ q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d ] , s e l f . F L A G S . b a t c h _ s i z e ) : i f ( l e n ( c o n t e x t ) - ( i - 1 ) * ( s e l f . F L A G S . b a t c h _ s i z e ) ) > = s e l f . F L A G S . b a t c h _ s i z e : i f i % 1 0 = = 0 : p r i n t ( ' M i n i b a t c h n u m b e r : ' , i ) i + = 1 i n p u t s _ m i n i b a t c h = ( q _ m b , c _ m b , q _ m _ m b , c _ m _ m b ) # p r i n t ( i n p u t s _ m i n i b a t c h ) s e l f . o p t i m i z e ( s e s s i o n , i n p u t s _ m i n i b a t c h , l a b e l s _ s t a r t _ m i n i b a t c h , l a b e l s _ e n d _ m i n i b a t c h ) i f i % 5 0 0 = = 0 : r e s u l t s _ p a t h = " / { : % Y % m % d _ % H % M % S } / " . f o r m a t ( d a t e t i m e . d a t e t i m e . n o w ( ) ) # r e s u l t s _ p a t h = " / n e w F i l e " # m o d e l _ p a t h = r e s u l t s _ p a t h + " m o d e l . w e i g h t s / "

Page 25: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

# g l o b a l _ t r a i n _ d i r = " / t m p / c s 2 2 4 n - s q u a d - t r a i n " m o d e l _ p a t h = t r a i n _ d i r + r e s u l t s _ p a t h p r i n t ( " t r a i n _ d i r " , t r a i n _ d i r ) i f n o t o s . p a t h . e x i s t s ( m o d e l _ p a t h ) : o s . m a k e d i r s ( m o d e l _ p a t h ) l o g g i n g . g e t L o g g e r ( ) . i n f o ( " N e w b e s t s c o r e ! S a v i n g m o d e l i n % s " , m o d e l _ p a t h ) # s e l f . s a v e r . s a v e ( s e s s i o n , m o d e l _ p a t h ) s e l f . s a v e r . s a v e ( s e s s i o n , m o d e l _ p a t h + " m o d e l . c k p t " ) r e s u l t s _ p a t h = " / { : % Y % m % d _ % H % M % S } / " . f o r m a t ( d a t e t i m e . d a t e t i m e . n o w ( ) ) # r e s u l t s _ p a t h = " / n e w F i l e " # m o d e l _ p a t h = r e s u l t s _ p a t h + " m o d e l . w e i g h t s / " m o d e l _ p a t h = t r a i n _ d i r + r e s u l t s _ p a t h i f n o t o s . p a t h . e x i s t s ( m o d e l _ p a t h ) : o s . m a k e d i r s ( m o d e l _ p a t h ) l o g g i n g . g e t L o g g e r ( ) . i n f o ( " N e w b e s t s c o r e ! S a v i n g m o d e l i n % s " , m o d e l _ p a t h ) # s e l f . s a v e r . s a v e ( s e s s i o n , m o d e l _ p a t h ) p r i n t ( " T H I S I S H E R E " , m o d e l _ p a t h + " / m o d e l . c k p t " ) s e l f . s a v e r . s a v e ( s e s s i o n , m o d e l _ p a t h + " m o d e l . c k p t " ) s e l f . s t a r t V a l i d a t e ( s e s s i o n , v a l _ d a t a s e t ) q a _ a n s w e r . p y

f rom __future__ import absolute_import

from __future__ import division

from __future__ import print_function

import io

import os

import json

import sys

import random

from os.path import join as pjoin

from tqdm import tqdm

import numpy as np

from six.moves import xrange

import tensorf low as tf

Page 26: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

from qa_model import Encoder, QASystem, Decoder

from preprocessing.squad_preprocess import data_from_json, maybe_download, squad_base_url , \

invert_map, tokenize, token_idx_map

import qa_data

import logging

logging.basicConfig( level=logging.INFO)

FLAGS = tf .app.f lags.FLAGS

tf .app.f lags.DEFINE_float(" learning_rate", 0.001, "Learning rate.")

t f .app.f lags.DEFINE_float("dropout", 0.15, "Fraction of units randomly dropped on non-recurrent connections.")

t f .app.f lags.DEFINE_integer("batch_size", 32, "Batch size to use during training.")

t f .app.f lags.DEFINE_integer("epochs", 0, "Number of epochs to train.")

t f .app.f lags.DEFINE_integer("state_size", 124, "Size of each model layer.")

t f .app.f lags.DEFINE_integer("embedding_size", 100, "Size of the pretrained vocabulary.")

t f .app.f lags.DEFINE_integer("output_size", 400, "The output size of your model.")

t f .app.f lags.DEFINE_integer("keep", 0, "How many checkpoints to keep, 0 indicates keep al l .")

#tf .app.f lags.DEFINE_string("train_dir" , "train/20170316_002002/model.weights", "Training directory (default : . / train) .")

# tf .app.f lags.DEFINE_str ing("train_dir" , "train", "Training directory (default : . / train) .")

#tf .app.f lags.DEFINE_string("train_dir" , "train/20170318_192824/model.weights", "Training directory (default : . / train) .")

#tf .app.f lags.DEFINE_string("train_dir" , "train /20170319_154527/model.weights", "Training directory (default : . / train) .")

t f .app.f lags.DEFINE_string("log_dir" , " log", "Path to store log and f lag f i les (default : . / log)")

t f .app.f lags.DEFINE_string("vocab_path", "data/squad/vocab.dat", "Path to vocab f i le (default : . /data/squad/vocab.dat)")

t f .app.f lags.DEFINE_string("embed_path", "" , "Path to the tr immed GLoVe embedding (default : . /data/squad/glove.tr immed.{embedding_size}.npz)")

t f .app.f lags.DEFINE_string("dev_path", "data/squad/dev-v1.1. json", "Path to the JSON dev set to evaluate against (default : . /data/squad/dev-v1.1. json)")

t f .app.f lags.DEFINE_float("max_gradient_norm", 20.0, "Cl ip gradients to this norm.")

t f .app.f lags.DEFINE_string("optimizer", "adam", "adam / sgd")

def init ial ize_model(session, model, train_dir) :

ckpt = tf .train.get_checkpoint_state(train_dir)

v2_path = ckpt.model_checkpoint_path + " . index" i f ckpt else ""

#print( 'checkpoint path' , ckpt.model_checkpoint_path)

#print( ' f i rst bool ' , t f .gf i le.Exists(v2_path))

Page 27: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

#print( 'second bool ' , t f .gf i le.Exists(ckpt.model_checkpoint_path))

i f ckpt and (tf .gf i le.Exists(ckpt.model_checkpoint_path) or tf .gf i le.Exists(v2_path)) :

logging. info("Reading model parameters from %s" % ckpt.model_checkpoint_path)

model.saver.restore(session, ckpt.model_checkpoint_path)

else:

logging. info("Created model with fresh parameters.")

session.run(tf .global_variables_init ial izer() )

logging. info( 'Num params: %d' % sum(v.get_shape() .num_elements() for v in tf .trainable_variables()) )

return model

def init ial ize_vocab(vocab_path):

i f t f .gf i le.Exists(vocab_path):

rev_vocab = [ ]

with tf .gf i le.GFile(vocab_path, mode="rb") as f :

rev_vocab.extend(f .readlines())

rev_vocab = [ l ine.str ip( ' \n ' ) for l ine in rev_vocab]

vocab = dict([ (x, y) for (y, x) in enumerate(rev_vocab)])

return vocab, rev_vocab

else:

raise ValueError("Vocabulary f i le %s not found.", vocab_path)

def read_dataset(dataset, t ier, vocab):

"""Reads the dataset, extracts context, question, answer,

and answer pointer in their own f i le. Returns the number

of questions and answers processed for the dataset"""

context_data = [ ]

query_data = [ ]

question_uuid_data = [ ]

for art icles_id in tqdm(range(len(dataset[ 'data'] ) ) , desc="Preprocessing {}" . format(t ier)) :

art icle_paragraphs = dataset[ 'data'] [art icles_id][ 'paragraphs']

for pid in range(len(art icle_paragraphs)) :

context = art icle_paragraphs[pid][ 'context ' ]

# The fol lowing replacements are suggested in the paper

# BidAF (Seo et al . , 2016)

context = context.replace(" ' '" , '" ' )

context = context.replace("` `" , '" ' )

Page 28: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

context_tokens = tokenize(context)

qas = art icle_paragraphs[pid][ 'qas']

for qid in range(len(qas)) :

question = qas[qid][ 'question']

question_tokens = tokenize(question)

question_uuid = qas[qid][ ' id ' ]

context_ids = [str(vocab.get(w, qa_data.UNK_ID)) for w in context_tokens]

qustion_ids = [str(vocab.get(w, qa_data.UNK_ID)) for w in question_tokens]

context_data.append( ' ' . join(context_ids))

query_data.append( ' ' . join(qustion_ids))

question_uuid_data.append(question_uuid)

return context_data, query_data, question_uuid_data

def prepare_dev(prefix, dev_f i lename, vocab):

# Don't check f i le size, since we could be using other datasets

dev_dataset = maybe_download(squad_base_url , dev_f i lename, prefix)

dev_data = data_from_json(os.path. join(prefix, dev_f i lename))

context_data, question_data, question_uuid_data = read_dataset(dev_data, 'dev' , vocab)

return context_data, question_data, question_uuid_data

def generate_answers(sess, model, dataset, rev_vocab):

"""

Loop over the dev or test dataset and generate answer.

Note: output format must be answers[uuid] = "real answer"

You must provide a str ing of words instead of just a l ist , or start and end index

In main() function we are dumping onto a JSON f i le

evaluate.py wil l take the output JSON along with the original JSON f i le

and output a F1 and EM

Page 29: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

You must implement this function in order to submit to Leaderboard.

:param sess: active TF session

:param model: a bui lt QASystem model

:param rev_vocab: this is a l ist of vocabulary that maps index to actual words

: return:

"""

question_data, context_data, question_masks, context_masks, question_uuid_data = dataset

predict_dataset = (question_data, context_data, question_masks, context_masks)

a_s, a_e = model.predict_answers(sess, predict_dataset)

answers = (context_data[i ] [s:e + 1] for i , (s, e) in enumerate(zip(a_s, a_e)))

answers = [" " . join(rev_vocab[x] for x in answer) for answer in answers]

answers = [answers[i ] i f i < len(answers) else "overf low" for i in range(len(question_data))]

answers_dict = { }

for a, uuid in zip(answers, question_uuid_data):

answers_dict[uuid] = a

return answers_dict

def get_normalized_train_dir(train_dir) :

"""

Adds symlink to {train_dir} from /tmp/cs224n-squad-train to canonicalize the

f i le paths saved in the checkpoint. This al lows the model to be reloaded even

i f the location of the checkpoint f i les has moved, al lowing usage with CodaLab.

This must be done on both train.py and qa_answer.py in order to work.

"""

global_train_dir = ' / tmp/cs224n-squad-train'

# global_train_dir = ' . / train/20170316_002002/model.weights'

i f os.path. lexists(global_train_dir) :

pr int( ' lexists' )

os.unl ink(global_train_dir)

i f not os.path.exists(train_dir) :

pr int( 'path does not exist ' )

os.makedirs(train_dir)

pr int( 'path: ' , os.path.abspath(train_dir) )

pr int( 'global train dir : ' , global_train_dir)

os.symlink(os.path.abspath(train_dir) , global_train_dir)

return global_train_dir

Page 30: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

def main(_) :

def padded_vector(vector, max_length):

return [vector[ i ] i f i < len(vector) else vocab[ '<pad>'] for i in range(max_length)]

vocab, rev_vocab = init ial ize_vocab(FLAGS.vocab_path)

embed_path = FLAGS.embed_path or pjoin("data", "squad", "glove.tr immed.{} .npz".format(FLAGS.embedding_size))

i f not os.path.exists(FLAGS.log_dir) :

os.makedirs(FLAGS.log_dir)

f i le_handler = logging.Fi leHandler(pjoin(FLAGS.log_dir , " log.txt"))

logging.getLogger() .addHandler(f i le_handler)

pr int(vars(FLAGS))

with open(os.path. join(FLAGS.log_dir , "f lags. json") , 'w') as fout:

json.dump(FLAGS.__f lags, fout)

# ========= Load Dataset =========

# You can change this code to load dataset in your own way

max_question_length = 60

max_context_length = FLAGS.output_size

dev_dirname = os.path.dirname(os.path.abspath(FLAGS.dev_path))

dev_f i lename = os.path.basename(FLAGS.dev_path)

context_data, question_data, question_uuid_data = prepare_dev(dev_dirname, dev_f i lename, vocab)

context_data = [ [ int(n) for n in c.spl it ( ) ] for c in context_data]

question_data = [ [ int(n) for n in q.spl it ( ) ] for q in question_data]

context_masks = [ len(c) for c in context_data]

question_masks = [ len(q) for q in question_data]

context_data = [padded_vector(c, max_context_length) for c in context_data]

question_data = [padded_vector(q, max_question_length) for q in question_data]

#print( 'question uuids' , question_uuid_data)

#print( 'context data' , context_data)

#print( 'dirname, f i lename: ' , dev_dirname, dev_f i lename)

dataset = (question_data, context_data, question_masks, context_masks, question_uuid_data)

Page 31: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

# ========= Model-specif ic =========

# You must change the fol lowing code to adjust to your model

question_encoder = Encoder(size=FLAGS.state_size)

context_encoder = Encoder(size=FLAGS.state_size)

decoder = Decoder(output_size=FLAGS.output_size)

qa = QASystem(question_encoder, context_encoder, decoder, embed_path, question_masks, context_masks)

with tf .Session() as sess:

train_dir = get_normalized_train_dir( ' train' )

#train_dir = ' . / train/20170321_110723'

init ial ize_model(sess, qa, train_dir)

answers = generate_answers(sess, qa, dataset, rev_vocab)

# write to json f i le to root dir

with io.open( 'dev-prediction. json' , 'w' , encoding='utf -8 ' ) as f :

f .write(unicode(json.dumps(answers, ensure_ascii=False)))

i f __name__ == "__main__":

t f .app.run()

Page 32: Reading Comprehension on SQuAD - Stanford … · Reading Comprehension on SQuAD Zachary Taylor, Brexton Pham, Meghana Rao Stanford University, Department of Computer Science CS224n

Reading Comprehension on SQuADStanford University (CS224N)

Meghana Rao, Brexton Pham, Zach TaylorMarch 2017

Background

What are we looking at? Question AnsweringPrevious approaches

- Multiperspective Context Matching (MPCM) Model

- Dynamic Coattention Network (DCN)

- Match-LSTM with Answer Pointer

DatasetsFor our dataset, we used the Stanford

Question Answering Dataset (SQuAD),

which contains context paragraphs that

were extracted from a set of articles from

Wikipedia. Humans then generated

questions using that paragraph as a context,

and selected a span from the same

paragraph as the target answer.SQuAD is comprised of 100K question-answer pairs, along with a context paragraph. The answers are of variable lengths and require various forms of multi-sentence reasoning. The span of answers are accessible in a reference document.

Methods/Algorithms/Models1: A simple baseline in which we fed both the

question and the context through an LSTM, and

then for each hidden state in our context LSTM,

we ran a simple dot product mechanism against

the final question state to compute the probability

that any index was the start or end index.

2: A model utilizing bidrectional LSTM’s along

with a question-aware context encoding. This

model also had a simple attention mechanism, and

we decoded the representations linearly.

3: A dynamic coattention model with an LSTM

classifier as the decoder

Experimental Evaluation and Findings

Future DirectionsFor possible future directions, an obvious area of improvement would be changing the model to be able to predicted longer-than-one-word answers. This would involve careful experimentation and modification of the one model that we managed to get some meaningful performance out of. We could then graduate to implementing more advanced attention mechanisms. Potentially, we could learn what about our more advanced models didn’t manage to translate to the evaluation set, and fix that element. In terms of other advanced models that we are interested in , answer pointers seem like a potentially valuable next step to take if we got the other elements working.

Problem StatementIn the SQuAD task, the goal is to predict an

answer span tuple {as, ae} given a question

of length n, q = {q1, q2, ..., qn}, and a

supporting context paragraph p = {p1, p2, ...,

pm} of length m. Thus, the model learns a

function that, given a pair of sequences (q,

p) returns a sequence of two scalar indices

{as, ae} indicating the start position and end

position of the answer in paragraph p,

respectively.

Scenarios to Consider1. Overfitting and underfitting

2. Accounting for unknown words in

vocab

3. Amount of parameters and

hyperparameter tuning

ConclusionAfter completing this project, we have gained

perspective into the challenges involved with

implementing a successful neural network for the

SQuAD reading comprehension task. We

ultimately demonstrated a functioning baseline

model after investing into implementing more

complex models that had long training durations

and overfit the training data. Initial successes on

the training data with more complex models did

not perform well on the validation set and thus

catalyzed a series of simplifications and

debugging procedures to scale back the model to

a slender and iterable scale. In retrospect, we wish

to have implemented the barebones model first

before attempting to implement more ambitious

models. Nevertheless, we at least managed to get

some positive performance eventually, and we

learned an incredible amount about implementing

neural network models for NLP. Also, we thought

that even 13% of correct answers was a worthy

accomplishment on such a difficult machine

comprehension task.

Predicted Answer

Correct Answer

hunting no natural predators

claims religious bigotry

March Bonus March

1639 1639

Catholic true apostolic succession

Toronto Toronto

Model F1 Score

EM Score

Separate Encodings

24.25 14.01

FINAL TEST SET RESULTS (on codalab):

Visualization

Question: Where is the Center for New Religions located ?

Context Paragraph: Robert S. Wood has argued that the

United States is a model for the world in terms of how a

separation of church and state—no state-run or

state-established church—is good for both the church and the

state , allowing a variety of religions to flourish . Speaking at

the Toronto-based Center for New Religions , Wood said that

the freedom of conscience and assembly allowed under such

a system has led to a " remarkable religiosity " in the United

States that is n't present in other industrialized nations .

Wood believes that the U.S. operates on " a sort of civic

religion , " which includes a generally-shared belief in a

creator who " expects better of us . " Beyond that , individuals

are free to decide how they want to believe and fill in their

own creeds and express their conscience . He calls this

approach the " genius of religious sentiment in the United

States .

"Predicted Answer, Correct Answer: Toronto, Toronto