sequence models - umd

15
Sequence Models Machine Learning: Jordan Boyd-Graber University of Maryland LSTM VARIANTS Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 1/5

Upload: others

Post on 11-May-2022

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sequence Models - UMD

Sequence Models

Machine Learning: Jordan Boyd-GraberUniversity of MarylandLSTM VARIANTS

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 1 / 5

Page 2: Sequence Models - UMD

GRU simplifies slightly

No explicit memory

Only one gate

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 2 / 5

Page 3: Sequence Models - UMD

GRU simplifies slightly

Slightly fewer parameters

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 2 / 5

Page 4: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 5: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 6: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 7: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 8: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 9: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 10: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 11: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 12: Sequence Models - UMD

What’s most important part of LSTM

Greff et al. explore

� No Input Gate (NIG)

� No Forget Gate (NFG)

� No Output Gate (NOG)

� No Input Activation Function(NIAF)

� No Output Activation Function(NOAF)

� No Peepholes (NP)

� Coupled Input and Forget Gate(CIFG) : GRU, ft = 1− it

� Full Gate Recurrence (FGR):Original LSTM paper

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 13: Sequence Models - UMD

What’s most important part of LSTM

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 3 / 5

Page 14: Sequence Models - UMD

Bi-directional LSTMs

Simple extension, often slightly improve performance (but don’t alwaysmake sense for task)

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 4 / 5

Page 15: Sequence Models - UMD

Comparing architechtures

� GRUs seem competitive

� LSTM seems to be good tradeoff

� Bi-directional often offers slight improvement

Machine Learning: Jordan Boyd-Graber | UMD Sequence Models | 5 / 5