
Page 1: Text Generation by Learning from Demonstrations

Text Generation by Learning from Demonstrations

Richard Yuanzhe Pang yzpang.me

joint work with He He

6/2/21

Page 2: Text Generation by Learning from Demonstrations


summarization (extractive, abstractive)

machine translation

question generation

creative writing assistance

language modeling (unconditional generation)

chitchat dialogue

(certain) open-ended dialogue

textual style transfer

story generation

table/data-to-text generation

Text generation tasks. The slide splits these tasks into those where "one good output" is enough and those that need diverse outputs; here, diversity means the ability to generate a number of different correct generations given one context.

Page 3: Text Generation by Learning from Demonstrations

Today: supervised conditional text generation


The College of the University of Chicago grants Bachelor of Arts and Bachelor of Science degrees in 50 academic majors and 28 minors

How many academic minors does the university grant in total ?

Actor Vince Vaughn and pop star Lady Gaga mustered up the courage to dive into freezing cold water for the Special Olympics charity the day after cities across the country broke cold records for last month. The Chicago-native Wedding Crashers star was the celebrity guest of honor at his hometown's annual Polar Plunge, in which brave swimmers raise money for athletes with special needs by plunging into Lake Michigan. Vaughn, wearing a Blackhawks hockey jersey and jeans, led the way into a patch of frigid, slushy 33 degree water near Lincoln Park that had been cleared of snow, which has accumulated in the city and much of the US. Scroll down for videos . Icy dip: Actor Vince Vaughn was the Special Olympic Polar Plunge celebrity guest on Sunday, when the water was right around the freezing point . Hometown hero: Vaughn, a Chicago native, wore a jersey from the city's Blackhawks hockey team and jeans to take the icy dip and warmed up with a towel after the trying ordeal . Taking the plunge: Lady Gaga joined her new fiance, Chicago Fire actor Taylor Kinney, at the event with his co-stars on the show . Poker face: Celebrities such as Vince Vaughn and Lady Gaga found it difficult not to grimace in the freezing patch of Lake Michigan . The star of upcoming movie Unfinished Business first went in up to his knees before lowering himself into the water backwards. Gaga attended the event with shirtless fiance Taylor Kinney, who was wearing a baseball cap and shorts, and his costars from the show Chicago Fire. The singer rode piggyback-style with Kinney into the water before they fully immersed themselves into the lake. The couple then played with each other in the frozen surf and retreated to the warmth of the shore. 'Taylor gave me his hat I thought my wig was gonna freeze into and become one with the lake,' said the songstress, who will take different sort of plunge with her beau after becoming engaged last month.. Lake Michigan was estimated to be right at freezing when the event began at 10.30am Central Time. Staying warm with love: Lady Gaga attended the charity event with her fiance after getting engaged to the television actor last month . Before and after: Gaga and Kinney looked comfortable while posing for photos (left) before their plunge left them in varied states of bedraggled . California love: A shirtless Kinney wore a baseball cap reminiscent of the warm-weather West Coast when he carried his future wife into the waters of Lake Michigan . Splashing in the snow: Gaga grits her teeth as she finds the energy to play in the lake despite a winter of record-breaking cold weather . Paparazzi: Gaga, whose real name is Stefani Germanotta, lost the glamour she often displays on stage and the red carpet when emerging from Lake Michigan . Organizers say that 5,000 people were expected to attend the plunge, and that they had already made more than $1million during the morning, according to ABC7. The plunge came at the end of a February that had seen cold records shattered across much of the US, which has seen snow in almost every state this winter. Chicago, which has seen weeks of freezing temperatures in the past month, tied a 140-year-old record for its coldest February with an average temperature of 14.6. Temperatures at O'Hare Airport hit minus 10 degrees Saturday morning, according to the Chicago Tribune. Mission accomplished: Special Olympics Chicago's annual Polar Plunge event regularly raises more than $1million for special needs athletes . 
The Fame: The event draws thousands of participants and spectators to the water each year along with celebrities such as Gaga. Above, the singer poses with a fan on the beach . Fire and ice: The Chicago Fire Department had to clear snow and …

NQG (using SQuAD): the passage above is the input and the question is the output.

CNN/DailyMail (extractive summarization): the article above is the input; the output summary is:

Chicago-native Vaughn was celebrity guest at his hometown's annual Polar Plunge for Special Olympics . Newly-engaged Lady Gaga attended with Chicago Fire fiance Taylor Kinney and his television co-stars . City tied 140-year record for coldest February with 14.6 degree average and more winter weather expected across US . Winter Storm Sparta brings more snow and ice to East Coast after causing tragic weather-related deaths in Midwest . Boston can possibly eclipse its record for most snow in one winter if it receives 5.7 more inches .

The country's consumer watchdog has taken Apple to court for false advertising because the tablet computer does not work on Australia's 4G network. Apple's lawyers said they were willing to publish a clarification. However the company does not accept that it misled customers. The Australian Competition and Consumer Commission (ACCC) said on Tuesday: "Apple's recent promotion of the new 'iPad with wi-fi + 4G' is misleading because it represents to Australian consumers that the product can, with a sim card, connect to a 4G mobile data network in Australia, when this is not the case." The watchdog then lodged a complaint at the Federal Court in Melbourne. At a preliminary hearing, Apple lawyer Paul Anastassiou said Apple had never claimed the device would work fully on the current 4G network operated by Telstra. Apple says the new iPad works on what is globally accepted to be a 4G network. The matter will go to a full trial on 2 May. The Apple iPad's third version went on sale earlier this month, with Australia the first country where it was available. Shoppers lined up by the hundreds at Apple stores on opening day and the company said it had been its strongest iPad launch to date. The ACCC said it was seeking an injunction on sales as well as a financial penalty against Apple, corrective advertising and refunds to consumers. On its website, Apple does state that 4G LTE is only supported on selected networks in the US and Canada.

XSum (abstractive summarization): the article above is the input; the output summary is:

US technology firm Apple has offered to refund Australian customers who felt misled about the 4G capabilities of the new iPad.

IWSLT14 De-En (machine translation):
Input: Im amerikanischen mittelwesten luden bauern getreide auf kähne und sandten es den fluss hoch auf den markt nach chicago .
Output: In the American midwest, farmers used to load grain onto barges and send it upriver to the chicago market.

Page 4: Text Generation by Learning from Demonstrations

Training (usually, teacher forcing + MLE)

[Figure: decoder unrolled from ⟨bos⟩; the gold tokens y_0 (gold), y_1 (gold), ... are fed in as inputs, and the model produces an output distribution at each step.]

The most widespread approach for supervised conditional text generation:

L(θ) = -E_{y∼p_human} [ Σ_{t=0}^{T} log p_θ(y_t | y_{0:t-1}, x) ]
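To make the objective concrete, here is a minimal PyTorch-style sketch of the teacher-forcing MLE loss (my own illustration, not from the talk; the tensor shapes and the random logits standing in for a real decoder are assumptions):

```python
import torch
import torch.nn.functional as F

def mle_loss(logits, gold_tokens):
    """Teacher-forcing MLE: negative log-likelihood of the gold continuation.
    logits: (T, V) decoder outputs when the gold prefix y_{0:t-1} (and input x) is fed in.
    gold_tokens: (T,) gold tokens y_t drawn from p_human."""
    # cross_entropy gives -log p_theta(y_t | y_{0:t-1}, x) per step; sum over t.
    return F.cross_entropy(logits, gold_tokens, reduction="sum")

# Tiny runnable example; random logits stand in for a real decoder.
T, V = 6, 100
logits = torch.randn(T, V, requires_grad=True)
gold = torch.randint(V, (T,))
mle_loss(logits, gold).backward()
```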

Page 5: Text Generation by Learning from Demonstrations

Usually, autoregressive inference (contrasted with training, which usually uses teacher forcing + MLE)

[Figure: the training decoder is fed the gold tokens y_0 (gold), y_1 (gold), ...; the inference decoder starts from ⟨bos⟩ and feeds its own predictions y_0, y_1, ... back in as inputs.]

The most widespread approach for supervised conditional text generation: the MLE objective above.

Page 6: Text Generation by Learning from Demonstrations

[Figure (slide build continued): the inference decoder keeps feeding back its own predictions y_0, y_1, y_2, ... until ⟨eos⟩, while training uses the gold tokens via teacher forcing.]

The most widespread approach for supervised conditional text generation: the MLE objective above.

Page 7: Text Generation by Learning from Demonstrations

Motivation 1: mismatched history (gold vs. model-generated)

Repetition (beam search, b=32, from GPT-2; Holtzman et al., 2020):
"The study, published in the Proceedings of the National Academy of Sciences of the United States of America (PNAS), was conducted by researchers from the Universidad Nacional Autónoma de México (UNAM) and the Universidad Nacional Autónoma de México (UNAM/Universidad Nacional Autónoma de México/Universidad Nacional Autónoma de México/Universidad Nacional Autónoma de México/Universidad Nacional Autónoma de ..."

Hallucination (machine translation; Wang and Sennrich, 2020):
Source: So höre nicht auf die Ableugner.
Reference: So hearken not to those who deny.
MLE: Do not drive or use machines.

Motivation 2: mismatched learning/evaluation objectives

• High recall: pθ must cover all outputs from phuman

L(θ) = -E_{y∼p_human} [ Σ_{t=0}^{T} log p_θ(y_t | y_{0:t-1}, x) ]

• High precision: all outputs from pθ must be scored high under phuman

E_{y∼p_θ} [ Σ_{t=0}^{T} log p_human(y_t | y_{0:t-1}, x) ]
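As a toy numerical illustration of the two directions (my own example, not from the talk), treat an output as a single token over a four-word vocabulary:

```python
import numpy as np

p_human = np.array([0.4, 0.3, 0.2, 0.1])      # what humans produce
p_theta = np.array([0.85, 0.05, 0.05, 0.05])  # a peaked, "high-precision" model

# Recall direction (MLE): expectation under p_human of log p_theta.
# Hurt when the model assigns low probability to outputs humans do produce.
recall_term = float(np.sum(p_human * np.log(p_theta)))

# Precision direction (eval): expectation under p_theta of log p_human.
# Hurt when the model puts mass on outputs humans would rarely produce.
precision_term = float(np.sum(p_theta * np.log(p_human)))

print(f"E_p_human[log p_theta] = {recall_term:.3f}")
print(f"E_p_theta[log p_human] = {precision_term:.3f}")
```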


Page 8: Text Generation by Learning from Demonstrations

Motivation
→ Background: RL formulation of text gen
Offline objective: learning algorithm GOLD

Page 9: Text Generation by Learning from Demonstrations

Background: RL in text generation

MLE:  L(θ) = -E_{y∼p_human} [ Σ_{t=0}^{T} log p_θ(y_t | y_{0:t-1}, x) ]

Eval: E_{y∼p_θ} [ Σ_{t=0}^{T} log p_human(y_t | y_{0:t-1}, x) ]

RL:   J(θ) = E_{τ∼π_θ} [ Σ_{t=0}^{T} R(a_t, s_t) ]
      (π_θ: policy; a_t: action; s_t: state; R: reward)

Page 10: Text Generation by Learning from Demonstrations

Background: RL in text generation

• Prior approach: directly optimize a sequence-level metric like BLEU, ROUGE, etc., using policy gradient

• Pros: no exposure bias; may discover high-quality outputs outside the references
• Cons: degenerate solutions and difficult optimization (gradient estimated by samples from the policy has high variance)
• Current popular solution: stay close to MLE (by interpolating, by regularizing, etc.), but this defeats the purpose of using RL!
• Marginal improvements compared to MLE (Wu et al., 2018; Choshen et al., 2020)
• Problem: policy/generator interacting with the world

Page 11: Text Generation by Learning from Demonstrations

Background: RL in text generation

• Interaction/exploration?
  • Argument in RL: allows us to learn about the environment dynamics
    • But we already know the dynamics
  • Argument in RL: explore novel actions that may lead to higher reward
    • But we don't have a good reward
• Summary
  • MLE: mismatched losses, easy to optimize
  • RL: matched losses, hard to optimize

Page 12: Text Generation by Learning from Demonstrations

Motivation
Background: RL formulation of text gen
→ Offline objective: learning algorithm GOLD

Page 13: Text Generation by Learning from Demonstrations

Online + on-policy policy gradient

• Step 1: sample outputs from the model

• Step 2: get seq-level rewards like BLEU

• Step 3: use policy gradient to optimize

Online vs. offline policy gradient


∇_θ J(θ) = E_{τ∼π_θ} [ Σ_t ∇_θ log π_θ(a_t | s_t) Q(s_t, a_t) ]
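A minimal sketch of this online, on-policy update (my own illustration; the random logits stand in for a real decoder, and the scalar reward would come from a sequence-level metric such as BLEU):

```python
import torch

def online_pg_loss(logits, sampled_tokens, seq_reward):
    """REINFORCE-style loss for one sequence sampled from the current policy.
    logits: (T, V) decoder outputs along the sampled sequence.
    sampled_tokens: (T,) token ids drawn from pi_theta.
    seq_reward: scalar sequence-level reward (e.g. BLEU of the sample)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, sampled_tokens.unsqueeze(1)).squeeze(1)  # log pi_theta(a_t | s_t)
    # Credit the same sequence-level reward to every step (no baseline, for brevity).
    return -(seq_reward * chosen).sum()

T, V = 5, 50
logits = torch.randn(T, V, requires_grad=True)
sample = torch.distributions.Categorical(logits=logits).sample()  # Step 1: sample from the model
loss = online_pg_loss(logits, sample, seq_reward=0.37)            # Steps 2-3: reward + policy gradient
loss.backward()
```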

Page 14: Text Generation by Learning from Demonstrations

Online + on-policy policy gradient

• Step 1: sample outputs from the model

• Step 2: get seq-level rewards like BLEU

• Step 3: use policy gradient to optimize

Offline + off-policy policy gradient

• Step 1: sample from demonstrations (i.e., gold supervised data)

• Step 2: get token-level rewards based on pMLE

• Step 3: use policy gradient to optimize

π_b = p_human: trajectories are drawn from the empirical distribution of the demonstrations.
w_t ≈ π_θ(a_t | s_t): the importance weight is approximated by the model's own confidence.
Q: a p_MLE-based reward (see paper).

Intuition
• Upweight more "confident" examples; focus more on successful data (closer to test-time distribution)
• Intuitively reduce exposure bias

Offline + off-policy policy gradient (no interaction with the environment):

∇_θ J(θ) = E_{τ∼π_b} [ Σ_t w_t ∇_θ log π_θ(a_t | s_t) Q(s_t, a_t) ],   where   Q(s_t, a_t) = Σ_{t'=t}^{T} γ^{t'-t} r_{t'}

Compare with the online + on-policy gradient:

∇_θ J(θ) = E_{τ∼π_θ} [ Σ_t ∇_θ log π_θ(a_t | s_t) Q(s_t, a_t) ]
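A minimal sketch of this offline, off-policy update as I read the slide (my own illustration: gradients flow through the teacher-forced logits, the importance weight is the detached per-token model confidence, and Q would come from the p_MLE-based rewards on the next slides):

```python
import torch

def offline_pg_loss(logits, demo_tokens, Q):
    """Off-policy policy-gradient loss on one demonstration (gold) sequence.
    logits: (T, V) decoder outputs with the gold prefix fed in (teacher forcing).
    demo_tokens: (T,) demonstration tokens, i.e. the actions of pi_b = p_human.
    Q: (T,) per-step return, e.g. a p_MLE-based suffix reward."""
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, demo_tokens.unsqueeze(1)).squeeze(1)  # log pi_theta(a_t | s_t)
    w = chosen.detach().exp()  # w_t ~= pi_theta(a_t | s_t), treated as a constant weight
    return -(w * Q * chosen).sum()

T, V = 5, 50
logits = torch.randn(T, V, requires_grad=True)
demo = torch.randint(V, (T,))   # Step 1: sample from the demonstrations
Q = torch.rand(T)               # Step 2: token-level rewards based on p_MLE
offline_pg_loss(logits, demo, Q).backward()  # Step 3: policy gradient
```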

Page 15: Text Generation by Learning from Demonstrations

Reward function


Example sequence                               Dirac-delta Q   Ideal Q
The horse was in the barn sleeping             1               larger
The horse raced past the barn looked at me     1               smaller

Page 16: Text Generation by Learning from Demonstrations

Reward function

• Finding a good r is difficult; but now we only focus on demonstrations (gold data)

• (1) Use Dirac-delta function: Q is 1 for all training data, 0 for other data
• (2) Use estimated p_human: find p that minimizes KL(π_b || p)
  • The p is p_MLE!

(Same Dirac-delta Q vs. ideal Q examples as on the previous slide.)


Page 17: Text Generation by Learning from Demonstrations

Reward function

• (2) Use estimated p_human: find p that minimizes KL(π_b || p)
  • The p is p_MLE! But… I just said that p_MLE is not a good scoring function in general. However, it's a good scoring function at scoring demonstrations.
• Two choices:
  • Product of p_human: a sequence is good if all words are good
  • Sum of p_human: a sequence is good if most words are good

Product (GOLD-p):  Q(s_t, a_t) = Σ_{t'=t}^{T} log p_human(a_{t'} | s_{t'})

Sum (GOLD-s):      Q(s_t, a_t) = Σ_{t'=t}^{T} p_human(a_{t'} | s_{t'})
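A small sketch of the two choices (my own illustration), assuming the per-token probabilities of a demonstration under the fixed MLE model are already available:

```python
import torch

def suffix_return(per_step_reward, gamma=1.0):
    """Q(s_t, a_t) = sum over t' >= t of gamma^(t'-t) * r_{t'}, via a reverse scan."""
    Q = torch.zeros_like(per_step_reward)
    running = torch.tensor(0.0)
    for t in reversed(range(per_step_reward.shape[0])):
        running = per_step_reward[t] + gamma * running
        Q[t] = running
    return Q

# Per-token probabilities p_MLE(a_t | s_t) of one demonstration under the fixed MLE model.
p_mle = torch.tensor([0.60, 0.30, 0.80, 0.50])

Q_product = suffix_return(torch.log(p_mle))  # "product" choice (GOLD-p): sum of logs
Q_sum     = suffix_return(p_mle)             # "sum" choice (GOLD-s): sum of probabilities
```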


Page 18: Text Generation by Learning from Demonstrations

GOLD: generation by offline+off-policy learning from demonstrations

Intuition

• Upweight more "confident" examples; focus more on successful data (closer to test-time distribution)

• Intuitively reduce exposure bias

Page 19: Text Generation by Learning from Demonstrations

(Same input/output examples as on Page 3: NQG, CNN/DailyMail, XSum, IWSLT14 De-En.)

Page 20: Text Generation by Learning from Demonstrations

Our hypotheses about GOLD

1. GOLD improves generation quality

• Automatic results

• Human evals: pairwise comparison of GOLD-s vs. MLE generations

2. GOLD improves precision at the cost of recall


                          NQG (BART)   CNN/DM (BART)   XSum (BART)   IWSLT14 De-En (Transformer)
                          (BLEU)       (R-2)           (R-2)         (BLEU)
MLE                       20.68        21.28           22.08         34.64
GOLD-p (product as Q)     21.42        22.01           22.26         35.33
GOLD-s (sum as Q)         21.98        22.09           22.58         35.45

High perplexity != low quality

[Bar chart: perplexity (scale 0 to 15) of MLE vs. GOLD on NQG (BART) and CNN/DM (BART).]

Page 21: Text Generation by Learning from Demonstrations

Our hypotheses about GOLD

1. GOLD improves generation quality

• Automatic results

• Human evals: pairwise comparison of GOLD-s vs. MLE generations

2. GOLD improves precision at the cost of recall
  • High precision: larger BLEU/ROUGE, better human evals
  • Low recall: very large perplexity w.r.t. gold standards

3. GOLD alleviates exposure bias

(Same results table as on Page 20.)

Page 22: Text Generation by Learning from Demonstrations

Takeaways

1. MLE encourages high recall

2. GOLD (generation by off-policy and offline learning from demonstrations) is easy to implement and optimize
  • Essentially weighted MLE

3. GOLD encourages high-precision generation
  • Instead of distribution matching
