![Page 1: CS378: Natural Language Processing Lecture 1: Introduction · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience](https://reader033.vdocuments.mx/reader033/viewer/2022052614/6057afefbfe635320d55bda1/html5/thumbnails/1.jpg)
CS378: Natural Language Processing
Lecture 1: Introduction
Greg Durrett
intro video
actual course
Administrivia
‣ Course website (including syllabus): http://www.cs.utexas.edu/~gdurrett/courses/fa2020/cs378.shtml
‣ Piazza: link on the course website
‣ My office hours: see course website
‣ TA: Tanya Goyal; Proctor: Shivang Singh. See website for OHs
‣ Lecture: Tuesdays and Thursdays 9:30am - 10:45am
‣ All office hours start next week, but I will stay around after this class if you have questions
Course Requirements
‣ CS 429
‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience in Python
‣ Helpful: Exposure to AI and machine learning (e.g., CS 342/343/363)
Enrollment
‣ Assignment 0 is out now (optional):
‣ If this seems like it’ll be challenging for you, come and talk to me (this is smaller-scale than the other assignments, which are smaller-scale than the final project)
‣ If you are past 25 on the waitlist, you have a low chance of getting into the class, but we have to see how it progresses
Format and Accessibility
‣ Required equipment: device to make Zoom calls with, some way to do homework
‣ Lab machines available via SSH
‣ A GPU is not required to complete the assignments! Having a GPU or GCP credits could be helpful if you want to pursue an independent project
‣ Lectures will build in time for discussion, in-class exercises, and questions. Additional material is available as videos to watch either before or after lectures
‣ We’ll do plenty of discussion groups in class. Piazza is also available to find teammates
What’s the goal of NLP?
‣ Be able to solve problems that require deep understanding of text
‣ Example: dialogue systems
“Siri, what’s your favorite kind of movie?”
“I like superhero movies!”
“What’s come out recently?”
“The Avengers”
Machine Translation
中共中央政治局7月30日召开会议,会议分析研究当前经济形势,部署下半年经济工作。
Phrase gloss: The Political Bureau of the CPC Central Committee / July 30 / hold a meeting
Translate: The Political Bureau of the CPC Central Committee held a meeting on July 30 to analyze and study the current economic situation and plan economic work in the second half of the year.
People’s Daily, August 10, 2020
Question Answering
Question: When was Abraham Lincoln born?
map to Birthday field

| Name | Birthday |
| --- | --- |
| Lincoln, Abraham | 2/12/1809 |
| Washington, George | 2/22/1732 |
| Adams, John | 10/30/1735 |

Answer: February 12, 1809
Question: How many visitor centers are there in Rocky Mountain National Park?
Passage: The park has a total of five visitor centers
Answer: five
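The first example above (map the question to the Birthday field, then read off the answer) can be sketched in a few lines of Python. This is only a toy illustration: the keyword matching and the function names `find_person` and `answer_birthday` are assumptions made for this sketch, not a method from the lecture. The table rows are copied from the slide.

```python
import calendar

# Table rows copied from the slide: "Last, First" -> birthday as M/D/YYYY.
BIRTHDAYS = {
    "Lincoln, Abraham": "2/12/1809",
    "Washington, George": "2/22/1732",
    "Adams, John": "10/30/1735",
}

def find_person(question):
    """Return the table key whose first and last name both occur in the question."""
    q = question.lower()
    for key in BIRTHDAYS:
        last, first = [part.strip().lower() for part in key.split(",")]
        if last in q and first in q:
            return key
    return None

def answer_birthday(question):
    """Toy pipeline: spot a 'born' question, find the person, read the Birthday field."""
    if "born" not in question.lower():
        return None  # this sketch only handles birthday questions
    key = find_person(question)
    if key is None:
        return None
    month, day, year = (int(x) for x in BIRTHDAYS[key].split("/"))
    return f"{calendar.month_name[month]} {day}, {year}"

print(answer_birthday("When was Abraham Lincoln born?"))
```

Real question answering systems have to cope with paraphrases ("What is Lincoln's date of birth?") that simple keyword matching like this would miss.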
NLP Analysis Pipeline
Text → Text Analysis → Annotations → Applications
Annotations: syntactic parses, coreference resolution, entity disambiguation, discourse analysis
Applications: summarize, extract information, answer questions, identify sentiment, translate
‣ NLP is about building these pieces! (largely using statistical approaches)
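How do we represent language?
Text → Labels, Sequences/tags, Trees
Labels:
the movie was good → +
Beyoncé had one of the best videos of all time → subjective
Sequences/tags:
Tom Cruise stars in the new Mission Impossible film → PERSON (Tom Cruise), WORK_OF_ART (Mission Impossible)
Trees:
I eat cake with icing → parse tree (node labels shown: S, NP, VP, PP, NP, VBZ, NN)
flights to Miami → λx. flight(x) ∧ dest(x) = Miami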
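As a rough illustration of how these representations might look as data structures, here is a minimal Python sketch. The examples are the ones on the slide, but the encodings are assumptions made for this sketch: token-index spans for tags, nested tuples for trees, and a predicate over hypothetical records for the logical form. The exact tree bracketing is also assumed, since the slide only shows the node labels.

```python
# Labels: one category for the whole text.
labeled = [
    ("the movie was good", "+"),
    ("Beyoncé had one of the best videos of all time", "subjective"),
]

# Sequences/tags: labeled token spans stored as (start, end, tag), end exclusive.
tokens = "Tom Cruise stars in the new Mission Impossible film".split()
spans = [(0, 2, "PERSON"), (6, 8, "WORK_OF_ART")]

def span_text(tokens, span):
    """Return the words covered by one (start, end, tag) span."""
    start, end, _tag = span
    return " ".join(tokens[start:end])

# Trees: nested tuples (label, child, child, ...); leaves are plain words.
# Assumed bracketing in which "with icing" modifies "cake".
tree = ("S",
        ("NP", "I"),
        ("VP", ("VBZ", "eat"),
               ("NP", ("NN", "cake"), ("PP", "with", "icing"))))

def leaves(node):
    """Collect the words of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the label
        words.extend(leaves(child))
    return words

# Logical form: λx. flight(x) ∧ dest(x) = Miami, written as a predicate.
# The records below are hypothetical and exist only to exercise the predicate.
records = [
    {"id": "F1", "kind": "flight", "dest": "Miami"},
    {"id": "F2", "kind": "flight", "dest": "Austin"},
    {"id": "T1", "kind": "train", "dest": "Miami"},
]

def flights_to_miami(x):
    return x["kind"] == "flight" and x["dest"] == "Miami"

matches = [r["id"] for r in records if flights_to_miami(r)]

print([span_text(tokens, s) for s in spans])
print(" ".join(leaves(tree)))
print(matches)
```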
How do we use these representations?
Text → Text Analysis → Labels, Sequences, Trees, … → Applications
Applications: learn tree-to-tree machine translation models, …
end-to-end models
‣ Main question: What representations do we need for language? What do we want to know about it? What ambiguities do we need to resolve?
Why is language hard? (and how can we handle that?)
Language is Ambiguous!
‣ Hector Levesque (2011): “Winograd schema challenge” (named after Terry Winograd, the creator of SHRDLU)
The city council refused the demonstrators a permit because they ______ violence
The city council refused the demonstrators a permit because they advocated violence
The city council refused the demonstrators a permit because they feared violence
‣ Referential ambiguity
‣ >5 datasets in the last two years examining this problem and commonsense reasoning
Language is Ambiguous!
‣ Syntactic and semantic ambiguities: parsing needed to resolve these, but need context to figure out which parse is correct
Teacher Strikes Idle Kids
Ban on Nude Dancing on Governor’s Desk
Iraqi Head Seeks Arms
example credit: Dan Klein
Language is Really Ambiguous!
‣ There aren’t just one or two possibilities which are resolved pragmatically
‣ Combinatorially many possibilities, many you won’t even register as ambiguities, but systems still have to resolve them
il fait vraiment beau →
It is really nice out
It’s really nice
The weather is beautiful
It is really beautiful outside
He makes truly beautiful
It fact actually handsome
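To make "combinatorially many possibilities" concrete, the sketch below enumerates word-by-word renderings of "il fait vraiment beau". The per-word gloss lists are an assumption inferred from the candidate translations on the slide, and the function name `word_by_word_candidates` is invented for this sketch. A real translation system would also consider reordering, multi-word phrases, and many more senses.

```python
from itertools import product

# Assumed per-word glosses, inferred from the candidate translations on the slide.
GLOSSES = {
    "il": ["it", "he"],
    "fait": ["is", "makes", "fact"],
    "vraiment": ["really", "truly", "actually"],
    "beau": ["nice", "beautiful", "handsome"],
}

def word_by_word_candidates(sentence):
    """Enumerate every combination of one gloss per source word."""
    options = [GLOSSES[word] for word in sentence.split()]
    return [" ".join(choice) for choice in product(*options)]

candidates = word_by_word_candidates("il fait vraiment beau")
print(len(candidates))  # 2 * 3 * 3 * 3 = 54
print(candidates[:3])
```

Even with only two or three senses per word, a four-word sentence already yields 54 candidates, and the count grows multiplicatively with sentence length.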
What techniques do we use? (to combine data, knowledge, linguistics, etc.)
A brief history of (modern) NLP
Timeline axis: 1980, 1990, 2000, 2010, 2020
Items shown along the timeline:
“AI winter”: rule-based, expert systems
earliest stat MT work at IBM
Penn treebank
Ratnaparkhi tagger
Collins vs. Charniak parsers
Unsup: topic models, grammar induction
Sup: SVMs, CRFs, NER, Sentiment
Semi-sup, structured prediction
Neural
(figure: parse-tree fragment with nodes S, NP, VP and tags NNP, VBZ)
Where are we?
‣ NLP consists of: analyzing and building representations for text, solving problems involving text
‣ These problems are hard because language is ambiguous, requires drawing on data, knowledge, and linguistics to solve
‣ Knowing which techniques to use requires understanding dataset size, problem complexity, and a lot of tricks!
‣ NLP encompasses all of these things
NLP vs. Computational Linguistics
‣ NLP: build systems that deal with language data
‣ CL: use computational tools to study language
Hamilton et al. (2016)
Outline of the Course
‣ Classification: linear and neural, word representations (3.5 weeks)
‣ Text analysis: tagging and parsing (3 weeks) (takes us to the midterm)
‣ Generation, applications: language modeling, machine translation (3 weeks)
‣ Question answering, pre-training (2 weeks)
‣ Applications and miscellaneous (2.5 weeks)
‣ Goals:
‣ Cover fundamental techniques used in NLP
‣ Understand how to look at language data and approach linguistic phenomena
‣ Cover modern NLP problems encountered in the literature: what are the active research topics in 2020?
Outline of the Course
‣ Throughout the course: ethics and fairness
‣ Broader topic in ML than just NLP
‣ How can we make sure our systems benefit society, and everyone in it?
‣ Parts of lectures devoted to topics in ethics, comprehensive discussion on the last class day
‣ Nov 3: optional lecture
‣ Balance algorithms, linguistics, data, ethics
Coursework
‣ Five assignments, worth 45% of grade (A1-4: 10% each, A5: 5%)
‣ Mix of writing and implementation
‣ Assignment 0 is out now, optional diagnostic
‣ ~2 weeks per assignment except for A5
‣ 5 “slip days” throughout the semester to turn in assignments 24 hours late. Otherwise, you lose 15% credit per day the assignment is late
‣ Submission on Gradescope
These assignments require understanding the concepts, writing performant code, and thinking about how to debug complex systems. They are challenging; start early!
The course staff are not here to debug your code! We will help you understand the concepts from lecture and come up with debugging strategies
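The late policy above (5 slip days, then 15% per late day) can be worked through with a small Python sketch. Several details are assumptions made for illustration and are not stated on the slide: that slip days are applied before any penalty, that the penalty is 15 percentage points per remaining late day, and that credit cannot drop below zero. The function name `late_credit` is also invented here. Check the syllabus for the actual rules.

```python
PENALTY_PER_DAY = 15  # percentage points lost per late day not covered by a slip day

def late_credit(days_late, slip_days_left):
    """Return (percent of credit kept, slip days left afterwards).

    Assumes slip days are spent first and the penalty is linear with a floor of 0.
    """
    used = min(days_late, slip_days_left)
    penalized_days = days_late - used
    kept = max(0, 100 - PENALTY_PER_DAY * penalized_days)
    return kept, slip_days_left - used

print(late_credit(2, 5))  # two days late with all five slip days available
print(late_credit(3, 1))  # three days late with one slip day left
```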
Coursework
‣ Final project (25% of grade)
‣ Groups of 1 or 2
‣ Standard project: neural network models for question answering
‣ Independent projects are possible: these must be proposed earlier (to get you thinking early) and will be held to a high standard!
‣ Midterm (20% of grade), take-home October 14-16
‣ Similar to written homework problems
‣ In-class problems (10% of the grade)
‣ These will be done via UT Instapoll. You don’t have to come to class to do them
‣ Drop the lowest 5
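The grade components listed on the two Coursework slides (assignments 45%, final project 25%, midterm 20%, in-class problems 10%) add up to 100%. A minimal Python sketch of the weighted total follows. The weights come from the slides, while the dictionary keys, the function name `course_grade`, and the sample scores are hypothetical.

```python
# Component weights in percent, as listed on the Coursework slides.
WEIGHTS = {
    "assignments": 45,
    "final_project": 25,
    "midterm": 20,
    "in_class": 10,
}

def course_grade(scores):
    """Weighted total in percent, given each component score as a fraction in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores, used only to exercise the function.
example = {"assignments": 0.9, "final_project": 0.8, "midterm": 0.7, "in_class": 1.0}
print(sum(WEIGHTS.values()))
print(round(course_grade(example), 1))
```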
Academic Honesty
‣ You may work in groups, but your final writeup and code must be your own
‣ Don’t share code with others!
Conduct
A climate conducive to learning and creating knowledge is the right of every person in our community. Bias, harassment and discrimination of any sort have no place here. If you notice an incident that causes concern, please contact the Campus Climate Response Team: diversity.utexas.edu/ccrt
The College of Natural Sciences is steadfastly committed to enriching and transformative educational and research experiences for every member of our community. Find more resources to support a diverse, equitable and welcoming community within Texas Science and share your experiences at cns.utexas.edu/diversity
Survey