cs122 –lecture 1 winter term,...

35
CS122 – Lecture 1 Winter Term, 2017-2018

Upload: others

Post on 29-Jun-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

CS122– Lecture1WinterTerm,2017-2018

Page 2: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

Welcome!� Howdorelationaldatabaseswork?

� Provideahands-on opportunitytoexplorethistopic� Thisisaprojectcourse:

� Asequenceofprogrammingassignmentsfocusingonvariousareasofdatabasesystemimplementation

� Alsosomereadingandproblem-solving,butnotalot� Nomidterm,nofinalexamJ

2

Page 3: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

Prerequisites� Youshouldbefamiliarwith:

� TheSQLquerylanguage(DDLandDML)� Therelationalmodelandrelationalalgebra� IwillpostalinktotheCS121slides

� Also,familiaritywith:� Algorithms(e.g.sorting,searching),trees,graphs,etc.� Basictimeandspacecomplexity

� AllprogrammingisinJavaSE8+

3

Page 4: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

SoftwareDevelopment� Secondaryfocusofcourseisonsoftwaredevelopment

� Automatedbuildsystem:Ant� Documentationgeneration:Javadoc� Automatedtesting:TestNG (JaCoCo fortestcoverage)� Codelinting /staticanalysis:FindBugs� Versioncontrol:Git� Integrateddevelopmentenvironments(e.g.IntelliJ IDEA,Eclipse,etc.)

� Youwillworkwithallofthesetoolsduringtheterm

4

Page 5: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

SoftwareDevelopment(2)� Databasesmustbecorrect

� Aportionofeachassignment’sgradewilldependonthecorrectnessofyourwork

� Allworkmustbecleanandwell-documented� FollowstandardJavacodingconventions� UseJavadoctodocumentthecodeyouwrite� Clearlyandconciselycommentyourcodeaswell� Commit-logsmustalsoincludecleardocumentationofyourchanges

� Failuretodothesethingswillresultinpointdeductions

5

Page 6: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

TeamsandCollaboration� Youmustworkinteamsof2-3people

� Needtodecideteamcompositionrelativelyquickly� FirstassignmenthopefullyoutbyWednesdaythisweek

� Eachteam’ssubmissionmustbeentirelyitsownwork� Feelfreetohelpdebugeachother’sprograms,gettoolsorthedebuggerworking,etc.� Followthe“50-footrule”forhelpingotherteamsdebug:helpotherswithyourbrain,notyourcode.

6

Page 7: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

CourseWebsite� Lectures,assignments,andothercoursedetailsareavailableontheCaltechMoodle� https://courses.caltech.edu/course/view.php?id=2900� Enrollmentkey:writeahead� PleaseenrollASAP! AllcourseannouncementsaremadethroughMoodle.

� Coursepolicies,collaborationpolicy,contactinfo,etc.willallbepostedthere– makesuretoreviewitall!

7

Page 8: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

LatePolicy� Latepenaltieswillbeappliedtoassignments:

� Upto24hourslate:-10%� Upto48hourslate:-30%� Upto72hourslate:-60%� After72hours:don’tbotherL

� Extensionsareavailableforextenuatingcircumstances� Healthcenternotes,extensionsfromDean’soffice,ortalktomeifthesearen’tappropriate

� Teamsalsohave4tokenstousethroughouttheterm� Eachtokenwortha24-hourextension,“noquestionsasked”� Stateinyoursubmissionhowmanytokensyouareusing

8

Page 9: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

CourseOverview� Thisclassfocusesontheimplementationofrelationaldatabases

� Tables(relations)populatedwithrows(tuples),manipulatedwithSQLqueries

� Lotsofotherkindsofdatabasestoo,but(sadly)wewon’tdiscussthemthisterm� CS123isagreatvenuetoexploreotherdatamodels,e.g.XMLdatabases,graphdatabases,NoSQL,etc.

9

Page 10: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

RelationalDBRequirements(1)� Needgeneralizedquerysupport

� UsetheSQLquerylanguage� Howtorepresentqueryplansinternally?� Howtooptimizequeryplans?

� Whatplantransformationsareallowed?� Whatdatastatisticsdoweneedtoguidethisphase?

� Howtoexecutequeryplans?� Generalissues:subqueries,correlatedsubqueries,inserts/updates/deletes,etc.

� Advancedtechniques(e.g.parallelexecution)

10

Page 11: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

RelationalDBRequirements(2)� Typicallyneedspersistent storageofdata

� Somedatabasesarein-memoryonly� (Makesthingsmuch simpler!)

� Problem:disksareslow� Howtoimprovedataaccessperformance?

� Storageformats,datalayouts,dataaccesspatterns� Alternateaccesspaths (i.e.indexes)� Betterhardware?(e.g.RAID,SSD)

� Howdotheseimprovementsaffectqueryplanningandexecution?

11

Page 12: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

RelationalDBRequirements(3)� Generally,databasesshouldprovidetransactedoperations

� Satisfyspecificproperties(ACID)� Example:Durability

� Icomplete somecriticalpieceofwork� i.e.thedatabasereportsthetransactionascommitted

� Thenthepowerfails!� Atrestart,databaseshouldensuremyworkisstillthere…

� Example:ConsistencyandAtomicity� Iamperformingamultiple-steptransaction…� Alongtheway,Iviolateadatabaseconstraint!

� DBmustrollbackallotherchangesinthetransaction

12

Page 13: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

RelationalDBRequirements(4)� Databasesoftenneedtosupportconcurrentaccessfrommultipleclients

� Howtoensurethatclientoperationsareisolatedfromeachother?� (Andwhatdowemeanby“isolated”anyway?)� Howtogovernaccesstosharedtables/records� Whattechniquescanmakethisfastandefficient?

� Howdoesconcurrentaccessaffectqueryplanning,optimization,andexecution?

13

Page 14: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DBArchitecture(1)� Databasedataisstoredinfilesondisks

� Filesarereadandwritteninblockstoimproveperformanceandsimplifytransactionprocessing

� Thefilemanager exposesrawdiskdatatotheDB� e.g.allowsblocksorpagesofdatatobereadfromafile� e.g.allowsfilestobecreatedordeletedaswell

FileManager

DataDictionary

TableFiles

IndexFiles

Transaction Logs

filesystem

14

Page 15: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DBArchitecture(2)� Thebuffermanager cachesdiskpagesinmemory

� Ensuresthatdiskisonlyaccessedwhennecessary� Allowsverylargedatafilestobemanagedbydatabase

� Frequently,mostwidelyuseddatapagesremaincachedinmemoryacrossmanyqueries� e.g.schemadefinitions,criticalindexes,etc.

DataDictionary

TableFiles

IndexFiles

Transaction Logs

filesystem

BufferManager

FileManager

15

Page 16: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DBArchitecture(3)� Thetransactionmanager providestransactionprocessinginthesystem� Mustworkcloselywiththebuffermanager� Ensurepagesarewrittenbacktodiskinawaythatalsosatisfiestransactionproperties

� Example:� Dirtypagescannotbewrittenbacktodiskuntiltransactionlogsareproperlyupdated

DataDictionary

TableFiles

IndexFiles

Transaction Logs

filesystem

BufferManager

FileManager

TransactionManager

16

Page 17: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DBArchitecture(4)� Thesecomponentscollectivelycomprisethestoragemanager

� Providesaccesstoschemainformation,tabledata,indexes,andstatisticsabouttabledata

� Canalsomanipulateeachofthesekindsofdata

TransactionManager

DataDictionary

TableFiles

IndexFiles

Transaction Logs

filesystem

BufferManager

FileManager

StorageManager

17

Page 18: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DBArchitecture(5)� Datadefinitionoperationsdon’trequiresophisticatedqueryingcapabilities

� Manydatabasesdon’tevenhavetransactedDDL

TransactionManager

DataDictionary

TableFiles

IndexFiles

Transaction Logs

filesystem

BufferManager

FileManager

StorageManager

DDL Parser

DDL Interpreter

18

Page 19: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DataDefinition� Datadefinitionoperationsrequiremanipulationofdatadictionary,tablefiles,andindexfiles� Create/alter/dropatable� Create/alter/dropanindex

� Thedatadictionary holdsallDBschemainformation� Mostdatabasesuseanothersetoftablestostorethedatadictionary� Usesamemachinerytoread/writetuplesindatadictionary� MakesiteasiertotransactDDLoperationstoo� Manipulationofthesetablesisperformedmoredirectly,withoutusingqueryprocessingsystem

19

Page 20: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DBArchitecture(6)� Databasequeriescanbeverycomplexandtime-consuming

� Queryevaluator executesplanscomprisedofbasicoperations� Operationsarelooselybasedontherelationalalgebraoperators

� Usesstoragemanagertoaccesstablesasasequenceoftuples

TransactionManager

DataDictionary

TableFiles

IndexFiles

Transaction Logs

filesystem

BufferManager

FileManager

StorageManager

DDL Parser

DDL Interpreter

Query Evaluator

20

Page 21: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DBArchitecture(7)� Queryplanner preparesaqueryforevaluation

� Translatesthequeryintoanoptimizedexecutionplan� Usesdatastatisticsandeducatedguessestooptimizetheplan

� Passestheplantothequeryevaluatorforexecution

TransactionManager

DataDictionary

TableFiles

IndexFiles

Transaction Logs

filesystem

BufferManager

FileManager

StorageManager

DDL Parser

DDL Interpreter

DML Parser

Query Planner

Query Evaluator

21

Page 22: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

IntroducingNanoDB� AssignmentsandprojectswillbebasedontheNanoDBcodebase� AsimpleJavadatabaseimplementation

� Primarygoalistobeunderstandableandextensible� Whendecidingbetweenperformanceandclarity,NanoDBgenerallychoosesclarity

� (Mostopen-sourcedatabaseprojectsgotheotherway.)� Definitelystillaworkinprogress!

� Plentyofopportunitytomakealastingcontributiontothisproject,andtofutureiterationsofthisclass

22

Page 23: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DatabaseFiles� Thedatabasestoresdatainfiles,andmanipulatesdatainmemory…� …thisdataisstoredonphysicaldevicesinthecomputer

� Needtounderstandthecharacteristicsofthesedevicestounderstandhowtoimplementdatabasesproperly

23

Page 24: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

StorageHierarchy� Widevarietyofstoragemediaavailableforuse� Canclassifystoragemediabasedon:

� Accessspeed� Costperunitstorage� Reliability

� Differentmediasupportdifferentaccesspatterns� Affectstheirusefulnessforvarioustasks

� Examples:� Magnetictapesarebestsuitedtosequentialaccess;randomaccessisprohibitivelyslow

� DRAM/flashmemoryareverygoodforrandomaccess

24

Page 25: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

StorageHierarchy(2)� Primarystorage:

� Cachememoryisusuallyverysmall,veryfast� Mainmemoryismuchlargerthancache,butusuallystilltoosmalltoholdanentiredatabase

� Supportsrandomaccess(a.k.a.directaccess)� Storageisvolatile:datadoesn’tsurviveapowerloss

� Secondarystorage,a.k.a.onlinestorage:� Muchlarger,cheaper,andslowerthanprimarystorage� Storageispersistent:datalaststhroughapowerloss� Magneticdisksaremostcommonformofsecondarystorage� Flashmemorydevices(SolidStateDrivesorSSD)areincreasinglycommon,butarestillsmaller/moreexpensive

25

Page 26: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

StorageHierarchy(3)� Tertiarystorage,a.k.a.offlinestorage:

� Opticalstorage(CDs,DVDs,etc.),tapestorage� Primarilyusedforbackupandarchivalstorage� Accessratesarevery slowcomparedtoprimaryandsecondarystorage

� Tapesonlysupportsequentialaccess,notdirectaccess� Tapes/opticaldiskscanbemanagedinalibrary

� Roboticsystemtoloadspecifictapes/disksintoadrive� Callednear-linestorage

� Primarydifferencebetweenoffline&near-linestorage:� Acomputercanaccessnear-linestorageandloaditintoonlinestorageallbyitself

� Ahumanbeingmustmakeofflinestorageavailable

26

Page 27: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

Internalvs.ExternalMemory� Indatabasesystems,primarystorageisalsooftencalledinternalmemory� MemorytheCPUcanaccessquicklyandefficiently

� Similarly,secondarystorage(flashmemory,magneticdisks)arecalledexternalmemory

� Algorithmsdesignedtoworkwithdatasetsmuchlargerthanprimarystoragearecalledexternalmemoryalgorithms� Mustrepeatedlyloadportionsofdataintointernalmemory,processthem,thenstorethembacktodisk

27

Page 28: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

Internalvs.ExternalMemory(2)� Datastructuresfororganizingdataondiskarecalledexternalmemorystructures� e.g.hashfileorganizationalsocalledexternalhashing

� Frequentlyencounter“external-memory(whatever)”indatabaseliterature� Simplymeansthealgorithm/structureisdesignedtoworkwithdatastoredondisk,notjustinmemory

28

Page 29: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

StorageHierarchy(4)

cache

main memory

flash memory

magnetic disk

optical disk

magnetic tapesslow

er

fast

er

chea

per

mor

e ex

pens

ive

tertiary (off-line) storageUsually, a person mustfacilitate access

secondary (on-line) storage(aka “external memory”)OS must facilitate accessnonvolatile storage

primary storage(aka “internal memory”)CPU can access directlyvolatile storage

29

Page 30: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

MagneticDisks� Magneticdisksaremostwidelyusedonlinestoragemediumincomputers� Harddiskdrives(HDD)

� Drivecontainssomenumberofplattersmountedonaspindle� Plattersspinataconstantrateofspeed

� 5400RPM,upto15000RPM� Read/writeheadsaresuspendedaboveplattersonadiskarm� Allheadsmovetogetherasaunit

platters

spindle

disk armassembly

read/writeheads

30

Page 31: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

MagneticDisks(2)� Plattersaredividedintotracks� Tracksaredividedintosectors

� Moderndriveshavemoresectorstowardsedgeofdisk� Allheadsarepositionedbyoneassembly…

� Acylinder ismadeupofthetracksonallplattersatthesameposition

� Toreadasectorfromdisk:� Assemblyseeks totheappropriatecylinder

� Sectorisreadwhenitrotatesunderthediskhead

cylinder

tracks

sectors

31

Page 32: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DiskPerformanceMeasures� Accesstime isthetimebetweenaread/writerequestbeingissued,andthedatabeingreturned� Read/writeheadsmustbemovedtoappropriatetrack� Sectorsmustrotatepasttheread/writeheads

� Firstoperationiscalledaseek� Averageseektime ofadiskismeasuredfromaseriesofrandomseeks(uniformdistribution)

� Generallyrangesfrom3-15ms� Typicalconsumerdrivesareintherangeof9-12ms

� Seekingnearbytrackswillobviouslybefaster� Track-to-track seektimesinrangeof0.2-0.8ms

� (SSDshave“seektimes”inthe0.08-0.16msrange)

32

Page 33: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

DiskPerformanceMeasures(2)� Rotationallatencytime isamountoftimeforsectortopassunderread/writeheads� Averagerotationallatency is½thetimeforafullrotation� 5,400RPM:5.6ms� 7,200RPM:4.2ms� 15,000RPM:2ms

� Diskscanonlyread/writeinformationsoquickly� Datatransferrate specifieshowfastdataisreadfrom/writtentothedisk

� Currentinterfacescansupportupto600+MB/sec� Actualtransferratedependsonseveralthings:

� Thediskanditscontroller,motherboardchipset,etc.� Thesectionofthediskbeingaccessed

33

Page 34: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

Disk-AccessOptimizations� Widerangeoftechniquesusedtoimproveharddiskperformance� ImplementedintheHDDitself,and/orinoperatingsystem

� Buffering� Whendataisread,storeitinamemorybuffer� Ifsamedataisrequestedagain,provideitfromthebuffer

� Read-ahead� Whenasectorisread,readothersectorsinthesametrack� Ifaprogramisscanningthroughafile,subsequentaccessescanbesatisfiedimmediatelyfromcache

34

Page 35: CS122 –Lecture 1 Winter Term, 2017-2018courses.cms.caltech.edu/cs122/lectures-wi2018/CS122Lec01.pdf · Algorithms (e.g. sorting, searching), trees, graphs, etc. ... querying capabilities

Disk-AccessOptimizations(2)� I/OScheduling

� Theharddiskcanqueueupbatchesofreadandwriterequests,thenscheduletheminareasonableway

� Goal:reducetheaverageseektimeofaccesses� Writescanbebufferedinvolatilememorytofacilitatethis(cancauseproblemsifpowerfailsbeforewriteisperformed)

� Nonvolatilewritebuffers� DiskprovidesNV-RAMtocachediskwrites� Data issavedinNV-RAMbeforebeingsavedtodisk� Dataisn’twrittentodiskuntil thediskisidle,ortheNV-RAMbufferisfull

� Ifpowerfails,contentsofNV-RAMarestillintact!

35