modelling data on the web - university of...

102
COMP60411 Modelling Data On The Web Tim Morris & Uli Sattler Week 1 Introduction, Data Models, Tables, and SQL

Upload: others

Post on 26-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

COMP60411ModellingDataOnTheWeb

TimMorris&UliSattler

Week1Introduction,DataModels,Tables,andSQL

Page 2: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

TopicOverviewWhatisa(core)datamodel?E.g.,Flat:flatfilesTablebased:relationalTreebased:XMLandabitofJSONGraphbased:RDF

Tradeoffs(esp.representational)betweenthemDiscussingpainpoints&sweetspots,distinguishing

principledonesfromDM-basedonesfromthosecausedbyyourusageofDM

Page 3: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

CourseGoals:Knowledge&Understanding

This aimstogiveyouagoodunderstandingofcoreconceptsofdatamodellingsomefamiliaritywithformalisms,APIs,andlanguages

formodellingdataonthewebdesign/representationissuesthatarise

courseunit

Page 4: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

CourseGoals:Skills

This aimstogiveyoutheability/skilltocomparedifferentdatamodellingformalisms,designoranalyseadatamanagementsystem,

doesitmakegooduseoftheformalism'sfeatures?doesitfititspurpose?

courseunit

Page 5: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

CourseStructureLectures

ActivelearningLab

Makesureyouunderstandthecoursework!Readings

AllreadingsavailableonlineCore:the"Learning"eBookseries

LearningSQLLearningXMLLearningSPARQL

Page 6: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

OurExpectationsLectures:

activelistening&participationLabMondaysafternoon:

makesureyouunderstandthecoursework!Labduringweek:

workonyourcourseworkmakeuseofTAs:14:00-15:00

Coursework:submitontime

Read!

Page 7: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

AssessmentCoursework(50%,≈200marks)

Eachweek,amixture1. MCQquizzes(≈10marks)2. Shortessays(≈5marks)3. Amodellingassignment(≈10marks)4. Aprogrammingassignment(≈15marks)Precisemarkbreakdownvaries

Exam(50%)TakenonlineVerylike1&2

Page 8: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Materials&BlackboardAllcoursematerialsareavailableonlineonthematerialspageWeuseBlackboardfor

CourseworkOnlineforums

SubscribetoeachforumAskquestionsthereAnswerquestionsthereShareexamples,testcasesthere

Exam

Page 9: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

VariantCircumstancesDisability(EqualityAct):

anyconditionwhichhasasignificant,adverseandlong-termeffectonaperson’sabilitytocarryoutnormalday-to-dayactivities.

Exam&Studysupport&moreGreat,helpfulpeople

and process

DisabilityAdvisoryandSupportService

CounsellingserviceSSO MitigatingCircumstances

...feelfreetoaskus:we'rehappytoadvise!

Page 10: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Assistance&HelpEarlyinterventionismoreeffective

Ifyouarehavingchallengesofanysortthesoonertheyareidentifiedandcommunicatedtousthemorelikelywecanfindagoodresolution

ThisisverytrueformitigatingcircumstancesIfsomethingisinterfering,documentit!FillouttheformwhenthingsarehappeningThereisa"toolate"here!

...whenindoubt,askusandSSOforMitCircs

Page 11: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

ExpectedConductWeexpectofyou(andourselves)to

befairmindedtreateachotherwell&withrespectavoidacademicmalpracticetakeresponsibilityforcoursedutiesbeengaged,curious,andactive

Ifyouhaveaproblemorissuepleaseraiseitwithusifthatdoesn'thelp,contactyourprogrammedirector

Page 12: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Preliminaries

Weallhavetostartsomewhere

Page 13: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

DataManagement(1)Almosteveryprogrammustdosomedatamanagement

Ifonlyconfigfiles!Manyareinformationheavy

andmustdealwiththatinformationovertimeDatabaseManagementSystems(DBMSs)

Separate(orseparable)componentSpecialisedforvariablespurposed

secondarystorage,scaling,complexity,etc.

Page 14: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

DataManagement:Lifetime

Somedatais(typically)transientorephemeralPositionofthecursoronthescreen

Somedatais(typically)persistentBankrecords,addresses,healthdata,libraryentriesCursorpositioncanbe!

(Ifyouarerecordingthescreen...)

We'refocusedondatathatleanstowardpersistent

Page 15: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

DataManagement:Structure

Somedatais(moreorless)informationallyopaquee.g.,images,video,text,audioitsinformation/contentisn't(easily)available

Youtypicallymustdosomeextractionthisiscalledunstructureddata

Somedataisinformationallytransparentitsinformation/contentisprogrammaticallyexplicitthisiscalled(semi-)structureddata

Page 16: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

OutOfScopeThereislotsofDMthat'soutsideourscope1. Performance&Scaling:see2. Concurrency

Thustransactions(YoushouldreaduponACIDity)

3. Tuning,indeedmostphysicallevelstuff4. Cleansing5. Integration

Exceptforatinybit,aroundmerging

COMP62421

Theseconsiderationsdoaffectmodelling!

Page 17: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

DataAndTheWebTheWebisacollaborativeinformationstructure

LargelydecentralisedImmenseGrowingrapidlyChangingrapidly

TheWebproducesnewdatachallengesScaleofdataKindofdataShapeofdataUseofdata

Page 18: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Dataon,from,behindtheWebOntheWeb

data.gov,data.gov.uk,...FromtheWeb

LogfilesBehindtheWeb

Data(base)backedWebsitesThefilesystemisakindofdatabase

ContentManagementSystemsWordpress

SitesasDatabaseFrontEndsSeeAmazon

Page 19: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

WhatisaDataModel?ThreeKeyAspects1. UnderlyingDataStructure,"CoreDataModel"2. DataIntegrity3. DataManipulation4. (Plusafourth!)DataSharing

MoreimportantontheWeb*

Page 20: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

"DataModel"isAmbiguous:1. acompletedatarepresentationandmanipulationapproach(wedothis!)2. justthecoredatamodel3. aparticulardatarepresentationforadomainorapplication,alsocalledthedomainmodel

"Doesyourcalendardatamodelincludeleapyears?"

Generally,youcantellfromcontext,(2)israre.

Page 21: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

KindsofDataDatacanlenditselftodifferentshapes

Array-likeTree-likeGraph-likeDocument-like

DatacanhavedifferentvolumesSmallto"big"data

DatacanhavedifferentvelocitiesStatic/offlinetostreaming

DatacanhavedifferentusepatternsManyreaders/fewwritersorthereverseorother!

Page 22: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

DataDoesNotGrowonTreesDatamaylenditselftooneshape

e.g.,tree-shapeorgraph-shapebutthisdoesnotmeanthat

wehavetopersistitinthisformweknowexactlyhowtocastitinthisform...considerpain-pointsandsweetspotsothersshareitinthisform

Page 23: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

PolyglotPersistence...we are gearing up for a shift to polyglot persistence — where anydecent sized enterprise will have a variety of different data storagetechnologies for different kinds of data. There will still be largeamountsof itmanaged in relational stores,but increasinglywe'llbe firstaskinghowwewanttomanipulatethedataandonlythenfiguringoutwhattechnologyisthebestbetforit.

—MartinFowler

Page 24: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

PolyglotPersistence(2)Thispolyglot[e]ffectwillbeapparentevenwithinasingleapplication.Acomplex enterprise application uses different kinds of data, and alreadyusuallyintegratesinformationfromdifferentsources.Increasinglywe'llsee such applications manage their own data using differenttechnologiesdependingonhowthedataisused.—MartinFowler

Page 25: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Poly-Glot/-SystemPersistenceEvenasinglecoredatamodelcanresultin

multiplesystemswithdifferentcharacteristicsmultiple,overlapping,domainmodelsmultiple,overlappingowners,versions,variants

ThisisparticularlytrueinontheWeb!

Page 27: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

ASampleDomainWestartwithaclassicexample:TheAddressBook

PeopleandinformationaboutthemNamesandcontactinformation

Wecandoafirstcutasadiagram

Page 28: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

ForExampleBijan!

Name:BijanParsiaCompany:UniversityofManchesterEmail:[email protected]...

Uli!Name:UliSattlerCompany:UniversityofManchesterEmail:[email protected]

Page 29: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Storing!

Page 30: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SlidesarenotagoodstorageplacefordataWehaveanarraylikestructureso...

Howaboutaspreadsheet!1entity/record/personperrowEachfield/attributeisacolumn

Wehavesoftwarethatworkswellwiththis!

Page 31: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

InteractingWithTheData

Tothedemo!

Page 32: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

PainpointsAround"name"

SortingisoncolumnsCannotsortbysurname

Filtering:canfilterbynamesbeginningwithZCannotfilterbysurnamesbeginningwithZ

Around"address"Can'tsortorfilterbypostcodeCan'tsortorfilterbycityCan'tsortorfilterbycounty

Problemswithspreadsheetsorourformat?

Page 33: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Format2Thisshouldfixourpainpoints!

Page 34: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Interacting!

Demoencore!

Page 35: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

NewPainPointsVariablenumbersofthe"same"attribute

PhonenumberEmailaddressWebpageInsertingcolumnsispainful

LotsofpartialcolumnsSheernumbersucks

Companieshaveaddresses!Morethanone!Andphonenumbers,etc.

Moreproblemswithourformat

Page 36: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)
Page 37: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

NOTANewFormatNotafixtoourformat:

Page 38: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

FixingTheFormatAgainWewantaddinga(similar)columntobeeasy!

Easyasaddingarow!MakeanewtablejustforphonenumbersIndexnumberswithpersonrows

Page 39: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Format3Nowthisshouldfixourpainpoints!

Page 40: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

StillPainPointsSortingdestroystherelationship

WeusedrownumberstoconnectSortingchangestherownumber!

HardtoseetherecordNolongerasimpleflatfile

CSVformatmakesassumptions

Theseare(mostly)implementationproblems!

Page 41: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

AnalyseFormatFailureDidwe

getthedomainwrong(addresses)?fititwrongintoourcoreDM(tables)?pickthewrongcoreDMtomodelitin?

Isourformatunworkable?workablebutrequiresalotofapplicationcode?reasonablewithsomeworkarounds?

Howmuchtechnicaldebtarewepilingup?What'sthecostofswitching?

Page 42: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

UnsuitableCoreDataModelIfyouare

always"fighting"thesystemuselotsofapplicationcodetohackthingsliveinanerrorrichenvironmenthaveincreasingamountsofworkaroundsupportinyourdata

Yourcoredatamodelmightnotbeagoodfitforyourdomainandapplication!

Page 43: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

TheRestOfTheDBMSEvenifyourcoreDMisn'tagoodfit,youmight

bestuckwiththesystemYoupaidgoodmoneyforthatOracledatabase!

needfeaturesoftheimplementationisthereanXMLdatabasewithtransactions?what'sthesupportcontract?

bestuckwiththemodel(criticallegacyapps)Justbecausethemodelisbrokendoesn'tmeanthatthesystemis

Orisbrokenenoughtojustifyaswitch

Page 44: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

FlatFileProgramming

Page 45: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SharingOurDatabasesSpreadsheets?Propriatory-ish(Excel,GoogleDoc,OpenOffice)

Linguafranca:CSVComma(orTab)DelimitedValuesExactlythe(pure)flatfilemodelFormat:textfile

1recordperlineFirstlinecanbespecial(columnnames)Eachcolumnseparatedbya","

Wemayneedtoquotecells(withcommas)

Page 46: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

CSVExample

Page 47: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

ProgrammaticManipulationIfwestoreourdatabasesasCSV

WecanloadandparsethemintostructuresManipulateourdatafromourprograms

E.g.,usingPythonimport csvwith open("../Adresses/mod2-uk-500.csv") as csvfile: line_count = 0 myreader = csv.reader(csvfile, delimiter=',', quotechar='t') for row in myreader: if line_count == 0: line_count += 1 else: print(f' Candidate {line_count}: Firstname {row[0]} Lastname {row[1]} City {row[4]}') line_count += 1print(f'Processed {line_count -1} Candidates.')

Page 48: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SolvingProblemsThissolvessomeproblems!

Inserting/removingcolumnsa"smallmatterofprogramming"Orwecouldusemultiplearrayswithpointers

Wecansplit/combinefieldsatwillWell,withabitofprogramming

WecancontrolsortingwellenoughUsepointerstoconnect

Lotsofwork!

Page 49: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

AgainstBespokeProgrammingThisisallatthewronglevel

Flatfilesandflatfile++areubiquitousWeshouldn'tbecodingcomplexfunctions

Overandoveragain!Evenifwecanprogramourwayaroundproblems

Doesn'teliminatetheproblemsSomesolutions(pointers)effectivelychangethecoremodel:nolongerflatfiles!

Page 51: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

TablesAcoreDMwheretable(orrelation)isthecoredatastructure

AtableisasetoftuplesAtupleis

ann-arysequenceasetofkey-valuepairs

FlatfilehadonetableWeallowmany!NamedtablesAkarelations

Page 52: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Relations!(Weusetableandrelationinterchangeably)RelationsarelikeFirstOrderLogic(FOL)predicates

Relationname=PredicatenameNumberofcolumns=Arityofpredicate

Person(bijan,u_o_manchester,...)Predicateistrue(orfalse!)ofitsarguments

Relationis"true"oftupleswhichoccurinitPredicatescanhavedefinitions(intensional!)facts(extensional!)

Page 53: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

OrderandIdentityRecords/Rows/Entitiesneedidentity

InExcel,wehadtherowlabeltheorderorpositionofarecordwassignificant

Inourmodel,weneeddistinguishingattributeswepushidentityintothedata:akey

eithera"naturally"uniquesetofattributesoramadeupone:anID

Orderisalwaysapropertyofthedatavaluesimplementation

Page 54: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

MultipleTablesActionsonmultipletables:Splittingat

designtime:trytonormalizeyourDBruntime:droppingbits

CombiningTaketwotablesandproduceanewtable

ThekeytorelationaldomainmodellingDecomposeyourprobleminto"base"tablesDerivenewtablesforspecificneeds

Page 56: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

WhatIsAFormalism?Aformalsystem(orformalism):syntax:whatcanwewrite?semantics:whatdoesourwritingmean?withprecise(mathematical)definitionsdesignedtocaptureacoherentsetofoperations("syntax"isloose,e.g.,wemightjusthaveacollectionofoperators)

Page 57: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

KeyGoalsOfAFormalism1. tobeclearaboutwhatwemean

Inourspreadsheetis"1"anumber,astring,either,both,somethingelse?

2. toallowthedeterminationofkeypropertiese.g.,complexityofqueryanswering

3. toabstractawayfromparticularimplementionse.g.,allowustodeterminewhenwildlydifferentimplementationsarecorrectthuscaninteroperate

Page 58: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Formalismvs.LanguageFormalismsareoftenabstract

Thiscanbeanadvantage!CanbehardtouseifonlyabstractConcreteinstancestypicallyinvolvecompromise

WefocusonconcretelanguagesFormalismsarethetheoryLanguagesarethepractice

Well,itmaybeallrightinpractice,butitwillneverworkintheory.Intheory,thereisnodifferencebetweentheoryandpractice.But,inpractice,thereis.

OtherQuotesOnTheoryvsPractice

Page 59: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SQL:ALanguageForTablesSchema

CREATE TABLEtable_nameUpdate

INSERT INTOtable_nameDELETE FROMtable_nameUPDATEtable_name...

QuerySELECT ... FROMtable_name

SQLoperations(largely)areclosedovertables

Page 60: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

AnInfelicityThereisalotoflingowithslightdifferentmeanings.Conceptsgetdividedup

inslightlydifferentways.

Ourtalk Common LearningSQLp.10

CoreDataModelDataIntegrity DataDefinition SQLschemastatements"CREATE"

DataManipulation Query/UpdateLanguage

SQLDatastatements

Page 61: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

ASampleSQLProgramCREATE TABLE People ( name varchar(255), company varchar(255), address varchar(255), phone varchar(255), email varchar(255), home_page varchar(255));

INSERT INTO People VALUES ('Aleshia Tomkiewicz', 'Alan D Rosenburg Cpa Pc', '14 Taylor St, St. Stephens Ward, Kent CT2 7PP', '01835-703597','[email protected]', 'http://www.alandrosenburgcpapc.co.uk');SELECT name FROM People

YoumustDefinebeforeUpdatebeforeQueryI.e.,CREATEbeforeINSERTbeforeSELECT

Page 62: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

ModellingWithSQLSQLletsusexpressmodelsatthelogicalto(someofthe)physicallevel

SpecifyingindicesisabitphysicalKnowledgeaboutimplementationmayinformmodellingchoices

SQLhasnomechanismsforconceptuallevel

Page 63: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Format1InSQL

Page 64: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Format1InSQL CREATE TABLE People ( name varchar(255), company varchar(255), address varchar(255), phone varchar(255), email varchar(255), home_page varchar(255));

INSERT INTO People VALUES ('Aleshia Tomkiewicz', 'Alan D Rosenburg Cpa Pc', '14 Taylor St, St. Stephens Ward, Kent CT2 7PP', '01835-703597','[email protected]', 'http://www.alandrosenburgcpapc.co.uk');...

Canwedoallthatwedidinthespreadsheet?

Page 65: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SQLManipulationofFormat1CountrecordsinyourPeopletable:

Searchforitems:

Sortthetable!

SELECT COUNT(*) FROM People

SELECT * FROM PeopleWHERE name like 'Aleshia%'

SELECT * FROM PeopleWHERE name like '%Tomkiewicz'

SELECT * FROM PeopleORDER BY name asc

Page 66: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Format2InSQL

Page 67: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Format2InSQL CREATE TABLE People ( first_name varchar(255), surname varchar(255), company varchar(255), street_address varchar(255), city varchar(255), county varchar(255), post_code varchar(255), phone varchar(255), email varchar(255), home_page varchar(255));

INSERT INTO People VALUES ('Aleshia', 'Tomkiewicz', 'Alan D Rosenburg Cpa Pc', '14 Taylor St', 'St. Stephens Ward', 'Kent', 'CT2 7PP', '01835-703597','[email protected]', 'http://www.alandrosenburgcpapc.co.uk');...

Page 68: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SQLManipulationofFormat2Theoldquerieswork,butwecanimprovethem

Searchforitems:

WecanrecreateFormat1!

SELECT * FROM PeopleWHERE first_name = 'Aleshia'

SELECT * FROM PeopleWHERE surname = 'Tomkiewicz'

SELECT first_name || " " ||surname as name, street_address || ", " ||city ||", "|| county ||" " || post_code as address,phone,email,home_pageFROM People

Page 69: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Format3InSQL

Page 70: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Format3InSQL CREATE TABLE People ( person_id SMALLINT UNSIGNED, first_name varchar(255), surname varchar(255), company varchar(255), street_address varchar(255), city varchar(255), county varchar(255), post_code varchar(255), email varchar(255), home_page varchar(255), CONSTRAINT pk_person PRIMARY KEY (person_id));

CREATE TABLE Phone ( person_id varchar(255), number varchar (255), CONSTRAINT pk_phone_number PRIMARY KEY (number));

INSERT INTO People VALUES ('1','Aleshia', 'Tomkiewicz', 'Alan D Rosenburg Cpa Pc', '14 Taylor St', 'St. Stephens Ward', 'Kent', 'CT2 7PP', '[email protected]', 'http://www.alandrosenburgcpapc.co.uk');INSERT INTO Phone Values ('1', '01835-703597')INSERT INTO Phone Values ('1', '01944-369967')

Page 71: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SQLManipulationofFormat3RecreateFormat1andFormat2:easyFindeveryonewithsamephonenumberCanwehaveunassignedphonenumbers?

Page 72: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Howdidourformatsdo?CoreDM/Datastructure:Tablesseemtowork!SQLandRelationalModel

Wecandoeverything!AllqueriesinallmodelsFormat3has2tables/requiresjoins

Format3Neaterinsertinganddeleting

Canhaveasmanyphonesasyouwant!Everyotherdomainmodelcanbederived

Justwritethequery!

Page 73: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

ExpressivePowerSQLisexpressive

ThecoredatamodelisrichComposingandfilteringtablesdoesalot!Operatorsandfunctionshelpful

Withoutconcat(...),there'dbetrouble!Thelanguageispowerful

ReasonablycomposableLotsoffeaturesExtended&extensibleinmanyimplementations

Interopproblems!

Page 74: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

QueryingWithSQL

Page 75: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SchemasVs.QueriesCREATEstatements

"create"emptytablesoutofnothingatallwithcertainconstraintswithsomeexpectationofpermanence

SELECTstatements"generate"newtables(possiblywithdata)outofexistingtablesaccordingtosomeconstraintswithnoexpectationofpermanence

Page 76: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

ClosedOverTablesSQLis(mostly)closedovertables

MostSQLconstructstake&producetablesClearexception:Functions!

ManipulationismanipulationoftablesNotrows,columns,orcellsdirectlyRows,columns,andcellsare"degeneratetables"...

Page 77: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

FilteringKeyoperationSELECT:ignoringsomeparts

Basically"find"CanfilterrowsorcolumnsorbothRequires"testing"functionsonvalues

Page 78: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

FilteringColumnsaka"Projection",specifiedinSELECTclause

Keepallcolumns:

Justasinglecolumn:

Multiplecolumns:

Renamecolumns:

SELECT * FROM People

SELECT county FROM People

SELECT name, county FROM People

SELECT street_address AS address FROM People

Page 79: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

FilteringrowsSelectingspecifictuplesSpecifiedintheWHEREclauseofyourquery:

Equality:

Range:

Compoundcriteria:

SELECT * FROM People WHERE surname = "Smith"

SELECT * FROM PeopleWHERE heartrate > 95

SELECT * FROM PeopleWHERE heartrate > 95 AND county="Kent"

Page 80: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

BuildingTableswithCrossJoinThefundamentaloperationisCartesianproduct

T1xT2forexamplePeoplexPhone

MakesanewrowforeverypairofrowsfromT1&T2What'sthesizeoftheresult?

Notreallyauser-orientedfeature"Incidentally"crossjoinsaredangerous!

Page 81: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

BuildingTablesWithInnerJoinAninnerjoinisajoinfilteredoncommoncolumns

Usefulforourphonerecords!

Theaboveisspecialcase,called"natural"joincanbewrittenasfollows:

SELECT * FROM People, PhoneINNER JOIN ON People.person_id = Phone.person_id

SELECT * FROM People NATURAL JOIN Phone

Page 82: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

BuildingTableswithOuterJoinAnouterjoinislikeaninnerjoinbutitreturnsalsorowsthatdonothaveamatchintheothertable

leftouterdifferentfromrightouterSELECT * FROM People, PhoneRIGHT OUTER JOIN ON People.person_id = Phone.person_id

willreturnalsopeoplewhohavenophone!

Page 83: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

BuildingAndFilteringOncewe'vebuiltatablewecanfilterthingsweneed:

SELECT * FROM People, PhoneRIGHT OUTER JOIN ON People.person_id = Phone.person_idWHERE People.surname = "Smith"

...youknewthatalready!?

Page 84: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

TheCostAkeyissuewithjoins

WorstcasefortheircomputationisaCROSSEvenifyoudon'tgeneratetheCROSS

Youmighthavetoconsiderallthepairs(Ifyouaren'tcareful)

GoodoptimisersavoidbothConsideringlotsofmatches(thinkindexes)Generatinglargeintermediatetables

Page 86: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

MultiplePhoneColumnsSomepeoplehavenoneoroneOrnoemailorwebpage

Page 87: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

NoSurnameEvenifwenormalisedthataway

Somepeopledon'thaveasurname!

Page 88: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Nullnullisadistinguishedvaluewhichcanmean:

"Valuenotyetknown""Notapplicabletothisentity""Valueundefined"checkout

Keyproperty:Unequaltoeverythingnull = nullisnevertrueMatchonnotnull,ratherthannull

LSQL

Strangevalue!

Page 89: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

OuterJoinsIfyouhavenonullsinyourbasetables

youcan'tgetthemintablesderivedbyinnerjoinHowever,the2phonecolumntableisderivable

WeusetheouterjoinOuterjoinstakeatableT

foreachrowinTextenditwiththe(projected)columnsfromanothertableIfthere'samatch,addthematchedvalues*else,addnulls

SeeLearningSQL forexamplesChapter10

Page 90: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

NullProliferationnullnevermatches

SoiteratedouterjoinsproliferatenullsAsyougetwider,yougetsparser

Ifyouarematchingonasparseattributenullsposechallengeforrelationaltheory

AndsomewhatforpracticeStartsmovingfromthesweetspot

Page 92: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

SQLDrivenWebsitesManywebsitesarebackedbyadatabase

PHPmakesiteasyConsiderWordPressandotherCMSs

LotsofunstructuredcontentStuffinblobsandtextfields

KeypropertiesScalingACID:Atomicity,Consistency,Isolation,Durability

TransactionsConcurrentaccess

Thereisa thatisstillgoodreading,espchps -

keyhistoricaltext11 12

Page 93: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)
Page 94: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

CSV&SQLprogramsontheWeb

Othergovernmentrepositories:data.govdata.gov.uk

Scientificsitesallaboutclinicaltrials!

allaboutproteins!...

UNDatarepository

ClinicalTrials.govUniProt

Page 95: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

GoogleQueryVizLanguageASQLlikelanguage

UsedinGoogleDocsSpreadsheetQUERYfunctiontakesqueriesasargument

Page 96: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

WebSQLTheWhatWGandW3CtriedtostandardizeWebSQL

This specification introduces a set of APIs to manipulate client-sidedatabasesusingSQL.

Localdatabasebackedwebapps

ForofflineuseJustincreasedcapabilities

function prepareDatabase(ready, error) { return openDatabase('documents', '1.0', 'Offline document storage', 5*1024*1024, function (db) { db.changeVersion('', '1.0', function (t) { t.executeSql('CREATE TABLE docids (id, name)'); }, error); });}

Page 97: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)
Page 98: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Whatisthisdata?Arecurringissue:whatisinthisshareddocument?

csvtableJSONsnippet...

Whatdoesitmean?Howtoparse?Howtoshare?Sothatit'sgoodtouse?Self-DescribingandMeaningwillbediscussedatlength

Page 101: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

AnyQuestionsSoFar?

Page 102: Modelling Data On The Web - University of Manchestersyllabus.cs.manchester.ac.uk/pgt/2019/COMP60411/... · 1. a complete data representation and manipulation approach (we do this!)

Labs&CourseworkNext,wegototheLabsYoulookinBBatWeek1coursework:

QuizQ1ShortEssaySE1SmallModellingexerciseM1SomequeryingCW1

Read,think,askus!