relational data modelling i: modelling processes and the ... · relational data modelling i:...

13
WEEK 3: RELATIONAL DATA MODELLING I Relational Data Modelling I: modelling processes and the language of sets. In this week, we will cover the following topics: What modelling is, and specifically what data modelling is. What data modelling involves in real-world situations (modelling cycle) Some visual design conventions used in relational database design. The language of sets … and will result in the following learning outcomes: An understanding of the general modelling cycle and how database designers might liaise with business clients. An appreciation for the fact that good design needs an good appreciation of the data domain, and the data in the domain. An understanding that of the mathematical language of sets. An appreciation that understanding this language will help inform database queries.

Upload: others

Post on 30-Apr-2020

26 views

Category:

Documents


0 download

TRANSCRIPT

WEEK3:RELATIONALDATAMODELLINGI

RelationalDataModellingI:modellingprocessesandthelanguageofsets.Inthisweek,wewillcoverthefollowingtopics:

• Whatmodellingis,andspecificallywhatdatamodellingis.• Whatdatamodellinginvolvesinreal-worldsituations(modellingcycle)• Somevisualdesignconventionsusedinrelationaldatabasedesign.• Thelanguageofsets

…andwillresultinthefollowinglearningoutcomes:

• Anunderstandingofthegeneralmodellingcycleandhowdatabasedesignersmightliaisewithbusinessclients.

• Anappreciationforthefactthatgooddesignneedsangoodappreciationofthedatadomain,andthedatainthedomain.

• Anunderstandingthatofthemathematicallanguageofsets.• Anappreciationthatunderstandingthislanguagewillhelpinformdatabasequeries.

2

WEEK3:RELATIONALDATAMODELLINGI

TableofContents3.1 Modelsandmodellingprocesses...................................................................................3

3.1.1 Whatismodelling?...........................................................................................................33.1.2 Whatisdata/databasemodelling?.................................................................................3

3.2 Theprocessofdatabasedesign/modelling.....................................................................43.2.1 Whatisdatabasedevelopment........................................................................................43.2.2 3keysteps:processesofdatamodeldevelopment.........................................................53.2.3 Introductiontosomevisualconventions.........................................................................6

3.3 Thelanguageofsets......................................................................................................83.3.1 Sets,visually......................................................................................................................83.3.2 Sets:Union,difference,intersection..............................................................................113.3.3 Thesetlanguagerelatestothequerylanguage.............................................................123.3.4 Relationalalgebra...........................................................................................................12

3.4 Summary.....................................................................................................................13Figure1:Databasemodelling...................................................................................................5Figure2:Relatingentitiesgraphically......................................................................................7Figure3:Creatingaset.............................................................................................................9Figure4:Subsetandunion.......................................................................................................9Figure5:Subsetanddifference...............................................................................................9Figure6:Intersectionoftwosets...........................................................................................11Figure7:SetsasVenndiagrams.............................................................................................11Table1:SQLqueryasVenndiagram......................................................................................12

3

WEEK3:RELATIONALDATAMODELLINGI

3.1 ModelsandmodellingprocessesThesamekindsofwordscanbeusedindifferentcontextsand,consequently,takeonslightlydifferentmeaning.Therefore,itisimportanttodistinguishwhatwemeanbymodellinganddistinguish the meaning we are using in the context of databases. We will do this bycontrastingwithaslightlydifferent,butcommonly-used,notionofmodelling.3.1.1 Whatismodelling?Generally,amodelisarepresentationofsomethinginthereal-world,designedtocapturetheessentialaspectsofthat ‘thing’.Ataconcrete levelwemight, forexample,considerachildplayingwithamodel/toyboatbecausetheyhavedevelopedaninterestedinrealboats.Similarly,amarinearchitectmightbuildmorerealisticfeaturesofboatsintoamoreadvancedmodel(physicallybuilt,orcodedintoacomputer)ofarealvesseland,furthermore,mighttestthedesignaccordingtoconditionsofasimulatedenvironment(physicalwavemachines,computersimulationsofseaconditions).Inacademia,equationalmodelsareoftendesignedtocapturefeaturesofthephysicalworld,intheformofequations.Equationsthencanbeusedtoanalysethebehaviouroftherealsystem,analogouslytohowaphysicalmodelofaboatmightbeusedtounderstand‘boat’behaviourwhen,scaled-upinsize,builttocrosstheocean.Equationscancapturephysicalsystems of varying abstraction. For example, Newton’s laws of gravity describe the forceinteractions between objects that have mass, whereas more concrete models might beconcernedwiththeXXXXX.Forallsuchmodelsausefulruleofthumb,attributedtoAlbertEinstein,isthatthemodelshouldbesimple,butnottoosimple:“Everythingshouldbemadeas simple as possible, but not simpler”. The important aspect of modelling identified byEinsteinisthatthereislittlepointinhavingamodelthatisverysimple,ifitdoesnotproperlydescribethereal-worldsituation.Ontheotherhand,amodelthatisoverlycomplicatedwillnotservetoorganiseourunderstanding,either.Abalancemustbestruck.3.1.2 Whatisdata/databasemodelling?Asthetitleofthisweeksuggests,weareinterestedinrelationaldatamodelling,specifically,extendingourlessformalintroductiontothisarea,coveredinthepreviousweek(week2).Thereasonwehavebrieflyconsideredtheabovemeaningofmodelling,commonlyusedinacademiaandscience,istodistinguishitfromthespecificmeaningofmodellingthatconcernsusinthiscourse.AsRitchiestates:“Inthestudyofdatabases,ourinterestisindatamodelling.Theroleofdatamodellingistoprovidetechniquesthatallowustorepresent,bygraphicalandotherformalmethods,thenatureofdatainreal-worldcomputerapplications.Theprincipaljustificationfortheuseofdatamodellingistoprovideaclearerunderstandingoftheunderlyingnatureofinformationandtheprocessingofinformation.”

Ritchie(2002,p19).ReadthelastsentenceoftheRitchiequoteagain:“Theprincipaljustificationfortheuseofdatamodellingistoprovideaclearerunderstandingoftheunderlyingnatureofinformationandtheprocessingofinformation”.Youcouldjustaseasilysubstitutetheword‘information’for ‘data’, and we probably should: in other areas of computer science and computer

4

WEEK3:RELATIONALDATAMODELLINGI

engineering,‘information’meanssomethingmoredistinctiveandrelatestotheamountofsurpriseintheflowofdata(asbinarydigits)throughcables,whichiscapturedbyShannon’slaw,ahighlypragmaticandimportantequationinthesubjectareaofcomputernetworks.Nevertheless,youshouldinferfromRitchie’squotetheemphasisonformalityandduringthisweekwewillbecomemoreformalinthewaywestarttodescribe.Hence,westarttodevelopa languageof relationaldatamodelling.Fortunately, relationaldatabasemodelsarequitestraightforwardtounderstand.3.2 Theprocessofdatabasedesign/modellingBeforewebegintoproperlylearnthelanguage–thetools,(graphicalandother)descriptions,concepts,methodsetc.–ofrelationaldatamodelling,wewillintroducethepracticalstepsthattypicallytakeplaceduringtheearly-stageprocessofdatabasedevelopment.Ofcourse,the actual steps that take place will vary according to the specific application domain.Nevertheless,evencaricaturingthisprocesssomewhatwillgiveusafeelforthisimportantaspectofdatabasedevelopment.RememberthetitleofthiscourseisDatabaseDevelopmentand,while it is importantto learntheappropriatetechnical languages,wealwaysneedtoremembertheirpracticalcontextandpurpose.3.2.1 WhatisdatabasedevelopmentAswithmanyaspectsofsoftwaredevelopment,anabilitytodevelopusefulsoftwaredependson the developer requiring a knowledge of the problem domain, and in databasedevelopmentthismeansdevelopingofviewofthedata,basedonwhatthedatabasewillbeused forand thebestway inwhichdata canbeaccessedand storedefficiently for thesepurposes.Thedomain, i.e., thetargetforthedatamodellingexercise,ofcourse,containsthetargetdata.Aswehavediscussedpreviously,thisdatamightbestoredasafile-basedsystem.Itisimportant,forthedeveloper,tounderstandhowthedataisunderstoodfromthepointofviewofthecurrentuserofthatdata,inthecontextoftheclientbusiness,forexample.Thedatawillbeusedwithinthiscontextinacertainwayand,consequently,theuserwillhavedevelopedaviewofthedata–thiswillincludehowthedataisrelated,stored,used,updatedetc. In order that a useful database representation of the data can be developed, it isimportantduring the initial stages,aswith softwaredevelopmentmoregenerally, for thedeveloper(inthiscasethedatabasedeveloper,ordatabasedevelopmentteam)togetagoodunderstanding of how the clientunderstands and uses the data, aswell as developing aknowledgeofthecurrentstorageandrepresentationofthedataitself.Inordertodevelopthisunderstanding,questionsshouldbeaskedtofacilitatethisand,againaswithsoftwaredevelopmentgenerally,clientinteractionswillbequitefrequentduringtheinitialstages.Whenthedatabaseteambegintostarttoimplementthedatabaseitisagoodideathatthedatamodellingiscomplete.Thatis,forrelationaldatabasedesigndevelopmentisdoneinatop-downfashion.Thephrasetop-down,here,referstotheideathatthoroughcomesfirst,beforethedatabaseisimplemented.Inrecenttimes,theideaofagilesoftwareapplicationdevelopment has taken hold, which emphasizes a modern approach to development, incontrasttothemoretraditional,top-downapproachestosoftwareengineering.However,an

5

WEEK3:RELATIONALDATAMODELLINGI

agilemind-set innotbestsuitedtorelationaldatabasedevelopment.This isaviewiswellexpressedbyAllardice(2015):“…planning is vital [for relational database development]. Whereas in other areas ofdevelopmentsuchasagileincrementalapproaches(buildsomethingquickly,getitoutthere,reviseit,addnewfeatures[quickly]overweeksorevendays),relationaldatabasesrewardup-frontplanningbecausetheentirepointistoimposerules,constraintsandstructureonthedataandwedon’twanttohaveonesetofrulesforoneweekandanothersetofrulesforanotherweek.”

Allardice(2015)

Figure1:Databasemodelling

Therefore,thedatabasemodellingprocessitselfisabouthowwegetfromtheuser-definedviewofthedata,howthedatagetsprocessedandupdatedetc.,totheassociated,databaseviewofthedata3.2.2 3keysteps:processesofdatamodeldevelopmentFigure1presentsaschematic,whichhelpsvisualizetheintendedprocessofmovingfromtheapplicationdomaintothedesignofthedatabaseitself.Notewell,therearethreemainstepsduring the database development process. These stages all concern the process ofidentificationandspecificationofentitiesandrelationshipsbetweenentitiesfromthedata.Eachlevel,from1to3,isintendedasaphaseofidentification,eachstepbeingfollowedbyanotherone,whichaddsdetailsandspecification,suchthedatamodelbecomesmoreandmoreexplicitovertime.

1. Conceptual data modelling (least detailed): this is to identify the salient entitieswithinthedataandthemostimportantrelationshipsbetweenthem.Giventhattwoentitieshavebeenidentified(say,Customer,andProduct),forexample,weknowthatthesetwoentitiesarerelatedinsomeway.Atthisstageitistypicaltoremainquitehigh-levelinourdescriptionsfromthepointofviewthatwedonotneedtodisclose

Understanding

User Domain

User View

1 - Conceptual View

2 - Logical View

3 - Physical View

Data

Business

Context

File System

Data ?

Data ?

Data ?

Data ?

Data Modelling Process

Iteration

Itera

tion

Data ?

?

?

?

?

?

?

DATABASE

DOMAIN OF DATA MODELLING

?

?

Increasing explicit data model

6

WEEK3:RELATIONALDATAMODELLINGI

all of the detail. For example, we only really need to know that Customers andProductsareentities;thereisnoneedtogofurtherbybeingexplicitaboutCustomerandProductattributes–e.g.,‘CustomerNumber’,‘FirstName’,‘Surname’,‘Address’,etc.,orforproduct‘ProductNumber’,‘ProductName’,‘Category’.

2. Logicaldatamodelling(moredetail):progressingfromthepreviousstage,westart

toidentifyalloftheentitiesinthedatamodel,andalloftherelationships.Foreachentitywespecifyalloftheattributesandidentifytheprimarykeyinthedata,alongwith the identification of foreign keys, which are used to define the relationshipsbetweenentities.

3. Physicaldatamodelling(fullydetailed):atthisstageacomprehensivemodelofthe

data is inplace, in the formofa logical,entity relationshipdiagram.However, thisadditionalstepmightincludealterationstothelogicalmodel.Thisdiagramisspecifiedintabularform,includingalloftheprimarykeys,relationships(foreignkeys)etc.

Steps2and3alsoinvolvesomethingknownasnormalization,andpossiblydenormalisation.However,wedonotcoverordefinetheseaspectshere–wewillreturntothistopicafutureweek.Again,itisimportantweemphasisethatthethree-stepprocessofdatamodellingisoneofdesignonly,noimplementation.3.2.3 IntroductiontosomevisualconventionsAsinmanyareasofsoftwareengineering,theoutputofthedesignprocess–thedesignitself–canbecommunicatedtoothersinnaturallanguage(i.e.,text),butadiagramcansometimesbeworth a thousandwords. Therefore, just likewemight useUML to communicate thedesignofanapplicationprogram,databasedesigncomeswithitsowndiagrammatictools.Webrieflylookatcommonlyuseddiagramshereandasimpleexample.In Figure 2, we present some common graphical components typically used to constructentity-relationshipdiagrams.Entitiesaresquare-boxed,Attributesarerepresentedbyovalsandactions,whichrelateentities,arerepresentedbydiamonds.Figure2alsocontainstypicallinetypeusedtodenotethemappingbetweenentities–mappingscanbeofvariouskinds(one-to-one,one-to-many,one-to-zero-or-many,one-to-one-or-many,many-to-many).Intheexample,wehavethreeentities(Customer,ProductandOrder).Asingular,concretesituationcanbeimaginedasfollows:acustomer isbrowsinganon-lineshoppingweb-sitetheyhavealreadycreatedanaccountwith.He/sheclicksonaproduct,likesthelookofitandconsequentlyorders the product. In a similar vein, but out of sight of the customer, thecustomerconsequentlygeneratesanorder–i.e.,astoredsetofinformation,thatcontainstheproduct,whichisusedintheendbythecompanytomakesuretheproductarrivestothecustomer. Looking at the examplewe can see that these various actions relate the givenentities.However, the purpose of the diagram in Figure 2 is not to merely represent any singlesituation,specifictoagivencustomerorderingaproduct.Thepurposeistorepresentallsuchsituations.Forexample,takethreeseparatesituations(SituationA,SituationBandSituationC),whicharealldifferent.SituationAistheonewejustdescribed:asinglecustomerordering

7

WEEK3:RELATIONALDATAMODELLINGI

a single item, resulting in the generation of a single order that contains the product.Alternatively,insituationB,wecanhaveacustomerwhohasregisteredanaccountbutwhohasplacednoordersatall.SituationCisdifferentagain–customerChasvisitedthesitemanytimes, on occasions ordering a varying number of products at any one time, afteraccumulatingthemintheirbasket,thentriggeringoneorderoncompletionfornumerousproducts.WethereforerequiretherelationshipbetweenCustomers,ProductsandOrdertobewelldefined,meaningthat thedefinitionsof therelationshipsneedtorepresentreal-worldusecases.TheexampletriadicrelationshipbetweenCustomer,ProductandOrderisdefinedquitespecificallyinthediagram.Inadditiontotheactionsthatconnecttheentities,theconnectionsspecifywell-definedconstrains,asfollows:

• ACustomercanorderoneormanyproductsatonetime.• ACustomercangenerateoneOrderwhenpurchasingtheirproducts,butovertime

theymightaccumulatemanyofthem.• AProducttypecanbeordernever,once,ormanytimesfromanygivenCustomer• AnOrdercancontainoneormanyProducts.• AnOrdermightneverbegeneratedbyacustomer(someonewhoonlyregisteredthe

accountanddidnotbuyanything),mightbegeneratedonce(ormanytimes)byothercustomers.

Technicaldiagramsareusefultopeoplewholearnthem.Althoughittakestimetolearntheirmeaning,intheendthishardworkwillpayoff.Thisisbecausediagramsthenspeaktothedeveloperaboutthedesign,andinquiteanefficientway(comparetheamountoftextwrittenaboutthediagram,above,withthesimplicityofthediagramitself).Overtimeyouwilllearnto read diagrams quickly, and create them such they you can communicate your owncreationstoothersasefficiently.

Figure2:Relatingentitiesgraphically

Entity

Attribute

Action

Entity: the ‘thing’ in question. Often a tangible object in the real world (Person, Product), but see text for comments.

Attribute: An attribute ‘belongs’ to an entity and together and the entity will typically have several attributes; together the constitute the entity.

One to one: The straight line is used to denote the fact that an entity is ‘defined by’ the attribute.

Action: used to represent actions between entities.

One to many: Where one entity relates to many instances of another entity.

One to zero or many: Where one entity may relate to zero or many of instances of another entity.

Many to many: Where one entity may relate to many instances of another entity, which may also relate to many instances of the entity.

Customer

Generates

Product

Order

Orders

One to one or many: Where one entity may relate to many instances of another entity, which may also relate to many instances of the entity.

Contains

Example

8

WEEK3:RELATIONALDATAMODELLINGI

3.3 ThelanguageofsetsManytextbookshaveatleastsomereferencetothenotionofsets.Forthebeginner,thesesectionscanquiteoftenlookintimidating,giventhestrangelookingnotionused,e.g.:

𝑚 ∈ 𝐴 ⊆ 𝑊or:

𝑆⋂𝑇 = {𝑎, 𝑏}However,thisstrange-lookingnotation,likemostmathematics,ismerelyaformofshorthandthatonlylooksstrange,butinfactfairlyeasytounderstand.Indeed,manyconceptsrelatedto set theory are so simple that they can actually be illustrated visually, which we willdemonstratesoon.Thereasonitisworthwhilereviewingthelanguageofsets,apartfromtoputtheintimidationfactorinitsrightfulplace,isbecauseset-basedoperatorsunderpinwhatisknownasrelationalalgebra,thetheoreticalfoundationofrelationaldatabasesandquerylanguages.Sets thereforeprovidetheaxiomaticsubstrate formanyof thepractical thingsthatyouwilldoinrelationaldatabasedevelopment.Asetis“acollection,possiblyinfinite,ofdistinctnumbers,objectsetc.,thatistreatedasanentity in itsownright,andwith identitydependentonlyuponitsmembers” (BorowskiandBorwein,1989).Themembersofasetthushaveincommonthattheyaremembersoftheset,andforsomereason:KenyaandMauritiusareinsidethesetAfricanCountries;ScotlandandWalesareinsidethesetUnitedKingdom;theiPhone,iMacandiPodareinsidethesetAppleProductsetc.IntheseexamplesKenya,Mauritius,Scotland,Wales,iMac,iPhone,iPodetc.aresaidtobemembersorelementsoftheset.Membersofsetsaretypicallydenotedwith lower-case letterswhereas sets themselvesaredenotedwithupper-case letters. Forexample:todenoteMauritiuswemightchoosetheletter‘m’;todenotethesetofcountriesAfricawemightchoosetheletter‘A’,andtodenoteallofthecountriesintheworldwemightchoosetheletter‘W’.ItisclearlythecasethatAfrica,A,isasetofcountriesthatis‘containedwithin’thesetofcountriesoftheworld,W,andthatMauritius,m,isinAfrica.Therefore,themathematicalstatement:

𝑚 ∈ 𝐴 ⊆ 𝑊says“MauritiusisinAfrica,whichisasubsetofallthecountriesintheworld”.Hence,thereisnothingmysteriousaboutthemathematicalform,onceyouknowwhatthesymbolsmean.3.3.1 Sets,visuallyLetusconsiderasetofobjectstobeginwith:abicycle,amotorbike,acar,abus,atruck,aboat,asailingboat,anairplane.Wefollowtheideathatasetofobjectscanbethoughtofasbeingcontainedtogetherinsidetheset,whichisvisuallyrenderedasasee-throughcontainer(Stewart,1995),suchthatweknowwhichobjectsarethere.Additionally,wedenotetheactofplacingtheobjects inthecontainerwiththemappingsymbol(↦). Inthisway,Figure3definestheactofcreatingasetofvehicles.

9

WEEK3:RELATIONALDATAMODELLINGI

Figure3:Creatingaset

Althoughwehavedecidedtoplacealloftheobjectsintoasingleset,weareatthesametimeabletorecognizethateachindividualmemberhasattributes.Forexample,wemightnoticethatthebicycleistheonlyvehiclewhoseengineisalsothedriver,whichmakesthebicyclequiteunique,althoughitsharestheattributeofhavingwheelswiththemotorbike,motorcar,busandtruck.Wecanrendertheseobservationsvisually,asinFigure4.Wesimplyplacethebicycleinsidea ‘bag’ (left-hand picture), inside the see-through container, and we do likewise (middlepicture)withthevehiclesnotpropelledbyahuman‘engine’,butwhichdohavewheels,whichofcoursecanbeusedtorollalongroads.Sofar,then,wehaveidentifiedthesetwosets,andwecanseethattheyaresubsetsofthesetofvehicles.Thelargersubsetofthesetofvehicles,i.e.,thevehicleswithwheels(right-handpicture)isatypeofunificationofthesetwosets,knownastheunion.

Figure4:Subsetandunion

Definitions:• Propersubset:asetthatcontainsmembersthatarealsomembersofalargerset,

atthesametimeexcludingothermembersofthelargerset.Sub-setsare,then,bydefinition,alsoaset.

• Union:thesetofelementsthatbelongstoeitherofagivenpairofsets.

Figure5:Subsetanddifference.

bicycle truck

motorbike

aeroplane

motorcar

bus

motorboat

sailboat 7!set of vehicles

set of vehicles set of vehicles set of vehicles

7!A B C D

10

WEEK3:RELATIONALDATAMODELLINGI

Inadifferentway,ratherthannoticingaunificationoftwoexistingsets,wemightstartwitha subset, say all of the vehicles that use energy derived from fossil fuels (i.e., airplane,motorbike,motorcar,truck,bus,boat)andwantto‘getat’membersofthisset,butonlythosethataredesignedtobepropelledonaroadsurface.ThissituationispresentedinFigure5.Weplacethevehiclesthatrunonfossilfuelsinsideabag(left-handpicture‘A’)andwedolikewise forallof thevehicleswithwheels (left-handpicture ‘B’).Thenwetakethesetofvehicleswithwheelsthatdonotusefossilfuels,thebicycle,identifiedpreviously(left-handpicture‘C’).Fromthesesetswecanderiveset‘D’(right-handpicture).Definition:

• Difference:thesetofmembersofthefirstsetthatarenotmembersofthesecond.Withthisdefinition,westartwithB(setofwheeledvehicles)andwe‘minus’setAtogetridofallthevehiclesthatarenotsuitablefortheroad.Whenwehavecompletedthisoperation,weminussetCfromtheresult,thebicycleinthiscase,whichispropelledbyahuman,nottheburningoffossilfuel,leavingset‘roadvehiclesthatrunonfossilfuel’,i.e.,setD:

𝐵 − 𝐴 − 𝐶 = 𝐷Anotherimportantaspectofsetoperationiswherewehavetwoseparatesets,andwewanttoknowifthetwosetshaveanycommonmembers.Theintersectionprovidestheanswertothis.Definition:

• Intersection:thesetofmembersofthefirstsetthatoverlapwiththesecond,ormoregenerally,thesetofelementsthataremembersoftwoormoregivensets.

AnexampleintersectionispresentedinFigure6.Wehavetwoseparatesets,thistimedrawninsidetwosquares.There isthesetofvehiclesthatpollute theatmosphere,becausetheyburn fossil fuels in the process of acceleration, and the set of 2-wheeled vehicles. Theintersectionisasinglememberset,themotorbike,whichhastwowheelsandpollutes.IfwedenotevehiclesthatpollutewithaletterP,theset2-wheeledvehicleswithletterQ,andthesinglemotorbikeasR,thenthefollowingsymbolicrepresentationdescribesthesamepicture:

𝑅 = 𝑃 ∩ 𝑄

11

WEEK3:RELATIONALDATAMODELLINGI

Figure6:Intersectionoftwosets.

3.3.2 Sets:Union,difference,intersectionTheunion,differenceand intersectionofsets formthree importantoperators in termsofrelationaldatabases.Upuntilnowwehavebeenvisualisingtheset,andoperationsonsets,astwo-dimensionally.AnotherwaytodothisisbyusingVenndiagrams.Venndiagrams–illustratedinFigure7–canbeusedtogenerallyrepresentwhatisinsideagivenset,irrespectiveofwhatthismightbe.Asetisrepresentedbyacircleandwheretherearemembersofsetsthatarethesame,thisisrepresentedastheoverlappingareasofthecircles.Wethushavethreesets(A,BandC).Thesixexampleoperationsshowdifferentwaysofidentifyingareasofthesets,individually,orasacombination.Theshadedareascorrespondtheresultoftheoperationsdefinedunderneaththepictures.Here,weonlyhavethreesets,butthepowerofthesettheoryisthatoperationscanbeappliedtoanynumberofsets,eventhoughthisisdifficulttothenvisualize‘onpaper’,sotospeak.Thedifferentoperatorsshownherecorrespondtotheoperationdescribedabove–i.e.∪,∩,−aretheunion,intersectionanddifferenceoperators,respectively.

Figure7:SetsasVenndiagrams.

Vehicles that pollute

2-wheeled vehicles

2-wheeled vehicles that pollute

A

B C

A [B

A \ C

(A [B) \ C

B \ C A�B

(B �A)� C

12

WEEK3:RELATIONALDATAMODELLINGI

3.3.3 ThesetlanguagerelatestothequerylanguageYoumight have been wondering what all of this has to do with databases?Well, it haseverythingtodowiththem.Wehavepreviouslydiscussedhowadatabaseisnotjustawayofstoringafilesysteminmemory,butisa“…sharedcollectionoflogicallyrelateddataandits description, deigned tomeet the information needs of an organisation” (Conolly Begg1.31).Ourcoverageofbasicsetexplanationsservestohighlightthatsetsarealsologicallyrelatedandthislogiccanbeexploredandexploitedintheprocessofdatamodelling,suchthatwecanaccessthedatausing,essentially,thelanguageofsetoperators.Tip: it can be very useful to start thinking about sets, and set operations, because theyunderpin much of the syntax in the statements of query languages (the programminglanguagesusedtoprocess,accessandmanipulatedatafromdatabases).WewillintroduceaquerylanguageknownasSQLinmoredetaillaterinthecourse,butfornow,justtodemonstratetherelevanceofthelanguageofsetswepresentTable1,whichindicatesthatthetypicalstructureofanSQLcommandistoSELECTasubsetofdataFROMadatabasetables(ortables),inaccordancewithCONDITIONSthatyouspecify.Indeed,SELECTandFROMareSQLkeywords; conditionsareoftenusually specifiedwithnumerousotherkeywords(e.g.,WHERE,IFetc.etc.).YouareencouragedtogoandresearchspomesimpleSQLstatementswiththisbasicstructureinmind;itwillhelpyoustarttounderstandingthelogicofdataselectionandhowdatabasescanbe‘asked’togiveuswhatwewant.TypicalstructureofSQLqueries:SELECTthesubsetofdata.FROMadatabasetable.CONDITIONSspecifyingthesubsetbasedonconditionsyouprovide.

Table1:SQLqueryasVenndiagram

3.3.4 RelationalalgebraRelationalalgebraconsistsofanumberofoperationsthatwillbeusefulgoingforwards,andsowelistthemhere.

• Difference:• Intersection:• Join:• Product:• Project:• Restrict:

6

32

4

5

1

7 8

9

9

Red product

Green product

13

WEEK3:RELATIONALDATAMODELLINGI

• Union:Youareencouragedtogoandlearnwhatthesetermsmean.TheRitchiebookisagoodplacetostart.3.4 Summary

• ‘Modelling’ is a widely label, in everyday life, but also heavily used within thecomputationalsciences.

• Wethusexploredthemeaningofmodelsandthemodellingprocess.Thisallowedus

to specifically define and discuss ‘datamodelling’ and ‘databasemodelling’ in thecontextofrelationaldatabases.

• Database development as a sequence of iterative steps involving conceptual

modelling,logicaldatamodelling,followedbyphysicaldatamodelling.

• We introduced some standard visualdiagramming techniques related to relationaldatabasemodelling.

• We then introduced the language of set theory, which is the mathematical

underpinningofthequerylanguageofrationaldatabases,knownasSQL.

~~~~~