Organizing and modelling data
Gert Jan Hofstede, teacher, course Data Management (INF-21306)
Sjoukje Osinga
From Information Technology Group, www.wageningenur.nl/inf

Why manage data?
• The organization that loses its memory loses its life.
• Data to manage are everywhere!
  – Experimental data, model inputs, model outputs…
• ...but can all this be managed?
  – Most of it just grows unmanaged.
  – Some of it is managed with spreadsheets or databases.

Data Management Course topics:
The place of data management
Why manage data?
What is a database?
Database design (week 2)
Advanced SQL (week 3)
Architectures (week 4)
Managing (week 5)
...and some additional topics

The place of data management
• Manage:
  – personnel, finance, equipment, information
• In an organization’s information system you have
  – People
  – Procedures
  – Data sets
  – Software
  – Hardware
[Diagram labels: Project mgmt; Data mgmt (this course); IT mgmt]

Why model data?
• Research:
  – results are hidden in piles of paper
  – data files lack documentation
  – costly or impossible to use existing data
• Management:
  – redundancy leads to errors
  – data structures are stable over time
  – a good design saves programming cost

Database design in research
Have a research question
• Try out;
• Think and rethink;
• Design ‘real’ data model;
• Collect data;
• Query & interpret data
(Write article)

What is a database?
• Theoretically:
  – a coherent collection of data
  – searchable as one whole
  – by many people
• In practice:
  – a collection of related two-dimensional tables
  – rows are “things”
  – columns are “attributes”
  – special software (a “DBMS”) is needed
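
To make "rows are things, columns are attributes" concrete, here is a minimal sketch using Python's built-in `sqlite3` module as the DBMS. The `sample` table, its columns and its values are hypothetical, invented only for illustration.

```python
import sqlite3

# An in-memory database; sqlite3 plays the role of the DBMS here.
conn = sqlite3.connect(":memory:")

# Hypothetical table: each row is one "thing" (a sample),
# each column is one "attribute" of that thing.
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, site TEXT, ph REAL)"
)
conn.executemany(
    "INSERT INTO sample (sample_id, site, ph) VALUES (?, ?, ?)",
    [(1, "north field", 6.5), (2, "south field", 7.1)],
)

cursor = conn.execute("SELECT * FROM sample ORDER BY sample_id")
columns = [d[0] for d in cursor.description]  # the attribute names
rows = cursor.fetchall()                      # one tuple per "thing"

print(columns)
for row in rows:
    print(row)
```

The table can only be read through the DBMS, which is the point of the last bullet: without the software, the file is not directly usable.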

A database table
[Figure: a table with one column and one row marked]

Database: more than tables
The fact that one employee can be another’s boss: a one-to-many relationship
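
The boss relationship can be sketched as a column that refers back to the same table. This is only one possible layout; the table name `employee`, the column names and the three people are assumptions made for the example, not taken from the slide's figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# One boss can have many employees; each employee has at most one boss.
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        boss_id INTEGER REFERENCES employee(emp_id)  -- NULL for the top boss
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cem", 1)],
)

# Join the table with itself: e is the employee, b is the boss.
staff_of_ann = conn.execute("""
    SELECT e.name
    FROM employee AS e
    JOIN employee AS b ON e.boss_id = b.emp_id
    WHERE b.name = 'Ann'
    ORDER BY e.name
""").fetchall()

print(staff_of_ann)
```

The "many" side carries the reference: every employee row stores one boss, so a boss never needs a list of subordinates.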

Two tables: the usual case of one-to-many
The same tables: data structure (metadata)
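
Metadata can be queried like ordinary data. The sketch below assumes two hypothetical tables, `department` and `employee`, in a one-to-many relationship (the slide's own tables are not in the transcript) and asks SQLite for their structure through its `sqlite_master` catalogue and `PRAGMA table_info`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
""")

# Which tables exist? sqlite_master is SQLite's own catalogue table.
tables = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info yields one row per column:
# (cid, name, declared type, notnull, default value, pk)
employee_columns = [
    (r[1], r[2]) for r in conn.execute("PRAGMA table_info(employee)")
]

print(tables)
print(employee_columns)
```

Other DBMSs expose the same information differently; the catalogue names used here are specific to SQLite.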

Databases only work if they...
• are actually stored in the computer (procedures)
• can be accessed (availability)
• can be understood (metadata)
• are the right data to look at (design)
• are properly looked at (query)
• …
(This course)

Problems
• redundancy
• poor control of data, compared to money, machines, personnel
• poor interface
• gap with real-world needs
• no integration
(Any examples known to you?)

Solutions?
• Improving the organization
  – people
  – procedures (including information management)
  – communication
• Improving technology
  – including databases

What is database design?
• Finding out
  – which facts you wish the db to remember
    • about which things (entity types, tables)
    • what data to keep about those things (attributes)
  – how the facts link the data together (relationships)
• Not
  – process, data flow
  – experimental design

Database and world
• “A field has one or more facets”
  – what counts as a field, or facet?
  – who says so?
• Agreeing on definitions is a prerequisite!
  – E.g. Mars Climate Orbiter: imperial vs metric units...
[Photo: rice, Bhutan]

Data modelling exercise
You are in charge of designing a database to find out which teachers give which lectures where and when in your course programme.
This is the main ‘fact type’ you need to store.
Find out which entities are important.
Find key attributes.
Draw an Entity-Relationship diagram to show the structure.

Possible course data model (E-R diagram)
COURSE(course#, dept)
COURSE-INSTANCE(course#, trimester, room#)
ROOM(room#, address)
TEACHER(tname, dept, telno)
LECTURE(course#, trimester, tname, datetime, endtime, title)
Relationship labels in the diagram: occurs as, includes, is scheduled in, delivers
Legend: according to Hofstede
What if several rooms per course-instance?
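
One way to turn this model into tables is sketched below with `sqlite3`. Several things are assumptions, because the transcript does not show them: the primary keys, the column types, the spelling `course_no` and `room_no` for `course#` and `room#`, the reading of which entities each relationship label connects, and all sample values.

```python
import sqlite3

# Keys and types are assumed; the slide text does not mark them.
SCHEMA = """
CREATE TABLE course (
    course_no TEXT PRIMARY KEY,
    dept      TEXT
);
CREATE TABLE room (
    room_no TEXT PRIMARY KEY,
    address TEXT
);
CREATE TABLE teacher (
    tname TEXT PRIMARY KEY,
    dept  TEXT,
    telno TEXT
);
CREATE TABLE course_instance (
    course_no TEXT REFERENCES course(course_no),  -- assumed: "occurs as"
    trimester TEXT,
    room_no   TEXT REFERENCES room(room_no),      -- assumed: "is scheduled in"
    PRIMARY KEY (course_no, trimester)
);
CREATE TABLE lecture (
    course_no TEXT,
    trimester TEXT,
    tname     TEXT REFERENCES teacher(tname),     -- assumed: "delivers"
    datetime  TEXT,
    endtime   TEXT,
    title     TEXT,
    PRIMARY KEY (course_no, trimester, datetime),
    FOREIGN KEY (course_no, trimester)            -- assumed: "includes"
        REFERENCES course_instance(course_no, trimester)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Hypothetical sample data, inserted parents first.
conn.execute("INSERT INTO course VALUES ('X-101', 'DEPT-X')")
conn.execute("INSERT INTO room VALUES ('R1', 'Example Street 1')")
conn.execute("INSERT INTO teacher VALUES ('T. Eacher', 'DEPT-X', '000')")
conn.execute("INSERT INTO course_instance VALUES ('X-101', 'T1', 'R1')")
conn.execute(
    "INSERT INTO lecture VALUES "
    "('X-101', 'T1', 'T. Eacher', '2024-09-02 10:30', '2024-09-02 12:15', 'Intro')"
)

# The exercise's fact type: which teacher gives which lecture, where and when.
timetable = conn.execute("""
    SELECT l.tname, l.title, ci.room_no, l.datetime
    FROM lecture AS l
    JOIN course_instance AS ci
      ON ci.course_no = l.course_no AND ci.trimester = l.trimester
""").fetchall()

print(timetable)
```

In this layout the room belongs to the course instance, so every lecture of an instance shares one room. That is exactly what the slide's closing question probes; moving `room_no` to `lecture` would be one possible answer, at the cost of repeating it per lecture.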

Datatype or data value? (cf p. 184)
E.g. measurement data:
(a) POINT (x, y, date1, a, b, date2, a, b, c)
or
(b) POINT (x, y)
    MEASUREMENT (x, y, date, a, b, c)
or
(c) POINT (x, y)
    MEASUREMENT (x, y, date, type, value)
See: Hofstede (2002) Databases modelleren, bouwen en gebruiken
Where are a, b, and c?
![Page 20: Organizing and modelling data](https://reader030.vdocuments.mx/reader030/viewer/2022020117/56816933550346895de086bc/html5/thumbnails/20.jpg)
20Gert Jan Hofstede - Data Management
Law of no escape from trouble
• There is no escaping from choosing.• Data types (columns) vs data (rows):
design issue!(a) efficient but no measurements can be added(b) measurements can be added but not new ones(c) flexible, extensible but not efficient
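
The trade-off of option (c) can be sketched as follows. The table layout follows the slide; the keys, the sample values and the extra measurement type `d` are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE point (x REAL, y REAL, PRIMARY KEY (x, y));
    CREATE TABLE measurement (
        x REAL, y REAL, date TEXT, type TEXT, value REAL,
        PRIMARY KEY (x, y, date, type),
        FOREIGN KEY (x, y) REFERENCES point(x, y)
    );
""")
conn.execute("INSERT INTO point VALUES (1.0, 2.0)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [
        (1.0, 2.0, "2024-01-01", "a", 10.0),
        (1.0, 2.0, "2024-01-01", "b", 20.0),
        (1.0, 2.0, "2024-02-01", "a", 11.0),
        # A new measurement type is just another row: no ALTER TABLE needed.
        (1.0, 2.0, "2024-02-01", "d", 99.0),
    ],
)

# The price of flexibility: getting back one column per type,
# as in option (b), takes an explicit pivot query.
wide = conn.execute("""
    SELECT date,
           MAX(CASE WHEN type = 'a' THEN value END) AS a,
           MAX(CASE WHEN type = 'b' THEN value END) AS b
    FROM measurement
    WHERE x = 1.0 AND y = 2.0
    GROUP BY date
    ORDER BY date
""").fetchall()

print(wide)
```

Here a, b and c have moved from column names (data types) into values of the `type` column (data), which is one way to read the question "Where are a, b, and c?".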

SQL, Structured Query Language
• One language for all 3 levels of database architecture:
  – regulate user level (grant, revoke)
  – define data structures (create, drop, alter)
  – regulate storage (create index, tablespace…)
  – see data or metadata (select)
Ch 10
[Diagram labels: User program (three times); Data dictionary; storage (twice)]
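
A few of these statements can be tried with `sqlite3`, as sketched below. The `item` table is hypothetical. `grant` and `revoke` are left out because SQLite does not implement them, and tablespaces are likewise not an SQLite concept; a server DBMS would be needed to try those.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define data structures: create, alter.
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, itemname TEXT)")
conn.execute("ALTER TABLE item ADD COLUMN price REAL")

# Regulate storage: an index changes how rows are found, not what they mean.
conn.execute("CREATE INDEX idx_item_itemname ON item (itemname)")

# See metadata: the catalogue is queried with an ordinary select.
catalogue = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
print(catalogue)

# Define data structures: drop. The index disappears with its table.
conn.execute("DROP TABLE item")
objects_left = conn.execute("SELECT count(*) FROM sqlite_master").fetchone()[0]
print(objects_left)
```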

SQL
• Format of a select statement (‘query’):
  select < what you want >
  from < where it is stored >
  [ where < some conditions apply > ];
• e.g. select itemname from qsale;
• also used from within Java, PHP…
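
The same query can be issued from a host language. The sketch below uses Python rather than Java or PHP; the table name `qsale` and column `itemname` come from the slide, while the `quantity` column and all rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qsale (itemname TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO qsale VALUES (?, ?)",
    [("compass", 2), ("tent", 1), ("map", 5)],
)

# select <what you want> from <where it is stored>
all_items = [
    r[0] for r in conn.execute("SELECT itemname FROM qsale ORDER BY itemname")
]

# ... where <some conditions apply>; the ? placeholder keeps the
# program's value separate from the SQL text.
min_quantity = 2
big_sales = [
    r[0]
    for r in conn.execute(
        "SELECT itemname FROM qsale WHERE quantity >= ? ORDER BY itemname",
        (min_quantity,),
    )
]

print(all_items)
print(big_sales)
```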

Other issues / good practices
• Always save structured data also in a raw format that can be ‘read’ without the software. So not only .xls, but also .csv
• This is harder for data in databases – save all tables also as .txt when possible
• When all else fails, you can still import the text version into new structures with new software
• Can another person use and understand your work?
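
Saving all tables as text can be automated. The following is a minimal sketch for an SQLite database, writing one .csv file per table with a header row; the function name `export_tables_to_csv` and the `sample` table are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_tables_to_csv(conn, out_dir):
    """Write every user table in the database to <out_dir>/<table>.csv."""
    out_dir = Path(out_dir)
    names = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    written = []
    for name in names:
        # Table names come from the catalogue itself, so quoting suffices here.
        cursor = conn.execute(f'SELECT * FROM "{name}"')
        path = out_dir / f"{name}.csv"
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow([d[0] for d in cursor.description])  # header row
            writer.writerows(cursor)
        written.append(path)
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (sample_id INTEGER, site TEXT)")
conn.execute("INSERT INTO sample VALUES (1, 'north field')")

export_dir = tempfile.mkdtemp()
files = export_tables_to_csv(conn, export_dir)
print(files[0].read_text(encoding="utf-8"))
```

The text files carry the data but not the keys or relationships, so it helps to keep a description of the structure next to them.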

“A butterfly’s wing can change the world”
When you run a simulation model:
• Simulation software changes
• Hardware changes (e.g. 16->32 bits)
• Especially relevant for random generators
Can you still reproduce your results?
• Short term: always store results + model version
• Longer term: save random seed (and algorithm)
• Forever: impossible.
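
Storing the seed and versions next to each result could look like the sketch below. The toy random-walk model, `MODEL_VERSION` and the record fields are assumptions for illustration; Python's `random.Random` uses the Mersenne Twister generator.

```python
import json
import platform
import random
import sys

MODEL_VERSION = "0.1-hypothetical"  # assumed version label for the toy model


def run_simulation(seed, steps=5):
    """Toy model: a random walk driven by its own seeded generator."""
    rng = random.Random(seed)  # private generator, independent of global state
    position = 0.0
    for _ in range(steps):
        position += rng.gauss(0.0, 1.0)
    return position


def run_and_record(seed):
    """Return the result together with what is needed to repeat the run."""
    return {
        "result": run_simulation(seed),
        "seed": seed,
        "model_version": MODEL_VERSION,
        "rng_algorithm": "Mersenne Twister (Python random.Random)",
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


record = run_and_record(seed=12345)
print(json.dumps(record, indent=2))

# Re-running with the stored seed on the same software gives the same result.
rerun = run_simulation(record["seed"])
print(rerun == record["result"])
```

This covers the short and longer term only: a stored seed reproduces a run as long as the generator algorithm and its implementation stay the same, which is why the record also names them.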

“Communicate with the future”!
(EU-funded project called ‘Shaman’)