fromexcelto database#...jun 21, 2017 · somerules! single copy for valued data " valued...
Bo Yao 06/2017
From Excel To Database
Outline
• Excel vs. Database
• Fundamental Knowledge of Database
• Database Design Strategies
• Database Services Provided By BICF
– Project Examples
Data Record Issues
• Most experimental data and results are recorded in Excel files
How to understand the variables and inputs recorded by other people?
How to clean, select, or combine data from several Excel files?
How to safely transfer data from a departing person to a new hire?
How to avoid typos and mismatches in Excel files?
How to control data access permissions and data usage?
………
Reasons for Issues
• How to understand the variables and inputs recorded by other people?
– No codebook or dictionary
• How to clean, select, or combine data from several Excel files?
– Weak search function in Excel
• How to safely transfer data from a departing person to a new hire?
– No centralized data management. No code record standards
• How to avoid typos and mismatches in Excel files?
– No self-check or validation when inputting data
• How to control data access permissions and data usage?
– Weak data access control functions in Excel
Excel vs. Database

| | Excel File | Online Record | Online Advantage |
| --- | --- | --- | --- |
| Access Location | Local machine | Internet | Easier to share with collaborators |
| Data Source | Multiple copies on different machines | Single data source | Easier data version control and maintenance |
| Data Input | Slow, with a risk of wrong input | Quick and standard input | 1) Validates user's input 2) Allows batch input |
| Access Permission Control | Weak | Strong | Contains multiple access protections |
| View Change History | None | Possible | Clinical information change history is recorded |
| Unexpected Information Deletion | None | Can be recovered | Deleted clinical information can be recovered within a short time |
| Data Backup | None | Periodic data backup | Avoids data loss |
| Data Summary | Weak | Strong | Quickly generates summary graphs and records |
Suggestions
• Excel
– Quick
– Flexible
– Personal
– Small projects
– Temp / Short-term
• Database
– Design before usage
– Standard
– Team work or shared
– Large projects
– Long-term
About Database
• Many Database Systems
– Flat Database: .txt, .ini, Registry, Excel, XML
– Relational Database: Oracle, SQL Server, MySQL
– Key-Value Database: Redis, Tokyo Cabinet, Flare
– Document-Oriented Database: MongoDB, CouchDB
– Distributed Database: Cassandra, Voldemort
Learn Relational Database
• A relational database (RDB) is a collective set of multiple data sets organized by tables, records and columns. RDBs establish a well-defined relationship between database tables. Tables communicate and share information, which facilitates data searchability, organization and reporting. (https://www.techopedia.com/definition/1234/relational-database-rdb)
• Top Questions
– How to assign variables to tables?
– How to set up constraints between these tables?
– How to speed up search queries?
Example
School Management System Database: MySQL
Fundamental Knowledge – Database Components
[Diagram: a Database contains several tables (Table 1, Table 2, Table 3, Table 4); each table contains variables.]
Fundamental Knowledge – Variables
• Name
• Type
– String: varchar, text…
– Number: int, float, decimal…
– Date: date, datetime, timestamp
– Blob
• Default value
• Is Null
• Is Auto Increment
• Is Key
– Primary key
– Foreign key
– Unique key
• Charset
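The attributes listed above can be seen together in one table definition. The following is only a minimal sketch: the table `VariableDemo`, its column names, and the `utf8mb4` charset are hypothetical choices made for illustration and do not come from the slides.

```sql
-- Hypothetical table; every name here is illustrative only.
CREATE TABLE VariableDemo (
  ID         int unsigned  NOT NULL AUTO_INCREMENT,  -- number, auto increment
  ItemName   varchar(100)  NOT NULL,                 -- short string, NULL not allowed
  Notes      text,                                   -- long string, NULL allowed
  Score      decimal(5,2)  DEFAULT 0.00,             -- exact number with a default value
  VisitDate  date          DEFAULT NULL,             -- calendar date
  UpdatedAt  timestamp     NOT NULL DEFAULT CURRENT_TIMESTAMP,
  Attachment blob,                                   -- binary data
  PRIMARY KEY (ID),                                  -- primary key
  UNIQUE KEY ItemName (ItemName)                     -- unique key
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;             -- charset (assumed, not from the slides)
```

A foreign key is not shown here because it needs a second table to reference.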
Fundamental Knowledge – Keys
• Keys provide data self-checks (self-constraints)
– Primary Key: identifier for a row; unique in the table; automatically indexed
– Foreign Key: value is limited to the list of values of a variable in another table
– Unique Key: unique in the table
Examples – Key
Employee Table

| PersonID | Name | Birthday | SSN | Department | JobTitle | … |
| --- | --- | --- | --- | --- | --- | --- |
| Varchar(10) | Varchar(255) | Date | Varchar(20) | Varchar(255) | Varchar(255) | … |
| 155556 | Eric Yao | 12/12/2010 | 111-11-1111 | Clinical Sciences | Web Developer I | … |
| 155557 | Tiger Yao | 11/11/2011 | 222-22-2222 | Bioinformatics | Postdoc | … |

Primary Key: PersonID
Unique Key: SSN
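The Employee table above could be declared roughly as follows. This is a sketch under assumptions: the slide only labels a primary key and a unique key, so treating `PersonID` as the primary key and `SSN` as the unique key, and the `NOT NULL` / `DEFAULT NULL` choices, are guesses made for illustration.

```sql
-- Assumed mapping: PersonID = primary key, SSN = unique key.
CREATE TABLE Employee (
  PersonID   varchar(10)  NOT NULL,
  Name       varchar(255) NOT NULL,
  Birthday   date         DEFAULT NULL,
  SSN        varchar(20)  NOT NULL,
  Department varchar(255) DEFAULT NULL,
  JobTitle   varchar(255) DEFAULT NULL,
  PRIMARY KEY (PersonID),
  UNIQUE KEY SSN (SSN)
) ENGINE=InnoDB;

-- MySQL stores dates as YYYY-MM-DD.
INSERT INTO Employee (PersonID, Name, Birthday, SSN, Department, JobTitle)
VALUES ('155556', 'Eric Yao', '2010-12-12', '111-11-1111',
        'Clinical Sciences', 'Web Developer I');

-- The statement below is left commented out because it is expected to be
-- rejected: it reuses an SSN that already exists in the table.
-- INSERT INTO Employee (PersonID, Name, SSN)
-- VALUES ('155557', 'Tiger Yao', '111-11-1111');
```

With these keys in place, the database itself refuses a second row with the same `PersonID` or the same `SSN`, which is the self-check that keys are meant to provide.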
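Examples
Employee Table

| PersonID | Name | Birthday | SSN | Department | JobTitle | … |
| --- | --- | --- | --- | --- | --- | --- |
| Varchar(10) | Varchar(255) | Date | Varchar(20) | Varchar(255) | Varchar(255) | … |

Salary Table

| PersonID | Salary | … |
| --- | --- | --- |
| Varchar(10) | Decimal | … |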
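One way to link the two tables is a foreign key from the Salary table to the Employee table on `PersonID`. The sketch below assumes that link, a trimmed Employee table, and a `decimal(10,2)` precision; the constraint name `salary_person` is hypothetical.

```sql
-- Trimmed Employee table so that this example is self-contained.
CREATE TABLE IF NOT EXISTS Employee (
  PersonID varchar(10)  NOT NULL,
  Name     varchar(255) NOT NULL,
  PRIMARY KEY (PersonID)
) ENGINE=InnoDB;

-- Salary rows may only refer to a PersonID that exists in Employee.
CREATE TABLE Salary (
  PersonID varchar(10)   NOT NULL,
  Salary   decimal(10,2) NOT NULL,   -- precision is an assumption
  PRIMARY KEY (PersonID),
  CONSTRAINT salary_person FOREIGN KEY (PersonID)
    REFERENCES Employee (PersonID)
) ENGINE=InnoDB;

-- Combine the two tables when both pieces of information are needed.
SELECT e.PersonID, e.Name, s.Salary
FROM Employee AS e
JOIN Salary AS s ON s.PersonID = e.PersonID;
```

Keeping salary in its own table means the value exists in one place only, and access to it can be granted separately from the rest of the employee record.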
Fundamental Knowledge – Codebook and Dictionary
• A codebook summarizes the categories of each variable
• A codebook standardizes data input
Race
• Asian
• African American
• White
• …
Smoking Status
• Current Smoker
• Former Smoker
• Non Smoker
• …
Diagnosis
• Yolk sac tumor
• Embryonal carcinoma
• Choriocarcinoma
• …
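A codebook category list like the ones above can be stored as a small lookup table. This is a minimal sketch: the table name `CodeSmokingStatus` and the numeric codes 1 to 3 are assumptions, and only the category labels come from the slide.

```sql
-- Hypothetical codebook table for the Smoking Status variable.
CREATE TABLE CodeSmokingStatus (
  ID            int         NOT NULL,
  SmokingStatus varchar(40) NOT NULL,
  PRIMARY KEY (ID),
  UNIQUE KEY SmokingStatus (SmokingStatus)   -- each label may appear only once
) ENGINE=InnoDB;

-- The numeric codes are illustrative; the labels are from the codebook.
INSERT INTO CodeSmokingStatus (ID, SmokingStatus) VALUES
  (1, 'Current Smoker'),
  (2, 'Former Smoker'),
  (3, 'Non Smoker');
```

Data tables then store the code rather than free text, so spelling variants such as "former smoker" and "Former Smoker" cannot drift apart.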
Example of Codebook and Dictionary
Codebook: https://qbrc.swmed.edu/projects/gct/documents/GCT%20CodeBook_v3.4.pdf
Dictionary: https://qbrc.swmed.edu/projects/gct/documents/GCT_dictionary_v3.4.pdf
Simple Conclusions – Relational Database
• Consists of several tables
• Tables are linked by foreign keys
• Keys act as data constraints (self-check)
• The codebook / dictionary sets data input standards
How to Design a MySQL Database
• Main considerations before design
– Database size
– Data loading methods
– Data sensitivity
– End users
– The aims of data collection
– User account controls
– Data backup
– Data encryption
Basic Requirements
• Data Consistency: no mismatches
• Least Redundancy: good space usage
• Scalable: potential for bigger data
• Quick Query: query performance
• Data Standards: avoid typos
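For the Quick Query requirement, one common technique is to add an index on a column that is searched often. The slides do not prescribe this step, so the following is a sketch under that assumption; the table `QueryDemo` and the index name `idx_mrn` are hypothetical.

```sql
-- Hypothetical table that is frequently searched by MRN.
CREATE TABLE QueryDemo (
  ID          int unsigned NOT NULL AUTO_INCREMENT,
  MRN         varchar(40)  NOT NULL,
  DateSurgery date         NOT NULL,
  PRIMARY KEY (ID)                    -- the primary key is indexed automatically
) ENGINE=InnoDB;

-- Secondary index on the searched column.
CREATE INDEX idx_mrn ON QueryDemo (MRN);

-- EXPLAIN reports how MySQL plans to run the query, including
-- which index (if any) it intends to use.
EXPLAIN SELECT ID, DateSurgery
FROM QueryDemo
WHERE MRN = 'K3212d';
```

Indexes speed up lookups at the cost of extra storage and slightly slower writes, so they are usually added only for columns that actually appear in search conditions.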
Some rules
• Single copy for valued data
– A valued variable only exists in one table
• Avoid performance degradation as the number of records increases
– The number of records in one table should be less than 10^7
• Use keys / constraints to avoid wrong input
– Link as many tables as possible
• Store atomic information in each individual cell (e.g. avoid information like 'black,white' in one Race cell)
– Combined values in one 'cell' are difficult to search or index
• Set codebooks for categorical variables
– Data standards
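The atomic-information rule can be illustrated with the Race example: instead of storing 'black,white' in one cell, each value gets its own row in a linking table. This is a sketch only; the tables `CodeRace` and `PatientRace`, the constraint name, and the patient ID 1001 are hypothetical.

```sql
-- Hypothetical codebook table for Race.
CREATE TABLE CodeRace (
  ID   int         NOT NULL,
  Race varchar(40) NOT NULL,
  PRIMARY KEY (ID),
  UNIQUE KEY Race (Race)
) ENGINE=InnoDB;

-- One row per (patient, race) pair instead of a comma-separated cell.
CREATE TABLE PatientRace (
  PatientID int unsigned NOT NULL,
  RaceID    int          NOT NULL,
  PRIMARY KEY (PatientID, RaceID),
  CONSTRAINT pr_race FOREIGN KEY (RaceID) REFERENCES CodeRace (ID)
) ENGINE=InnoDB;

INSERT INTO CodeRace (ID, Race) VALUES (1, 'Black'), (2, 'White');

-- A patient with two race values is stored as two atomic rows.
INSERT INTO PatientRace (PatientID, RaceID) VALUES (1001, 1), (1001, 2);

-- Searching for one value is a plain equality test, which can use an index.
SELECT PatientID FROM PatientRace WHERE RaceID = 2;
```

With a combined cell, the same search would need pattern matching on text, which is slower and prone to partial matches.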
Database Design Practice
Database design task
• Question: Create a MySQL database 'test' to contain this information. (No data input, only schema)

| Sample ID (auto-increment) | 1 | 2 | 3 |
| --- | --- | --- | --- |
| Patient MRN * | K3212d | Ge23ds3 | Kid02112 |
| Surgery Date * | 03/23/2016 | 05/12/2016 | 06/12/2016 |
| Procedure * | Surgery | Biopsy | Biopsy |
| Sequencing Platform | Illumina | Affymetrix | Agilent |
| Data Type | Raw | Processed | Processed |
| Create Date | 05/18/2017 | 05/18/2017 | 05/18/2017 |
MySQL Tools
• MySQL management tool
– phpMyAdmin
• Database client tools
– DbVisualizer
– DataGrip
Codebook Tables
• CodeProcedure
CREATE TABLE CodeProcedure (
  ID int(2) NOT NULL,
  Proc varchar(40) NOT NULL,
  PRIMARY KEY (ID),
  UNIQUE KEY Proc (Proc)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
• CodeSeqPlatform
CREATE TABLE CodeSeqPlatform (
  ID int(2) NOT NULL,
  SeqPlatform varchar(40) NOT NULL,
  PRIMARY KEY (ID),
  UNIQUE KEY SeqPlatform (SeqPlatform)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
• CodeTypeData
CREATE TABLE CodeTypeData (
  ID int(2) NOT NULL,
  TypeData varchar(40) NOT NULL,
  PRIMARY KEY (ID),
  UNIQUE KEY TypeData (TypeData)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Sample Table
CREATE TABLE Sample (
  ID int(10) unsigned NOT NULL AUTO_INCREMENT,
  MRN varchar(40) NOT NULL,
  DateSurgery date NOT NULL,
  Proc int(2) NOT NULL,
  SeqPlatform int(2) DEFAULT NULL,
  TypeData int(2) DEFAULT NULL,
  CreateDate date DEFAULT NULL,
  PRIMARY KEY (ID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Check Database Schema
(schema diagram created by DbVisualizer)
Add Data Constraints
• Add foreign keys
ALTER TABLE Sample
  ADD CONSTRAINT s_procedure FOREIGN KEY (Proc) REFERENCES CodeProcedure(ID);
ALTER TABLE Sample
  ADD CONSTRAINT s_seqplatform FOREIGN KEY (SeqPlatform) REFERENCES CodeSeqPlatform(ID);
ALTER TABLE Sample
  ADD CONSTRAINT s_typedata FOREIGN KEY (TypeData) REFERENCES CodeTypeData(ID);
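To see what such a foreign key does in practice, the sketch below uses trimmed, hypothetical copies of the tables (prefixed `Demo`) so that it can be run on its own. The numeric codes 1 and 2 are assumptions; the labels and MRN values are taken from the design task.

```sql
-- Trimmed demo copies of the codebook and Sample tables (hypothetical names).
CREATE TABLE DemoCodeProcedure (
  ID   int         NOT NULL,
  Proc varchar(40) NOT NULL,
  PRIMARY KEY (ID),
  UNIQUE KEY Proc (Proc)
) ENGINE=InnoDB;

CREATE TABLE DemoSample (
  ID   int unsigned NOT NULL AUTO_INCREMENT,
  MRN  varchar(40)  NOT NULL,
  Proc int          NOT NULL,
  PRIMARY KEY (ID),
  CONSTRAINT demo_s_procedure FOREIGN KEY (Proc)
    REFERENCES DemoCodeProcedure (ID)
) ENGINE=InnoDB;

INSERT INTO DemoCodeProcedure (ID, Proc) VALUES (1, 'Surgery'), (2, 'Biopsy');

-- Accepted: code 2 exists in the codebook.
INSERT INTO DemoSample (MRN, Proc) VALUES ('Ge23ds3', 2);

-- Left commented out because the foreign key is expected to reject it:
-- code 9 is not in the codebook.
-- INSERT INTO DemoSample (MRN, Proc) VALUES ('K3212d', 9);

-- Resolve the stored code back to its label when reading.
SELECT s.ID, s.MRN, p.Proc AS ProcedureName
FROM DemoSample AS s
JOIN DemoCodeProcedure AS p ON p.ID = s.Proc;
```

The data table stores only the code, and the label is looked up through a join, so renaming a category requires a change in a single codebook row.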
Final Database Schema
(schema diagram created by DbVisualizer)
Quick Summary
• Codebook
• Meaningful naming
• Data type selection
• Key selection
Database Services From BICF
• Help desk for consulting
– Database design
– Web portal design and development
– Training
• Complete service for design and implementation
– Database: database design, data loading, maintenance, and periodic backup
– Web portal: design, development, deployment, and maintenance
Project Example
• Help Desk – Nutrition Center
– Database: help with database design to speed up data queries
– Website Security: code checking to enhance website security
– Website Enhancement: advice on the web user interface and functions to improve usability
Project Management
• Complete service – Children’s Hospital
• Pediatric Biobank
– Record patients' clinical data
– Database and Web Portal
Pediatric Biobank
Secure Account System
User-friendly Data Input and Search
Track Account Login History
Track Clinical Data Change History
Collaborators Online Record Tool
Hardware Architecture
Outside Internet
Firewall BICF Virtual Server
Clinical Server
Website Database
UTSW Internal User
Data Backup Server
Data Classification
• To standardize the input of clinical data, we classify the variables
– Basic Information
– Diagnosis
– Chemotherapy
– Radiation
– Stem Cell Transplant
– Cancer Predisposition
– Family History
– Others
Pediatric Biobank Tool
Patient Search Patient Information Input
Data input and query
Data Protection
The data is protected by several layers:
• UTSW Firewall
• Secure HTTP web access
• Clinical Server Authentication
• MySQL Database Authentication
• Sensitive Data Encrypted in Database
Other Functions
• Dynamic data summary functions
• Print records in a specific format
• Monitor illegal access and send email alerts
• Recover a single unexpected data deletion (within one month)
BICF Help Desk
• http://www.utsouthwestern.edu/labs/bioinformatics/
• Contact us: [email protected]
• Help Desk: 10AM – 11AM daily. Location: NB5.604