IIE-1

CSE 300

Informatics and Information Engineering

Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155

[email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818

Copyright © 2008 by S. Demurjian, Storrs, CT. Portions of these slides are being used with the permission of Dr. Ling Lui, Associate Professor, College of Computing, Georgia Tech.


IIE-2

Overview

Informatics
  What is Informatics?
  What is Biomedical Informatics?
  What are Key Biomedical Informatics Challenges?
Information Engineering
  Data vs. Information vs. Knowledge
  What is Science? What is Engineering?
  What is Information Consistency?
Information Usage and Repositories
  How do we Store and Utilize Information?
  Role of Web in Informatics
  Sharing, Collaboration, and Security
  Databases vs. Data Mining

IIE-3

Informatics

Informatics is:
  Management and Processing of Data
  From Multiple Sources/Contexts
  Involves Classification (Ontologies), Collection, Storage, Analysis, Dissemination
Informatics is Multi-Disciplinary
  Computing (Model, Store, Process Information)
  Social Science (User Interactions, HCI)
  Statistics (Analysis)
Informatics Can Apply to Multiple Domains:
  Business, Biology, Fine Arts, Humanities
  Pharmacology, Nursing, Medicine, etc.

IIE-4

What is Informatics?

Heterogeneous Field – Interaction between People, Information and Technology
  Computer Science and Engineering
  Social Science (Human Computer Interface)
  Information Science (Data Storage, Retrieval and Mining)

[Figure labels: People, Information, Technology, Informatics]

Adapted from Shortliffe textbook

IIE-5

What is Biomedical Informatics (BMI)?

BMI is Information and its Usage Associated with the Research and Practice of Medicine Including:
  Clinical Informatics for Patient Care
    Medical Record + Personal Health Record
  Bioinformatics for Research/Biology to Bedside
    From Genomics To Proteomics
  Public Health Informatics (State and Federal)
    Tracking Trends in Public Sector
  Clinical Research Informatics
    Deidentified Repositories and Databases
    Facilitate Epidemiological Research and Ongoing Clinical Studies (Drug Trials, Data Analysis, etc.)

IIE-6

What are Key BMI Focal Areas?

T1 Research
  Transition Bench Results into Clinical Research
Clinical Research
  Applying Clinical Research Results via Trials with Patients on Medication, Devices, Treatment Plans
T2 Research
  Translating “Successful” Clinical Trials into Practice and the Community
Clinical Practice
  Tracking all of the Information Associated with a Patient and his/her Care
Integrated and Inter-Disciplinary Information Spectrum

IIE-7

What is Medical Informatics?

Clinical Informatics, Pharmacy Informatics
Public Health Informatics
Consumer Health Informatics
Nursing Informatics
Systems and People Issues
  Intended to Improve Clinical Outcomes, Satisfaction and Efficiency
  Workflow Changes, Business Implications, Implementation, etc.
  Patient Centered – Personal Health Record and Medical Home
  Care Centered – Pay for Performance, Improving Treatment Compliance

IIE-8

What is Bioinformatics?

Focused on Research Tools for T1:
  Genomic and Proteomic Tools, Evaluation Methods, Computing and Database Needs
  Information Retrieval and Manipulation of Large Distributed (caBIG) Data Sets (cabig.cancer.gov/index.asp)
  Often Requires Grid Computing
  Includes Cancer and Immunology Research
Increasing Need to Tie These Separate Types of Systems Together = Personalized Medicine
  Biology and the Bedside (www.i2b2.org)

IIE-9

Where is Data/How is it Used?

Medical and Administrative Data Found in Clinical Information Systems (CIS) Such As:
  Hospital Info. Systems
  Electronic Medical Records
  Personal Health Records…
  Pharmacy, Nursing, Picture Archiving Systems
  Complex Data Storage and Retrieval – Many Different Systems
T1 Research Increasingly Reliant on CIS
T2 Research is Reliant on:
  End Systems for Embedding EBM (Evidence-Based Medicine) Guidelines
  Measuring Outcomes, Looking at Policy

IIE-10

What are Major Informatics Challenges?

Shortage of Trained People Nationally
Slows Adoption of Health Information Technology
Results in Poor Planning and Coordination, Duplication of Efforts and Incomplete Evaluation
What are Critical Needs?
  Dually Trained Clinicians or Researchers in Leadership of some Initiatives
  Connect all folks with Informatics Roles across Institutions to Improve Efficiency
  Multi-Disciplinary: CSE, Statistics, Biology, Medicine, Nursing, Pharmacy, etc.
Emerging Standards for Information Modeling and Exchange (www.hl7.org) based on XML

IIE-11

Information Engineering

Data vs. Information vs. Knowledge
  How do we Differentiate Between them?
  Where are they used in BMI?
Science vs. Engineering
  What is each of their Roles in Informatics?
  How can we Engineer Information?
  What is their Role in BMI?
What is Information Engineering?
  What are the Unique Challenges and Opportunities?
  What is Available Today and Tomorrow?

IIE-12

From American Heritage

Data
  Information, esp. information organized for analysis or used as the basis for a decision.
  Numerical information in a form suitable for processing by computer.
Information
  The act of informing or the condition of being informed; communication of knowledge.
  A non-accidental signal used as an input to a computer or communications system.
Knowledge
  The state or fact of knowing.
  The sum or range of what has been perceived, discovered, or learned.
  Specific information about something.

IIE-13

From Webster’s 9th Collegiate

Data
  Factual information (e.g. statistics) used as a basis for reasoning, discussion, or calculation.
Information
  The communication of knowledge or intelligence
  Something (as a message, experimental data, or a picture) which justifies change in a construct (as a plan or theory) that represents physical or mental experience or another construct
  A quantitative measure of the content of information
Knowledge
  The fact or condition of having information or of being learned.
  The sum of what is known: the body of truth, information, and principles acquired by mankind.

IIE-14

Data vs. Information vs. Knowledge

Overlapping Definitions
Conflicting Definitions
Agreement on Data
Knowledge and Information - Synonyms
Discussion Questions:
  Equivalence of Knowledge/Information?
  How can we Distinguish them?
  Do these Three Terms Cover Possibilities?

IIE-15

Data, Information, and Knowledge in BMI

Data – Basic Level
  BP, Pulse, Temperature
  Peak Flow, Glucose Level, Biopsy Result
  X-Ray, MRI, CAT Scan
Information - First Level of Interpretation
  BPs, Peak Flow, Glucose over Time
  Interpreting Scan (Radiologist) or Biopsy Result (Oncologist)
Knowledge – Applying Experience towards Diagnosis
  What can Low Peak Flows over Time lead to?
  What Next Step after Positive Scan or Biopsy?
  What if Glucose Level is Yo-yoing?
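The three levels above can be illustrated with a small sketch. The readings, the function names, and the 50 L/min threshold are all hypothetical and chosen only for illustration; the rule is a stand-in for clinical experience, not a clinical guideline.

```python
from statistics import mean

# Data: raw peak-flow readings (L/min) for one hypothetical patient, oldest first.
peak_flow = [420, 410, 395, 380, 360, 350]


def summarize(readings):
    """Information: interpret raw readings as an average and a change over time."""
    return {"mean": mean(readings), "change": readings[-1] - readings[0]}


def flag_for_review(summary, drop_threshold=50):
    """Knowledge (greatly simplified): a rule standing in for experience.

    The threshold is an illustrative assumption, not a clinical guideline.
    """
    return summary["change"] <= -drop_threshold


info = summarize(peak_flow)
needs_review = flag_for_review(info)
print(info, needs_review)
```

The same raw list is data on its own, becomes information once summarized over time, and only supports a decision once a rule drawn from experience is applied to it.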

IIE-16

From American Heritage

Science
  The observation, identification, description, experimental investigation, and theoretical explanation of natural phenomena.
  Methodological activity, discipline, or study.
  An activity that appears to require study & method.
  Knowledge, esp. gained through experience.
Engineering
  The application of scientific and mathematical principles to practical ends such as the design, construction, and operation of efficient and economical structures, equipment, and systems.

IIE-17

From Webster’s 9th Collegiate

Science
  The state of knowing: knowledge as distinguished from ignorance or misunderstanding
  A department of systemized knowledge as an object of study
  A system or method reconciling practical ends with scientific laws.
Engineering
  The application of science and mathematics by which the properties of matter and the sources of energy in nature are made useful to people in structures, machines, products, systems, and processes.

IIE-18

Science and Engineering in BMI

Science
  Data/Information Collection & Analysis to Reach Hypothesis
  Patients with CHF and Lipitor have Fewer Heart Attacks than CHF and Baby Aspirin
  Verify in Clinical Research/Epidemiological Study
Engineering
  Usage of Information in Practice
  Apply Scientific Results to Medical Practice
  Image Processing used to Identify Tumors in CT and MRI Scans
  Transfer of Radiologist's Knowledge into Computer Based (Assisted) Solution
  An Engineering Solution to Scientific Result

IIE-19

What is Information Engineering?

Incorporation of an Engineering Approach and Discipline to the Generation of Information and the Promotion of the Better Use of Information and Resources
Information Engineering Unifies and Combines:
  Software Engineering
  Database Engineering
  Security Engineering
  Performance Engineering
  Etc...
Moral: Systems Cannot and Must Not be Engineered in a Vacuum!
Particularly true in BMI (T1, T2, Clinical Research, and Clinical Practice)

IIE-20

Information Engineering is Motivated by:

Realization that Management/Control of Information will be a Primary Concern as we Continue through the 1990s and into the 21st Century
Currently in an Age of Information - Volume and Complexity Dependencies
Critical Systems Heavily Depend on Information:
  Airline/Hotel/Auto Reservations
  Telecommunications
  Banking/ATMs
  ATM/Credit Cards at Gas Stations/Supermarkets
  Credit Bureaus Electronically Collect Information from Many Diverse Sources
  E-Tailing
  Medical Care/All Aspects of BMI

IIE-21

Info. Engrg. - Challenge for 21st Century

Timely and Efficient Utilization of Information
  Significantly Impacts Productivity
  Supports and Promotes Collaboration for Competitive Advantage
  Use Information in New and Different Ways
Collection, Synthesis, Analyses of Information
  Better Understanding of Processes, Sales, Productivity, etc.
  Dissemination of Only Relevant/Significant Information - Reduce Overload
Implications for BMI?
  Sharing of Results – Benefit Mankind
  Ability to Research on Rare Diseases
  Are there Unknown Isolated “Cures”?

IIE-22

How is Information Engineered?

Careful Thought to its Definition/Purpose & Thorough Understanding of its Intended Usage/Potential Impact
Ensure and Maintain its Consistency
  Quality, Correctness, and Relevance
Protect and Control its Availability (Secure Access)
  Who can Access What Information in Which Location and at What Time?
Long-Term Persistent Storage/Recoverability
  Cost, Reusability, Longitudinal, and Cumulative Experience
Integration of Past, Present and Future Information via Intranet and Internet Access
What are Implications/Challenges for BMI?
  Let’s Discuss Briefly…

IIE-23

Towards Information Consistency

Consistency of Information is Key!
Consistency Gauged with respect to:
  Usage of Information
  Persistency of Information
  Integrity/Security of Information
    Allowable Values and Protection from Misuse
  Validity (Relevance) of Information
    Means Something to Someone in a Positive Way
Discussion Questions:
  Why is Consistency Important for BMI?
  How is Consistency Attained for BMI?
  What Else Impacts Consistency in BMI?
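One facet listed above, allowable values, lends itself to a short sketch. The field names and ranges below are hypothetical placeholders; real limits would come from clinical standards and the application's data model.

```python
# Hypothetical allowable ranges, for illustration only.
ALLOWED = {
    "systolic_bp": (50, 260),
    "pulse": (20, 250),
    "temperature_c": (30.0, 45.0),
}


def violations(record):
    """Return the names of fields that are missing or outside their allowed range."""
    bad = []
    for field, (low, high) in ALLOWED.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            bad.append(field)
    return bad


ok_record = {"systolic_bp": 120, "pulse": 72, "temperature_c": 37.0}
bad_record = {"systolic_bp": 1200, "pulse": 72}  # extra zero, temperature missing

print(violations(ok_record))
print(violations(bad_record))
```

A check like this covers only integrity of individual values; persistency, security, and relevance need separate mechanisms.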

IIE-24

What's Available to Support IE?

What Can be Provided to Make the Advanced Application Design Process:
  More Complete?
  More Robust?
  More Responsive?
  Less Error Prone?
Current Choices to Support Information Engineering:
  Conventional Programming Languages and Data Models
  Object-Oriented Programming Languages
  Object-Oriented DBS
  XML Databases
  Middleware and SOA (Web)
  Data Mining/Warehouses

IIE-25

What are Key Questions?

Focus on Information and its Behavior
  What are Different Kinds of Information?
  How is Information Manipulated?
  Is Same Information Stored in Different Ways?
  What are Information Interdependencies?
  Will Information Persist? Long-Term DB? Versions of Information?
  What Past Info. is Needed from Legacy DBs or Applications?
  Who Needs Access to What Info. When?
  What Information is Available Across WWW?
All of these Questions Apply to BMI!

IIE-26

Information Usage and Repositories

How do we Store and Utilize Information?
  Databases
  Data Mining
What are Key Issues?
  Information Sharing/Data Correctness
  Collaboration
    1. Among Providers and Researchers
    2. Among Providers and Patients
    3. Among Patients (Support Groups)
  Security
    1. Control of Patient Information (De-identified)
    2. Secure Exchange/Patient Ownership
    3. Establish Custom Patient Controlled Groups
What is the Role of Web in Informatics?

IIE-27

The Role of a Database

Database is a Norm in Today's and Tomorrow's Applications
Usage of Information Tightly Linked to its Storage
Integration of Database - Key Component
Support Many Representations of “Same” Information
Promotes Retrieval of Information Geared Towards User Needs and Responsibilities
Gap Exists Between Standalone Programming Applications and Database Systems
For BMI:
  Database (Data Warehouse) is a Key Feature
  Need for Access to Data (De-identified)
  Need to Share and Interact among Stakeholders

IIE-28

DBMS Architecture

DBMS Languages
  Data Definition Language (DDL)
  Data Manipulation Language (DML)
    From Embedded Queries or DB Commands Within a Program
    “Stand-alone” Query Language
Host Language:
  DML Specification (e.g., SQL) is Embedded in a “Host” Programming Language (e.g., Java, C++)
DBMS Interfaces
  Menu-Based Interface
  Graphical Interface
  Forms-Based Interface
  Interface for DBA (DB Administrator)
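The split between DDL and DML, and the idea of SQL embedded in a host language, can be sketched as follows. Python with its standard `sqlite3` module stands in here for the Java or C++ host named on the slide; the `patient` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a hypothetical patient table.
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL, pulse INTEGER)"
)

# DML embedded in the host language: insert and query with bound parameters.
conn.executemany(
    "INSERT INTO patient (name, pulse) VALUES (?, ?)",
    [("Ann", 72), ("Bob", 95)],
)
rows = conn.execute(
    "SELECT name FROM patient WHERE pulse > ? ORDER BY name", (80,)
).fetchall()
print(rows)
```

The SQL strings are the DML specification; the surrounding Python supplies variables, control flow, and result handling, which is the role the slide assigns to the host language.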

IIE-29

ANSI/SPARC - Three Schema Architecture

External Data Schema (Users’ view)
Conceptual Data Schema (Logical Schema)
Internal Data Schema (Physical Schema)

IIE-30

How are these Used for BMI?

Internal Data Schema (Physical Schema)
  Hidden Data Representation for Storage of BMI Data in Proprietary Format
  Under the Control of DB System
Conceptual Data Schema (Logical Schema)
  The Data Model for the BMI Application
  Access to Schema Controllable via SQL
External Data Schema (Users’ view)
  Subsets of the Data Model for Different Users
  External View for Patients
  External View for Providers
  External View for Clinical Researchers
  Need Ability for a Patient to Control Access to his/her Own External View
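A minimal sketch of the external views listed above, using SQL views over one conceptual table. The `visit` table, its columns, and the view names are assumptions made for illustration. SQLite has no per-user privileges, so restricting each user group to its view is left out here; a multi-user DBMS would typically do that with access privileges on the views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit (
    patient_id INTEGER, patient_name TEXT, visit_date TEXT,
    diagnosis TEXT, physician TEXT
);
INSERT INTO visit VALUES (1, 'Ann', '2008-01-01', 'Flu', 'Dr. D');
INSERT INTO visit VALUES (2, 'Bob', '2008-01-02', 'Asthma', 'Dr. D');

-- Provider view: full detail.
CREATE VIEW provider_view AS SELECT * FROM visit;

-- Researcher view: de-identified (no name, no patient id).
CREATE VIEW researcher_view AS SELECT visit_date, diagnosis FROM visit;
""")


def patient_view(conn, patient_id):
    """Rows a single patient may see: only his or her own visits."""
    return conn.execute(
        "SELECT visit_date, diagnosis, physician FROM visit WHERE patient_id = ?",
        (patient_id,),
    ).fetchall()


cursor = conn.execute("SELECT * FROM researcher_view")
researcher_columns = [col[0] for col in cursor.description]
researcher_rows = cursor.fetchall()
print(researcher_columns, patient_view(conn, 1))
```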

IIE-31

Data Independence

The Ability of Application Programs to Remain Unaffected by Changes in Irrelevant Parts of the Conceptual Data Representation, Data Storage Structure and Data Access Methods
Invisibility (Transparency) of the Details of Entire Database Organization, Storage Structure and Access Strategy to the Users
  Both Logical and Physical
Recall Software Engineering Concepts:
  Abstraction: the Details of an Application's Components Can Be Hidden, Providing a Broad Perspective on the Design
  Representation Independence: Changes Can Be Made to the Implementation that have No Impact on the Interface and Its Users

IIE-32

Physical Data Independence

The Ability to Modify the Physical Data Representation Without Causing Application Programs to Be Rewritten
Examples:
  Transparency of the Physical Storage Organization
  Transparency of Physical Access Paths
  Numeric Data Representation and Units
  Character Data Representation
  Data Coding
  Physical Data Structure
All of these are Vital for BMI – Particularly if we Use Standards to Achieve Application Independence

IIE-33

Physical Data Independence

Physical Data Independence is a Measure of How Much the Internal Schema Can Change Without Affecting the Application Programs
In BMI – Allows us to Plug and Play Different DBMS Platforms – Extensible and Versatile Integration

[Figure label: Physical]
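Transparency of physical access paths can be sketched with an index: the application query is written once and keeps working, with the same results, after the access path changes. The `lab_result` table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_result (patient_id INTEGER, test_code TEXT, value REAL)")
conn.executemany(
    "INSERT INTO lab_result VALUES (?, ?, ?)",
    [(1, "GLU", 5.4), (2, "GLU", 7.9), (1, "HBA1C", 6.1)],
)


def results_for(conn, patient_id):
    """Application query: it never mentions how rows are stored or indexed."""
    return conn.execute(
        "SELECT test_code, value FROM lab_result "
        "WHERE patient_id = ? ORDER BY test_code",
        (patient_id,),
    ).fetchall()


before = results_for(conn, 1)

# Physical change: add an access path. The application code above is untouched.
conn.execute("CREATE INDEX idx_lab_patient ON lab_result (patient_id)")

after = results_for(conn, 1)
print(before == after)
```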

IIE-34

Logical Data Independence

Transparency of the Entire Database Conceptual Organization
As a Result:
  Transparency of Logical Access Strategy
  Addition of New Entities
  Removal of Entities
  Virtual (Derived) Data Items
  Union of Records
Views
  Common Mechanism for Logical Data Independence
  Provide Different Logical Data Contexts to Different Users Based on Their Needs
  Update Views vs. Read-Only Views

IIE-35

Logical Data Independence

Logical Data Independence is a Measure of How Much the Conceptual Schema Can Change Without Affecting the Application Programs
For BMI – Allows us to Separate End User Applications (Patients, Providers, etc.) from DB

[Figure label: Logical]
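A view is one way to sketch logical data independence: the conceptual schema is reorganized, and a view preserves the shape the application expects. The table names, the view name `patient_info`, and the sample row are assumptions for illustration.

```python
import sqlite3


def patient_names(conn):
    """Application query, written once against the name 'patient_info'."""
    return conn.execute(
        "SELECT name, phone FROM patient_info ORDER BY name"
    ).fetchall()


# Version 1 of the conceptual schema: one table holds everything.
v1 = sqlite3.connect(":memory:")
v1.executescript("""
CREATE TABLE patient_info (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
INSERT INTO patient_info VALUES (1, 'Ann', '555-0100');
""")

# Version 2: contact details move to their own table; a view keeps the old shape.
v2 = sqlite3.connect(":memory:")
v2.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contact (patient_id INTEGER REFERENCES patient(id), phone TEXT);
INSERT INTO patient VALUES (1, 'Ann');
INSERT INTO contact VALUES (1, '555-0100');
CREATE VIEW patient_info AS
    SELECT p.id, p.name, c.phone
    FROM patient p JOIN contact c ON c.patient_id = p.id;
""")

print(patient_names(v1) == patient_names(v2))
```

This covers read access only; whether updates can pass through such a view depends on the DBMS, which is the update versus read-only distinction raised for views.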

IIE-36

Classic Information System Design

IIE-37

Data vs. Information

IIE-38

Programming Language Systems vs. DBS

Similarities and Differences Exist At System Level:
  Shared Resources vs. Shared Data
  Execution Granularity - Programs vs. Transactions
  Granularity Difference - Files vs. Instances
Classic Problem of “Impedance Mismatch”
  Thin Layer of Overlap between PLS (C++, Java, etc.) and Relational Database System
  What will Future Bring?
    SQL3 with Object-Oriented Extensions
    XML Databases (Apache Xindice, Sedna, etc.)

[Figure labels: Today: PLS, RDBS; Tomorrow?: PLS, XML DBS]

IIE-39

What is Today’s Impedance Mismatch?

Relational Data Organizes Information into Flat Files
  Relational Tables with Primary Key
  High Number of Tuples per Table (1000s & more)
  Limited Number of Tables (10-50) for Even Large Size Application
  Limited Linkages Among Tables (Foreign Keys)
What Does BMI/PHR/EMR Require?
  For Each Patient, Track Multiple Dependencies
    Visits per Patient
    Tests per Patient
    Prescriptions per Patient
  Data Inherently Complex and Interdependent
  Flattened into Relational Format
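The flattening described above can be sketched by taking one nested patient object and turning it into rows for separate relational tables. The class and field names are hypothetical, and tests are omitted to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Visit:
    date: str
    diagnosis: str


@dataclass
class Prescription:
    date: str
    medication: str


@dataclass
class Patient:
    name: str
    visits: list = field(default_factory=list)
    prescriptions: list = field(default_factory=list)


def flatten(patient):
    """Turn one nested object into flat rows, one list per relational table."""
    visit_rows = [(patient.name, v.date, v.diagnosis) for v in patient.visits]
    presc_rows = [(patient.name, p.date, p.medication) for p in patient.prescriptions]
    return {"visit": visit_rows, "prescription": presc_rows}


ann = Patient(
    "Ann",
    visits=[Visit("2008-01-01", "Flu"), Visit("2008-02-10", "Asthma")],
    prescriptions=[Prescription("2008-02-10", "Inhaler")],
)
tables = flatten(ann)
print(tables)
```

The nesting that was explicit in the object survives only as a repeated key (here the patient name) in each row, which is the linkage a foreign key would carry.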

IIE-40

The Health Care Application - Classes

IIE-41

The Health Care Application - Classes

IIE-42

The Health Care Application - Classes

IIE-43

The Health Care Application - Relationships

Page 44: IIE-1 CSE 300 Informatics and Information Engineering Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut

IIE-44

CSE300

How Does Mismatch Occur?
Above – Relational Tables
  Stage Data from Tables into OO (e.g. Java) format
  Utilize JDBC
  What are the Implications/Impacts?
On Left – OO Classes
  Inheritance
  Dependencies
Programmatic View
  C++ or Java Usage
  Staging from DB to OO

Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)
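
The staging step can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it uses Python's built-in sqlite3 module in place of JDBC, keeps only the visit and prescription columns of Item, and the class names (Visit, Prescription) and the function stage_item are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Visit:
    symptom: str
    diagnosis: str
    treatment: str


@dataclass
class Prescription:
    presc_no: str
    pharm_name: str
    medication: str


@dataclass
class Item:
    phy_name: str
    date: str
    visit: Optional[Visit] = None
    prescription: Optional[Prescription] = None


def stage_item(row: sqlite3.Row) -> Item:
    """Copy one flat Item tuple into nested objects (the staging step)."""
    item = Item(row["Phy_Name"], row["Date"])
    if row["Visit_Flag"]:
        item.visit = Visit(row["Symptom"], row["Diagnosis"], row["Treatment"])
    if row["Presc_Flag"]:
        item.prescription = Prescription(
            row["Pre_No"], row["Pharm_Name"], row["Medication"])
    return item


# In-memory table holding a subset of the Item columns (assumed types).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE Item (Phy_Name TEXT, Date TEXT, Visit_Flag INTEGER, "
    "Symptom TEXT, Diagnosis TEXT, Treatment TEXT, Presc_Flag INTEGER, "
    "Pre_No TEXT, Pharm_Name TEXT, Medication TEXT, "
    "PRIMARY KEY (Phy_Name, Date))")
conn.execute(
    "INSERT INTO Item VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("Dr. D", "2008-01-01", 1, "Fever", "Flu", "Bed Rest", 0, None, None, None))

item = stage_item(conn.execute("SELECT * FROM Item").fetchone())
```

The flag columns decide which nested objects exist, so the same information now lives in two shapes (tuple and object graph) that must be kept in step.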

Implications and Impact
Three Copies of “Same” Information in Different
  Database Table (Item)
  OO Representation – Server Side (Classes)
  GUI Display – Client Side (html/xml)
What can this Lead to?

Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)

Dr. D, Jan 01, 08; Fever, Flu, Bed Rest; No Scripts; No Tests

What is one Possible Solution?
Standards and Usage of XML
  Consider CDA – Clinical Document Architecture
  Standard for Clinical (Provider) Medical Record
Clinical Record Organized as:
  <patient_encounter> - location
  <legal_authenticator> - MD
  <originating_organization> and <provider>
  <patient> - name, birthdate, gender
  <body confidentiality="CONF1"> - note
    History
    Past Medical History
    Medications
    Allergies
    Social History
    Physical Exam
    Vitals (BP, Resp, Temp, HR)
    Etc...

What is one Possible Solution?
Let’s Explore this in Greater Detail
Starting with the CDA Header

<?xml version="1.0"?>
<!DOCTYPE levelone PUBLIC "-//HL7//DTD CDA Level One 1.0//EN" "levelone_1.0.dtd">
<levelone>
  <clinical_document_header>
    <id EX="a123" RT="2.16.840.1.113883.3.933"/>
    <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
    <version_nbr V="2"/>
    <document_type_cd V="11488-4" S="2.16.840.1.113883.6.1" DN="Consultation note"/>
    <origination_dttm V="2000-04-07"/>
    <confidentiality_cd ID="CONF1" V="N" S="2.16.840.1.113883.5.1xxx"/>
    <confidentiality_cd ID="CONF2" V="R" S="2.16.840.1.113883.5.1xxx"/>
    <document_relationship>
      <document_relationship.type_cd V="RPLC"/>
      <related_document>
        <id EX="a234" RT="2.16.840.1.113883.3.933"/>
        <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
        <version_nbr V="1"/>
      </related_document>
    </document_relationship>
    <fulfills_order>
      <fulfills_order.type_cd V="FLFS"/>
      <order><id EX="x23ABC" RT="2.16.840.1.113883.3.933"/></order>
      <order><id EX="x42CDE" RT="2.16.840.1.113883.3.933"/></order>
    </fulfills_order>
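
One benefit of a tagged header is that any client can pull out the same fields without knowing the sender's database schema. The sketch below reads a few header fields with Python's standard xml.etree.ElementTree. The fragment is a shortened, self-closed excerpt of the header above (an assumption made so that it parses on its own), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the CDA Level One header, closed so it is well-formed.
fragment = """<clinical_document_header>
  <id EX="a123" RT="2.16.840.1.113883.3.933"/>
  <version_nbr V="2"/>
  <document_type_cd V="11488-4" S="2.16.840.1.113883.6.1" DN="Consultation note"/>
  <origination_dttm V="2000-04-07"/>
  <confidentiality_cd ID="CONF1" V="N" S="2.16.840.1.113883.5.1xxx"/>
  <confidentiality_cd ID="CONF2" V="R" S="2.16.840.1.113883.5.1xxx"/>
</clinical_document_header>"""

header = ET.fromstring(fragment)

doc_id = header.find("id").get("EX")                  # document identifier
version = int(header.find("version_nbr").get("V"))    # document version
doc_type = header.find("document_type_cd").get("DN")  # display name
# Map each confidentiality code ID to its value, e.g. CONF1 -> N
confidentiality = {
    e.get("ID"): e.get("V") for e in header.findall("confidentiality_cd")
}
```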

CDA Example - Continued

Information Sharing/Access: Potential Pitfalls
Another Critical Issue is Information Sharing
  Perception: How do I see/understand Data/Info?
  Differences: What is the Reality?
Dealing with Information at Different Levels
  Syntax – Format of Information
  Semantics – Meaning of Information
  Pragmatics – Usage of Information
When Unifying Databases/Information Repositories, Must Address all Three!
Data Integrity and Data Security
  Correct and Consistent Values
  Assurance in All Secure Accesses
For BMI – All of the Above are Critical for Correct Usage and Interpretation in All Contexts (T1, T2, …)

Information Syntactic Considerations
Syntax is Structure and Format of the Information That is Needed to Support a Coalition
Incorrect Structure or Format Could Result in Anything from a Simple Error Message to a Catastrophic Event
For Sharing, Strict Formats Need to be Maintained
Health Care Data Suffers from Lack of Standards
  Standards for Diagnosis (Insurance Industry)
  Emerging Standards Include:
    Health Level 7 (HL7)
    Based on XML
Formats Non-Standard for Different Health Organizations, Insurers, Pharmacy Networks, etc.
  N*N Translations Prone to Errors!
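
The N*N point can be made concrete with a little arithmetic. With N organizations, each using its own format, every ordered pair needs its own translator, which is N(N-1), or roughly N*N. If everyone instead translates to and from one shared format (for example an HL7-style standard), only 2N translators are needed. The function names below are illustrative.

```python
def pairwise_translators(n: int) -> int:
    """Translators needed when every format maps directly to every other."""
    return n * (n - 1)


def hub_translators(n: int) -> int:
    """Translators needed when every format maps to and from one standard."""
    return 2 * n


for n in (5, 10, 50):
    print(n, pairwise_translators(n), hub_translators(n))
```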

Information Semantics Concerns
Semantics (Meaning and Interpretation)
  NATO and US - Different Message Formats
  Distances (Miles vs. Kilometers)
  Grid Coordinates (Mils, Degrees)
  Maps (Grid, True, and Magnetic North)
What Can Happen in Health Care Data?
  Possible to Confuse Dosages of Medications?
  Weight of Patients (Pounds vs. Kilos)?
  Measurement of Vital Signs?
  Dana Farber Chemo Death – Checks/Balances
  What Others are Possible?
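
One way to guard against the pounds-versus-kilos confusion is to never pass a bare number: the value carries its unit, and any computation converts explicitly. The sketch below is only an illustration of that idea. The Weight class, the dose_mg function, and the mg-per-kg figure used in the usage example are hypothetical and are not clinical guidance.

```python
from dataclasses import dataclass

LB_PER_KG = 2.20462  # approximate pounds per kilogram


@dataclass(frozen=True)
class Weight:
    value: float
    unit: str  # "kg" or "lb"

    def to_kg(self) -> float:
        """Return the weight in kilograms, rejecting unknown units."""
        if self.unit == "kg":
            return self.value
        if self.unit == "lb":
            return self.value / LB_PER_KG
        raise ValueError(f"unknown weight unit: {self.unit!r}")


def dose_mg(weight: Weight, mg_per_kg: float) -> float:
    """Weight-based dose; always computed from kilograms."""
    return weight.to_kg() * mg_per_kg


# The same patient recorded by two systems yields the same dose.
same_patient_kg = dose_mg(Weight(70.0, "kg"), 2.0)
same_patient_lb = dose_mg(Weight(154.3234, "lb"), 2.0)
```

Had the 154 been read as kilograms, the computed dose would have been more than double, which is the kind of semantic error the slide warns about.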

Syntactic & Semantic Considerations
What’s Available to Support Information Sharing?
How do we Ensure that Information can be Accurately and Precisely Exchanged?
How do we Associate Semantics with the Information to be Exchanged?
What Can we Do to Verify the Syntactic Exchange and that Semantics are Maintained?
Can Information Exchange Facilitate Federation?
Can this be Handled Dynamically?
Or, Must we Statically Solve Information Sharing in Advance?

Information Pragmatics Considerations
Pragmatics Require that we Totally Understand Information Usage and Information Meaning
  What are the Critical Information Sources?
  How will Information Flow Among Them?
  What Systems Need Access to these Sources?
  How will that Access be Delivered?
  Who (People/Roles) will Need to See What When?
  How will What a Person Sees Impact Other Sources?
Focus on: Way that Information is Utilized and Understood in its Specific Context
Can Medical Info be Misused even if Understood?

Information Pragmatics Considerations
What are Pragmatics Issues re. Underinsured and Uninsured Populations in Event?
  How Can we Use Info Effectively if we Don’t Know if it is Complete?
  Has Info from All Sources Been Collected?
  What Happens if Same Patient in Different Repositories Can’t be Reconciled?
  What if Patient is Unresponsive and Can’t Supply any Info?
  Is Usage of Info Complicated due to Incompleteness? Multiple Locations?
Or, if the Event is Major – will all Patient Populations Suffer Same Substandard Care?

Collaboration and Security
Two Concepts go Hand in Hand
Strong Parallels
  Collaboration
    Among Providers and Researchers
    Among Providers and Patients
    Among Patients (Support Groups)
  Security
    Control of Patient Information (De-identified)
    Secure Exchange/Patient Ownership
    Establish Custom Patient Controlled Groups
Let’s Explore them Both via our Semester Project
Also Consider Emergent and Policy Issues

Collaboration: Providers and Researchers
Providers
  Seeking new Treatment Plans
  Looking for Clinical Research Studies for Patients
  Looking to Communicate with Clinical Researchers
Researchers
  Publish Evidence-Based Guidelines
  New Treatments
  Collect Data on Provider Visits
  Provide Forum to Discuss with Provider
  Allow Provider to Upload Anonymous Outcomes
Also – Need to Collaborate Among Researchers of All Types (Sharepoint, WIKIs, etc.)

Collaboration: Providers and Patients
Patients
  Open Personal Health Record to Providers
  Patients have
    Data Entry Facility for Chronic Conditions
    Ability to Graph and Track their Disease
  Education Materials also Available
Providers
  Securely Communicate (email) with Patients (see https://www.relayhealth.com/rh/specific/patients/default.aspx)
  Access to Authorized Patient Data
  Tracking of Patients (to Reduce Office Visits)
  Proactive Intervention to Head off Potential Hospitalizations/Problems via Treatment Algorithms to Auto-Notify Based on Data Values

Collaboration: Among Patients
Patients
  Provide Each with a List of Support Groups
  Allow them to Join Groups or Form New Groups
  Secure Communication via:
    Email
    Chatting Environment
    Link to Actual (Physical) Meetings
  Repository of Available Support Groups
Overall:
  Patients can Meet other Patients with Same Issues
  Vital for Patients with Rare Diseases
  Form On-Line Communities

Security: General Concepts
Authentication
  Proving you are who you are
  Signing a Message
  Is the Client who S/he Says they are?
Authorization
  Granting/Denying Access
  Revoking Access
  Does the Client have Permission to do what S/he Wants?
Encryption
  Establishing Communications Such that No One but Receiver will Get the Content of the Message
  Symmetric Encryption
  Public Key Encryption
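
As a small illustration of "Signing a Message", the sketch below uses a keyed hash (HMAC-SHA256) from Python's standard hmac and hashlib modules. Note the assumption: HMAC is a shared-secret message authentication code, so it shows the receiver that the message came from someone holding the key and was not altered. It is not a public-key digital signature, and it does not encrypt the content. The key and message values are made up.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return a hex authentication tag for the message under a shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign(key, message), tag)


shared_key = b"example-shared-secret"        # hypothetical key
message = b"Patient 42: glucose 110 mg/dL"    # hypothetical message
tag = sign(shared_key, message)
```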

Key Security Issues
Legal and Ethical Issues
  Information that Must be Protected
  Information that Must be Accessible
Policy Issues
  Who Can See What Information When?
  Applications Limits w.r.t. Data vs. Users?
System Level Enforcement
  What is Provided by the DBMS? Programming Language? OS? Application?
  How Do All of the Pieces Interact?
Multiple Security Levels/Organizational Enforcement
  Mapping Security to Organizational Hierarchy
  Protecting Information in Organization

What are Key Access Control Concepts?
Assurance
  Are the Security Privileges for Each User Adequate to Support their Activities?
  Do the Security Privileges for Each User Meet but Not Exceed their Capabilities?
Consistency
  Are the Defined Security Privileges for Each User Internally Consistent?
    Least-Privilege Principle: Just Enough Access
  Are the Defined Security Privileges for Related Users Globally Consistent?
    Mutual-Exclusion: Read for Some-Write for Others

Available Security Approaches
Mandatory Access Control (MAC)
  Bell-LaPadula Security Model
  Security Classification Levels for Data Items
  Access Based on Security Clearance of User
Role Based Access Control (RBAC)
  Govern Access to Information based on Role
    Users can Play Different Roles at Different Times
    Responsibilities of Users Guiding Factor
  Facilitate User Interactions while Simultaneously Protecting Sensitive Data
Discretionary Access Control (DAC)
  Richer Set of Access Modes - Govern Access to Information based on User Id
  Discretionary Rules on Access Privileges
  Focused on Application Needs/Requirements

Mandatory Security Mechanism
Typical Security Classification Levels for Subjects/programs and Objects/resources
  Top Secret (TS) and Secret (S)
  Confidential (C) and Unclassified (U)
Rules:
  TS is the Highest and U is the Lowest Level
  TS > S > C > U
  Security Levels:
    C1 is Security Clearance Given to User U1
    C2 is Security Classification Given to Object O1
    U1 can Access O1 iff C1 ≥ C2
    This is Referred to as the Domination of U1 Over O1
Not Prevalent in BMI – But May have Relevance
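
The domination rule can be sketched directly, reading it as "the user's clearance is at least as high as the object's classification", which is the usual read rule in the Bell-LaPadula model. The names Level and can_access are illustrative.

```python
from enum import IntEnum


class Level(IntEnum):
    """Classification levels ordered so that TS > S > C > U."""
    U = 0
    C = 1
    S = 2
    TS = 3


def can_access(clearance: Level, classification: Level) -> bool:
    """True iff the user's clearance dominates the object's classification."""
    return clearance >= classification


print(can_access(Level.S, Level.C))   # Secret user, Confidential object
print(can_access(Level.C, Level.S))   # Confidential user, Secret object
```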

Role Based Access Control (RBAC)
Focuses on Defining Roles of Typical Behavior
  Nurse, Nurse-Manager, Education-RN
  Physician, Attending-MD, Specialist
  Student, Faculty-Advisor, Head
  Focus on Duties that are Shared
During Authorization of Roles to Users
  Establish Boundaries of Access
  User Steve with Role Faculty-Advisor
    Limited to Faculty Capabilities on Peoplesoft
    Can Manipulate Only His Advisees
  User Steve with Role Associate Head
    Possible Overlap in Responsibilities w/ Faculty-Advisor
    Other Activities not given to Faculty-Advisor Role

Why is RBAC Needed?
In Health Care, different professionals (e.g., Nurses vs. Physicians vs. Administrators, etc.) Require Select Access to Sensitive Patient Data
Suppose we have a Patient Access Client
  Lois playing the Nurse Role would be Allowed to Enter Patient History, Record Vital Signs, etc.
  Steve playing M.D. Role would be Allowed to do all of a Nurse plus Write Orders, Enter Scripts, etc.
  Vicky playing Admin Role would be Allowed to Enter Demographic/Insurance Info.
Role Dictates Client Behavior
  Physicians Write Scripts
  Nurses Enter Patient Data (Vitals + History)
  All Access Shared Medical Record
  Access is Limited Based on Role
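
The Lois/Steve/Vicky scenario maps naturally onto a role-to-permission table. The sketch below is a minimal illustration under assumed names: the permission strings and the is_allowed function are hypothetical, and each user holds a single role for simplicity.

```python
# Permissions per role; the M.D. role includes everything a Nurse can do.
NURSE = frozenset({"enter_history", "record_vitals"})
ROLE_PERMISSIONS = {
    "Nurse": NURSE,
    "MD": NURSE | {"write_orders", "enter_scripts"},
    "Admin": frozenset({"enter_demographics", "enter_insurance"}),
}

# Role currently played by each user (one role per user in this sketch).
USER_ROLES = {"Lois": "Nurse", "Steve": "MD", "Vicky": "Admin"}


def is_allowed(user: str, action: str) -> bool:
    """Access is decided by the user's role, never by the user directly."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, frozenset())


print(is_allowed("Steve", "record_vitals"))   # MD inherits Nurse duties
print(is_allowed("Lois", "enter_scripts"))    # Nurses do not write scripts
```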

Discretionary Access Control
Discretionary
  Grant Privileges to Users, Including Capabilities to Access Specific Data Items in a Specific Mode
  Available in Most Commercial DBMSs
Aspects of DAC
  User’s Identity
  Predefined Discretionary “Rules” Defined by the Security Administrator
  Allows User to “Delegate” Capabilities to Another User
    Delegate Capabilities and Ability to Delegate
Role Delegation and Delegation Authority
DAC Available in SQL2

What is Role Delegation?
Role Delegation, a User-to-User Relationship, Allows an Original User (OU) to Transfer Responsibility for a Particular Role to a Delegated User (DU)
Two Major Types of Delegation
  Administratively-directed Delegation: an Administrative Infrastructure Outside the Direct Control of a User Mediates Delegation
  User-directed Delegation: a User (Playing a Role) Determines If and When to Delegate a Role to Another User
In Both, Security Administrators Still Oversee Who Can Do What When w.r.t. Delegation

Why is Role Delegation Important?
Many Different Scenarios Under Which Privileges May Need to be Passed to Other Individuals
  Large organizations often require delegation to meet demands on individuals in specific roles for certain periods of time
  True in Many Different Sectors
    Health Care and Financial Services
    Engineering and Academic Setting
  Example: Reda Delegates Head Role to Steve when Traveling
Key Issues:
  Who Controls Delegation to Whom?
  How are Delegation Requirements Enforced?
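
A time-bounded delegation, as in the Reda/Steve example, can be sketched as a record that adds a role to the delegated user only inside a date window. This is an illustrative simplification: the dates are made up, the names Delegation and active_roles are hypothetical, and the original user keeps the role here, whereas a real policy might suspend it for the duration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Delegation:
    role: str
    original_user: str    # OU
    delegated_user: str   # DU
    start: date
    end: date


def active_roles(user: str, own_roles: Dict[str, Set[str]],
                 delegations: Iterable[Delegation], today: date) -> Set[str]:
    """Roles a user may play today: their own plus unexpired delegations."""
    roles = set(own_roles.get(user, ()))
    for d in delegations:
        if d.delegated_user == user and d.start <= today <= d.end:
            roles.add(d.role)
    return roles


own = {"Reda": {"Head"}, "Steve": {"Faculty-Advisor"}}
trip = Delegation("Head", "Reda", "Steve", date(2008, 3, 1), date(2008, 3, 7))

during = active_roles("Steve", own, [trip], date(2008, 3, 3))
after = active_roles("Steve", own, [trip], date(2008, 3, 8))
```

Enforcement then reduces to checking active_roles at each access, so the delegated role lapses on its own once the window closes.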

Coalitions for Clinical/Translational Science
[Figure: coalition participants – UConn Health Center; UConn Storrs; DCF, DSS, etc.; Saint Francis, CCMC, …; Pfizer; Bayer; NIH; FDA; NSF]
Info. Sharing - Joint R&D
  Support T1, T2, and Clinical Research
  Company and University Partnerships
  Collaborative Funding Opportunities
  Cohesive and Trusted Environment
  Existing Systems/Databases and New Applications
How do you Protect Commercial Interests? Promote Research Advancement?
  Free Read for Some Data/Limited for Other?
  Commercialization vs. Intellectual Property?
Balancing Cooperation with Propriety

Emergent Public Policy Issues
How do we Protect a Person’s DNA?
  Who Owns a Person’s DNA?
  Who Can Profit from Person’s DNA?
  Can Person’s DNA be Used to Deny Insurance? Employment? Etc.
  How do you Define Security Limitations/Access?
What about i2b2 – Informatics for Integrating Biology and the Bedside (see https://www.i2b2.org/)
  Scalable Informatics Framework to Bridge
    Clinical Research Data
    Vast Data Banks for Basic Science Research
  Goal: Understand Genetic Bases of Diseases

Emergent Public Policy Issues
Can DNA Repositories be Anonymously Available for Medical Research?
  Do Societal Needs Trump Individual Rights?
  Can DNA be Made Available Anonymously for Medical Research?
    De-identified Data Repositories
    Privacy Protecting Data Mining
  International Repository Might Allow Medical Researchers Access to Large Enough Data Set for Rare Conditions (e.g., Orphan Drug Act)
Individual Rights vs. Medical Advances

Internet and the Web
A Major Opportunity for Business
  A Global Marketplace
    Business Across State and Country Boundaries
  A Way of Extending Services
    Online Payment vs. VISA, Mastercard
  A Medium for Creation of New Services
    Publishers, Travel Agents, Teller, Virtual Yellow Pages, Online Auctions …
A Boon for Academia
  Research Interactions and Collaborations
  Free Software for Classroom/Research Usage
  Opportunities for Exploration of Technologies in Student Projects
What are Implications for BMI? Where is the Adv?

WWW: Three Market Segments
Intranet
  Decision support
  Mfg. System monitoring
  corporate repositories
  Workgroups
Internet
  Sales
  Marketing
  Information Services
Business to Business
  Information sharing
  Ordering info./status
  Targeted electronic commerce
[Figure labels: Server, Corporate Network, Internet, Provider Network, Exposure to Outside]

Information Delivery Problems on the Net
Everyone can Publish Information on the Web Independently at Any Time
  Consequently, there is an Information Explosion
  Identifying Information Content More Difficult
There are too Many Search Engines but too Few Capable of Returning High Quality Data
Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes
What are Information Delivery Issues for BMI?
  Publishing of Patient Education Materials
  Publishing of Provider Education Materials
  How Can Patients/Providers find what they Need?
  How do they Know if it’s Relevant? Reputable?

Example Web Applications
Scenario 1: World Wide Wait
  A Major Event is Underway and the Latest, Up-to-the Minute Results are Being Posted on the Web
  You Want to Monitor the Results for this Important Event, so you Fire up your Trusty Web Browser, Pointing at the Result Posting Site, and Wait, and Wait, and Wait …
What is the Problem?
  The Scalability Problems are the Result of a Mismatch Between the Data Access Characteristics of the Application and the Technology Used to Implement the Application
May not be Relevant to BMI: Hard to Apply Scenario

Example Web Applications
Scenario 2:
  Many Applications Today have the Need for Tracking Changes in Local and Remote Data Sources and Sending Notifications If Some Condition Over the Data Source(s) is Met
  To Monitor Changes on Web, You Need to Fire up Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Difference Manually Each Time You Poll the Data Source(s)
Issue: Pure Pull is Not the Answer to All Problems
BMI: If a Patient Enters Data that Sets off a Chain Reaction, how Can Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)
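
The poll, cache, and difference cycle described in Scenario 2 can be sketched as follows. It also shows why pure pull is awkward: nothing happens between polls, so a bad health event is noticed only at the next poll. The Poller class, the glucose-style readings, and the threshold of 300 are all hypothetical illustrations.

```python
from typing import Any, Callable


class Poller:
    """Pull-based change tracking: fetch, compare with cache, maybe notify."""

    def __init__(self, fetch: Callable[[], Any],
                 condition: Callable[[Any], bool],
                 notify: Callable[[Any], None]) -> None:
        self._fetch = fetch
        self._condition = condition
        self._notify = notify
        self._last = None  # cached result of the previous poll

    def poll(self) -> bool:
        """Run one poll; return True if a notification was sent."""
        current = self._fetch()
        changed = current != self._last
        self._last = current
        if changed and self._condition(current):
            self._notify(current)
            return True
        return False


# Simulated data source: three successive patient-entered readings.
readings = iter([95, 95, 310])
alerts = []
poller = Poller(lambda: next(readings), lambda v: v > 300, alerts.append)
results = [poller.poll() for _ in range(3)]
```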

What is the Problem?
Applications are Asymmetric but the Web is Not
  Computation Centric vs. Information Flow Centric
Type of Asymmetry
  Network Asymmetry
    Satellite, CATV, Mobile Clients, Etc.
  Client to Server Ratio
    Too Many Clients can Swamp Servers
  Data Volume
    Mouse and Key Click vs. Content Delivery
  Update and Information Creation
    Clients Need to be Informed or Must Poll
Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient – No Auto-Notification

What are Information Delivery Styles?
Pull-Based System
  Transfer of Data from Server to Client is Initiated by a Client Pull
  Clients Determine when to Get Information
  Potential for Information to be Old Unless Client Periodically Pulls
Push-Based System
  Transfer of Data from Server to Client is Initiated by a Server Push
  Clients may get Overloaded if Push is Too Frequent
Hybrid
  Pull and Push Combined
  Pull First and then Push Continually

Publish/Subscribe
Semantics: Servers Publish/Clients Subscribe
  Servers Publish Information Online
  Clients Subscribe to the Information of Interest (Subscription-based Information Delivery)
  Data Flow is Initiated by the Data Sources (Servers) and is Aperiodic
  Danger: Subscriptions can Lead to Other Unwanted Subscriptions
Applications
  Unicast: Database Triggers and Active Databases
  1-to-n: Online News Groups
May work for Clinical Researcher to Provider Push
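
The publish/subscribe semantics can be sketched with a small in-process broker: clients register interest in a topic, and the data source initiates delivery whenever it has something new. This is an illustration only. The Broker class and the topic names are hypothetical, and a real deployment would add networking, authentication, and access control.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class Broker:
    """Minimal publish/subscribe hub: sources push, subscribers receive."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a client's interest in one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Push a message to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


# A provider subscribes to guideline updates pushed by a clinical researcher.
provider_inbox: List[str] = []
broker = Broker()
broker.subscribe("guidelines/diabetes", provider_inbox.append)
delivered = broker.publish("guidelines/diabetes", "Updated treatment guideline")
undelivered = broker.publish("guidelines/asthma", "No subscribers yet")
```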

Design Options for Nodes
Three Types of Nodes:
  Data Sources
    Provide Base Data which is to be Disseminated
  Clients
    Who are the Net Consumers of the Information
  Information Brokers
    Acquire Information from Other Data Sources, Add Value to that Information and then Distribute this Information to Other Consumers
    By Creating a Hierarchy of Brokers, Information Delivery can be Tailored to the Need of Many Users
Brokers may be Ideal Intermediaries for BMI!
  Act on Behalf of Patients, Providers
  Incorporate Secure Access

Research Challenges
Ubiquitous/Pervasive: Many computers and information appliances everywhere, networked together
Inherent Complexity:
  Coping with Latency (Sometimes Unpredictable)
  Failure Detection and Recovery (Partial Failure)
  Concurrency, Load Balancing, Availability, Scale
  Service Partitioning
  Ordering of Distributed Events
“Accidental” Complexity:
  Heterogeneity: Beyond the Local Case: Platform, Protocol, Plus All Local Heterogeneity in Spades.
  Autonomy: Change and Evolve Autonomously
  Tool Deficiencies: Language Support (Sockets, RPC), Debugging, Etc.

Infosphere
[Figure: Infosphere. Problem: too many sources, too much information (Internet: Information Jungle). Goal: Clean, Reliable, Timely Information, Anywhere. Labels: Digital Earth, Sensors, Personalized Filtering & Info. Delivery, Infopipes, Resource Adaptation, Property Mgmt, Information Quality, Continual Queries, Microfeedback, specialization.]

Current State-of-Art
[Figure: Thin Client, Web Server, Mainframe Database Server]

Infosphere Scenario – for BMI
[Figure: Infotaps & Fat Clients, Variety of Servers, Sensors, Database Server, Many sources]

Heterogeneity and Autonomy
Heterogeneity:
  How Much can we Really Integrate?
  Syntactic Integration
    Different Formats and Models
    Web/SQL Query Languages
  Semantic Interoperability
    Basic Research on Ontology, Etc.
Autonomy
  No Central DBA on the Net
  Independent Evolution of Schema and Content
  Interoperation is Voluntary
  Interface Technology (Support for ISVs)
    DCOM: Microsoft Standard
    CORBA, Etc...

Security and Data Quality
Security
  System Security in the Broad Sense
  Attacks: Penetrations, Denial of Service
  System (and Information) Survivability
    Security Fault Tolerance
    Replication for Performance, Availability, and Survivability
Data Quality
  Web Data Quality Problems
    Local Updates with Global Effects
    Unchecked Redundancy (Mutual Copying)
    Registration of Unchecked Information
    Spam on the Rise

Legacy Data Challenge
Legacy Applications and Data
  Definition: Important and Difficult to Replace
  Typically, Mainframe Mission Critical Code
  Most are OLTP and Database Applications
Evolution of Legacy Databases
  Client-server Architectures
  Wrappers
  Expensive and Gradual in Any Case

Potential Value Added/Jumping on Bandwagon
Sophisticated Query Capability
  Combining SQL with Keyword Queries
Consistent Updates
  Atomic Transactions and Beyond
But Everything has to be in a Database!
  Only If we Stick with Classic DB Assumptions
Relaxing DB Assumptions
  Interoperable Query Processing
  Extended Transaction Updates
Commodity DB Software
  A Little Help is Still Good If it is Cheap
  Internet Facilitates Software Distribution
  Databases as Middleware

Data Warehousing and Data Mining
Data Warehousing
  Provide Access to Data for Complex Analysis, Knowledge Discovery, and Decision Making
  Underlying Infrastructure in Support of Mining
  Provides Means to Interact with Multiple DBs
  OLAP (On-Line Analytical Processing) vs. OLTP
Data Mining
  Discovery of Information in Vast Data Sets
  Search for Patterns and Common Features
  Discover Information not Previously Known
    Medical Records Accessible Nationwide
    Research/Discover Cures for Rare Diseases
  Relies on Knowledge Discovery in DBs (KDD)

Data Warehousing and OLAP
A Data Warehouse
  Database is Maintained Separately from an Operational Database
  “A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of Management’s Decision Making Process” [W.H. Inmon]
OLAP (On-Line Analytical Processing)
  Analysis of Complex Data in the Warehouse
  Attempt to Attain “Value” through Analysis
  Relies on Trained and Skilled Knowledge Workers who Discover Information
Data Mart
  Organized Data for a Subset of an Organization
  Establish De-Identified Marts for BMI Research

Building a Data Warehouse
[Figure: Corporate data warehouse, Data Marts, Corporate data. Option 1: Consolidate Data Marts. Option 2: Build from scratch.]
Option 1
  Leverage Existing Repositories
  Collate and Collect
  May Not Capture All Relevant Data
Option 2
  Start from Scratch
  Utilize Underlying Corporate Data

BMI – Partition/Excerpt Data Warehouse
[Figure: BMI data warehouse excerpted into Data Marts]
Clinical and Epidemiological Research (and for T2 and T1)
Each Study Submitted to Institutional Review Board (IRB)
  For Human Subjects (Assess Risks, Protect Privacy)
  See: http://resadm.uchc.edu/hspo/irb/
To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to Create a Data Mart for each Approved Study
  Export/Excerpt Study Data from Warehouse
  May be Single or Multiple Sources
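
Excerpting a per-study mart from the warehouse might look like the sketch below. Everything here is an assumption made for illustration: the row layout, the field names, the `export_mart` helper, and the choice of which fields count as direct identifiers. An approved study would follow its own IRB de-identification rules and would use a secret key rather than a bare hash.

```python
import hashlib

# Hypothetical warehouse rows; field names are assumptions for illustration.
warehouse = [
    {"patient_id": "P001", "name": "Steve", "birth_year": 1945, "med": "Lipitor", "dosage_mg": 10},
    {"patient_id": "P002", "name": "John", "birth_year": 1955, "med": "Zocor", "dosage_mg": 80},
    {"patient_id": "P003", "name": "Lois", "birth_year": 1966, "med": "Lipitor", "dosage_mg": 20},
]

DIRECT_IDENTIFIERS = {"patient_id", "name"}


def export_mart(rows, study_id, include):
    """Excerpt the rows one study needs and strip direct identifiers."""
    mart = []
    for row in rows:
        if not include(row):
            continue
        record = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        # Per-study pseudonym: the same patient gets unrelated tokens in different marts.
        digest = hashlib.sha256(f"{study_id}:{row['patient_id']}".encode()).hexdigest()
        record["subject"] = digest[:12]
        # Coarsen a quasi-identifier: keep only the birth decade.
        record["birth_decade"] = record.pop("birth_year") // 10 * 10
        mart.append(record)
    return mart


lipitor_mart = export_mart(warehouse, "study-A", lambda r: r["med"] == "Lipitor")
other_mart = export_mart(warehouse, "study-B", lambda r: r["med"] == "Lipitor")
```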

Data Warehouse Characteristics
Utilizes a “Multi-Dimensional” Data Model
Warehouse Comprised of
  Store of Integrated Data from Multiple Sources
  Processed into Multi-Dimensional Model
Warehouse Supports
  Time Series and Trend Analysis
  “Super-Excel” Integrated with DB Technologies
Data is Less Volatile than Regular DB
  Doesn’t Dramatically Change Over Time
  Updates at Regular Intervals
  Specific Refresh Policy Regarding Some Data

Three Tier Architecture
[Figure labels: External data sources; Operational databases; Extract, Transform, Load, Refresh; monitor; integrator; metadata; Data Warehouse; Data marts; serve; OLAP Server; Summarization report; Query report; Data mining]

Data Warehouse Design
Most Data Warehouses use a Star Schema to Represent Multi-Dimensional Data Model
Each Dimension is Represented by a Dimension Table that Provides its Multidimensional Coordinates and Stores Measures for those Coordinates
A Fact Table Connects All Dimension Tables with a Multiple Join
  Each Tuple in Fact Table Represents the Content of One Dimension
  Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimensional Tables
  Links Between the Fact Table and the Dimensional Tables Form a Shape Like a Star

What is a Multi-Dimensional Data Cube?
Representation of Information in Two or More Dimensions
Typical Two-Dimensional - Spreadsheet
In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful
For BMI – Axes for Diagnosis, Drug, Subject Age

Multi-Dimensional Schemas
Supporting Multi-Dimensional Schemas Requires Two Types of Tables:
  Dimension Table: Tuples of Attributes for Each Dimension
  Fact Table: Measured/Observed Variables with Pointers into Dimension Table
Star Schema
  Characterizes Data Cubes by having a Single Fact Table with a Single Table for Each Dimension
Snowflake Schema
  Dimension Tables from Star Schema are Organized into Hierarchy via Normalization
Both Represent Storage Structures for Cubes

Example of Star Schema
Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
Dimension Tables:
  Product: ProductNo, ProdName, ProdDesc, Category
  Customer: CustID, CustName, CustCity, CustCountry
  Date: Date, Month, Year
  Store: StoreID, City, State, Country, Region
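
The sale star schema can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the Customer dimension is omitted for brevity, the surrogate key `DateID` and the column name `FullDate` are additions, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (ProductNo INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
CREATE TABLE store    (StoreID INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE date_dim (DateID INTEGER PRIMARY KEY, FullDate TEXT, Month TEXT, Year INTEGER);
-- Each fact row points at one row in every dimension and carries the measures.
CREATE TABLE sale_fact (
    DateID       INTEGER REFERENCES date_dim(DateID),
    ProductNo    INTEGER REFERENCES product(ProductNo),
    StoreID      INTEGER REFERENCES store(StoreID),
    Unit_Sales   INTEGER,
    Dollar_Sales REAL
);
INSERT INTO product  VALUES (1, 'Beer', 'Beverage'), (2, 'Nuts', 'Snack');
INSERT INTO store    VALUES (1, 'Rolla', 'MO', 'Central'), (2, 'LA', 'CA', 'West');
INSERT INTO date_dim VALUES (1, '1999-07-03', 'Jul', 1999);
INSERT INTO sale_fact VALUES (1, 1, 1, 10, 100.0), (1, 2, 1, 5, 50.0), (1, 1, 2, 2, 25.0);
""")

# The "multiple join": the fact table joined to the dimensions, then aggregated.
totals = conn.execute("""
    SELECT s.Region, p.Category, SUM(f.Dollar_Sales)
    FROM sale_fact f
    JOIN store s   ON f.StoreID = s.StoreID
    JOIN product p ON f.ProductNo = p.ProductNo
    GROUP BY s.Region, p.Category
    ORDER BY s.Region, p.Category
""").fetchall()
```

Grouping by dimension attributes such as Region and Category is what turns the star schema into the cells of a data cube.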

Example of Star Schema for BMI
Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.
Dimension Tables:
  Vitals: BP, Temp, Resp, HR (Pulse)
  Patient: PatientID, PatientName, PatientCity, PatientCountry
  Date: Date, Month, Year
  Symptoms: Pulmonary, Heart, Mus-Skel, Skin, Digestive
  Medications: Reference another Star Schema for all Meds

A Second Example of Star Schema …

and Corresponding Snowflake Schema

Data Warehouse Issues
Data Acquisition
  Extraction from Heterogeneous Sources
  Reformatted into Warehouse Context - Names, Meanings, Data Domains Must be Consistent
  Data Cleaning for Validity and Quality
    Is the Data as Expected w.r.t. Content? Value?
  Transition of Data into Data Model of Warehouse
  Loading of Data into the Warehouse
Other Issues Include:
  How Current is the Data? Frequency of Update?
  Availability of Warehouse?
  Dependencies of Data?
  Distribution, Replication, and Partitioning Needs?
  Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)?
  For CTSA – Data Ownership (Competing Hosps).

Knowledge Discovery
Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully
Knowledge Discovery
  Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set
  Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from Large Collection of Data
Data Mining
  A Critical Step in the Knowledge Discovery Process
  Extracts Implicit Information from Large Data Set

Steps in a KDD Process
Learning the Application Domain (goals)
Gathering and Integrating Data
Data Cleaning
Data Integration
Data Transformation/Consolidation
Data Mining
  Choosing the Mining Method(s) and Algorithm(s)
  Mining: Search for Patterns or Rules of Interest
Analysis and Evaluation of the Mining Results
Use of Discovered Knowledge in Decision Making
Important Caveats
  This is Not an Automated Process!
  Requires Significant Human Interaction!

OLAP Strategies
OLAP Strategies
  Roll-Up: Summarization of Data
  Drill-Down: from the General to Specific (Details)
  Pivot: Cross Tabulate the Data Cubes
  Slice and Dice: Projection Operations Across Dimensions
  Sorting: Ordering Result Sets
  Selection: Access by Value or Value Range
Implementation Issues
  Persistent with Infrequent Updates (Loading)
  Optimization for Performance on Queries is More Complex - Across Multi-Dimensional Cubes
  Recovery Less Critical - Mostly Read Only
  Temporal Aspects of Data (Versions) Important

On-Line Analytical Processing
Data Cube
  A Multidimensional Array
  Each Attribute is a Dimension
In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date

Product      Store     Date     Sale
acron        Rolla,MO  7/3/99   325.24
budwiser     LA,CA     5/22/99  833.92
large pants  NY,NY     2/12/99  771.24
3’ diaper    Cuba,MO   7/30/99  81.99

[Figure: data cube with axes Product (Pants, Diapers, Beer, Nuts), Region (West, East, Central, Mountain, South), and Date (Jan, Feb, March, April)]

On-Line Analytical Processing
For BMI – Imagine a Data Table with Patient Data
  Define Axis
  Summarize Data
  Create Perspective to Match Research Goal
  Essentially De-identified Data Mart

Patient  Med      BirthDat  Dosage
Steve    Lipitor  1/1/45    10mg
John     Zocor    2/2/55    80mg
Harry    Crestor  3/3/65    5mg
Lois     Lipitor  4/4/66    20mg
Charles  Crestor  7/1/59    10mg

[Figure: data cube with axes Medication (Lescol, Crestor, Zocor, Lipitor), Dosage (5, 10, 20, 40, 80), and Decade (1940s, 1950s, 1960s, 1970s)]
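
Turning the patient table into a cube can be sketched as follows. The two-digit birth years are assumed to be 19xx, and `decade` and `cube` are hypothetical names. Dropping the patient name while counting is what makes the result resemble a de-identified mart.

```python
from collections import Counter

# Rows from the patient table: (patient, medication, birth year, dosage in mg).
patients = [
    ("Steve", "Lipitor", 1945, 10),
    ("John", "Zocor", 1955, 80),
    ("Harry", "Crestor", 1965, 5),
    ("Lois", "Lipitor", 1966, 20),
    ("Charles", "Crestor", 1959, 10),
]


def decade(year):
    """Map a birth year onto the Decade axis, e.g. 1945 -> '1940s'."""
    return f"{year // 10 * 10}s"


# Define the axes (Medication, Decade, Dosage) and summarize: each cell is a count.
cube = Counter((med, decade(year), dose) for _name, med, year, dose in patients)

# Collapse the Medication and Dosage axes to view the data by Decade only.
by_decade = Counter()
for (_med, dec, _dose), count in cube.items():
    by_decade[dec] += count
```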

Examples of Data Mining
The Slicing Action
  A Vertical or Horizontal Slice Across Entire Cube
[Figure: Multi-Dimensional Data Cube of Sales with axes Months, Cities, and Products; Slice on city Atlanta]

Examples of Data Mining
The Dicing Action
  A Slice First Identifies One Dimension
  A Selection of Any Cube within the Slice which Essentially Constrains All Three Dimensions
[Figure: cube of Sales with axes Months, Cities, and Products; the Atlanta slice with axes Products and Months; Dice on Electronics and Atlanta, giving the cell for March 2000, Atlanta, Electronics]
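
Slicing and dicing can be sketched on a cube stored as a dictionary keyed by (month, city, product). The sales figures are made-up sample values and `select` is a hypothetical helper.

```python
# Hypothetical sales cube: (month, city, product) -> sales.
sales = {
    ("Jan", "Atlanta", "Electronics"): 120,
    ("Feb", "Atlanta", "Electronics"): 90,
    ("Mar", "Atlanta", "Electronics"): 150,
    ("Mar", "Atlanta", "Clothing"): 60,
    ("Mar", "Savannah", "Electronics"): 40,
}
AXES = ("month", "city", "product")


def select(cube, **fixed):
    """Keep the cells whose coordinates match every fixed axis value."""
    wanted = {AXES.index(axis): value for axis, value in fixed.items()}
    return {
        key: value
        for key, value in cube.items()
        if all(key[i] == v for i, v in wanted.items())
    }


# Slice: fix one dimension (city = Atlanta), keep the other two free.
atlanta_slice = select(sales, city="Atlanta")

# Dice: constrain all three dimensions down to a sub-cube (here a single cell).
diced = select(sales, month="Mar", city="Atlanta", product="Electronics")
```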

Examples of Data Mining
Drill Down - Takes a Facet (e.g., Q1) and Decomposes into Finer Detail
Roll Up: Combines Multiple Dimensions From Individual Cities to State
[Figure: cube of Products Sales by quarter (Q1, Q2, Q3, Q4) and Location (city, GA: Atlanta, Columbus, Gainesville, Savannah). Drill down on Q1 gives Jan, Feb, March by Cities and Products. Roll Up on Location (State, USA) gives Arizona, California, Georgia, Iowa by Q1 to Q4.]
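
Roll-up and drill-down move along dimension hierarchies (month to quarter, city to state). A minimal sketch, assuming hypothetical lookup tables and made-up sales figures:

```python
from collections import defaultdict

# Hypothetical hierarchies: month -> quarter and city -> state.
QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
STATE = {"Atlanta": "Georgia", "Savannah": "Georgia", "Phoenix": "Arizona"}

# Finest level of detail: (month, city) -> sales.
monthly = {
    ("Jan", "Atlanta"): 10,
    ("Feb", "Atlanta"): 20,
    ("Mar", "Savannah"): 30,
    ("Apr", "Phoenix"): 40,
}


def roll_up(cells):
    """Summarize months into quarters and cities into states."""
    totals = defaultdict(int)
    for (month, city), amount in cells.items():
        totals[(QUARTER[month], STATE[city])] += amount
    return dict(totals)


def drill_down(cells, quarter):
    """Decompose one quarter back into its monthly, city-level cells."""
    return {key: value for key, value in cells.items() if QUARTER[key[0]] == quarter}


by_quarter_state = roll_up(monthly)
q1_detail = drill_down(monthly, "Q1")
```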

Mining Other Types of Data
Analysis and Access Dramatically More Complicated!
Time Series Data for Glucose, BP, Peak Flow, etc.
[Figure: Time series data, Geographical and Satellite Data, Spatial databases, Multimedia databases, World Wide Web]

Advantages/Objectives of Data Mining
Descriptive Mining
  Discover and Describe General Properties
  60% of People who Buy Beer on Friday also have Bought Nuts or Chips in the Past Three Months
Predictive Mining
  Infer Interesting Properties based on Available Data
  People who Buy Beer on Friday usually also Buy Nuts or Chips
Result of Mining
  Order from Chaos
  Mining Large Data Sets in Multiple Dimensions Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc.
  Impact on Marketing Strategy

Data Mining Methods (1)
Association
  Discover the Frequency of Items Occurring Together in a Transaction or an Event
  Example
    80% of Customers who Buy Milk also Buy Bread
      Hence - Bread and Milk Adjacent in Supermarket
    50% of Customers Forget to Buy Milk/Soda/Drinks
      Hence - Available at Register
Prediction
  Predicts Some Unknown or Missing Information based on Available Data
  Example
    Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters
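
The forecasting example could be approached in many ways. The sketch below assumes a straight-line trend fitted by least squares, which is only one simple choice and is not named in the slides. The sales figures and the `forecast_next` helper are hypothetical.

```python
def forecast_next(values):
    """Fit y = intercept + slope * x by least squares and extrapolate one step."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


# Hypothetical electronics sales for the past three quarters.
past_three_quarters = [100.0, 110.0, 120.0]
next_quarter = forecast_next(past_three_quarters)
```

With only three points the fit is fragile, so this illustrates the idea of prediction from available data rather than a usable forecast.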

Association Rules
Motivated by Market Analysis
Rules of the Form
  Item1 ^ Item2 ^ … ^ Itemk → Itemk+1 ^ … ^ Itemn
Example
  “Beer ^ Soft Drink → Pop Corn”
Problem: Discovering All Interesting Association Rules in a Large Database is Difficult!
  Issues
    Interestingness
    Completeness
    Efficiency
  Basic Measurement for Association Rules
    Support of the Rule
    Confidence of the Rule
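
Support and confidence use their usual definitions here: the support of a rule is the fraction of transactions containing every item in the rule, and the confidence is that support divided by the support of the left-hand side. The transactions below are made-up sample data.

```python
# Hypothetical market baskets.
transactions = [
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink"},
    {"beer", "popcorn"},
    {"milk", "bread"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)


# Rule: Beer ^ Soft Drink -> Pop Corn
rule_support = support({"beer", "soft drink", "popcorn"})
rule_confidence = confidence({"beer", "soft drink"}, {"popcorn"})
```

Checking every candidate rule this way grows exponentially with the number of items, which is the efficiency issue the slide raises.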

Data Mining Methods (2)
Classification
  Determine the Class or Category of an Object based on its Properties
  Example
    Classify Companies based on the Final Sale Results in the Past Quarter
Clustering
  Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity
  Example
    Group Crime Locations to Find Distribution Patterns
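
The slides do not name a clustering algorithm. As one common choice, the sketch below uses k-means with fixed starting centers on made-up 2-D locations; the `kmeans` function and the coordinates are assumptions for illustration.

```python
def kmeans(points, centers, rounds=10):
    """Plain k-means: assign each point to its nearest center, then recenter."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical incident locations forming two obvious groups.
locations = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, groups = kmeans(locations, centers=[(0, 0), (10, 10)])
```

Points within a group end up close together (high intra-group similarity) while the two groups stay far apart (low inter-group similarity).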

Classification
Two Stages
  Learning Stage: Construction of a Classification Function or Model
  Classification Stage: Prediction of Classes of Objects Using the Function or Model
Tools for Classification
  Decision Tree
  Bayesian Network
  Neural Network
  Regression
Problem
  Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects

An Example
Attributes

  Attribute    Possible Values
  outlook      sunny, overcast, rain
  temperature  continuous
  humidity     continuous
  windy        true, false

Class Attribute - Play/Don’t Play the Game
Training Set
  Values that Set the Condition for the Classification
  What are the Patterns Below?

  Outlook   Temperature  Humidity  Windy  Play
  sunny     85           85        false  No
  overcast  83           78        false  Yes
  sunny     80           90        true   No
  sunny     72           95        false  No
  sunny     72           70        false  Yes
  …         …            …         …      ...
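
The two stages can be illustrated on the five training rows shown. The sketch assumes the simplest possible decision tree, a single split on humidity with "low humidity means Play"; `learn_stump` and `classify` are hypothetical names, and a real learner would consider every attribute.

```python
# The five visible training rows: (outlook, temperature, humidity, windy, play).
training = [
    ("sunny", 85, 85, False, "No"),
    ("overcast", 83, 78, False, "Yes"),
    ("sunny", 80, 90, True, "No"),
    ("sunny", 72, 95, False, "No"),
    ("sunny", 72, 70, False, "Yes"),
]
HUMIDITY = 2  # column index of the attribute the stump splits on


def classify(threshold, humidity):
    """Classification stage: apply the learned model to one object."""
    return "Yes" if humidity <= threshold else "No"


def learn_stump(rows):
    """Learning stage: pick the humidity threshold with the fewest training errors."""
    values = sorted({row[HUMIDITY] for row in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum(classify(threshold, row[HUMIDITY]) != row[-1] for row in rows)

    return min(candidates, key=errors)


threshold = learn_stump(training)
prediction_dry = classify(threshold, 65)
prediction_humid = classify(threshold, 88)
```

On these rows the pattern is that play happens only when humidity is low, and the learned threshold falls between 78 and 85.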

Data Mining Methods (3)
Summarization
  Characterization (Summarization) of General Features of Objects in the Target Class
  Example
    Characterize People’s Buying Patterns on the Weekend
    Potential Impact on “Sale Items” & “When Sales Start”
    Department Stores with Bonus Coupons
Discrimination
  Comparison of General Features of Objects Between a Target Class and a Contrasting Class
  Example
    Comparing Students in Engineering and in Art
    Attempt to Arrive at Commonalities/Differences

Summarization Technique
Attribute-Oriented Induction
Generalization using Concept Hierarchy (Taxonomy)

barcode  category    brand       content    size
14998    milk        Dairyland   Skim       2L
12998    mechanical  MotorCraft  valve 23a  12in
…        …           …           …          ...

[Figure: concept hierarchy. food: Milk … bread. Milk: Skim milk … 2% milk. bread: White bread … whole wheat. Brands: Lucern … Dairyland, Wonder … Safeway.]

Category  Content  Count
milk      skim     280
milk      2%       98
…         …        ...
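
Attribute-oriented induction can be sketched as two moves: drop the attributes that are too specific (barcode, brand, size) and count the tuples that become identical, then climb the concept hierarchy one level. Apart from barcode 14998, the rows and the `PARENT` table are made-up sample data.

```python
from collections import Counter

# Item-level rows; only the first barcode appears in the slide's table.
items = [
    {"barcode": 14998, "category": "milk", "brand": "Dairyland", "content": "skim", "size": "2L"},
    {"barcode": 14999, "category": "milk", "brand": "Lucern", "content": "skim", "size": "1L"},
    {"barcode": 15001, "category": "milk", "brand": "Dairyland", "content": "2%", "size": "2L"},
    {"barcode": 16001, "category": "bread", "brand": "Wonder", "content": "white", "size": "1 loaf"},
]

# One level of a hypothetical concept hierarchy (taxonomy).
PARENT = {"milk": "food", "bread": "food"}


def generalize(rows, keep=("category", "content")):
    """Keep only the general attributes and count the merged tuples."""
    return Counter(tuple(row[attr] for attr in keep) for row in rows)


summary = generalize(items)

# Climb the hierarchy: replace each category by its parent concept.
by_parent = Counter()
for (category, _content), count in summary.items():
    by_parent[PARENT[category]] += count
```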

Why is Data Mining Popular?
Technology Push
  Technology for Collecting Large Quantity of Data
    Bar Code, Scanners, Satellites, Cameras
  Technology for Storing Large Collection of Data
    Databases, Data Warehouses
    Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web
Corporations want to Improve Direct Marketing and Promotions - Driving Technology Advances
  Targeted Marketing by Age, Region, Income, etc.
  Exploiting User Preferences/Customized Shopping
What is Potential for BMI?
  How do you see Data Mining Utilized?
  What are Key Issues to Worry About?

Requirements & Challenges in Data Mining
Security and Social
  What Information is Available to Mine?
  Preferences via Store Cards/Web Purchases
  What is Your Comfort Level with Trends?
User Interfaces and Visualization
  What Tools Must be Provided for End Users of Data Mining Systems?
  How are Results for Multi-Dimensional Data Displayed?
Performance Guarantees
  Range from Real-Time for Some Queries to Long-Term for Other Queries
Data Sources of Complex Data Types or Unstructured Data - Ability to Format, Clean, and Load Data Sets

Concluding Remarks
We’ve looked at:
  Informatics
  Information Engineering
  Information Usage and Repositories
Focused on Their Applicability and Relevance for BMI
Likely Generated More Questions than Answers