IIE-1
CSE300
Informatics and Information Engineering
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
[email protected]
www.engr.uconn.edu/~steve
(860) 486-4818
Copyright © 2008 by S. Demurjian, Storrs, CT. Portions of these slides are being used with the permission of Dr. Ling Liu, Associate Professor, College of Computing, Georgia Tech.
IIE-2
Overview
  Informatics
    What is Informatics?
    What is Biomedical Informatics?
    What are Key Biomedical Informatics Challenges?
  Information Engineering
    Data vs. Information vs. Knowledge
    What is Science? What is Engineering?
    What is Information Consistency?
  Information Usage and Repositories
    How do we Store and Utilize Information?
    Role of Web in Informatics
    Sharing, Collaboration, and Security
    Databases vs. Data Mining
IIE-3
Informatics
  Informatics is:
    Management and Processing of Data
    From Multiple Sources/Contexts
    Involves Classification (Ontologies), Collection, Storage, Analysis, Dissemination
  Informatics is Multi-Disciplinary
    Computing (Model, Store, Process Information)
    Social Science (User Interactions, HCI)
    Statistics (Analysis)
  Informatics Can Apply to Multiple Domains:
    Business, Biology, Fine Arts, Humanities
    Pharmacology, Nursing, Medicine, etc.
IIE-4
What is Informatics?
  Heterogeneous Field - Interaction between People, Information and Technology
    Computer Science and Engineering
    Social Science (Human Computer Interface)
    Information Science (Data Storage, Retrieval and Mining)
  [Figure: Informatics at the intersection of People, Information, and Technology]
  Adapted from Shortliffe textbook
IIE-5
What is Biomedical Informatics (BMI)?
  BMI is Information and its Usage Associated with the Research and Practice of Medicine Including:
    Clinical Informatics for Patient Care
      Medical Record + Personal Health Record
    Bioinformatics for Research/Biology to Bedside
      From Genomics To Proteomics
    Public Health Informatics (State and Federal)
      Tracking Trends in Public Sector
    Clinical Research Informatics
      Deidentified Repositories and Databases Facilitate Epidemiological Research and Ongoing Clinical Studies (Drug Trials, Data Analysis, etc.)
IIE-6
What are Key BMI Focal Areas?
  T1 Research
    Transition Bench Results into Clinical Research
  Clinical Research
    Applying Clinical Research Results via Trials with Patients on Medication, Devices, Treatment Plans
  T2 Research
    Translating "Successful" Clinical Trials into Practice and the Community
  Clinical Practice
    Tracking all of the Information Associated with a Patient and his/her Care
  Integrated and Inter-Disciplinary Information Spectrum
IIE-7
What is Medical Informatics?
  Clinical Informatics, Pharmacy Informatics
  Public Health Informatics
  Consumer Health Informatics
  Nursing Informatics
  Systems and People Issues
    Intended to Improve Clinical Outcomes, Satisfaction and Efficiency
    Workflow Changes, Business Implications, Implementation, etc.
    Patient Centered - Personal Health Record and Medical Home
    Care Centered - Pay for Performance, Improving Treatment Compliance
IIE-8
What is Bioinformatics?
  Focused on Research Tools for T1:
    Genomic and Proteomic Tools, Evaluation Methods, Computing and Database Needs
    Information Retrieval and Manipulation of Large Distributed (caBIG) Data Sets (cabig.cancer.gov/index.asp)
    Often Requires Grid Computing
    Includes Cancer and Immunology Research
  Increasing Need to Tie These Separate Types of Systems Together = Personalized Medicine
    Biology and the Bedside (www.i2b2.org)
IIE-9
Where is Data/How is it Used?
  Medical And Administrative Data Found in Clinical Information Systems (CIS) Such As:
    Hospital Info. Systems
    Electronic Medical Records
    Personal Health Records…
    Pharmacy, Nursing, Picture Archiving Systems
    Complex Data Storage and Retrieval - Many Different Systems
  T1 Research Increasingly Reliant on CIS
  T2 Research is Reliant on:
    End Systems for Embedding EBM (Evidence-Based Medicine) Guidelines
    Measuring Outcomes, Looking at Policy
IIE-10
What are Major Informatics Challenges?
  Shortage of Trained People Nationally
  Slows Adoption of Health Information Technology
  Results in Poor Planning and Coordination, Duplication of Efforts and Incomplete Evaluation
  What are Critical Needs?
    Dually Trained Clinicians or Researchers in Leadership of some Initiatives
    Connect all folks with Informatics Roles across Institutions to Improve Efficiency
    Multi-Disciplinary: CSE, Statistics, Biology, Medicine, Nursing, Pharmacy, etc.
  Emerging Standards for Information Modeling and Exchange (www.hl7.org) based on XML
IIE-11
Information Engineering
  Data vs. Information vs. Knowledge
    How do we Differentiate Between them?
    Where are they used in BMI?
  Science vs. Engineering
    What is each of their Roles in Informatics?
    How can we Engineer Information?
    What is their Role in BMI?
  What is Information Engineering?
    What are the Unique Challenges and Opportunities?
    What is Available Today and Tomorrow?
IIE-12
From American Heritage
  Data
    Information, esp. information organized for analysis or used as the basis for a decision.
    Numerical information in a form suitable for processing by computer.
  Information
    The act of informing or the condition of being informed; communication of knowledge.
    A non-accidental signal used as an input to a computer or communications system.
  Knowledge
    The state or fact of knowing.
    The sum or range of what has been perceived, discovered, or learned.
    Specific information about something.
IIE-13
From Webster's 9th Collegiate
  Data
    Factual information (e.g. statistics) used as a basis for reasoning, discussion, or calculation.
  Information
    The communication of knowledge or intelligence.
    Something (as a message, experimental data, or a picture) which justifies change in a construct (as a plan or theory) that represents physical or mental experience or another construct.
    A quantitative measure of the content of information.
  Knowledge
    The fact or condition of having information or of being learned.
    The sum of what is known: the body of truth, information, and principles acquired by mankind.
IIE-14
Data vs. Information vs. Knowledge
  Overlapping Definitions
  Conflicting Definitions
  Agreement on Data
  Knowledge and Information - Synonyms
  Discussion Questions:
    Equivalence of Knowledge/Information?
    How can we Distinguish them?
    Do these Three Terms Cover All Possibilities?
IIE-15
Data, Information, and Knowledge in BMI
  Data - Basic Level
    BP, Pulse, Temperature
    Peak Flow, Glucose Level, Biopsy Result
    X-Ray, MRI, CAT Scan
  Information - First Level of Interpretation
    BPs, Peak Flow, Glucose over Time
    Interpreting Scan (Radiologist) or Biopsy Result (Oncologist)
  Knowledge - Applying Experience towards Diagnosis
    What can Low Peak Flows over Time lead to?
    What Next Step after Positive Scan or Biopsy?
    What if Glucose Level is Yo-yoing?
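The three levels on this slide can be sketched in a few lines of Python. This is only an illustration: the readings, the function names `trend` and `assess`, and the threshold are all made up for the example and carry no clinical meaning.

```python
from statistics import mean

# Data: raw peak-flow readings (L/min) for one hypothetical patient.
peak_flow = [420, 410, 380, 350, 330, 300]


def trend(readings):
    """Information: readings interpreted over time as the average change per reading."""
    deltas = [later - earlier for earlier, later in zip(readings, readings[1:])]
    return mean(deltas)


def assess(readings, drop_threshold=-10):
    """Knowledge: an illustrative rule of thumb applied to the trend (not clinical guidance)."""
    return "follow up" if trend(readings) <= drop_threshold else "routine"


print(trend(peak_flow))   # average change between consecutive readings
print(assess(peak_flow))  # decision derived from the trend
```

The list is data, the computed trend is information, and the rule that turns a trend into a next step stands in for a clinician's knowledge.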
IIE-16
From American Heritage
  Science
    The observation, identification, description, experimental investigation, and theoretical explanation of natural phenomena.
    Methodological activity, discipline, or study.
    An activity that appears to require study & method.
    Knowledge, esp. gained through experience.
  Engineering
    The application of scientific and mathematical principles to practical ends such as the design, construction, and operation of efficient and economical structures, equipment, and systems.
IIE-17
From Webster's 9th Collegiate
  Science
    The state of knowing: knowledge as distinguished from ignorance or misunderstanding.
    A department of systemized knowledge as an object of study.
    A system or method reconciling practical ends with scientific laws.
  Engineering
    The application of science and mathematics by which the properties of matter and the sources of energy in nature are made useful to people in structures, machines, products, systems, and processes.
IIE-18
Science and Engineering in BMI
  Science
    Data/Information Collection & Analysis to Reach Hypothesis
    Patients with CHF and Lipitor have Fewer Heart Attacks than CHF and Baby Aspirin
    Verify in Clinical Research/Epidemiological Study
  Engineering
    Usage of Information in Practice
    Apply Scientific Results to Medical Practice
    Image Processing used to Identify Tumors in CT and MRI Scans
    Transfer of Radiologist's Knowledge into Computer Based (Assisted) Solution
    An Engineering Solution to Scientific Result
IIE-19
What is Information Engineering?
  Incorporation of an Engineering Approach and Discipline to the Generation of Information and the Promotion of the Better Use of Information and Resources
  Information Engineering Unifies and Combines:
    Software Engineering
    Database Engineering
    Security Engineering
    Performance Engineering
    Etc...
  Moral: Systems Cannot and Must Not be Engineered in a Vacuum!
  Particularly true in BMI (T1, T2, Clinical Research, and Clinical Practice)
IIE-20
Information Engineering is Motivated by:
  Realization that Management/Control of Information will be a Primary Concern as we Continue through the 1990s and into the 21st Century
  Currently in an Age of Information - Volume and Complexity Dependencies
  Critical Systems Heavily Depend on Information:
    Airline/Hotel/Auto Reservations
    Telecommunications
    Banking/ATMs
    ATM/Credit Cards at Gas Stations/Supermarkets
    Credit Bureaus Electronically Collect Information from Many Diverse Sources
    E-Tailing
    Medical Care/All Aspects of BMI
IIE-21
Info. Engrg. - Challenge for 21st Century
  Timely and Efficient Utilization of Information
    Significantly Impacts Productivity
    Supports and Promotes Collaboration for Competitive Advantage
    Use Information in New and Different Ways
  Collection, Synthesis, Analyses of Information
    Better Understanding of Processes, Sales, Productivity, etc.
    Dissemination of Only Relevant/Significant Information - Reduce Overload
  Implications for BMI?
    Sharing of Results - Benefit Mankind
    Ability to Research on Rare Diseases
    Are there Unknown Isolated "Cures"?
IIE-22
How is Information Engineered?
  Careful Thought to its Definition/Purpose & Thorough Understanding of its Intended Usage/Potential Impact
  Ensure and Maintain its Consistency
    Quality, Correctness, and Relevance
  Protect and Control its Availability (Secure Access)
    Who can Access What Information in Which Location and at What Time?
  Long-Term Persistent Storage/Recoverability
    Cost, Reusability, Longitudinal, and Cumulative Experience
  Integration of Past, Present and Future Information via Intranet and Internet Access
  What are Implications/Challenges for BMI?
    Let's Discuss Briefly…
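The secure-access question on this slide (who, what, where, when) can be read as a rule lookup. The sketch below is one hypothetical way to encode it; the `Rule` fields, the roles, the locations, and the hours are all invented for illustration and do not describe any real system.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Rule:
    role: str      # who
    resource: str  # what information
    location: str  # in which location
    start: time    # at what time (inclusive)
    end: time      # at what time (exclusive)


# Hypothetical policy: each rule grants one role access to one resource.
RULES = [
    Rule("nurse", "vitals", "ward", time(7, 0), time(19, 0)),
    Rule("physician", "medical_record", "hospital", time(0, 0), time(23, 59, 59)),
]


def may_access(role, resource, location, at):
    """Grant access only if some rule matches all four dimensions."""
    return any(
        r.role == role
        and r.resource == resource
        and r.location == location
        and r.start <= at < r.end
        for r in RULES
    )
```

For example, `may_access("nurse", "vitals", "ward", time(9, 30))` is granted, while the same request at `time(22, 0)` is denied because no rule covers that hour.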
IIE-23
Towards Information Consistency
  Consistency of Information is Key!
  Consistency Gauged with respect to:
    Usage of Information
    Persistency of Information
    Integrity/Security of Information
      Allowable Values and Protection from Misuse
    Validity (Relevance) of Information
      Means Something to Someone in a Positive Way
  Discussion Questions:
    Why is Consistency Important for BMI?
    How is Consistency Attained for BMI?
    What Else Impacts Consistency for BMI?
IIE-24
What's Available to Support IE?
  What Can be Provided to Make the Advanced Application Design Process:
    More Complete?
    More Robust?
    More Responsive?
    Less Error Prone?
  Current Choices to Support Information Engineering:
    Conventional Programming Languages and Data Models
    Object-Oriented Programming Languages
    Object-Oriented DBS
    XML Databases
    Middleware and SOA (Web)
    Data Mining/Warehouses
IIE-25
What are Key Questions?
  Focus on Information and its Behavior
    What are Different Kinds of Information?
    How is Information Manipulated?
    Is Same Information Stored in Different Ways?
    What are Information Interdependencies?
    Will Information Persist? Long-Term DB? Versions of Information?
    What Past Info. is Needed from Legacy DBs or Applications?
    Who Needs Access to What Info. When?
    What Information is Available Across WWW?
  All of these Questions Apply to BMI!
IIE-26
Information Usage and Repositories
  How do we Store and Utilize Information?
    Databases
    Data Mining
  What are Key Issues?
    Information Sharing/Data Correctness
    Collaboration
      1. Among Providers and Researchers
      2. Among Providers and Patients
      3. Among Patients (Support Groups)
    Security
      1. Control of Patient Information (De-identified)
      2. Secure Exchange/Patient Ownership
      3. Establish Custom Patient Controlled Groups
  What is the Role of Web in Informatics?
IIE-27
The Role of a Database
  Database is a Norm in Today's and Tomorrow's Applications
  Usage of Information Tightly Linked to its Storage
  Integration of Database - Key Component
  Support Many Representations of "Same" Information
  Promotes Retrieval of Information Geared Towards User Needs and Responsibilities
  Gap Exists Between Standalone Programming Applications and Database Systems
  For BMI:
    Database (Data Warehouse) is a Key Feature
    Need for Access to Data (De-identified)
    Need to Share and Interact among Stakeholders
IIE-28
DBMS Architecture
  DBMS Languages
    Data Definition Language (DDL)
    Data Manipulation Language (DML)
      From Embedded Queries or DB Commands Within a Program
      "Stand-alone" Query Language
  Host Language:
    DML Specification (e.g., SQL) is Embedded in a "Host" Programming Language (e.g., Java, C++)
  DBMS Interfaces
    Menu-Based Interface
    Graphical Interface
    Forms-Based Interface
    Interface for DBA (DB Administrator)
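As a small illustration of DDL and DML embedded in a host language, the sketch below uses Python's standard `sqlite3` module as the host instead of the Java or C++ named on the slide. The `visit` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (hypothetical table and column names).
conn.execute("""
    CREATE TABLE visit (
        patient_id INTEGER NOT NULL,
        visit_date TEXT    NOT NULL,
        diagnosis  TEXT,
        PRIMARY KEY (patient_id, visit_date)
    )
""")

# DML: insert and query rows; '?' placeholders bind host-language values.
conn.executemany(
    "INSERT INTO visit VALUES (?, ?, ?)",
    [
        (1, "2008-01-01", "Flu"),
        (1, "2008-02-15", "Checkup"),
        (2, "2008-01-20", "Asthma"),
    ],
)
rows = conn.execute(
    "SELECT visit_date, diagnosis FROM visit WHERE patient_id = ? ORDER BY visit_date",
    (1,),
).fetchall()
print(rows)
```

The SQL strings are the embedded DML and DDL; everything around them (the list of tuples, the bound parameter, the result list) lives in the host language.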
IIE-29
ANSI/SPARC - Three Schema Architecture
  External Data Schema (Users' view)
  Conceptual Data Schema (Logical Schema)
  Internal Data Schema (Physical Schema)
IIE-30
How are these Used for BMI?
  Internal Data Schema (Physical Schema)
    Hidden Data Representation for Storage of BMI Data in Proprietary Format
    Under the Control of DB System
  Conceptual Data Schema (Logical Schema)
    The Data Model for the BMI Application
    Access to Schema Controllable via SQL
  External Data Schema (Users' view)
    Subsets of the Data Model for Different Users
    External View for Patients
    External View for Providers
    External View for Clinical Researchers
    Need Ability for a Patient to Control Access to his/her Own External View
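External views map naturally onto SQL views over the conceptual schema. The sketch below is a minimal example with Python's `sqlite3`; the `encounter` table, its columns, and the two view names are assumptions made for illustration. SQLite has no per-user privileges, so restricting each user group to its own view would have to be enforced by a multi-user DBMS or by the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema: one hypothetical table for the whole application.
    CREATE TABLE encounter (
        patient_name TEXT, birth_year INTEGER, provider TEXT,
        diagnosis TEXT, billing_code TEXT
    );
    INSERT INTO encounter VALUES ('A. Jones', 1960, 'Dr. D', 'Flu', 'B-100');
    INSERT INTO encounter VALUES ('B. Smith', 1975, 'Dr. D', 'Asthma', 'B-200');

    -- External view for providers: clinical columns, no billing detail.
    CREATE VIEW provider_view AS
        SELECT patient_name, provider, diagnosis FROM encounter;

    -- External view for clinical researchers: de-identified columns only.
    CREATE VIEW researcher_view AS
        SELECT birth_year, diagnosis FROM encounter;
""")

researcher_rows = conn.execute(
    "SELECT * FROM researcher_view ORDER BY birth_year"
).fetchall()
provider_cols = [d[0] for d in conn.execute("SELECT * FROM provider_view").description]
```

Each view is a subset of the same data model: researchers never see names, and providers never see billing codes.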
IIE-31
Data Independence
  Ability of Application Programs to Remain Unaffected by Changes in Irrelevant Parts of the Conceptual Data Representation, Data Storage Structure and Data Access Methods
  Invisibility (Transparency) of the Details of Entire Database Organization, Storage Structure and Access Strategy to the Users
    Both Logical and Physical
  Recall Software Engineering Concepts:
    Abstraction: the Details of an Application's Components Can Be Hidden, Providing a Broad Perspective on the Design
    Representation Independence: Changes Can Be Made to the Implementation that have No Impact on the Interface and Its Users
IIE-32
Physical Data Independence
  The Ability to Modify the Physical Data Representation Without Causing Application Programs to Be Rewritten
  Examples:
    Transparency of the Physical Storage Organization
    Transparency of Physical Access Paths
    Numeric Data Representation and Units
    Character Data Representation
    Data Coding
    Physical Data Structure
  All of these are Vital for BMI - Particularly if we Use Standards to Achieve Application Independence
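One of the examples above, transparency of physical access paths, can be demonstrated by adding an index: the physical organization changes while the application's query and its answer stay the same. This is a sketch using Python's `sqlite3`; the `lab_result` table and its generated contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_result (patient_id INTEGER, test_code TEXT, value REAL)")
conn.executemany(
    "INSERT INTO lab_result VALUES (?, ?, ?)",
    [(i % 50, "GLU", 90.0 + i % 7) for i in range(1000)],
)

# The application's query; it is never edited below.
QUERY = "SELECT COUNT(*), MAX(value) FROM lab_result WHERE patient_id = ?"

before = conn.execute(QUERY, (7,)).fetchone()

# Physical change only: add an access path (an index) on the lookup column.
conn.execute("CREATE INDEX idx_lab_patient ON lab_result (patient_id)")

after = conn.execute(QUERY, (7,)).fetchone()
print(before == after)
```

Only the cost of answering the query can change; the program text and the result do not.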
IIE-33
Physical Data Independence
  Physical Data Independence is a Measure of How Much the Internal Schema Can Change Without Affecting the Application Programs
  In BMI - Allows us to Plug and Play Different DBMS Platforms - Extensible and Versatile Integration
IIE-34
Logical Data Independence
  Transparency of the Entire Database Conceptual Organization
  As a Result:
    Transparency of Logical Access Strategy
    Addition of New Entities
    Removal of Entities
    Virtual (Derived) Data Items
    Union of Records
  Views
    Common Mechanism for Logical Data Independence
    Provide Different Logical Data Contexts to Different Users Based on Their Needs
    Update Views vs. Read-Only Views
IIE-35
Logical Data Independence
  Logical Data Independence is a Measure of How Much the Conceptual Schema Can Change Without Affecting the Application Programs
  For BMI - Allows us to Separate End User Applications (Patients, Providers, etc.) from DB
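A view is one way to get this separation in practice. In the sketch below (Python's `sqlite3`, with hypothetical table and view names), the conceptual schema is restructured by moving phone numbers into their own table, and the view is redefined so that the application's query keeps working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The application only ever issues this query against the view.
APP_QUERY = "SELECT name, phone FROM patient_contact ORDER BY name"

# Version 1 of the conceptual schema: one table, exposed through a view.
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    INSERT INTO patient VALUES (1, 'A. Jones', '555-0100'), (2, 'B. Smith', '555-0101');
    CREATE VIEW patient_contact AS SELECT name, phone FROM patient;
""")
v1 = conn.execute(APP_QUERY).fetchall()

# Version 2: phone numbers move to their own table; the view is redefined.
conn.executescript("""
    DROP VIEW patient_contact;
    CREATE TABLE phone (patient_id INTEGER, phone TEXT);
    INSERT INTO phone SELECT id, phone FROM patient;
    CREATE TABLE patient_v2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO patient_v2 SELECT id, name FROM patient;
    DROP TABLE patient;
    CREATE VIEW patient_contact AS
        SELECT p.name, ph.phone
        FROM patient_v2 p JOIN phone ph ON ph.patient_id = p.id;
""")
v2 = conn.execute(APP_QUERY).fetchall()
```

The end-user application sees the same rows before and after the restructuring, which is the property the slide describes.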
IIE-36
Classic Information System Design
IIE-37
Data vs. Information
IIE-38
Programming Language Systems vs. DBS
  Similarities and Differences Exist At System Level:
    Shared Resources vs. Shared Data
    Execution Granularity - Programs vs. Transactions
    Granularity Difference - Files vs. Instances
  Classic Problem of "Impedance Mismatch"
    Thin Layer of Overlap between PLS (C++, Java, etc.) and Relational Database System
    What will Future Bring?
      SQL3 with Object-Oriented Extensions
      XML Databases (Apache Xindice, Sedna, etc.)
  [Figure: Today - PLS and RDBS; Tomorrow? - PLS and XML DBS]
IIE-39
What is Today's Impedance Mismatch?
  Relational Data Organizes Information into Flat Files
    Relational Tables with Primary Key
    High Number of Tuples per Table (1000s & more)
    Limited Number of Tables (10-50) for Even Large Size Application
    Limited Linkages Among Tables (Foreign Keys)
  What Does BMI/PHR/EMR Require?
    For Each Patient, Track Multiple Dependencies
      Visits per Patient
      Tests per Patient
      Prescriptions per Patient
    Data Inherently Complex and Interdependent
    Flattened into Relational Format
IIE-40
The Health Care Application - Classes
IIE-41
The Health Care Application - Classes
IIE-42
The Health Care Application - Classes
IIE-43
The Health Care Application - Relationships
IIE-44
How Does Mismatch Occur?
  Above - Relational Tables
    Stage Data from Tables into OO (e.g. Java) format
    Utilize JDBC
    What are the Implications/Impacts?
  On Left - OO Classes
    Inheritance
    Dependencies
  Programmatic View
    C++ or Java Usage
    Staging from DB to OO
  Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)
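The staging step can be sketched as follows. The class diagrams from the earlier slides are not reproduced in this transcript, so the hierarchy below (`Item` with subclasses `Visit`, `Prescription`, and `LabTest`) is an assumption inferred from the flag columns of the Item table, and Python stands in for the Java/JDBC staging the slide mentions.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Base class: the key columns shared by every row."""
    phy_name: str
    date: str


@dataclass
class Visit(Item):
    symptom: str
    diagnosis: str
    treatment: str


@dataclass
class Prescription(Item):
    pre_no: str
    pharm_name: str
    medication: str


@dataclass
class LabTest(Item):
    test_code: str
    spec_no: str
    status: str
    tech: str


def stage(row: dict) -> list:
    """Turn one flat Item row into the objects its flag columns say it contains."""
    key = (row["Phy_Name"], row["Date"])
    objects = []
    if row["Visit_Flag"]:
        objects.append(Visit(*key, row["Symptom"], row["Diagnosis"], row["Treatment"]))
    if row["Presc_Flag"]:
        objects.append(Prescription(*key, row["Pre_No"], row["Pharm_Name"], row["Medication"]))
    if row["Test_Flag"]:
        objects.append(LabTest(*key, row["Test_Code"], row["Spec_No"], row["Status"], row["Tech"]))
    return objects


row = {
    "Phy_Name": "Dr. D", "Date": "2008-01-01",
    "Visit_Flag": True, "Symptom": "Fever", "Diagnosis": "Flu", "Treatment": "Bed Rest",
    "Presc_Flag": False, "Pre_No": None, "Pharm_Name": None, "Medication": None,
    "Test_Flag": False, "Test_Code": None, "Spec_No": None, "Status": None, "Tech": None,
}
staged = stage(row)
```

Note how one wide row carries mostly empty columns, and how the object side has to reinterpret the flags on every read; that translation layer is where the mismatch shows up.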
IIE-45
Implications and Impact
  Three Different Copies of "Same" Information:
    Database Table (Item)
    OO Representation - Server Side (Classes)
    GUI Display - Client Side (html/xml)
  What can this Lead to?
  Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)
  Sample Display:
    Dr. D, Jan 01, 08
    Fever, Flu, Bed Rest
    No Scripts
    No Tests
IIE-46
What is one Possible Solution?
  Standards and Usage of XML
    Consider CDA - Clinical Document Architecture
    Standard for Clinical (Provider) Medical Record
  Clinical Record Organized as:
    <patient_encounter> - location
    <legal_authenticator> - MD
    <originating_organization> and <provider>
    <patient> - name, birthdate, gender
    <body_confidentiality-"CONF1"> - note
      History
      Past Medical History
      Medications
      Allergies
      Social History
      Physical Exam
      Vitals (BP, Resp, Temp, HR)
      Etc...
IIE-47
What is one Possible Solution?
  Let's Explore this in Greater Detail
  Starting with the CDA Header

<?xml version="1.0"?>
<!DOCTYPE levelone PUBLIC "-//HL7//DTD CDA Level One 1.0//EN" "levelone_1.0.dtd">
<levelone>
  <clinical_document_header>
    <id EX="a123" RT="2.16.840.1.113883.3.933"/>
    <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
    <version_nbr V="2"/>
    <document_type_cd V="11488-4" S="2.16.840.1.113883.6.1" DN="Consultation note"/>
    <origination_dttm V="2000-04-07"/>
    <confidentiality_cd ID="CONF1" V="N" S="2.16.840.1.113883.5.1xxx"/>
    <confidentiality_cd ID="CONF2" V="R" S="2.16.840.1.113883.5.1xxx"/>
    <document_relationship>
      <document_relationship.type_cd V="RPLC"/>
      <related_document>
        <id EX="a234" RT="2.16.840.1.113883.3.933"/>
        <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
        <version_nbr V="1"/>
      </related_document>
    </document_relationship>
    <fulfills_order>
      <fulfills_order.type_cd V="FLFS"/>
      <order><id EX="x23ABC" RT="2.16.840.1.113883.3.933"/></order>
      <order><id EX="x42CDE" RT="2.16.840.1.113883.3.933"/></order>
    </fulfills_order>
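To show what an application might do with such a header, the sketch below parses a shortened copy of it with Python's standard `xml.etree.ElementTree`. The closing tags were added here so the fragment parses on its own, and the DOCTYPE is omitted; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the header above, with closing tags added so it parses.
CDA = """
<levelone>
  <clinical_document_header>
    <id EX="a123" RT="2.16.840.1.113883.3.933"/>
    <version_nbr V="2"/>
    <document_type_cd V="11488-4" S="2.16.840.1.113883.6.1" DN="Consultation note"/>
    <origination_dttm V="2000-04-07"/>
    <fulfills_order>
      <fulfills_order.type_cd V="FLFS"/>
      <order><id EX="x23ABC" RT="2.16.840.1.113883.3.933"/></order>
      <order><id EX="x42CDE" RT="2.16.840.1.113883.3.933"/></order>
    </fulfills_order>
  </clinical_document_header>
</levelone>
"""

header = ET.fromstring(CDA).find("clinical_document_header")

# Attribute values carry the data: V is the coded value, DN the display name.
doc_type = header.find("document_type_cd").get("DN")
version = int(header.find("version_nbr").get("V"))
orders = [order_id.get("EX") for order_id in header.findall("fulfills_order/order/id")]
```

Because the structure is carried by the document itself, the same few lines work for any record that follows the header layout, without a relational staging step.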
IIE-48
CDA Example - Continued (slides IIE-48 through IIE-55)
IIE-56
Information Sharing/Access: Potential Pitfalls
  Another Critical Issue is Information Sharing
    Perception: How do I see/understand Data/Info?
    Differences: What is the Reality?
  Dealing with Information at Different Levels
    Syntax - Format of Information
    Semantics - Meaning of Information
    Pragmatics - Usage of Information
  When Unifying Databases/Information Repositories, Must Address all Three!
  Data Integrity and Data Security
    Correct and Consistent Values
    Assurance in All Secure Accesses
  For BMI - All of the Above are Critical for Correct Usage and Interpretation in All Contexts (T1, T2, …)
IIE-57
Information Syntactic Considerations
  Syntax is Structure and Format of the Information That is Needed to Support a Coalition
  Incorrect Structure or Format Could Result in Anything from a Simple Error Message to a Catastrophic Event
  For Sharing, Strict Formats Need to be Maintained
  Health Care Data Suffers from Lack of Standards
    Standards for Diagnosis (Insurance Industry)
    Emerging Standards Include:
      Health Level 7 (HL7) Based on XML
  Formats Non-Standard for Different Health Organizations, Insurers, Pharmacy Networks, etc.
    N*N Translations Prone to Errors!
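The N*N point can be made concrete with a little arithmetic. Assuming each pair of formats needs a translator in each direction, n formats need n*(n-1) translators, versus 2*n when every format maps only to and from one shared standard. The function names below are illustrative.

```python
def point_to_point(n: int) -> int:
    """One-way translators needed when each of n formats maps directly to every other."""
    return n * (n - 1)


def via_shared_standard(n: int) -> int:
    """One-way translators needed when each format maps only to and from one common standard."""
    return 2 * n


for n in (3, 10, 50):
    print(n, point_to_point(n), via_shared_standard(n))
```

With 10 organizations that is 90 translators against 20, and the gap widens quadratically, which is the practical argument for a shared exchange format.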
IIE-58
Information Semantics Concerns
  Semantics (Meaning and Interpretation)
    NATO and US - Different Message Formats
    Distances (Miles vs. Kilometers)
    Grid Coordinates (Mils, Degrees)
    Maps (Grid, True, and Magnetic North)
  What Can Happen in Health Care Data?
    Possible to Confuse Dosages of Medications?
    Weight of Patients (Pounds vs. Kilos)?
    Measurement of Vital Signs?
    Dana Farber Chemo Death - Checks/Balances
    What Others are Possible?
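One defensive habit against the pounds-versus-kilos confusion is to make the unit travel with the number. The sketch below is a hypothetical illustration of that idea; the `Weight` class, the `dose_mg` function, and the dosing figures are invented and are not clinical guidance.

```python
from dataclasses import dataclass

LB_PER_KG = 2.20462  # approximate conversion factor


@dataclass(frozen=True)
class Weight:
    value: float
    unit: str  # "kg" or "lb"

    def to_kg(self) -> float:
        """Normalize to kilograms, refusing any unit that is not recognized."""
        if self.unit == "kg":
            return self.value
        if self.unit == "lb":
            return self.value / LB_PER_KG
        raise ValueError(f"unknown weight unit: {self.unit!r}")


def dose_mg(weight: Weight, mg_per_kg: float) -> float:
    """Weight-based dose computed from a unit-tagged weight."""
    return weight.to_kg() * mg_per_kg
```

A bare `154` could be read as pounds or kilos; `Weight(154, "lb")` cannot, and an unrecognized unit fails loudly instead of producing a wrong dose.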
IIE-59
Syntactic & Semantic Considerations
  What's Available to Support Information Sharing?
  How do we Ensure that Information can be Accurately and Precisely Exchanged?
  How do we Associate Semantics with the Information to be Exchanged?
  What Can we Do to Verify the Syntactic Exchange and that Semantics are Maintained?
  Can Information Exchange Facilitate Federation?
  Can this be Handled Dynamically?
  Or, Must we Statically Solve Information Sharing in Advance?
IIE-60
Information Pragmatics Considerations
  Pragmatics Require that we Totally Understand Information Usage and Information Meaning
    What are the Critical Information Sources?
    How will Information Flow Among Them?
    What Systems Need Access to these Sources?
    How will that Access be Delivered?
    Who (People/Roles) will Need to See What When?
    How will What a Person Sees Impact Other Sources?
  Focus on: Way that Information is Utilized and Understood in its Specific Context
  Can Medical Info be Misused even if Understood?
IIE-61
Information Pragmatics Considerations
  What are Pragmatics Issues re. Underinsured and Uninsured Populations in Event?
    How Can we Use Info Effectively if we Don't Know if it is Complete?
    Has Info from All Sources Been Collected?
    What Happens if Same Patient in Different Repositories Can't be Reconciled?
    What if Patient is Unresponsive and Can't Supply any Info?
    Is Usage of Info Complicated due to Incompleteness? Multiple Locations?
  Or, if the Event is Major - will all Patient Populations Suffer Same Substandard Care?
IIE-62
Collaboration and Security
  Two Concepts go Hand in Hand
  Strong Parallels
    Collaboration
      Among Providers and Researchers
      Among Providers and Patients
      Among Patients (Support Groups)
    Security
      Control of Patient Information (De-identified)
      Secure Exchange/Patient Ownership
      Establish Custom Patient Controlled Groups
  Let's Explore them Both via our Semester Project
  Also Consider Emergent and Policy Issues
IIE-63
CSE300
Collaboration: Providers and Researchers
Providers
  Seeking new Treatment Plans
  Looking for Clinical Research Studies for Patients
  Looking to Communicate with Clinical Researchers
Researchers
  Publish Evidence-Based Guidelines New Treatments
  Collect Data on Provider Visits
  Provide Forum to Discuss with Provider
  Allow Provider to Upload Anonymous Outcomes
Also - Need to Collaborate Among Researchers of All Types (Sharepoint, WIKIs, etc.)
IIE-64
CSE300
Collaboration: Providers and Patients
Patients
  Open Personal Health Record to Providers
  Patients have
    Data Entry Facility for Chronic Conditions
    Ability to Graph and Track their Disease
  Education Materials also Available
Providers
  Securely Communicate (email) with Patients (see https://www.relayhealth.com/rh/specific/patients/default.aspx)
  Access to Authorized Patient Data
  Tracking of Patients (to Reduce Office Visits)
  Proactive Intervention to Head off Potential Hospitalizations/Problems via Treatment Algorithms to Auto-Notify Based on Data Values
IIE-65
CSE300
Collaboration: Among Patients
Patients
  Provide Each with a List of Support Groups
  Allow them to Join Groups or Form New Groups
  Secure Communication via:
    Email
    Chatting Environment
    Link to Actual (Physical) Meetings
  Repository of Available Support Groups
Overall:
  Patients can Meet other Patients with Same Issues
  Vital for Patients with Rare Diseases
  Form On-Line Communities
IIE-66
CSE300
Security: General Concepts
Authentication
  Proving you are who you are
  Signing a Message
  Is the Client who S/he Says they are?
Authorization
  Granting/Denying Access
  Revoking Access
  Does the Client have Permission to do what S/he Wants?
Encryption
  Establishing Communications Such that No One but Receiver will Get the Content of the Message
  Symmetric Encryption
  Public Key Encryption
IIE-67
CSE300
Key Security Issues
Legal and Ethical Issues
  Information that Must be Protected
  Information that Must be Accessible
Policy Issues
  Who Can See What Information When?
  Applications Limits w.r.t. Data vs. Users?
System Level Enforcement
  What is Provided by the DBMS? Programming Language? OS? Application?
  How Do All of the Pieces Interact?
Multiple Security Levels/Organizational Enforcement
  Mapping Security to Organizational Hierarchy
  Protecting Information in Organization
IIE-68
CSE300
What are Key Access Control Concepts?
Assurance
  Are the Security Privileges for Each User Adequate to Support their Activities?
  Do the Security Privileges for Each User Meet but Not Exceed their Capabilities?
Consistency
  Are the Defined Security Privileges for Each User Internally Consistent?
    Least-Privilege Principle: Just Enough Access
  Are the Defined Security Privileges for Related Users Globally Consistent?
    Mutual-Exclusion: Read for Some-Write for Others
IIE-69
CSE300
Available Security Approaches
Mandatory Access Control (MAC)
  Bell-LaPadula Security Model
  Security Classification Levels for Data Items
  Access Based on Security Clearance of User
Role Based Access Control (RBAC)
  Govern Access to Information based on Role
    Users can Play Different Roles at Different Times
    Responsibilities of Users Guiding Factor
  Facilitate User Interactions while Simultaneously Protecting Sensitive Data
Discretionary Access Control (DAC)
  Richer Set of Access Modes - Govern Access to Information based on User Id
  Discretionary Rules on Access Privileges
  Focused on Application Needs/Requirements
IIE-70
CSE300
Mandatory Security Mechanism
Typical Security Classification Levels for Subjects/programs and Objects/resources
  Top Secret (TS) and Secret (S)
  Confidential (C) and Unclassified (U)
Rules:
  TS is the Highest and U is the Lowest Level
  TS > S > C > U
  Security Levels:
    C1 is Security Clearance Given to User U1
    C2 is Security Classification Given to Object O1
    U1 can Access O1 iff C1 ≥ C2
    This is Referred to as the Domination of U1 Over O1
Not Prevalent in BMI - But May have Relevance
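The domination rule above can be sketched in a few lines of Python. This is only an illustration of the read check "U1 can access O1 iff C1 ≥ C2"; the names `Level` and `can_access` are hypothetical and do not come from the slides or from any particular DBMS.

```python
from enum import IntEnum


class Level(IntEnum):
    """Security levels ordered so that TS > S > C > U."""
    U = 0   # Unclassified
    C = 1   # Confidential
    S = 2   # Secret
    TS = 3  # Top Secret


def can_access(clearance: Level, classification: Level) -> bool:
    """Return True when the user's clearance dominates the object's classification."""
    return clearance >= classification


# A user cleared to Secret may read a Confidential object, but not a Top Secret one.
secret_reads_confidential = can_access(Level.S, Level.C)
secret_reads_top_secret = can_access(Level.S, Level.TS)
```

Encoding the levels as ordered integers keeps the check to a single comparison; a fuller Bell-LaPadula model would also constrain writes, which this sketch leaves out.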
IIE-71
CSE300
Role Based Access Control (RBAC)
Focuses on Defining Roles of Typical Behavior
  Nurse, Nurse-Manager, Education-RN
  Physician, Attending-MD, Specialist
  Student, Faculty-Advisor, Head
  Focus on Duties that are Shared
During Authorization of Roles to Users
  Establish Boundaries of Access
  User Steve with Role Faculty-Advisor
    Limited to Faculty Capabilities on Peoplesoft
    Only Can Manipulate His Advisees
  User Steve with Role Associate Head
    Possible Overlap in Responsibilities w/ Faculty-Advisor
    Other Activities not given to Faculty-Advisor Role
IIE-72
CSE300
Why is RBAC Needed?
In Health Care, different professionals (e.g., Nurses vs. Physicians vs. Administrators, etc.) Require Select Access to Sensitive Patient Data
Suppose we have a Patient Access Client
  Lois playing the Nurse Role would be Allowed to Enter Patient History, Record Vital Signs, etc.
  Steve playing M.D. Role would be Allowed to do all of a Nurse plus Write Orders, Enter Scripts, etc.
  Vicky playing Admin Role would be Allowed to Enter Demographic/Insurance Info.
Role Dictates Client Behavior
  Physicians Write Scripts
  Nurses Enter Patient Data (Vitals + History)
  All Access Shared Medical Record
  Access is Limited Based on Role
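The Patient Access Client scenario can be sketched as a role-to-permission lookup. The users and roles are taken from the slide; the permission strings and the helper `is_allowed` are hypothetical names chosen for illustration, not part of any real RBAC product.

```python
# Permissions per role; the M.D. role is defined as "all of a Nurse plus" extras.
ROLE_PERMISSIONS = {
    "Nurse": {"enter_patient_history", "record_vital_signs"},
    "Admin": {"enter_demographics", "enter_insurance_info"},
}
ROLE_PERMISSIONS["MD"] = ROLE_PERMISSIONS["Nurse"] | {"write_orders", "enter_scripts"}

# Role currently played by each user (a user could play another role at another time).
USER_ROLES = {"Lois": "Nurse", "Steve": "MD", "Vicky": "Admin"}


def is_allowed(user: str, action: str) -> bool:
    """Check an action against the permissions of the role the user is playing."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


lois_can_write_orders = is_allowed("Lois", "write_orders")
steve_can_record_vitals = is_allowed("Steve", "record_vital_signs")
```

Permissions attach to roles rather than to individuals, so changing what a Nurse may do is a single edit that applies to every user playing that role.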
IIE-73
CSE300
Discretionary Access Control
Discretionary
  Grant Privileges to Users, Including Capabilities to Access Specific Data Items in a Specific Mode
  Available in Most Commercial DBMSs
Aspects of DAC
  User’s Identity
  Predefined Discretionary “Rules” Defined by the Security Administrator
  Allows User to “Delegate” Capabilities to Another User
    Delegate Capabilities and Ability to Delegate
Role Delegation and Delegation Authority
DAC Available in SQL2
IIE-74
CSE300
What is Role Delegation?
Role Delegation, a User-to-User Relationship, Allows an Original User (OU) to Transfer Responsibility for a Particular Role to a Delegated User (DU)
Two Major Types of Delegation
  Administratively-directed Delegation has an Administrative Infrastructure Outside the Direct Control of a User that Mediates Delegation
  User-directed Delegation has a User (Playing a Role) Determining If and When to Delegate a Role to Another User
In Both, Security Administrators Still Oversee Who Can Do What When w.r.t. Delegation
IIE-75
CSE300
Why is Role Delegation Important?
Many Different Scenarios Under Which Privileges May Need to be Passed to Other Individuals
  Large organizations often require delegation to meet demands on individuals in specific roles for certain periods of time
  True in Many Different Sectors
    Health Care and Financial Services
    Engineering and Academic Setting
  Example: Reda Delegates Head Role to Steve when Traveling
Key Issues:
  Who Controls Delegation to Whom?
  How are Delegation Requirements Enforced?
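One way to picture user-directed delegation "for certain periods of time" is a delegation record with a validity window. The sketch below uses the Reda/Steve example from the slide; the dates, the `Delegation` class, and `active_roles` are hypothetical, and a real system would add the administrative oversight the slides call for.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Delegation:
    role: str
    original_user: str   # OU: the user who holds the role
    delegated_user: str  # DU: the user receiving the role
    start: date
    end: date


def active_roles(user, own_roles, delegations, today):
    """Roles a user may play today: their own plus any currently valid delegations."""
    roles = set(own_roles.get(user, set()))
    for d in delegations:
        in_window = d.start <= today <= d.end
        # Honor a delegation only if the original user really holds the role.
        ou_holds_role = d.role in own_roles.get(d.original_user, set())
        if d.delegated_user == user and in_window and ou_holds_role:
            roles.add(d.role)
    return roles


OWN_ROLES = {"Reda": {"Head"}, "Steve": {"Faculty-Advisor"}}
# Hypothetical travel dates for illustration only.
DELEGATIONS = [Delegation("Head", "Reda", "Steve", date(2008, 3, 1), date(2008, 3, 7))]

during_trip = active_roles("Steve", OWN_ROLES, DELEGATIONS, date(2008, 3, 3))
after_trip = active_roles("Steve", OWN_ROLES, DELEGATIONS, date(2008, 3, 10))
```

Because the window is checked on every access, the delegated role lapses automatically when the trip ends, with no explicit revocation step.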
IIE-76
CSE300
Coalitions for Clinical/Translational Science
[Figure: coalition partners UConn Health Center, UConn Storrs, DCF, DSS, etc., Saint Francis, CCMC, …, Pfizer, Bayer, NIH, FDA, NSF, over Existing Systems/Databases and New Applications]
Info. Sharing - Joint R&D
  Support T1, T2, and Clinical Research
  Company and University Partnerships
  Collaborative Funding Opportunities
  Cohesive and Trusted Environment
How do you Protect Commercial Interests? Promote Research Advancement?
Free Read for Some Data/Limited for Other?
Commercialization vs. Intellectual Property?
Balancing Cooperation with Propriety
IIE-77
CSE300
Emergent Public Policy Issues
How do we Protect a Person’s DNA?
  Who Owns a Person’s DNA?
  Who Can Profit from Person’s DNA?
  Can Person’s DNA be Used to Deny Insurance? Employment? Etc.
  How do you Define Security Limitations/Access?
What about i2b2 - Informatics for Integrating Biology and the Bedside (see https://www.i2b2.org/)
  Scalable Informatics Framework to Bridge
    Clinical Research Data
    Vast Data Banks for Basic Science Research
  Goal: Understand Genetic Bases of Diseases
IIE-78
CSE300
Emergent Public Policy Issues
Can DNA Repositories be Anonymously Available for Medical Research?
  Do Societal Needs Trump Individual Rights?
  Can DNA be Made Available Anonymously for Medical Research?
    De-identified Data Repositories
    Privacy Protecting Data Mining
  International Repository Might Allow Medical Researchers Access to Large Enough Data Set for Rare Conditions (e.g., Orphan Drug Act)
Individual Rights vs. Medical Advances
IIE-79
CSE300
Internet and the Web
A Major Opportunity for Business
  A Global Marketplace
    Business Across State and Country Boundaries
  A Way of Extending Services
    Online Payment vs. VISA, Mastercard
  A Medium for Creation of New Services
    Publishers, Travel Agents, Teller, Virtual Yellow Pages, Online Auctions …
A Boon for Academia
  Research Interactions and Collaborations
  Free Software for Classroom/Research Usage
  Opportunities for Exploration of Technologies in Student Projects
What are Implications for BMI? Where is the Adv?
IIE-80
CSE300
WWW: Three Market Segments
Intranet
  Decision support
  Mfg. System monitoring
  Corporate repositories
  Workgroups
Internet
  Sales
  Marketing
  Information Services
Business to Business
  Information sharing
  Ordering info./status
  Targeted electronic commerce
[Figure: Servers on Corporate Networks connected through the Internet; labels: Provider Network, Exposure to Outside]
IIE-81
CSE300
Information Delivery Problems on the Net
Everyone can Publish Information on the Web Independently at Any Time
  Consequently, there is an Information Explosion
  Identifying Information Content More Difficult
There are too Many Search Engines but too Few Capable of Returning High Quality Data
Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes
What are Information Delivery Issues for BMI?
  Publishing of Patient Education Materials
  Publishing of Provider Education Materials
  How Can Patients/Providers find what they Need?
  How do they Know if it’s Relevant? Reputable?
IIE-82
CSE300
Example Web Applications
Scenario 1: World Wide Wait
  A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web
  You Want to Monitor the Results for this Important Event, so you Fire up your Trusty Web Browser, Pointing at the Result Posting Site, and Wait, and Wait, and Wait …
What is the Problem?
  The Scalability Problems are the Result of a Mismatch Between the Data Access Characteristics of the Application and the Technology Used to Implement the Application
May not be Relevant to BMI: Hard to Apply Scenario
IIE-83
CSE300
Example Web Applications
Scenario 2:
  Many Applications Today have the Need for Tracking Changes in Local and Remote Data Sources and Notifying of Changes If Some Condition Over the Data Source(s) is Met
  To Monitor Changes on Web, You Need to Fire up Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Difference Manually Each Time You Poll the Data Source(s)
Issue: Pure Pull is Not the Answer to All Problems
BMI: If a Patient Enters Data that Sets off a Chain Reaction, how Can Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)
IIE-84
CSE300
What is the Problem?
Applications are Asymmetric but the Web is Not
  Computation Centric vs. Information Flow Centric
Type of Asymmetry
  Network Asymmetry
    Satellite, CATV, Mobile Clients, Etc.
  Client to Server Ratio
    Too Many Clients can Swamp Servers
  Data Volume
    Mouse and Key Click vs. Content Delivery
  Update and Information Creation
    Clients Need to be Informed or Must Poll
Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient - No Auto-Notification
IIE-85
CSE300
What are Information Delivery Styles?
Pull-Based System
  Transfer of Data from Server to Client is Initiated by a Client Pull
  Clients Determine when to Get Information
  Potential for Information to be Old Unless Client Periodically Pulls
Push-Based System
  Transfer of Data from Server to Client is Initiated by a Server Push
  Clients may get Overloaded if Push is Too Frequent
Hybrid
  Pull and Push Combined
  Pull First and then Push Continually
IIE-86
CSE300
Publish/Subscribe
Semantics: Servers Publish/Clients Subscribe
  Servers Publish Information Online
  Clients Subscribe to the Information of Interest (Subscription-based Information Delivery)
  Data Flow is Initiated by the Data Sources (Servers) and is Aperiodic
  Danger: Subscriptions can Lead to Other Unwanted Subscriptions
Applications
  Unicast: Database Triggers and Active Databases
  1-to-n: Online News Groups
May work for Clinical Researcher to Provider Push
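A minimal in-process sketch of the publish/subscribe semantics follows, using the researcher-to-provider push as the example. The `Broker` class, the topic strings, and the message contents are all hypothetical; a real deployment would sit on a network messaging system with secure access.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Routes each published message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        # The client registers interest once; it never has to poll afterwards.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> int:
        # Data flow is initiated by the source; returns how many clients were notified.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
provider_inbox: List[dict] = []
broker.subscribe("guidelines/diabetes", provider_inbox.append)

delivered = broker.publish("guidelines/diabetes", {"title": "Updated guideline"})
ignored = broker.publish("guidelines/asthma", {"title": "No subscribers yet"})
```

The provider receives the update the moment the researcher publishes it, which is the auto-notification that a pure pull browser cannot provide.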
IIE-87
CSE300
Design Options for Nodes
Three Types of Nodes:
  Data Sources
    Provide Base Data which is to be Disseminated
  Clients
    Who are the Net Consumers of the Information
  Information Brokers
    Acquire Information from Other Data Sources, Add Value to that Information and then Distribute this Information to Other Consumers
    By Creating a Hierarchy of Brokers, Information Delivery can be Tailored to the Need of Many Users
Brokers may be Ideal Intermediaries for BMI!
  Act on Behalf of Patients, Providers
  Incorporate Secure Access
IIE-88
CSE300
Research Challenges
Ubiquitous/Pervasive: Many computers and information appliances everywhere, networked together
Inherent Complexity:
  Coping with Latency (Sometimes Unpredictable)
  Failure Detection and Recovery (Partial Failure)
  Concurrency, Load Balancing, Availability, Scale
  Service Partitioning
  Ordering of Distributed Events
“Accidental” Complexity:
  Heterogeneity: Beyond the Local Case: Platform, Protocol, Plus All Local Heterogeneity in Spades.
  Autonomy: Change and Evolve Autonomously
  Tool Deficiencies: Language Support (Sockets, rpc), Debugging, Etc.
IIE-89
CSE300
Infosphere
Problem: too many sources, too much information
[Figure: from the Internet as Information Jungle to Clean, Reliable, Timely Information, Anywhere; labels: Digital Earth, Sensors, Personalized Filtering & Info. Delivery, Infopipes, Resource Adaptation, Property Mgmt, Information Quality, Continual Queries, Microfeedback, specialization]
IIE-90
CSE300
Current State-of-Art
[Figure: Thin Client, Web Server, Mainframe Database Server]
IIE-91
CSE300
Infosphere Scenario - for BMI
[Figure: Infotaps & Fat Clients, Variety of Servers, Sensors, Database Server, Many sources]
IIE-92
CSE300
Heterogeneity and Autonomy
Heterogeneity:
  How Much can we Really Integrate?
  Syntactic Integration
    Different Formats and Models
    Web/SQL Query Languages
  Semantic Interoperability
    Basic Research on Ontology, Etc.
Autonomy
  No Central DBA on the Net
  Independent Evolution of Schema and Content
  Interoperation is Voluntary
  Interface Technology (Support for ISVs)
    DCOM: Microsoft Standard
    CORBA, Etc...
IIE-93
CSE300
Security and Data Quality
Security
  System Security in the Broad Sense
  Attacks: Penetrations, Denial of Service
  System (and Information) Survivability
    Security Fault Tolerance
    Replication for Performance, Availability, and Survivability
Data Quality
  Web Data Quality Problems
    Local Updates with Global Effects
    Unchecked Redundancy (Mutual Copying)
    Registration of Unchecked Information
    Spam on the Rise
IIE-94
CSE300
Legacy Data Challenge
Legacy Applications and Data
  Definition: Important and Difficult to Replace
  Typically, Mainframe Mission Critical Code
  Most are OLTP and Database Applications
Evolution of Legacy Databases
  Client-server Architectures
  Wrappers
  Expensive and Gradual in Any Case
IIE-95
CSE300
Potential Value Added/Jumping on Bandwagon
Sophisticated Query Capability
  Combining SQL with Keyword Queries
Consistent Updates
  Atomic Transactions and Beyond
But Everything has to be in a Database!
  Only If we Stick with Classic DB Assumptions
Relaxing DB Assumptions
  Interoperable Query Processing
  Extended Transaction Updates
Commodities DB Software
  A Little Help is Still Good If it is Cheap
  Internet Facilitates Software Distribution
  Databases as Middleware
IIE-96
CSE300
Data Warehousing and Data Mining
Data Warehousing
  Provide Access to Data for Complex Analysis, Knowledge Discovery, and Decision Making
  Underlying Infrastructure in Support of Mining
  Provides Means to Interact with Multiple DBs
  OLAP (on-Line Analytical Processing) vs. OLTP
Data Mining
  Discovery of Information in Vast Data Sets
  Search for Patterns and Common Features
  Discover Information not Previously Known
    Medical Records Accessible Nationwide
    Research/Discover Cures for Rare Diseases
  Relies on Knowledge Discovery in DBs (KDD)
IIE-97
CSE300
Data Warehousing and OLAP
A Data Warehouse
  Database is Maintained Separately from an Operational Database
  “A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of Management’s Decision Making Process” [W.H. Inmon]
OLAP (on-Line Analytical Processing)
  Analysis of Complex Data in the Warehouse
  Attempt to Attain “Value” through Analysis
  Relies on Trained and Skilled Knowledge Workers who Discover Information
Data Mart
  Organized Data for a Subset of an Organization
  Establish De-Identified Marts for BMI Research
IIE-98
CSE300
Building a Data Warehouse
[Figure: Corporate data warehouse built either from existing Data Marts (Option 1: Consolidate Data Marts) or directly from Corporate data (Option 2: Build from scratch)]
Option 1
  Leverage Existing Repositories
  Collate and Collect
  May Not Capture All Relevant Data
Option 2
  Start from Scratch
  Utilize Underlying Corporate Data
IIE-99
CSE300
BMI - Partition/Excerpt Data Warehouse
[Figure: BMI data warehouse partitioned into Data Marts]
Clinical and Epidemiological Research (and for T2 and T1)
Each Study Submitted to Institutional Review Board (IRB)
  For Human Subjects (Assess Risks, Protect Privacy)
  See: http://resadm.uchc.edu/hspo/irb/
To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to Create a Data Mart for each Approved Study
  Export/Excerpt Study Data from Warehouse
  May be Single or Multiple Sources
IIE-100
CSE300
Data Warehouse Characteristics
Utilizes a “Multi-Dimensional” Data Model
Warehouse Comprised of
  Store of Integrated Data from Multiple Sources
  Processed into Multi-Dimensional Model
Warehouse Supports
  Time Series and Trend Analysis
  “Super-Excel” Integrated with DB Technologies
Data is Less Volatile than Regular DB
  Doesn’t Dramatically Change Over Time
  Updates at Regular Intervals
  Specific Refresh Policy Regarding Some Data
IIE-101
CSE300
Three Tier Architecture
[Figure: External data sources and Operational databases are fed (Extract, Transform, Load, Refresh) through a monitor and integrator, with metadata, into the Data Warehouse and Data marts; these serve an OLAP Server, which supports Summarization report, Query report, and Data mining]
IIE-102
CSE300
Data Warehouse Design
Most Data Warehouses use a Star Schema to Represent Multi-Dimensional Data Model
Each Dimension is Represented by a Dimension Table that Provides its Multidimensional Coordinates and Stores Measures for those Coordinates
A Fact Table Connects All Dimension Tables with a Multiple Join
  Each Tuple in Fact Table Represents the Content of One Dimension
  Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimensional Tables
  Links Between the Fact Table and the Dimensional Tables Form a Shape Like a Star
IIE-103
CSE300
What is a Multi-Dimensional Data Cube?
Representation of Information in Two or More Dimensions
Typical Two-Dimensional - Spreadsheet
In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful
For BMI - Axes for Diagnosis, Drug, Subject Age
IIE-104
CSE300
Multi-Dimensional Schemas
Supporting Multi-Dimensional Schemas Requires Two Types of Tables:
  Dimension Table: Tuples of Attributes for Each Dimension
  Fact Table: Measured/Observed Variables with Pointers into Dimension Table
Star Schema
  Characterizes Data Cubes by having a Single Fact Table and a Single Table for Each Dimension
Snowflake Schema
  Dimension Tables from Star Schema are Organized into Hierarchy via Normalization
Both Represent Storage Structures for Cubes
IIE-105
CSE300
Example of Star Schema
Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
Dimension Tables:
  Product: ProductNo, ProdName, ProdDesc, Category
  Customer: CustID, CustName, CustCity, CustCountry
  Date: Date, Month, Year
  Store: StoreID, City, State, Country, Region
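The sales star schema can be tried out with Python's built-in `sqlite3` module. This is a reduced sketch: the Customer dimension and some attributes are left out for brevity, the date dimension is named `DateDim` with a `DateKey` column to avoid clashing with SQL keywords, and every row of data is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductNo INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
CREATE TABLE Store   (StoreID INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE DateDim (DateKey TEXT PRIMARY KEY, Month TEXT, Year INTEGER);
-- Fact table: one pointer per dimension plus the measured values.
CREATE TABLE Sale (
    DateKey      TEXT    REFERENCES DateDim(DateKey),
    ProductNo    INTEGER REFERENCES Product(ProductNo),
    StoreID      INTEGER REFERENCES Store(StoreID),
    Unit_Sales   INTEGER,
    Dollar_Sales REAL
);
""")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, "Lager", "Beer"), (2, "Peanuts", "Nuts")])
conn.executemany("INSERT INTO Store VALUES (?, ?, ?, ?)",
                 [(10, "Atlanta", "GA", "South"), (20, "Los Angeles", "CA", "West")])
conn.executemany("INSERT INTO DateDim VALUES (?, ?, ?)",
                 [("2000-03-01", "March", 2000), ("2000-03-02", "March", 2000)])
conn.executemany("INSERT INTO Sale VALUES (?, ?, ?, ?, ?)", [
    ("2000-03-01", 1, 10, 5, 50.0),
    ("2000-03-01", 2, 10, 2, 6.0),
    ("2000-03-02", 1, 20, 3, 30.0),
    ("2000-03-02", 1, 10, 1, 10.0),
])

# Multiple join from the fact table out to two dimensions, aggregated per cell.
rows = conn.execute("""
    SELECT s.Region, p.Category, SUM(f.Dollar_Sales)
    FROM Sale f
    JOIN Product p ON p.ProductNo = f.ProductNo
    JOIN Store   s ON s.StoreID   = f.StoreID
    GROUP BY s.Region, p.Category
    ORDER BY s.Region, p.Category
""").fetchall()
```

Each fact row carries only keys and measures, so descriptive attributes such as Region live once in their dimension table and are reached through the join.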
IIE-106
CSE300
Example of Star Schema for BMI
Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.
Dimension Tables:
  Vitals: BP, Temp, Resp, HR (Pulse)
  Patient: PatientID, PatientName, PatientCity, PatientCountry
  Date: Date, Month, Year
  Symptoms: Pulmonary, Heart, Mus-Skel, Skin, Digestive
Reference another Star Schema for all Meds
IIE-107
CSE300
A Second Example of Star Schema …
IIE-108
CSE300
and Corresponding Snowflake Schema
IIE-109
CSE300
Data Warehouse Issues
Data Acquisition
  Extraction from Heterogeneous Sources
  Reformatted into Warehouse Context - Names, Meanings, Data Domains Must be Consistent
  Data Cleaning for Validity and Quality
    Is the Data as Expected w.r.t. Content? Value?
  Transition of Data into Data Model of Warehouse
  Loading of Data into the Warehouse
Other Issues Include:
  How Current is the Data? Frequency of Update?
  Availability of Warehouse?
  Dependencies of Data?
  Distribution, Replication, and Partitioning Needs?
  Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)?
  For CTSA - Data Ownership (Competing Hosps).
IIE-110
CSE300
Knowledge Discovery
Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully
Knowledge Discovery
  Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set
  Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from Large Collection of Data
Data Mining
  A Critical Step in the Knowledge Discovery Process
  Extracts Implicit Information from Large Data Set
IIE-111
CSE300
Steps in a KDD Process
Learning the Application Domain (goals)
Gathering and Integrating Data
Data Cleaning
Data Integration
Data Transformation/Consolidation
Data Mining
  Choosing the Mining Method(s) and Algorithm(s)
  Mining: Search for Patterns or Rules of Interest
Analysis and Evaluation of the Mining Results
Use of Discovered Knowledge in Decision Making
Important Caveats
  This is Not an Automated Process!
  Requires Significant Human Interaction!
IIE-112
CSE300
OLAP Strategies
OLAP Strategies
  Roll-Up: Summarization of Data
  Drill-Down: from the General to Specific (Details)
  Pivot: Cross Tabulate the Data Cubes
  Slice and Dice: Projection Operations Across Dimensions
  Sorting: Ordering Result Sets
  Selection: Access by Value or Value Range
Implementation Issues
  Persistent with Infrequent Updates (Loading)
  Optimization for Performance on Queries is More Complex - Across Multi-Dimensional Cubes
  Recovery Less Critical - Mostly Read Only
  Temporal Aspects of Data (Versions) Important
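The roll-up, drill-down, slice, and dice strategies can be illustrated on a tiny in-memory cube. The sketch below is plain Python over a list of fact records; the sales figures and the helper names (`roll_up`, `slice_cube`, `dice`) are made up for illustration and are not an OLAP server API.

```python
from collections import defaultdict

# Each fact is one cell of a (city, month, product) cube; "state" is a coarser location level.
FACTS = [
    {"city": "Atlanta",  "state": "GA", "month": "Jan", "product": "Electronics", "sales": 100},
    {"city": "Atlanta",  "state": "GA", "month": "Feb", "product": "Electronics", "sales": 150},
    {"city": "Atlanta",  "state": "GA", "month": "Jan", "product": "Clothing",    "sales": 80},
    {"city": "Savannah", "state": "GA", "month": "Jan", "product": "Electronics", "sales": 40},
    {"city": "Phoenix",  "state": "AZ", "month": "Feb", "product": "Clothing",    "sales": 60},
]


def roll_up(facts, dims):
    """Summarize sales over the given dimensions (all other dimensions are collapsed)."""
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact["sales"]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice(facts, **constraints):
    """Constrain several dimensions at once, selecting a sub-cube."""
    return [f for f in facts if all(f[d] == v for d, v in constraints.items())]


by_state = roll_up(FACTS, ["state"])                      # cities rolled up to states
atlanta = slice_cube(FACTS, "city", "Atlanta")            # slice on city Atlanta
atlanta_by_month = roll_up(atlanta, ["month"])            # drill down into the slice by month
cell = dice(FACTS, city="Atlanta", product="Electronics", month="Feb")
```

Drill-down is simply a roll-up over a finer set of dimensions, which is why a single aggregation helper covers both directions here.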
IIE-113
CSE300
On-Line Analytical Processing
Data Cube
  A Multidimensional Array
  Each Attribute is a Dimension
In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date

Product      Store     Date     Sale
acron        Rolla,MO  7/3/99   325.24
budwiser     LA,CA     5/22/99  833.92
large pants  NY,NY     2/12/99  771.24
3’ diaper    Cuba,MO   7/30/99  81.99

[Figure: data cube with axes Product (Pants, Diapers, Beer, Nuts), Region (West, East, Central, Mountain, South), and Date (Jan, Feb, March, April)]
IIE-114
CSE300
On-Line Analytical Processing
For BMI - Imagine a Data Table with Patient Data
  Define Axis
  Summarize Data
  Create Perspective to Match Research Goal
  Essentially De-identified Data Mart

Patient  Med      BirthDat  Dosage
Steve    Lipitor  1/1/45    10mg
John     Zocor    2/2/55    80mg
Harry    Crestor  3/3/65    5mg
Lois     Lipitor  4/4/66    20mg
Charles  Crestor  7/1/59    10mg

[Figure: data cube with axes Medication (Lescol, Crestor, Zocor, Lipitor), Dosage (5, 10, 20, 40, 80), and Decade (1940s, 1950s, 1960s, 1970s)]
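A sketch of turning the patient table into cube cells along the Medication, Decade, and Dosage axes follows. The five rows are the ones on the slide; the function names are hypothetical, and the code assumes two-digit birth years mean 19xx. Dropping the patient name and coarsening the birth date to a decade is what makes the result resemble a de-identified data mart.

```python
from collections import Counter

# Rows from the slide: (Patient, Med, BirthDat, Dosage).
PATIENTS = [
    ("Steve",   "Lipitor", "1/1/45", "10mg"),
    ("John",    "Zocor",   "2/2/55", "80mg"),
    ("Harry",   "Crestor", "3/3/65", "5mg"),
    ("Lois",    "Lipitor", "4/4/66", "20mg"),
    ("Charles", "Crestor", "7/1/59", "10mg"),
]


def birth_decade(birth_date: str) -> str:
    """Map 'm/d/yy' to a decade label, assuming the year is 19yy."""
    year = 1900 + int(birth_date.split("/")[-1])
    return f"{year // 10 * 10}s"


def cube_counts(rows) -> Counter:
    """Count patients per (medication, decade, dosage in mg) cell; names are discarded."""
    counts: Counter = Counter()
    for _name, med, birth, dosage in rows:
        counts[(med, birth_decade(birth), int(dosage.replace("mg", "")))] += 1
    return counts


cube = cube_counts(PATIENTS)

# Roll up the cube to the Medication axis only.
by_medication: Counter = Counter()
for (med, _decade, _dose), n in cube.items():
    by_medication[med] += n
```

With only five rows every cell holds a single patient, so a real mart would need coarser axes or more data before such counts could be released safely.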
IIE-115
CSE300
Examples of Data Mining
The Slicing Action
  A Vertical or Horizontal Slice Across Entire Cube
[Figure: Multi-Dimensional Data Cube with axes Months, Cities, and Products (cells hold Sales); Slice on city Atlanta]
IIE-116
CSE300
Examples of Data Mining
The Dicing Action
  A Slice First Identifies One Dimension
  A Selection of Any Cube within the Slice which Essentially Constrains All Three Dimensions
[Figure: from the Atlanta slice (axes Months and Products, cells hold Sales), Dice on Electronics and Atlanta for March 2000]
IIE-117
CSE300
Examples of Data Mining
Drill Down - Takes a Facet (e.g., Q1) and Decomposes into Finer Detail
Roll Up: Combines Multiple Dimensions From Individual Cities to State
[Figure: cube with axes Q1 Q2 Q3 Q4, Location (city, GA: Atlanta, Columbus, Gainesville, Savannah), and Products (cells hold Sales); Drill down on Q1 yields Jan, Feb, March; Roll Up on Location (State, USA) yields Arizona, California, Georgia, Iowa]
IIE-118
CSE300
Mining Other Types of Data
Analysis and Access Dramatically More Complicated!
  Time series data
  Geographical and Satellite Data
  Spatial databases
  Multimedia databases
  World Wide Web
Time Series Data for Glucose, BP, Peak Flow, etc.
IIE-119
CSE300
Advantages/Objectives of Data Mining
Descriptive Mining
  Discover and Describe General Properties
  60% of People who buy Beer on Friday also have Bought Nuts or Chips in the Past Three Months
Predictive Mining
  Infer Interesting Properties based on Available Data
  People who Buy Beer on Friday usually also Buy Nuts or Chips
Result of Mining
  Order from Chaos
  Mining Large Data Sets in Multiple Dimensions Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc.
  Impact on Marketing Strategy
IIE-120
CSE300
Data Mining Methods (1)
  Association
    Discover the Frequency of Items Occurring Together in a Transaction or an Event
    Example
      80% of Customers who Buy Milk also Buy Bread
        Hence - Bread and Milk Adjacent in Supermarket
      50% of Customers Forget to Buy Milk/Soda/Drinks
        Hence - Available at Register
  Prediction
    Predicts Some Unknown or Missing Information based on Available Data
    Example
      Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters
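The prediction example above (forecasting next quarter from the past three) can be illustrated with a least-squares trend line. The slides do not say which forecasting method is meant, so the linear trend, the function name forecast_next, and the sales figures below are all assumptions chosen for the sketch.

```python
def forecast_next(values):
    """Fit a least-squares line through (0, v0), (1, v1), ... and
    extrapolate it one step ahead. Needs at least two values."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


# Hypothetical sale values for the past three quarters.
past_three_quarters = [100.0, 110.0, 120.0]
next_quarter = forecast_next(past_three_quarters)
```

With only three data points the fit is very rough; a real forecast would normally use a longer history and account for seasonality.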
IIE-121
Association Rules
  Motivated by Market Analysis
  Rules of the Form
    Item1 ^ Item2 ^ … ^ Itemk → Itemk+1 ^ … ^ Itemn
  Example
    “Beer ^ Soft Drink → Pop Corn”
  Problem: Discovering All Interesting Association Rules in a Large Database is Difficult!
    Issues
      Interestingness
      Completeness
      Efficiency
    Basic Measurements for Association Rules
      Support of the Rule
      Confidence of the Rule
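The slide names support and confidence without defining them. Under the usual definitions, the support of a rule is the fraction of transactions containing all of its items, and the confidence is the support of the whole rule divided by the support of its left-hand side. The sketch below applies these definitions to a handful of made-up transactions; the data and function names are assumptions for illustration only.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink", "popcorn", "chips"},
    {"beer", "soft drink"},
    {"milk", "bread"},
    {"beer", "chips"},
]


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).
    Assumes the antecedent occurs in at least one transaction."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Rule: Beer ^ Soft Drink -> Pop Corn
rule_support = support(TRANSACTIONS, {"beer", "soft drink", "popcorn"})
rule_confidence = confidence(TRANSACTIONS, {"beer", "soft drink"}, {"popcorn"})
```

Here two of the five transactions contain all three items, and three contain the left-hand side, so the rule holds in two of the three cases where it applies.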
IIE-122
Data Mining Methods (2)
  Classification
    Determine the Class or Category of an Object based on its Properties
    Example
      Classify Companies based on the Final Sale Results in the Past Quarter
  Clustering
    Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity
    Example
      Group Crime Locations to Find Distribution Patterns
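The clustering idea above can be sketched with k-means, one common clustering method that the slides do not name. The locations, the starting centers, and the function names are hypothetical; similarity is taken here to mean closeness in the plane.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points, starting from the given centers.
    Returns the final centers and the groups from the last assignment step."""
    centers = list(centers)
    groups = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)), key=lambda i: squared_distance(p, centers[i])
            )
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical incident locations as (x, y) coordinates.
LOCATIONS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, groups = kmeans(LOCATIONS, centers=[LOCATIONS[0], LOCATIONS[3]])
```

The result depends on the starting centers and on the number of groups chosen in advance, which is one reason clustering output needs to be inspected rather than taken at face value.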
IIE-123
Classification
  Two Stages
    Learning Stage: Construction of a Classification Function or Model
    Classification Stage: Prediction of Classes of Objects Using the Function or Model
  Tools for Classification
    Decision Tree
    Bayesian Network
    Neural Network
    Regression
  Problem
    Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects
IIE-124
An Example
  Attributes
    Attribute     Possible Values
    outlook       sunny, overcast, rain
    temperature   continuous
    humidity      continuous
    windy         true, false
  Class Attribute - Play/Don’t Play the Game
  Training Set
    Values that Set the Condition for the Classification
    What are the Patterns Below?

    Outlook    Temperature  Humidity  Windy  Play
    sunny      85           85        false  No
    overcast   83           78        false  Yes
    sunny      80           90        true   No
    sunny      72           95        false  No
    sunny      72           70        false  Yes
    …          …            …         …      ...
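The two classification stages can be tried out on the five training rows shown above. The sketch uses a one-level decision tree (a decision stump) on a single numeric attribute; this is a deliberately small stand-in for the decision tree tool, and the function names are assumptions. The elided rows of the training set are not included, so the learned threshold only reflects the five visible rows.

```python
# The five visible training rows: (outlook, temperature, humidity, windy, play).
ROWS = [
    ("sunny", 85, 85, False, "No"),
    ("overcast", 83, 78, False, "Yes"),
    ("sunny", 80, 90, True, "No"),
    ("sunny", 72, 95, False, "No"),
    ("sunny", 72, 70, False, "Yes"),
]


def learn_stump(rows, col):
    """Learning stage: pick the threshold on numeric column `col`, and the
    labels on either side of it, that misclassify the fewest training rows.
    Returns (errors, threshold, label_at_or_below, label_above)."""
    values = sorted({r[col] for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for low, high in (("Yes", "No"), ("No", "Yes")):
            errors = sum(
                1 for r in rows if (low if r[col] <= t else high) != r[-1]
            )
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    return best


def classify(stump, value):
    """Classification stage: apply the learned threshold to a new value."""
    _, threshold, low, high = stump
    return low if value <= threshold else high


HUMIDITY = 2  # column index of the humidity attribute
stump = learn_stump(ROWS, HUMIDITY)
```

On these rows the stump separates the classes with a humidity threshold between 78 and 85; for instance, classify(stump, 65) gives "Yes" and classify(stump, 92) gives "No".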
IIE-125
Data Mining Methods (3)
  Summarization
    Characterization (Summarization) of General Features of Objects in the Target Class
    Example
      Characterize People’s Buying Patterns on the Weekend
      Potential Impact on “Sale Items” & “When Sales Start”
      Department Stores with Bonus Coupons
  Discrimination
    Comparison of General Features of Objects Between a Target Class and a Contrasting Class
    Example
      Comparing Students in Engineering and in Art
      Attempt to Arrive at Commonalities/Differences
IIE-126
Summarization Technique
  Attribute-Oriented Induction
  Generalization using Concept Hierarchy (Taxonomy)

  Item Table
    barcode  category    brand       content    size
    14998    milk        Dairyland   skim       2L
    12998    mechanical  MotorCraft  valve 23a  12in
    …        …           …           …          ...

  Concept Hierarchy (levels, top to bottom)
    food
    milk … bread
    skim milk … 2% milk; white bread … whole wheat
    Lucern … Dairyland; Wonder … Safeway

  Generalized Result
    Category  Content  Count
    milk      skim     280
    milk      2%       98
    …         …        ...
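A rough sketch of the idea behind attribute-oriented induction: drop the attributes that are too specific (barcode, brand, size), count the rows that become identical, and then climb the concept hierarchy to merge counts further. The item rows, the one-level hierarchy, and the function names below are hypothetical and much smaller than the table the slide summarizes.

```python
from collections import Counter

# Hypothetical item records: (barcode, category, brand, content, size).
ITEMS = [
    (14998, "milk", "Dairyland", "skim", "2L"),
    (14999, "milk", "Lucern", "skim", "1L"),
    (15000, "milk", "Dairyland", "2%", "2L"),
    (13001, "bread", "Wonder", "white", "1 loaf"),
    (12998, "mechanical", "MotorCraft", "valve 23a", "12in"),
]

# Hypothetical one-level concept hierarchy: category -> more general concept.
PARENT = {"milk": "food", "bread": "food"}


def summarize(items):
    """Drop barcode, brand, and size, then count the rows that become
    identical on (category, content)."""
    return Counter((category, content) for _, category, _, content, _ in items)


def generalize(summary, hierarchy):
    """Climb one level of the hierarchy: replace each category by its
    parent concept (if it has one) and merge the counts."""
    merged = Counter()
    for (category, _content), count in summary.items():
        merged[hierarchy.get(category, category)] += count
    return merged


summary = summarize(ITEMS)
general = generalize(summary, PARENT)
```

The first step yields a table shaped like the Category/Content/Count result on the slide; the second shows how counts roll up once milk and bread are both treated as food.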
IIE-127
Why is Data Mining Popular?
  Technology Push
    Technology for Collecting Large Quantities of Data
      Bar Code, Scanners, Satellites, Cameras
    Technology for Storing Large Collections of Data
      Databases, Data Warehouses
      Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web
  Corporations want to Improve Direct Marketing and Promotions - Driving Technology Advances
    Targeted Marketing by Age, Region, Income, etc.
    Exploiting User Preferences/Customized Shopping
  What is Potential for BMI?
    How do you see Data Mining Utilized?
    What are Key Issues to Worry About?
IIE-128
Requirements & Challenges in Data Mining
  Security and Social
    What Information is Available to Mine?
    Preferences via Store Cards/Web Purchases
    What is Your Comfort Level with Trends?
  User Interfaces and Visualization
    What Tools Must be Provided for End Users of Data Mining Systems?
    How are Results for Multi-Dimensional Data Displayed?
  Performance Guarantees
    Range from Real-Time for Some Queries to Long-Term for Other Queries
  Data Sources of Complex Data Types or Unstructured Data - Ability to Format, Clean, and Load Data Sets
IIE-129
Concluding Remarks
  We’ve looked at:
    Informatics
    Information Engineering
    Information Usage and Repositories
  Focused on Their Applicability and Relevance for BMI
  Likely Generated More Questions than Answers