swea1 cse 5095 software and enterprise architectures prof. steven a. demurjian, sr. computer science...
Post on 22-Dec-2015
SWEA1
CSE5095
Software and Enterprise Architectures
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
[email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Copyright © 2008 by S. Demurjian, Storrs, CT.
SWEA2
Software Architectures
Emerging Discipline in Mid-1990s
Software as Collection of Interacting Components
What are Local Interactions (within Component)?
What are Global Interactions (between Components)?
Advantages of SW Architectural Design
  Understand Communication/Synchronization
  Definition of Database Requirements
  Identification of Performance/Scaling Issues
  Detailing of Security Needs and Constraints
Towards Large-Scale Software Development
For Biomedical Informatics:
  What are Architectures for Data Sharing?
  How is Interoperability Facilitated?
SWEA3
Concepts of Software Architectures
Exceed Traditional Algorithm/Data Structure Perspective
Emphasize Componentwise Organization and System Functionality
Focus on Global and Local Interactions
Identify Communication/Synchronization Requirements
Define Database Needs and Dependencies
Consider Performance/Scaling Issues
Understand Potential Evolution Dimensions
SWEA4
The HTSS Software Architecture
[Figure: HTSS architecture diagram. Labels: IC, CR, IL, SDO, EDO; Order; Payment; Item; ItemDB Local Server; Non-Local Client Int.; Inventory Control; ItemDB Global Server; OrderDB; SupplierDB; CreditCardDB; ATM-BanKDB.]
Legend: IL: Item Locator; CR: Cash Register; IC: Invent. Control; DO: Deli Orderer for Shopper/Employee
SWEA5
Multiple Backend Database System (MBDS)
[Figure: MBDS diagram. Labels: Host/User; Database Controller; Backend Database Processor (three).]
SWEA6
The MBDS Processes
[Figure: process diagram. Database Controller: Get Msg., Put Msg., Request Preparation, Post Processing. Backend Database Processor: Get Msg., Put Msg., Directory Management, Record Processing, Concurrency Control, Disk I/O.]
SWEA7
Multiple Processes in MBDS
No.  Type                                 SRC   DST
1    New Request                          Host  ReqP
2    Results of Request                   PoPr  Host
3    Number of Reqs in Transaction        ReqP  PoPr
4    Aggregate Operators (Sum, etc.)      ReqP  PoPr
6    Parsed Request to Backends           ReqP  DM
12   Backend Aggregate Operator Results   RecP  PoPr
15   Ids for Accessing Database Indexes   DM    DMs
16   Request and Disk Addresses           DM    RecP
21   Ids for Accessing Database Records   DM    CC
22   Locks Obtained: Okay to Execute      CC    RecP
23   Request ID of Finished Request       RecP  CC
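The message table can be read as a routing table between processes. The sketch below encodes the rows as a Python dictionary; the helper names `route` and `messages_sent_by` are hypothetical and are not part of MBDS.

```python
# Message numbers, types, sources, and destinations copied from the MBDS table.
MESSAGE_TYPES = {
    1: ("New Request", "Host", "ReqP"),
    2: ("Results of Request", "PoPr", "Host"),
    3: ("Number of Reqs in Transaction", "ReqP", "PoPr"),
    4: ("Aggregate Operators (Sum, etc.)", "ReqP", "PoPr"),
    6: ("Parsed Request to Backends", "ReqP", "DM"),
    12: ("Backend Aggregate Operator Results", "RecP", "PoPr"),
    15: ("Ids for Accessing Database Indexes", "DM", "DMs"),
    16: ("Request and Disk Addresses", "DM", "RecP"),
    21: ("Ids for Accessing Database Records", "DM", "CC"),
    22: ("Locks Obtained: Okay to Execute", "CC", "RecP"),
    23: ("Request ID of Finished Request", "RecP", "CC"),
}


def route(msg_no):
    """Return (source, destination) for a message number (hypothetical helper)."""
    _, src, dst = MESSAGE_TYPES[msg_no]
    return src, dst


def messages_sent_by(process):
    """List the message numbers a process can send, in ascending order."""
    return sorted(no for no, (_, src, _) in MESSAGE_TYPES.items() if src == process)
```

For example, `route(6)` gives `("ReqP", "DM")`: Request Preparation hands the parsed request to Directory Management.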
SWEA8
Message Passing in MBDS
[Figure: the MBDS processes (Get Msg., Put Msg., Request Preparation, Post Processing, Directory Management, Record Processing, Concurrency Control, Disk I/O) annotated with message flows, each labeled with a letter and a message number: A1, B3, C4, D6, E15 (To Backend(s)), F15 (From Other Backend), G21, H22, I16, J23, K12.]
SWEA9
Software Design Levels
Architecturally:
  Modules
  Interconnections Among Modules
  Decomposition into Subsystems
Code:
  Algorithms/Data Structures
  Tasking/Control Threads
Executable:
  Memory Management
  Runtime Environment
Is this a Realistic/Accurate View?
  Yes for a Single “Application”
  What about Application of Applications? System of Systems?
SWEA10
Software Engineering - an Oxymoron?
Is there any Engineering?
Is there any Science?
Collection of Disparate Techniques:
  Data-Flow Diagrams
  E-R Diagrams
  Finite State Machines
  Petri Nets
  UML Class, Object, Sequence, Etc.
  Design Patterns
  Model Driven Architectures
What is being “Engineered”?
How do we Know we are Done?
  E.g. Does Artifact Match Specification?
SWEA11
What's Available for Engineering Software?
Specification (Abstract Models, Algebraic Semantics)
Software Structure (Bundling Representation with Algorithms)
Language Issues (Models, Scope, User-Defined Types)
Information Hiding (Protect Integrity of Information)
Integrity Constraints (Invariants of Data Structures)
Is this up to date?
What else can be Added to List?
  Design Patterns
  Model Driven Architectures
  XML - Data Modeling and Dependencies
  Others?
SWEA12
Engineering Success in Computing
Compilers Have Had Great Success
  Originally by Hand
  Then Compiler Compilers
  Parser Generators - Lex/Yacc
Solid Science Behind Compilers
  Regular, Context Free, Context Sensitive Languages
  FSAs, PDAs, CFGs, etc.
Science has Provided Engineering Success re. Ease and Accuracy of Modern Compiler Writing
SWEA13
History of Programming
C - Still Remains Industry Workhorse
  Separate Compilation
  Decomposition of System into Subsystems, etc.
  Shared Declarations
  ADTs in C, But Compiler won't Enforce Them
Modula-2 and Ada 83 Had
  Information Hiding
  Public/Private Paradigm
  Module/Package Concepts
  Import/Export Paradigm
Rigor Enforced by Compiler - but Can’t
  Bind/Group Modules into Subsystems
  Precisely Specify Interconnections and Interactions Among Subsystems and Components
SWEA14
‘Recent-Past’ Generation?
C++ and Ada95
  Considered “Legacy” Languages - Old
Java, C# - Are they Headed Toward Legacy?
  How do they Rate?
  What Do they Offer that Hasn't been Offered Before?
  What are Unique Benefits and Potential of Java?
What about new Web Technologies?
  JavaScript, Perl, PHP, Python, Ruby
  XML and SOAP
  How do all of these fit into this process?
  Particularly in Regards to C/S Solutions!
SWEA15
What's Next Step?
Architectural Description Languages
  Provide Tools to Describe Architectures
  Definition and Communication
Codification of Architectural Expertise
Frameworks for Specific Domains
DB vs. GUI vs. Embedded vs. C/S
Formal Underpinning for Engineering Rigor
What has Appeared for Each of these?
  Struts for GUI
  Open Source Frameworks (mediawiki)
  Wide-Ranging Standards (XML)
  Model-Driven Architectures
  What Else???
SWEA16
Architectural Styles
What are Popular Architectural Styles?
  How are they Characterized?
  Example in Practice
Explore a Taxonomy of Styles
Focus on “Micro-Architectures”
  Components
  Flow Among Components
  Represents “Single” Application
Forms Basis for “Macro-Architectures”
  System of Systems
  Application of Applications
  Significantly Scaling Up
SWEA17
Taxonomy of Architectural Styles
Data Flow Systems
  Batch Sequential
  Pipes and Filters
Call & Return Systems
  Main/Subroutines (C, Pascal)
  Object Oriented
  Implicit Invocation
  Hierarchical Systems
Virtual Machines
  Interpreters
  Rule Based Systems
Data Centered Systems
  DBS
  Hypertext
  Blackboards
Independent Components
  Communicating Processes/Event Systems
Client/Server
  Two-Tier
  Multi-Tier
SWEA18
Taxonomy of Architectural Styles
Establish Framework of …
  Components
    Building Blocks for Constructing Systems
    A Major Unit of Functionality
    Examples Include: Client, Server, Filter, Layer, DB
  Connectors
    Defining the Ways that Components Interact
    What are the Protocols that Mandate the Allowable Interactions Among Components?
    How are Protocols Enforced at Run/Design Time?
    Examples Include: Procedure Call, Event Broadcast, DB Protocol, Pipe
SWEA19
Overall Framework
What Is the Design Vocabulary?
  Connectors and Components
What Are Allowable Structural Patterns?
  Constraints on Combining Components & Connectors
What Is the Underlying Conceptual Model?
  Von Neumann, Parallel, Agent, Message-Passing…
  Are there New Emerging Models?
  Collaborative Environments/Shareware?
What Are Essential Invariants of a Style?
  Limits on Allowable Components & Connectors
Common Examples of Usage
Advantages and Disadvantages of a Style
Common Specializations of a Style
SWEA20
Pipes and Filters
Filters:
  Invariant: Unaware of Up and Down Stream Behavior
  Streamed Behavior: Output Could Go From One Filter to the Next One Allowing Multiple Filters to Run in Parallel.
[Figure: two Sort filters feeding a Merge filter. Annotations: Connectors for Flow Streams of I/O; Components with Input and Output; Components are Independent Entities. No Shared State!]
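A minimal sketch of the Sort/Sort/Merge arrangement, assuming Python generators stand in for pipes and plain functions for filters. The function names are hypothetical. Each filter sees only its input stream and shares no state with the others.

```python
import heapq


def source(items):
    """Pipe source: emit items one at a time."""
    yield from items


def sort_filter(stream):
    """Sort filter: consumes its whole input, then streams the sorted output."""
    yield from sorted(stream)


def merge_filter(left, right):
    """Merge filter: lazily merges two already-sorted streams."""
    yield from heapq.merge(left, right)


def pipeline(a, b):
    """Wire two sort filters into one merge filter and collect the result."""
    return list(merge_filter(sort_filter(source(a)), sort_filter(source(b))))
```

Calling `pipeline([3, 1, 2], [6, 5, 4])` returns `[1, 2, 3, 4, 5, 6]`. Note that a sort filter cannot emit anything until it has read all of its input, so only the merge step is truly streamed here.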
SWEA21
Pipes and Filters
Possible Specializations:
  Pipelines - Linear Sequence
  Bounded - Limits on Data Amounts
  Typed Pipes - Known Data Format
What is a Classic Example?
Other Examples:
  Compilers
  Sequential Processes
  Parallel Processes
SWEA22
Pipes and Filters - Another Example
Text Information Retrieval Systems
  Scanning Newspapers for Key Words, Etc.
  Also, Boolean Search Expressions
Where is Such an Architecture Utilized Today?
What is Potential Usage in BMI?
[Figure: text retrieval system. Components: User, Search Controller, Search DB, Query Resolver, Term Comparator, Disk Controller. Flow labels: Commands, Programming Control, Result, Data.]
SWEA23
ADTs and OO Architectures
Widespread Usage in the 1990’s
Advantages Are Well Known
Disadvantages:
  Interaction Requires Object Identity
  If Identity Changes, It Is Difficult to Track All Affected Objects.
[Figure: network of objects (obj), each with operations (op). Labels: Connectors, Components.]
SWEA24
Implicit Invocation
Similar to OO in the Sense that Components Can Call Services on Other Components
How Does this Work?
  Components Have List of Events they can Raise and List of Procedures to Handle Events
  When Event is Raised, it is Broadcast
  All Components that Have Procedure to Handle Broadcast Event will Act Upon it
  The Component That Raised the Event has no Knowledge of Which Component(s) will Handle Event
What are Some Examples?
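One small illustration is an in-process event bus. The sketch below is an assumption-laden toy: the `EventBus` class, the event names, and the handlers are all hypothetical. It shows that the component raising an event never names the components that react to it.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical broadcaster: maps event names to registered handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, handler):
        """A component announces that it can handle this event."""
        self._handlers[event].append(handler)

    def raise_event(self, event, payload):
        """Broadcast: every registered handler for the event is invoked."""
        for handler in self._handlers[event]:
            handler(payload)


bus = EventBus()
log = []
bus.register("file_saved", lambda name: log.append(f"indexer saw {name}"))
bus.register("file_saved", lambda name: log.append(f"backup saw {name}"))
bus.raise_event("file_saved", "notes.txt")   # two components react
bus.raise_event("file_closed", "notes.txt")  # no handlers registered: nothing happens
```

This toy calls handlers in registration order. A real event system generally makes no such promise, which is the source of the ordering concerns associated with this style.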
SWEA25
Implicit Invocation
Advantages
  No Need to Know the Targeted Components
  Single Event can Impact Multiple Components
  New Event Handlers can Easily be Added
  New Events Can then be Raised
Disadvantages
  No Control Over the Order of Processing When an Event is Raised
  No Control Over “Who” and “How Many” Process Events
  Very Non-Deterministic System Behavior
SWEA26
What has OO Evolved Into?
What has Classic OO Solution Evolved into Today?
  Client (Browser + Struts)
  Server (Many Variants of OO Languages)
  Database Server (typically Relational)
Different Style (e.g., Design Pattern)
  Does Pattern Capture All Aspects of Style?
  Do we Need to Couple Technology with Pattern?
[Figure: example record. Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech). Sample values: Dr. D, Jan 01, 08; Fever, Flu, Bed Rest; No Scripts; No Tests.]
SWEA27
Layered Systems
Components - Virtual Machine at Each Layer
Connectors - Protocols That Specify How Layers Interact
Interaction Is Restricted to Adjacent Layers
[Figure: layered diagram. Labels: Users, Core level, Base Utility, Useful Systems.]
SWEA28
Layered Systems
Advantages:
  Increasing Levels of Abstraction
  Support Enhancement - New Layers
  Support for Reuse
Drawbacks:
  Not Feasible for All Systems
  Performance Issues With Multiple Layers
  Defining Abstractions Is Difficult.
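The adjacency restriction can be sketched in a few lines. The `Layer` class and the layer names below are hypothetical. Each layer holds a reference only to the layer directly beneath it, so a request from the top must pass through every intermediate layer, which also hints at the performance drawback.

```python
class Layer:
    """A layer may call only the layer directly below it."""

    def __init__(self, name, below=None):
        self.name = name
        self._below = below  # the only other layer this one can reach

    def request(self, data):
        # Delegate downward if possible, then wrap the reply on the way back up.
        if self._below is None:
            return f"{self.name}({data})"
        return f"{self.name}({self._below.request(data)})"


core = Layer("core")
utility = Layer("utility", below=core)
app = Layer("app", below=utility)
```

Here `app.request("x")` returns `"app(utility(core(x)))"`: the application layer has no way to reach the core except through the utility layer.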
SWEA29
Layered Systems in BMI
One Approach to Constructing Access to Patient Data for Clinical Research and Clinical Practice
Construct Layered Data Repositories as Below
  Each Layer Targets Different User Group
  Need to Fine Tune Access Even within Layers
[Figure: layered data repositories. Layers: Patient Data, De-identified, Aggregated. User groups: Provider, Cl. Researchers, Public Health Researchers.]
SWEA30
ISO as Layered Architecture
ISO Open Systems Interconnect (OSI) Model
  Now Widely Used as a Reference Architecture
  7-layer Model
  Provides Framework for Specific Protocols (Such as IP, TCP, FTP, RPC, UDP, RSVP, …)
[Figure: two 7-layer stacks: Application, Presentation, Session, Transport, Network, Data Link, Physical.]
SWEA31
ISO OSI Model
[Figure: two 7-layer stacks: Application, Presentation, Session, Transport, Network, Data Link, Physical.]
Physical (Hardware)/Data Link Layer Networks: Ethernet, Token Ring, ATM
Network Layer Net: The Internet
Transport Layer Net: TCP-based Network
Presentation/Session Layer Net: HTTP/HTML, RPC, PVM, MPI
Applications, E.g., WWW, Window System, Algorithm
SWEA32
Repositories
Knowledge Sources Interact With the Blackboard.
Blackboard Contains the Problem Solving State Data.
Control Is Driven by the State of the Blackboard.
DB Systems Are a Form of Repository With a Layer Between the BB and the KSs - Supports
  Concurrent Access, Security, Integrity, Recovery
[Figure: knowledge sources ks1 through ks8 surrounding a Blackboard (shared data).]
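A toy blackboard, assuming a plain dictionary as the shared data and two hypothetical knowledge sources (`tokenize` and `count`). Control is driven by the board's state: a source fires only when the board contains what it needs, regardless of the order in which the sources are listed.

```python
def tokenize(board):
    """Knowledge source: fires when raw text is present but tokens are not."""
    if "text" in board and "tokens" not in board:
        board["tokens"] = board["text"].split()
        return True
    return False


def count(board):
    """Knowledge source: fires when tokens are present but the count is not."""
    if "tokens" in board and "count" not in board:
        board["count"] = len(board["tokens"])
        return True
    return False


def run(board, knowledge_sources):
    """Control loop: keep firing any source whose precondition the board satisfies."""
    progress = True
    while progress:
        progress = any(ks(board) for ks in knowledge_sources)
    return board


# The sources are deliberately listed "out of order"; the board state sorts it out.
board = run({"text": "fever flu bed rest"}, [count, tokenize])
```

After the run, `board["count"]` is `4`.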
SWEA33
Database System as a Repository
Clients Interact With the DBMS
Database Contains the Problem Solving State Data
Control is Driven by the State of the Database
  Concurrent Access, Security, Integrity, Recovery
  Single Layer System: Clients have Direct Access
  Control of Access to Information must be Carefully Defined within DB Security/Integrity
[Figure: clients c1 through c8 surrounding a Database (shared data).]
SWEA34
Team Project as a Repository
Clients are Providers, Patients, Clinical Researchers
Database Underlies Web Portal
Simply a Portion of Architecture
  Interactions with PHR (Patients)
  Interactions with EMR (Providers)
  Interactions with Database/Warehouse (Researchers)
[Figure: clients c1 through c8 surrounding a shared Web Portal.]
SWEA35
Interpreters
What Are Components and Connectors?
Where Have Interpreters Been Used in CS&E?
  LISP, ML, Java, Other Languages, OS Command Line
[Figure: interpreter structure. Labels: Inputs, Outputs, Data (program state), Program being interpreted, Simulated interpretation engine, Internal interpreter state, Selected instruction, Selected data.]
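The parts named in the interpreter figure can be mapped onto a very small stack machine. This is only a sketch: the instruction set (`push`, `load`, `add`, `mul`, `print`) is invented for illustration. The program is the list being interpreted, the stack is the program state, and the program counter is the interpreter's internal state.

```python
def interpret(program, inputs):
    """Run a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []      # data (program state)
    pc = 0          # internal interpreter state: index of the selected instruction
    outputs = []
    while pc < len(program):
        op, arg = program[pc]  # selected instruction
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(inputs[arg])  # selected data from the inputs
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "print":
            outputs.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return outputs


# Computes x * 2 + 1 for the supplied input x.
program = [("load", "x"), ("push", 2), ("mul", None),
           ("push", 1), ("add", None), ("print", None)]
```

With `inputs={"x": 5}`, `interpret(program, inputs)` returns `[11]`.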
SWEA36
Java as Interpreter
SWEA37
Process Control Paradigms
[Figure: two control loops, With Feedback and Without Feedback. Each shows a Controller acting on a Process. Labels: Set point, Input variables, Controlled variable, Δs to manipulated variables.]
Also:
  Open vs. Closed Loop Systems
  Well Defined Control and Computational Characteristics
  Heavily Used in Engineering Fields.
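The difference between the two loops can be seen in a short simulation. The sketch below assumes a simple proportional controller and a constant disturbance; neither comes from the slides, and the function and parameter names are hypothetical.

```python
def simulate(set_point, steps, feedback, disturbance=-1.0, gain=0.5):
    """Simulate a process whose controlled variable drifts by `disturbance` each step."""
    value = 0.0
    open_loop_input = set_point / steps  # fixed plan that ignores the process output
    for _ in range(steps):
        if feedback:
            # Closed loop: the change depends on the measured error.
            change = gain * (set_point - value)
        else:
            # Open loop: the change was decided in advance.
            change = open_loop_input
        value += change + disturbance
    return value


closed_loop = simulate(20.0, 50, feedback=True)
open_loop = simulate(20.0, 50, feedback=False)
```

With these numbers the closed loop settles near 18 (proportional control alone leaves a steady offset of 2 below the set point), while the open loop, unaware of the disturbance, ends near -30.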
SWEA38
Process Architecture: Statechart Diagram?
SWEA39
Process Architecture: Activity Diagram?
Clear Applicability to Medical Processes that have Underlying BMI - Low Level Processes
[Figure: patient monitoring diagram. Breath: Waiting for Resp. Signal; Resp Signal; timeout. Heartbeat: Waiting for Heart Signal; Heart Signal; irregular beat. Also: Trigger Local Alarm; Trigger Remote Alarm; Alarm Reset.]
SWEA40
Design Patterns as Software Architectures
Emerged as the Recognition that in Object-Oriented Systems Repetitions in Design Occurred
Gained Prominence in 1995 with Publication of “Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley
  “… descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context…”
  Akin to Complicated Generic
Usage of Patterns Requires
  Consistent Format and Abstraction
  Common Vocabulary and Descriptions
Simple to Complex Patterns - Wide Range
SWEA41
The Observer Pattern
Utilized to Define a One-to-Many Relationship Between Objects
When Object Changes State - all Dependents are Notified and Automatically Updated
Loosely Coupled Objects
  When one Object (Subject - an Active Object) Changes State then Multiple Objects (Observers - Passive Objects) are Notified
  Observer Object Implements Interface to Specify the Way that Changes are to Occur
  Two Interfaces and Two Concrete Classes
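A minimal sketch of the "two interfaces and two concrete classes" shape, using Python abstract base classes as the interfaces. The concrete names `Thermometer` and `Display` are hypothetical examples and do not come from the slides.

```python
from abc import ABC, abstractmethod


class Observer(ABC):
    """Interface 1: how a dependent is told about a change."""

    @abstractmethod
    def update(self, state): ...


class Subject(ABC):
    """Interface 2: how dependents are attached and notified."""

    @abstractmethod
    def attach(self, observer): ...

    @abstractmethod
    def notify(self): ...


class Thermometer(Subject):
    """Concrete subject: the active object whose state changes."""

    def __init__(self):
        self._observers = []
        self._reading = None

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self):
        for observer in self._observers:
            observer.update(self._reading)

    def set_reading(self, value):
        self._reading = value
        self.notify()  # a state change triggers notification of all dependents


class Display(Observer):
    """Concrete observer: passively records the latest state."""

    def __init__(self):
        self.shown = None

    def update(self, state):
        self.shown = state
```

Attaching two `Display` objects to one `Thermometer` and calling `set_reading(38.5)` updates both displays, while the subject knows them only through the `Observer` interface.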
SWEA42
The Observer Pattern
SWEA43
Model View Controller
http://java.sun.com/blueprints/patterns/MVC-detailed.html
SWEA44
Model View Controller
Three Parts of the Pattern:
  Model
    Enterprise Data and Business Rules for Accessing and Updating Data
  View
    Renders the Contents (or Portion) of Model
    Deals with Presentation of Stored Data
    Pull or Push Model Possible
  Controller
    Translates Interactions with View into Actions on Model
    Actions could be Button Clicks (GUI), Get/Post HTTP (Web), etc.
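The three parts can be sketched with a counter as the "enterprise data". The class bodies and the `"click"` action are hypothetical. The view here uses the pull approach, reading the model each time it renders.

```python
class Model:
    """Holds the data and the rules for updating it."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1


class View:
    """Pull-style view: reads the model each time it renders."""

    def render(self, model):
        return f"Count: {model.count}"


class Controller:
    """Translates user actions into operations on the model."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle(self, action):
        if action == "click":
            self.model.increment()
        # Unknown actions leave the model untouched; the view is re-rendered either way.
        return self.view.render(self.model)
```

A push variant would instead have the model notify the view when it changes, for example through the Observer pattern.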
SWEA45
Model View Controller
http://java.sun.com/blueprints/patterns/MVC-detailed.html
SWEA46
UML for System Modeling
UML is a Language for Specifying, Visualizing, Constructing, and Documenting Software Artifacts
What Does a Modeling Language Provide?
  Model Elements: Concepts and Semantics
  Notation: Visual Rendering of Model Elements
  Guidelines: Hints and Suggestions for Using Elements in Notation
References and Resources
  Web: http://www.uml.org/
Is UML Sufficient for Complexity of BMI?
  Able to Model Information Needs for BMI?
  Able to Represent Required Architectures?
SWEA47
UML Diagrammatic Representations
Component Diagram: Captures the Physical Structure of the Implementation
Deployment Diagram: Captures the Topology of a System’s Hardware
Collaboration Diagram: Captures Dynamic Behavior (Message-Oriented)
What About Other Diagrams?
  State Chart Diagram: Captures Dynamic Behavior (Event-Oriented)
  Activity Diagram: Captures Dynamic Behavior (Activity-Oriented)
  These and Others Seem too Low Level …
What is Role of UML for BMI?
  Yet Another Design Artifact
  Can it be More?
SWEA48
Component Diagram
Captures the Physical Structure of the Implementation
SWEA49
Deployment Diagram
Captures the Topology of a System’s Hardware
SWEA50
Collaboration Diagram
SWEA51
Single and Multi-Tier Architectures
Widespread use in Practice for All Types of Distributed Systems and Applications
Two Kinds of Components
  Servers: Provide Services - May be Unaware of Clients
    Web Servers (unaware?)
    Database Servers and Functional Servers (aware?)
  Clients: Request Services from Servers
    Must Identify Servers
    May Need to Identify Self
    A Server Can be Client of Another Server
Expanding from Micro-Architectures (Single Computer/One Application) to Macro-Architecture
SWEA52
Single and Multi-Tier Architectures
Normally, Clients and Servers are Independent Processes Running in Parallel
Connectors Provide Means for Service Requests and Answers to be Passed Among Clients/Servers
Connectors May be RPC, RMI, etc.
Advantages
  Parallelism, Independence
  Separation of Concerns, Abstraction
  Others?
Disadvantages
  Complex Implementation Mechanisms
  Scalability, Correctness, Real-Time Limits
  Others?
SWEA53
Example: Software Architectural Structure
[Figure: system structure over a 10-100MB Network. Labels: Initial Data Entry Operator (Scanning & Posting); Advanced Data Entry Operators; Analyst; Manager; Document Server, Stored Images/CD; Database Server Running Oracle; Functional Server; RMI Registry; RMI Act.Obj/Server (two).]
SWEA54
Business Process Model
[Figure: licensing workflow. Labels: Scanner; Licensing; Licensing Division Scanning Operator; Stored Images; Basic Information Entered; Licensing Division Data Entry Operator; Supervisor Review; Completed Applications; Historical Records; Printer; New Licenses; New Appointments; FOI; Letters (Request Information, etc.); DB (several).]
SWEA55
Two-Tier Architecture
Small Manufacturer Previously on C++
New Order Entry, Inventory, and Invoicing Applications in Java Programming Language
Existing Customer and Order Database
Most of Business Logic in Stored Procedures
Tool-generated GUI Forms for Java Objects
SWEA56
Three-Tier Architecture
Passenger Check-in for Regional Airline
Local Database for Seating on Today's Flights
Clients Invoke EJBs at Local Site Through RMI
EJBs Update Database and Queue Updates
JMS Queues Updates to Legacy System
JDBC API Used to Access Local Database
SWEA57
Four-Tier Architecture
Web Access to Brokerage Accounts
Only HTML Browser Required on Front End
"Brokerbean" EJB Provides Business Logic
Login, Query, Trade Servlets Call Brokerbean
Use JNDI to Find EJBs, RMI to Invoke Them
SWEA58
Architecture Comparisons
Two-tier Through JDBC API is Simplest
Multi-tier: Separate Business Logic, Protect Database Integrity, More Scalable
JMS Queues vs. Synchronous (RMI or IDL):
  Availability, Response Time, Decoupling
JMS Publish & Subscribe: Off-line Notification
RMI IIOP vs. JRMP vs. Java IDL:
  Standard Cross-language Calls or Full Java Functionality
JTS: Distributed Integrity, Lockstep Actions
SWEA59
Comments on Architectural Styles
Architectural Styles Provide Patterns
  Suppose Designing a New System
  During Requirements Discovery, Behavior and Structure of System Will Emerge
  Attempt to Match to Architectural Style
  Modify, Extend Style as Needed
By Choosing Existing Architectural Style
  Know Advantages and Disadvantages
  Ability to Focus in on Problem Areas and Bottlenecks
  Can Adjust Architecture Accordingly
Architectures Range from Large Scale to Small Scale in their Applicability
We’ll see Examples for BMI Shortly …
SWEA60
Other Issues in Software Architectures
Consider a Set of Applications
  New Software
  Legacy, COTS, Databases, etc.
A Distributed Application is a Set of Applications Deployed Over a Network that Communicate
Relationship Between Applications
Different Implementations of “Same” Application on Different Hardware Platforms
Configuration of Various Hardware Nodes
Different Node Types in the Network
Issue:
  What is the ‘Best’ Way to Deploy Applications Across the Network of Available Resources?
SWEA61
Distributed Application & Hardware Nodes
Computers & Connections May have Different Characteristics that Affect their Usage
  Speed
  Storage
  Bandwidth
SWEA62
Objective: ‘Best’ Deployment
A Distributed System is Optimally Deployed if it Yields the Best Performance
Performance: Efficient Use of Resources via Throughput, Response Time, or Number of Messages
What are Implications in BMI?
  Need to Bring Together Multiple Assets
  Work Efficiently Across Network
  Unifying Clinical Research Repositories
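As a toy reading of "best deployment", the sketch below scores every placement of components on nodes by the number of messages that cross links and keeps the cheapest one. Everything in it is an assumption made for illustration: the cost model, the per-node capacity limit, and the component and node names. Exhaustive search is only workable for very small systems.

```python
from itertools import product


def best_deployment(components, nodes, traffic, link_cost, capacity):
    """Exhaustively search placements and minimize total message cost."""
    best, best_cost = None, float("inf")
    for placement in product(nodes, repeat=len(components)):
        # Skip placements that put too many components on one node.
        if any(placement.count(n) > capacity[n] for n in nodes):
            continue
        assignment = dict(zip(components, placement))
        cost = sum(msgs * link_cost[assignment[a]][assignment[b]]
                   for (a, b), msgs in traffic.items())
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


components = ["ui", "logic", "db"]
nodes = ["n1", "n2"]
traffic = {("ui", "logic"): 10, ("logic", "db"): 100}  # messages exchanged per pair
link_cost = {"n1": {"n1": 0, "n2": 5}, "n2": {"n1": 5, "n2": 0}}
capacity = {"n1": 2, "n2": 2}

placement, cost = best_deployment(components, nodes, traffic, link_cost, capacity)
```

The search co-locates `logic` and `db`, the pair with the heaviest traffic, and places `ui` on the other node, for a total cost of 50.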
SWEA63
Distr. Systems: Combo of Requirements
[Figure: Specification diagram. Labels: software elements, hardware elements, protocols, interfaces, interaction patterns, connections.]
SWEA64
Deployment Influenced by Many Factors
[Figure: Performance diagram. Labels: algorithms, middleware, underlying network, usage patterns, deployment, software architecture, replication degree, processing nodes.]
SWEA65
Framework for Design and Deployment
[Figure: framework diagram. Labels: SOFTWARE, HARDWARE, Dependencies, Deployment, PERFORMANCE.]
SWEA66
What is I5?
Five Definition Languages
  Interface
  Implementation
  Integration
  Instantiation
  Installation
Five Formal Integrated Graphical Languages Based on UML’s Implementation Diagrams
The Application, Network, Dependencies and the Deployment are Part of an Integrated Framework
SWEA67
The Five Levels of I5
Interface (I1) - Types of Components, Nodes and Connectors
Implementation (I2) - Classes of Components, Nodes and Connectors
Integration (I3) - Dependencies Between Component and Node Classes
Instantiation (I4) - Instances of Each Class Definition
Installation (I5) - Deployment of Each Instance (Requirements and Complete Deployment)
[Figure: scale running from Abstraction to Detail.]
SWEA68
CSE5095
Levels of Specification in I5
Types - Generic Definition of Components, Nodes, and Connectors According to Their Role
  Defined in I1
  Used in I2 to Define Classes
Classes - Different Implementations of the Types
  Defined in I2
  Used in I3 to Associate Software Components and Hardware Artifacts and I4 to Define Instances
Instances - Identical Copies of the Different Classes
  Defined in I4
  Used in I5 to Deploy Instances Across Nodes
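The "defined in / used in" chain between levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (ComponentType, ComponentClass, ComponentInstance), the check_levels function, and the pairing of Counter with Replica are hypothetical and are not part of the I5 framework itself.

```python
from dataclasses import dataclass

# All names below are hypothetical; the slides define the levels, not this API.

@dataclass(frozen=True)
class ComponentType:            # I1: generic definition according to role
    name: str

@dataclass(frozen=True)
class ComponentClass:           # I2: one implementation of a type
    name: str
    realizes: ComponentType

@dataclass(frozen=True)
class ComponentInstance:        # I4: a copy of a class
    ident: str
    cls: ComponentClass

def check_levels(types, classes, instances):
    """Return violations of the rule that each level only uses the level above."""
    errors = []
    for c in classes:
        if c.realizes not in types:
            errors.append(f"class {c.name} realizes undefined type {c.realizes.name}")
    for i in instances:
        if i.cls not in classes:
            errors.append(f"instance {i.ident} uses undefined class {i.cls.name}")
    return errors

replica = ComponentType("Replica")
counter = ComponentClass("Counter", realizes=replica)   # assumed realization
ct1 = ComponentInstance("ct1", counter)
orphan = ComponentInstance("x1", ComponentClass("Unknown", ComponentType("Ghost")))

print(check_levels({replica}, {counter}, [ct1]))           # no violations
print(check_levels({replica}, {counter}, [ct1, orphan]))   # one violation
```

The same pattern would apply to nodes and connectors on the hardware side.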
SWEA69
CSE5095
UML
UML is a Set of Graphical Specification Languages (OMG’s Standard Design Language Since November, 1997)
Implementation Diagrams
  Component Diagrams: Show the Physical Structure of the Code in Terms of Code Components and Their Dependencies
  Deployment Diagrams: Show the Physical Architecture of the Hardware and Software in the System. They Have a Type and an Instance Version.
SWEA70
CSE5095
UML
When to Use Deployment Diagrams
“… In practice, I haven’t seen this kind of diagram used much. Most people do draw diagrams to show this kind of information but they are informal cartoons. On the whole, I don’t have a problem with that since each system has its own physical characteristics that you want to emphasize. As we wrestle more and more with distributed systems, however, I’m sure we will require more formality as we understand better which issues need to be highlighted in deployment diagrams.”
  From “UML Distilled: Applying the Standard Object Modeling Language”, by Martin Fowler. Addison-Wesley, Object Technology Series, 7th Reprint, June 1998.
SWEA71
CSE5095
Pros and Cons of Graphical Modeling
Advantages:
  Clear to Show Structure
  Excellent Communication Vehicle
  Addresses Different Aspects of Modeling in an Integrated Fashion
Disadvantages:
  Shows Little (or No) Details
  There is a Big Gap Between Specification and Implementation
  Limited by Screen Size & Printable Page
Solution: Associate a Complete Textual Specification to Graphical Model that Contains the Necessary Details for Each Element
SWEA72
CSE5095
Design Concepts
Interface: Interaction With the Outer World (Signature + Requested Services)
Type: Abstract Entity - Interface + Semantics
Subtype: Inherits the Supertype Definition
Class: Implementation of a Type
Realization: Relation Between a Type and a Class That Implements It
Subclass: Inherits the Superclass Implementation
Instance: Element of a Class
SWEA73
CSE5095
The I5 Framework
An Integrated Specification Framework for Distributed Systems
  Support for the Architectural Specification of OO and Component Based Distributed Systems
  Heterogeneous Network - Platforms
A Five Level Framework for Defining Software and Hardware (Platforms) With a Uniform Notation and With Different Levels of Abstraction
Specified Textually in Z or Graphically in UML
  Emphasis on Implementation Diagrams
Please See http://www.engr.uconn.edu/~cecilia
SWEA74
CSE5095
Dependencies Between Levels
[Figure: the five levels and what each one defines]
  INTERFACE: Component Types, Node Types
  IMPLEMENTATION: Component Classes, Node Classes
  INTEGRATION: Implementation Dependencies
  INSTANTIATION: Inst. Components, Inst. Nodes, System Instantiation
  INSTALLATION: Installation Req. (together, separated), Installation Req. (fix location), Complete Installation
SWEA75
CSE5095
Interface - Software: I1S
Component Types
  Type
  Supertypes
  Associated Interfaces
  Calls
Properties
  Types are Unique
  Supertypes Must Be Part of I1S
  Calls Must Be Satisfied in I1S
SWEA76
CSE5095
Interface - Software: I1S
[Figure: component types Client, FrontEnd, and Replica with interfaces request, receive, response, and gossip, connected by <<call>> dependencies]
SWEA77
CSE5095
Interface - Hardware: I1H
Node Types
Connector Types
Connections
Properties
  All Node Types Must Be Connected
  Only Node and Connector Types Defined Take Part in the Connections
[Figure: node types SUN and Intel Pentium; connector types MPI and Sockets]
SWEA78
CSE5095
Implementation - Software: I2S
Component Classes
  Component Type
  Class
  Superclasses
  Calls to Classes
  Interfaces
Properties:
  Only Types in I1S are Allowed
  Superclasses Are Realizations of the Supertypes
  Calls & Inheritance are Satisfied Within I2S
SWEA79
CSE5095
Implementation - Software: I2S
[Figure: component classes XFrontEnd (request, receive), PCCtrCl (response), XCtrCl (response), and Counter (receive, gossip), connected by <<call>> dependencies]
SWEA80
CSE5095
Implementation - Hardware: I2H
Node Classes
  Node Type
  Class
Connector Classes
  Type
  Class
Connections Between Node Classes
Properties
  Node and Connector Classes Refine the Types in I1H
  Connections are With Connector Classes That Refine Connector Types in I1H
SWEA81
CSE5095
Implementation - Hardware: I2H
[Figure: node types SUN and Intel Pentium; connector types MPI and Sockets; node classes SUN OS 4.1.4 and Win95; connector classes MPI_Impl and CSockets; <<realizes>> relations link classes to types]
SWEA82
CSE5095
Software and Hardware Integration: I3
Relation <<supports>>
  Instances of the Component Class May Run on Instances of the Node Class
  Important Step Since it Constrains Deployment Options
Properties
  Only Node and Component Classes Defined in I2 Can Participate in the <<supports>> Relation
SWEA83
CSE5095
Software and Hardware Integration: I3
[Figure: component classes XFrontEnd (request, receive), PCCtrCl (response), XCtrCl (response), and Counter (receive, gossip) linked by <<supports>> relations to node classes SUN OS 4.1.4 and Win95; connector classes MPI_Impl and CSockets]
SWEA84
CSE5095
Instantiation - Software: I4S
Component Instances
  Class
  Identification
  Calls
Properties
  Instance Calls Refine Class Calls
  Only Classes in I2S May Be Instantiated
SWEA85
CSE5095
Instantiation - Software: I4S
[Figure: component instances c1:PCCtrCl, c2:PCCtrCl, c3:PCCtrCl, and c4:XCtrCl (response); fe1:XFrontEnd and fe2:XFrontEnd (receive, request); ct1:Counter through ct6:Counter (receive, gossip)]
SWEA86
CSE5095
Instantiation - Hardware: I4H
Node Instances
  Class
  Identification
Connector Instances
  Class
  Identification
  Set of Connected Nodes
Properties
  There are Only Instances of the Node & Connector Classes Defined in I2H
  Connectors Refine I2H Connections
SWEA87
CSE5095
Instantiation - Hardware: I4H
[Figure: node instances sun1:SunOS4.1.4 through sun10:SunOS4.1.4 and pc1:Win95 through pc4:Win95; connector instances sock1 through sock4 and mpi1]
SWEA88
CSE5095
Installation Requirements
A Set of Component Instances Must Be Deployed Together or Separated
Fix the Location of Some Component Instances
All Installation Requirements Must Be Consistent With the Requirements Imposed by All the Previous Specification Levels
Requirements
  Together
  Separated
  Fix
SWEA89
CSE5095
Installation - Requirements: Ifix, Iseparated
[Figure: fe2:XFrontEnd and fe1:XFrontEnd (each with receive and request) shown with nodes sun2:SunOS4.1.4 and sun3:SunOS4.1.4]
separated = {ct1:Counter, ct2:Counter, ct3:Counter, ct4:Counter, ct5:Counter, ct6:Counter}
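A candidate deployment can be checked against the <<supports>> relation and the together/separated/fix requirements with a small function. The sketch below is an assumption-laden illustration: check_deployment, the contents of the supports set, and the pairing of fe1 with sun2 are hypothetical and are not taken from the I5 specification.

```python
def check_deployment(deployment, supports, together=(), separated=(), fix=None):
    """deployment maps 'id:Class' instances to 'id:Class' nodes.

    supports is a set of (component class, node class) pairs (I3);
    together/separated are lists of instance groups; fix maps instance -> node.
    """
    fix = fix or {}
    errors = []
    for inst, node in deployment.items():
        comp_cls, node_cls = inst.split(":")[1], node.split(":")[1]
        if (comp_cls, node_cls) not in supports:
            errors.append(f"{inst} not supported on {node}")
    for inst, node in fix.items():
        if deployment.get(inst) != node:
            errors.append(f"{inst} must be fixed on {node}")
    for group in together:
        if len({deployment[i] for i in group}) > 1:
            errors.append(f"together violated: {sorted(group)}")
    for group in separated:
        nodes = [deployment[i] for i in group]
        if len(set(nodes)) < len(nodes):
            errors.append(f"separated violated: {sorted(group)}")
    return errors

# Assumed <<supports>> pairs and an assumed fix requirement, for illustration only.
supports = {("Counter", "SunOS4.1.4"), ("XFrontEnd", "SunOS4.1.4")}
separated = [{"ct1:Counter", "ct2:Counter"}]
fix = {"fe1:XFrontEnd": "sun2:SunOS4.1.4"}

good = {
    "ct1:Counter": "sun1:SunOS4.1.4",
    "ct2:Counter": "sun4:SunOS4.1.4",
    "fe1:XFrontEnd": "sun2:SunOS4.1.4",
}
bad = dict(good)
bad["ct2:Counter"] = "sun1:SunOS4.1.4"   # two separated counters on one node

good_errors = check_deployment(good, supports, separated=separated, fix=fix)
bad_errors = check_deployment(bad, supports, separated=separated, fix=fix)
print(good_errors)
print(bad_errors)
```

A search for a "best" deployment could enumerate candidate mappings and keep only those for which this check returns no errors.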
SWEA90
CSE5095
Mapping Applications to Hardware
Applications (Left) and Hardware (Right) Instances
Restrictions on
  Which Applications can be Deployed on Which Hardware?
  Which Applications Deployed Together?
  Which Applications Must be Separate?
SWEA91
CSE5095
Objective: “Best” Optimal Deployment
SWEA92
CSE5095
Using I5 for BMI
Focus at Architectural Level
  Multiple Assets to Bring Together
    Hospital EMRs, Provider EMRs, Other Systems
  Multiple and Disparate Hardware
  Different Contexts and Needs
    Clinical Practice – (Near) Real-Time Integration/Access
    Clinical Research – De-Identified Integrated Repository
Performance will be Key Issue
  Clinical Practice – Time of Access
  Clinical Research – Volume of Information
    Some Genomic Data Requires Terabytes of Data!
    Information Overload Possible
SWEA93
CSE5095
The Next Big Challenge
Macro-Architectures
  System of Systems
  Application of Applications
Involves Two Key Issues
  Interoperability
    Heterogeneous Distributed Databases
    Heterogeneous Distributed Systems
    Autonomous Applications
  Scalability
    Rapid and Continuous Growth
    Amount of Data
    Variety of Data Types
    Different Privacy Levels or Ownerships of Data
SWEA94
CSE5095
Interoperability: A Classic View
[Figure, left: Simple Federation. An FDB Global Schema is produced by Federated Integration over three Local Schemas]
[Figure, right: Multiple Nested Federation. FDB Global Schema 4 is produced by Federated Integration over FDB 1, a Local Schema, and FDB 3, where the FDBs are themselves Federations]
SWEA95
CSE5095
What is CORBA?
Differs from Typical Programming Languages
Objects can be …
  Located Throughout Network
  Interoperate with Objects on other Platforms
  Written in Any PLs for which there is a Mapping from IDL to that Language
[Figure: Object Request Broker connecting Application Interfaces, Domain Interfaces, and Object Services]
SWEA96
CSE5095
What is CORBA?
Allow Interactions from Client to Server
CORBA Installed on All Participating Machines
[Figure: Client Application with Static Stub, DII, and ORB Interface over the Client ORB Core; Server Application with DSI, Skeleton, ORB Interface, and Object Adapter over the Server ORB Core; the two ORB cores communicate over the Network. Legend: IDL - Independent; Same for all applications; There may be multiple object adapters]
SWEA97
CSE5095
CORBA-Based Development
[Figure: an IDL file is passed through the IDL Compiler on each side, producing a Stub for the Client Application and a Skeleton for the Object Implementation; both sides communicate via ORB/IIOP]
SWEA98
CSE5095
Database Interoperability in the Internet
Technology
  Web/HTTP, JDBC/ODBC, CORBA (ORBs + IIOP), XML
Architecture
  Information Broker
    Mediator-Based Systems
    Agent-Based Systems
SWEA99
CSE5095
ORB Integration: Java Client + Legacy Application
[Figure: Java Client and Legacy Application (behind a Java Wrapper) communicating through the Object Request Broker (ORB)]
CORBA is the Medium of Info. Exchange
Requires Java/CORBA Capabilities
SWEA100
CSE5095
Java Client with Wrapper to Legacy Application
[Figure: Java Client with C++/C Wrapper. The Java Client holds Java Application Code over a WRAPPER consisting of a JAVA LAYER (Mapping Classes) and a NATIVE LAYER (Native Functions (C++), RPC Client Stubs (C)), connected over the Network to the Legacy Application]
Interactions Between Java Client and Legacy Appl. via C and RPC
C is the Medium of Info. Exchange
SWEA101
CSE5095
COTS and Legacy Appls. to Java Clients
[Figure: Java Clients (Java Application Code) connected over the Network to a COTS Application and a Legacy Application; each application sits behind a JAVA NETWORK WRAPPER with a JAVA LAYER (Mapping Classes) and a NATIVE LAYER (Native Functions that Map to the COTS Appl or to the Legacy Appl)]
Java is Medium of Info. Exchange - C/C++ Appls with Java Wrappers
SWEA102
CSE5095
Java Client to Legacy App via RDBS
[Figure: the Legacy Application and the Java Client exchange data through a Relational Database System (RDS). Flow labels: Extract and Generate Data, Transform and Store Data, Transformed Legacy Data, Updated Data]
SWEA103
CSE5095
JDBC
JDBC API Provides DB Access Protocols for Open, Query, Close, etc.
Different Drivers for Different DB Platforms
[Figure: Java Application calls the JDBC API; the Driver Manager dispatches to Drivers for Oracle, Sybase, and Access]
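JDBC itself is a Java API. As an analogous sketch only, Python's DB-API follows the same open/query/close shape, with the connection factory standing in for the driver. The function names open_demo_db and patient_names and the patient table are hypothetical; sqlite3 is used because it ships with Python.

```python
import sqlite3

def open_demo_db():
    """Stand-in 'driver': returns an open connection to an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO patient (name) VALUES (?)", [("Ann",), ("Bob",)])
    return conn

def patient_names(connect):
    """Application code: depends only on the common API, not on the DB platform."""
    conn = connect()                                   # open
    try:
        rows = conn.execute(                           # query
            "SELECT name FROM patient ORDER BY id"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()                                   # close

print(patient_names(open_demo_db))
```

Swapping in a different connection factory would change the database platform without touching patient_names, which is the role the slide assigns to drivers.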
SWEA104
CSE5095
Connecting a DB to the Web
Web Servers are Stateless
DB Interactions Tend to be Stateful
Invoking a CGI Script on Each DB Interaction is Very Expensive, Mainly Due to the Cost of DB Open
[Figure: Browser, Internet, Web Server, CGI Script Invocation or JDBC Invocation, DBMS]
SWEA105
CSE5095
Connecting More Efficiently
To Avoid Cost of Opening Database, One can Use Helper Processes that Always Keep Database Open and Outlive Web Connection
Newly Invoked CGI Scripts Connect to a Preexisting Helper Process
System is Still Stateless
[Figure: Browser, Internet, Web Server, CGI Script or JDBC Invocation, Helper Processes, DBMS]
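The helper idea can be sketched in a single process: a long-lived object pays the open cost once, and each short-lived request handler reuses it. DbHelper and handle_request are hypothetical names, and a real helper would be a separate process reached over a socket or pipe rather than a Python object.

```python
import sqlite3

class DbHelper:
    """In-process stand-in for a helper process that keeps the database open."""

    def __init__(self):
        self.opens = 0          # how many times the database was opened
        self._conn = None

    def _connection(self):
        if self._conn is None:  # pay the open cost only on first use
            self._conn = sqlite3.connect(":memory:")
            self.opens += 1
        return self._conn

    def query(self, sql, params=()):
        return self._connection().execute(sql, params).fetchall()

helper = DbHelper()

def handle_request(n):
    """Plays the role of a newly invoked CGI script; it keeps no state itself."""
    return helper.query("SELECT ? * 2", (n,))[0][0]

results = [handle_request(n) for n in (1, 2, 3)]
print(results, helper.opens)
```

Three requests are served with a single open, while each handler call remains stateless from the web server's point of view.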
SWEA106
CSE5095
DB-Internet Architecture
[Figure: WWW Clients (Netscape, HotJava, Info. Explore) connect over the Internet to an HTTP Server, which hands requests to a DBWeb Dispatcher and on to multiple DBWeb Gateways]
SWEA107
CSE5095
Biomedical Architectures
Transcend Normal Two, Three, and Four Tier Solutions – Macro-Architecture
An Architecture of Architectures!
  Need to Integrate Systems that are Themselves Multi-Tier and Distributed
  Need to Resolve Data Ownership Issues
    State of Connecticut Agencies Don’t Share
    Competing Hospitals Seek to Protect Market Share
  T1, T2, and Clinical Research Requires
    Interoperating Genomic Databases/Supercomputers
    Integration of De-identified Patient Data from Multiple Sources to Allow Sufficient Study Samples
    De-identified Data Repositories or Data Marts
    Dealing with Ownership Issues (DNA Research)
SWEA108
CSE5095
Consider Team Project Architecture
[Figure: Patients, Providers, and Clinical Researchers use a Web-Based Portal (XML + HL7) backed by an Open Source DB (XML or MySQL), together with EMR, PHR, Feedback Repository, and Education Materials]
SWEA109
CSE5095
Internet and the Web
A Major Opportunity for Business
  A Global Marketplace
    Business Across State and Country Boundaries
  A Way of Extending Services
    Online Payment vs. VISA, Mastercard
  A Medium for Creation of New Services
    Publishers, Travel Agents, Teller, Virtual Yellow Pages, Online Auctions …
A Boon for Academia
  Research Interactions and Collaborations
  Free Software for Classroom/Research Usage
  Opportunities for Exploration of Technologies in Student Projects
What are Implications for BMI? Where is the Adv?
SWEA110
CSE5095
WWW: Three Market Segments
Intranet
  Decision support
  Mfg. System monitoring
  Corporate repositories
  Workgroups
Internet
  Sales
  Marketing
  Information Services
Business to Business
  Information sharing
  Ordering info./status
  Targeted electronic commerce
[Figure: Servers on Corporate Networks and Provider Networks connected through the Internet; label: Exposure to Outside]
SWEA111
CSE5095
Information Delivery Problems on the Net
Everyone can Publish Information on the Web Independently at Any Time
  Consequently, there is an Information Explosion
  Identifying Information Content More Difficult
There are too Many Search Engines but too Few Capable of Returning High Quality Data
Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes
What are Information Delivery Issues for BMI?
  Publishing of Patient Education Materials
  Publishing of Provider Education Materials
  How Can Patients/Providers Find What They Need?
  How do They Know if it’s Relevant? Reputable?
SWEA112
CSE5095
Example Web Applications
Scenario 1: World Wide Wait
  A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web
  You Want to Monitor the Results for this Important Event, so you Fire up your Trusty Web Browser, Pointing at the Result Posting Site, and Wait, and Wait, and Wait …
What is the Problem?
  The Scalability Problems are the Result of a Mismatch Between the Data Access Characteristics of the Application and the Technology Used to Implement the Application
May not be Relevant to BMI: Hard to Apply Scenario
SWEA113
CSE5095
Example Web Applications
Scenario 2:
  Many Applications Today Need to Track Changes in Local and Remote Data Sources and Send Notifications If Some Condition Over the Data Source(s) is Met
  To Monitor Changes on the Web, You Need to Fire Up Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Manually Compare Against It Each Time You Poll the Data Source(s)
Issue: Pure Pull is Not the Answer to All Problems
BMI: If a Patient Enters Data that Sets off a Chain Reaction, How Can the Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)?
SWEA114
CSE5095
What is the Problem?
Applications are Asymmetric but the Web is Not
  Computation Centric vs. Information Flow Centric
Type of Asymmetry
  Network Asymmetry
    Satellite, CATV, Mobile Clients, Etc.
  Client to Server Ratio
    Too Many Clients can Swamp Servers
  Data Volume
    Mouse and Key Click vs. Content Delivery
  Update and Information Creation
    Clients Need to be Informed or Must Poll
Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient – No Auto-Notification
SWEA115
CSE5095
What are Information Delivery Styles?
Pull-Based System
  Transfer of Data from Server to Client is Initiated by a Client Pull
  Clients Determine when to Get Information
  Potential for Information to be Old Unless Client Periodically Pulls
Push-Based System
  Transfer of Data from Server to Client is Initiated by a Server Push
  Clients may get Overloaded if Push is Too Frequent
Hybrid
  Pull and Push Combined
  Pull First and then Push Continually
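The difference between the two styles can be seen in a toy server that supports both. This is a minimal sketch with hypothetical names (Server, get, register, update); it runs in one process and ignores networking entirely.

```python
class Server:
    def __init__(self):
        self.value = 0
        self._listeners = []

    def get(self):
        """Pull: the client decides when to ask."""
        return self.value

    def register(self, callback):
        """Push: the client asks to be called on every change."""
        self._listeners.append(callback)

    def update(self, value):
        self.value = value
        for notify in self._listeners:
            notify(value)

server = Server()
pulled, pushed = [], []
server.register(pushed.append)    # push-based client

server.update(1)
server.update(2)
pulled.append(server.get())       # pull-based client polls once, after two updates
server.update(3)

print(pulled)   # the pulling client saw only the value current at poll time
print(pushed)   # the pushed client received every update
```

The pulling client missed value 1 and is stale after value 3 until it polls again, while the pushed client receives all three updates and could be overloaded if updates were frequent.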
SWEA116
CSE5095
Publish/Subscribe
Semantics: Servers Publish/Clients Subscribe
  Servers Publish Information Online
  Clients Subscribe to the Information of Interest (Subscription-based Information Delivery)
  Data Flow is Initiated by the Data Sources (Servers) and is Aperiodic
  Danger: Subscriptions can Lead to Other Unwanted Subscriptions
Applications
  Unicast: Database Triggers and Active Databases
  1-to-n: Online News Groups
May work for Clinical Researcher to Provider Push
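A minimal in-memory sketch of publish/subscribe semantics follows. The Broker class, the topic names, and the messages are hypothetical; a real system would add persistence, authentication, and network transport.

```python
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subs = defaultdict(list)   # topic -> list of delivery callbacks

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        """Deliver to every subscriber of the topic; return how many received it."""
        for deliver in self._subs[topic]:
            deliver(message)
        return len(self._subs[topic])

broker = Broker()
provider_a, provider_b = [], []          # two provider inboxes

broker.subscribe("trial-results", provider_a.append)
broker.subscribe("trial-results", provider_b.append)
broker.subscribe("recalls", provider_b.append)

# A clinical researcher publishes; the data flow starts at the source.
delivered = broker.publish("trial-results", "new study posted")
broker.publish("recalls", "drug X recalled")

print(delivered, provider_a, provider_b)
```

Each provider receives only the topics it subscribed to, and nothing is delivered until a publisher has something to send.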
SWEA117
CSE5095
Design Options for Nodes
Three Types of Nodes:
  Data Sources
    Provide Base Data which is to be Disseminated
  Clients
    Who are the Net Consumers of the Information
  Information Brokers
    Acquire Information from Other Data Sources, Add Value to that Information and then Distribute this Information to Other Consumers
    By Creating a Hierarchy of Brokers, Information Delivery can be Tailored to the Need of Many Users
Brokers may be Ideal Intermediaries for BMI!
  Act on Behalf of Patients, Providers
  Incorporate Secure Access
SWEA118
CSE5095
Research Challenges
Ubiquitous/Pervasive: Many computers and information appliances everywhere, networked together
Inherent Complexity:
  Coping with Latency (Sometimes Unpredictable)
  Failure Detection and Recovery (Partial Failure)
  Concurrency, Load Balancing, Availability, Scale
  Service Partitioning
  Ordering of Distributed Events
“Accidental” Complexity:
  Heterogeneity: Beyond the Local Case: Platform, Protocol, Plus All Local Heterogeneity in Spades.
  Autonomy: Change and Evolve Autonomously
  Tool Deficiencies: Language Support (Sockets, rpc), Debugging, Etc.
SWEA119
CSE5095
Infosphere
[Figure: Problem: too many sources, too much information. The Internet as an Information Jungle (Digital Earth, Sensors) is turned into Clean, Reliable, Timely Information, Anywhere. Labels: Infopipes, Personalized Filtering & Info. Delivery, Continual Queries, Information Quality, Resource Adaptation, Property Mgmt, Microfeedback, specialization]
SWEA120
CSE5095
Current State-of-Art
[Figure: Thin Client, Web Server, Mainframe Database Server]
SWEA121
CSE5095
Infosphere Scenario – for BMI
[Figure: Infotaps & Fat Clients, Variety of Servers, Sensors, Database Server, Many sources]
SWEA122
CSE5095
Heterogeneity and Autonomy
Heterogeneity:
  How Much can we Really Integrate?
  Syntactic Integration
    Different Formats and Models
    Web/SQL Query Languages
  Semantic Interoperability
    Basic Research on Ontology, Etc.
Autonomy
  No Central DBA on the Net
  Independent Evolution of Schema and Content
  Interoperation is Voluntary
  Interface Technology (Support for ISVs)
    DCOM: Microsoft Standard
    CORBA, Etc.
SWEA123
CSE5095
Security and Data Quality
Security
  System Security in the Broad Sense
  Attacks: Penetrations, Denial of Service
  System (and Information) Survivability
    Security Fault Tolerance
    Replication for Performance, Availability, and Survivability
Data Quality
  Web Data Quality Problems
    Local Updates with Global Effects
    Unchecked Redundancy (Mutual Copying)
    Registration of Unchecked Information
    Spam on the Rise
SWEA124
CSE5095
Legacy Data Challenge
Legacy Applications and Data
  Definition: Important and Difficult to Replace
  Typically, Mainframe Mission Critical Code
  Most are OLTP and Database Applications
Evolution of Legacy Databases
  Client-server Architectures
  Wrappers
  Expensive and Gradual in Any Case
SWEA125
CSE5095
Potential Value Added/Jumping on Bandwagon
Sophisticated Query Capability
  Combining SQL with Keyword Queries
Consistent Updates
  Atomic Transactions and Beyond
But Everything has to be in a Database!
  Only If we Stick with Classic DB Assumptions
Relaxing DB Assumptions
  Interoperable Query Processing
  Extended Transaction Updates
Commodities DB Software
  A Little Help is Still Good If it is Cheap
  Internet Facilitates Software Distribution
  Databases as Middleware
SWEA126
CSE5095
Data Warehousing and Data Mining
Data Warehousing
  Provide Access to Data for Complex Analysis, Knowledge Discovery, and Decision Making
  Underlying Infrastructure in Support of Mining
  Provides Means to Interact with Multiple DBs
  OLAP (On-Line Analytical Processing) vs. OLTP
Data Mining
  Discovery of Information in Vast Data Sets
  Search for Patterns and Common Features
  Discover Information not Previously Known
    Medical Records Accessible Nationwide
    Research/Discover Cures for Rare Diseases
  Relies on Knowledge Discovery in DBs (KDD)
SWEA127
CSE5095
Data Warehousing and OLAP
A Data Warehouse
  Database is Maintained Separately from an Operational Database
  “A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of Management’s Decision Making Process” [W.H. Inmon]
OLAP (On-Line Analytical Processing)
  Analysis of Complex Data in the Warehouse
  Attempt to Attain “Value” through Analysis
  Relies on Trained and Skilled Knowledge Workers who Discover Information
Data Mart
  Organized Data for a Subset of an Organization
  Establish De-Identified Marts for BMI Research
SWEA128
CSE5095
Building a Data Warehouse
Option 1: Consolidate Data Marts
  Leverage Existing Repositories
  Collate and Collect
  May Not Capture All Relevant Data
Option 2: Build from Scratch
  Start from Scratch
  Utilize Underlying Corporate Data
[Figure: a Corporate data warehouse built either by consolidating Data Marts (Option 1) or directly from Corporate data (Option 2)]
SWEA129
CSE5095
BMI – Partition/Excerpt Data Warehouse
Clinical and Epidemiological Research (and for T2 and T1)
Each Study Submitted to Institutional Review Board (IRB)
  For Human Subjects (Assess Risks, Protect Privacy)
  See: http://resadm.uchc.edu/hspo/irb/
To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to Create a Data Mart for each Approved Study
  Export/Excerpt Study Data from Warehouse
  May be Single or Multiple Sources
[Figure: BMI data warehouse excerpted into multiple Data Marts]
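Excerpting a per-study data mart might look roughly like the sketch below. Everything here is an assumption for illustration: the excerpt function, the field names, the IDENTIFIERS set, and the salted-hash pseudonym. Dropping a few columns is not sufficient de-identification in practice; an actual study would follow its IRB-approved protocol.

```python
import hashlib

IDENTIFIERS = {"patient_id", "name", "city"}   # assumed direct identifiers

def excerpt(warehouse, study_filter, salt):
    """Copy matching rows without identifiers, adding a study-specific pseudonym."""
    mart = []
    for row in warehouse:
        if not study_filter(row):
            continue
        clean = {k: v for k, v in row.items() if k not in IDENTIFIERS}
        digest = hashlib.sha256((salt + str(row["patient_id"])).encode()).hexdigest()
        clean["subject"] = digest[:12]         # same patient, same token within a study
        mart.append(clean)
    return mart

# Made-up warehouse rows.
warehouse = [
    {"patient_id": 1, "name": "Ann", "city": "Storrs", "diagnosis": "asthma", "age": 34},
    {"patient_id": 2, "name": "Bob", "city": "Hartford", "diagnosis": "diabetes", "age": 51},
    {"patient_id": 1, "name": "Ann", "city": "Storrs", "diagnosis": "asthma", "age": 35},
]

mart = excerpt(warehouse, lambda r: r["diagnosis"] == "asthma", salt="study-42")
print(mart)
```

Using a different salt per study keeps the pseudonyms of one mart from being linked to those of another.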
SWEA130
CSE5095
Data Warehouse Characteristics
Utilizes a “Multi-Dimensional” Data Model
Warehouse Comprised of
  Store of Integrated Data from Multiple Sources
  Processed into Multi-Dimensional Model
Warehouse Supports
  Time Series and Trend Analysis
  “Super-Excel” Integrated with DB Technologies
Data is Less Volatile than Regular DB
  Doesn’t Dramatically Change Over Time
  Updates at Regular Intervals
  Specific Refresh Policy Regarding Some Data
SWEA131
CSE5095
Three Tier Architecture
[Figure: External data sources and Operational databases feed Extract, Transform, Load, Refresh through a monitor and integrator (with metadata) into the Data Warehouse and Data marts; these serve an OLAP Server, which supports Summarization reports, Query reports, and Data mining]
SWEA132
CSE5095
Data Warehouse Design
Most Data Warehouses use a Star Schema to Represent the Multi-Dimensional Data Model
Each Dimension is Represented by a Dimension Table that Provides its Multidimensional Coordinates and Stores Measures for those Coordinates
A Fact Table Connects All Dimension Tables with a Multiple Join
  Each Tuple in Fact Table Represents the Content of One Dimension
  Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimensional Tables
  Links Between the Fact Table and the Dimensional Tables Form a Shape Like a Star
SWEA133
CSE5095
What is a Multi-Dimensional Data Cube?
Representation of Information in Two or More Dimensions
Typical Two-Dimensional - Spreadsheet
In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful
For BMI – Axes for Diagnosis, Drug, Subject Age
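A three-axis cube can be sketched as a dictionary keyed by coordinates, with a roll-up that sums away the axes not of interest. The roll_up function and all counts below are made up for illustration; the axes follow the BMI example (diagnosis, drug, subject age group).

```python
from collections import defaultdict

def roll_up(cube, keep):
    """Sum the cube over every axis whose index is not listed in `keep`."""
    out = defaultdict(int)
    for coords, count in cube.items():
        out[tuple(coords[i] for i in keep)] += count
    return dict(out)

# Axes: (diagnosis, drug, age group); cell value = number of subjects (made up).
cube = {
    ("asthma", "albuterol", "18-40"): 12,
    ("asthma", "albuterol", "41-65"): 7,
    ("asthma", "montelukast", "18-40"): 5,
    ("diabetes", "metformin", "41-65"): 20,
}

by_diagnosis = roll_up(cube, keep=[0])          # one dimension
by_diag_age = roll_up(cube, keep=[0, 2])        # two dimensions, like a spreadsheet

print(by_diagnosis)
print(by_diag_age)
```

Keeping two axes reproduces the spreadsheet view; keeping all three returns the cube unchanged.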
SWEA134
CSE5095
Multi-Dimensional Schemas
Supporting Multi-Dimensional Schemas Requires Two Types of Tables:
  Dimension Table: Tuples of Attributes for Each Dimension
  Fact Table: Measured/Observed Variables with Pointers into Dimension Table
Star Schema
  Characterizes Data Cubes by having a Single Fact Table with a Single Table for Each Dimension
Snowflake Schema
  Dimension Tables from Star Schema are Organized into Hierarchy via Normalization
Both Represent Storage Structures for Cubes
SWEA135
CSE5095
Example of Star Schema
Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
Dimension Tables
  Date: Date, Month, Year
  Product: ProductNo, ProdName, ProdDesc, Category
  Store: StoreID, City, State, Country, Region
  Customer: CustID, CustName, CustCity, CustCountry
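A reduced version of this star schema can be tried out with SQLite from Python. The sketch keeps only the Product and Store dimensions and a few columns for brevity, and all rows are made up; table and column names follow the slide where possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (ProductNo INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
CREATE TABLE store   (StoreID INTEGER PRIMARY KEY, City TEXT, Region TEXT);
-- Fact table: one pointer per dimension plus the measures.
CREATE TABLE sale (
    ProductNo    INTEGER REFERENCES product(ProductNo),
    StoreID      INTEGER REFERENCES store(StoreID),
    Unit_Sales   INTEGER,
    Dollar_Sales REAL
);
INSERT INTO product VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Toys');
INSERT INTO store   VALUES (10, 'Storrs', 'East'), (20, 'Austin', 'South');
INSERT INTO sale    VALUES (1, 10, 3, 30.0), (1, 20, 2, 20.0), (2, 10, 1, 15.0);
""")

# The multiple join from the fact table out to each dimension table.
rows = conn.execute("""
    SELECT p.Category, s.Region, SUM(f.Unit_Sales), SUM(f.Dollar_Sales)
    FROM sale f
    JOIN product p ON p.ProductNo = f.ProductNo
    JOIN store   s ON s.StoreID   = f.StoreID
    GROUP BY p.Category, s.Region
    ORDER BY p.Category, s.Region
""").fetchall()
conn.close()

print(rows)
```

Adding the Date and Customer dimensions would mean two more tables and two more joins of the same form.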
SWEA136
CSE5095
Example of Star Schema for BMI
Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.
Dimension Tables
  Date: Date, Month, Year
  Vitals: BP, Temp, Resp, HR (Pulse)
  Symptoms: Pulmonary, Heart, Mus-Skel, Skin, Digestive
  Patient: PatientID, PatientName, PatientCity, PatientCountry
  Medications: Reference another Star Schema for all Meds
SWEA137
CSE5095
A Second Example of Star Schema …
SWEA138
CSE5095
… and Corresponding Snowflake Schema
SWEA139
CSE5095
Data Warehouse Issues
Data Acquisition
    Extraction from Heterogeneous Sources
    Reformatted into Warehouse Context - Names, Meanings, Data Domains Must be Consistent
    Data Cleaning for Validity and Quality: is the Data as Expected w.r.t. Content? Value?
    Transition of Data into Data Model of Warehouse
    Loading of Data into the Warehouse
Other Issues Include:
    How Current is the Data? Frequency of Update?
    Availability of Warehouse?
    Dependencies of Data?
    Distribution, Replication, and Partitioning Needs?
    Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)?
For CTSA – Data Ownership (Competing Hospitals).
SWEA140
CSE5095
Knowledge Discovery
Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully
Knowledge Discovery
    Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set
    Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from Large Collection of Data
Data Mining
    A Critical Step in the Knowledge Discovery Process
    Extracts Implicit Information from Large Data Set
SWEA141
CSE5095
Steps in a KDD Process
Learning the Application Domain (goals)
Gathering and Integrating Data
Data Cleaning
Data Integration
Data Transformation/Consolidation
Data Mining
    Choosing the Mining Method(s) and Algorithm(s)
    Mining: Search for Patterns or Rules of Interest
Analysis and Evaluation of the Mining Results
Use of Discovered Knowledge in Decision Making
Important Caveats
    This is Not an Automated Process!
    Requires Significant Human Interaction!
SWEA142
CSE5095
OLAP Strategies
OLAP Strategies
    Roll-Up: Summarization of Data
    Drill-Down: from the General to Specific (Details)
    Pivot: Cross Tabulate the Data Cubes
    Slice and Dice: Projection Operations Across Dimensions
    Sorting: Ordering Result Sets
    Selection: Access by Value or Value Range
Implementation Issues
    Persistent with Infrequent Updates (Loading)
    Optimization for Performance on Queries is More Complex - Across Multi-Dimensional Cubes
    Recovery Less Critical - Mostly Read Only
    Temporal Aspects of Data (Versions) Important
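A few of these strategies can be sketched over a tiny cube stored as a Python dictionary. The sales figures, the dimension names, and the helper functions below are hypothetical, chosen only to show the shape of each operation; a real OLAP engine would work quite differently internally.

```python
from collections import defaultdict

# Hypothetical fact cells: (product, city, month) -> sales amount.
CUBE = {
    ("Electronics", "Atlanta", "Jan"): 100,
    ("Electronics", "Atlanta", "Feb"): 120,
    ("Electronics", "Savannah", "Jan"): 80,
    ("Clothing", "Atlanta", "Jan"): 50,
    ("Clothing", "Savannah", "Feb"): 70,
}
DIMS = ("product", "city", "month")

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: constrain several dimensions to sets of allowed values."""
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}

def roll_up(cube, keep_dims):
    """Roll-up: summarize by summing away every dimension not in keep_dims."""
    idx = [DIMS.index(d) for d in keep_dims]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)
```

In this sketch, drill-down is simply the reverse direction: calling roll_up with more dimensions kept, for example ["city", "month"] instead of ["city"].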
SWEA143
CSE5095
On-Line Analytical Processing
Data Cube
    A Multidimensional Array
    Each Attribute is a Dimension
In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date

Product      Store     Date     Sale
acron        Rolla,MO  7/3/99   325.24
budwiser     LA,CA     5/22/99  833.92
large pants  NY,NY     2/12/99  771.24
3’ diaper    Cuba,MO   7/30/99  81.99

[Figure: data cube with axes Product (Pants, Diapers, Beer, Nuts), Region (West, East, Central, Mountain, South), and Date (Jan, Feb, March, April)]
SWEA144
CSE5095
On-Line Analytical Processing
For BMI – Imagine a Data Table with Patient Data
    Define Axis
    Summarize Data
    Create Perspective to Match Research Goal
    Essentially De-identified Data Mart

Patient  Med      BirthDat  Dosage
Steve    Lipitor  1/1/45    10mg
John     Zocor    2/2/55    80mg
Harry    Crestor  3/3/65    5mg
Lois     Lipitor  4/4/66    20mg
Charles  Crestor  7/1/59    10mg

[Figure: data cube with axes Medication (Lescol, Crestor, Zocor, Lipitor), Dosage (5, 10, 20, 40, 80), and Decade (1940s, 1950s, 1960s, 1970s)]
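As a rough sketch of "define axis, summarize data", the five patient rows from this slide can be folded into cube cells keyed by medication, dosage, and birth decade. The function names are made up for the example, and the code assumes every two-digit birth year belongs to the 1900s.

```python
from collections import Counter

# Rows from the slide's patient table: (patient, medication, birth date, dosage).
ROWS = [
    ("Steve", "Lipitor", "1/1/45", "10mg"),
    ("John", "Zocor", "2/2/55", "80mg"),
    ("Harry", "Crestor", "3/3/65", "5mg"),
    ("Lois", "Lipitor", "4/4/66", "20mg"),
    ("Charles", "Crestor", "7/1/59", "10mg"),
]

def decade_of(birth_date):
    """Map an m/d/yy date to a decade label, assuming all years are 19yy."""
    yy = int(birth_date.split("/")[2])
    return f"19{yy // 10}0s"

def build_cube(rows):
    """Count patients per (medication, dosage in mg, decade) cell.

    Patient names are dropped, so the cube holds only summarized values.
    """
    cube = Counter()
    for _patient, med, birth, dosage in rows:
        cube[(med, int(dosage.rstrip("mg")), decade_of(birth))] += 1
    return cube

def patients_per_decade(cube):
    """Summarize the cube along the decade axis only."""
    totals = Counter()
    for (_med, _dose, decade), n in cube.items():
        totals[decade] += n
    return dict(totals)
```

Dropping the name column is only a first step toward de-identification; exact birth dates have already been coarsened to decades here, which is the kind of generalization a de-identified data mart relies on.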
SWEA145
CSE5095
Examples of Data Mining
The Slicing Action
    A Vertical or Horizontal Slice Across Entire Cube
[Figure: Multi-Dimensional Data Cube of Sales with axes Months, Cities, Products; second cube shows Slice on city Atlanta]
SWEA146
CSE5095
Examples of Data Mining
The Dicing Action
    A Slice First Identifies One Dimension
    A Selection of Any Cube within the Slice, which Essentially Constrains All Three Dimensions
[Figure: Sales cube with axes Months, Cities, Products; Atlanta slice with axes Months and Products; Dice on Electronics and Atlanta (March 2000)]
SWEA147
CSE5095
Examples of Data Mining
Drill Down - Takes a Facet (e.g., Q1) and Decomposes into Finer Detail
Roll Up: Combines Multiple Dimensions From Individual Cities to State
[Figure: Sales cube with axes Products, Location (city, GA: Atlanta, Columbus, Gainesville, Savannah), and quarters Q1 Q2 Q3 Q4; Drill down on Q1 gives Jan, Feb, March; Roll Up on Location (State, USA) gives Arizona, California, Georgia, Iowa]
SWEA148
CSE5095
Mining Other Types of Data
Analysis and Access Dramatically More Complicated!
    Time series data
    Geographical and Satellite Data
    Spatial databases
    Multimedia databases
    World Wide Web
Time Series Data for Glucose, BP, Peak Flow, etc.
SWEA149
CSE5095
Advantages/Objectives of Data Mining
Descriptive Mining
    Discover and Describe General Properties
    60% of People who Buy Beer on Friday also have Bought Nuts or Chips in the Past Three Months
Predictive Mining
    Infer Interesting Properties based on Available Data
    People who Buy Beer on Friday usually also Buy Nuts or Chips
Result of Mining
    Order from Chaos
    Mining Large Data Sets in Multiple Dimensions Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc.
    Impact on Marketing Strategy
SWEA150
CSE5095
Data Mining Methods (1)
Association
    Discover the Frequency of Items Occurring Together in a Transaction or an Event
    Example
        80% of Customers who Buy Milk also Buy Bread
            Hence - Bread and Milk Adjacent in Supermarket
        50% of Customers Forget to Buy Milk/Soda/Drinks
            Hence - Available at Register
Prediction
    Predicts Some Unknown or Missing Information based on Available Data
    Example
        Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters
SWEA151
CSE5095
Association Rules
Motivated by Market Analysis
Rules of the Form
    Item1 ^ Item2 ^ … ^ Itemk → Itemk+1 ^ … ^ Itemn
Example
    “Beer ^ Soft Drink → Pop Corn”
Problem: Discovering All Interesting Association Rules in a Large Database is Difficult!
    Issues
        Interestingness
        Completeness
        Efficiency
    Basic Measurement for Association Rules
        Support of the Rule
        Confidence of the Rule
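The two basic measurements can be sketched in a few lines. This uses the usual textbook definitions (support as the fraction of transactions containing all items of the rule, confidence as the fraction of left-hand-side transactions that also contain the right-hand side); the market baskets are hypothetical data invented for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    both = support(transactions, set(lhs) | set(rhs))
    return both / support(transactions, lhs)

# Hypothetical market baskets, each a set of purchased items.
BASKETS = [
    {"beer", "soft drink", "pop corn"},
    {"beer", "soft drink", "pop corn", "chips"},
    {"beer", "soft drink"},
    {"milk", "bread"},
    {"beer", "pop corn"},
]

# Rule under test: beer ^ soft drink -> pop corn
RULE_SUPPORT = support(BASKETS, {"beer", "soft drink", "pop corn"})
RULE_CONFIDENCE = confidence(BASKETS, {"beer", "soft drink"}, {"pop corn"})
```

With these baskets the rule appears in two of five transactions, and two of the three transactions with beer and a soft drink also contain pop corn. Finding all interesting rules is hard because the number of candidate item sets grows exponentially with the number of items.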
SWEA152
CSE5095
Data Mining Methods (2)
Classification
    Determine the Class or Category of an Object based on its Properties
    Example
        Classify Companies based on the Final Sale Results in the Past Quarter
Clustering
    Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity
    Example
        Group Crime Locations to Find Distribution Patterns
SWEA153
CSE5095
Classification
Two Stages
    Learning Stage: Construction of a Classification Function or Model
    Classification Stage: Prediction of Classes of Objects Using the Function or Model
Tools for Classification
    Decision Tree
    Bayesian Network
    Neural Network
    Regression
Problem
    Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects
SWEA154
CSE5095
An Example
Attributes
    Class Attribute - Play/Don’t Play the Game
Training Set
    Values that Set the Condition for the Classification
    What are the Patterns Below?

Attribute    Possible Values
outlook      sunny, overcast, rain
temperature  continuous
humidity     continuous
windy        true, false

Outlook   Temperature  Humidity  Windy  Play
sunny     85           85        false  No
overcast  83           78        false  Yes
sunny     80           90        true   No
sunny     72           95        false  No
sunny     72           70        false  Yes
…         …            …         …      ...
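One pattern in the five visible training rows can be found with a one-level decision rule (a "decision stump"). The sketch below is an assumption-laden illustration, not the method the slide intends: it only searches for a single numeric cut, it assumes low values mean "Yes", and it sees only the five rows shown, not the elided remainder of the training set.

```python
# The five training rows visible on the slide.
TRAINING = [
    {"outlook": "sunny", "temperature": 85, "humidity": 85, "windy": False, "play": "No"},
    {"outlook": "overcast", "temperature": 83, "humidity": 78, "windy": False, "play": "Yes"},
    {"outlook": "sunny", "temperature": 80, "humidity": 90, "windy": True, "play": "No"},
    {"outlook": "sunny", "temperature": 72, "humidity": 95, "windy": False, "play": "No"},
    {"outlook": "sunny", "temperature": 72, "humidity": 70, "windy": False, "play": "Yes"},
]

def best_threshold(rows, attr, label="play"):
    """Learning stage: pick the cut on a numeric attribute with fewest training errors.

    Assumed rule shape: predict "Yes" when value <= cut, otherwise "No".
    Returns (cut, number_of_training_errors).
    """
    values = sorted({r[attr] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2  # candidate cut halfway between neighbouring values
        errors = sum((r[attr] <= cut) != (r[label] == "Yes") for r in rows)
        if best is None or errors < best[1]:
            best = (cut, errors)
    return best

def classify(row, attr, cut):
    """Classification stage: apply the learned rule to a new object."""
    return "Yes" if row[attr] <= cut else "No"
```

On these five rows a cut on humidity separates the classes without error, while a cut on temperature does not. A full decision tree would apply this kind of search recursively over all attributes.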
SWEA155
CSE5095
Data Mining Methods (3)
Summarization
    Characterization (Summarization) of General Features of Objects in the Target Class
    Example
        Characterize People’s Buying Patterns on the Weekend
        Potential Impact on “Sale Items” & “When Sales Start”
        Department Stores with Bonus Coupons
Discrimination
    Comparison of General Features of Objects Between a Target Class and a Contrasting Class
    Example
        Comparing Students in Engineering and in Art
        Attempt to Arrive at Commonalities/Differences
SWEA156
CSE5095
Summarization Technique
Attribute-Oriented Induction
Generalization using Concept Hierarchy (Taxonomy)

barcode  category    brand       content    size
14998    milk        dairyland   Skim       2L
12998    mechanical  MotorCraft  valve 23a  12in
…        …           …           …          ...

[Figure: concept hierarchy with food at the top; Milk … bread below it; then Skim milk … 2% milk and White bread … whole wheat; then brands Lucern … Dairyland and Wonder … Safeway]

Category  Content  Count
milk      skim     280
milk      2%       98
…         …        ...
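Attribute-oriented induction can be sketched as "replace each value by its parent in the concept hierarchy, then merge identical tuples and count them". The hierarchy below loosely follows the food taxonomy on the slide, but the dictionary layout, the function names, and all item rows other than barcode 14998 are assumptions for illustration.

```python
from collections import Counter

# Assumed concept hierarchy: each value maps to its parent concept.
PARENT = {
    "skim": "milk", "2%": "milk",
    "white": "bread", "whole wheat": "bread",
    "milk": "food", "bread": "food",
}

def generalize(value, levels=1):
    """Climb `levels` steps up the concept hierarchy, stopping at the root."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value

def induce(rows, attribute, levels=1):
    """Generalize one attribute, then merge identical values and count them."""
    counts = Counter()
    for row in rows:
        counts[generalize(row[attribute], levels)] += 1
    return dict(counts)

# Hypothetical item rows (only barcode 14998 appears in the source table).
ITEMS = [
    {"barcode": 14998, "content": "skim"},
    {"barcode": 14999, "content": "skim"},
    {"barcode": 15001, "content": "2%"},
    {"barcode": 16010, "content": "white"},
]
```

Each extra level trades detail for a smaller summary table: zero levels gives per-content counts, one level gives per-category counts, and two levels collapses everything to food.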
SWEA157
CSE5095
Why is Data Mining Popular?
Technology Push
    Technology for Collecting Large Quantity of Data
        Bar Code, Scanners, Satellites, Cameras
    Technology for Storing Large Collection of Data
        Databases, Data Warehouses
        Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web
Corporations want to Improve Direct Marketing and Promotions - Driving Technology Advances
    Targeted Marketing by Age, Region, Income, etc.
    Exploiting User Preferences/Customized Shopping
What is Potential for BMI?
    How do you see Data Mining Utilized?
    What are Key Issues to Worry About?
SWEA158
CSE5095
Requirements & Challenges in Data Mining
Security and Social
    What Information is Available to Mine?
    Preferences via Store Cards/Web Purchases
    What is Your Comfort Level with Trends?
User Interfaces and Visualization
    What Tools Must be Provided for End Users of Data Mining Systems?
    How are Results for Multi-Dimensional Data Displayed?
Performance Guarantees
    Range from Real-Time for Some Queries to Long-Term for Other Queries
Data Sources of Complex Data Types or Unstructured Data - Ability to Format, Clean, and Load Data Sets
SWEA159
CSE5095
Robert H. Aseltine, Jr., Ph.D.
Cal Collins
January 16, 2008
An Initiative of the University of Connecticut Center for Public Health and Health Policy
SWEA160
CSE5095
What is CHIN?
State of Connecticut Agencies Collect and Maintain Data in Separate Databases such as:
    Vital Statistics: Birth, Death (DPH)
    Surveillance data: Lead Screening and Immunization Registries (DPH)
    Administrative services: LINK system (DCF), CAMRIS (DMR)
    Benefit programs: WIC (DPH), Medicaid (DSS)
    Educational achievement: (PSIS)
Such Data is Un-Integrated
    Impossible to Track/Assess Target Populations
    Difficult to Develop Evidence-Based Practices
    Limits Meaningful Interactions Among State Agencies
SWEA161
CSE5095
What Do We Mean by “Integration?”

UCONN Health Center Low Birth Weight Infant Registry

Last Name  First Name  DOB         SSN           Birth Wt. (kg)
Appel      April       01/01/1999  016-000-9876  2.8
Berry      John        02/02/1997  216-000-4576  2.9
Carat      Colleen     03/03/1993  119-000-1234  1.9
Ernst      Max         04/04/1994  116-000-3456  2.7
Gomez      Gloria      05/05/1995  036-000-9999  2.6
Hurst      William     06/06/1996  016-000-5599  3.1
Keller     Helene      07/07/1997  017-000-2340  2.5
Martinez   Pedro       08/08/1998  018-000-9886  3.0
Rodriguez  Felix       09/09/1999  029-000-9111  2.8
Smith      Peggy       10/10/2000  016-000-8787  2.5

Dept. of Mental Retardation Birth to Three System (town names are cut off in the source)

Last Name  First Name  DOB         Street   Town
Allen      Gwen        01/01/1999  Apple    Enfie…
Buck       Jerome      07/01/1999  Burbank  West…
Cleary     Jane        03/03/1993  Cedar    Tolla…
Dory       Daniel      03/03/1993  Dogfish  Hartf…
Ernst      Max         04/04/1994  Elm      Enfie…
Friday     Joe         11/03/1999  Fruit    Wind…
Glenn      Valerie     03/23/1998  Glen     Branf…
Martinez   Pedro       08/08/1998  High     Hartf…
Riley      Lily        03/03/1996  Ipswich  Bridg…
Sanchez    Ramon       03/03/1993  Juniper  New…

CT Dept. of Education PSIS System

Last Name  First Name  CMT Math  Polio Vac Date  Days in Attendance
Appel      April       134       01/05/1999      179
Carat      Colleen     256       05/01/1998      122
Cleary     Jane        268       01/28/2000      178
Ernst      Max         152       01/09/1999      145
Gomez      Gloria      289       01/01/1999      168
Friday     Joe         265       10/01/1999      170
Keller     Helene      309       11/01/2001      180
Martinez   Pedro       248       12/01/2003      180
Riley      Lily        201       01/01/1999      122
Sanchez    Ramon       249       01/01/1999      159

Integrated Result

Last Name  First Name  DOB         SSN           Birth Wt.  Street  Town      CMT Math Grade 3  Polio Vaccination Date  Days in Attendance
Ernst      Max         04/04/1994  116-000-3456  2.7        Elm     Enfield   152               01/09/1999              145
Martinez   Pedro       08/08/1998  018-000-9886  3.0        High    Hartford  248               12/01/2003              180
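The integration shown on this slide amounts to keeping only the individuals who appear in all three source systems and merging their fields. A minimal sketch follows, using the names from the three tables and one field from each. Matching on exact last and first name is an assumption made to keep the example short; it is not presented as how CHIN performs matching.

```python
# Keys are (last name, first name); one field is copied from each source table.
REGISTRY = {  # Low Birth Weight Infant Registry: birth weight in kg
    ("Appel", "April"): 2.8, ("Berry", "John"): 2.9, ("Carat", "Colleen"): 1.9,
    ("Ernst", "Max"): 2.7, ("Gomez", "Gloria"): 2.6, ("Hurst", "William"): 3.1,
    ("Keller", "Helene"): 2.5, ("Martinez", "Pedro"): 3.0,
    ("Rodriguez", "Felix"): 2.8, ("Smith", "Peggy"): 2.5,
}
BIRTH_TO_THREE = {  # Birth to Three System: street
    ("Allen", "Gwen"): "Apple", ("Buck", "Jerome"): "Burbank",
    ("Cleary", "Jane"): "Cedar", ("Dory", "Daniel"): "Dogfish",
    ("Ernst", "Max"): "Elm", ("Friday", "Joe"): "Fruit",
    ("Glenn", "Valerie"): "Glen", ("Martinez", "Pedro"): "High",
    ("Riley", "Lily"): "Ipswich", ("Sanchez", "Ramon"): "Juniper",
}
PSIS = {  # PSIS System: CMT Math score
    ("Appel", "April"): 134, ("Carat", "Colleen"): 256, ("Cleary", "Jane"): 268,
    ("Ernst", "Max"): 152, ("Gomez", "Gloria"): 289, ("Friday", "Joe"): 265,
    ("Keller", "Helene"): 309, ("Martinez", "Pedro"): 248,
    ("Riley", "Lily"): 201, ("Sanchez", "Ramon"): 249,
}

def integrate(*sources):
    """Return one merged record per key that appears in every source."""
    shared = set(sources[0])
    for src in sources[1:]:
        shared &= set(src)
    return {key: tuple(src[key] for src in sources) for key in sorted(shared)}

INTEGRATED = integrate(REGISTRY, BIRTH_TO_THREE, PSIS)
```

Only Ernst and Martinez survive the three-way intersection, matching the integrated table on the slide. Exact name matching would miss misspellings and would wrongly merge two different people who share a name, which is why unique identification is treated as a challenge in its own right.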
SWEA162
CSE5095
Key Challenges to Integrating Data
Security and Privacy
    HIPAA
    FERPA
    WIC, Social Security (Medicaid/Medicare) regulations
    State statutes
Alteration/disruption of business practices
Unique identification of individuals/cases
Accuracy and reliability of data
Disparate hardware/software platforms
SWEA164
CSE5095
The Solution: CHIN
Connecticut Health Information Network
A Federated Network That:
    Allows Shared Access to “Health”-related Data From Heterogeneous Databases
    Allows Agencies to Retain Complete Control Over Access to Data
    Has Minimal Impact on Business Practices
    Complies with Security and Privacy Statutes
    Incorporates Cutting-edge Approaches to Case Matching
Partnership of:
    Early Partners: DPH, DCF, DDS, DoE, DOIT, UConn, Akaza Research
SWEA165
CSE5095
CHIN Processes and Components
[Figure: process steps, each paired with the responsible component]
    Define data elements in CHIN: CHIN Metadata Registry
    Map data elements to source database: CHIN Contributor
    Publish “metadata” to CHIN with security and privacy rules: CHIN Metadata Registry and CHIN Trusted Broker
    Build Query: CHIN Metadata Registry and CHIN Query Builder
    Review Committee Approval: CHIN Enterprise Administration
    Query Execution (Identifier Matching and Data Merge): CHIN GRID and Trusted Broker
    De-identify Data: CHIN Trusted Broker and De-Identification Engine
    Result: Integrated, De-identified Data
SWEA166
CSE5095
Original CHIN Architecture
http://publichealth.uconn.edu/CHIN.php
SWEA167
CSE5095
Second CHIN Architecture: User Side
[Figure labels: Contributor, Contributor, A&A]
SWEA168
CSE5095
Second CHIN Architecture: Contributor Side
[Figure labels: Front End, Trusted Broker, A&A]
SWEA169
CSE5095
Current CHIN Architecture
SWEA170
CSE5095
CHIN Architecture: Standards-based
All data is mapped to Health Level Seven’s Clinical Document Architecture (CDA) in XML
    Health Level Seven (HL7) is an ANSI-approved Standards Developing Organization
    HL7 has its own XML Special Interest Group, responsible for developing XML implementations of its standards
    HL7 is also an active participant in W3C, the organization responsible for the development of XML
    CDA was approved as an ANSI standard in November of 2000.
Component Architecture communicates via Web Services and OGSA Grid standards
SWEA171
CSE5095
CHIN Arch.: Proven, Open Components
Components are based on open-source libraries
    The grid-based servers Mako and Virtual Mako are part of the Mobius Project from Ohio State University’s Dept. of BioInformatics
    The translation tools to get data into XML are provided by the XQuare and XBridge projects, hosted on the ObjectWeb website, an open source middleware community
    The algorithm and code for identity management is FEBRL, Freely Extensible Biomedical Record Linkage, which was developed at Australian National University
    NuSOAP Web Services Engine for component integration
SWEA172
CSE5095
FEBRL
Identifier matching in FEBRL proceeds in four steps:
Data cleansing and standardization
    Removes, to the degree possible, string discrepancies based on common misspellings, extra white space, or misplaced name or address components.
Indexing
    Reduces the number of record comparisons which must be performed, for scalability; blocking, sorting, and bigram indexing methods are all supported.
Record comparison
    Conducted using an arbitrary composition of exact or inexact string comparison methods over any combination of fields
Classification
    Follows the Fellegi-Sunter [34] model, with record pairs assigned a weight based on a palette of probabilities and matches determined based on the record pair weights
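The four steps can be sketched in plain Python. This is not FEBRL's API and does not use the FEBRL library: every function name here is hypothetical, blocking is done on the first letter of the last name, comparison uses a Dice coefficient over character bigrams, and the classification step uses a simple mean similarity with a fixed threshold as a stand-in for Fellegi-Sunter probability weights.

```python
from itertools import combinations

def standardize(name):
    """Step 1 (cleansing): lowercase and collapse extra white space."""
    return " ".join(name.lower().split())

def bigrams(s):
    """Set of adjacent character pairs in a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bigram_similarity(a, b):
    """Step 3 (comparison): Dice coefficient over character bigrams."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def link(records, threshold=0.8):
    """Return index pairs of (last, first) records judged to be the same person."""
    cleaned = [(standardize(last), standardize(first)) for last, first in records]
    # Step 2 (indexing): block on the first letter of the last name so that
    # only records within the same block are compared.
    blocks = {}
    for i, (last, _first) in enumerate(cleaned):
        blocks.setdefault(last[:1], []).append(i)
    matches = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            # Step 4 (classification): mean field similarity as a stand-in weight.
            score = (bigram_similarity(cleaned[i][0], cleaned[j][0]) +
                     bigram_similarity(cleaned[i][1], cleaned[j][1])) / 2
            if score >= threshold:
                matches.append((i, j))
    return matches

# Hypothetical test records with white-space and spelling discrepancies.
SAMPLE = [
    ("Martinez", "Pedro"),
    ("  martinez ", "Pedro"),
    ("Martines", "Pedro"),
    ("Ernst", "Max"),
    ("Marten", "Paul"),
]
```

Blocking trades recall for speed: a typo in the blocking key itself (here the first letter) would hide a true match, which is one reason multiple indexing methods are usually combined.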
SWEA173
CSE5095
FEBRL
The current prototype uses FEBRL to implement a simplistic method of linkage whereby record pairs are declared a match if the first and last name are exactly equal.
Next Steps
    Evaluate the accuracy of linking records over a rubric of five data fields - first name, last name, date of birth, social security number, and gender.
    Exact and inexact matching (i.e., misspellings and slight discrepancies), including experimental variations of the service based on the blinded bigram matching algorithm.
    Assess false positives and false negatives produced by each palette of field comparison algorithms.
    Evaluate the accuracy of linking records using fabricated data sets with characteristics similar to real datasets
    Experiment with variations of canopy cluster matching algorithm.
SWEA174
CSE5095
Other CHIN Issues
Why Choose an Open Architecture?
    Increased Accountability
    Plenty of Documentation and Research
    Greater Transparency
    Ease of Installation, Maintenance, Dissemination
How is Data Ported into CHIN?
    CHIN is based on a Grid, with each organization supporting its own data through a Contributor server
    Agency staff has complete control over access to data on CHIN by other users
    Only one server faces the outside network
SWEA175
CSE5095
Creating a Contributor Server
[Figure: Datasource feeding a Contributor Server behind a Firewall, with an SSL connection from its External IP Address to the CHIN Trusted Broker]
    Contributor Server Contains:
        XML generated files
        Mako service
        Java files (Generate XML)
        *.xqy files (XML files to generate CDA compliant files)
    Data Elements Published to MDR
SWEA176
CSE5095
Connecting to rest of Network
    Metadata Registry takes information
        About data elements
        About data security
        Datasource information
    Contributor profile is registered with CHIN Network Admin
[Figure: Datasource feeding a Contributor Server behind a Firewall, with an SSL connection from its External IP Address to the CHIN Trusted Broker, giving Access to CHIN]
    Contributor Server Contains:
        XML generated files
        Mako service
        Java files (Generate XML)
        *.xqy files (XML files to generate CDA compliant files)
    Data Elements Published to MDR
SWEA177
CSE5095
How do we get data out?
The Trusted Broker component:
    Pulls XML from the Virtual Mako which reaches out to all Contributors
    Compares records from different Contributors using FEBRL
    De-identifies data sets to generate a final data set for Investigators
The Front End component:
    Provides a central place for users to connect to the system
    Connects to the Metadata Registry and the Trusted Broker via Web Services calls
    Allows different users of the system to perform different actions
SWEA178
CSE5095
Getting Data from CHIN
SWEA179
CSE5095
Getting Data From CHIN
    CHIN also contains:
        A Front-end server to take queries
        A Trusted Broker to compare data, perform record linkage, and de-identify results
[Figure: XML Files → FEBRL → Result Set → Deidentify → Final Result Set]
SWEA180
CSE5095
Progress to Date
Needs assessment completed
Technical and functional specifications identified
MOUs with state agencies
Expanding list of partners
Prototype developed
Funding for Model Network Development/Deployment/Evaluation 2008
SWEA181
CSE5095
Demo
SWEA182
CSE5095
EMR Architectures
Provider-Based Systems have Two Variants
    All Data In House
        Larger Providers (Clinics) Control All Own Data
        Sizeable IT Staff for 24-7 Operations
        Control of Own Backups
    Limited In House – Off Site Storage (Larger, Multi-Site Practices)
        Smaller Providers – Limited IT Staff
        Desire Out-of-Box Solution
        Local Data for Ease of Access
        Remote Storage – Promotes Off-Hours Access
    Even 1st Variant – Service for “Backups”
SWEA183
CSE5095
EMR for Large Providers - AllScript
SWEA184
CSE5095
EMR for Smaller Providers
[Figure labels: Provider’s Office, Local EMR, Remote Access, Vendor’s Location Server/Data Farm, Local EMR, Remote EMR, Patient Data]
SWEA185
CSE5095
Integrating Clinical Repositories
Provider/Hospital Relationship
    Provider has Privileges at Hospital
    Provider Chooses Office-Based EMR
    More Easily Integrated with Hospital EMR
    Emerging at Community Hospital Level
Example:
    Milford Hospital, MA
    All Area Providers with Privileges Linked in
    Ability to See Patient Records, Tests, at Hospital
    Unclear on Uploads from Providers to Hospital
    However, No Link to UMass Medical Center (with which Milford Hospital is Affiliated)
SWEA186
CSE5095
Integrating Clinical Repositories
CTSA – Region Wide Clinical/Translational Research
Target Area Hospitals
    St. Francis, Hartford, Hosp. Central CT, CCMC
    Each Hospital has Own Clinical Repository (EMR)
For Wider-Scoped T1, T2, and Clinical Research
    Need to Integrate these Repositories at Some Level
    What is Most Practical?
        Setting up Centralized De-Identified Repository?
        Creating Data Marts as you go?
        What are Pros and Cons of Each?
    Researcher Seeking CHF Patient Data Needs to have De-Identified Data Mart
SWEA187
CSE5095
Integrating Clinical Repositories
SWEA188
CSE5095
Integrating Clinical Repositories
SWEA189
CSE5095
Integrating Clinical Repositories
SWEA190
CSE5095
Integrating Clinical Repositories
[Figure: NHIN Prototype Phase I]
SWEA191
CSE5095
Integrating Clinical Repositories
[Figure: NHIN Prototype Phase II]
SWEA192
CSE5095
SWEA193
CSE5095
Personal Health Record Integration
SWEA194
CSE5095
Concluding Remarks
Only Scratched Surface on Architectures
    Micro Architectures
    Macro Architectures
    Super-Macro Architectures (We’ll see …)
What are Key Facets in the Discussion?
    Role and Impact of Standards
    Open Solutions
    Architectural Variants – Reuse “Architecture”
        Can we Reuse CHIN for Clinical Practice?
        Are All Contributors Simply Each Hospital and EHR?
        How do we Connect all of the Pieces?
What are Next Steps?
    Let’s Review Some other Work
    Source: Wide Range of Presentations on Web