University of Illinois at Urbana-Champaign National Center for Supercomputing Applications
Towards Truly Ubiquitous Cyberinfrastructure
LAGrid ’07
Associate Director for Cyberenvironments and Technologies, National Center for Supercomputing Applications (NCSA),
University of Illinois at Urbana-Champaign
National Center for Supercomputing Applications
• Cyber-resources
• Innovative Systems
• Communities and Applications
• Cyberenvironments
Outline
• What’s Changing in Science?
• What Role should Cyberinfrastructure (CI) play?
• What Do Ubiquitous (and Persistent) mean for CI Development?
• Designing for Ubiquity
• Some Examples
• Conclusions
How is Science Changing?
• Quantitative Modeling and Simulation
• Better Data (e.g. Higher Signal to Noise)
• More Data (e.g. High Throughput)
– Closer ties between research and application
– Investigation of subtle, non-linear, multi-dimensional phenomena
– Statistical analysis of complex systems
The Research Process
It’s just the Scientific Method…
The Research Process
Fg~m
Conceptual
Logical
Physical
Assumptions, Reference Data, Controls, …
Reduction, Statistics, Analysis of Alternatives, …
With Experimental Design…
The Research Process
Scientific
Instrument Method
Fg~m
High-speed camera
And Multiple, Coupled Objectives…
The Research Process
Collaboration, Reference Data Curation, Model Validation, Sub-discipline Creation, Best-practice Dissemination, Application, Education, …
Scientific
Instrument Method
And Community Processes …
The Research Process
Non-linear, high-dimensional, coupled, multi-scale phenomena
Scientific
Instrument Method
And It’s No Longer Fg~m …
‘Amdahl’s Law’ for Scientific Progress:
Data discovery, Translation, Experiment setup, Group coordination, Tool integration, Training
Feature Extraction, Data interpretation, Acceptance of new models/tools, Dissemination of best practices, Interdisciplinary communication
Data production, Processing power, Data transfer/storage!
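For reference, the analogy on this slide can be made concrete with the standard form of Amdahl's Law. Here p is the fraction of total effort that benefits from a speedup, s is the speedup of that fraction, and S is the resulting overall speedup. The numbers in the example are purely hypothetical and are not taken from the talk: if only 20% of the research lifecycle is spent on steps that become 10 times faster, overall progress improves by only about 22%.

```latex
S = \frac{1}{(1 - p) + \dfrac{p}{s}}
\qquad \text{e.g. } p = 0.2,\ s = 10 \;\Rightarrow\; S = \frac{1}{0.8 + 0.02} \approx 1.22
```

Under that reading, the remaining activities (discovery, translation, coordination, interpretation, dissemination) bound the achievable gain no matter how fast the accelerated steps become.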
What’s Needed to Support the Research Lifecycle?
Discover, Mine, Translate, Reference, Extract
Experiment Design, Annotation
Provenance
Gap Analysis
Reference Data
Publish, Share, Coordinate, Curate, Validate, Relate
[Figure: numbered network diagram relating the species H2O, H2O2, OH+, OH, H, O, H2, and O2; label: Valid Range]
Project Execution
Engineering Views
Standards/Best practice
Sensor Data
Algorithms/Services
• There is a class of bovine-related problems for which shape is not important
• Yet shape is clearly needed in a general cow model
• Should we “reach consensus” here?
• Is there one ‘best’ way to map volume to height?
Consider a Spherical Cow…
Moo!
ACME Trucking
Key Issues for Ubiquitous & Persistent CI
• CI must be built before the parts are done
• It must be evolvable by independent parties
• It must enable coordination without central control
• It must allow science to evolve / progress
– No fixed domain model
• Researchers/educators must be able to work in multiple communities/value chains (across CI projects)
• It must convey knowledge as well as tools to end users
• It must align the interests of CI funders, developers, providers, users, …
Can this be done?
Yes!
• Design Principles for loosely coupled, scalable (not scaled) systems and organizations
• Agile, community/science driven development processes over longer-term community/science driven design
…e-Science, Semantic Grid, Web 2.0 …
…intelligence at the edges…
Key Cyberenvironment Design Concepts
• Explicit Representations Separating How from What:
– Content (metadata, global IDs, …)
– Process (workflow, provenance, …)
– Virtual Organizations (policies, resources, semantics, translation)
– GUI Integration (portals, rich clients, …)
– …
[Screenshot: MAEviz (Memphis Test Bed) with menus File, Interventions, Inventory, Hazards, Vulnerability, Decision support, Interdependencies, Help. A Consequence Table dialog shows Earthquake Level: 5% PE in 50 years and Decision Option: Equivalent Cost Analysis. A Scheme Comparison dialog lists Scheme #1 (C2M Rebuild, C2L Rebuild, URML Rebuild) and Scheme #2 (C2M Rehab LS, C2L Rehab LS, URML No Action). A Consequence Comparison chart plots Loss ($M), as Life Loss and Dollar Loss, for the alternatives No Action, Scheme #1, and Scheme #2. Three plots show Social/Economic Impact and Limit State against Input Motion Parameter, with input and response error margins.]
MAEViz – an Example Cyberenvironment (Consequence-Based Risk Management for Seismic Events)
Mid-America Earthquake Center
• Engineering View of MAE Center Research
• Portal-based Collaboration Environment
• Distributed Data/metadata Sources
• Multi-disciplinary Collaboration
Hazard Definition
Inventory Selection
Fragility Models
Damage Prediction
Decision Support
Content Management
• Whatever ‘thing’ we are talking about, we want
– To know its type,
– Have descriptive information so we can find and categorize it,
– Be able to version it,
– Specify who owns and can access it,
– Define its relationships to other things,
– Manage copies of it / know when you have it,
– Be able to translate it,
– Dynamically add new information we learn about it,
– …
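The requirements above can be sketched as a single record type. This is a minimal illustration only: the class `ContentItem`, its field names, and the sample values are hypothetical and do not come from any of the systems named in this talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple
import uuid


@dataclass
class ContentItem:
    """Hypothetical record for one managed 'thing' (all names are illustrative)."""
    content_type: str                      # know its type
    owner: str                             # who owns it
    global_id: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    version: int = 1                       # be able to version it
    metadata: Dict[str, str] = field(default_factory=dict)   # descriptive information
    readers: Set[str] = field(default_factory=set)           # who else can access it
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (predicate, target id)

    def annotate(self, key: str, value: str) -> None:
        """Dynamically add new information and bump the version."""
        self.metadata[key] = value
        self.version += 1

    def relate(self, predicate: str, other: "ContentItem") -> None:
        """Record a relationship to another item by its global ID."""
        self.relations.append((predicate, other.global_id))

    def can_read(self, user: str) -> bool:
        """Owner and explicitly listed readers may access the item."""
        return user == self.owner or user in self.readers


# Hypothetical usage: a model derived from a dataset.
dataset = ContentItem("dataset", owner="alice", metadata={"title": "Building inventory"})
model = ContentItem("fragility-model", owner="bob")
model.relate("derivedFrom", dataset)
dataset.annotate("region", "Memphis")
```

Referring to other items by global ID rather than by location is what lets copies be recognized as the same thing wherever they are stored.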
Content Aware
• ARKs, DOI, LSID
• WebDAV, JCR, RDF, SAM, Tupelo
Desktop
Secure Enterprise Data
Public Reference Data
Data/Metadata
Process Management Framework
• Workflow description as a means of communicating experiment protocol
– Actors built as modules, web services, grid jobs…
– Process execution managed through direct calls, service calls, data transfer, events, manual processes, …
• Workflow generated by applications, by example, graphically, or discovered from provenance
• Execution performed using an engine with appropriate speed, reliability, availability of modules, etc.
• Workflow templates and provenance records treated as sharable content (versioned, compared, documented, …)
• Process descriptions captured at multiple levels of detail (scientific, mathematical, engineering, debugging, …)
• Community Provenance and Process extend across workflows
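As a rough illustration of treating a workflow description and its provenance record as data, the sketch below runs a list of steps and logs what each step used and generated. The classes `Step` and `Workflow`, the step names, and the numbers are all hypothetical; a real engine would add the scheduling, reliability, and remote execution concerns listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One actor in the workflow: a named function with declared inputs and one output."""
    name: str
    func: Callable[..., Any]
    inputs: List[str]
    output: str


@dataclass
class Workflow:
    """A workflow template (the steps) plus the provenance captured when it runs."""
    steps: List[Step]
    provenance: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        values = dict(data)  # copy so the caller's inputs are not modified
        for step in self.steps:
            args = [values[key] for key in step.inputs]
            values[step.output] = step.func(*args)
            # Provenance record: which step used which inputs to generate which output.
            self.provenance.append(
                {"step": step.name, "used": list(step.inputs), "generated": step.output}
            )
        return values


# Hypothetical two-step protocol with made-up integer inputs.
workflow = Workflow(steps=[
    Step("predict_damage", lambda hazard, fragility: hazard * fragility,
         ["hazard", "fragility"], "damage"),
    Step("estimate_loss", lambda damage, value: damage * value,
         ["damage", "value"], "loss"),
])
result = workflow.run({"hazard": 2, "fragility": 3, "value": 10})
```

Because both `workflow.steps` and `workflow.provenance` are plain data, they could be versioned, compared, and shared like any other content.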
Process Management
Workflow Creation
Hierarchical Workflow
Application Interface
Provenance
Workflow-by-Example
X = f(y)
Y = f2(z)
Scripting
Process Aware
• Workflow, Provenance, RDF
Discover Process Capture
Execute
Report
Virtual Organizations
• Grid/portal concept for managing
– Single sign-on security
– access control policies
– toolsets and views
– data sources
– processes and results
– resource pools
– vocabularies and models
– …
• Tools query VO manager to configure themselves based on VO context/policies/preferences
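The last point, tools configuring themselves from VO context, might look roughly like the following. Everything here is an assumption made for illustration: `VirtualOrganization`, `VOManager`, `Tool`, the permission strings, and the example.org URL are hypothetical names, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualOrganization:
    """Hypothetical VO description: what its members may see and use."""
    name: str
    data_sources: List[str] = field(default_factory=list)
    toolsets: List[str] = field(default_factory=list)
    permissions: Dict[str, List[str]] = field(default_factory=dict)  # user -> actions


class VOManager:
    """Central place tools ask for context, instead of hard-coding configuration."""

    def __init__(self) -> None:
        self._vos: Dict[str, VirtualOrganization] = {}

    def register(self, vo: VirtualOrganization) -> None:
        self._vos[vo.name] = vo

    def context_for(self, vo_name: str, user: str) -> Dict[str, List[str]]:
        vo = self._vos[vo_name]
        return {
            "data_sources": list(vo.data_sources),
            "toolsets": list(vo.toolsets),
            "permissions": list(vo.permissions.get(user, [])),
        }


class Tool:
    """A tool that adapts to whichever VO the user is currently working in."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.enabled = False
        self.sources: List[str] = []
        self.read_only = True

    def configure(self, manager: VOManager, vo_name: str, user: str) -> None:
        ctx = manager.context_for(vo_name, user)
        self.enabled = self.name in ctx["toolsets"]
        self.sources = ctx["data_sources"] if self.enabled else []
        self.read_only = "write" not in ctx["permissions"]


# Hypothetical usage with made-up names and a placeholder URL.
manager = VOManager()
manager.register(VirtualOrganization(
    name="seismic-risk",
    data_sources=["https://example.org/inventory"],
    toolsets=["map-viewer"],
    permissions={"alice": ["read", "write"], "bob": ["read"]},
))
viewer = Tool("map-viewer")
viewer.configure(manager, "seismic-risk", "bob")
```

The same tool instance can be reconfigured for a different VO or user by calling `configure` again, which is one way a researcher could work across several communities with the same client.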
Pluggable User Interfaces
• Portlet/Rich-Client concept, broadened to include VO configuration of
– Content sources
– Events
– Workflow/Provenance repositories
– Data models/ontologies
– Translations
• Portal technologies: JSR 168, Teamlets, WSRP, JSR 286, …
• Rich Clients: Eclipse/OSGi, JSR 170, 283, …
Group Aware
• Collaboratory, Portal, …
Plan, Coordinate, Share, Compare
Wiki, Task List, Chat, Document Repository, Scenario Repository, Training Materials
SSO
Dynamic
• Plug-ins, WSRP, Provenance
Eclipse RCP
Workflow, Data, GIS
MAEviz
Plug-in Framework
Auto-update
New Third-Party Analyses
Compare, Contrast, Validate
Rich, VO-oriented plug-in mechanism
Third-party Plug-in
– Adds to menu
– Adds to interface
– Adds to workflow
– Adds to provenance
– Joins Security Context
– Maps data model
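One way to picture such a plug-in mechanism is a host that exposes a fixed set of named extension points and lets third-party plug-ins contribute to any of them. The sketch below is an assumption-laden illustration: `PluginHost`, the extension point names, and the sample plug-in are hypothetical and are not the Eclipse RCP or MAEviz API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical extension points, one per capability listed on the slide.
EXTENSION_POINTS = ("menu", "interface", "workflow", "provenance", "security", "data_model")


class PluginHost:
    """Minimal host that collects plug-in contributions per extension point."""

    def __init__(self) -> None:
        self._extensions: Dict[str, List[Tuple[str, Any]]] = defaultdict(list)

    def install(self, plugin_name: str, contributions: Dict[str, Any]) -> None:
        # Validate first so a bad plug-in is rejected without partial installation.
        unknown = set(contributions) - set(EXTENSION_POINTS)
        if unknown:
            raise ValueError(f"unknown extension points: {sorted(unknown)}")
        for point, item in contributions.items():
            self._extensions[point].append((plugin_name, item))

    def contributions(self, point: str) -> List[Any]:
        """Return everything contributed to one extension point, in install order."""
        return [item for _, item in self._extensions[point]]


# Hypothetical third-party plug-in contributing to three extension points.
host = PluginHost()
host.install("third-party-analysis", {
    "menu": "Run Third-Party Analysis",
    "workflow": "third_party_step",
    "data_model": {"maps": "local-building-schema"},
})
```

Keeping the extension points explicit is what would allow independent parties to add analyses without changes to the host itself.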
Environmental Observatories
Rely on advances in:
• sensors and sensor networks at intensively instrumented sites shared by the research community
• cyberinfrastructure with high bandwidth to connect the sites, data repositories, and researchers into collaboratories
• distributed modeling platforms
From USGS
Observatories as a Community Focus
Environmental Observatory Processes
[Diagram labels: Sensors; Data Products; Derived Data Products; Storage; QA/QC; Archive; Operations/Expt. Design; Cache; Knowledge Store; Community Provisioning; Community Coordination/Knowledge Creation; Events; Model Dev/Validation; Research & Education Projects; Observatory Operation and Evolution; On-demand Services and HPC; Third-party Resources; Data Access; Documentation; Coordination; Recommendations]
Ubiquity = Supporting Scientific Discourse
• Cyberenvironments represent rethinking current practice to create CI
– That is enabling rather than stifling
– That evolves as fast as research evolves
– That connects research and practice
– That empowers individuals to contribute new resources
– That can be ubiquitous and persistent
– That enables resource repurposing to address new questions
– That opens new career paths for CI developers, data scientists, systems engineers, …
Cyberinfrastructure Challenges
• How can CI increase the productivity and competitiveness of the scientific community?
• How can CI developers enhance their capacity to respond to user needs more rapidly and more effectively?
• How should CI technical design and organizational structures change to enable solutions at scale – as a ubiquitous, persistent infrastructure for science and engineering research and education?
Cyberenvironments
Mosaic and Cyberenvironments
• Mosaic
– By the early 1990s, the internet had a wealth of resources, but they were inaccessible to most scientists
– Individual publishing
– Browsing versus retrieving
– See “Web 2.0 ... The Machine is Us/ing Us”
• Cyberenvironments
– By the early 2000s, the internet and grid had a wealth of interactive resources, but they were inaccessible to most scientists
– Individual information models
– Fusion versus gathering
Acknowledgments
NCSA CET Staff, NCSA Collaborators, CI Community
National Science Foundation/State of Illinois/ONR
Mathematical, Information and Computational Sciences Division of the Office of Science
Mid-America Earthquake Center
… and Thank You