Overview of Cloud Computing and Workflow Research in NGSP Group
Dr. Dong YUAN
Research Fellow
Swinburne University of Technology, Melbourne, Australia
Outline
> SUCCESS Centre and NGSP Group
> Background: Big Data, Cloud Computing and Workflow
> Research Topics
– Data Management in Cloud Computing
– Performance Management in Scientific Workflows
– Security and Privacy Protection in the Cloud
– SwinDeW-C Cloud Workflow System
The Centre of SUCCESS
> SUCCESS: Swinburne University Centre for Computing and Engineering Software Systems
– SUCCESS is the “No. 1” Software Engineering Centre in Australia
– SUCCESS is one of the 7 Tier 1 Centres at Swinburne University of Technology (Times World Ranking: 351-400; Academic Ranking of World Universities: 301-400)
> The ambition of the Centre is to become the top centre for software research in the Southern Hemisphere within the next five years.
SUCCESS
> Research Focus Areas
– Knowledge and Data Intensive Systems
– Nature of Software
– Next Generation Software Platforms
– SE Education and IBL/RBL
– Software Analysis and Testing
– Software R&D Group
> http://www.swinburne.edu.au/ict/success/research-expertise/
NGSP (Small) Group Overview
> We conduct research into cloud computing and workflow technologies for complex software systems and services.
> Members:
Leader: Prof Yun Yang (PC member for ICSE 07/08, FSE 09, ICSE 10/11/12)
Researchers: Dr Xiao Liu (Postdoc, China), Dr Dong Yuan (Postdoc), Gaofeng Zhang, Wenhao Li, Dahai Cao, Jofry Hadi SUTANTO, Antonio Giardina
Others: Prof John Grundy, Prof Chengfei Liu
Visitors: Prof Lee Osterweil, Prof Lori Clarke, Prof Ivan Stojmenovic, Prof Paola Inverardi, Prof Amit Sheth, Prof Wil van der Aalst, Prof Hai Jin, Prof Hai Zhuge
R&D Projects – Grants
> Primary projects:
– (Cloud) workflow technology: Scheduling and temporal analysis in cloud workflows
• ARC LP0990393 (Y Yang, R Kotagiri, J Chen, C Liu)
– Cloud computing: Intermediate data management in cloud computing
• ARC DP110101340 (Y Yang, J Chen, J Grundy)
> Secondary project:
– Management control systems for effective information sharing and security in government organisations
• ARC LP110100228 (S Cugenasen, Y Yang)
R&D Projects – Overview
> SwinDeW workflow family including SwinDeW-C
– Architectures / Models (D Cao)
– Scheduling / Data and service management (D Yuan, X Liu)
– Verification / Exception handling (X Liu)
> Cloud computing:
– Data management (D Yuan, X Liu, W Li)
– Privacy and Security (G Zhang, X Zhang, C Liu)
Some Recent ERA A* Ranked Publications
> J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on Software Engineering and Methodology, 20(3), 2011.
> X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825, Nov./Dec. 2011.
> D. Yuan, Y. Yang, X. Liu and J. Chen, On-demand Minimum Cost Benchmarking for Intermediate Datasets Storage in Scientific Cloud Workflow Systems. Journal of Parallel and Distributed Computing, 71(2):316-332, 2011.
> J. Chen and Y. Yang, Localising Temporal Constraints in Scientific Workflows. Journal of Computer and System Sciences, Elsevier, 76(6):464-474, Sept. 2010.
> G. Zhang, Y. Yang and J. Chen, A Historical Probability based Noise Generation Strategy for Privacy Protection in Cloud Computing. Journal of Computer and System Sciences, Elsevier, published online, Dec. 2011.
> Another 8 A* papers are currently under review…
Big Data
> Data explosion
– TB (10^12), PB (10^15), exabyte (EB, 10^18), zettabyte (ZB, 10^21), yottabyte (YB, 10^24) bytes
– The total amount of global data in 2010: 1.2 ZB
– Data processed by Google every day in 2009: 24 PB
– Every day: Facebook 10 TB, Twitter 7 TB, YouTube 4.5 TB
> Moore's law vs. data explosion speed
– Application data doubles every year over the next decade and beyond [Szalay et al., Nature, 2006]
> Buzzwords: data storage, data processing, parallel, distributed, virtualisation, commodity machines, energy consumption, data centres, utility computing, software (everything) as a service
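The gap between data growth and hardware growth can be made concrete with a little arithmetic. The sketch below compares ten years of data doubling every year (the rate cited from Szalay et al.) with processing capacity that doubles every two years. The two-year period for Moore's law is an assumption made here (18 months is also commonly quoted), and `growth_factor` is a hypothetical helper.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# Data doubling every year, as cited from Szalay et al. (Nature, 2006).
data_growth = growth_factor(10, 1.0)
# Assumption: Moore's law taken as a doubling every two years.
compute_growth = growth_factor(10, 2.0)

print(f"data grows {data_growth:.0f}x, compute grows {compute_growth:.0f}x")
print(f"gap after 10 years: {data_growth / compute_growth:.0f}x")
```

Under these assumptions, data outgrows single-machine compute by a factor of about 32 over a decade, which is why parallel, distributed and cloud-based processing appear among the buzzwords above.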
Example: Pulsar Searching
> Astrophysics: pulsar searching
> Pulsars: the collapsed cores of stars that were once more than 6-10 times as massive as the Sun
> http://astronomy.swin.edu.au/cosmos/P/Pulsar
> Parkes Radio Telescope (http://www.parkes.atnf.csiro.au/)
> Swinburne Astrophysics group (http://astronomy.swinburne.edu.au/) has been conducting pulsar searching surveys (http://astronomy.swin.edu.au/pulsar/) based on the observation data from Parkes Radio Telescope.
> A typical scientific workflow involving a large number of data- and computation-intensive activities. For a single searching process, the average data volume (not including the raw stream data from the telescope) is over 4 terabytes, and the average execution time is about 23 hours on the Swinburne high performance supercomputing facility (http://astronomy.swinburne.edu.au/supercomputing/).
[Figure: left, image of the Crab Nebula taken with the Palomar telescope; right, close-up of the Crab Pulsar from the Hubble Space Telescope. Credit: Jeff Hester and Paul Scowen (Arizona State University) and NASA.]
Pulsar Searching Workflow
[Figure: pulsar searching workflow with four stages: Data Collection, Data Pre-processing, Candidate Searching and Decision Making. Activities include Collect Data, Transfer Data, Extract Beam, De-disperse (1200, 2400 and 3600), Accelerate, FFT Seek, FFA Seek, Pulse Seek, Get Candidates, Eliminate Candidates, Fold to XML and Make Decision, with activity durations ranging from 10 minutes to 13 hours. Temporal constraints: U(SW) = 24 hours for the whole workflow; U(SW1) = 15.25 hours and U(SW2) = 5.75 hours for two sub-workflows.]
Dr. Willem van Straten
Benefits of Clouds
> No upfront infrastructure investment
– No procuring hardware, setup, hosting, power, etc.
> On-demand access
– Lease what you need, when you need it
> Efficient Resource Allocation
– Globally shared infrastructure …
> Nice Pricing
– Based on usage, QoS, supply and demand, loyalty, …
> Application Acceleration
– Parallelism for large-scale data analysis …
> High Availability, Scalability and Energy Efficiency
> Supports Creation of 3rd-Party Services & Seamless Offering
– Builds on the infrastructure and follows a similar business model to the cloud
SwinDeW Workflow Series
> SwinDeW – Swinburne Decentralised Workflow: foundation prototype based on P2P
– SwinDeW – past
– SwinDeW-S (for Services) – past
– SwinDeW-B (for BPEL4WS) – past
– SwinDeW-G (for Grid) – past
– SwinDeW-A (for Agents) – past
– SwinDeW-V (for Verification) – current
– SwinDeW-C (for Cloud) – current
Research Topics: Data Management in Cloud Computing
Dr. Dong Yuan
http://www.ict.swin.edu.au/personal/dyuan/
Data Management in Cloud Computing
> Scientific applications in cloud computing
– Computation and data intensive applications
– Demand large amounts of computation and storage resources
– Pay-as-you-go model
> Three aspects of data management in the cloud
– Data storage
– Data placement
– Data replication
Data Storage
> Developing smart data storage strategies for reducing the cost of storing big data in the cloud
– Data regeneration (computation and storage trade-off)
– Data de-duplication
– Data compression
> Researcher: Dong Yuan
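The computation and storage trade-off behind data regeneration can be illustrated with a small cost model. The sketch below is a simplified, hypothetical version of the idea, not the strategies in the publications that follow: intermediate datasets form a linear chain in which each dataset is derived from its predecessor, and the raw input is assumed to be always stored. A stored dataset costs its storage price per day; a deleted one costs its regeneration cost times its usage frequency, where regeneration must also recompute any deleted predecessors. Dataset names, prices and usage rates are invented, and the exhaustive search over all store/delete choices only suits a handful of datasets.

```python
from itertools import product

# Hypothetical intermediate datasets in a linear chain d1 -> d2 -> d3.
# x: cost ($) to regenerate the dataset from its direct predecessor
# y: storage cost ($ per day)
# v: usage frequency (uses per day)
DATASETS = [
    {"name": "d1", "x": 10.0, "y": 2.0, "v": 0.1},
    {"name": "d2", "x": 50.0, "y": 1.0, "v": 0.05},
    {"name": "d3", "x": 5.0, "y": 4.0, "v": 0.5},
]


def cost_rate(datasets, stored):
    """Daily cost of a storage strategy; stored[i] is True if dataset i is kept."""
    total = 0.0
    gen_cost = 0.0  # cost to rebuild the current dataset from the nearest stored ancestor
    for ds, keep in zip(datasets, stored):
        gen_cost += ds["x"]
        if keep:
            total += ds["y"]
            gen_cost = 0.0  # successors can be regenerated from this stored copy
        else:
            total += gen_cost * ds["v"]  # pay for regeneration on every use
    return total


def min_cost_strategy(datasets):
    """Try every store/delete combination; return (cost, choice) with the lowest cost."""
    return min(
        (cost_rate(datasets, stored), stored)
        for stored in product([True, False], repeat=len(datasets))
    )


cost, stored = min_cost_strategy(DATASETS)
kept = [ds["name"] for ds, keep in zip(DATASETS, stored) if keep]
print(f"store {kept}, daily cost ${cost:.2f}")
```

With these made-up numbers, keeping only d2 is cheapest: d1 is cheap to recompute and rarely used, d3 is cheap to recompute from the stored d2, while d2 is expensive to rebuild.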
Publications
> D. Yuan, Y. Yang, X. Liu, J. Chen, On-demand Minimum Cost Benchmarking for Intermediate Datasets Storage in Scientific Cloud Workflow Systems, Journal of Parallel and Distributed Computing, Elsevier, vol. 71(2), pp. 316-332, 2011.
> D. Yuan, Y. Yang, X. Liu, G. Zhang, J. Chen, A Data Dependency Based Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems, Concurrency and Computation: Practice and Experience, Wiley, 24(9), pp. 956-976, Jun. 2012.
> D. Yuan, Y. Yang, X. Liu, J. Chen, A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems, Proc. of 24th IEEE International Parallel & Distributed Processing Symposium (IPDPS10), Atlanta, USA, Apr. 2010.
> D. Yuan, Y. Yang, X. Liu and J. Chen, A Local-Optimisation based Strategy for Cost-Effective Datasets Storage of Scientific Applications in the Cloud, Proc. of 4th IEEE International Conference on Cloud Computing (Cloud2011), Washington DC, USA, July 4-9, 2011.
Data Placement
> Smart data placement strategies to reduce application cost
– Data correlation based strategy to reduce bandwidth cost
– Data usage based strategy to reduce storage cost
> Researchers: Dong Yuan, Jofry Hadi SUTANTO, Antonio Giardina
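One way to picture a data-correlation based placement strategy is to count how often two datasets are used by the same task and keep strongly correlated datasets in the same data centre, so that tasks move less data between data centres. The sketch below is a simplified greedy clustering under assumed inputs (a hypothetical task-to-dataset mapping and a data-centre capacity measured in number of datasets); it is not the clustering algorithm used in the publication that follows.

```python
from itertools import combinations

# Hypothetical workflow: each task lists the datasets it reads.
TASKS = {
    "t1": {"a", "b"},
    "t2": {"a", "b", "c"},
    "t3": {"c", "d"},
    "t4": {"d", "e"},
}


def dependency_counts(tasks):
    """Map each dataset pair (i, j) to the number of tasks that use both."""
    dep = {}
    for used in tasks.values():
        for pair in combinations(sorted(used), 2):
            dep[pair] = dep.get(pair, 0) + 1
    return dep


def greedy_placement(tasks, capacity):
    """Group datasets into data centres, merging the most dependent pairs first.

    `capacity` is the assumed maximum number of datasets per data centre.
    """
    datasets = sorted(set().union(*tasks.values()))
    cluster = {d: {d} for d in datasets}  # dataset -> the (shared) set it belongs to
    dep = dependency_counts(tasks)
    # Strongest dependencies first; sorted() is stable, so ties keep insertion order.
    for (i, j), _ in sorted(dep.items(), key=lambda item: -item[1]):
        ci, cj = cluster[i], cluster[j]
        if ci is not cj and len(ci) + len(cj) <= capacity:
            merged = ci | cj
            for d in merged:
                cluster[d] = merged
    # Each distinct cluster is placed in one data centre.
    unique = {id(c): c for c in cluster.values()}
    return sorted(sorted(c) for c in unique.values())


print(greedy_placement(TASKS, capacity=2))
print(greedy_placement(TASKS, capacity=3))
```

Datasets a and b, which are read together by two tasks, always end up in the same data centre; how far the grouping extends beyond that depends on the assumed capacity.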
Publications
> D. Yuan, Y. Yang, X. Liu, J. Chen, A Data Placement Strategy in Scientific Cloud Workflows, Future Generation Computer Systems, Elsevier, vol. 26(8), pp. 1200-1214, 2010.
Data Replication
> To cost-effectively assure data reliability in the cloud
– Dynamic replication strategy
– Proactive replica checking based replication strategy
> Researchers: Wenhao Li, Dong Yuan
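The reasoning behind storing fewer replicas can be shown with a textbook reliability model. The sketch below assumes, for illustration only, that replica failures are independent and that replica lifetimes are exponentially distributed with a fixed annual failure rate; it returns the smallest number of replicas that meets a reliability target for a given storage duration. The function name and parameters are hypothetical, and this is not the model from the publications that follow.

```python
import math


def replicas_needed(failure_rate_per_year, storage_years, target_reliability, max_replicas=3):
    """Smallest replica count meeting the target reliability, or None if none does.

    Assumes independent replica failures with exponentially distributed lifetimes.
    """
    # Probability that a single replica is lost during the storage period.
    p_loss = 1.0 - math.exp(-failure_rate_per_year * storage_years)
    for k in range(1, max_replicas + 1):
        # The data is lost only if all k replicas are lost.
        if 1.0 - p_loss ** k >= target_reliability:
            return k
    return None


print(replicas_needed(0.02, 0.5, 0.99))    # short-lived data, modest target
print(replicas_needed(0.02, 0.5, 0.9999))  # short-lived data, strict target
print(replicas_needed(0.02, 5.0, 0.9999))  # long-lived data, strict target
```

Under these assumptions, short-lived data can meet a modest target with a single replica, which is the kind of saving over a fixed three-replica policy that a cost-effective replication strategy aims for.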
Publications
> W. Li, Y. Yang and D. Yuan, A Novel Cost-effective Dynamic Data Replication Strategy for Reliability in Cloud Data Centres. Proc. of International Conference on Cloud and Green Computing (CGC2011), pages 496-502, Sydney, Australia, Dec. 2011.
> W. Li, Y. Yang, J. Chen and D. Yuan, A Cost-Effective Mechanism for Cloud Data Reliability Management based on Proactive Replica Checking. Proc. of 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2012), pages 564-571, Ottawa, Canada, May 2012.
Research Topics: Performance Management in Scientific Workflows
Dr. Xiao Liu
http://www.ict.swin.edu.au/personal/xliu/
Workflow QoS
> QoS dimensions
– time, cost, fidelity, reliability, security …
> QoS of Cloud Services
> Workflow QoS
– the overall QoS for a collection of cloud services
– which is not simply the sum of the individual services' QoS values!
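Why workflow QoS does not simply add up can be seen by aggregating a few QoS dimensions over workflow structures. In the sketch below, a sequence adds times and costs, parallel branches take the maximum time but still add costs, and reliability multiplies in both cases (assuming independent failures). These are commonly used aggregation rules; the `QoS` class, the helper names and the numbers are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class QoS:
    time: float         # hours
    cost: float         # dollars
    reliability: float  # probability of successful completion


def _reliability(parts):
    # All parts must succeed, so reliabilities multiply (independence assumed).
    return reduce(lambda acc, p: acc * p.reliability, parts, 1.0)


def sequence(*parts):
    """Parts run one after another: times and costs add up."""
    return QoS(sum(p.time for p in parts), sum(p.cost for p in parts), _reliability(parts))


def parallel(*parts):
    """Parts run concurrently (AND-split/AND-join): the slowest branch sets the time."""
    return QoS(max(p.time for p in parts), sum(p.cost for p in parts), _reliability(parts))


a = QoS(time=1.0, cost=2.0, reliability=0.99)
b = QoS(time=4.0, cost=5.0, reliability=0.95)
c = QoS(time=3.0, cost=1.0, reliability=0.98)
workflow = sequence(a, parallel(b, c))
print(workflow)
```

Here the workflow takes 5 hours rather than the 8 hours a naive sum would give, while its cost is the full sum and its reliability is lower than that of any single service.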
Temporal QoS
> System performance
– Response time
– Throughput
> Temporal constraints
– Global constraints: deadlines
– Local constraints: milestones, individual activity durations
> Satisfactory temporal QoS
– High performance: fast response, high throughput
– On-time completion: low temporal violation rate
Problem Analysis
> Setting temporal constraints
– Prerequisite: effective forecasting of activity durations
> Monitoring temporal consistency state
– Monitor workflow execution state
– Detect potential temporal violations
> Temporal violation handling
– Where to conduct violation handling
– Which strategies to use
[Figure: Temporal framework]
Forecasting Activity Durations
> Statistical time-series pattern based forecasting strategies
> Selected Publications:
– X. Liu, Z. Ni, D. Yuan, Y. Jiang, Z. Wu, J. Chen, Y. Yang, A Novel Statistical Time-Series Pattern based Interval Forecasting Strategy for Activity Durations in Workflow Systems, Journal of Systems and Software (JSS), vol. 84, no. 3, pp. 354-376, March 2011.
– X. Liu, J. Chen, K. Liu and Y. Yang, Forecasting Duration Intervals of Scientific Workflow Activities based on Time-Series Patterns, Proc. of 4th IEEE International Conference on e-Science (e-Science08), pages 23-30, Indianapolis, USA, Dec. 2008.
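A much simplified illustration of pattern-based interval forecasting: take the most recent durations as a query pattern, find the most similar earlier windows, and turn the durations that followed them into an interval (mean ± k standard deviations). This nearest-neighbour matching is a stand-in chosen for brevity, not the statistical time-series pattern strategy of the publications above; the function name, window length `m`, neighbour count and `k` are assumptions.

```python
import math
import statistics


def pattern_interval_forecast(history, m=3, neighbours=3, k=2.0):
    """Forecast the next activity duration as an interval (lower, upper)."""
    if len(history) - m < neighbours:
        raise ValueError("history too short for the requested pattern length")
    query = history[-m:]  # most recent pattern
    matches = []
    # Every earlier window of length m that is followed by at least one value.
    for start in range(len(history) - m):
        window = history[start:start + m]
        matches.append((math.dist(window, query), history[start + m]))
    matches.sort(key=lambda pair: pair[0])  # most similar windows first
    followers = [value for _, value in matches[:neighbours]]
    centre = statistics.mean(followers)
    spread = statistics.pstdev(followers)
    return max(0.0, centre - k * spread), centre + k * spread


periodic = [10, 20, 30] * 3 + [10, 20]  # hypothetical durations in minutes
print(pattern_interval_forecast(periodic, m=2))
noisy = [12.0, 13.5, 12.8, 14.1, 12.5, 13.0, 13.9, 12.7]
print(pattern_interval_forecast(noisy))
```

On a strictly periodic history the interval collapses to the value that always follows the current pattern; on noisy data it widens with the spread of the matched followers.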
Setting Temporal Constraints
> Probability based temporal consistency model
> Time analysis based on Stochastic Petri Nets
> Selected Publications:
– X. Liu, Z. Ni, J. Chen, Y. Yang, A Probabilistic Strategy for Temporal Constraint Management in Scientific Workflow Systems, Concurrency and Computation: Practice and Experience (CCPE), Wiley, 23(16):1893-1919, Nov. 2011.
– X. Liu, J. Chen and Y. Yang, A Probabilistic Strategy for Setting Temporal Constraints in Scientific Workflows, Proc. 6th International Conference on Business Process Management (BPM2008), Lecture Notes in Computer Science, Vol. 5240, pages 180-195, Milan, Italy, Sept. 2008.
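Under a probability based model, a temporal constraint can be tied to a confidence level. The sketch below assumes, for illustration, that activity durations are independent and normally distributed, so a sequence of activities has a normal duration with the summed means and variances. It sets a deadline that is met with a chosen probability and, conversely, computes the probability of meeting a given deadline. The activity statistics and function names are hypothetical; `statistics.NormalDist` is from the Python standard library (3.8+).

```python
import math
from statistics import NormalDist


def sequence_duration(activities):
    """Distribution of the total duration of sequential, independent normal activities."""
    mean = sum(mu for mu, _ in activities)
    std = math.sqrt(sum(sigma ** 2 for _, sigma in activities))
    return NormalDist(mean, std)


def set_constraint(activities, confidence):
    """Deadline that the sequence meets with probability `confidence`."""
    return sequence_duration(activities).inv_cdf(confidence)


def consistency_probability(activities, deadline):
    """Probability that the sequence finishes within `deadline`."""
    return sequence_duration(activities).cdf(deadline)


# Hypothetical activities as (mean hours, standard deviation hours).
ACTIVITIES = [(2.0, 0.3), (5.0, 0.4)]
deadline = set_constraint(ACTIVITIES, 0.9)
print(f"deadline for 90% confidence: {deadline:.2f} h")
print(f"P(finish within 7 h) = {consistency_probability(ACTIVITIES, 7.0):.2f}")
```

A deadline equal to the mean total duration is met only half the time; each extra standard deviation of slack raises the probability of on-time completion.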
Temporal Consistency Monitoring
> Minimum (Probability) Time Redundancy based Checkpoint Selection Strategy
> Temporal Dependency based Checkpoint Selection Strategy
> Selected Publications:
– X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825, Nov./Dec. 2011.
– J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on Software Engineering and Methodology, 20(3), 2011
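The intuition behind time-redundancy based checkpoint selection is that temporal verification is only needed where an activity's delay may have used up the remaining slack. The sketch below implements a simplified, deterministic version of that rule, an assumption rather than the exact strategies in the publications above: the time redundancy before an activity is the deadline minus the elapsed time and the expected durations of the remaining activities, and an activity becomes a checkpoint when its delay exceeds that redundancy. The function name and figures are hypothetical.

```python
def select_checkpoints(expected, actual, deadline):
    """Return indices of completed activities after which verification should run."""
    checkpoints = []
    elapsed = 0.0
    for i, (planned, observed) in enumerate(zip(expected, actual)):
        # Time redundancy: slack left before activity i, assuming the rest run as expected.
        redundancy = deadline - (elapsed + sum(expected[i:]))
        if observed - planned > redundancy:  # delay exceeds the slack: deadline now at risk
            checkpoints.append(i)
        elapsed += observed
    return checkpoints


EXPECTED = [2, 3, 4]  # hypothetical expected durations (hours), deadline 10 hours
print(select_checkpoints(EXPECTED, [2.5, 3.0, 4.8], deadline=10))
print(select_checkpoints(EXPECTED, [3.5, 3.0, 4.0], deadline=10))
```

Once a delay has consumed the slack, every later activity is flagged, because the workflow stays in a potentially inconsistent state until the violation is handled.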
Violation Handling
> Violation Handling Point Selection
> (Probability) Time deficit allocation
> Workflow local rescheduling strategy – ACO, GA, PSO
> Selected Publications:
– X. Liu, Z. Ni, Z. Wu, D. Yuan, J. Chen and Y. Yang, A Novel General Framework for Automatic and Cost-Effective Handling of Recoverable Temporal Violations in Scientific Workflow Systems, Journal of Systems and Software, vol. 84, no. 3, pp. 492-509, 2011
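Once a checkpoint reveals a time deficit, it has to be recovered by the activities that follow. A simple illustrative policy, assumed here and not taken from the publication above, splits the deficit across subsequent activities in proportion to their expected durations, giving each activity a reduced target duration that local rescheduling (for example with ACO, GA or PSO) would then try to achieve. The function name and numbers are hypothetical.

```python
def allocate_time_deficit(deficit, expected):
    """Split a time deficit across subsequent activities in proportion to their expected durations.

    Returns (share of the deficit, new target duration) for each activity.
    """
    total = sum(expected)
    if total <= 0:
        raise ValueError("expected durations must sum to a positive value")
    shares = [deficit * e / total for e in expected]
    return [(share, e - share) for share, e in zip(shares, expected)]


for share, target in allocate_time_deficit(1.5, [1.0, 2.0, 3.0]):
    print(f"recover {share:.2f} h, new target {target:.2f} h")
```

Proportional allocation asks longer activities to recover more time, on the assumption that they offer more room for rescheduling.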
Background
> Data Security vs. Data Privacy
> Privacy in cloud computing
– Massive amounts of data are stored and processed in open cloud environments
– Customers cannot control what happens inside the cloud
> The severity of privacy risks in cloud computing
> One specific privacy risk in cloud computing
– Indirect private information (inferred from information taken collectively)
– Arises during normal service processes and functions (not from disruption)
> The approach: noise obfuscation for privacy protection
Privacy Protection in Cloud
> Roles in the view of privacy in a regular IT system
– Privacy owner, privacy user and privacy thief
[Figure: privacy owner, privacy user and privacy thief]
Keep the link between the privacy owner and the privacy user safe!
Privacy Protection in Cloud
> Roles in the view of privacy in the cloud
– Privacy owner, privacy user and privacy thief
[Figure: privacy owner, privacy user and privacy thief]
Virtualisation undermines “keeping the link between the privacy owner and the privacy user safe!”
Noise Obfuscation (1)
> Background
– Massive amounts of data are stored and processed in open cloud environments.
– Customers cannot control what happens inside the cloud.
> Main idea: “dilute” real private information with noise information
– Noise here means fake information, not a noise signal!
[Figure: real information and noise information are mixed into the final information]
Noise Obfuscation (2)
> A motivating example:
– A customer who often travels to a particular Australian city, say ‘Sydney’, regularly checks the weather report from a cloud weather service before departure. The frequent requests for the ‘Sydney’ weather report can reveal the private fact that the customer usually travels to ‘Sydney’. If the system instead helps the customer inject other requests, such as ‘Perth’ or ‘Darwin’, into the ‘Sydney’ request queue, the service provider cannot distinguish the real requests from the noise, because all of them look like the same style of service request. All of these requests are answered normally, yet the customer's location privacy is not revealed. In general, this is how noise obfuscation protects privacy.
From ‘data’ privacy to ‘process’ privacy!
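The weather-service example can be turned into a small sketch. Here noise requests are drawn with weights that favour cities the customer rarely asks about, so the mix of requests the provider observes becomes more uniform and the real destination stands out less. This compensation rule is a hypothetical simplification, not any of the noise generation strategies named in this section; the city list, request history and helper names are invented for illustration.

```python
import random
from collections import Counter


def noise_weights(history, candidates):
    """Weight each candidate city by how under-represented it is in the request history."""
    counts = Counter(history)
    peak = max(counts[c] for c in candidates)
    return {c: peak - counts[c] for c in candidates}


def obfuscate(real_request, history, candidates, n_noise, rng):
    """Mix the real request with n_noise noise requests, in random order."""
    weights = noise_weights(history, candidates)
    pool = [c for c in candidates if weights[c] > 0]
    noise = rng.choices(pool, weights=[weights[c] for c in pool], k=n_noise) if pool else []
    batch = noise + [real_request]
    rng.shuffle(batch)
    return batch


HISTORY = ["Sydney"] * 8 + ["Perth"] * 2  # hypothetical past requests
CITIES = ["Sydney", "Perth", "Darwin"]
print(noise_weights(HISTORY, CITIES))
print(obfuscate("Sydney", HISTORY, CITIES, n_noise=5, rng=random.Random(42)))
```

The most frequently requested city gets zero weight as noise, so noise requests go to the under-represented cities and flatten the distribution the provider sees.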
Research Topics
> Noise Generation
– Historical probability based noise generation strategy
– Time-series pattern based noise generation strategy
– Association probability based noise generation strategy
– ……
> Noise Utilisation
– Trust model and injection strategy for noise obfuscation
– ……
> Noise Cooperation Mechanism
– Privacy protection framework under noise obfuscation
Publications
> G. Zhang, Y. Yang and J. Chen, A Historical Probability based Noise Generation Strategy for Privacy Protection in Cloud Computing. Journal of Computer and System Sciences, Elsevier, 78(5):1374-1381, Sept. 2012.
> G. Zhang, Y. Yang, D. Yuan and J. Chen, A Trust-based Noise Injection Strategy for Privacy Protection in Cloud Computing. Software: Practice and Experience, Wiley, 42(4):431-445, Apr. 2012.
> G. Zhang, Y. Yang, X. Liu and J. Chen, A Time-series Pattern based Noise Generation Strategy for Privacy Protection in Cloud Computing. Proc. of 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2012), pages 458-465, Ottawa, Canada, May 2012.
> G. Zhang, X. Zhang, Y. Yang, C. Liu and J. Chen, An Association Probability based Noise Generation Strategy for Privacy Protection in Cloud Computing. Proc. 10th International Conference on Service Oriented Computing (ICSOC2012), pages 639-647, Shanghai, China, Nov. 2012. (accepted on 13/7/2012)
SwinCloud – Cloud Computing Testbed
> SwinCloud
[Figure: Swinburne computing facilities underlying SwinCloud: the Astrophysics Supercomputer (GT4, SuSE Linux), Swinburne CS3 (GT4, CentOS Linux) and Swinburne ESR (GT4, CentOS Linux), together with VMware, a cloud simulation environment and data centres with Hadoop.]
General cloud workflow reference model
[Figure: General cloud workflow reference model, combining the components proposed by the WfMC with new components. WfMC components: process definition tools, the workflow enactment service with its workflow engines, other workflow enactment services, administration and monitoring tools, client applications with a worklist handler, and invoked applications, connected through Interfaces 1 to 5 (Interface 4: interoperability). Further components shown: process definition import/export; an application provision service (typical web services, tool agents); workflow relevant services (resource, data, organisation, version, user, security and billing services); and workflow accompaniment tools (form designing, resource, tool agent and billing tools).]
Prototype I: SwinDeW-C (Peer-to-Peer)
> SwinDeW-C
[Figure: SwinDeW-C architecture with an application layer (activities, workflow execution), a platform layer, a unified resource layer and a fabric layer. Underlying resources: SwinCloud; the SwinDeW-G grid computing infrastructure, with nodes at Swinburne CS3 and Swinburne ESR (SwinDeW-G, GT4, CentOS Linux), the Astrophysics Supercomputer (SwinDeW-G, GT4, SuSE Linux), Beihang CROWN (SwinDeW-G, CROWN, Linux) and further nodes labelled UK, VPAC and Hong Kong; and commercial cloud infrastructure (Amazon, Google and Microsoft data centres). Legend: VM, SwinDeW-C peer, SwinDeW-C coordinator peer.]
Prototype II: SwinFlow-Cloud (Centralised)
[Figure: SwinFlow-Cloud architecture. A workflow server on an EC2 instance (Linux with Tomcat) hosts the workflow enactment service: process engine(s), instantiated from process definitions created with the process definition tool, maintaining workflow control data and the work list. Workflow client tools interact through the worklist handler; tool agents invoke legacy applications and other cloud applications via the application provision service on the cloud; cloud workflow relevant services manage workflow relevant data. Cloud workflow accompaniment tools comprise server configuration tools (configuration, server status checker), billing specification tools (cloud billing meter, prices, pricing policies), tool agent definition tools and administration tools.]
Cloud workflow implementation
> Client system
– Process definition tools
– Rule editor
– Organisation modelling tools
– Office calendar management tools
– Authority group tools
– User management tools
– Form designing tools
– Tool agent definition tools
– Simulation tools
New Progress
> Successfully deployed on the Amazon cloud
> Eucalyptus: the cloud infrastructure platform
A Book
End
> Questions?