Cloud Data Management: Literature Survey and Paper Critique
Team Members
• Frank Paladino
• Aravind Yeluripiti
Papers Selected
• P1: Data Management in the Cloud: Limitations and Opportunities
• P2: Towards a Self-Adaptive Data Management System for Cloud Environments
• P3: CloudDB: One Size Fits All Revived
• P4: Cloud Data Management for Online Games: Potentials and Open Issues
• P5: Data in the Cloud: The Changing Nature of Managing Data Accessibility
P1: Data Management in the Cloud: Limitations and Opportunities
• Data Management
  – Characteristics of the cloud environment
    • Elasticity – Ability to size according to need
    • Data Privacy and Security – Subject to local rules and regulations
    • Replication across long distances – Availability and durability of data
  – Applications to consider for cloud deployment
    • Transaction processing – ACID guarantees difficult to maintain
    • Analytical – Ideal candidate for cloud deployment
• Data Analysis
  – MapReduce-like software
  – Shared-nothing parallel databases
  – Desired features
    • Efficiency – Query performance
    • Fault Tolerance – Tasks may be reassigned as needed
    • Heterogeneous Environments – Ability to run in parallel on nodes of varying performance
    • Encrypted Data – Ability to operate on sensitive data
    • Ability to interface with business intelligence software for visualization and query generation
  – Neither solution implements all desired features.
  – A hybrid solution may be needed.
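A minimal sketch of the MapReduce-style analysis mentioned above, in plain Python. The function names (map_phase, reduce_phase, run_job) and the sample records are assumptions for illustration and are not taken from P1; a real system would spread the map tasks across nodes and reassign failed tasks.

```python
from collections import defaultdict

def map_phase(record):
    """Emit a (word, 1) pair for each word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)

def run_job(records):
    """Run map, group the pairs by key (the shuffle step), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())

if __name__ == "__main__":
    print(run_job(["cloud data", "data management in the cloud"]))
```

Because each call to map_phase depends only on its own record, the map step is the part that parallelizes naturally, which is why analytical workloads are described as a good fit for the cloud.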
P2: Towards a Self-Adaptive Data Management System for Cloud Environments
• Large-scale implementations require guarantees of data availability and security.
• Optimize BlobSeer, a data-sharing system capable of handling massive amounts of unstructured data, with self-management capabilities:
  – Elasticity – The ability to self-configure.
  – Data Availability – The ability to self-optimize.
  – Security – The ability to self-protect.
• Three-layer approach
  – Introspection layer – state and behavior
  – Monitoring layer – gather data from instrumented nodes
  – Instrumentation layer – generate and send information
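The three layers can be pictured as a small pipeline. This is only a sketch under assumed names (instrument, Monitor, introspect) and an assumed event format; it does not reproduce BlobSeer's actual interfaces.

```python
from collections import defaultdict
from statistics import mean

def instrument(node_id, metric, value):
    """Instrumentation layer: a node generates one monitoring event."""
    return {"node": node_id, "metric": metric, "value": value}

class Monitor:
    """Monitoring layer: gathers the events sent by instrumented nodes."""

    def __init__(self):
        self.events = []

    def collect(self, event):
        self.events.append(event)

def introspect(events, metric):
    """Introspection layer: summarize per-node state for one metric."""
    per_node = defaultdict(list)
    for event in events:
        if event["metric"] == metric:
            per_node[event["node"]].append(event["value"])
    return {node: mean(values) for node, values in per_node.items()}

if __name__ == "__main__":
    monitor = Monitor()
    monitor.collect(instrument("n1", "write_mb", 10))
    monitor.collect(instrument("n1", "write_mb", 30))
    monitor.collect(instrument("n2", "write_mb", 5))
    print(introspect(monitor.events, "write_mb"))
```

A self-adaptive component would act on the introspection summary, for example by adding nodes when the average load passes a threshold.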
P3: CloudDB: One Size Fits All Revived
• Develop a data management platform called CloudDB to provide a portfolio of products and offer them as services.
• Enable clients to scale accordingly as their needs and requirements evolve.
• System Architecture
  – Client data replicated in 3 underlying data stores optimized for varying workload needs.
    • Relational – Traditional RDBMS to handle transactional workloads.
    • Key/Value – Scalability for read/write-intensive workloads.
    • Columnar – Read-optimized, throughput-oriented for analytical (OLAP) workloads.
  – Workload Manager – Query dispatching and scheduling
  – Dispatcher – Submits the query to one of the valid replicas.
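A hedged sketch of how a dispatcher might choose among valid replicas. The workload labels, the VALID_REPLICAS table, the "point_lookup" entry, and the least-loaded rule are all assumptions made for illustration; the slides state only that the query goes to one of the valid replicas.

```python
# Hypothetical mapping of workload types to the stores able to serve them.
VALID_REPLICAS = {
    "transactional": ["relational"],
    "read_write_intensive": ["key_value"],
    "analytical": ["columnar"],
    # Assumed for illustration: a simple lookup could be served by two stores.
    "point_lookup": ["key_value", "relational"],
}

def dispatch(workload_type, pending_queries):
    """Return the valid replica with the fewest pending queries."""
    candidates = VALID_REPLICAS.get(workload_type)
    if not candidates:
        raise ValueError(f"unknown workload type: {workload_type}")
    return min(candidates, key=lambda store: pending_queries.get(store, 0))

if __name__ == "__main__":
    load = {"key_value": 5, "relational": 2}
    print(dispatch("analytical", load))
    print(dispatch("point_lookup", load))
```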
P4: Cloud Data Management for Online Games: Potentials and Open Issues
• Approach: Classify data into four data sets
  – Account Data, Game Data, State Data, Log Data
• Potentials:
  – Account data
    • Each operation is a transaction
    • Scale is not large: RDBMS as a service
  – Log data
    • Scale – large, write once
    • Analyzed after a long time – strong consistency not important
    • Requirements easily met by Hadoop/Cassandra
  – Game data
    • Not a challenge – unless stored on the client side
    • Manage with a traditional distributed file system or in HDFS
  – State data
    • Managing in real time – biggest challenge for a disk-resident RDBMS
    • Cloud storage – not intended to provide real-time support – continue with the existing system
    • Back up using cloud-based storage (Cassandra)
• Open issues:
  – Data consistency, customized functionality, data partitioning, network traffic.
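The per-data-set recommendations above amount to a routing table. The sketch below restates them as a lookup; the dictionary keys and the storage_for helper are hypothetical names, and the values paraphrase the slide rather than name concrete products.

```python
# Storage suggestion per data set, paraphrased from the slide above.
STORAGE_BY_DATA_SET = {
    "account": "RDBMS as a service",
    "log": "Hadoop/Cassandra",
    "game": "distributed file system or HDFS",
    "state": "existing real-time system, backed up to Cassandra",
}

def storage_for(data_set):
    """Look up the suggested storage for one of the four data sets."""
    try:
        return STORAGE_BY_DATA_SET[data_set.lower()]
    except KeyError:
        raise ValueError(f"unknown data set: {data_set}") from None

if __name__ == "__main__":
    for name in ("account", "log", "game", "state"):
        print(name, "->", storage_for(name))
```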
P5: Data in the Cloud: The Changing Nature of Managing Data Accessibility
• Key Findings
  – Fear of losing control over enterprise data is increasingly outweighed by the benefits offered by cloud-based application services
  – Effective management and use of data represent a significant challenge for most organizations
  – Cloud computing increases data management complexity through security and privacy issues
• Recommendations
  – IT organizations considering cloud-based services:
    • Examine the information delivery expectations across various corporate roles to determine data management needs
    • Evaluate providers' ability to support access to source data and the transformation, movement, and consolidation of data into internal or external applications
  – Vendors offering data management and integration in the cloud:
    • Include performance and monitoring services and information delivery capabilities optimized for managing data and services in the cloud
    • Ensure the virtual environment meets ongoing business requirements
Paper Critique
• Paper selected:
  – Ziqiang Diao and Eike Schallehn. Cloud Data Management for Online Games: Potentials and Open Issues. In Data Management in the Cloud (DMC). Köllen-Verlag, 2013. Accepted for publication.
• Authors:
Introduction
• MMOG
  – Massively Multiplayer Online Game
  – Supports hundreds or thousands of players from all over the world in parallel.
  – Players can
    • choose a new identity
    • establish a new social network
    • compete or cooperate with other players
    • even realize dreams that cannot be fulfilled in real life
  – In 2011, Americans spent 26 million hours per day and 2.6 billion dollars in total playing MMOGs
Introduction (contd.)
• Types of MMOGs
  – Role-playing
  – First-person shooter
  – Real-time strategy
  – Turn-based strategy
  – Simulation
    • Sports
    • Racing
  – Casual
    • Music/Rhythm
    • Social
Problem
• MMORPGs
  – Massively Multiplayer Online Role-Playing Games
  – MMORPGs keep the virtual game environment running even when no players are online.
  – Account information and the state data of objects and characters must be recorded on the server side in real time.
  – All player behavior in the game should be monitored and backed up in order to maintain the order of the virtual world.
  – More concurrent players, e.g. World of Warcraft – millions of concurrent players
Problem (contd.)
• Millions of concurrent players
  – Exacerbates the burden of managing data
• A qualified database system for data persistence
  – Must guarantee data consistency
  – Must also be efficient and scalable
• Existing RDBMSs cannot fully satisfy all these requirements simultaneously.
• With increasing data volume,
  – the storage system becomes a bottleneck, and
  – solving scalability and availability issues becomes a major cost factor and development risk.
Solution?
• Cloud storage systems
  – Ability to support highly concurrent data accesses and huge storage
• In contrast to conventional DBMSs
  – Cloud systems are generally designed for Web applications that
    • have different access characteristics
    • require lower or different consistency levels.
• Need to analyze MMORPGs in more detail
  – To assess the usability of Cloud storage systems
  – To identify open issues and possible solutions
Existing system
• Distributed RDBMS for data persistence
  – Can commit complex transactions and is proven to be stable.
• E.g. MySQL Cluster
  – Adopts a shared-nothing architecture to ensure system scalability
  – Automatically partitions the data within a table across all nodes based on primary keys
  – Each node helps clients access the correct shards to satisfy a query or commit a transaction
  – Data is replicated to multiple nodes to guarantee availability
  – Applies a two-phase commit (2PC) mechanism
    • to propagate data changes to the primary replica and one secondary replica synchronously, and
    • to modify other replicas asynchronously
  – Can support real-time responses when tables are maintained in memory; can also be used as an in-memory database in MMORPGs.
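A simplified sketch of the two mechanisms named above: assigning rows to nodes by primary key, and a two-phase commit across a primary and one secondary replica. The Replica class, shard_for, and the CRC32 hash are illustrative assumptions and do not reproduce MySQL Cluster internals; asynchronous propagation to further replicas is left out.

```python
import zlib

def shard_for(primary_key, node_count):
    """Map a primary key to a node index with a stable hash (illustrative)."""
    return zlib.crc32(str(primary_key).encode("utf-8")) % node_count

class Replica:
    """One copy of a shard that can take part in a two-phase commit."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}
        self.pending = None

    def prepare(self, key, value):
        """Phase 1: stage the change and vote yes (True) or no (False)."""
        if not self.healthy:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        """Phase 2: apply the staged change."""
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def abort(self):
        """Phase 2 on failure: discard any staged change."""
        self.pending = None

def two_phase_commit(replicas, key, value):
    """Commit on every replica only if all of them vote yes."""
    votes = [replica.prepare(key, value) for replica in replicas]
    if all(votes):
        for replica in replicas:
            replica.commit()
        return True
    for replica in replicas:
        replica.abort()
    return False

if __name__ == "__main__":
    primary, secondary = Replica("primary"), Replica("secondary")
    print(shard_for("player:42", 4))
    print(two_phase_commit([primary, secondary], "hp", 90))
```

The synchronous round trip in both phases is the cost the slides allude to: every write waits for all participating replicas before it is acknowledged.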
Data Management requirements of MMORPGs
• From the system’s point of view, the essence of a game is
  – Data processing,
  – Storage, and
  – Transmission among databases, servers and players
• Based on data management requirements,
  – Data is classified into four data sets
  – Different classes should be managed according to their requirements
The case for Cloud-based Data Management for MMORPGs
Comparison: RDBMS for data persistence vs. Cloud storage systems (scalability and availability)
• High performance
  – RDBMS: Not designed for managing data with a large number of attributes, e.g. a table with hundreds of columns
  – Cloud storage: Can manage all attributes by applying a simplified data model as well as data redundancy
• Scalability
  – RDBMS: Limited by its complex schema; dataset volume growth has a significant impact on system performance
  – Cloud storage: Proven to have great potential for scalability
• Flexible data model
  – RDBMS: Good at normalizing table schemas and removing data redundancy, not at adapting to a dynamic schema and processing big data
  – Cloud storage: Typically adopts a flexible data model, such as a key-value data model. There is no fixed schema for items. Each item consists of a key and a dynamic set of attributes
• Simplified data processing
  – RDBMS: Follows a strict transaction mechanism, such as table-level or row-level atomicity, multi-version concurrency control, transaction isolation and rollback
  – Cloud storage: Designed for web applications, where strong consistency is not as necessary as in business applications. Generally does not support transaction processing
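To illustrate the key-value data model from the comparison above, where each item is a key plus a dynamic set of attributes, here is a small in-memory sketch. The KeyValueStore class and the sample character attributes are hypothetical and stand in for a real cloud store.

```python
class KeyValueStore:
    """Items have no fixed schema: each key maps to its own attribute set."""

    def __init__(self):
        self._items = {}

    def put(self, key, **attributes):
        """Insert or update an item; new attributes need no schema change."""
        self._items.setdefault(key, {}).update(attributes)

    def get(self, key):
        """Return a copy of the item's attributes (empty if the key is absent)."""
        return dict(self._items.get(key, {}))

if __name__ == "__main__":
    store = KeyValueStore()
    store.put("char:1", name="Aria", level=3)
    store.put("char:2", name="Bren")
    store.put("char:1", mount="horse")  # attribute added later, to one item only
    print(store.get("char:1"))
    print(store.get("char:2"))
```

In a relational schema the mount attribute would need a new column or a separate table; here it simply appears on the one item that has it.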
Using Cassandra for MMORPGs
• Features
  – Decentralized (peer-to-peer structure) – no network bottleneck
  – Provides a column-family-based data model; simplified data model – increased read performance
  – Adopts a shared-nothing architecture – scales out easily
  – Provides a quorum-based data replication mechanism – ensures availability and fault tolerance
• Open problems
  – Read repair to guarantee data consistency
  – Need to develop new functions based on features of MMORPGs
  – Data partitioning – increases processing costs
  – Network traffic – potential bandwidth bottlenecks
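The quorum-based replication listed under Features rests on simple arithmetic: with N replicas, a read of R replicas is guaranteed to see the latest write acknowledged by W replicas when R + W > N. The helper names below are assumptions for illustration, and responses are modeled as (timestamp, value) pairs rather than real Cassandra results.

```python
def quorum_size(replication_factor):
    """Smallest majority of the replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1

def quorums_overlap(n, r, w):
    """True if every read set shares at least one replica with every write set."""
    return r + w > n

def read_latest(responses):
    """Pick the value with the newest timestamp among the replica responses."""
    return max(responses, key=lambda response: response[0])[1]

if __name__ == "__main__":
    n = 3
    q = quorum_size(n)
    print(q, quorums_overlap(n, q, q))
    print(read_latest([(1, "old position"), (2, "new position")]))
```

Reading and writing at quorum keeps the overlap guarantee while tolerating the loss of a minority of replicas; reading or writing a single replica is faster but can return stale state, which is where read repair comes in.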
Novelty, Challenge and Interest
• Novelty
  – Cloud data management: new
  – Application to MMORPGs: even newer (paper yet to be published)
• Challenges
  – Having to deal with the inherent complexities of MMORPGs
  – Novelty means there is no current cloud-based solution to compare against.
• Interest
  – General interest in computer games, special interest in MMOGs, and very special interest in MMORPGs
  – Opportunity to look at the domain from a data management perspective.
Application to other data sets
• Data Set classification
  – Account Data
  – Game Data
  – State Data
  – Log Data
• The techniques described in this paper would be appropriate for any application whose data can be broken down into separate data sets, as in the case of MMORPGs, with each data set handled according to its own requirements.
Positives and Negatives
• Positives
  – Novelty and application-oriented discussion
  – Comprehensive analysis of MMORPG data management requirements
  – Detailed comparison of RDBMS and Cloud-based systems
• Negatives
  – Extreme novelty
  – No implementation of the proposed solution
  – No comparison to other data sets
Future work
• Implementation and evaluation of results
• Exploring alternative cloud-based approaches to MMORPGs
• Exploring possible adaptations of the proposed techniques to other applications
• Embracing security in public-cloud-based architectures instead of resorting to private-cloud-based implementations
• Focus on cooperation of multiple DMSs in one MMORPG and customization of a new Cloud storage system for MMORPGs
References
• Daniel J. Abadi. Data Management in the Cloud: Limitations and Opportunities. In IEEE Data Engineering Bulletin, 2009.
• Alexandra Carpen-Amarie. Towards a Self-Adaptive Data Management System for Cloud Environments. In IPDPSW '11: Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pages 2077-2080. IEEE Computer Society, Washington, DC, USA, 2011. ISBN 978-0-7695-4577-6. doi:10.1109/IPDPS.2011.381
• Hakan Hacigümüs, Jun'ichi Tatemura, Wang-Pin Hsiung, Hyun J. Moon, Oliver Po, Arsany Sawires, Yun Chi, and Hojjat Jafarpour. CloudDB: One Size Fits All Revived. In SERVICES, 2010.
• Ziqiang Diao and Eike Schallehn. Cloud Data Management for Online Games: Potentials and Open Issues. In Data Management in the Cloud (DMC). Köllen-Verlag, 2013. Accepted for publication.
• Eric Thoo. Data in the Cloud: The Changing Nature of Managing Data Accessibility. Gartner RAS Core Research Note G00165291, 27 February 2009, RA2 12302009.