lecture26 (1)

Upload: sami-dick

Post on 03-Jun-2018


  • 8/12/2019 Lecture26 (1)

    1/20

    Dr. Bhavani Thuraisingham

    The University of Texas at Dallas (UTD)

    October 2011

    Cloud-based

    Assured Information Sharing and

    Identity Management


    Team Members

    Sponsor: Air Force Office of Scientific Research

    The University of Texas at Dallas

    Faculty: Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin Hamlen; Dr. Zhiqiang Lin

    Sub-contractors

    Prof. Elisa Bertino (Purdue)

    Ms. Anita Miller, Dr. Bob Johnson (North Texas Fusion Center)

    Collaborators

    Dr. Steve Barker, King's College, U of London (EOARD)

    Dr. Barbara Carminati; Dr. Elena Ferrari, U of Insubria (EOARD)

    Prof. Peng Liu, Penn State

    Prof. Ting Yu, NC State


    Outline

    Objectives

    Layered Framework

    Data Security Issues for Clouds

    Our Research

    FY11

    Cloud-based Assured Information Sharing Demonstration

    RDF-based Policy Engine on the Cloud

    Secure Query Processing in Hybrid Cloud

    CloudMask: Purdue University

    Stream-based Malware Detection on the Cloud

    Hypervisor (e.g., Xen) Integrity Issues and Forensics in the Cloud

    Preliminary Investigation of Identity Management

    FY10

    Secure Querying and Storing Relational Data with HIVE

    Secure Querying and Storing RDF in Hadoop with SPARQL

    XACML Implementation for Hadoop

    Amazon.com Web Services and Security

    Accountability and Access Control (Joint with Purdue)

    Acknowledgement: Research Funded by Air Force Office of Scientific Research


    Objectives

    Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

    Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

    Our research on cloud computing is based on Hadoop, MapReduce, and Xen.

    Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

    Xen is a virtual machine monitor developed at the University of Cambridge, England.

    Our goal is to build a secure cloud infrastructure for assured information sharing applications.


    Layered Framework

    [Figure 2. Layered Framework for Assured Cloud. Labels: User Interface; Hadoop/MapReduce/Storage; HIVE/SPARQL/Query; XEN/Linux/VMM; Secure Virtual Network Monitor; Policies (XACML); Risks/Costs; QoS Resource Allocation; Cloud Monitors.]


    Secure Query Processing with Hadoop/MapReduce

    We have studied clouds based on Hadoop

    Query rewriting and optimization techniques designed and implemented for two types of data

    (i) Relational data: Secure query processing with HIVE

    (ii) RDF data: Secure query processing with SPARQL

    Demonstrated with XACML policies

    Joint demonstration with King's College and University of Insubria

    First demo (2011): Each party submits its data and policies

    Our cloud will manage the data and policies

    Second demo (2012): Multiple clouds


    Fine-grained Access Control with Hive: System Architecture

    Table/View definition and loading:

    Users can create tables as well as load data into tables. Further, they can also upload XACML policies for the table they are creating.

    Users can also create XACML policies for tables/views.

    Users can define views only if they have permissions for all tables specified in the query used to create the view. They can also either specify or create XACML policies for the views they are defining.

    CollaborateCom 2010
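
    The view-definition rule above (a view may be defined only by a user who holds permissions on every table named in its defining query) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the PERMISSIONS store, the function names, and the deliberately naive regular-expression table extraction. The actual system evaluates XACML policies, which this sketch does not model.

```python
import re
from typing import Dict, Set

# Hypothetical permission store: user -> tables the user may read.
PERMISSIONS: Dict[str, Set[str]] = {
    "alice": {"patients", "visits"},
    "bob": {"visits"},
}


def referenced_tables(query: str) -> Set[str]:
    """Collect table names that follow FROM or JOIN (naive, illustration only)."""
    pattern = r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)"
    return {name.lower() for name in re.findall(pattern, query, flags=re.IGNORECASE)}


def can_create_view(user: str, query: str, permissions: Dict[str, Set[str]]) -> bool:
    """Allow the view only if the user holds permission on every referenced table."""
    tables = referenced_tables(query)
    allowed = permissions.get(user, set())
    # An empty table set means the query could not be analysed, so deny.
    return bool(tables) and tables <= allowed


VIEW_QUERY = "SELECT p.name, v.day FROM patients p JOIN visits v ON p.id = v.pid"
```

    With these assumed permissions, alice may define the view because she can read both tables, while bob is refused because he lacks access to patients.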


    SPARQL Query Optimizer for Secure RDF Data Processing

    [Architecture figure. Labels: Web Interface; Server Backend; Data Preprocessor; N-Triples Converter; Prefix Generator; Predicate Based Splitter; Predicate Object Based Splitter; MapReduce Framework; Parser; Query Validator & Rewriter; XACML PDP; Plan Generator; Plan Executor; Query Rewriter By Policy. Flows: New Data, Query, Answer.]

    Goals: build an efficient storage mechanism using Hadoop for large amounts of data (e.g., a billion triples); build an efficient query mechanism for data stored in Hadoop; integrate with Jena

    Developed a query optimizer and query rewriting techniques for RDF data with XACML policies, implemented on top of Jena

    IEEE Transactions on Knowledge and Data Engineering, 2011
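
    The slides name a "Predicate Based Splitter" but do not describe it. As a rough sketch of the general idea, assuming the splitter groups triples by predicate so that a query touching one predicate reads only that group, the following handles simple one-line N-Triples input. The function names and the in-memory dictionary are hypothetical; the real system writes to files in Hadoop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_ntriple(line: str) -> Tuple[str, str, str]:
    """Split one simple N-Triples line into (subject, predicate, object)."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()  # drop the statement terminator
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def split_by_predicate(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (subject, object) pairs under their predicate."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, obj = parse_ntriple(line)
        groups[predicate].append((subject, obj))
    return dict(groups)


SAMPLE = [
    '<http://ex.org/alice> <http://ex.org/knows> <http://ex.org/bob> .',
    '<http://ex.org/alice> <http://ex.org/name> "Alice Smith" .',
    '<http://ex.org/bob> <http://ex.org/knows> <http://ex.org/carol> .',
]
GROUPS = split_by_predicate(SAMPLE)
```

    Because the predicate becomes the group key, each stored entry needs only the subject and object, which is one reason such a layout can reduce storage and scan cost.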


    Demonstration: Concept of Operation

    [Figure. Labels: Agency 1, Agency 2, ..., Agency n; User Interface Layer; Fine-grained Access Control with Hive (Relational Data); SPARQL Query Optimizer for Secure RDF Data Processing (RDF Data).]


    RDF-based Policy Engine on the Cloud

    [Architecture figure. Labels: User Interface Layer; High Level Specification; Policy Translator; Policy Parser Layer; Policy Transformation Layer; Policy / Graph Transformation Rules; Regular Expression-Query; Access Control / Redaction Policy (Traditional Mechanism); Data Controller; Provenance Controller; data stores (DB, DB, RDF, XML). Flows: Query, Result.]

    A testbed for evaluating different policy sets over different data representations. It also supports provenance as a directed graph and viewing policy outcomes graphically

    Determine how access is granted to a resource as well as how a document is shared

    Users specify policies: e.g., Access Control, Redaction, Released Policy

    Parse a high-level policy to a low-level representation

    Support graph operations and visualization. Policies executed as graph operations

    Execute policies as SPARQL queries over large RDF graphs on Hadoop

    Support for policies over traditional data and its provenance

    IFIP Data and Applications Security, 2010; ACM SACMAT 2011
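
    To make "policies executed as graph operations" concrete, here is a minimal sketch of a redaction policy applied to a small set of triples. It is an assumption-laden illustration, not the engine described in the cited papers: the triple representation, the REDACTED marker, and the redact function are all hypothetical, and the real system expresses such policies as SPARQL queries over RDF graphs on Hadoop.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
REDACTED = "[REDACTED]"        # hypothetical placeholder value


def redact(triples: Iterable[Triple], sensitive_predicates: Set[str]) -> List[Triple]:
    """Return a copy of the graph with objects of sensitive predicates masked.

    The graph shape (who is linked to what kind of fact) is preserved,
    only the protected values are replaced.
    """
    return [
        (s, p, REDACTED if p in sensitive_predicates else o)
        for s, p, o in triples
    ]


GRAPH: List[Triple] = [
    ("report1", "authoredBy", "analyst7"),
    ("report1", "derivedFrom", "sensorLog3"),
    ("report1", "classification", "internal"),
]
SHARED = redact(GRAPH, {"authoredBy"})
```

    The original graph is left untouched, so the same data can be released under different redaction policies to different parties.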


    Integration with Assured Information Sharing

    [Figure. Labels: Agency 1, Agency 2, ..., Agency n; RDF Data and Policies; SPARQL Query; Result; User Interface Layer; RDF Data Preprocessor; Policy Translation and Transformation Layer; MapReduce Framework for Query Processing; Hadoop HDFS.]


    Secure Storage and Query Processing in a Hybrid Cloud: Problem Motivation

    The use of hybrid clouds is an emerging trend in cloud computing

    Ability to exploit public resources for high throughput

    Yet, better able to control costs and data privacy

    Several key challenges

    Data Design: how to store data in a hybrid cloud?

    Solution must account for data representation used (unencrypted/encrypted), public cloud monetary costs and query workload characteristics

    Query Processing: how to execute a query over a hybrid cloud?

    Solution must provide query rewrite rules that ensure the correctness of a generated query plan over the hybrid cloud


    Research Results

    Data Design: A user submits data, a query workload, and monetary and confidentiality constraints

    Solve the data partitioning problem in four phases

    Partition the data into several public (Ppu) and private (Ppr) components

    For each partition, Ppu & Ppr, obtain their associated statistics

    Estimate the execution cost of the given query workload based on a user's choice of confidentiality level as well as the statistics associated with the partition

    Select the best partition as the one that minimizes query workload execution cost without violating monetary and confidentiality constraints
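
    The final selection phase above is a constrained minimization, which can be sketched as follows. This is only an illustration under stated assumptions: the CandidatePartition fields, the way confidentiality is counted (number of sensitive items exposed on the public side), and the numbers are all hypothetical, and the cost estimates are taken as given rather than computed from statistics.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CandidatePartition:
    name: str
    workload_cost: float    # estimated execution cost of the query workload
    monetary_cost: float    # estimated public-cloud spend
    exposed_sensitive: int  # sensitive items left unprotected on the public side


def select_partition(
    candidates: Iterable[CandidatePartition],
    budget: float,
    max_exposed: int = 0,
) -> Optional[CandidatePartition]:
    """Pick the cheapest-to-query candidate that satisfies both constraints."""
    feasible = [
        c for c in candidates
        if c.monetary_cost <= budget and c.exposed_sensitive <= max_exposed
    ]
    # default=None covers the case where no candidate is feasible.
    return min(feasible, key=lambda c: c.workload_cost, default=None)


CANDIDATES = [
    CandidatePartition("A", workload_cost=10.0, monetary_cost=50.0, exposed_sensitive=0),
    CandidatePartition("B", workload_cost=5.0, monetary_cost=200.0, exposed_sensitive=0),
    CandidatePartition("C", workload_cost=3.0, monetary_cost=20.0, exposed_sensitive=2),
]
```

    With a budget of 100, candidate A is chosen: B is faster but over budget, and C is fastest but violates the confidentiality constraint.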

    Query Processing: A user submits a query Q

    Solve the query processing problem in four phases

    Query Rearrangement: Use query rewrite rules to transform an original query Q into public (Qpu) and private (Qpr) queries

    Public Cloud Execution: Execute Qpu on the public cloud

    Private Cloud Execution: Execute Qpr on the private cloud

    Post-Processing: Combine the results of the execution of Qpu and Qpr into the final result
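
    The four phases can be walked through for one simple case. The slides do not give the rewrite rules, so this sketch assumes a horizontal split of rows between the two clouds and a filtered SUM, an aggregate that distributes over such a split. The function names and sample rows are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Partial = Callable[[Iterable[Row]], float]


def rewrite_sum(column: str, predicate: Callable[[Row], bool]) -> Tuple[Partial, Partial]:
    """Query rearrangement: Q = SUM(column) WHERE predicate becomes (Qpu, Qpr).

    For a horizontal split, each side runs the same partial aggregate.
    """
    def partial(rows: Iterable[Row]) -> float:
        return sum(r[column] for r in rows if predicate(r))

    return partial, partial


def run_hybrid(public_rows: List[Row], private_rows: List[Row],
               column: str, predicate: Callable[[Row], bool]) -> float:
    q_pu, q_pr = rewrite_sum(column, predicate)
    public_result = q_pu(public_rows)       # public cloud execution
    private_result = q_pr(private_rows)     # private cloud execution
    return public_result + private_result   # post-processing: combine partial sums


PUBLIC = [{"dept": "a", "amount": 10}, {"dept": "b", "amount": 5}]
PRIVATE = [{"dept": "a", "amount": 7}]
TOTAL = run_hybrid(PUBLIC, PRIVATE, "amount", lambda r: r["dept"] == "a")
```

    Correctness here means the combined answer equals what a single site holding all rows would return. Aggregates that do not distribute this way, or joins across the two sides, would need different rewrite rules.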


    Hypervisor Integrity and Forensics in the Cloud

    [Figure. Labels: Hardware Layer; Virtualization Layer (Xen, vSphere); guest systems (Linux, Solaris, XP, MacOS); Hypervisor, OS, Applications; cloud integrity and forensics.]

    Secure control flow of hypervisor code

    Integrity via in-lined reference monitor

    Forensics data extraction in the cloud

    Multiple VMs

    De-mapping (isolate) each VM memory from physical memory


    Cloud-based Malware Detection

    ACM Transactions on Management Information Systems

    Binary feature extraction involves enumerating binary n-grams from the binaries and selecting the best n-grams based on information gain

    For training data with 3,500 executables, the number of distinct 6-grams can exceed 200 million

    On a single machine, this may take hours, depending on available computing resources; this is not acceptable for training from a stream of binaries

    We use the cloud to overcome this bottleneck

    A cloud MapReduce framework is used to extract and select features from each chunk

    A 10-node cloud cluster is 10 times faster than a single node

    Very effective in a dynamic framework, where malware characteristics change rapidly
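
    The feature extraction step (enumerate byte n-grams, then keep those with the highest information gain) can be sketched on a single machine as follows. This is a minimal illustration, not the published pipeline: the function names, the tiny sample data, and the tie-breaking rule are assumptions. In a MapReduce setting the per-binary enumeration would plausibly run in the map phase and the counting in the reduce phase, but that mapping is also an assumption.

```python
import math
from typing import List, Set, Tuple


def byte_ngrams(data: bytes, n: int) -> Set[bytes]:
    """Enumerate the distinct byte n-grams of one binary."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with pos malicious and neg benign items."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def select_top_ngrams(samples: List[Tuple[bytes, bool]], n: int, k: int) -> List[bytes]:
    """Return the k n-grams with the highest information gain.

    samples holds (binary content, is_malware) pairs.
    """
    gram_sets = [(byte_ngrams(data, n), label) for data, label in samples]
    total = len(gram_sets)
    total_pos = sum(1 for _, label in gram_sets if label)
    total_neg = total - total_pos
    base = entropy(total_pos, total_neg)
    vocabulary = set().union(*(grams for grams, _ in gram_sets))

    scored = []
    for gram in vocabulary:
        pos_with = sum(1 for grams, label in gram_sets if label and gram in grams)
        neg_with = sum(1 for grams, label in gram_sets if not label and gram in grams)
        with_count = pos_with + neg_with
        remainder = (
            with_count * entropy(pos_with, neg_with)
            + (total - with_count) * entropy(total_pos - pos_with, total_neg - neg_with)
        ) / total
        scored.append((base - remainder, gram))

    # Highest gain first; ties broken by byte order to keep output deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gram for _, gram in scored[:k]]


SAMPLES = [
    (b"\x90\x90\xcc", True),
    (b"\x90\x90\xeb", True),
    (b"\x00\x01\x02", False),
    (b"\x00\x01\x03", False),
]
TOP = select_top_ngrams(SAMPLES, n=2, k=2)
```

    In this toy data, the two selected 2-grams are the ones that appear in every sample of one class and in none of the other, so each perfectly separates malware from benign files.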


    Fine-grained attribute-based privacy-preserving access control

    Fine-grained access control: different parts of the data can be covered by different access control policies

    Attribute-based access control: access control policies are expressed in terms of identity attributes of subjects accessing the data

    Privacy-preserving: the cloud does not learn anything about the contents of the data and the values of the identity attributes of users

    System developed is CloudMask

    Joint paper at CollaborateCom 2011

    Key Features of CloudMask System: Elisa Bertino, Purdue University, and Murat Kantarcioglu, UT Dallas
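
    The first two properties (fine-grained and attribute-based) can be illustrated with a short sketch in which each field of a record carries its own conditions over the subject's attributes. This is not CloudMask: the policy table, attribute names, and function are hypothetical, and the privacy-preserving property is deliberately not modeled, since here the evaluator sees both the data and the attributes in the clear.

```python
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]
Condition = Callable[[Attributes], bool]

# Hypothetical per-field policies: every condition for a field must hold.
POLICIES: Dict[str, List[Condition]] = {
    "diagnosis": [lambda a: a.get("role") == "doctor"],
    "billing": [
        lambda a: a.get("role") in {"doctor", "clerk"},
        lambda a: a.get("years", 0) >= 2,
    ],
}


def readable_fields(record: Dict[str, Any], attributes: Attributes,
                    policies: Dict[str, List[Condition]]) -> Dict[str, Any]:
    """Return only the fields whose policy the subject's attributes satisfy.

    Fields with no policy are denied by default.
    """
    return {
        field: value
        for field, value in record.items()
        if field in policies and all(cond(attributes) for cond in policies[field])
    }


RECORD = {"diagnosis": "flu", "billing": 120, "ssn": "000-00-0000"}
DOCTOR_VIEW = readable_fields(RECORD, {"role": "doctor", "years": 5}, POLICIES)
CLERK_VIEW = readable_fields(RECORD, {"role": "clerk", "years": 3}, POLICIES)
```

    The doctor sees the diagnosis and billing fields, the clerk sees only billing, and the unprotected ssn field is released to no one because it has no policy.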


    Directions

    Secure VMM (Virtual Machine Monitor) and VNM (Virtual Network Monitor)

    Exploring XEN VMM and examining security issues

    Developing automated techniques for VMM introspection

    Will examine VNM issues January 2012

    Integrate Secure Storage Algorithms into Hadoop (FY 2012)

    Identity Management (FY 2012)

    Technology Transfer through Knowledge and Security Analytics, LLC


    October 2011

    Identity Management for the Cloud

    Kevin Hamlen, Peng Liu, Murat Kantarcioglu, Bhavani Thuraisingham, Ting Yu


    Identity Management Considerations in a Cloud

    Trust model that handles (i) various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability.

    Several technologies have to be examined to develop the trust model:

    Service-oriented technologies; standards such as SAML and XACML; and identity management technologies such as OpenID.

    Does one size fit all?

    Can we develop a trust model that will be applicable to all types of clouds, such as private clouds, public clouds and hybrid clouds?

    Identity architecture has to be integrated into the cloud architecture.