A Data Masking Technique for Data Warehouses – Ricardo Jorge Santos & Marco Vieira

Page 1

A Data Masking Technique for Data Warehouses

Ricardo Jorge Santos & Marco Vieira
CISUC – DEI – FCTUC, University of Coimbra - Portugal

Jorge Bernardino
CISUC – DEIS – ISEC, Polytechnic Institute of Coimbra - Portugal

ISEL, Lisbon – September/2011

INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM

Page 2

Ricardo J. Santos – A Data Masking Technique for Data Warehouses – IDEAS 2011 – ISEL, Lisbon – September/2011

Agenda

Background

Motivation

MOBAT: A MOD Based Data Masking Technique

Optimization Features

Experimental Results

Conclusions and Future Work

Page 3

Security Concerns in Data Warehousing

A Data Warehouse (DW) is a critical asset for many enterprises

Stores all relevant historical and current business information needed to support decision making (sensitive data)

A main target for stealing or compromising sensitive data

Attack rate and complexity have increased in the recent past

Page 4

Data Security Domains

Data Confidentiality: only the right users should access the right data

Data Integrity: data should always be correct, authentic and consistent

Data Availability: users should always be able to access data whenever needed

Page 5

Data Privacy Issues in Today’s DWs (Our Focus)

Masking solutions are not considered an acceptable security solution

Encryption techniques introduce too much overhead in storage space, data loading time and query response time

Page 6

Data Privacy Issues in Today’s DWs (Our Focus)

Important feature: facts in DWs are mainly numerical columns!

Page 7

MOBAT – MOd BAsed data masking Technique for DWs

[Figure: MOBAT System Architecture]

Page 8

MOBAT – MOd BAsed data masking Technique for DWs

Suppose a table T with a set of N numerical columns to mask, C = {C1, C2, C3, …, CN}, and a total set of M rows, R = {R1, R2, R3, …, RM}.

Each value to mask in the table is identified by the pair (Rj, Ci), where Rj and Ci represent, respectively, the row and the column to which the value belongs.

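Each new masked value (Rj, Ci)’ is obtained by applying the following formula (1) to row j and column i of table T, where K1 is a single key for the whole table, K2,i is a key specific to column i, and K3,j is a key specific to row j: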

$$(R_j, C_i)' = (R_j, C_i) - \big((K_{3,j} \bmod K_1) \bmod K_{2,i}\big) + K_{2,i} \tag{1}$$

The inverse formula (2) for retrieving the original value is:

$$(R_j, C_i) = (R_j, C_i)' + \big((K_{3,j} \bmod K_1) \bmod K_{2,i}\big) - K_{2,i} \tag{2}$$
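
Formulas (1) and (2) translate directly into code. The following Python sketch is illustrative only: the function names (mask_value, unmask_value, mask_table, unmask_table) and the sample keys and values are hypothetical, and it assumes integer keys and numeric cell values.

```python
def mask_value(value, k1, k2_i, k3_j):
    """Formula (1): (Rj, Ci)' = (Rj, Ci) - ((K3,j MOD K1) MOD K2,i) + K2,i."""
    return value - ((k3_j % k1) % k2_i) + k2_i


def unmask_value(masked, k1, k2_i, k3_j):
    """Formula (2): (Rj, Ci) = (Rj, Ci)' + ((K3,j MOD K1) MOD K2,i) - K2,i."""
    return masked + ((k3_j % k1) % k2_i) - k2_i


def mask_table(rows, k1, k2, k3):
    """Mask an M x N table: k2[i] is the key of column i, k3[j] the key of row j."""
    return [[mask_value(v, k1, k2[i], k3[j]) for i, v in enumerate(row)]
            for j, row in enumerate(rows)]


def unmask_table(rows, k1, k2, k3):
    """Inverse of mask_table, applying formula (2) cell by cell."""
    return [[unmask_value(v, k1, k2[i], k3[j]) for i, v in enumerate(row)]
            for j, row in enumerate(rows)]


# Hypothetical 3-row, 2-column table and keys, for illustration only.
K1 = 1000
K2 = [7, 500]                         # one key per column (K2,1 and K2,2)
K3 = [123456789, 987654321, 5551212]  # one key per row (K3,1 .. K3,3)
table = [[15, 250000], [3, 1999], [40, 78000]]

masked = mask_table(table, K1, K2, K3)
restored = unmask_table(masked, K1, K2, K3)
print(masked)             # [[17, 250211], [4, 2178], [45, 78288]]
print(restored == table)  # True
```

Because (K3,j MOD K1) MOD K2,i always lies between 0 and K2,i − 1, each masked value exceeds its original by between 1 and K2,i; unmasking simply subtracts that same per-cell offset again.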

Page 9

MOBAT – Example Dataset

Supposing K1 = 7432, K2,1 = 34 and K2,2 = 17252
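
To make the arithmetic concrete, here is a step-by-step sketch using the keys on this slide. The row key K3,j and the two original values are hypothetical, chosen only for illustration; they are not taken from the example dataset.

```python
K1 = 7432                   # K1 from the slide
K2 = {1: 34, 2: 17252}      # K2,1 and K2,2 from the slide
k3_j = 123456789            # hypothetical row key K3,j
row = {1: 15, 2: 250000}    # hypothetical original values of (Rj, C1) and (Rj, C2)

inner = k3_j % K1           # K3,j MOD K1
results = {}
for i, value in row.items():
    offset = inner % K2[i]              # (K3,j MOD K1) MOD K2,i
    masked = value - offset + K2[i]     # formula (1)
    restored = masked + offset - K2[i]  # formula (2)
    results[i] = (masked, restored)
    print(f"C{i}: offset={offset}, {value} -> {masked} -> {restored}")
```

With these inputs K3,j MOD K1 is 3837, so column 1 gets offset 29 (15 is stored as 20) and column 2 gets offset 3837 (250000 is stored as 263415). Note that because K2,2 is larger than K1, the outer MOD never changes the offset for column 2 with these keys.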

Page 10

MOBAT – Example Dataset

Supposing K1 = 9264, K2,1 = 12 and K2,2 = 78254
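
This slide applies a different key set. A short sketch, with a hypothetical row key and hypothetical row values, shows that changing K1 and K2,i changes every stored value, and that unmasking only recovers the original data with the matching keys. The helper names are illustrative.

```python
def mask(value, k1, k2_i, k3_j):
    return value - ((k3_j % k1) % k2_i) + k2_i   # formula (1)


def unmask(masked, k1, k2_i, k3_j):
    return masked + ((k3_j % k1) % k2_i) - k2_i  # formula (2)


KEYS_P9 = (7432, {1: 34, 2: 17252})   # K1, K2,i from Page 9
KEYS_P10 = (9264, {1: 12, 2: 78254})  # K1, K2,i from Page 10
k3_j = 123456789                      # hypothetical row key K3,j
row = {1: 15, 2: 250000}              # hypothetical original values


def mask_row(keys):
    k1, k2 = keys
    return {i: mask(v, k1, k2[i], k3_j) for i, v in row.items()}


def unmask_row(masked_row, keys):
    k1, k2 = keys
    return {i: unmask(v, k1, k2[i], k3_j) for i, v in masked_row.items()}


m9, m10 = mask_row(KEYS_P9), mask_row(KEYS_P10)
print(m9, m10)                    # same data, different stored values
print(unmask_row(m10, KEYS_P10))  # matching keys: the original row
print(unmask_row(m9, KEYS_P10))   # mismatched keys: wrong values
```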

Page 11

MOBAT – Querying

Using the TPC-H benchmark, with four numerical fact columns of the LineItem table (N = 4) masked by MOBAT: L_Quantity, L_ExtendedPrice, L_Tax and L_Discount

A new column, L_KeyK3, stores the K3,j key for each row j of the LineItem table

K1 = 9342, K2,L_Quantity = 12, K2,L_ExtendedPrice = 51234, K2,L_Tax = 6, K2,L_Discount = 4

Original query (TPC-H Q6):

SELECT SUM(L_ExtendedPrice * L_Discount) AS Total_Revenue
FROM LineItem
WHERE L_ShipDate >= TO_DATE('1994-01-01','YYYY-MM-DD')
  AND L_ShipDate < TO_DATE('1995-01-01','YYYY-MM-DD')
  AND L_Discount BETWEEN 0.05 AND 0.07
  AND L_Quantity < 24

Rewritten query over the masked data:

SELECT SUM((L_ExtendedPrice + MOD(MOD(L_KeyK3,9342),51234) - 51234)
         * (L_Discount + MOD(MOD(L_KeyK3,9342),4) - 4)) AS Total_Revenue
FROM LineItem
WHERE L_ShipDate >= TO_DATE('1994-01-01','YYYY-MM-DD')
  AND L_ShipDate < TO_DATE('1995-01-01','YYYY-MM-DD')
  AND (L_Discount + MOD(MOD(L_KeyK3,9342),4) - 4) BETWEEN 0.05 AND 0.07
  AND (L_Quantity + MOD(MOD(L_KeyK3,9342),12) - 12) < 24
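
The query rewriting is mechanical: every reference to a masked column becomes its unmasking expression from formula (2). The Python sketch below reproduces it end to end on SQLite under stated assumptions: the rewrite helper and the five LineItem rows (including their L_KeyK3 values) are hypothetical, SQLite's % operator stands in for Oracle's MOD (equivalent here because all operands are non-negative integers), and ISO date strings replace TO_DATE.

```python
import re
import sqlite3

# Keys from Page 11: K1 and one K2,i per masked LineItem column.
K1 = 9342
K2 = {"L_Quantity": 12, "L_ExtendedPrice": 51234, "L_Tax": 6, "L_Discount": 4}
MASKED_COL = re.compile(r"\b(" + "|".join(K2) + r")\b")


def mask(value, k3_j, col):
    """Formula (1), applied when loading the fact table."""
    return value - ((k3_j % K1) % K2[col]) + K2[col]


def unmask_sql(col, key_col="L_KeyK3"):
    """Formula (2) as a SQL expression (SQLite writes MOD(a, b) as a % b)."""
    return f"({col} + (({key_col} % {K1}) % {K2[col]}) - {K2[col]})"


def rewrite(query):
    """Replace each reference to a masked column with its unmasking expression."""
    return MASKED_COL.sub(lambda m: unmask_sql(m.group(1)), query)


# TPC-H Q6 with ISO date strings, since SQLite has no TO_DATE.
Q6 = """SELECT SUM(L_ExtendedPrice * L_Discount) AS Total_Revenue FROM LineItem
WHERE L_ShipDate >= '1994-01-01' AND L_ShipDate < '1995-01-01'
  AND L_Discount BETWEEN 0.05 AND 0.07 AND L_Quantity < 24"""

# Hypothetical rows: (L_KeyK3, L_Quantity, L_ExtendedPrice, L_Tax, L_Discount, L_ShipDate)
ROWS = [
    (123456789, 10, 1000.0, 0.04, 0.06, "1994-03-15"),  # qualifies
    (987654321, 30, 2000.0, 0.02, 0.06, "1994-06-01"),  # quantity too high
    (555555555, 5, 500.0, 0.08, 0.02, "1994-07-01"),    # discount out of range
    (31415926, 8, 800.0, 0.01, 0.06, "1995-02-01"),     # shipped in 1995
    (27182818, 20, 1500.0, 0.05, 0.06, "1994-12-31"),   # qualifies
]
COLS = ["L_Quantity", "L_ExtendedPrice", "L_Tax", "L_Discount"]


def load(masked):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE LineItem (L_KeyK3 INTEGER, L_Quantity INTEGER, "
               "L_ExtendedPrice REAL, L_Tax REAL, L_Discount REAL, L_ShipDate TEXT)")
    for k3_j, *values, ship in ROWS:
        if masked:
            values = [mask(v, k3_j, c) for v, c in zip(values, COLS)]
        db.execute("INSERT INTO LineItem VALUES (?, ?, ?, ?, ?, ?)", (k3_j, *values, ship))
    return db


plain = load(masked=False).execute(Q6).fetchone()[0]
raw_on_masked = load(masked=True).execute(Q6).fetchone()[0]
rewritten = load(masked=True).execute(rewrite(Q6)).fetchone()[0]
print(plain, raw_on_masked, rewritten)
```

Running the unmodified query against the masked table matches no rows here (SUM returns NULL), while the rewritten query returns the same revenue as the plain table up to floating-point rounding. A plain regular-expression substitution would also touch column names inside string literals or comments; a production rewriter would operate on a parsed query instead.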

Page 12

MOBAT – Optimizing Features & Performance

The inclusion of K3,j requires additional storage space

K3,j can be created in several ways, each with a different impact on performance:

Simply adding a new column to the existing fact table

Recreating the fact table with K3,j included from the start

Using a 128-bit integer column that already exists in the fact table (typically the primary key column)
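
The third option avoids storing any extra key column. A minimal sketch, assuming a hypothetical Sales fact table whose existing integer primary key Sale_ID is reused as K3,j, with K1 and a single K2,i borrowed from the querying example for illustration. SQLite's INTEGER PRIMARY KEY is 64-bit rather than the 128-bit column mentioned on the slide, so this only illustrates the idea.

```python
import sqlite3

K1, K2 = 9342, 51234  # K1 and K2,L_ExtendedPrice from Page 11, reused for illustration


def mask(value, k3_j):
    return value - ((k3_j % K1) % K2) + K2  # formula (1)


db = sqlite3.connect(":memory:")
# Hypothetical fact table: the existing primary key Sale_ID plays the role of K3,j.
db.execute("CREATE TABLE Sales (Sale_ID INTEGER PRIMARY KEY, Amount REAL)")
original = {1001: 250.0, 1002: 1200.0, 1003: 75.5}
db.executemany("INSERT INTO Sales VALUES (?, ?)",
               [(sid, mask(amount, sid)) for sid, amount in original.items()])

stored = db.execute("SELECT Sale_ID, Amount FROM Sales ORDER BY Sale_ID").fetchall()
# Formula (2) in SQL, keyed on Sale_ID instead of a dedicated K3 column (% stands in for MOD).
recovered = db.execute(
    f"SELECT Sale_ID, Amount + ((Sale_ID % {K1}) % {K2}) - {K2} "
    "FROM Sales ORDER BY Sale_ID").fetchall()
print(stored)     # [(1001, 50483.0), (1002, 51432.0), (1003, 50306.5)]
print(recovered)  # [(1001, 250.0), (1002, 1200.0), (1003, 75.5)]
```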

Page 13

Experimental Evaluation

Test machine: 2.8GHz CPU, 2GB RAM (512MB for the Oracle SGA), 1.5TB SATA hard disk

Oracle 11g DBMS

One standard benchmark and one real-world DW: the TPC-H decision support benchmark at 1GB and 10GB scale, and a real-world sales DW (2GB storage size)

Page 14

Experimental Evaluation

Page 15

Experimental Evaluation

Page 16

Experimental Evaluation

Page 17

Conclusions

Our technique decreases data storage space and processing overheads (compared with encryption), while still providing a significant level of security

Transparent method with minimal network bandwidth overhead, since it only requires rewriting queries

Extremely easy to implement in any DBMS / DW, at low cost

Querying the database directly produces only realistic-looking masked values, since stored data is masked at all times

Page 18

Future Work

Extending the technique to also mask alphanumeric values

Assessing its security strength in comparison with other solutions

Extending the technique to increase its security strength: using larger keys, enabling data integrity checks, and implementing false data injection

Page 19

THANK YOU!

Questions and Comments?

Ricardo Jorge Santos
[email protected]

ISEL, Lisbon – September/2011

INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM

A Data Masking Technique for Data Warehouses