Inexpensive Datamasking for MySQL with ProxySQL: Data Anonymization for Developers
René Cannaò
Who we are
René Cannaò
Founder of ProxySQL
MySQL SRE at Dropbox
thanks to:
Frédéric Descamps
MySQL Community Manager
Other Sessions
273. ProxySQL, MaxScale, MySQL Router and other database traffic managers / Petr Zaitsev (Percona)
155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)
Agenda
● Database overview
● What is ProxySQL
● Features overview
● Data masking
● Rules
● Masking rules
● Obfuscation with mysqldump
● Examples
Overview of ProxySQL
Application and Database layers
APPLICATIONS
DATABASES
Main motivations
Empower the DBAs
Improve manageability
Understand and improve performance
High performance and high availability
Create a proxy layer to shield the database
Database as a Service (layered)
APPLICATIONS
DATABASES + MANAGER(s)
DAAS – REVERSE PROXY
What is ProxySQL?
The MySQL data stargate
How to deploy
ProxySQL Features (short list)
● High Availability and Scalability
● seamless failover
● firewall
● query throttling
● query timeout
● query mirroring
● runtime reconfiguration
● Scheduler
● Support for Galera/PXC and Group Replication
● on-the-fly rewrite of queries
● caching reads outside the database
● connection pooling and multiplexing
● complex query routing and r/w split
● load balancing
● real time statistics
● monitoring
● Data masking
● Multiple instances on same ports
● Native Clustering
● Support for ClickHouse
Data Masking
Data masking or data obfuscation is the process of hiding original data with random characters or data.
The main reason for applying masking to a data field is to protect data that is classified as personally identifiable data, personal sensitive data or commercially sensitive data; however, the data must remain usable for the purposes of undertaking valid test cycles.
Why use ProxySQL as a data masking solution?
Open source and free as in beer
Other solutions are expensive or do not work
No worse than the other solutions, as currently none is perfect
The best solution would be to have this feature implemented in the server just after the handler API
Query Rules
instructions to "program" ProxySQL behavior
● matching criteria
● actions
● flow control and chains
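One way to read these three parts is as groups of columns in ProxySQL's `mysql_query_rules` table. The grouping below is an interpretation for illustration, not something stated on the slide; the column names are the ones used by the rules later in this deck plus a few others from the same table. The queries are meant for the ProxySQL admin interface.

```sql
-- Matching criteria: which statements a rule applies to.
SELECT rule_id, active, username, schemaname, flagIN,
       match_digest, match_pattern, negate_match_pattern, re_modifiers
FROM mysql_query_rules
ORDER BY rule_id;

-- Actions: what happens to a matching statement.
SELECT rule_id, replace_pattern, destination_hostgroup, cache_ttl, error_msg
FROM mysql_query_rules
ORDER BY rule_id;

-- Flow control and chains: flagIN/flagOUT link rules together,
-- and apply = 1 stops evaluation at that rule.
SELECT rule_id, flagIN, flagOUT, apply
FROM mysql_query_rules
ORDER BY rule_id;
```

Rules are evaluated in `rule_id` order, which is why the masking rules in this deck are numbered in the order they must run.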
Query Rewrite
Dynamically rewrite queries sent by the application/client
● without the client being aware
● on the fly
● using ProxySQL query rules
● rules defined using regular expressions, s/match/replace/
The concept
We use regular expressions to modify the client's SQL statements and replace the column(s) we want to hide with mask characters or generated fake data.
We split our solution into two use cases:
● Provide developers with access to the database
● Generate a dump to populate a database that can be shared
Only statements from the defined users (in our example, a developer) are modified.
The concept (2)
We also distinguish two categories:
• data masking
• data obfuscating
Data Masking
Here we mask all or part of the column value with a generic character.
Data Obfuscation
Here we replace the column value with random data of the same type; in other words, we create fake data.
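The difference between the two categories can be seen with plain MySQL expressions, the same ones the rewrite rules later in this deck inject into queries. The literal 'Georgi' and the column aliases are illustrative assumptions only.

```sql
-- Masking: keep the first 2 characters, replace the rest with a fixed run of X.
SELECT CONCAT(LEFT('Georgi', 2), REPEAT('X', 10)) AS masked_first_name;
-- masked_first_name: GeXXXXXXXXXX

-- Obfuscation: return a fake value of the same type instead of the real one.
-- FLOOR(RAND() * 50000) + 10000 yields an integer from 10000 to 59999.
SELECT FLOOR(RAND() * 50000) + 10000 AS fake_salary;
```

Masking keeps part of the real value visible, while obfuscation returns nothing of the original and changes on every execution.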
Access
Create a user for masking:
INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('devel', 'devel', 1, 1);
Create a user for backups:
INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('backup', 'dumpme', 1, 1);
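Rows inserted into `mysql_users` on the admin interface are not active until they are loaded to runtime. A minimal sketch of the follow-up commands, assuming the two users above were just inserted:

```sql
-- Run on the ProxySQL admin interface after the INSERTs above.
LOAD MYSQL USERS TO RUNTIME;   -- make the new users active
SAVE MYSQL USERS TO DISK;      -- persist them across restarts

-- Confirm both users are defined.
SELECT username, active, default_hostgroup
FROM mysql_users
WHERE username IN ('devel', 'backup');
```

ProxySQL also uses these credentials to connect to the backend, so matching accounts are expected to exist on the MySQL servers as well.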
Rules
Avoid SELECT *
For the developer, we need rules that block every SELECT * variant on the table.
If the column is part of many tables, we need to do so for each of them.
Rules (2)
Mask or obfuscate the field
When the field appears in the select list, we need to:
● replace the column with its first 2 characters followed by a fixed number of Xs, or with a generated random string
● keep the column name
● for mysqldump, allow SELECT * but mask and/or obfuscate sensitive values
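In SQL terms, the first two requirements amount to the transformation sketched below, shown for the `first_name` column of the `employees` table used throughout this deck. The second statement is what the backend should receive; the developer only ever types the first.

```sql
-- Statement as sent by the developer:
SELECT emp_no, first_name FROM employees;

-- Statement the rewrite should produce: first_name shows 2 real characters
-- plus 10 X characters, and the result column is still called first_name.
SELECT emp_no, CONCAT(LEFT(first_name, 2), REPEAT('X', 10)) first_name FROM employees;
```

Keeping the column name as an alias means application code that reads `first_name` from the result set keeps working unchanged.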
Rules overview
Rule #1
rule_id: 1
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: `*first_name*`
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: first_name
apply: 0
Rule #2
rule_id: 2
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: (\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: \1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4
apply: 0
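Rule #2 could be loaded through the ProxySQL admin interface roughly as follows. This is a sketch under the assumption that the values are exactly those listed for the rule; quoting can behave differently depending on the client used, so the stored patterns are read back before the rule is activated.

```sql
-- Run on the ProxySQL admin interface, not on the MySQL backend.
-- Single quotes inside the replacement are doubled ('') to escape them.
INSERT INTO mysql_query_rules
  (rule_id, active, username, schemaname, flagIN,
   match_pattern, re_modifiers, flagOUT, replace_pattern, apply)
VALUES
  (2, 1, 'devel', 'employees', 0,
   '(\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])',
   'caseless,global', NULL,
   '\1CONCAT(LEFT(\2first_name,2),REPEAT(''X'',10))\3 first_name\4',
   0);

-- Check what was actually stored before activating it.
SELECT rule_id, match_pattern, replace_pattern
FROM mysql_query_rules
WHERE rule_id = 2;

-- Activate the rule and persist it across restarts.
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

With `apply` set to 0, evaluation continues with the following rules, which is what lets the later rules clean up the alias left behind by this rewrite.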
Rule #2 - obfuscating
Suppose we want to return fake numbers for the `salaries`.`salary` column. Instead of the previous rule, we could use this one:
rule_id: 158
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: (\(?)(`?\w+`?\.)?salary(\)?)([ ,\n])
negate_match_pattern: 0
re_modifiers: CASELESS,GLOBAL
flagOUT: NULL
replace_pattern: \1CONCAT( floor(rand() * 50000) + 10000,'')\3 salary\4
Rule #3
rule_id: 3
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: \)(\)?) first_name\s+(\w),
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: )\1 \2,
apply: 1
Rule #4
rule_id: 4
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: \)(\)?) first_name\s+(.*)\s+from
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: )\1 \2 from
apply: 1
Rule #5
rule_id: 5
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+\*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact [email protected]
apply: 0
Rule #6
rule_id: 6
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+employees\.\*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact [email protected]
apply: 0
Rule #7
rule_id: 7
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1)
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact [email protected]
apply: 0
Rules for mysqldump
To provide a dump that can be used by developers, QA or support, we need to:
● generate valid data
● obfuscate sensitive information
● rewrite the SQL statements issued by mysqldump
● do so only for tables and columns with sensitive data
mysqldump rules
Rule #8
rule_id: 8
active: 1
username: backup
schemaname: employees
flagIN: 0
match_pattern: ^SELECT /\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries`
replace_pattern: SELECT SQL_NO_CACHE emp_no, ROUND(RAND()*100000), from_date, to_date FROM salaries
flagOUT: NULL
apply: 1
mysqldump rules
Rule #9
rule_id: 9
active: 1
username: backup
schemaname: employees
flagIN: 0
match_pattern: \* FROM `employees`
replace_pattern: emp_no, CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6)) birth_date, CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name, CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name, gender, hire_date FROM employees
flagOUT: NULL
apply: 1
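To preview what the dump will contain for the `employees` table, the select list from Rule #9 can be run directly against MySQL. This is a sketch assuming the sample `employees` schema used in this deck; the `LIMIT` is added only for the preview and is not part of the rule.

```sql
-- Statement mysqldump sends for the table (schematically):
--   SELECT /*!40001 SQL_NO_CACHE */ * FROM `employees`
-- Rule #9 replaces "* FROM `employees`", so the backend receives a select
-- list like the one below. LIMIT 5 only keeps the preview small.
SELECT emp_no,
       CONCAT(LEFT(birth_date, 2), FLOOR(RAND() * 50) + 10, RIGHT(birth_date, 6)) birth_date,
       CONCAT(LEFT(first_name, 2), REPEAT('x', LENGTH(first_name) - 2)) first_name,
       CONCAT(LEFT(last_name, 3), REPEAT('x', LENGTH(last_name) - 3)) last_name,
       gender, hire_date
FROM employees
LIMIT 5;
```

The birth date keeps its century, month and day but gets a random two-digit year between 10 and 59, so the value is still a valid date. Names keep their original length, with only the first 2 or 3 characters readable.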
Limitations
● better support in ProxySQL >= 1.4.x
  ○ RE2 and PCRE regexes
● all fields with the same name are masked, regardless of the table, within the same schema
● the regexps may never be fully sufficient
● block any query not matching whitelisted SQL statements
● the dump via ProxySQL solution seems to be the best
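The whitelisting idea from the list above could be approached with an explicit allow rule followed by a catch-all that rejects everything else. The sketch below is an assumption about how such a setup might look, not the author's configuration: the rule ids 900 and 999, both patterns and the error text are hypothetical.

```sql
-- Sketch only: rule_ids 900 and 999 and both patterns are hypothetical.
-- Allow rule: one known-safe statement shape; apply = 1 stops evaluation here.
INSERT INTO mysql_query_rules
  (rule_id, active, username, schemaname, match_pattern, re_modifiers, apply)
VALUES
  (900, 1, 'devel', 'employees',
   '^SELECT\s+emp_no,\s*last_name\s+FROM\s+employees\s*;?\s*$',
   'caseless', 1);

-- Catch-all: anything from 'devel' that reaches this rule is rejected.
INSERT INTO mysql_query_rules
  (rule_id, active, username, schemaname, match_pattern, error_msg, apply)
VALUES
  (999, 1, 'devel', 'employees', '.',
   'Query not allowed: statement is not whitelisted', 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

Every statement that does not stop at an earlier rule with `apply = 1` ends up at the catch-all, including masked queries and any session setup statements a client sends, so each permitted statement shape needs its own allow rule before this is usable.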
Make it easy
This is not really easy, is it?
You can use this small bash script (https://github.com/lefred/maskit) to generate the rules:
# ./maskit.sh -c first_name -t employees -d employees
column: first_name
table: employees
schema: employees
let's add the rules...
Examples
Easy ones:
SELECT * FROM employees;
SELECT emp_no, last_name, first_name FROM employees;
Examples (2)
More difficult:
select emp_no, concat(first_name), last_name from employees;
select emp_no, first_name, first_name from employees.employees;
select emp_no, `first_name` from employees;
select emp_no, first_name
    -> from employees; (*)
Examples (3)
More difficult:
select t1.first_name from employees.employees as t1;
select emp_no, first_name as fred from employees;
select emp_no, first_name rene from employees;
select emp_no, first_name `as` from employees;
select first_name as `as`, last_name from employees;
select `t1`.`first_name` from employees.employees as t1;
Examples (4)
More difficult:
select first_name fred, last_name from employees;
select emp_no, first_name /* first_name */ from employees.employees;
/* */ select last_name, first_name from employees;
select CUSTOMERS.* from myapp.CUSTOMERS;
select a.* from employees.employees a;
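When trying the example queries as the `devel` user, it helps to see which rules actually fired. A minimal check on the ProxySQL admin interface, assuming the rules from this deck are loaded:

```sql
-- Run on the ProxySQL admin interface after sending the example queries.
-- hits counts how many times each rule has matched.
SELECT rule_id, hits
FROM stats_mysql_query_rules
ORDER BY rule_id;
```

A query that comes back unmasked while no masking rule shows a new hit points to a statement shape the regular expressions do not cover yet.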
We need you!