database refactoring sreeni ananthakrishna 2006 nov


Upload: melbournepatterns

Post on 30-Jan-2015


Page 1: Database Refactoring Sreeni Ananthakrishna 2006 Nov

November 2006 Sreenivas Ananthakrishna 1

Database Refactoring

An introduction to Refactoring Databases & Evolutionary Database Design (Ambler and Sadalage)

Page 2: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Agenda

► What is database refactoring about?
► Evolutionary database development techniques
► Refactoring Strategies
► Classification of refactorings and examples

Page 3: Database Refactoring Sreeni Ananthakrishna 2006 Nov

What is database refactoring about?

► Improving database design
► Making small and incremental changes to the schema
► Maintain existing information and behaviour
► Functionality is not added/removed
► Not just limited to the database, but also the applications that use it

Page 4: Database Refactoring Sreeni Ananthakrishna 2006 Nov

A simple example…

[Diagram: Customer accesses Account; the balance column exists in both tables during the transition period]

Customer
  customerId <<PK>>
  name
  balance

Account
  accountId <<PK>>
  customerId <<FK>>
  balance

Transition annotations:
  SynchronizeAccountBalance {event = on update | on delete | on insert, drop date = <date>}
  SynchronizeCustomerBalance {event = on update | on delete | on insert, drop date = <date>}
  {drop date = <date>}

App A: maintainbalance()
App B: maintainbalance()
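A minimal sketch of the transition period in the diagram above, using Python's built-in sqlite3 module. The trigger names come from the slide; which direction each trigger copies in, the one-account-per-customer sample data, and handling only updates (not inserts or deletes) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    customerId INTEGER PRIMARY KEY,
    name       TEXT,
    balance    NUMERIC   -- assumed old location, kept until the drop date
);
CREATE TABLE Account (
    accountId  INTEGER PRIMARY KEY,
    customerId INTEGER REFERENCES Customer(customerId),
    balance    NUMERIC   -- assumed new location
);

-- Assumption: copies writes made through Customer.balance to Account.
CREATE TRIGGER SynchronizeAccountBalance
AFTER UPDATE OF balance ON Customer
BEGIN
    UPDATE Account SET balance = NEW.balance
    WHERE customerId = NEW.customerId AND balance IS NOT NEW.balance;
END;

-- Assumption: copies writes made through Account.balance back to Customer.
CREATE TRIGGER SynchronizeCustomerBalance
AFTER UPDATE OF balance ON Account
BEGIN
    UPDATE Customer SET balance = NEW.balance
    WHERE customerId = NEW.customerId AND balance IS NOT NEW.balance;
END;
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Ann', 100)")
conn.execute("INSERT INTO Account VALUES (10, 1, 100)")

# App A (not yet migrated) still writes to Customer.balance.
conn.execute("UPDATE Customer SET balance = 150 WHERE customerId = 1")
account_balance = conn.execute(
    "SELECT balance FROM Account WHERE accountId = 10").fetchone()[0]

# App B (migrated) writes to Account.balance.
conn.execute("UPDATE Account SET balance = 90 WHERE accountId = 10")
customer_balance = conn.execute(
    "SELECT balance FROM Customer WHERE customerId = 1").fetchone()[0]
```

The `balance IS NOT NEW.balance` guard means each trigger changes a row only when the value actually differs, so the two triggers cannot keep firing each other.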

Page 5: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Why refactor?

► Data models built upfront tend to be complex and need cleaning
► Maintain consistency between application domain and data model
► Address performance requirements
► Identify and eliminate db smells

Page 6: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Database Smells

► Multipurpose Column – e.g. one column holding a customer's dob and an employee's start date
► Multipurpose Table – e.g. Customer table holding both persons and corporations
► Redundant Data – same information in different tables
► Table with too many columns – e.g. Customer with many addresses
► Table with too many rows
► Smart columns – e.g. data has positional context
► Fear of change – too risky to change schema, time to refactor!

Page 7: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Evolutionary Database Development

► Evolve data models vs upfront design
► Database regression testing
► Configuration management of database artifacts
► Developer Sandboxes

Page 8: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Database regression testing

► Test the schema
  - Check logic in stored procedures and triggers
  - Test check and referential constraints
  - View definitions
  - Default Values and Invariants
► Test application code
  - Unit tests around application code which queries the db.
► Test data migration
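The schema checks listed above can be sketched as a small regression test. This is an illustration only, using Python's built-in sqlite3 module; the tables, columns and the `rejected` helper are hypothetical and not taken from the talk.

```python
import sqlite3

def create_schema(conn):
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE Customer (
        customerId INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        status     TEXT NOT NULL DEFAULT 'ACTIVE'
                   CHECK (status IN ('ACTIVE', 'CLOSED'))
    );
    CREATE TABLE Account (
        accountId  INTEGER PRIMARY KEY,
        customerId INTEGER NOT NULL REFERENCES Customer(customerId),
        balance    NUMERIC NOT NULL DEFAULT 0
    );
    """)

def rejected(conn, sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO Customer (customerId, name) VALUES (1, 'Ann')")

# Default value: status falls back to 'ACTIVE' when not supplied.
default_ok = conn.execute(
    "SELECT status FROM Customer WHERE customerId = 1").fetchone()[0] == "ACTIVE"
# Check constraint: an unknown status must be refused.
check_ok = rejected(
    conn, "INSERT INTO Customer (customerId, name, status) VALUES (2, 'Bob', 'UNKNOWN')")
# Referential constraint: an account for a missing customer must be refused.
fk_ok = rejected(
    conn, "INSERT INTO Account (accountId, customerId) VALUES (10, 99)")
```

Run after every refactoring, checks like these show quickly whether a schema change has silently dropped a default or a constraint.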

Page 9: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Config management of DB Artifacts

► Schema creation scripts
► Data loading/migration scripts
► Reference data
► Stored procedures
► View definitions
► Test data
► Regression Tests

Page 10: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Developer Sandboxes

Page 11: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Database Refactoring Strategies

► Apply small changes
  - Small changes allow easy/early detection of errors
► Identify Individual Refactorings
  - Instead of doing “move column” and “rename column” in one go, version each individually.
► Create database configuration table
  - Helps identify current version of the database and can be used in migrations.
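One possible shape for a database configuration table and individually versioned refactorings, sketched with Python's built-in sqlite3 module. The table name `DatabaseConfiguration`, the `schemaVersion` column, the sample steps and the `migrate` helper are all assumptions for illustration, not the scheme used in the talk.

```python
import sqlite3

# Each refactoring is its own numbered step, so it can be applied
# (and diagnosed) on its own.
MIGRATIONS = [
    (1, "CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE Customer ADD COLUMN balance NUMERIC DEFAULT 0"),
    (3, "CREATE INDEX idx_customer_name ON Customer (name)"),
]

def current_version(conn):
    """Read the schema version, creating the configuration table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DatabaseConfiguration "
        "(schemaVersion INTEGER NOT NULL)")
    row = conn.execute(
        "SELECT schemaVersion FROM DatabaseConfiguration").fetchone()
    if row is None:
        conn.execute("INSERT INTO DatabaseConfiguration VALUES (0)")
        return 0
    return row[0]

def migrate(conn):
    """Apply every step newer than the recorded version; return what ran."""
    version = current_version(conn)
    applied = []
    for number, ddl in MIGRATIONS:
        if number > version:
            conn.execute(ddl)
            conn.execute(
                "UPDATE DatabaseConfiguration SET schemaVersion = ?", (number,))
            conn.commit()
            applied.append(number)
    return applied

conn = sqlite3.connect(":memory:")
first_run = migrate(conn)    # applies steps 1, 2 and 3
second_run = migrate(conn)   # nothing left to apply
final_version = conn.execute(
    "SELECT schemaVersion FROM DatabaseConfiguration").fetchone()[0]
```

Because the version is stored in the database itself, the same script can bring a developer sandbox, a test database and production up to date from whatever version each one is at.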

Page 12: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Database Refactoring Strategies (contd.)

► Determine synchronization strategies during transition period
  - Triggers do real-time updates but might have performance impacts.
  - Views might not support updates but do not move data.
  - Batch synch can be used during non-peak loads but might have to deal with multiple updates.
► Encapsulate Database Access
  - Abstract database access, e.g. by using persistence frameworks
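To illustrate the view-based synchronization strategy from the list above: a sketch, using Python's built-in sqlite3 module, of a view that keeps an old column name readable after a rename. The `CustomerLegacy` view and the column names are hypothetical. SQLite views happen to be read-only unless INSTEAD OF triggers are added, which makes the "might not support updates" tradeoff easy to see; other databases behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Refactored table: the column is now called fullName.
CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, fullName TEXT);

-- Hypothetical transition view exposing the old column name 'name'.
-- No data is copied; the view reads straight from Customer.
CREATE VIEW CustomerLegacy AS
    SELECT customerId, fullName AS name FROM Customer;

INSERT INTO Customer VALUES (1, 'Ann Lee');
""")

# An application that has not been migrated can keep reading the old name.
legacy_name = conn.execute(
    "SELECT name FROM CustomerLegacy WHERE customerId = 1").fetchone()[0]

# Writing through the view fails in SQLite.
try:
    conn.execute("UPDATE CustomerLegacy SET name = 'Ann L.' WHERE customerId = 1")
    view_is_updatable = True
except sqlite3.OperationalError:
    view_is_updatable = False
```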

Page 13: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Database Refactoring Classification

► Structural
► Data Quality
► Referential
► Architectural
► Method

Page 14: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Structural Refactorings

Related to structure of Tables, Views
e.g. Move Column, Rename Table, Split Table, Merge Column

Issues to consider when implementing:
  - Cyclic Triggers
  - Broken Views, Procedures, Triggers
  - Transition period in multi-application setup

Page 15: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Introduce Surrogate Key

► Motivations
  - Reduce coupling between schema and business domain
  - Increase consistency by having a uniform key strategy
  - Improve performance by having index based on simpler key
► Potential Tradeoffs
  - Surrogate keys are not suitable for all situations
  - Introducing a new key might require further key consolidation and more effort

“Replace an existing natural key with a surrogate key”

Page 16: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Introduce Surrogate Key (contd.)

[Diagram: Order contains OrderItem]

Order
  customerNumber <<PK>> <<FK>> <<Natural>>
  storeId <<PK>> <<Natural>>
  orderId <<PK>> <<surrogate>>

OrderItem
  customerNumber <<PK>> <<FK>> <<Natural>>
  storeId <<PK>> <<Natural>>
  orderItemNumber <<PK>>
  orderId <<FK>> <<surrogate>>

Transition annotations:
  PopulateOrderId {event = on insert, drop date = <date>}
  {drop date = <date>}
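A rough sketch of how the refactoring in the diagram might be carried out, using Python's built-in sqlite3 module. Table and column names follow the slide; the sample rows, the rebuild-and-rename approach (SQLite cannot change a primary key in place), and attaching `PopulateOrderId` to OrderItem are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Starting point: both tables are keyed by the natural key.
CREATE TABLE "Order" (
    customerNumber INTEGER NOT NULL,
    storeId        INTEGER NOT NULL,
    PRIMARY KEY (customerNumber, storeId)
);
CREATE TABLE OrderItem (
    customerNumber  INTEGER NOT NULL,
    storeId         INTEGER NOT NULL,
    orderItemNumber INTEGER NOT NULL,
    PRIMARY KEY (customerNumber, storeId, orderItemNumber)
);
INSERT INTO "Order" VALUES (7, 1), (7, 2);
INSERT INTO OrderItem VALUES (7, 1, 1), (7, 1, 2), (7, 2, 1);

-- Rebuild Order with the surrogate key; the natural key stays unique.
CREATE TABLE OrderNew (
    orderId        INTEGER PRIMARY KEY,
    customerNumber INTEGER NOT NULL,
    storeId        INTEGER NOT NULL,
    UNIQUE (customerNumber, storeId)
);
INSERT INTO OrderNew (customerNumber, storeId)
    SELECT customerNumber, storeId FROM "Order"
    ORDER BY customerNumber, storeId;
DROP TABLE "Order";
ALTER TABLE OrderNew RENAME TO "Order";

-- Add the surrogate foreign key to OrderItem and backfill it.
ALTER TABLE OrderItem ADD COLUMN orderId INTEGER REFERENCES "Order"(orderId);
UPDATE OrderItem SET orderId = (
    SELECT o.orderId FROM "Order" AS o
    WHERE o.customerNumber = OrderItem.customerNumber
      AND o.storeId = OrderItem.storeId);

-- Transition trigger: fills orderId for applications that still
-- insert using only the natural key.
CREATE TRIGGER PopulateOrderId
AFTER INSERT ON OrderItem
WHEN NEW.orderId IS NULL
BEGIN
    UPDATE OrderItem SET orderId = (
        SELECT o.orderId FROM "Order" AS o
        WHERE o.customerNumber = NEW.customerNumber
          AND o.storeId = NEW.storeId)
    WHERE customerNumber = NEW.customerNumber
      AND storeId = NEW.storeId
      AND orderItemNumber = NEW.orderItemNumber;
END;
""")

# An unmigrated application inserts without knowing about orderId.
conn.execute(
    "INSERT INTO OrderItem (customerNumber, storeId, orderItemNumber) "
    "VALUES (7, 2, 2)")
missing = conn.execute(
    "SELECT COUNT(*) FROM OrderItem WHERE orderId IS NULL").fetchone()[0]
store2_items = conn.execute(
    "SELECT orderItemNumber, orderId FROM OrderItem "
    "WHERE customerNumber = 7 AND storeId = 2 "
    "ORDER BY orderItemNumber").fetchall()
```

Once every application uses orderId, the natural-key columns in OrderItem and the trigger can be removed on the drop date.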

Page 17: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Data Quality Refactorings

Related to improving quality of information in db
e.g. Add Lookup Table, Introduce column constraint, Introduce common format

Issues to consider when implementing:
  - Constraint violations
  - Broken logic in procedures
  - Broken where clauses in Views
  - Updating large amounts of data

Page 18: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Add Lookup Table

► Motivations
  - Introduce referential integrity for a column
  - Provide code lookup (move enum to the db)
  - Replace column constraint with set of expected values in lookup table
► Potential Tradeoffs
  - Identifying the data to populate (especially for multiple apps)
  - Possible performance impact due to additional joins

“Create a lookup table for an existing column”

Page 19: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Add Lookup Table (contd.)

[Diagram: Address.State references the new State lookup table]

Address
  Street
  State <<FK>>
  PostCode

State
  State <<PK>>
  Name

1. Identify the column
2. Create Lookup Table
3. Populate Data
4. Introduce FK constraint
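The four steps above can be sketched as follows, using Python's built-in sqlite3 module. Table and column names follow the slide; the sample rows and state names are made up for illustration. SQLite cannot add a foreign key to an existing table, so step 4 rebuilds Address here; most other databases would use ALTER TABLE ... ADD CONSTRAINT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce FKs
conn.executescript("""
-- 1. Identify the column: Address.state holds free-text state codes.
CREATE TABLE Address (street TEXT, state TEXT, postCode TEXT);
INSERT INTO Address VALUES ('1 Collins St', 'VIC', '3000'),
                           ('2 George St',  'NSW', '2000'),
                           ('3 Bourke St',  'VIC', '3000');

-- 2. Create the lookup table.
CREATE TABLE State (state TEXT PRIMARY KEY, name TEXT);

-- 3. Populate it from the values already in use, then add descriptions.
INSERT INTO State (state)
    SELECT DISTINCT state FROM Address WHERE state IS NOT NULL;
UPDATE State SET name = 'Victoria' WHERE state = 'VIC';
UPDATE State SET name = 'New South Wales' WHERE state = 'NSW';

-- 4. Introduce the FK constraint (by rebuilding the table in SQLite).
CREATE TABLE AddressNew (
    street   TEXT,
    state    TEXT REFERENCES State(state),
    postCode TEXT
);
INSERT INTO AddressNew SELECT street, state, postCode FROM Address;
DROP TABLE Address;
ALTER TABLE AddressNew RENAME TO Address;
""")

states = [row[0] for row in
          conn.execute("SELECT state FROM State ORDER BY state")]

# A state code that is not in the lookup table is now refused.
try:
    conn.execute("INSERT INTO Address VALUES ('4 Main St', 'XX', '0000')")
    unknown_state_rejected = False
except sqlite3.IntegrityError:
    unknown_state_rejected = True
```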

Page 20: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Referential Integrity Refactorings

Changes that improve referential integrity of data
e.g. Add Foreign Key Constraint, Introduce cascading delete, Introduce trigger for history

Issues to consider when implementing:
  - Fix broken CRUD logic in procedure
  - Data cleansing to make new constraints work

Page 21: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Introduce Cascading Delete

► Motivations
  - Preserve referential integrity of the parent/child rows
  - Remove responsibility for child deletion in the application
► Potential Tradeoffs
  - Deadlock?
  - Trigger accidental mass deletion when deleting root nodes
  - Duplicate functionality is introduced when using persistence frameworks like Hibernate/TopLink

“Delete the child record(s) when the parent is deleted”

Page 22: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Introduce Cascading Delete (contd.)

[Diagram: Policy (parent) and Claim (child)]

Policy
  PolicyId <<PK>>

Claim
  ClaimId <<PK>>
  PolicyId <<FK>>

DeleteClaim {event = on delete}

1. Identify the column
2. Choose cascading mechanism (triggers or using cascade clause during constraint creation)
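A small sketch of the cascade-clause option from step 2, using Python's built-in sqlite3 module. Table names follow the slide; the sample rows are made up. The trigger-based alternative (`DeleteClaim` in the diagram) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FK actions without this
conn.executescript("""
CREATE TABLE Policy (policyId INTEGER PRIMARY KEY);
CREATE TABLE Claim (
    claimId  INTEGER PRIMARY KEY,
    -- The cascade clause: deleting a policy deletes its claims.
    policyId INTEGER NOT NULL
             REFERENCES Policy(policyId) ON DELETE CASCADE
);
INSERT INTO Policy VALUES (1), (2);
INSERT INTO Claim VALUES (100, 1), (101, 1), (200, 2);
""")

# The application deletes only the parent row.
conn.execute("DELETE FROM Policy WHERE policyId = 1")

remaining_claims = [row[0] for row in
                    conn.execute("SELECT claimId FROM Claim ORDER BY claimId")]
```

The same mechanism is what makes the "accidental mass deletion" tradeoff real: one DELETE on a root table removes every dependent row without further confirmation.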

Page 23: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Architectural Refactorings

Changes that improve performance, portability and define the architecture within the database
e.g. Encapsulate Table with View, Introduce Calculation Method, Replace Method(s) with View, Introduce trigger for history

Issues to consider when implementing:
  - Performance vs Data redundancy
  - Keeping business logic in the application vs database

Page 24: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Introduce Index

► Motivations
  - Increase performance of read queries
► Potential Tradeoffs
  - Too many indexes degrade performance during insert/update/deletes
  - Existing data containing duplicates might need cleansing when introducing unique indexes

“Introduce a unique or non-unique Index”

Page 25: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Introduce Index (contd.)

[Diagram: Customer table with an index on TFN]

Customer
  CustomerId <<PK>>
  TFN <<AK>> <<index>>
  Name

1. Determine type of index – unique vs non-unique
2. Eliminate duplicate rows when using unique index
3. Add a new index
4. Add more disk space for index maintenance
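Steps 2 and 3 might look like this in practice, sketched with Python's built-in sqlite3 module. The Customer table follows the slide; the sample rows, the index name `idx_customer_tfn`, and the rule of keeping the row with the lowest customerId are assumptions for illustration. Real duplicate cleansing usually needs a business decision, not a blanket delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, tfn TEXT, name TEXT);
INSERT INTO Customer VALUES (1, '111', 'Ann'),
                            (2, '222', 'Bob'),
                            (3, '111', 'Ann (duplicate)');
""")

# 2. Find values that would violate a unique index.
duplicates = conn.execute(
    "SELECT tfn, COUNT(*) FROM Customer "
    "GROUP BY tfn HAVING COUNT(*) > 1").fetchall()

# Illustrative cleansing rule only: keep the lowest customerId per TFN.
conn.execute(
    "DELETE FROM Customer WHERE customerId NOT IN "
    "(SELECT MIN(customerId) FROM Customer GROUP BY tfn)")

# 3. Add the new unique index.
conn.execute("CREATE UNIQUE INDEX idx_customer_tfn ON Customer (tfn)")

# Check that a lookup by TFN is planned against the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM Customer WHERE tfn = '222'").fetchall()
uses_index = any("idx_customer_tfn" in row[-1] for row in plan)

remaining = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```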

Page 26: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Method Refactorings

Changes that improve code representing stored procedures, functions and triggers
e.g. Rename Method, Reorder Parameters, Replace literal with Table Lookup

Issues to consider when implementing:
  - Broken triggers, procedures, functions
  - Tool support

Page 27: Database Refactoring Sreeni Ananthakrishna 2006 Nov

Refactoring Tools

► Schema Migration – Rails Migration, Sundog
► Unit Testing – JUnit, DBUnit
► Refactor Stored Procedures – SQLRefactor (SQL Server only)