
Upload: wendy-gilbert

Post on 14-Jan-2017


Practical Agile Data Warehousing: If at first you don’t succeed, change something.

Wendy Gilbert Lead Project Manager, Data Intelligence, ANZ New Zealand.

Level 9, 23 Albert Street, Auckland, New Zealand 1010

[email protected], www.linkedin.com/in/wmgilbert

Executive summary

It is easy to get tired of the hype surrounding big data, data science, self-service, cloud, mobile BI, agile BI, etc. I’ve

worked with several data warehousing teams who have said that agile didn’t work for them, doesn’t work for data

warehousing, or that they are just plain tired of hearing the word. My experience has been:

Data warehousing teams who feel that agile doesn’t work for them are often trying to ‘do agile’ rather than

‘be agile’, and they can succeed if they re-assess their practices.

Agile is an excellent framework, so long as you take account of the differences between data warehousing

and software development.

Hype about agile isn’t going away any time soon. It’s the natural result of people (like me) feeling so

energized by the benefits of agile that they can’t stop using it as the solution to all the problems of the

universe.

I am going to cover what agile is for me, when you should use it, thoughts on what you need to do to be successful,

and common challenges we encounter applying agile to data warehousing.

Introduction

In my scrum master certification course, the instructor started the day with a group exercise. He broke the

class into groups of two with each team having a ‘manager’ and a ‘developer’. Our first task was to navigate around a small section of the room with the manager giving direction (start, stop, left turn, right

turn) and the developer following those directions, counting how many times we could circle the area

together in the allocated time period. The second time around, the goal was the same but the developers

decided when and where to move and the managers just came along for the ride.

The first attempt resulted in very little progress with many near-collisions, complaints about management,

and laughter. The second resulted in us gradually merging together and forming a circle that went round and round the room in sync, more than twice as many times as the first try. A silly exercise, yes, but it got

the point across: one of the primary tenets of the Agile Manifesto [1] is that the best architecture,

requirements and designs emerge from self-organising teams.

It is easy to get tired of the hype surrounding big data, data science, self-service, cloud, mobile BI, agile

BI, etc. I’ve worked with several data warehousing teams who have said that agile didn’t work for them,

doesn’t work for data warehousing, or that they are just plain tired of hearing the word. My experience has been:

Data warehousing teams who feel that agile doesn’t work for them are often trying to ‘do agile’

rather than ‘be agile’, and they can succeed if they re-assess their practices.

Agile is an excellent framework, so long as you take account of the differences between data

warehousing and software development.

Hype about agile isn’t going away any time soon. It’s the natural result of people (like me)

feeling so energized by the benefits of agile that they can’t stop using it as the solution to all the

problems of the universe.


Is agile data warehousing worthwhile?

Data warehouses (DWs) built using waterfall methodologies are not doomed to failure. Despite the

headlines we see of how often DWs fail, many of them do succeed. I worked on several DWs earlier in

my career that used a traditional waterfall software development lifecycle (SDLC). In my experience, the

projects were massively over schedule and over budget, but they did succeed in the end and some of them are still in use today.

But we could have done better. The companies hired consultants who knew little about the data and who were given limited access to our business users and technical subject matter experts, because those people were too busy.

We made assumptions about what the users wanted and tried to ensure that the DW could handle these

situations, thinking we were brilliant when we came up with a possible use case for the data that nobody

else had thought of. We decided to track history for every field in the system, because somebody somewhere might need it someday.

These stories aren’t shared to make me look bad. Mistakes like these are common, and we learn from them. I believe that using an agile framework alleviates many of the situations that led to the mistakes I’ve experienced. And I’m not alone: TDWI’s 2014 BI Benchmark report [2] determined that “agile is the most effective development methodology in terms of BI value”:

BI value by development methodology

Methodology   High value   Moderate value   Low value
Agile         38%          51%              11%
Waterfall     32%          36%              32%
Hybrid        18%          64%              18%

What is agile?

Wikipedia defines agile software development [1] as:

A group of software development methods in which solutions evolve through

collaboration between self-organizing, cross-functional teams. It promotes adaptive planning, evolutionary development, early delivery, continuous improvement, and

encourages rapid and flexible response to change.

Agile Manifesto

Agile software development is based on the Agile Manifesto [3] which values:

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

The most important take-away from the Agile Manifesto is that it puts the focus on the elements on the

left (individuals, interactions, working software, customer collaboration, responding to change) but is not at all implying that the elements on the right (processes, tools, documentation, contract negotiation, plans)

are not needed.


Agile principles

The Agile Manifesto [3] outlines 12 agile principles (originally created for software development):

Our highest priority is to satisfy the customer through early and continuous delivery of valuable

software.

Welcome changing requirements, even late in development. Agile processes harness change for

the customer's competitive advantage.

Deliver working software frequently, from a couple of weeks to a couple of months, with a

preference for the shorter timescale.

Working software is the primary measure of progress.

Agile processes promote sustainable development. The sponsors, developers, and users should be

able to maintain a constant pace indefinitely.

Business people and developers must work together daily throughout the project.

Build projects around motivated individuals. Give them the environment and support they need,

and trust them to get the job done.

The most efficient and effective method of conveying information is face-to-face conversation.

At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its

behaviour.

Continuous attention to technical excellence and good design enhances agility.

Simplicity, the art of maximising the amount of work not done, is essential.

The best architectures, requirements, and designs emerge from self-organising teams.

Agile Frameworks

Agile as a discipline has many frameworks beneath it. The most commonly used agile frameworks within data warehousing are scrum [5] and kanban [6]. Other frameworks include lean, extreme programming (XP), feature-driven development, and crystal.

Scrum

Scrum [5] is the most frequently adopted agile process worldwide. It is especially helpful for projects that

have a lot of uncertainty and need constant business interaction, such as building the core of a DW. Wikipedia defines scrum as:

An iterative and incremental agile software development methodology for managing

product development. It defines "a flexible, holistic product development strategy where a development team works as a unit to reach a common goal", challenges assumptions of

the "traditional, sequential approach" to product development, and enables teams to self-

organize by encouraging physical co-location or close online collaboration of all team

members, as well as daily face-to-face communication among all team members and disciplines in the project.

A key principle of scrum is its recognition that during production processes, the customers can change their minds about what they want and need (often called

"requirements churn"), and that unpredicted challenges cannot be easily addressed in a

traditional predictive or planned manner. As such, scrum adopts an empirical approach,

accepting that the problem cannot be fully understood or defined, focusing instead on maximizing the team's ability to deliver quickly and respond to emerging requirements.


My recommendation for those interested in implementing scrum is to take a two-day scrum master certification course, even if you don’t intend to be a scrum master. The course will teach you the basics and, more importantly, will help you understand why each scrum event is useful.

Scrum teams consist of three roles: product owner, scrum master, and development team. Scrum refers to all members of the development team as developers, because the goal is a well-rounded team made up of people who are capable of taking on more than one role. Between them, the development team should have the following skillsets:

Data analysis

Data modelling

Data mapping

ETL and/or ELT

Testing

Report development

BI Solution Architecture (this skill may be managed outside the scrum team)

Database experience

Heizenberg et al. [5] do an excellent job of explaining scrum principles, particularly as they apply to BI, so I refer you there for further details on scrum processes.

Kanban

In kanban, individual work items are broken into phases such as ‘Not Yet Started’, ‘In Progress’, and

‘Done’. I think of it like a group ‘to do’ list except that there are multiple phases and you can set a work-in-progress (WIP) limit per phase. In data warehousing, I have mostly seen kanban used for maintenance and

modifications of an existing system.
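
To make the idea of a work-in-progress limit per phase concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular kanban tool works: the `KanbanBoard` class, the limit of two items, and the story names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class KanbanBoard:
    """A toy kanban board: ordered phases, each with an optional WIP limit."""
    phases: list[str]
    wip_limits: dict[str, int]  # phase name -> maximum items allowed (hypothetical)
    items: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self):
        for phase in self.phases:
            self.items.setdefault(phase, [])

    def _has_room(self, phase: str) -> bool:
        limit = self.wip_limits.get(phase)  # None means the phase has no limit
        return limit is None or len(self.items[phase]) < limit

    def add(self, item: str) -> None:
        """New work always enters the first phase."""
        self.items[self.phases[0]].append(item)

    def advance(self, item: str) -> bool:
        """Move an item to the next phase; refuse if that phase is full."""
        for i, phase in enumerate(self.phases[:-1]):
            if item in self.items[phase]:
                next_phase = self.phases[i + 1]
                if not self._has_room(next_phase):
                    return False
                self.items[phase].remove(item)
                self.items[next_phase].append(item)
                return True
        return False  # item not found, or already in the final phase


board = KanbanBoard(
    phases=["Not Yet Started", "In Progress", "Done"],
    wip_limits={"In Progress": 2},
)
for story in ["load customer table", "load account table", "load branch table"]:
    board.add(story)

# Try to start all three stories; the third is refused by the WIP limit.
results = [board.advance(s) for s in list(board.items["Not Yet Started"])]
```

The third story can only be started once one of the first two moves to ‘Done’, which is the behaviour that keeps a team focused on finishing work rather than starting it.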

Apply the process that best fits the situation

There is no need to pick a single framework for all of your projects and teams. At ANZ NZ, we use scrum, kanban, and waterfall within our data warehouse teams. We are using Teradata as our database

platform. Installing and configuring the platform is basically an infrastructure project so we are using a

more traditional waterfall approach for that as well as for installation of our ETL tool and BI platforms.

For agile projects, we are required to meet the typical stage gates for software delivery (analyse, design,

plan, build, test, implement), but these have been modified to meet the needs of an agile world. The

terminology is the same, but the project artefacts and timing are very different. For example, in a typical waterfall project the output of our analyse phase would be detailed requirements for the project, whereas

with our Agile SDLC, our analyse phase output is a product backlog that we believe we can complete in

the specified time period. We will revise the analyse artefact as needed if the original product backlog is no longer meeting the business needs and/or priorities and can easily change it as long as our sponsors

support the change.

For a variety of reasons, we chose to use scrum for building the core and presentation layers of our DW,

and kanban for sourcing our data into staging.

Scrum is better suited for the core since the requirements are less understood and we need

constant interaction with our business users. The business users need to be able to change

priorities and requirements quickly.

Kanban is well suited for the staging layer since there is limited transformation and the

requirements are better understood. The focus here is continuous delivery and productivity.


Being agile versus doing agile

Before I took scrum training, when someone asked me if I had used agile before, I would mumble a reply and tell them something about how I understood the concepts but had not worked on a team that used it.

I’d go on to cite the need for daily stand-ups and such. I didn’t realise how much I didn’t know until my

first scrum team.

I have conducted hundreds of interviews in the last few years and I am frequently looking for people with

agile experience. Much of the time I get responses similar to my own many years ago, often by people

who work on teams that have adopted some agile practices (usually daily stand-ups), but who were not focused on agile principles. Adding daily stand-ups to a team that is otherwise using a traditional SDLC is

not agile if you haven’t adopted the mind-set change that goes with it.

In ANZ New Zealand, our agile advisory team expressly avoids approaches to agile that don’t fit a prescribed process. From the outset they adopted the practices of Scrum.org in their entirety, and progressively adopted the agile techniques they found most effective.

According to Chris Starling, ANZ New Zealand, Delivery Transformation Manager, you should “choose

a mature agile process such as scrum, so you can get access to coaching and support. ‘Doing agile’ means

‘on your own experimentation with a chemistry set and no teacher’. The result is not a surprise to anybody. It doesn’t mean that agile won’t work for you. It simply means that your adoption strategy

sucked.”

Agile training classes will teach you the elements of what it is to ‘do agile’, but to ‘be agile’ is to change the mind-set of yourself, your team, and your organization, and put the focus on the core tenets of the

Agile Manifesto.

Business value

Everything you do, every key you press, spec you write, email you send, meeting you attend, code you build, etc., should have an underlying connection to the business value it provides. All levels of the

organization should be comfortable asking why they are doing something and connecting business value

to the work. Business value can be direct (e.g. the work will result in better service for the customer) or indirect (e.g. the work is needed for regulatory reporting to prevent fines that would result in increased

costs to the customer).

I once spent weeks trying to get the details correct for a specification for a new field that my manager was convinced the business would want us to build. When I went to the business manager to ask her for

clarification on it, she told me that nobody in the business would ever use that field because it didn’t meet

their marketing needs. The DW team was convinced that if we built it she would use it: so we built it, and it never got used. We started with the solution, not the business value.

Continuous process improvement

When I was a consultant and a client told me that they had tried agile before and it failed, I would often

suggest they rebrand it as continuous process improvement (CPI) when they tried the second time. Even those who are tired of the word ‘agile’ will get on board with the idea that you need to be always

improving.


As with business value, CPI must be ingrained in your culture. This takes time and it is harder in

organizations that strongly silo roles and responsibilities. The sprint retrospective is one way scrum encourages CPI, but there are other things you can do to encourage CPI behaviours so that, over time and

with encouragement, people become more proactive and empowered.

We have all been in meetings or hallway conversations where something was suggested and the respondents agreed it would be great. Two weeks later the idea comes up again, and we agree that it

would be great and we assume someone else will follow up on it. A week later we start getting grumpy

because it hasn’t been implemented. Those conversations continue until the idea either dies out or someone who feels empowered picks the idea up and actions it.

In my last scrum team, we added a team agreement that said “We will not ignore things; we all take responsibility and ownership for remembering and acting on issues and ideas.” When something was

suggested and we agreed it would be a good idea, we didn’t move on to the next subject until one of us

volunteered to follow up. This applied to big things, such as improvements to our architecture, and

smaller things, such as adding a step to our definition of done.

Maximise the amount of work not done

This is one of the key agile principles, but in data warehousing in particular we tend to ‘future proof’ and

‘what if’ ourselves into delivering things that the customer does not need, because we are worried that one

day they might.

The most glaring example of this is with slowly changing dimensions (SCDs). In data warehousing, we

love history. I’ve been told that a DW isn’t a DW if it doesn’t have SCDs. But I have also helped build two DWs that didn’t have a single SCD, and still delivered huge business value. In both cases, SCD

scenarios were in our backlog but never got high priority.

Two of the largest DWs that I worked on started their journey with a decision to treat all fields as Type II

(track changes to the field over time), unless there was a good reason to set the field to Type I (overwrite

the changed field). In both cases, after months and months of spider-web code, slow builds, and training

chaos, they reversed the decision and decided to go with Type I unless there was a good reason to set the field to Type II, thus maximising the amount of work not done.
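
The difference in effort between the two types is easier to see side by side. The sketch below is a simplified illustration in Python rather than real ETL code; the function names, the `valid_from`/`valid_to` columns, and the sample customer are assumptions made for the example.

```python
from datetime import date


def apply_type1(row: dict, column: str, new_value) -> dict:
    """Type I: overwrite the value; the previous value is lost."""
    updated = dict(row)
    updated[column] = new_value
    return updated


def apply_type2(history: list[dict], column: str, new_value, changed_on: date) -> list[dict]:
    """Type II: close the current version and append a new one."""
    current = history[-1]
    if current[column] == new_value:
        return history  # nothing changed, so there is nothing to track
    closed = {**current, "valid_to": changed_on}
    opened = {**current, column: new_value,
              "valid_from": changed_on, "valid_to": None}
    return history[:-1] + [closed, opened]


# Type I keeps a single row per customer.
customer = {"customer_id": 42, "city": "Auckland"}
type1_result = apply_type1(customer, "city", "Wellington")

# Type II keeps one row per version, with validity dates to maintain.
versions = [{"customer_id": 42, "city": "Auckland",
             "valid_from": date(2015, 1, 1), "valid_to": None}]
type2_result = apply_type2(versions, "city", "Wellington", date(2016, 6, 1))
```

Even in this toy form, Type II needs extra columns, change detection, and row versioning, and every downstream query has to choose which version it wants. Multiplied across every field in a DW, that is the extra work the Type I default avoids.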

Rework is a fact of life

With waterfall, requirements and design are done up front, whereas with scrum they are more fluid. It is

harder to plan ahead and there is always the possibility that you’ll make a mistake in your design or requirements that will force rework. It’s a very uncomfortable feeling to build something that you know

could need to be redone at a future point if requirements change or evolve.

It took one of my most experienced developers months to adapt to the idea. He took great pride in his work and worried about both the reputation hit if someone thought his code wouldn’t stand the test of time, and the cost of reworking the code later. But I argue that reworking code later (if needed) is not nearly as arduous as we make it out to be, and while we are waiting for the possibility that this needs to happen, the business users are reaping the benefit of what we initially delivered.

Agile and data warehousing


While there are many similarities between data warehousing and software development, some situations commonly arise when applying scrum to data warehousing initiatives and projects that may not arise in software development.

We don’t deliver software

I’ve seen several data people read the agile principles and immediately take issue with the term ‘working

software’. The agile principles were created for software development, but of course working software is not the focus of a DW. So we adapt: rather than working software, we deliver business value and focus on

building a product that will be accepted by the product owner as working. This could be fields, tables,

views, dashboards, reports, data marts, etc. Sprint reviews sometimes have to be creative as a result, e.g.

if the result of a user story was a new field, then the review for that story might be an SQL query showing the value distributions in the field and the results of testing that show the scenarios accounted for.
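
As an example of what such a review query might look like, the sketch below uses Python’s built-in `sqlite3` module with an in-memory table. The `dim_customer` table, the `segment` field, and the sample rows are hypothetical; in practice you would run the equivalent SQL against your own DW platform.

```python
import sqlite3

# Hypothetical table and data standing in for a real DW dimension.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, segment TEXT)")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)",
    [(1, "Retail"), (2, "Retail"), (3, "Business"), (4, None), (5, "Retail")],
)

# Value distribution for the new field, with NULLs made visible.
distribution = conn.execute(
    """
    SELECT COALESCE(segment, '(null)') AS segment_value,
           COUNT(*)                    AS row_count
    FROM dim_customer
    GROUP BY segment_value
    ORDER BY row_count DESC, segment_value
    """
).fetchall()

for value, count in distribution:
    print(f"{value:<10} {count}")
```

Showing the NULL count alongside the populated values gives the product owner a quick way to see whether the scenarios agreed in the acceptance criteria were actually covered.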

User stories in data warehousing are more focused on what many refer to as a ‘slice’ of the DW. Rather than building each layer in its entirety before moving to the next, you take a subset of the DW and build it

from beginning to end (or whatever layers are relevant to that story). Slicing initially worries some

architects who are uncomfortable with the risk of future rework, but it’s a risk well worth taking.

A common criticism I hear is that you can’t deliver business value with a DW user story that fits into a two-week sprint. This is sometimes true: sometimes you have to put in some foundation work before true business value can be realised. But the team gets better over time at breaking epics into user stories that provide some value along the way.

More pre-analysis and prep is necessary

In software development and data warehousing, the development team’s job is to figure out how to create

the code for the user story. They collaborate with business owners to clarify the details as they go. In my experience, there are more unknowns in this process for data warehousing than software development,

and more analysis and prep work is needed for a user story before it can be considered sprint ready.

In their article on Agile Business Intelligence, Heizenberg et al. [5] grouped BI user stories into five

categories as shown below (the article also provides examples of user stories within each category):

Data Disclosure stories are about extracting data from the source system and making it available.

Data Augmentation stories are about creating new information based on existing information.

Data Presentation stories describe the presentation of the information in a format the user can

easily understand.

Data Validation stories are where a user talks about applying business rules to check whether the

extracted data is of good enough quality for the end users to work with, and to take the necessary

actions when the data is not good enough.

Configuration stories are about enabling maintenance staff and administrators to keep the

configuration of the BI system up to date without having to change the code.

Data Presentation, Data Validation and Configuration are similar to the user stories found in software

development, whereas Data Disclosure and Data Augmentation user stories often have many questions

that the business owner will have trouble answering:

What source system does the data come from? What is the source of truth for the data?

Do we have access to the source system already? If not, how (and when) can we get it?

What fields and tables are needed? Are they in the DW already? If so, at what layer?


Is the data clean enough to use for the intended purpose? Is the data in a complete state?

In the case of data augmentation, how much do we know about the calculation?

Is a prototype needed to help the business owner with the definition?

What existing reports/dashboards/etc. are affected?

I’ve seen scrum teams handle the additional analysis in different ways:

Pre-Sprint Prep Team: One of my larger clients had a separate team of technical analysts that worked outside the scrum team. Their role was to analyse the high-priority user stories in the backlog to ensure the scrum team had enough information to size them. They assisted the product owner in getting user stories ‘sprint ready’.

Spikes: A spike is a user story or task where the goal is to gather information rather than create shippable code. One of the scrum teams I worked on didn’t have enough analysts for a pre-sprint team, so we would take high-priority user stories that we didn’t have enough information to size and treat them as spikes. We would time-box the story and the scrum team would investigate it within the sprint. At the end of the sprint, the story would go back to the backlog to be sized and prioritised, or another spike would be started if discovery was not complete.

Designated Discovery Time: A less desirable option would be to carve out some time from each

sprint for discovery. But this uses the scrum team for sprint and pre-sprint work, and impacts

efficiency.

Definition of done

In scrum, the definition of done is a simple checklist that helps the development team ensure a story is

complete and ready for shipping. The items on the list are nothing special by themselves, but diligently

ensuring that all user stories satisfy the definition of done is key to delivering quality, well documented results. Here are a few examples of what may be included in the definition of done for a DW user story:

Code:
o Development is complete
o The DW architect is comfortable with the solution that was developed
o Code has been peer reviewed (as applicable)
o Code was checked into the code repository and migrated to the appropriate release environment
o The data architect is comfortable with any model changes

Documentation:
o Source-to-target and data model were updated as appropriate
o Necessary supporting documentation is complete and checked in

QA and unit testing:
o User story acceptance criteria have been verified by the scrum team
o QA tests were created and run
o Code has no known defects, or defects are at an acceptable severity
o Automated tests were run (if your team does them)

Release management (some teams manage these as part of the release rather than within a sprint):
o Release documents have been updated
o Training materials have been updated
o Data dictionary has been updated
o The user story owner has validated the user story meets their acceptance criteria
o If applicable, the owner of the data elements that were added/modified approves the change
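
Some teams find it helpful to treat the definition of done as data, so that outstanding items can be listed for each story. The following is a minimal sketch in Python, with the checklist abbreviated from the examples above; the function names and the sample story are assumptions made for illustration, and most scrum tools offer their own way of doing this.

```python
# Checklist items abbreviated from the examples above; adapt to your own team.
DEFINITION_OF_DONE = {
    "Code": [
        "Development is complete",
        "Code has been peer reviewed",
        "Code is checked in and migrated",
    ],
    "Documentation": [
        "Source-to-target mapping and data model updated",
    ],
    "QA and unit testing": [
        "Acceptance criteria verified by the scrum team",
        "QA tests created and run",
    ],
}


def outstanding_items(completed: set[str]) -> dict[str, list[str]]:
    """Return, per group, the checklist items a story has not yet satisfied."""
    missing = {}
    for group, checks in DEFINITION_OF_DONE.items():
        todo = [check for check in checks if check not in completed]
        if todo:
            missing[group] = todo
    return missing


def is_done(completed: set[str]) -> bool:
    """A story is done only when nothing on the checklist is outstanding."""
    return not outstanding_items(completed)


# A hypothetical story that still needs its acceptance criteria verified.
story_progress = {
    "Development is complete",
    "Code has been peer reviewed",
    "Code is checked in and migrated",
    "Source-to-target mapping and data model updated",
    "QA tests created and run",
}
remaining = outstanding_items(story_progress)
```

The point is not the tooling but the discipline: a story with anything left in `remaining` is not done, however complete the code feels.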

Team agreements


A powerful practice my agile coaches (Shama Bole and Grant Beck with Plaster Group Consulting)

introduced me to is the concept of a team agreement. A team agreement is not specific to scrum and can be used by any team. It is basically a short list of items that the team agrees to. It works because the team

determines the list, not management, and the team can change them if they are no longer fit for purpose.

A few examples are shared below:

We will honour our commitments in a Sprint to the best of our ability: Teams that are new to scrum can be slow to realise that taking on a task is a commitment to completing it: don’t commit if you don’t honestly feel you can do the work in the planned time or have bandwidth to take it on.

If a user story is not in the sprint, we will not work on it: The next technical story or user story in the backlog may be more interesting than the ones in the current sprint and developers can be tempted to stray.

We will raise questions/issues early rather than waiting for the next stand up, team meeting, etc.

We will favour face-to-face communication.

We will be positive.

We will regularly update our scrum tool.

We will be on time for meetings.

Laptop-free meetings: The team may designate certain meetings as laptop-free to ensure they stay focused.

Everything we do that could impact production data and/or code must be reviewed by someone else.

We will not commit to stories that are not sprint ready.

We all take responsibility and ownership for remembering and acting on issues and ideas.

Our tasks are more modular, so more transition points and handoffs within a sprint

The best scrum teams are those with team members who are fungible, i.e., capable of mutual substitution.

This is easier in some teams and organizations than others. In a perfect world, your scrum team would be made up of people who can analyse, map, model, code, test, etc. However, in most teams, especially those

just getting started with scrum, you will have team members whose comfort level and/or ability to work

outside their current role is low.

In a user story [7] that has ETL (extract, transform, load), and/or ELT (extract, load, transform), there will

be multiple steps to complete the story, such as:

Initial analysis

Data model update

Source-to-target mapping document update

ETL/ELT changes

Quality assurance for ETL/ELT

Update related views and/or presentation layer

Test outcome preparation and creation

Quality assurance for view changes and/or presentation layer

Check completed work against user story acceptance criteria

Validate the definition of done and ensure the user story moves through the sprint

Update data dictionary

If the team is not comfortable stepping outside their traditional roles, the steps involved require multiple handoffs between team members. If the handoffs are not well coordinated, you could be one week into a two-week sprint before the person who owns the second task picks it up, which jeopardises completion of the subsequent tasks. This is manageable with diligence within the sprint in the short term, and would benefit from team cross-training in the long term. One short-term solution is to assign a user story owner within the sprint team for stories with a lot of handoffs; this person’s responsibility is to make sure the handoffs are coordinated appropriately. You could also try pairing individuals, both to help with upskilling and to reduce handoff issues.
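To make the handoff risk concrete, the sketch below counts how often a story changes hands as it moves through the steps listed above. The role assigned to each step is a hypothetical example, not a recommendation or a description of any particular team.

```python
from typing import List, Tuple

# Steps from the list above, each paired with a HYPOTHETICAL owner role.
STORY_STEPS: List[Tuple[str, str]] = [
    ("Initial analysis", "analyst"),
    ("Data model update", "modeller"),
    ("Source-to-target mapping document update", "analyst"),
    ("ETL/ELT changes", "developer"),
    ("Quality assurance for ETL/ELT", "tester"),
    ("Update related views and/or presentation layer", "developer"),
    ("Test outcome preparation and creation", "tester"),
    ("Quality assurance for view changes and/or presentation layer", "tester"),
    ("Check completed work against user story acceptance criteria", "analyst"),
    ("Validate the definition of done", "scrum master"),
    ("Update data dictionary", "analyst"),
]


def count_handoffs(steps: List[Tuple[str, str]]) -> int:
    """Count the transitions where consecutive steps have different owners."""
    owners = [owner for _, owner in steps]
    return sum(1 for prev, nxt in zip(owners, owners[1:]) if prev != nxt)


# The same story worked end to end by one cross-trained pair.
PAIRED_STEPS = [(step, "cross-trained pair") for step, _ in STORY_STEPS]
```

With these illustrative assignments, `count_handoffs(STORY_STEPS)` reports nine handoffs inside a single story, while `count_handoffs(PAIRED_STEPS)` reports none. The exact numbers matter less than the comparison: every change of owner is a point where a story can sit waiting.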

Multiple scrum teams must still share a common methodology and data model

Architectural oversight is crucial to a DW. If you have multiple scrum teams, they must have common design standards and development processes, and work from a common data model. How can we resist the urge to do detailed up-front modelling and yet still stay in sync across scrum teams?

The solution I have seen most often is to have centralised data architect(s) manage the data model and DW architect(s) own the overall design and standards for the DW. A centralised data architect can be a problem if they must do all the data modelling and are overly academic, because they become a bottleneck for the scrum team. On the other hand, many DW teams aren’t large enough to justify a full-time data architect. In that case, the DW architect may play both roles and provide guidance and oversight to scrum team members who do their own modelling.

For DW architecture, being agile still requires some architecture up front. I agree with Heizenberg et al. [5] that some decisions should be made before a scrum team begins development, because changing them later would be very costly:

Tools, including database platform, ETL tool, reporting solution(s), data modelling tool, version management system, etc.

Data model(s) and method(s) (3rd Normal Form, Star Schema, Data Vault, etc.), and DW environments (staging, core, presentation, operational data store, etc.).

Standard processing methods, e.g. historic data processing and error handling. This includes deciding on the underlying control framework and guidance on decisions such as when to use ETL or ELT.

The remaining DW architecture will evolve over time and can respond to changing business needs.

Managing technical debt and technical user stories

Techopedia defines technical debt [8] as “a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution.” Technical user stories are those that the development team or architects initiate, such as improvements to the system that ensure scalability or cut down on future development time.

There are a few ways to handle these issues, similar to handling pre-sprint analysis work:

Separate team/developer(s) focused on technical improvements: Have a developer or team outside of the scrum team work through technical debt and focus on improvements. This is a practical option, and keeps the sprint team focused on the business backlog, but scrum developers often like these types of tasks because they are a diversion from the usual sprint work and keep their skills sharp.

Prioritise technical stories with the product backlog: Put technical stories into the backlog and have the product owner categorise them along with the business-focused user stories. This can be challenging because the technical team bears the burden of ensuring the product owner understands the benefits. I have not seen this option succeed in practice.

Designated technical time: Carve out specific time (percentage, set time, number of story points, etc.) from each sprint for technical work, using the scrum team for both sprint work and technical debt. This is the option I have seen work best, but you have to manage it carefully because some of the technical stories are more ‘interesting’ to developers and they may focus on them to the detriment of other sprint work.
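As a rough illustration of the designated technical time option, the sketch below splits a sprint’s capacity into business and technical story points, and reports how far technical work has run over its budget. The function names and the 20% share used in the usage note are assumptions made for this example only; choose whatever unit and proportion suits your team.

```python
from typing import Tuple


def split_sprint_capacity(total_points: int, technical_share: float) -> Tuple[int, int]:
    """Return (business_points, technical_points) for one sprint.

    technical_share is the agreed fraction of capacity reserved for
    technical stories (an illustrative parameter, not a scrum rule).
    """
    if total_points < 0:
        raise ValueError("total_points must be non-negative")
    if not 0.0 <= technical_share <= 1.0:
        raise ValueError("technical_share must be between 0 and 1")
    technical_points = round(total_points * technical_share)
    return total_points - technical_points, technical_points


def technical_overrun(completed_technical_points: int, technical_budget: int) -> int:
    """Points of technical work done beyond the agreed budget (0 if within it)."""
    return max(0, completed_technical_points - technical_budget)
```

For a team with a capacity of 40 story points and a 20% technical share, `split_sprint_capacity(40, 0.2)` gives 32 points for business stories and 8 for technical stories. If the team then completes 11 technical points, `technical_overrun(11, 8)` shows a 3-point overrun, which is the kind of drift the scrum master would want to raise at the retrospective.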

Grant Beck, Plaster Group Consulting, shared a few more options for managing technical debt:

Slack time. Time set aside for an employee to do whatever they want (work related). This encourages innovation in agile, and also can be used for managing technical debt (see suggested reading).

If you touch it, refactor it. When you are coding, incorporate any refactoring as you see the opportunity.

Make it visible! Grant will often create a large bulletin board with colourful items representing outstanding refactoring tasks. The team is encouraged to grab an item if they have bandwidth. As technical debt accumulates, everyone can see it, and it WILL become a topic of conversation.

Ongoing changes, maintenance, and modifications

If you try to use your scrum product backlog for maintenance, modifications, and new user stories, then small improvements will struggle to get priority and could languish in the backlog indefinitely. A common solution is to use kanban for maintenance and modifications, and to have the scrum team(s) focus on new user stories. Large maintenance items may be put into the scrum team’s product backlog and prioritised by the Product Owner, while other items are prioritised for the dedicated maintenance team. You can make the maintenance team as small or large as needed to accommodate demand.

Data governance

Data governance is crucial for a DW to maintain a single version of the truth. Responsibility for ensuring data governance falls on all members of an organization, but the product owner(s) and developers can make or break any data governance plan. Data governance should be part of the process regardless of the development framework being used: if a new field is being added, an existing field is being modified, etc., then the ‘owner’ of that field should be consulted.

In the DW for ANZ, we are building data governance from the ground up rather than from the top down. As we create derived fields in the integrated layer of our DW, we will ensure the fields have dedicated ‘owners’ who advise us.
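One lightweight way to support this is a check that flags any field touched by a user story that has no recorded owner. The sketch below assumes a simple field-to-owner lookup; the field names, owner names, and the function itself are hypothetical and do not describe the ANZ implementation.

```python
from typing import Dict, Iterable, List


def fields_missing_owner(changed_fields: Iterable[str], owners: Dict[str, str]) -> List[str]:
    """Return the changed fields that have no owner recorded, in sorted order.

    A field counts as unowned if it is absent from the lookup or its
    owner entry is empty.
    """
    return sorted(field for field in changed_fields if not owners.get(field))


# Hypothetical owner register for derived fields in the integrated layer.
FIELD_OWNERS = {
    "customer_segment": "Retail Banking",
    "risk_grade": "",
}
```

Here `fields_missing_owner(["customer_segment", "risk_grade", "tenure_months"], FIELD_OWNERS)` returns `["risk_grade", "tenure_months"]`, so those two fields would need an owner to be found and consulted before the story is considered sprint ready.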

Test automation

No discussion of agile is complete without mentioning test automation. I have seen successful data warehousing projects that did not have test automation, at least not initially. Some evolved into it and others never did. One challenge with test automation in a DW environment is that open source or free test tools often require knowledge of languages such as C# or Java that a DW team may not have.

At ANZ, we have a mature test automation practice with several agile teams using specification-by-example and associated technologies shared from our existing scrum practice. Within the new data warehouse team, we are currently investigating test automation options. In my previous companies, test automation tools were built in-house, and this is one option we are considering.
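For teams weighing up an in-house approach, the sketch below shows what a very small automated reconciliation test could look like: it compares the row count and a column total between a staging table and its target. It is an illustration only, written in Python with the built-in sqlite3 module as a stand-in for the warehouse database; the table and column names are invented, and it is not the tooling used at ANZ or at any previous company.

```python
import sqlite3
from typing import Dict


def reconcile(conn: sqlite3.Connection, source_table: str,
              target_table: str, amount_column: str) -> Dict[str, bool]:
    """Compare row count and column total between two tables.

    Table and column names are interpolated into the SQL, so they must be
    trusted constants from the test definition, never user input.
    """
    query = "SELECT COUNT(*), COALESCE(SUM({col}), 0) FROM {tbl}"
    src = conn.execute(query.format(col=amount_column, tbl=source_table)).fetchone()
    tgt = conn.execute(query.format(col=amount_column, tbl=target_table)).fetchone()
    return {
        "row_count_matches": src[0] == tgt[0],
        "total_matches": src[1] == tgt[1],
    }


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical staging and core tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_payment (payment_id INTEGER, amount_cents INTEGER);
        CREATE TABLE core_payment (payment_id INTEGER, amount_cents INTEGER);
        INSERT INTO stg_payment VALUES (1, 1000), (2, 2550), (3, 499);
        INSERT INTO core_payment SELECT payment_id, amount_cents FROM stg_payment;
    """)
    return conn
```

Calling `reconcile(build_demo_db(), "stg_payment", "core_payment", "amount_cents")` reports a match on both checks; deleting a row from `core_payment` makes both fail. A check like this can run after every load, and it relies mainly on SQL, which most DW teams already know.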

Conclusions

Data warehousing projects that utilise agile frameworks are slowly becoming the majority. However, companies need to adopt practices that work for them and fit their organizational culture. Those with a culture of continuous process improvement are naturally more comfortable with change and will adapt more quickly. Integrating agile into your data warehousing environment depends on your org structure and whether senior management recognises the benefits agile can bring.

One of the most important elements of success for a scrum team is an active, engaged and focused product owner. But perhaps the biggest key to success in agile is to experiment with solutions that work for your situation. This doesn’t mean you should abandon the scrum framework if you’re using scrum, but there is a lot of flexibility available within the scrum guidelines. If you are currently using spikes for discovery, for example, and it isn’t working for your team, try another option for a few sprints and see if that solves the problem.

Agile is, after all, about being adaptive, resourceful and responsive.

Suggested Reading/Viewing

1. Definitions of Type I, II, III, etc. http://www.kimballgroup.com/2008/09/slowly-changing-dimensions-part-2/

2. The Role of Project Manager in An Agile Environment. http://plastergroup.com/role-project-manager-in-an-agile-environment/

3. Agile Business Intelligence: How to Make it Happen? https://www.nl.capgemini.com/resource-file-access/resource/pdf/1b-032.13_whitepaper_new_vi_agile_bi_17th_oct13_final_web_secured_1.pdf

4. Agile Product Ownership in a Nutshell. https://www.youtube.com/watch?v=502ILHjX9EE

5. Further details on slack time. http://agiletrail.com/2012/01/09/slack-to-the-rescue-what-you-want-to-do/

References

1. Agile software development, Wikipedia, https://en.wikipedia.org/wiki/Agile_software_development

2. 2014 TDWI BI Benchmark Report: Organizational and Performance Metrics for Business Intelligence Teams, TDWI, Sep 2014.

3. K. Beck et al. Manifesto for Agile Software Development, http://agilemanifesto.org

4. Scrum (software development), Wikipedia, https://en.wikipedia.org/wiki/Scrum_(software_development)

5. J. Heizenberg, A. van den Berk, R. Fietsima. Agile Business Intelligence: How to Make it Happen?, Capgemini, 2013, https://www.nl.capgemini.com/resource-file-access/resource/pdf/1b-032.13_whitepaper_new_vi_agile_bi_17th_oct13_final_web_secured_1.pdf

6. Kanban vs Todo lists and Scrum, http://kanboard.net/documentation/kanban-vs-todo-and-scrum

7. M. Rouse and Y. Francino. User story definition, TechTarget, Feb 2015, http://searchsoftwarequality.techtarget.com/definition/user-story

8. Technical debt, Techopedia, https://www.techopedia.com/definition/27913/technical-debt

About the Author

Wendy Gilbert commenced her career in Business Intelligence and banking with a master’s degree in Computer Science from Texas. Her wide experience includes work with Seattle Cancer Care Alliance, Bill & Melinda Gates Foundation, Expedia, Amazon, and Washington Mutual, and culminated in five years as Business Intelligence Practice Director for Plaster Group Consulting in Seattle. It has prepared her well to lead the Data Intelligence project portfolio for ANZ bank in New Zealand, driving the creation of a centralised data warehouse.

Wendy is a Certified Data Management Professional, and has a passion for the discipline. She has been widely involved in developing courses at UW (University of Washington), is a past president of the Seattle chapter of TDWI (The Data Warehousing Institute), has been active in the Puget Sound Data Management Association, and has spoken on self-service BI and BI solutions. She was also active in the BI/DW educational community in the Puget Sound as an advisory board member for Renton Technical College, Bellevue College, and UW’s Continuing Education program, where she co-taught Data Mining.