Practical Agile Data Warehousing: If at first you don’t
succeed, change something.
Wendy Gilbert Lead Project Manager, Data Intelligence, ANZ New Zealand.
Level 9, 23 Albert Street, Auckland, New Zealand 1010
[email protected], www.linkedin.com/in/wmgilbert
Executive summary
It is easy to get tired of the hype surrounding big data, data science, self-service, cloud, mobile BI, agile BI, etc. I’ve
worked with several data warehousing teams who have said that agile didn’t work for them, doesn’t work for data
warehousing, or that they are just plain tired of hearing the word. My experience has been:
Data warehousing teams who feel that agile doesn’t work for them are often trying to ‘do agile’ rather than
‘be agile’, and they can succeed if they re-assess their practices.
Agile is an excellent framework, so long as you take account of the differences between data warehousing
and software development.
Hype about agile isn’t going away any time soon. It’s the natural result of people (like me) feeling so
energized by the benefits of agile that they can’t stop using it as the solution to all the problems of the
universe.
I am going to cover what agile is for me, when you should use it, thoughts on what you need to do to be successful,
and common challenges we encounter applying agile to data warehousing.
Introduction
In my scrum master certification course, the instructor started the day with a group exercise. He broke the
class into groups of two with each team having a ‘manager’ and a ‘developer’. Our first task was to navigate around a small section of the room with the manager giving direction (start, stop, left turn, right
turn) and the developer following those directions, counting how many times we could circle the area
together in the allocated time period. The second time around, the goal was the same but the developers
decided when and where to move and the managers just came along for the ride.
The first attempt resulted in very little progress with many near-collisions, complaints about management,
and laughter. The second resulted in us gradually merging together and forming a circle that went round and round the room in sync, more than twice as many times as the first try. A silly exercise, yes, but it got
the point across: one of the primary tenets of the Agile Manifesto [3] is that the best architecture,
requirements and designs emerge from self-organising teams.
Is agile data warehousing worthwhile?
Data warehouses (DWs) built using waterfall methodologies are not doomed to failure. Despite the
headlines we see of how often DWs fail, many of them do succeed. I worked on several DWs earlier in
my career that used a traditional waterfall software development lifecycle (SDLC). In my experience, the
projects were massively over schedule and over budget, but they did succeed in the end and some of them are still in use today.
But we could have done better. The companies hired consultants who knew little about the data and were given limited access to our business users or technical subject matter experts, because they were too busy.
We made assumptions about what the users wanted and tried to ensure that the DW could handle these
situations, thinking we were brilliant when we came up with a possible use case for the data that nobody
else had thought of. We decided to track history for every field in the system, because somebody somewhere might need it someday.
These stories aren’t shared to make me look bad. Mistakes like these are common, and we learn from them. I believe that using an agile framework alleviates many of the situations that led to the mistakes I’ve
experienced. And I’m not alone: TDWI’s 2014 BI Benchmark report [2] determined that “agile is the
most effective development methodology in terms of BI value”:
BI value by development methodology
Methodology   High value   Moderate value   Low value
Agile         38%          51%              11%
Waterfall     32%          36%              32%
Hybrid        18%          64%              18%
What is agile?
Wikipedia defines agile software development [1] as:
A group of software development methods in which solutions evolve through
collaboration between self-organizing, cross-functional teams. It promotes adaptive planning, evolutionary development, early delivery, continuous improvement, and
encourages rapid and flexible response to change.
Agile Manifesto
Agile software development is based on the Agile Manifesto [3] which values:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
The most important take-away from the Agile Manifesto is that it puts the focus on the elements on the
left (individuals, interactions, working software, customer collaboration, responding to change) without implying that the elements on the right (processes, tools, documentation, contract negotiation, plans)
are not needed.
Agile principles
The Agile Manifesto [3] outlines 12 agile principles (originally created for software development):
Our highest priority is to satisfy the customer through early and continuous delivery of valuable
software.
Welcome changing requirements, even late in development. Agile processes harness change for
the customer's competitive advantage.
Deliver working software frequently, from a couple of weeks to a couple of months, with a
preference for the shorter timescale.
Working software is the primary measure of progress.
Agile processes promote sustainable development. The sponsors, developers, and users should be
able to maintain a constant pace indefinitely.
Business people and developers must work together daily throughout the project.
Build projects around motivated individuals. Give them the environment and support they need,
and trust them to get the job done.
The most efficient and effective method of conveying information is face-to-face conversation.
At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its
behaviour.
Continuous attention to technical excellence and good design enhances agility.
Simplicity, the art of maximising the amount of work not done, is essential.
The best architectures, requirements, and designs emerge from self-organising teams.
Agile Frameworks
Agile as a discipline has many frameworks beneath it. The most commonly used agile frameworks within
data warehousing are scrum [5] and Kanban [6]. Other frameworks include lean, extreme programming (XP), feature driven development, and crystal.
Scrum
Scrum [5] is the most frequently adopted agile process worldwide. It is especially helpful for projects that
have a lot of uncertainty and need constant business interaction, such as building the core of a DW. Wikipedia defines scrum as:
An iterative and incremental agile software development methodology for managing
product development. It defines "a flexible, holistic product development strategy where a development team works as a unit to reach a common goal", challenges assumptions of
the "traditional, sequential approach" to product development, and enables teams to self-
organize by encouraging physical co-location or close online collaboration of all team
members, as well as daily face-to-face communication among all team members and disciplines in the project.
A key principle of scrum is its recognition that during production processes, the customers can change their minds about what they want and need (often called
"requirements churn"), and that unpredicted challenges cannot be easily addressed in a
traditional predictive or planned manner. As such, scrum adopts an empirical approach,
accepting that the problem cannot be fully understood or defined, focusing instead on maximizing the team's ability to deliver quickly and respond to emerging requirements.
My recommendation for those interested in implementing scrum is to take a two-day scrum master
certification course, even if you don’t intend to be a scrum master. The course will teach you the basics and, more importantly, will help you understand why each scrum event is useful.
Scrum teams consist of three roles: product owner, scrum master, and development team. Scrum refers to
all members of the development team as developers because well-rounded teams made up of people who are capable of taking on more than one role are the desired goal, but the development team should have
the following skillsets:
Data analysis
Data modelling
Data mapping
ETL and/or ELT
Testing
Report development
BI Solution Architecture (this skill may be managed outside the scrum team)
Database experience
Heizenberg et al. [5] do an excellent job of explaining scrum principles, particularly as they apply to BI, so I refer you there for further details on scrum processes.
Kanban
In kanban, individual work items are broken into phases such as ‘Not Yet Started’, ‘In Progress’, and
‘Done’. I think of it like a group ‘to do’ list except that there are multiple phases and you can set a work in progress limit per phase. In data warehousing, I have mostly seen kanban used for maintenance and
modifications of an existing system.
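The phases and work-in-progress limit described above can be sketched in a few lines of Python. This is only an illustration: the class name `KanbanBoard`, its methods, the example work items, and the limit of two items in ‘In Progress’ are all assumptions made for the sketch, not part of any kanban tool.

```python
from dataclasses import dataclass, field


@dataclass
class KanbanBoard:
    """Hypothetical board: ordered phases, each with an optional WIP limit."""
    phases: list
    wip_limits: dict = field(default_factory=dict)  # phase name -> max items
    items: dict = field(default_factory=dict)       # work item -> current phase

    def count(self, phase):
        """Number of work items currently sitting in a phase."""
        return sum(1 for p in self.items.values() if p == phase)

    def can_enter(self, phase):
        """A phase accepts work if it has no limit or is below its limit."""
        limit = self.wip_limits.get(phase)
        return limit is None or self.count(phase) < limit

    def add(self, item):
        """New work always starts in the first phase."""
        self.items[item] = self.phases[0]

    def advance(self, item):
        """Move an item to the next phase; return False if it cannot move."""
        index = self.phases.index(self.items[item])
        if index == len(self.phases) - 1:
            return False  # already in the final phase
        next_phase = self.phases[index + 1]
        if not self.can_enter(next_phase):
            return False  # blocked by the WIP limit
        self.items[item] = next_phase
        return True


board = KanbanBoard(
    phases=["Not Yet Started", "In Progress", "Done"],
    wip_limits={"In Progress": 2},  # assumed limit, chosen for the example
)
for name in ["fix load job", "add staging column", "rename view"]:
    board.add(name)

moves = [
    board.advance("fix load job"),        # enters 'In Progress'
    board.advance("add staging column"),  # enters 'In Progress' (now at the limit)
    board.advance("rename view"),         # blocked by the WIP limit
    board.advance("fix load job"),        # moves to 'Done', freeing a slot
    board.advance("rename view"),         # now allowed
]
print(moves)
```

The point of the limit is visible in the third move: the team has to finish something before it can start something new.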
Apply the process that best fits the situation
There is no need to pick a single framework for all of your projects and teams. At ANZ NZ, we use scrum, kanban, and waterfall within our data warehouse teams. We are using Teradata as our database
platform. Installing and configuring the platform is basically an infrastructure project so we are using a
more traditional waterfall approach for that as well as for installation of our ETL tool and BI platforms.
For agile projects, we are required to meet the typical stage gates for software delivery (analyse, design,
plan, build, test, implement), but these have been modified to meet the needs of an agile world. The
terminology is the same, but the project artefacts and timing are very different. For example, in a typical waterfall project the output of our analyse phase would be detailed requirements for the project, whereas
with our Agile SDLC, our analyse phase output is a product backlog that we believe we can complete in
the specified time period. We will revise the analyse artefact as needed if the original product backlog is no longer meeting the business needs and/or priorities and can easily change it as long as our sponsors
support the change.
For a variety of reasons, we chose to use scrum for building the core and presentation layers of our DW,
and kanban for sourcing our data into staging.
Scrum is better suited for the core since the requirements are less understood and we need
constant interaction with our business users. The business users need to be able to change
priorities and requirements quickly.
Kanban is well suited for the staging layer since there is limited transformation and the
requirements are better understood. The focus here is continuous delivery and productivity.
Being agile versus doing agile
Before I took scrum training, when someone asked me if I had used agile before, I would mumble a reply and tell them something about how I understood the concepts but had not worked on a team that used it.
I’d go on to cite the need for daily stand-ups and such. I didn’t realise how much I didn’t know until my
first scrum team.
I have conducted hundreds of interviews in the last few years and I am frequently looking for people with
agile experience. Much of the time I get responses similar to my own many years ago, often by people
who work on teams that have adopted some agile practices (usually daily stand-ups), but who were not focused on agile principles. Adding daily stand-ups to a team that is otherwise using a traditional SDLC is
not agile if you haven’t adopted the mind-set change that goes with it.
In ANZ New Zealand, our agile advisory team expressly avoids approaches to agile that don’t fit a prescribed process. From the outset they adopted the practices of Scrum.org in their entirety, and
progressively adopted agile techniques which they found most effective.
According to Chris Starling, ANZ New Zealand, Delivery Transformation Manager, you should “choose
a mature agile process such as scrum, so you can get access to coaching and support. ‘Doing agile’ means
‘on your own experimentation with a chemistry set and no teacher’. The result is not a surprise to anybody. It doesn’t mean that agile won’t work for you. It simply means that your adoption strategy
sucked.”
Agile training classes will teach you the elements of what it is to ‘do agile’, but to ‘be agile’ is to change the mind-set of yourself, your team, and your organization, and put the focus on the core tenets of the
Agile Manifesto.
Business value
Everything you do, every key you press, spec you write, email you send, meeting you attend, code you build, etc., should have an underlying connection to the business value it provides. All levels of the
organization should be comfortable asking why they are doing something and connecting business value
to the work. Business value can be direct (e.g. the work will result in better service for the customer) or indirect (e.g. the work is needed for regulatory reporting to prevent fines that would result in increased
costs to the customer).
I once spent weeks trying to get the details correct for a specification for a new field that my manager was convinced the business would want us to build. When I went to the business manager to ask her for
clarification on it, she told me that nobody in the business would ever use that field because it didn’t meet
their marketing needs. The DW team was convinced that if we built it she would use it: so we built it, and it never got used. We started with the solution, not the business value.
Continuous process improvement
When I was a consultant and a client told me that they had tried agile before and it failed, I would often
suggest they rebrand it as continuous process improvement (CPI) when they tried the second time. Even those who are tired of the word ‘agile’ will get on board with the idea that you need to be always
improving.
As with business value, CPI must be ingrained in your culture. This takes time and it is harder in
organizations that strongly silo roles and responsibilities. The sprint retrospective is one way scrum encourages CPI, but there are other things you can do to encourage CPI behaviours so that, over time and
with encouragement, people become more proactive and empowered.
We have all been in meetings or hallway conversations where something was suggested and the respondents agreed it would be great. Two weeks later the idea comes up again, and we agree that it
would be great and we assume someone else will follow up on it. A week later we start getting grumpy
because it hasn’t been implemented. Those conversations continue until the idea either dies out or someone who feels empowered picks the idea up and actions it.
In my last scrum team, we added a team agreement that said “We will not ignore things; we all take responsibility and ownership for remembering and acting on issues and ideas.” When something was
suggested and we agreed it would be a good idea, we didn’t move on to the next subject until one of us
volunteered to follow up. This applied to big things, such as improvements to our architecture, and
smaller things, such as adding a step to our definition of done.
Maximise the amount of work not done
This is one of the key agile principles, but in data warehousing in particular we tend to ‘future proof’ and
‘what if’ ourselves into delivering things that the customer does not need, because we are worried that one
day they might.
The most glaring example of this is with slowly changing dimensions (SCDs). In data warehousing, we
love history. I’ve been told that a DW isn’t a DW if it doesn’t have SCDs. But I have also helped build two DWs that didn’t have a single SCD, and still delivered huge business value. In both cases, SCD
scenarios were in our backlog but never got high priority.
Two of the largest DWs that I worked on started their journey with a decision to treat all fields as Type II
(track changes to the field over time), unless there was a good reason to set the field to Type I (overwrite
the changed field). In both cases, after months and months of spider-web code, slow builds, and training
chaos, they reversed the decision and decided to go with Type I unless there was a good reason to set the field to Type II, thus maximising the amount of work not done.
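To make the difference in effort concrete, here is a minimal Python sketch of the two treatments for a single field. The function names, the `valid_from`/`valid_to` columns, and the sample customer row are assumptions chosen for illustration; real implementations vary by platform and ETL tool.

```python
from datetime import date


def apply_type1(row, field_name, new_value):
    """Type I: overwrite the changed field; no history is kept."""
    updated = dict(row)
    updated[field_name] = new_value
    return updated


def apply_type2(history, field_name, new_value, change_date):
    """Type II: close the current version and append a new one.

    Assumes the last row in `history` is the current version
    (its valid_to is None).
    """
    current = history[-1]
    if current[field_name] == new_value:
        return history  # nothing changed, so no new version is needed
    closed = {**current, "valid_to": change_date}
    new_version = {
        **current,
        field_name: new_value,
        "valid_from": change_date,
        "valid_to": None,
    }
    return history[:-1] + [closed, new_version]


customer = {
    "customer_id": 42,
    "city": "Auckland",
    "valid_from": date(2014, 1, 1),
    "valid_to": None,
}

type1_result = apply_type1(customer, "city", "Wellington")
type2_result = apply_type2([customer], "city", "Wellington", date(2015, 6, 1))

print(type1_result)   # one row, old value gone
for version in type2_result:
    print(version)    # two rows: the closed version and the current one
```

Even in this toy form, Type II needs validity dates, a notion of the current row, and a multi-row result that every downstream query must filter correctly, which is why applying it to every field by default adds up quickly.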
Rework is a fact of life
With waterfall, requirements and design are done up front, whereas with scrum they are more fluid. It is
harder to plan ahead and there is always the possibility that you’ll make a mistake in your design or requirements that will force rework. It’s a very uncomfortable feeling to build something that you know
could need to be redone at a future point if requirements change or evolve.
It took one of my most experienced developers months to adapt to the idea. He took great pride in his
work and worried about both the reputational hit if someone thought his code wouldn’t stand the test of
time and the cost of reworking the code later. But I argue that reworking code later (if needed)
is not nearly as arduous as we make it out to be, and while we are waiting for the possibility that this needs to happen, the business users are reaping the benefit of what we initially delivered.
Agile and data warehousing
While there are many similarities between data warehousing and software development, there are
common situations when applying scrum to data warehousing initiatives and projects that may not apply to software development.
We don’t deliver software
I’ve seen several data people read the agile principles and immediately take issue with the term ‘working
software’. The agile principles were created for software development, but of course working software is not the focus of a DW. So we adapt: rather than working software, we deliver business value and focus on
building a product that will be accepted by the product owner as working. This could be fields, tables,
views, dashboards, reports, data marts, etc. Sprint reviews sometimes have to be creative as a result, e.g.
if the result of a user story was a new field, then the review for that story might be an SQL query showing the value distributions in the field and the results of testing that show the scenarios accounted for.
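As an illustration of what such a review query might look like, the sketch below builds a throwaway in-memory SQLite table and reports the value distribution of a new field. The `customer` table, the `segment` field, and the sample rows are hypothetical; in practice the query would run against the DW itself.

```python
import sqlite3

# Throwaway in-memory database standing in for the DW (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, segment TEXT)"
)
conn.executemany(
    "INSERT INTO customer (customer_id, segment) VALUES (?, ?)",
    [(1, "Retail"), (2, "Retail"), (3, "Business"), (4, None), (5, "Retail")],
)

# Value distribution for the new field, including how often it is empty.
distribution = conn.execute(
    """
    SELECT COALESCE(segment, '<NULL>') AS segment_value,
           COUNT(*) AS row_count,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM customer), 1)
               AS pct_of_rows
    FROM customer
    GROUP BY segment
    ORDER BY row_count DESC, segment_value
    """
).fetchall()

for value, row_count, pct in distribution:
    print(f"{value:<10} {row_count:>5} {pct:>6}%")
```

Showing the share of empty values alongside the populated ones gives the product owner something concrete to accept or question during the review.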
User stories in data warehousing are more focused on what many refer to as a ‘slice’ of the DW. Rather than building each layer in its entirety before moving to the next, you take a subset of the DW and build it
from beginning to end (or whatever layers are relevant to that story). Slicing initially worries some
architects who are uncomfortable with the risk of future rework, but it’s a risk well worth taking.
A common criticism I hear is that you can’t deliver business value with a DW user story that fits into a
two-week sprint. This is sometimes true: sometimes you have to put in some foundation work before true
business value can be realised. But the team gets better over time at breaking epics into user stories that provide some value along the way.
More pre-analysis and prep is necessary
In software development and data warehousing, the development team’s job is to figure out how to create
the code for the user story. They collaborate with business owners to clarify the details as they go. In my experience, there are more unknowns in this process for data warehousing than software development,
and more analysis and prep work is needed for a user story before it can be considered sprint ready.
In their article on Agile Business Intelligence, Heizenberg et al. [5] grouped BI user stories into five
categories as shown below (the article also provides examples of user stories within each category):
Data Disclosure stories are about extracting data from the source system and making it available.
Data Augmentation stories are about creating new information based on existing information.
Data Presentation stories describe the presentation of the information in a format the user can
easily understand.
Data Validation stories are where a user talks about applying business rules to check whether the
extracted data is of good enough quality for the end users to work with, and to take the necessary
actions when the data is not good enough.
Configuration stories are about enabling maintenance staff and administrators to keep the
configuration of the BI system up to date without having to change the code.
Data Presentation, Data Validation and Configuration are similar to the user stories found in software
development, whereas Data Disclosure and Data Augmentation user stories often have many questions
that the business owner will have trouble answering:
What source system does the data come from? What is the source of truth for the data?
Do we have access to the source system already? If not, how (and when) can we get it?
What fields and tables are needed? Are they in the DW already? If so, at what layer?
Is the data clean enough to use for the intended purpose? Is the data in a complete state?
In the case of data augmentation, how much do we know about the calculation?
Is a prototype needed to help the business owner with the definition?
What existing reports/dashboards/etc. are affected?
I’ve seen scrum teams handle the additional analysis in different ways:
Pre-Sprint Prep Team: One of my larger clients had a separate team of technical analysts that
worked outside the scrum team. Their role was to analyse the high-priority user stories in the
backlog to ensure the scrum team had enough information to size them. They assisted the Product
Owner in getting user stories ‘sprint ready’.
Spikes: A spike is a user story or task where the goal is to gather information rather than create
shippable code. One of the scrum teams I worked on didn’t have enough analysts for a pre-sprint
team, so we would treat high-priority user stories that we lacked the information to size
as spikes. We would time-box the story and the scrum team would investigate the story
within the sprint. At the end of the sprint, the story would go back to the backlog to be sized and prioritised or another spike would be instigated if discovery was not complete.
Designated Discovery Time: A less desirable option would be to carve out some time from each
sprint for discovery. But this uses the scrum team for sprint and pre-sprint work, and impacts
efficiency.
Definition of done
In scrum, the definition of done is a simple checklist that helps the development team ensure a story is
complete and ready for shipping. The items on the list are nothing special by themselves, but diligently
ensuring that all user stories satisfy the definition of done is key to delivering quality, well documented results. Here are a few examples of what may be included in the definition of done for a DW user story:
Code:
o Development is complete
o The DW architect is comfortable with the solution that was developed
o Code has been peer reviewed (as applicable)
o Code was checked into the code repository and migrated to the appropriate release environment
o The data architect is comfortable with any model changes
Documentation:
o Source-to-target and data model were updated as appropriate
o Necessary supporting documentation is complete and checked in
QA and unit testing:
o User story acceptance criteria have been verified by the scrum team
o QA tests were created and run
o Code has no known defects, or defects are at an acceptable severity
o Automated tests were run (if your team does them)
Release management (some teams manage these as part of the release rather than within a sprint):
o Release documents have been updated
o Training materials have been updated
o Data dictionary has been updated
o The user story owner has validated the user story meets their acceptance criteria
o If applicable, the owner of the data elements that were added/modified has approved the change
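Some teams keep a checklist like this in their scrum tool; the idea can also be sketched in code. The snippet below is a minimal illustration only: the shortened checklist, the helper names `outstanding_items` and `is_done`, and the example story are assumptions, and your own definition of done will differ.

```python
# A shortened, hypothetical definition of done for a DW user story.
DEFINITION_OF_DONE = [
    "Development is complete",
    "Code has been peer reviewed",
    "Source-to-target and data model updated",
    "Acceptance criteria verified by the scrum team",
    "QA tests created and run",
    "Data dictionary updated",
]


def outstanding_items(completed, checklist=DEFINITION_OF_DONE):
    """Return the checklist items not yet ticked off, in checklist order."""
    return [item for item in checklist if item not in completed]


def is_done(completed, checklist=DEFINITION_OF_DONE):
    """A story is done only when every checklist item has been completed."""
    return not outstanding_items(completed, checklist)


# Example: a story where the code is finished but the paperwork is not.
story_progress = {
    "Development is complete",
    "Code has been peer reviewed",
    "QA tests created and run",
}

remaining = outstanding_items(story_progress)
print(is_done(story_progress))
for item in remaining:
    print("Still to do:", item)
```

The value is less in the code than in the discipline: a story with anything left in the outstanding list does not count as done.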
Team agreements
A powerful practice my agile coaches (Shama Bole and Grant Beck with Plaster Group Consulting)
introduced me to is the concept of a team agreement. A team agreement is not specific to scrum and can be used by any team. It is basically a short list of items that the team agrees to. It works because the team
determines the list, not management, and the team can change them if they are no longer fit for purpose.
A few examples are shared below:
We will honour our commitments in a Sprint to the best of our ability: Teams that are new to scrum can be slow to realise that taking on a task is a commitment to completing it: don’t commit if you don’t honestly feel you can do the work in the planned time or have bandwidth to take it on.
If a user story is not in the sprint, we will not work on it: The next technical story or user story in the backlog may be more interesting than the ones in the current sprint and developers can be tempted to stray.
We will raise questions/issues early rather than waiting for the next stand up, team meeting, etc.
We will favour face-to-face communication.
We will be positive.
We will regularly update our scrum tool.
We will be on time for meetings.
Laptop-free meetings: The team may designate certain meetings as laptop-free to ensure they stay focused.
Everything we do that could impact production data and/or code must be reviewed by someone else.
We will not commit to stories that are not sprint ready.
We all take responsibility and ownership for remembering and acting on issues and ideas.
Our tasks are more modular, so more transition points and handoffs within a sprint
The best scrum teams are those with team members who are fungible, i.e., capable of mutual substitution.
This is easier in some teams and organizations than others. In a perfect world, your scrum team would be made up of people who can analyse, map, model, code, test, etc. However, in most teams, especially those
just getting started with scrum, you will have team members whose comfort level and/or ability to work
outside their current role is low.
In a user story [7] that has ETL (extract, transform, load), and/or ELT (extract, load, transform), there will
be multiple steps to complete the story, such as:
Initial analysis
Data model update
Source-to-target mapping document update
ETL/ELT changes
Quality assurance for ETL/ELT
Update related views and/or presentation layer
Test outcome preparation and creation
Quality assurance for view changes and/or
presentation layer
Check completed work against user story
acceptance criteria
Validate the definition of done and ensure the
user story moves through the sprint
Update data dictionary
If the team is not comfortable stepping outside their traditional roles, the steps involved require multiple
handoffs between team members. If the handoffs are not well coordinated, you could be one week into a
two week sprint before the person who owns the second task picks it up, and completion of subsequent tasks is jeopardised. This is manageable with diligence in the sprint in the short term, and would benefit
from team cross-training in the long-term. One short-term solution could be to assign a user story owner
within the sprint team for stories with a lot of handoffs. This person’s responsibility is to make sure the user story handoffs are coordinated appropriately. You could also try pairing individuals to help
with both upskilling and reducing handoff issues.
Multiple scrum teams must still share a common methodology and data model
Architectural oversight is crucial to a DW. If you have multiple scrum teams, they must have common
design standards and development processes, and work off of a common data model. How can we resist
the urge to do detailed up-front modelling and yet still stay in sync across scrum teams?
The solution I have seen most often is to have centralised data architect(s) to manage the data model and
DW architect(s) for overall design and standards for the DW. A centralised data architect can be challenging if they must do all data modelling and they are overly academic, as they become a bottleneck
for the scrum team. On the other hand, many DW teams aren’t large enough to justify a full time data
architect. The DW architect may play both roles and provide guidance and oversight to scrum team
members who do their own modelling.
For DW architecture, being agile still requires some architecture up front. I agree with Heizenberg et al. [5]
that some decisions should be made before a scrum team begins development, because changing them later would be very costly:
Tools, including database platform, ETL tool, reporting solution(s), data modelling tool, version
management system, etc.
Data model(s) and method(s) (3rd Normal Form, Star Schema, Data Vault, etc.), and DW
environments (staging, core, presentation, operational data store, etc.).
Standard processing methods, e.g. historic data processing and error handling. This includes
deciding on the underlying control framework and guidance on decisions such as when to use
ETL or ELT.
The remaining DW architecture will evolve over time and can respond to changing business needs.
Managing technical debt and technical user stories
Techopedia defines technical debt [8] as “a concept in programming that reflects the extra development
work that arises when code that is easy to implement in the short run is used instead of applying the best
overall solution.” Technical user stories are those that the development team or architects initiate such as improvements to the system to ensure scalability, cut down on future development time, etc.
There are a few ways to handle these issues, similar to handling pre-sprint analysis work:
Separate team/developer(s) focused on technical improvements: Have a developer or team
outside of the scrum team work through technical debt and focus on improvements. This is a
practical option, and keeps the sprint team focused on the business backlog, but scrum developers
often like these types of tasks as they are a diversion from the characteristic sprint work and keep their skills sharp.
Prioritise technical stories with the product backlog: Put technical stories into the backlog and
have the product owner categorise them along with the business-focused user stories. This can be
challenging because the technical team bears the burden of ensuring the product owner
understands the benefits. I have not seen this option succeed in practice.
Designated technical time: Carve out specific time (percentage, set time, number of story points,
etc.) from each sprint for technical work, using the scrum team for sprint work and technical debt.
This is the option I have seen work best, but you have to manage it carefully because some of the
technical stories are more ‘interesting’ to developers and they may focus on them to the detriment of other sprint work.
Grant Beck, Plaster Group Consulting, shared a few more options for managing technical debt:
Slack time. Time set aside for an employee to do whatever they want (work related). This
encourages innovation in agile, and also can be used for managing technical debt (see suggested reading).
If you touch it, refactor it. When you are coding, incorporate any refactoring as you see the
opportunity.
Make it visible! Grant will often create a large bulletin board with colourful items representing
outstanding refactoring tasks. The team is encouraged to grab an item if they have bandwidth. As
technical debt accumulates, everyone can see it, and it WILL become a topic of conversation.
Ongoing changes, maintenance, and modifications
If you try to use your scrum product backlog for maintenance, modifications, and new user stories, then
small improvements will struggle to get priority and could languish in the backlog indefinitely. A
common solution is to use kanban for maintenance and modifications, and to have the scrum team(s) focus on new user stories. Large maintenance items may be put into the scrum team’s product backlog
and prioritised by the product owner, while other items are prioritised for the dedicated maintenance
team. You can make the maintenance team as small or large as needed to accommodate demand.
Data governance
Data governance is crucial for a DW to maintain a single version of the truth. Responsibility for ensuring
data governance falls on all members of an organization, but the product owner(s) and developers can
make or break any data governance plan. Data governance should be part of the process regardless of the development framework being used: if a new field is being added, an existing field is being modified, etc., then
the ‘owner’ of that field should be consulted.
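One lightweight way to make the 'consult the owner' step routine is to keep a simple register of field owners and check every proposed change against it. The sketch below is a minimal illustration only; the field names, owner names, and the owners_to_consult function are all hypothetical.

```python
# Hypothetical register mapping each DW field to the person or team that owns it.
FIELD_OWNERS = {
    "customer.risk_rating": "Credit Risk",
    "account.balance": "Finance",
}


def owners_to_consult(changed_fields):
    """Split changed fields into owners to consult and fields with no owner."""
    consult = {}   # owner -> list of their fields touched by this change
    unowned = []   # fields that still need an owner assigned
    for field in changed_fields:
        owner = FIELD_OWNERS.get(field)
        if owner is None:
            unowned.append(field)
        else:
            consult.setdefault(owner, []).append(field)
    return consult, unowned


# Hypothetical change: one existing field modified, one new field added.
consult, unowned = owners_to_consult(["account.balance", "customer.segment"])
print("consult:", consult)
print("needs an owner:", unowned)
```

A new field shows up in the second list, which is a prompt to agree on an owner before the story is considered done.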
In the DW for ANZ, we are building data governance from the ground up rather than from the top down.
As we create derived fields in the integrated layer of our DW, we will ensure the fields have dedicated
‘owners’ who advise us.
Test automation
No discussion of agile is complete without mentioning test automation. I have seen successful data
warehousing projects that did not have test automation, at least not initially. Some evolved into it and
others never did. One challenge with test automation in a DW environment is that open source or free test
tools often require knowledge of C#, Java, etc., that a DW team may not have.
At ANZ, we have a mature test automation practice with several agile teams using specification-by-
example and associated technologies shared from our existing scrum practice. Within the new data warehouse team, we are currently investigating test automation options. In my previous companies, test
automation tools were built in-house, and this is one option we are considering.
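To give a feel for what a small in-house tool might look like, the sketch below keeps each test as a plain SQL query that returns true or false, with a thin Python script to run them. This is an assumption-laden illustration, not the approach used at ANZ or at my previous companies: the table names (stg_sales, fact_sales), the columns, and the run_checks function are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


def run_checks(conn):
    """Run simple reconciliation checks; return the names of any that fail."""
    # Each check is a SQL query returning a single true/false value,
    # so people who know SQL can add tests without learning a new language.
    checks = {
        "row counts match":
            "SELECT (SELECT COUNT(*) FROM stg_sales) = (SELECT COUNT(*) FROM fact_sales)",
        "amount totals match":
            "SELECT (SELECT SUM(amount) FROM stg_sales) = (SELECT SUM(amount) FROM fact_sales)",
        "no null customer keys":
            "SELECT COUNT(*) = 0 FROM fact_sales WHERE customer_key IS NULL",
    }
    failures = []
    for name, sql in checks.items():
        (passed,) = conn.execute(sql).fetchone()
        if not passed:  # 0 or NULL counts as a failure
            failures.append(name)
    return failures


# Hypothetical sample data: the second fact row is missing its customer key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (customer_id TEXT, amount INTEGER);
    CREATE TABLE fact_sales (customer_key INTEGER, amount INTEGER);
    INSERT INTO stg_sales VALUES ('C1', 100), ('C2', 250);
    INSERT INTO fact_sales VALUES (1, 100), (NULL, 250);
""")
print(run_checks(conn))
```

Because the checks are plain SQL, this style sidesteps some of the C# or Java knowledge that many test tools assume, at the cost of building and maintaining the runner yourself.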
Conclusions
Data warehousing projects that utilise agile frameworks are slowly becoming the majority. However,
companies need to adopt practices that work for them and meet their organizational culture. Those with a
culture of continuous process improvement are naturally more comfortable with change and will adapt more quickly. Integrating agile into your data warehousing environment depends on your org structure
and whether senior management recognises the benefits agile can bring.
One of the most important elements of success for a scrum team is an active, engaged and focused product owner. But perhaps the biggest key to success in agile is to experiment with solutions that
work for your situation. This doesn’t mean you should abandon the scrum framework if you’re using
scrum, but there is a lot of flexibility available within the scrum guidelines. If you are currently using
spikes for discovery, for example, and it isn’t working for your team, try another option for a few sprints and see if that solves the problem.
Agile is, after all, about being adaptive, resourceful and responsive.
Suggested Reading/Viewing
1. Definitions of Type I, II, III, etc. http://www.kimballgroup.com/2008/09/slowly-changing-dimensions-part-2/
2. The Role of Project Manager in An Agile Environment. http://plastergroup.com/role-project-manager-in-an-agile-environment/
3. Agile Business Intelligence: How to Make it Happen? https://www.nl.capgemini.com/resource-file-access/resource/pdf/1b-032.13_whitepaper_new_vi_agile_bi_17th_oct13_final_web_secured_1.pdf
4. Agile Product Ownership in a Nutshell. https://www.youtube.com/watch?v=502ILHjX9EE
5. Further details on slack time. http://agiletrail.com/2012/01/09/slack-to-the-rescue-what-you-want-to-do/
References
1. Agile software development, Wikipedia, https://en.wikipedia.org/wiki/Agile_software_development
2. 2014 TDWI BI Benchmark Report: Organizational and Performance Metrics for Business Intelligence Teams, TDWI, Sep 2014.
3. K. Beck et al. Manifesto for Agile Software Development, http://agilemanifesto.org
4. Scrum (software development), Wikipedia, https://en.wikipedia.org/wiki/Scrum_(software_development)
5. J. Heizenberg, A. van den Berk, R. Fietsima. Agile Business Intelligence: How to make it happen, Capgemini, 2013, https://www.nl.capgemini.com/resource-file-access/resource/pdf/1b-032.13_whitepaper_new_vi_agile_bi_17th_oct13_final_web_secured_1.pdf
6. Kanban vs Todo lists and Scrum, http://kanboard.net/documentation/kanban-vs-todo-and-scrum
7. M. Rouse and Y. Francino. User story definition, TechTarget, Feb 2015, http://searchsoftwarequality.techtarget.com/definition/user-story
8. Technical debt, Techopedia, https://www.techopedia.com/definition/27913/technical-debt
About the Author
Wendy Gilbert commenced her career in Business Intelligence and banking with a master’s degree in Computer Science from Texas. Her wide experience, culminating in
five years as Business Intelligence Practice Director for Plaster Group Consulting in
Seattle, included Seattle Cancer Care Alliance, Bill & Melinda Gates Foundation, Expedia, Amazon, and Washington Mutual, and has prepared her well to lead the Data
Intelligence project portfolio for ANZ bank in New Zealand, driving the creation of a
centralised data warehouse.
Wendy is a Certified Data Management Professional, and has a passion for the discipline. She has been
widely involved in developing courses at UW (University of Washington), is a past president of the Seattle chapter of TDWI (The Data Warehousing Institute), has been active in the Puget Sound Data Management
Association, and has spoken on self-service BI and BI solutions. She was also active in the BI/DW
educational community in the Puget Sound as an advisory board member for Renton Technical College,
Bellevue College, and UW’s Continuing Education program where she co-taught Data Mining.