Agile Processes for Data Warehouse

Change is hard, and revolutions are terrifying. As my company began our data warehouse re-architecture journey, I hesitated to proceed with a traditional waterfall approach. I have lived through a few projects where we built a new data warehouse using waterfall methods. After seven years (no kidding), our “Next Gen” project was “Old Gen” and we still didn’t have anything to show for it. Few kinds of projects are better aligned with the “fail early, fail often” mantra than data projects. This is certainly true when you build a data warehouse, where there is so much you don’t know at the time a decision is made. As you proceed through the work of acquiring and profiling data, you peel back layers and learn more and more. In a traditional waterfall process, you wait until you’ve peeled back all of those layers before you build anything. I can tell you from experience that this can take an enormous amount of time, and it’s exceedingly difficult to know where the end is.

I was certain that a full “scrum” model wasn’t feasible for us. Based on much of what I had read about agile, it seemed to be all or nothing. As a result, I figured that true agile processes were out of reach. However, after reading and enjoying the book “Agile Data Warehousing” by Ralph Hughes, I decided it was worth a shot. The first thing we did was try to find an agile coach who understood our hesitancy and our current state. After all, not only were we changing our data warehouse and using new technologies to do it, we had also recently moved our company location and merged two teams. Change had been rampant. We were extremely lucky to find DevJam founder David Hussman. He started us off with some history lessons and helped us understand that the core of agile is delivery. There is only one reason for process, regardless of what you call it, and that is the ability to reliably deliver whatever you are responsible for. However you get there, the point is the creation of a product, in our case a data warehouse.

Our journey began by simply trying to understand what the units of work were. After ten sprints, we are still refining our units of work, but the amazing thing about that process is the ability to learn and adjust over time. I know so much more than I did five months ago, and so does everyone else on the team. We have achieved some pretty amazing things in five months, including identifying the repeatable units of work so that we are now working on the exceptions, not the rules. As you create a new data warehouse, there are plenty of work activities that get repeated and don’t warrant additional definition or discussion, but we often found ourselves wanting to discuss them more. With our agile processes, we identified those scenarios and quickly (within weeks) modified our workflow to reduce the impact. That means that the things we do over and over again (like acquiring data sources into our staging area) have gotten easier and less error-prone. More importantly, it means that the areas that are exceptions, such as MDM-based reference data, get the focus they deserve.

It hasn’t been all blue sky and butterflies, though. Something happens to a group of people when you start asking them what they are working on and how long it will take, particularly if you start calling out (which agile does naturally) work that is not specific to the current sprint. We have shared lots of data with the team, including burn-down charts (Figure 1) and an analysis of estimated time versus actual time, which we started tracking after sprint four. That has helped us challenge our own biases about how we estimate work. It was important that each team member see whether they consistently estimated more or less time than a task actually took.

Figure 1 - Burn-down charts
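
The estimated-versus-actual analysis described above can be sketched in a few lines of Python. This is only an illustration, not the tooling our team used: the task log, the team member names, and the estimate_bias function are all hypothetical, and the ratio of actual to estimated hours is just one simple way to surface a consistent bias.

```python
from collections import defaultdict

# Hypothetical task log: (team member, estimated hours, actual hours).
# Names and numbers are invented for illustration only.
TASKS = [
    ("ana", 4.0, 6.0),
    ("ana", 8.0, 10.0),
    ("ben", 6.0, 4.0),
    ("ben", 4.0, 4.0),
]


def estimate_bias(tasks):
    """Return the ratio of total actual hours to total estimated hours per member.

    A ratio above 1 means the person tends to underestimate;
    a ratio below 1 means they tend to overestimate.
    """
    estimated = defaultdict(float)
    actual = defaultdict(float)
    for member, est_hours, act_hours in tasks:
        estimated[member] += est_hours
        actual[member] += act_hours
    # Skip members with no estimated hours to avoid dividing by zero.
    return {m: actual[m] / estimated[m] for m in estimated if estimated[m] > 0}


bias = estimate_bias(TASKS)
for member, ratio in sorted(bias.items()):
    if ratio > 1:
        tendency = "tends to underestimate"
    elif ratio < 1:
        tendency = "tends to overestimate"
    else:
        tendency = "is on target"
    print(f"{member}: actual/estimated = {ratio:.2f} ({tendency})")
```

Summing hours per person before dividing keeps one tiny task from skewing the picture; a per-sprint breakdown would be a natural next step if you want to see whether estimates improve over time.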


If I had it to do over again, I wouldn’t hesitate to transition to an agile process for data warehousing. But there are some things I would do differently. I would still hire DevJam, but I would want them around a little longer. We did have a coach follow up David’s visionary explanation of agile, but there’s a pretty steep learning curve, and our team lives in a larger department that is staunchly waterfall. A number of the nuances have tripped us up. I also wish we had started with the Kanban board (Figure 2). We tried other options first, and the Kanban board wasn’t far behind, but it made transparency much easier. Our Kanban board includes (as most do) stories and tasks, and we found that asking people to mark their time remaining each day held them accountable for the task of the day. We also instituted a “pink sticky” rule, which meant that if you worked on something that wasn’t on the board, you had to write a pink sticky, the equivalent of admitting you worked on something you weren’t supposed to. In our first sprint with the pink stickies, we had well over a dozen. In our last sprint, we had three. I think the reality of the pink sticky for us is that we still have a legacy data warehouse to maintain. So recently we decided to track our work in three separate swim lanes: one for the legacy data warehouse, one for our modern data platform, and one for the analytic and development work that helps us keep the lights on.

Figure 2 - Kanban board


We are always improving and finding new and better ways to deliver.  I’ve built a few data warehouses, but I have to say, using agile has made this process a little more fun. 
