Let’s pretend for just a second that we need estimates in order to perform our business. Some of you will say that we do and some will probably say that estimates are a big waste. But for the moment, let’s at least pretend that they have a place.


Usually we do estimation in order to provide some kind of predictability in our deliveries. It’s just that an estimate is not enough on its own. Knowing that something will take six man-weeks to implement has no value unless we know that we have six man-weeks at our disposal. We need to combine our estimate with some kind of capacity measure in order to get predictability. There’s a big difference between a team that can give the task six man-weeks’ worth of attention within the next two-week iteration and a team that is overloaded with other work and needs four calendar months to finish the requested six man-weeks.
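To make the difference concrete, here is a minimal sketch in Python with made-up numbers (the 0.35 man-weeks of weekly attention for the overloaded team is purely an assumption for illustration):

```python
# Minimal sketch: the same estimate combined with two different capacities.
# All numbers are invented for illustration.
estimate_man_weeks = 6.0

# A team that can give the task its full attention:
focused_capacity = 3.0   # man-weeks of attention available per calendar week
print(estimate_man_weeks / focused_capacity)      # 2.0 calendar weeks

# The same estimate for a team drowning in other work:
overloaded_capacity = 0.35   # man-weeks of attention available per calendar week
print(estimate_man_weeks / overloaded_capacity)   # ~17 calendar weeks, roughly four months
```

Same estimate, wildly different delivery dates; the estimate only becomes a prediction once it meets a capacity.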

So we need an estimate AND a capacity in order for the estimate to have any value. The thing is that this is not enough either. When we estimate, we also need to agree on what we’re estimating. We need to have the same view of the expected quality, both external and internal. Everyone involved needs to know how the solution is supposed to behave; if the customer expects a Lexus but a developer is about to build a go-cart, the estimate will have no value. Everyone involved also needs to have the same view of the level of confidence in the internal quality; if the developer is OK with Windows 95 quality but the tester is expecting pacemaker-level confidence, the estimate will have no value.

So now we need an estimate AND a capacity AND an understanding of quality in order for the estimate to have any value. The thing is that if we make an estimate and it’s wrong, the effects will fade over time (unless we’re dealing with systematic estimation errors). If a requirement was estimated to take 5 days but actually took 10 days (a 100% estimation error), the effect on a six-month project will be less than 4%. An error in capacity, on the other hand, will accumulate if left to itself. If a team is working in two-week sprints and plans are made with a 10% error in capacity, this error adds up sprint after sprint; for a six-month project we’ll have to add another two sprints at the end in order to finish what we had initially planned. But even worse is the cost of poor quality. These costs tend to rise exponentially with time. The longer a poor assumption or a bug goes unnoticed, the more code will get built on top of that error, either multiplying the instances of the actual error or at least building dependencies on it.

In short:
  • Error in estimate – impact decreasing with time
  • Error in capacity – impact increasing linearly with time
  • Error in quality – impact increasing exponentially with time
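To put some rough numbers on these three curves, here is a back-of-the-envelope sketch in Python. The growth factor for quality errors is an assumption I’ve made up purely to illustrate “exponentially”; the other figures come from the examples above:

```python
# Back-of-the-envelope illustration of the three error types over a
# six-month project run in 13 two-week sprints.
sprints = 13
working_days = sprints * 10            # roughly 130 working days

# Error in estimate: a one-off 5-day slip is a fixed cost, so its relative
# impact shrinks as the project gets longer.
slip_days = 5
print(slip_days / working_days)        # ~0.04 -> less than 4%

# Error in capacity: planning as if we had 10% more capacity than we actually
# do leaves unplanned work behind every single sprint.
capacity_error = 0.10
print(sprints * capacity_error)        # ~1.3 sprints of work left over at the end

# Error in quality: code keeps getting built on top of an unnoticed defect,
# so the cost of fixing it grows roughly exponentially. The 30% growth per
# sprint is an invented number, used only to show the shape of the curve.
cost_to_fix = 1.0
for _ in range(sprints):
    cost_to_fix *= 1.3
print(cost_to_fix)                     # ~30x the cost of fixing it immediately
```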

But where do people put their attention when plans fail? They usually address the estimate and way too often put blame on the teams for not doing good enough estimates. Apart from being unethical since estimates are nothing but guesses, it’s also a waste of time since any deviations from the plan are much more likely to come from errors in capacity measurements (or worse; capacity estimates) or a mismatch in the understanding of what quality (or functionality) was being estimated.

So if predictability is what you’re looking for, don’t invest much in your estimates; instead, make sure that your capacity is known and that quality (internal as well as external) is well understood. And that’s why your estimates don’t really matter.

I wrote in my previous post about the Scrum team I’m working in as a ScrumMaster and that we’re closing in on our first release to production. At this stage a lot of the work is related to getting production environments up and running, and our user stories have taken on a more technical format and are formed more by the team than by the Product Owner. Our PO had a story asking for a production environment, but that one was way too fuzzy for the team to take in, so they had to break it into smaller stories/technical tasks. A lot of this work was done on the fly during the planning session and we needed to find definitions of done for the stories as they were forming.

The task of finding good definitions of done proved to be harder than anticipated for these stories. After a while I realized that what we were specifying tended to be more of a list of tasks than acceptance criteria. So we backed up the value chain by asking “Why” we wanted to do something and started arriving at better definitions of done. However, the crown jewel was when we were discussing the job of getting outsourced operations to monitor our application. What was the definition of done here? Putting the order for monitoring services in the mail? Getting a reply to our order? Having a signed contract? We weren’t really getting anywhere until all of a sudden one of the team members spoke up:

“I want to turn off one of our services and when I see operations dancing a jig on our doorstep, that’s when we’re done.”

I damn near got a tear in my eye hearing this suggestion. This is the kind of thinking that we need when we measure things. Whether it’s a level of service, quality or productivity we want to measure, we always need to begin by looking at what we want to accomplish. We can’t demo usability by showing a pretty UI; we need to put a user in front of the UI to demo usability. We can’t demo quality with the number of bugs found; we must demo quality with a product that is fit for use and stable under usage and over time. And if we want to demo our ability to handle problems, we can’t do that by waving a contract. We demonstrate our ability to handle problems by handling a problem.

This episode reminded me of a story an acquaintance told me ten years ago about his neighbor. The neighbor had a burglar alarm connected to a security company. The security company promised, as part of their service, to arrive at the house within 20 minutes of an alarm. Twice every year this neighbor set the alarm off. He then pulled out a lawn chair, put on ear protection and sat down with a timer in his hand, and when the security company arrived after half an hour or forty-five minutes, he filed a complaint and got the service for free for another six months. This guy knew what the definition of done was; he too waited for operations to dance a jig on his doorstep.

If you want to measure or demo some qualitative aspect, don’t settle for the easy way out of just quantifying it. Put it to the ultimate test; that is the only way you’ll ever know for sure that you’ve done something right.

MoSCoW Prioritization Poker

October 20, 2011

Today it felt like I had one of my brighter moments. Sitting at my new client, faced with a backlog way too long to implement in the time available to us, I had an epiphany. We needed to prioritize the backlog and get a first shot at a release plan in a two-hour meeting.
“How about prioritization poker?” The words jumped out of my mouth before I had even considered what they meant.

Let’s jump ahead a couple of hours now. “My” idea about prioritization poker would most certainly be worthy of a blog post, so I started writing it in my head on the subway back home. Then another thought hit me, almost as hard as the one earlier today: wait, this was too obvious, I might not be the first one to come up with this idea. And sure enough, some googling told me that this guy called Mike Cohn had already invented prioritization poker. C’mon Mike, you already have so many other things going for you, couldn’t you have let me have this one? Anyway, so what if my idea wasn’t unique? I think we had a slightly different twist on it and it proved to be quite useful, and thus I’m still going to write this post.

Back up a couple of hours again.
“Prioritization poker? What is that?” My colleagues gave me weird looks.
The words kept pouring out of my mouth faster than my brain worked: “I guess it’s like planning poker but you do it with MoSCoW instead of numbers or t-shirt sizes. Do you want to try it?”
“Sure, I have a planning poker deck in my bag,” one of my colleagues replied.

So we picked out the 0s, 1s, 2s and 3s from the deck.
“0 is Won’t, 1 is Could, 2 is Should and 3 is Must. Kay-O?”

Our product owner read the first story out loud and we began asking for clarifications. After a couple of minutes it was time to play our cards. We had 1 and 2 and 3. Certainly a difference of opinion. After a short discussion where the extremes put their views forward we played another round. All 3’s! On to the next story.

After one and a half hours we had gone through the entire backlog and managed to divide it into almost equal chunks of Musts, Shoulds, Coulds and Won’ts. The remaining half hour of the meeting was spent putting all Musts into the first version of a release plan.

Wow! I’m very familiar with the power of planning poker but this was a new experience for me. Using the poker format brought our assumptions out to be examined and corrected when necessary. We managed to align our views on what needs to be done and to get a common understanding of the requirements. Whenever I felt safe about understanding a requirement someone challenged my view by questioning the meaning of a certain phrase or a specific word. This kept happening until we narrowed the description down to something everyone could agree upon.

Using MoSCoW for the prioritization also proved to be a good idea. At first I thought I had made a mistake by not including the “?” card in the game. Some of the requirements didn’t make any sense to me and I had no idea what priority to give them. Then I realized the power of the values in MoSCoW. When we got to the first story that I was clueless about, I thought about digging out a question mark from the poker deck on the table. But then my wheels started spinning again and I gave it a 0 … a big “Shazbot! We’re not doing this.”
The others who considered the requirement a Should or a Must looked at me like I was crazy again.
“No, we’re not doing this requirement until someone can sell it to me.” I said.
And they began explaining the requirement until I understood it and could agree on its importance.

This exercise also showed me how important it is to force people to take a stand. There’s a huge difference between prioritizing on a continuous scale from 1 to 4, or from low to high, and using a scale that includes “Won’t”. If you give something “low” priority, or a 1 on an importance scale, people are still left with some hope that the requirement might be implemented, and they will almost certainly be disappointed when they realize that it’s not happening. If, on the other hand, you tell them that the requirement is a “Won’t”, you’re effectively telling them that this is not going to happen unless they manage to get it reprioritized. People are forced to come to terms with reality a lot sooner, or to take responsibility for making an alternative reality happen.

This was my first experience with prioritization poker and it was a good one. It proved to be a great tool for our context today and I’m pretty sure that I will use it, or some modified version of it again.

Na-Nu Na-Nu!

Rotting Estimates

June 20, 2011

Have you ever been part of a late project where you constantly update your estimates and plans but they continuously get worse instead of improving? Where you finally get to the point where your estimates go out of sync in the time it takes to fetch a cup of coffee?

No? Then I congratulate you because it’s an extremely demoralizing and costly situation.

Yes? Then this post might be able to provide some insight into the dynamics at play.

The model

A colleague of mine recently presented me with a generic project model built on the work of Pugh Roberts in the early 1970s.

The model is very simple but it gives a good starting point for discussing the recurring dynamics in projects. We pull work from our “Work to Do” box and implement it, either correctly or incorrectly. Work done correctly goes into our “Work Done” box, while work done incorrectly goes into our box of “Undiscovered Rework”. Note that this has nothing to do with our physical deliveries; both work done correctly and work done incorrectly will move along the same physical path, since we haven’t been able to distinguish the two from each other yet. When we do discover the need for rework, we assess the problems and move them back into our backlog of “Work to Do” again.
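As a rough illustration (this is my own toy version, not Roberts’ actual model, and every number in it is invented), the flows between the three boxes could be sketched like this:

```python
# Toy version of the rework cycle: work is pulled from "Work to Do" and ends
# up either in "Work Done" or in "Undiscovered Rework"; discovered rework
# flows back into "Work to Do". All rates are invented for illustration.
work_to_do = 15.0            # backlog of stories
work_done = 0.0              # work done correctly
undiscovered_rework = 0.0    # work we believe is done but isn't

velocity = 3.0               # stories we pull and "finish" per sprint
error_rate = 1 / 3           # share of finished work that is actually rework

for sprint in range(1, 9):
    pulled = min(velocity, work_to_do)
    work_to_do -= pulled
    work_done += pulled * (1 - error_rate)
    undiscovered_rework += pulled * error_rate

    # Assume testing starts in sprint 5 and uncovers half of the hidden
    # rework each sprint; whatever is discovered goes back into the backlog.
    if sprint >= 5:
        discovered = 0.5 * undiscovered_rework
        undiscovered_rework -= discovered
        work_to_do += discovered

    print(f"sprint {sprint}: to do {work_to_do:.1f}, "
          f"done {work_done:.1f}, hidden rework {undiscovered_rework:.1f}")
```

Until testing starts, the “Work Done” box looks like it is filling up nicely; once discovered rework starts flowing back, apparent progress stalls.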

What is “Undiscovered Rework”?

In this post I will mainly focus on the “Undiscovered Rework” box. This is our backlog of work that we think we have completed but that will not be accepted as it is. Some of the rework will be discovered when we implement new functionality, some will be found during testing and yet some will be discovered by our end users in production. Anything we produce where the quality is unknown carries the potential to end up in this box.

The sources of “Undiscovered Rework”

The amount of work in “Undiscovered Rework” tends to grow quite fast as a project goes along. A couple of factors that speed this growth up are:

  • Not having a good definition of done
  • Postponing testing until the end of the project

Both of these factors hide the true quality of our product and allow different kinds of errors to pile up without us knowing it. If our feedback loops for determining the quality of our product are too long, or if they are missing entirely, there is really no limit to how much waste we can add to this future to-do list.

The implications

The big problem with “Undiscovered Rework” is that it hides the true progress of our project. It hides our status because we do not know how much work is actually done and we do not know how much work is actually left to do. It also corrupts our view of the rate at which we make progress.

Normally, when working in an agile project where we use our velocity to predict future deliveries, our estimates narrow in and get better and better as we gather data over the sprints, but this only holds true if we don’t let our hidden backlog grow. If we do not know the true quality of our product, the only thing our velocity tells us is the rate at which we can produce crap. If we allow the amount of “Undiscovered Rework” to grow, our estimates will keep deteriorating over time.

An example

Let’s imagine we’re in a project where a serious amount of the testing is done at the end of the project. We begin this project with 15 user stories in our backlog and find that according to our velocity we can implement three user stories each sprint.

The thing is that one third of this work ends up in the “Undiscovered Rework” box. We move into our next sprint believing that we have finished requirements A, B and C and that we will be able to do requirements D, E and F during the next couple of weeks. The problem is that stories C and F will need to be redone completely later on (I’ve simplified the example by gathering all the errors in one user story per sprint here).

After going for four iterations believing that we have a velocity of three, we look at the last three remaining items in our backlog and think that we are one iteration away from our goal. But testing will show that we actually have four more stories to complete from our previous sprints. So we actually have seven (!) stories to implement.

We are not talking about one more sprint anymore. That is more like two and a half sprints. But wait a minute, our true velocity was not three stories per sprint; we actually only managed to produce two stories of good enough quality per sprint, so that means that our seven remaining stories will actually take three and a half sprints to complete. Now we’ve gone from being almost done to being halfway done.
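The arithmetic in the example is easy to check with a few lines, using the same numbers as above:

```python
# Checking the numbers in the example: 15 stories, a believed velocity of
# three stories per sprint, and one third of the "finished" work ending up
# as undiscovered rework.
backlog = 15
believed_velocity = 3
error_rate = 1 / 3

sprints_run = 4
believed_done = sprints_run * believed_velocity        # 12 stories "done"
hidden_rework = believed_done * error_rate             # 4 stories that must be redone
remaining = (backlog - believed_done) + hidden_rework  # 3 + 4 = 7 stories left

true_velocity = believed_velocity * (1 - error_rate)   # 2 good stories per sprint
print(remaining / believed_velocity)  # ~2.3 sprints if we still trust the old velocity
print(remaining / true_velocity)      # 3.5 sprints once the true velocity shows
```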

The insights about the remaining work in the previous example will not arrive all at once. They will usually dawn on us one at a time, without us being able to see the connections. The symptom that we and our management will see is that work suddenly begins to slow down. So we begin to re-estimate our work when the first bug reports come in. During our first re-estimates we probably still believe that our velocity is three stories per sprint and we will just add some time for the bugs that have been found so far. Then, as we move closer to testing and get faster feedback on our fixes, our true velocity will begin to show and we will need to re-estimate again.

What often happens at this point is that pressure from management forces people to take shortcuts, so the rework becomes fixes, the fixes become workarounds, and the workarounds create new problems for us. Stress generally pushes people into bad decisions that put the project even deeper in a pickle. If we are totally out of luck, management will fall back into old command-and-control ways of thinking at this point and force the project to go from pull to push while beginning to micro-manage people, slowing down progress even more. Now there is really no saying how long this project will take anymore.

Conclusion

Good estimation is about always being correct while adding precision as we learn more.

Most projects start out with a well-defined status (i.e. nothing has been done yet) and stand a fair chance of making a somewhat decent initial estimate. Nurturing this estimate by collecting data while at the same time assuring quality will help bring precision to it. But if we allow quality to remain an unknown and still fool ourselves into believing that our data gathering will add precision to our estimates, then we are heading for a crash. The false sense of knowing where we stand will only make further estimates more and more off with every new data point gathered.

Turning this knowledge around, though, you can use it as a canary in your project. If your estimates begin to get worse over time instead of improving, it might be a sign that your project has some serious quality issues.

In every project I’ve been a part of, we’ve been working against requirements. Now I think this is regrettable, because what we should be doing is satisfying needs. The reason for building a system is that some stakeholder has a need and thinks that we can fulfill it through software. There are requirements as well, but these should never be our main driver.

  • Our first-line stakeholder needs to be able to retrieve data at a later point in time. Someone else imposes the requirement that all persistence should be done with Oracle products.
  • Our first-line stakeholder needs an intuitive user interface. Someone else imposes the requirement that the user interface must follow some predefined design guidelines.
  • Our first-line stakeholder needs a fast system. An integrated system requires that our solution responds within 5 milliseconds.

Requirements are the restrictions that other stakeholders impose on the system that we are building. Needs are “what’s” while requirements are “how’s”. We will have to satisfy different levels of stakeholders, but the raison d’être for our application is always the need of a first-line stakeholder. If the same person also has requirements, i.e. thoughts on how we should solve the problem, we need to acknowledge that the person is wearing more than one hat and take this into consideration when prioritizing our backlog. A requirement should never trump the need that it relates to.

If we let the “how’s” take precedence over the “what’s”, we will build the wrong system.