Let’s pretend for just a second that we need estimates in order to perform our business. Some of you will say that we do and some will probably say that estimates are a big waste. But for the moment, let’s at least pretend that they have a place.


Usually we do estimation in order to provide some kind of predictability in our deliveries. It’s just that an estimate is not enough on its own. Knowing that something will take 6 man weeks to implement has no value unless we know that we have 6 man weeks at our disposal. We need to combine our estimate with some kind of capacity measure in order to get predictability. There’s a big difference if our team can give the task 6 man weeks’ worth of attention within the next two-week iteration or if they’re overloaded with other work and need 4 calendar months to finish the requested 6 man weeks.

So we need an estimate AND a capacity in order for the estimate to have any value. The thing is that it’s not enough either. When we estimate, we also need to agree on what we’re estimating. We need to have the same view on the expected quality; both external and internal quality. Everyone involved needs to know how the solution is supposed to behave; if the customer expects a Lexus but a developer is about to build a go-cart, the estimate will have no value. Everyone involved needs to have the same view on the level of confidence for the internal quality; if the developer is ok with Windows 95 quality but the tester is expecting a pacemaker level of confidence, the estimate will have no value.

So now we need an estimate AND a capacity AND an understanding of quality in order for the estimate to have any value. The thing is that if we make an estimate and it’s wrong, the effects will fade over time (unless we’re dealing with systematic estimation errors). If a requirement was estimated to take 5 days but actually took 10 days (a 100% estimation error), the effect on a six-month project will be less than 4%. An error in capacity on the other hand will accumulate if left to itself. If a team is working in two-week sprints and plans are made with a 10% error in capacity, this error will add up with each sprint and for a six-month project, we’ll have to add another two sprints to the end in order to finish what we had initially planned. But even worse is the cost of poor quality. These costs tend to rise exponentially with time. The longer a poor assumption or a bug goes unnoticed, the more code will get built on top of that error, either multiplying the instances of the actual error or at least building dependencies on it.

In short:
Error in estimate – impact decreasing linearly with time
Error in capacity – impact increasing linearly with time
Error in quality – impact increasing exponentially with time
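
The numbers behind the first two lines can be checked with a rough sketch. The calendar figures (26 weeks in six months, five working days a week) are my assumptions, not something the argument depends on, and the capacity error is read as "the team delivers 10% less than planned in every sprint".

```python
import math

# Assumptions (not from the text): 26 weeks in six months, 5 working days a week.
PROJECT_WEEKS = 26
WORKING_DAYS = PROJECT_WEEKS * 5   # 130 working days
SPRINTS = PROJECT_WEEKS // 2       # 13 two-week sprints

# One-off estimate error: a 5-day requirement that actually took 10 days.
estimated_days, actual_days = 5, 10
estimate_impact = (actual_days - estimated_days) / WORKING_DAYS

# Capacity error: every sprint delivers 10% less than the plan assumed.
capacity_error = 0.10
sprints_needed = SPRINTS / (1 - capacity_error)
extra_sprints = math.ceil(sprints_needed - SPRINTS)

print(f"Estimate error: {estimate_impact:.1%} of the project")
print(f"Capacity error: {extra_sprints} extra sprints")
```

Under these assumptions the single bad estimate stays below 4% of the project, while the capacity error rounds up to two whole extra sprints.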

But where do people put their attention when plans fail? They usually address the estimate and way too often put blame on the teams for not doing good enough estimates. Apart from being unethical since estimates are nothing but guesses, it’s also a waste of time since any deviations from the plan are much more likely to come from errors in capacity measurements (or worse; capacity estimates) or a mismatch in the understanding of what quality (or functionality) was being estimated.

So if predictability is what you’re looking for, don’t invest much in your estimates. Instead, make sure that your capacity is known and that quality (internal as well as external) is well understood. And that’s why your estimates don’t really matter.

So we’re supposed to be working with continuous improvements now? Kaizen, Toyota Kata, Retrospectives, PDCA and what else? Don’t worry, I’m not going to bash any of these approaches, they’ve all got their merit. I just want to write some about the follow-up.

In order to see if our improvement efforts are giving us the expected benefit we need to measure something. It could be hard metrics such as velocity, cycle time or costs. Or it could be softer metrics such as happiness index, perceived workload or communication. We need at least to be able to express the metric in terms of “more of” or “less of” so we can see if we got more of what we wanted or less of what we didn’t want after implementing a change. But of course you already knew that.

What most people also know but have a tendency to forget is that there are no free lunches. There is always a tradeoff.

Tradeoffs

There is always a tradeoff and you need to identify it so you can ask yourself the question that Jerry Weinberg poses in The Secrets of Consulting:
“What are you willing to sacrifice?”

Don’t measure your progress in just one dimension; no metric should be evaluated on its own. Identify at least one possible tradeoff before you hop on your PDCA-cycle and start pedalling. Find a metric that allows you to follow up on this possible tradeoff as well and then ask yourself what you’d be willing to sacrifice in this dimension to reach your goal in the dimension you wish to improve. Follow up on both of these metrics (or all, if you’ve identified more than one possible tradeoff) to see when the cost of your improvement efforts is exceeding the benefit.

I wrote in my previous post about the Scrum team I’m working in as a ScrumMaster and that we’re closing in on our first release to production. At this stage a lot of the work is related to getting production environments up and running and our user stories have taken on a more technical format and are formed more by the team than the Product Owner. Our PO had a story asking for a production environment but that one was way too fuzzy for the team to take in so they had to break it into smaller stories/technical tasks. A lot of this work was done on the fly during the planning session and we needed to find definitions of done for the stories as they were forming.

The task of finding good definitions of done proved to be harder than anticipated for these stories. After a while I realized that what we were specifying tended to be more of a list of tasks than acceptance criteria. So we backed up the value chain by asking “Why” we wanted to do something and started arriving at better definitions of done. However, the crown jewel was when we were discussing the job of getting outsourced operations to monitor our application. What was the definition of done here? Putting the order for monitoring services in the mail? Getting a reply to our order? Having a signed contract? We weren’t really getting anywhere until all of a sudden one of the team members spoke up:

“I want to turn off one of our services and when I see operations dancing a jig on our doorstep, that’s when we’re done.”

I damn near got a tear in my eye hearing this suggestion. This is the kind of thinking that we need when we measure things. Whether it’s a level of service, quality or productivity we want to measure, we always need to begin by looking at what we want to accomplish. We can’t demo usability by showing a pretty UI; we need to put a user in front of the UI to demo usability. We can’t demo quality in number of bugs found; we must demo quality in a fit-for-use product that is stable under usage and over time. And if we want to demo our ability to handle problems, we can’t do that by waving a contract. We demonstrate our ability to handle problems by handling a problem.

This episode reminded me of a story an acquaintance told me ten years ago about his neighbor. The neighbor had a burglar alarm connected to a security company. The security company promised in their service to arrive at the house within 20 minutes of an alarm. Twice every year this neighbor set the alarm off. He then pulled out a lawn chair, put on ear protection and sat down with a timer in his hand and when the security company arrived after half an hour or forty-five minutes, he filed a complaint and got the service for free for another six months. This guy knew what the definition of done was; he also waited for operations to dance a jig on his doorstep.

If you want to measure or demo some qualitative aspect, don’t settle for the easy way out and try to quantify it. Put it to the ultimate test, that is the only way you’ll ever know for sure that you’ve done something right.

In an open space session a couple of days ago, a colleague of mine said that we’re using the word quality a bit carelessly. I’d like to go one step further and say that it is a careless word. It is a careless, sloppy, non-specific word that leads to careless decisions and sloppy measurements, and I want to take it out back and put it to sleep.

We have pondered the question of what quality is for years and years. A number of smart people have come up with several good definitions of quality such as:

  • Fitness for use
  • Meeting and exceeding customer expectations
  • Satisfied stakeholders
  • etc …

These are all fine but they’re not very precise. Quality seems to be almost synonymous with “good”.
“Yes, I’m with the good assurance department. We make sure that the level of good in the product is at an acceptable level.”

Of course we want to make sure that our product is fit for use and satisfies our stakeholders but that won’t happen unless we become more explicit. What happens though when we use such a sloppy word is that management gets a non- or at least ill-defined parameter to negotiate with and to measure.
“At this point we must let costs take priority over good.”
“How is the good coming along in our product?”
“What KPIs do we have for measuring good in our product?”

I’m sorry, but I won’t answer one more question regarding quality from now on. You will have to be a lot more specific if you want to know how good the product is. If you want to know about usability, you will have to ask me about usability. If you want to know about maintainability, you will have to ask me about maintainability. If you want to know about errors in programming, you will have to ask me about errors in programming. I will probably keep probing even more about what you want to know and why you want to know it, but if you ask me about quality again, my answer will be … 42.

“So tell me what you want, what you really really want” – Spice Girls

– It’s 42 boss.
– …
– Yes boss, that’s higher than the 5 bugs we had a couple of sprints back.
– …
– No boss, that was the sprint when we spent the majority of the effort implementing the configurable splash screen and the easter egg TicTacToe-game. The features that you predicted would triple our sales.
– …
– Yes boss, it is also lower than the 50 bugs we had last sprint.
– …
– Yes boss, last sprint. You know, the one when you had us work overtime every second day in order to fulfill our customer’s demands of removing the configurable splash screen and the easter egg TicTacToe-game. Remember how they said that the features lowered the productivity for their employees?
– …
– No boss, it’s still 42 this sprint. However, we’re only on week three of our two week sprint that you’ve prolonged to four weeks in order to fit some extra functionality in. So I guess the number might still go up.
– …
– Boss, I think I’m actually beginning to see the value in counting bugs. These numbers seem to have a strong correlation to the decisions you’ve made. We can probably learn a lot from them.
– …
– What the hell was it you wanted to know boss?
– …

There’s an old saying that a lazy programmer is a good programmer. I’ve never heard an old saying about a lazy manager being a good manager, yet there are all too many of them out there. Not the good kind of lazy manager who is willing to delegate work and to let people do what they’re good at without micromanagement. No, we get the other kind of lazy manager – the Panacea Manager – who is constantly looking for a shortcut.

The Panacea Manager will measure the organization on what he can count on his fingers and toes because words bore him, unless of course they’re his own. The Panacea Manager will ask for bug count, lines of code, number of test cases etc.

What do you think the Panacea Manager will receive when asking for these numbers?
Is it hard for a developer to produce a lot of code if that is what she is being measured on?
Is it hard for a tester to produce a lot of test cases if that is what he is being measured on?
Will the number of bugs found tell you anything about the quality of the software after they’ve been corrected?

Do our customers care about the bug count or LOC or number of test cases? I don’t think so. Bug count and most other numbers that we collect are just proxy variables for quality as we think our customers perceive it. Measuring quality and productivity via proxy variables might very well result in quite the opposite of what we’re trying to achieve.

How about if the Panacea Manager took a long look at the system he has built or inherited and started making deliberate decisions based on how he actually wanted the system to work? What if he stopped measuring bug count and lines of code and began asking the customers for what really matters to them? Would it make his job harder if he tried to compare customer satisfaction over time instead of employee hours spent in the work place? It probably would to some degree but it would also mean that he actually began doing his job.

So what should we measure then? I’ve learnt that many people itch to ask that question after listening to my arguments. Please don’t! That means I’ve failed at getting my point across and I’ll lose even more hair on my head. Begin by asking yourself: what decision do I want to make and what do I need to know in order to make that a good decision? THAT is what you should measure.

Rotting Estimates

June 20, 2011

Have you ever been part of a late project where you constantly update your estimates and plans but they continuously get worse instead of improving? Where you finally get to the point where your estimates get out of synch during the time you fetch a cup of coffee?

No? Then I congratulate you because it’s an extremely demoralizing and costly situation.

Yes? Then this post might be able to provide some insights to the dynamics at play.

The model

A colleague of mine just recently presented me with the generic project model built on the work of Pugh Roberts in the early 70’s.

The model is very simple but it gives a good starting point for discussing the recurring dynamics in projects. We pull work from our “Work to Do” box and implement it, either correctly or incorrectly. Work done correctly goes into our “Work Done” box while work done incorrectly goes into our box of “Undiscovered Rework”. Note that this has nothing to do with our physical deliveries, both work done correctly and work done incorrectly will move along the same physical path since we haven’t been able to distinguish the two from each other yet. When we do discover the need for rework we will assess the problems and move them back into our backlog of “Work to Do” again.
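
To make the flow between the boxes concrete, here is a minimal sketch of the model as I read it. It is not the original Pugh Roberts formulation; the parameter names, the numbers in the loop and the choice to discover rework before pulling new work are all my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """The three boxes of the model, measured in units of work."""
    work_to_do: float
    work_done: float = 0.0
    undiscovered_rework: float = 0.0

    def step(self, throughput: float, error_fraction: float,
             discovery_fraction: float) -> None:
        """Advance one period. All three parameters are hypothetical knobs."""
        # Rework discovered this period flows back into the backlog.
        discovered = self.undiscovered_rework * discovery_fraction
        self.undiscovered_rework -= discovered
        self.work_to_do += discovered

        # Pull work and implement it, either correctly or incorrectly.
        attempted = min(throughput, self.work_to_do)
        flawed = attempted * error_fraction
        self.work_to_do -= attempted
        self.work_done += attempted - flawed
        self.undiscovered_rework += flawed

    def total(self) -> float:
        """Work never disappears; it only moves between the boxes."""
        return self.work_to_do + self.work_done + self.undiscovered_rework


# Hypothetical run: 20 units of work, a quarter of everything built is flawed,
# and half of the hidden rework is discovered each period.
project = Project(work_to_do=20)
for period in range(1, 6):
    project.step(throughput=4, error_fraction=0.25, discovery_fraction=0.5)
    print(period, project)
```

Setting `discovery_fraction` to zero mimics a project with no feedback loop: the backlog empties on schedule while the hidden box quietly fills up.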

What is “Undiscovered Rework”?

In this post I will mainly focus on the “Undiscovered Rework” box. This is our backlog of work that we think we have completed but that will not be accepted as it is. Some of the rework will be discovered when we implement new functionality, some will be found during testing and yet some will be discovered by our end users in production. Anything we produce where the quality is unknown carries the potential to end up in this box.

The sources of “Undiscovered Rework”

The amount of work in “Undiscovered Rework” tends to grow quite fast as a project goes along. A couple of factors that speed this growth up are:

  • Not having a good definition of done
  • Postponing testing until the end of the project

Both of these factors hide the true quality of our product and allow for different kinds of errors to pile up without us knowing it. If our feedback loops for determining the quality of our product are too long or if they are missing entirely, there is really no limit to how much waste we can add to this future todo-list.

The implications

The big problem with “Undiscovered Rework” is that it hides the true progress of our project. It hides our status because we do not know how much work is actually done and we do not know how much work is actually left to do. It also corrupts our view of the rate at which we make progress.

Normally when working in an agile project where we use our velocity to predict future deliveries, our estimates narrow in and get better and better as we gather data over the sprints but this only holds true if we don’t let our hidden backlog grow. If we do not know the true quality of our product, the only thing our velocity tells us is at what rate we can produce crap. If we allow the amount of “Undiscovered Rework” to grow, our estimates will keep deteriorating over time.

An example

Let’s imagine we’re in a project where a serious amount of the testing is done at the end of the project. We begin this project with 15 user stories in our backlog and find that according to our velocity we can implement three user stories each sprint.

The thing is that one third of this work ends up in the “Undiscovered Rework” box. We move into our next sprint believing that we have finished requirements A, B and C and that we will be able to do requirements D, E and F during the next couple of weeks. The problem is that stories C and F will need to be redone completely later on (I’ve simplified the example by gathering all errors in one user story per sprint here).

After going for four iterations believing that we have a velocity of three, we look at the last three remaining items in our backlog and think that we are one iteration away from our goal. But testing will show that we actually have four more stories to complete from our previous sprints. So we actually have seven (!) stories to implement.

We are not talking about one more sprint anymore. That is more like two and a half sprints. But wait a minute, our true velocity was not three stories per sprint, we actually only managed to produce two stories of good enough quality per sprint so that means that our seven remaining stories actually will take three and a half sprints to complete. Now we’ve gone from being almost done, to being halfway done.
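
For those who like to see the arithmetic spelled out, here is the example as a small sketch. The variable names are mine; the numbers are the ones used above.

```python
backlog = 15            # user stories at the start of the project
apparent_velocity = 3   # stories we believe we finish per sprint
rework_per_sprint = 1   # one third of each sprint's output is flawed
sprints_done = 4

# What the plan shows versus what is really left.
visible_remaining = backlog - apparent_velocity * sprints_done
hidden_rework = rework_per_sprint * sprints_done
true_remaining = visible_remaining + hidden_rework

# Only the stories of good enough quality count towards real progress.
true_velocity = apparent_velocity - rework_per_sprint

believed_sprints_left = visible_remaining / apparent_velocity
naive_sprints_left = true_remaining / apparent_velocity
actual_sprints_left = true_remaining / true_velocity

print(f"Believed: {believed_sprints_left:.1f} sprints left")
print(f"After testing, at the old velocity: {naive_sprints_left:.1f} sprints left")
print(f"At the true velocity: {actual_sprints_left:.1f} sprints left")
```

The gap between the first and the last figure is the distance between "almost done" and "halfway done".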

The insights about the remaining work in the previous example will not happen all at once. They will usually dawn on us one at a time and without us being able to see the connections. The symptoms that management and we will see are that work suddenly begins to slow down. So we begin to re-estimate our work when the first bug reports come in. During our first re-estimates we probably still believe that our velocity is three stories per sprint and we will just add some time for the bugs that have been found so far. Then as we move closer to testing and get faster response on our fixes our true velocity will begin to show and we will need to re-estimate again. What often happens at this point is that pressure from management will force people to take shortcuts so the rework becomes fixes and the fixes become workarounds and these workarounds will create new problems for us. Stress generally forces people to make bad decisions that will put the project even more in a pickle. If we are totally out of luck, management will fall back into old command and control ways of thinking at this point and force the project to go from pull to push while beginning to micro-manage people and thus slowing down progress even more. Now there is really no saying how long this project will take anymore.

Conclusion

Good estimation is about always being correct but adding precision as we learn more.

Most projects start out with a well-defined status (i.e. nothing has been done) and they stand a fair chance of making a somewhat decent initial estimate. Nurturing this estimate by collecting data while at the same time assuring quality will help bring precision to it. But if we allow for quality to be an unknown and still fool ourselves into believing that our data gathering will add precision to our estimates, then we are heading for a crash. The false sense of knowing where we stand will only make further estimates more and more off for every new data point gathered.

Turning this knowledge around though, you can use it as a canary in your project. If you experience that your estimates begin to get worse over time instead of improving, it might be a sign that your project has some serious quality issues.

Dancing for dollars

September 8, 2010

You’ve got to listen to this! Today I travelled in both space and time to planet UN-BE-LIE-VA-BLE and back. Actually, I only took the commuter train to the north side of Stockholm, but it sure felt like I passed through several dimensions of stupidity.

I was invited to a meeting for salespeople and managers, to what supposedly was an opportunity to learn from experts in the business.

The first presenter of the day was a seasoned manager/president/marketing director (you name it) who was now working as the president of a management consulting company. The theme of the presentation was “increased profitability and performance related pay”.

The topic alone was almost enough to make me miss out on this opportunity but I figure that sometimes it’s good to hear what “they” say, just in order to keep an open mind. I don’t feel so open minded anymore, however.

There was a lot of discussion about how to design effective systems of performance related pay. Most people seemed to agree that PRP was essentially a good thing, BUT I couldn’t hear one single success story where people actually enjoyed working for any of these companies. Everyone had a bunch of issues with their systems that they wanted to be fixed before they were entirely satisfied. One common problem was that the systems changed too often(!?!).

One guy at my table, who was very pro-PRP, told us how their system kept their sales people alert:

– I’m also paid the same way and my salary is very much dependent on the sales performance of my staff so you can see how I keep the whip on them at all times.

When I asked if he would pass up a top sales person who was not driven by money the same way he was, the answer was a distinct “YES”. Such a person would not fit into his salary system.

Our speaker then proceeded to tell us how it is a good idea to update a performance related system every couple of years or so:

– You see, after a couple of years people will learn how to beat the system and will start to take advantage of any weaknesses in it.

I simply had to ask if he wasn’t able to see a bigger problem in this picture. If you design a system that people feel compelled to beat, don’t you think that the problem might lie somewhere else than in trying to foolproof the system? At this time the guy at my table spoke up again:

– Well, that happens every now and again. But we have a general clause in our contract stating that the company can intervene and withhold the payment if we suspect that someone is using any unforeseen side effects of our system.

I completely lost my breath at this point and failed to say anything else. However, I think that I got an explanation for this behaviour when we were told to be careful about taking any soft values into consideration when trying to keep employees happy:

– You see, it’s very hard to handle these soft values, everyone feels differently. It’s much easier to just pay people according to the sales that they bring into the company.

Scotty, please beam me home to 2010 now. This place scares me.
