Let’s pretend for just a second that we need estimates in order to run our business. Some of you will say that we do and some will probably say that estimates are a big waste. But for the moment, let’s at least pretend that they have a place.


Usually we do estimation in order to provide some kind of predictability in our deliveries. It’s just that an estimate is not enough on its own. Knowing that something will take 6 man-weeks to implement has no value unless we know that we have 6 man-weeks at our disposal. We need to combine our estimate with some kind of capacity measure in order to get predictability. There’s a big difference between our team being able to give the task 6 man-weeks’ worth of attention within the next two-week iteration and them being overloaded with other work and needing 4 calendar months to finish the requested 6 man-weeks.

So we need an estimate AND a capacity in order for the estimate to have any value. The thing is that that’s not enough either. When we estimate, we also need to agree on what we’re estimating. We need to have the same view on the expected quality, both external and internal. Everyone involved needs to know how the solution is supposed to behave; if the customer expects a Lexus but a developer is about to build a go-kart, the estimate will have no value. Everyone involved also needs to have the same view on the level of confidence in the internal quality; if the developer is OK with Windows 95 quality but the tester is expecting pacemaker-level confidence, the estimate will have no value.

So now we need an estimate AND a capacity AND an understanding of quality in order for the estimate to have any value. The thing is that if we make an estimate and it’s wrong, the effects will fade over time (unless we’re dealing with systematic estimation errors). If a requirement was estimated to take 5 days but actually took 10 days (a 100% estimation error), the effect on a six-month project will be less than 4%. An error in capacity, on the other hand, will multiply if left to itself. If a team is working in two-week sprints and plans are made with a 10% error in capacity, this error accumulates for each sprint, and for a six-month project we’ll have to add another two sprints to the end in order to finish what we had initially planned. But even worse is the cost of poor quality. These costs tend to rise exponentially with time. The longer a poor assumption or a bug goes unnoticed, the more code will get built on top of that error, either multiplying the instances of the actual error or at least building dependencies on it.

In short:
  • Error in estimate – impact decreasing linearly with time
  • Error in capacity – impact increasing linearly with time
  • Error in quality – impact increasing exponentially with time
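
To make the difference in magnitude concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a six-month project of roughly 130 working days, i.e. 13 two-week sprints, and the 50% per-sprint growth in defect cost is an invented illustration of “exponential”, not a measured figure.

```python
import math

# Back-of-the-envelope sketch of the three error types above. Assumptions
# (mine, not the post's): a six-month project is roughly 130 working days,
# i.e. 13 two-week sprints of 10 working days each.
PROJECT_DAYS = 130
SPRINTS = 13

# 1. Estimate error: a task estimated at 5 days actually takes 10 (a 100% error).
#    The absolute slip stays at 5 days no matter how long the project runs.
estimate_slip = 10 - 5
print(f"Estimate error: {estimate_slip / PROJECT_DAYS:.1%} of the project")  # ~3.8%

# 2. Capacity error: every sprint is planned against 10% more capacity than
#    the team really has, so the overflow accumulates sprint after sprint.
capacity_error = 0.10
sprints_needed = math.ceil(SPRINTS / (1 - capacity_error))
print(f"Sprints needed: {sprints_needed} ({sprints_needed - SPRINTS} extra)")  # 15 (2 extra)

# 3. Quality error: the cost of a defect grows with every sprint it stays
#    unnoticed, because more code gets built on top of it. The 50% growth
#    per sprint is purely illustrative.
defect_cost = 1.0
for _ in range(SPRINTS):
    defect_cost *= 1.5
print(f"Relative cost of a defect found {SPRINTS} sprints late: ~{defect_cost:.0f}x")
```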

But where do people put their attention when plans fail? They usually address the estimate, and way too often they blame the teams for not making good enough estimates. Apart from being unethical, since estimates are nothing but guesses, it’s also a waste of time, since any deviations from the plan are much more likely to come from errors in capacity measurements (or worse: capacity estimates) or from a mismatch in the understanding of what quality (or functionality) was being estimated.

So if predictability is what you’re looking for, don’t invest much in your estimates; instead, make sure that your capacity is known and that quality (internal as well as external) is well understood. And that’s why your estimates don’t really matter.

I wrote in my previous post about the Scrum team I’m working in as a ScrumMaster and that we’re closing in on our first release to production. At this stage a lot of the work is related to getting production environments up and running, and our user stories have taken on a more technical format, formed more by the team than by the Product Owner. Our PO had a story asking for a production environment, but that one was way too fuzzy for the team to take in, so they had to break it into smaller stories/technical tasks. A lot of this work was done on the fly during the planning session and we needed to find definitions of done for the stories as they were forming.

The task of finding good definitions of done proved to be harder than anticipated for these stories. After a while I realized that what we were specifying tended to be more of a list of tasks than acceptance criteria. So we backed up the value chain by asking “Why” we wanted to do something and started arriving at better definitions of done. However, the crown jewel was when we were discussing the job of getting outsourced operations to monitor our application. What was the definition of done here? Putting the order for monitoring services in the mail? Getting a reply to our order? Having a signed contract? We weren’t really getting anywhere until all of a sudden one of the team members spoke up:

“I want to turn off one of our services and when I see operations dancing a jig on our doorstep, that’s when we’re done.”

I damn near got a tear in my eye hearing this suggestion. This is the kind of thinking that we need when we measure things. Whether it’s a level of service, quality or productivity we want to measure, we always need to begin by looking at what we want to accomplish. We can’t demo usability by showing a pretty UI; we need to put a user in front of the UI to demo usability. We can’t demo quality in the number of bugs found; we must demo quality in a fit-for-use product that is stable under usage and over time. And if we want to demo our ability to handle problems, we can’t do that by waving a contract. We demonstrate our ability to handle problems by handling a problem.

This episode reminded me of a story an acquaintance told me ten years ago about his neighbor. The neighbor had a burglar alarm connected to a security company. The security company promised in their service to arrive at the house within 20 minutes of an alarm. Twice every year this neighbor set the alarm off. He then pulled out a lawn chair, put on ear protection and sat down with a timer in his hand, and when the security company arrived after half an hour or forty-five minutes, he filed a complaint and got the service for free for another six months. This guy knew what the definition of done was; he too waited for operations to dance a jig on his doorstep.

If you want to measure or demo some qualitative aspect, don’t settle for the easy way out of just quantifying it. Put it to the ultimate test; that is the only way you’ll ever know for sure that you’ve done something right.

Quality Radar Retrospective

February 25, 2012

I’m currently working as ScrumMaster with a Scrum team that is closing in on our first release to production. The entire release has been tough, I’d almost say mission impossible, just to get the basic functionality into the product. What has been produced by the team, the Product Owner and other people around us is nothing short of a miracle, but we’ve cut a few corners along the way and several decisions have been deferred until later. The thing is that “later” is coming at us now like a freight train. Yesterday I decided to run a retrospective trying to find out what the team considers to be the main quality issues.

Earlier in the release I had planned to run David Laing’s Complexity Radar retrospective but never got around to actually doing it. At this point I figured that we were not in a position to do any architectural changes, but we could still do some stabilizing activities and perhaps sand the edges off some of the corners that had been cut. So instead of looking at complexity I changed the perspective to quality (whatever that is).

I opened up with a check-in; “Which day of the sprint was your favorite?”. Most members pointed to the last day of the sprint “because a lot of pieces fell into place”.

I know, I know … we’re still working on our WIP issues. But that’s not why we were there that day.

After the check-in I presented the Quality Radar to them:

Everyone got five points to distribute among the dimensions.

0 points – Stable

1 point – Unknown quality

2 points – Known quality issues

Since the team is distributed over two locations, we used corkboard.me where I had placed Post-its according to the Quality Radar above:

The team members put up Post-its on the radar with their points (ones and twos) and a comment on why they thought we had an issue.
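
Just to make the mechanics concrete: the tallying itself is trivial, something like the sketch below. The dimensions, points and comments are made up for illustration; the point is only that summing points per dimension is what reveals the clusters we then talked through.

```python
from collections import defaultdict

# Hypothetical tally of the radar. Each Post-it is (dimension, points, comment);
# the dimensions, points and comments below are invented for illustration.
notes = [
    ("maintainability", 2, "no agreed hand-over to the maintenance organization"),
    ("maintainability", 1, "unclear who owns the batch jobs"),
    ("capacity",        2, "known peak-load issue never followed up"),
    ("capacity",        1, "no load tests against production-like volumes"),
    ("usability",       1, "error messages never reviewed with users"),
]

totals = defaultdict(int)
for dimension, points, _comment in notes:
    totals[dimension] += points

# The biggest cluster comes out on top; that's where the discussion starts.
for dimension, score in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{dimension:16} {score}")
```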

My original plan was then for team members to pair up (one from each city) and analyze the radar in pairs, but due to time limitations we did it as an open discussion instead.

We began by looking at the biggest cluster of notes, which was around maintainability. It turned out, though, that most concerns were not about the maintainability of the application but about the maintenance organization. Since this was out of the team’s reach, we decided that I’d raise these concerns within the organization, and we moved on to the second largest cluster, which was around the application’s capacity. What came up was that not only did we have unknowns in this area, but there was also a known issue that had fallen through the cracks.

It was decided that load testing would be a focus for the testers during the next sprint, with extra attention given to the known issue.

If you’ve read my blog before you might have seen that I’m not particularly fond of the word “quality” because of its fuzziness, but in this case I left it to the team to decide what they meant by “quality”. I’m really glad that we did this because it raised several issues beyond those we prioritized this time. I will use this format again in the future, but I’ll make sure that we have more time for analyzing the data and for suggesting and deciding on actions. Another thing that’s bothering me is the scale, where I gave more significance to known issues than to unknown quality. This could easily draw people’s attention away from untested areas and onto minor issues that they’ve seen. I welcome any suggestions on how to address this risk.

In an open space session a couple of days ago, a colleague of mine said that we’re using the word quality a bit carelessly. I’d like to go one step further and say that it is a careless word. It is a careless, sloppy, non-specific word that leads to careless decisions and sloppy measurements, and I want to take it out back and put it to sleep.

We have pondered the question of what quality is for years and years. A number of smart people have come up with several good definitions of quality such as:

  • Fitness for use
  • Meeting and exceeding customer expectations
  • Satisfied stakeholders
  • etc …

These are all fine but they’re not very precise. Quality seems to be almost synonymous with “good”.
“Yes, I’m with the good assurance department. We make sure that the level of good in the product is at an acceptable level.”

Of course we want to make sure that our product is fit for use and satisfies our stakeholders but that won’t happen unless we become more explicit. What happens though when we use such a sloppy word is that management gets a non- or at least ill-defined parameter to negotiate with and to measure.
“At this point we must let costs take priority over good.”
“How is the good coming along in our product?”
“What KPIs do we have for measuring good in our product?”

I’m sorry, but I won’t answer one more question regarding quality from now on. You will have to be a lot more specific if you want to know how good the product is. If you want to know about usability, you will have to ask me about usability. If you want to know about maintainability, you will have to ask me about maintainability. If you want to know about errors in programming, you will have to ask me about errors in programming. I will probably keep probing even more about what you want to know and why you want to know it, but if you ask me about quality again, my answer will be … 42.

“So tell me what you want, what you really really want” – Spice Girls

– It’s 42 boss.
– …
– Yes boss, that’s higher than the 5 bugs we had a couple of sprints back.
– …
– No boss, that was the sprint when we spent the majority of the effort implementing the configurable splash screen and the Easter egg TicTacToe game. The features that you predicted would triple our sales.
– …
– Yes boss, it is also lower than the 50 bugs we had last sprint.
– …
– Yes boss, last sprint. You know, the one when you had us work overtime every second day in order to fulfill our customer’s demand to remove the configurable splash screen and the Easter egg TicTacToe game. Remember how they said that the features lowered the productivity of their employees?
– …
– No boss, it’s still 42 this sprint. However, we’re only on week three of our two-week sprint that you’ve prolonged to four weeks in order to fit some extra functionality in. So I guess the number might still go up.
– …
– Boss, I think I’m actually beginning to see the value in counting bugs. These numbers seem to have a strong correlation to the decisions you’ve made. We can probably learn a lot from them.
– …
– What the hell was it you wanted to know boss?
– …

There’s an old saying that a lazy programmer is a good programmer. I’ve never heard an old saying about a lazy manager being a good manager, yet there are all too many of them out there. Not the good kind of lazy manager who is willing to delegate work and to let people do what they’re good at without micromanagement. No, we get the other kind of lazy manager – the Panacea Manager – who is constantly looking for a shortcut.

The Panacea Manager will measure the organization on what he can count on his fingers and toes because words bore him, unless of course they’re his own. The Panacea Manager will ask for bug count, lines of code, number of test cases etc.

What do you think the Panacea Manager will receive when asking for these numbers?
Is it hard for a developer to produce a lot of code if that is what she is being measured on?
Is it hard for a tester to produce a lot of test cases if that is what he is being measured on?
Will the number of bugs found tell you anything about the quality of the software after they’ve been corrected?

Do our customers care about the bug count or LOC or number of test cases? I don’t think so. Bug count and most other numbers that we collect are just proxy variables for quality as we think our customers perceive it. Measuring quality and productivity via proxy variables might very well result in quite the opposite of what we’re trying to achieve.

How about if the Panacea Manager took a long look at the system he has built or inherited and started making deliberate decisions based on how he actually wanted the system to work? What if he stopped measuring bug count and lines of code and began asking the customers for what really matters to them? Would it make his job harder if he tried to compare customer satisfaction over time instead of employee hours spent in the work place? It probably would to some degree but it would also mean that he actually began doing his job.

So what should we measure then? I’ve learnt that many people itch to ask that question after listening to my arguments. Please don’t! That means I’ve failed at getting my point across and I’ll lose even more hair. Begin by asking yourself: what decision do I want to make, and what do I need to know in order to make that a good decision? THAT is what you should measure.

You all know that stress is just a killer to quality, don’t you? I think all of us know this at some level but we often act as if it wasn’t so. I’d like to share a story about how this became painfully apparent to me today.

To follow up on a recent kidney stone, I needed to take a blood test. I figured I could do this during my lunch break today, so I walked down to a medical clinic quite close to work. I had barely sat down before a nurse in her mid-120s (let’s call her Nurse Singer, in tribute to Mr Isaac Merritt Singer) called me in. It was obvious that this lady had been in this line of work for a very long time, which actually made me feel kind of relieved. Younger nurses often have a tendency to fuss around a bit more and unnecessarily prolong the process, and since I had a lunch meeting to attend in just a couple of minutes, this suited me just fine.

We went through the usual procedure with me spelling out my name and social security number while she prepared the needle and tubes. I then rolled up the sleeve on my right arm and she tightened a rubber band around it. After some poking around she seemed to have found a good enough vein and stuck the needle into it. The first tube went just fine, but mid second tube my lunch date called my cell to find out where I was. This must have made the nurse jump enough to move the needle out of place, because the tube stopped filling up. She tried to find the vein again by moving the needle around, but to no avail. I asked her if she wanted to try again with my other arm. She accepted my offer and tied the rubber band around my left arm instead. This time she had some obvious problems finding a good vein and decided to go for a smaller one. After bringing out a thinner needle she once again stuck me. This one was a duster to begin with. She pushed and pulled the needle a couple of times but no blood was coming out.

Nurse Singer was becoming visibly nervous now and offered me a stream of apologies. I told her that it wasn’t any problem, to just relax and give it another shot. So she pulled out a fresh needle and went back to my right arm. After a lot of poking, this time she decided that she had found the vein to hit. The needle went in and … no blood this time either. More apologies, and this time she told me that this wasn’t really working out for her, so she would go and fetch a colleague instead (let’s call her colleague Nurse Nightingale). She came back after a minute telling me that Nurse Nightingale had just started on a new patient and that I would have to wait for a while. Already late for my lunch date, I was getting a bit stressed as well, so I asked Nurse Singer if she didn’t want to give it another shot. After some hesitation she picked up a new needle and went back to my left arm. Well, to make a short story even shorter: it was another barren well.

This time we both agreed that waiting for Nurse Nightingale would be the only sensible thing to do. Five minutes later I was seated in another room, where Nurse Nightingale (another contemporary of Lucy) calmly poked my left arm a couple of times with her index finger, pushed a needle into it and swiftly filled two more tubes with my blood.

Now I’m convinced that, in spite of her turning me into a Swiss cheese, Nurse Singer is probably a very competent nurse who just happened to end up in a stressful situation. She made a small mistake and then noticed that I was in a hurry. Trying to rush things, she only managed to lower the quality of her work even more and thus created a need for more rework. In software projects this happens all the time. As deadlines approach, we get stressed (or have stress put upon us) and we begin to make poor decisions. We take shortcuts and we skip good practices to save time. We might not perforate our clients, but we do harm them by wasting their money on low-quality products. So the next time you start to feel stress coming over you, remember the old proverb that “haste makes waste”. Take a couple of minutes to calm down and become the Nightingale your clients need.
