I’ve been struggling for a metaphor to help explain the difference between statistical forecasting and estimation, and this one came to me, so let’s give it a whirl.
Let me express the scenario in some human-readable BDD (Gherkin) language:
Given I am an orange juice shop owner
And currently have no oranges
When I buy 25 boxes of oranges of varying sizes and varieties
Then I need to forecast how many glasses of juice I can produce to sell
Let me now describe two different approaches: the first is like a normal agile estimation process, the second is how a scientist would do it.
Method 1 - Estimation
I close my shop to customers and gather my team together. I open the first box of oranges and hold up the first orange, showing them how plump it is and telling them it looks like it is probably a Belladonna variety orange (http://en.wikipedia.org/wiki/Orange_(fruit)#Common_oranges).
I ask them all to estimate how much juice the orange will yield. They each choose a planning poker card and I count them down, and they reveal their numbers. I ask them to discuss any outlying responses and then re-estimate until we get consensus. I note the number and move on to the next orange.
The second orange is a Berna orange and is quite small. We estimate that one. And so the long day goes on, with us repeating the process and selling no orange juice.
Eventually we finish estimating the first box, and we have a number which is our estimate of how much juice box 1 will yield. We have to make a decision: do we stop estimating and open the shop to sell some juice, or do we go on and repeat the same process for boxes 2 through 25?
Some juice shops stop at 1 box and use the estimated yield for that box as the "magic number" of yield per box. Other shops keep on going until all 25 boxes have been estimated one orange at a time, because they need a “more accurate estimate”. The downside of the accuracy is that we had to buy it by keeping the shop closed for 25 times as long.
What do I get at the end? Somewhere between 200 and 300 Story Points of oranges.
Method 2 - Mathematics & Science
I open box 1, and count how many oranges are inside. I also open boxes 2, 3, 4, and 5 and count the oranges in each of them. 10 minutes later I open the shop for business and start my staff selling juice.
When a customer orders juice we measure how much juice the first 11 oranges actually yield. We don’t estimate, we measure.
I now have enough data to make a pretty accurate forecast of how much juice the boxes will yield. It’s only taken me 10 minutes.
Now for the sciencey bit. The German Tank Problem (http://en.wikipedia.org/wiki/German_tank_problem) is a famous bit of Bletchley Park Boffinery from the Second World War. To save you a bit of reading, the Allies wanted to know how many of a particular tank they were likely to come up against in France when they invaded. There were 2 ways of getting the forecast: via Military Intelligence estimates, or using statistics and probability. Lives depended on this so it had to be correct.
Here is a comparison of the 2 methods used over time, and on the right, the actual numbers found out at the end of the war.
So the provenance for using this method is pretty good. Let’s apply it to the oranges.
The key to it is understanding that the maths lets us make very accurate predictions using a very small sample size. Indeed, the formula below shows how likely it is that the next item measured falls within our existing range of highest to lowest values, where k is the size of our sample:
% Likelihood = (1 - 1 / (k - 1)) * 100
So if we have 5 boxes of oranges with 17, 23, 16, 30, and 25 oranges in each, the likelihood of the 6th box having between 16 and 30 oranges is (1 - 1 / (5 - 1)) * 100, which is 75%. That is a 75% likelihood of the next box falling inside the currently known range, from a sample of only 5 boxes.
A sample of 11 gives us 90% likelihood of the next being within our known range. Credit goes to Troy Magennis for explaining this to me over a couple of Weissbiers at the Kanban Leadership Retreat in 2013.
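As a quick check on the arithmetic, here is a minimal sketch of that rule of thumb in Python. The function name likelihood_within_range is my own invention for illustration, and it implements the formula exactly as stated in this post rather than anything from a statistics library.

```python
def likelihood_within_range(k: int) -> float:
    """Percent likelihood that the next measurement lands inside the range
    of the k samples seen so far, using the rule of thumb from this post."""
    if k < 2:
        # With fewer than 2 samples there is no range (and k - 1 would be 0).
        raise ValueError("need at least 2 samples to have a range")
    return (1 - 1 / (k - 1)) * 100


# The two sample sizes used in the orange example: 5 boxes and 11 oranges.
for k in (5, 11):
    print(f"{k} samples: {likelihood_within_range(k):.0f}%")
```

Running it prints 75% for 5 samples and 90% for 11, which matches the numbers above.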
So that’s a likelihood of 75% that the next box has between 16 and 30 oranges. That gives us a median of 23 oranges per box.
So, how much juice will we get per orange? For the 11 oranges I measured I got 79.1, 78.5, 71.2, 72.1, 65.2, 79.3, 73.2, 67.2, 65.0, 75.3, and 69.1 ml.
So that’s a 90% likelihood that the next orange yields between 65.0 and 79.3 ml. That gives us a median of 72.1 ml of juice per orange.
So I have 25 boxes, each with 23 oranges, each giving 72.1 ml of juice. I have a total yield of 25 * 23 * 72.1 ml = 41,457.5 ml, or roughly 41.5 litres of juice. I’m going to have to buy a lot more boxes to keep my shop in stock for the day. I’m glad I found that out early enough to get back to the wholesalers in time.
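If you would rather let the computer do the sums, the same back-of-the-envelope forecast can be sketched in a few lines of Python using the measurements above. The variable and function names are made up for illustration. Note that a strict median of the eleven juice measurements comes out at 72.1 ml, and the total rounds to about 41.5 litres.

```python
from statistics import median

# Measurements taken from the shop: oranges counted in the first 5 boxes,
# and juice (in ml) squeezed from the first 11 oranges.
oranges_per_box = [17, 23, 16, 30, 25]
juice_ml = [79.1, 78.5, 71.2, 72.1, 65.2, 79.3, 73.2, 67.2, 65.0, 75.3, 69.1]
boxes_bought = 25


def forecast_total_ml(box_counts, yields_ml, n_boxes):
    """Simple forecast: boxes * median oranges per box * median ml per orange."""
    return n_boxes * median(box_counts) * median(yields_ml)


total_ml = forecast_total_ml(oranges_per_box, juice_ml, boxes_bought)
print(f"median oranges per box: {median(oranges_per_box)}")
print(f"median ml per orange:   {median(juice_ml)}")
print(f"forecast: {total_ml / 1000:.1f} litres")
```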
In the knowledge work world, we tend to have to solve the same problem for forecasting work completion, and measure days per work item, and work items per “epic” or “MMF” or “MVP” or Project (whatever you call your orange boxes in your context). If you want real accuracy instead of working out medians and using those, you would plug the very same numbers into a Monte Carlo Simulation of your system of work, and work out how many of the project runs finish before each date. That is much more complicated to do, and requires a good old dose of processor power, but is far more accurate than my simple sums on the median values. However, even my simple sums are much more accurate than estimation, and cost much less time off from doing value work to generate.
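To give a flavour of what a Monte Carlo simulation looks like, here is a rough sketch applied to the orange numbers from earlier. It is only one simple way to do it, under a loud assumption: each simulated box and each simulated orange is drawn at random from the handful of values already measured. The names simulate_total_litres and percentile, the seed, and the number of runs are all my own choices for illustration, not taken from any particular forecasting tool.

```python
import random

oranges_per_box = [17, 23, 16, 30, 25]
juice_ml = [79.1, 78.5, 71.2, 72.1, 65.2, 79.3, 73.2, 67.2, 65.0, 75.3, 69.1]


def simulate_total_litres(box_counts, yields_ml, n_boxes=25, runs=1000, seed=42):
    """Simulate the total yield many times; return the totals (litres), sorted.

    Assumption: future boxes and oranges look like the ones already measured,
    so each one is resampled (with replacement) from the observed values.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    for _run in range(runs):
        total_ml = 0.0
        for _box in range(n_boxes):
            n_oranges = rng.choice(box_counts)
            total_ml += sum(rng.choice(yields_ml) for _ in range(n_oranges))
        totals.append(total_ml / 1000)
    return sorted(totals)


def percentile(sorted_values, pct):
    """Rough percentile: pick the value at the matching position in a sorted list."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


totals = simulate_total_litres(oranges_per_box, juice_ml)
print(f"about 85% of runs yield at least {percentile(totals, 15):.1f} litres")
print(f"about half of runs yield at least {percentile(totals, 50):.1f} litres")
```

For a project you would swap oranges per box for stories per epic and ml per orange for days per story, then read off how many of the simulated runs finish before each date.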
If you’re running an IT Software project with 25 epics, you’d break down the first 5 epics into stories (the epics you’re going to work on first anyway) to work out the stories per epic number. When you start work you can see that each story takes between, say, 2 and 9 days based on a sample of 11 stories… We have all the data we need to make a good forecast, and not one piece of estimation has occurred. Most of the data is derived from doing the actual work we need to do to finish the project. Which is ideal, as it means we are focusing on doing the thing that will get us finished, and the forecast is a secondary outcome, not a distraction from doing the work like with estimation.