3. Consider data – Use data instead of estimates if you have enough data

Content champion Troy Magennis

Why it matters

People have cognitive biases. Well-captured data does not.

How it works

The most common bias when estimating is optimism bias. How so? We are generally quite good at estimating individual items, but we often overlook the system factors that affect the outcome.

The best way to mitigate cognitive bias is to use data. But use it in a smart way!

You might ask: wouldn’t cognitive bias also affect the data we select? Sure, but the good news is that if the data is captured at the system level, it is much harder for an individual to manipulate.

For example, it is quite easy for an individual member of a development team who is estimating a story in story points to “anchor” (influence) the other team members towards a certain number. But it is much harder for an individual to manipulate the throughput of the whole development team.

Steps you should take:

Step 1

Get a comparable range estimate against which you can validate your forecast at a system level.

Step 2

Use 3 to 7 samples of real inputs and run them through your model to see whether the forecast comes even close to the real world.

Step 3

If you have more than 7 samples, throw away any range estimate. Use the data instead.
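
The sample-count rule in Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the function names, the example numbers, and the choice to fall back to the range estimate alone when there are fewer than 3 samples are all assumptions.

```python
def choose_forecast_input(samples, min_samples=3, max_for_range=7):
    """Decide which input to rely on, based on how many real samples exist."""
    n = len(samples)
    if n > max_for_range:
        # More than 7 samples: drop the range estimate and use the data.
        return "data"
    if n < min_samples:
        # Assumption: with fewer than 3 samples there is too little to validate against.
        return "range estimate"
    return "range estimate, validated against data"


def share_inside_range(range_estimate, samples):
    """Fraction of observed samples that fall inside the (low, high) range estimate."""
    low, high = range_estimate
    inside = [s for s in samples if low <= s <= high]
    return len(inside) / len(samples)


# Hypothetical numbers: items finished per week by the whole team.
weekly_throughput = [4, 6, 5, 9, 3]
range_estimate = (3, 7)

decision = choose_forecast_input(weekly_throughput)
coverage = share_inside_range(range_estimate, weekly_throughput)
print(decision, coverage)
```

If most real samples fall outside the range estimate, that is a hint that either the estimate or the model needs another look before the forecast is trusted.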

Step 4

Old, stale data can mislead and distort, so delete it if you can. The biggest error people make is using stale data (say, 1000 samples). Keeping too much unrepresentative data amplifies uncertainty. It is better to keep the 11 most recent samples and discard the rest.
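
As a minimal sketch of the trimming described in Step 4, assume each sample is stored as a hypothetical `(date, value)` pair. The helper name `most_recent` and the generated history are illustrative only.

```python
from datetime import date, timedelta


def most_recent(samples, keep=11):
    """Return the `keep` newest (date, value) samples, oldest first."""
    if keep <= 0:
        return []
    ordered = sorted(samples, key=lambda sample: sample[0])
    return ordered[-keep:]


# Hypothetical history: 15 weekly throughput samples.
start = date(2024, 1, 1)
history = [(start + timedelta(weeks=i), 5 + i % 3) for i in range(15)]

recent = most_recent(history)
print(len(recent), recent[0][0], recent[-1][0])
```

Sorting by date before slicing means the result does not depend on the order in which the samples were collected or exported.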

What to do about outliers? An outlier should stay if it is likely to occur again. If it is a one-off event, you should delete it.
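
Whether an outlier is likely to recur is a judgment call, so code can only help with finding candidates. The sketch below flags candidates with the common 1.5 × IQR rule, which is a technique swapped in here for illustration and not something the text above prescribes, and then removes only the samples a person has marked as one-off events. All names and numbers are hypothetical.

```python
import statistics


def flag_outlier_candidates(values):
    """Return indices of values outside the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [i for i, v in enumerate(values) if v < low_fence or v > high_fence]


def drop_one_offs(values, one_off_indices):
    """Remove only the samples that were judged to be one-off events."""
    return [v for i, v in enumerate(values) if i not in one_off_indices]


# Hypothetical cycle times in days.
cycle_times = [4, 6, 5, 9, 3, 5, 30, 6]
candidates = flag_outlier_candidates(cycle_times)

# Human decision (assumed): the 30-day item was caused by a one-off event.
one_offs = {6}
cleaned = drop_one_offs(cycle_times, one_offs)
print(candidates, cleaned)
```

A flagged sample that is likely to happen again would simply be left out of `one_offs` and stay in the data.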

Here’s an illustration of how misleading data could inadvertently distort a forecast:

Say you are forecasting the delivery date for a project. If you only extract data from the “crunch time” period at the end of the project, this data would be misleading. If people work overtime or weekends during this phase, it is clearly not a representative or sustainable production rate to base your forecast on.
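
To put rough numbers on the crunch-time example, the sketch below compares a forecast based only on crunch weeks with one based on normal weeks. The figures, the `crunch` flag, and the deliberately simple average-based forecast are all assumptions made for illustration.

```python
import math
import statistics


def weeks_needed(remaining_items, weekly_throughput):
    """Naive point forecast: remaining work divided by average weekly throughput."""
    return math.ceil(remaining_items / statistics.mean(weekly_throughput))


# Hypothetical samples: (items finished that week, was it a crunch week?)
weeks = [(5, False), (6, False), (4, False), (11, True), (12, True)]

sustainable = [items for items, crunch in weeks if not crunch]
crunch_only = [items for items, crunch in weeks if crunch]

remaining = 60
print(weeks_needed(remaining, crunch_only))   # based on overtime weeks only
print(weeks_needed(remaining, sustainable))   # based on normal weeks
```

With these made-up numbers the crunch-only data suggests 6 weeks, while the sustainable rate suggests 12, which is the kind of distortion the example warns about.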