4. Data in context - Stop using historical data when “context” changes

Why it matters

A part of being a good forecaster is to know when not to use data and to re-engage the human mind instead.

Humans intuitively know when a situation has changed enough that the data is no longer relevant. When forecasting is built into tools, these tools often use all the data available, sometimes at the expense of accuracy and context. The big assumption for Big Data is that it is universally good, but it is not.

Example: In the EU in August, at the peak of the summer holiday season, it would be silly to use data from April to June to forecast what will get done in August.

Meteorologists use different forecasts for summer and winter. They don’t use 10 years of summer data to forecast winter; they use the previous 10 years’ winter data to forecast the forthcoming winter.
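The August and winter examples can be sketched in a few lines of Python. This is only an illustration: the history values are made up, and the function name forecast and its months parameter are hypothetical, not part of any tool.

```python
from statistics import mean

# Made-up history for illustration: (year, month, items completed).
history = [
    (2022, 4, 42), (2022, 5, 40), (2022, 6, 44), (2022, 8, 18),
    (2023, 4, 41), (2023, 5, 43), (2023, 6, 45), (2023, 8, 22),
]

def forecast(history, months):
    """Average the items completed in the given months only."""
    relevant = [done for _, month, done in history if month in months]
    if not relevant:
        raise ValueError("no history matches the requested context")
    return mean(relevant)

all_months = forecast(history, months=range(1, 13))  # ignores context
august_only = forecast(history, months={8})          # same context only
print(all_months, august_only)
```

With these invented numbers, the forecast built on every month is far higher than the one built on previous Augusts alone, which is the gap a context-blind tool would miss.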

This can happen in Agile as well: using “yesterday’s weather” is not always relevant.

Humans are well equipped to judge when context changes and historical data becomes less meaningful.

When forecasting on small data samples, one bad apple can spoil the whole bunch. However, when you have 10,000 data points, the variability caused by the bad apple is easily evened out by the other 9,999.
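A small worked example makes the size of that effect concrete. It assumes, purely for illustration, that every typical data point is 5 and the bad apple is 50; the function mean_shift is a hypothetical helper.

```python
from statistics import mean

def mean_shift(n_typical, typical=5.0, outlier=50.0):
    """Return how far one outlier moves the mean of n_typical equal values."""
    clean = [typical] * n_typical
    with_outlier = clean + [outlier]
    return mean(with_outlier) - mean(clean)

small = mean_shift(9)      # 10 data points in total
large = mean_shift(9999)   # 10,000 data points in total
print(f"shift with 10 points:     {small:.2f}")   # 4.50
print(f"shift with 10,000 points: {large:.4f}")   # 0.0045
```

Under these assumptions the same bad apple nearly doubles the average of the small sample and barely registers in the large one.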

How it works

Every bit of data must earn its place if it is to be counted.

As you add historical data, reflect on the likelihood that it could recur in the future.

This is an imperfect task and hard to get right. Avoid using data that has no prospect of recurrence, for example data from a team that has since been disbanded.

How do you tell outliers and flukes apart from recurrent events? To avoid bias, don’t keep only the data that gives you a happy outcome; have a consistent way to remove unwanted data.

Here are some examples:

  • One-off events should be removed.
  • If a story can be made that an event can recur in the forecast timeframe, leave it in.
  • Don’t just remove the worst outliers. Be consistent and remove ALL outliers.
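The three rules above can be sketched as one repeatable filter. This is a minimal sketch under stated assumptions, not a prescribed method: the Sample class, its one_off and can_recur flags (both set by a person, not by the code), and the select_history function are hypothetical, and the 1.5 × interquartile-range fence is a common outlier convention swapped in here because the text does not name one.

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Sample:
    value: float
    one_off: bool = False    # human judgment: cannot happen again
    can_recur: bool = False  # human judgment: plausible story for recurrence

def select_history(samples, k=1.5):
    """Return the values that earn their place in the forecast."""
    # Rule 1: drop one-off events.
    candidates = [s for s in samples if not s.one_off]
    # Rule 3: one fence for both tails, so good and bad outliers
    # are treated the same way.
    q1, _, q3 = quantiles([s.value for s in candidates], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    # Rule 2: an event with a recurrence story stays in regardless.
    return [s.value for s in candidates
            if s.can_recur or low <= s.value <= high]

# Made-up cycle times in days, for illustration only.
samples = [Sample(v) for v in (5, 6, 6, 7, 7, 8, 8, 9, 40)]
samples += [Sample(25, can_recur=True), Sample(60, one_off=True)]
print(select_history(samples))
```

The judgment stays with the person who sets the flags; the code only makes sure that judgment is applied the same way to every data point.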

Forecasting requires an assessment of what historical data to use. This is a human task that should not be relegated to a tool.

This matters because when you are presented with a forecast, the first question anyone asks is, “Do I believe it?” Ultimately, we intuitively know whether the historical data makes sense for the forecast in question.