Death to story points! Long live T-shirt sizing!

I am not sure I invented story points but if I did I’m sorry now. 😇
— Ron Jeffries (@RonJeffries) December 21, 2017

Let’s start with the important question:

Why do we estimate?

I’ll summarize it with two reasons:

To see how long things will take (i.e., forecasting). We can have discussions like “we usually finish 30 points a week, and this milestone has 120 points left, so it’ll take about a month.”
To uncover differences in assumptions (e.g., Planning Poker). If I say something is 2 points and you say that thing is 8 points, then one of us knows something the other doesn’t, and that needs to be surfaced and discussed.

Both of those are very valuable, and any other solution besides story points needs to retain them.

With that said, let’s get the definitions out of the way:

Story points: A measure of relative size in terms of (usually Fibonacci) numbers. 1, 2, 3, 5, 8, etc.
T-shirt sizes: A measure of relative size in terms of sizes of T-shirts. S, M, L, XL, etc.

My troubles with story points

I have two problems with story points.

First and foremost, story points make it too tempting for people to mentally equate points to hours/days. If a new person joins the project and asks “how are we pointing?” and someone else answers with something like “a point is roughly a day of work” then you’ve already lost. That team is thinking absolutely rather than relatively, and estimates should always be relative.

Estimates should be based on size, not time. My favorite metaphor here is that mowing the lawn gets the same estimate whether you’re using a lawnmower or a pair of scissors, because the “size” of the work is the same.

If you’re estimating based on how many hours/days you’d guess something will take, then you’re missing the point. You are assuming our velocity (i.e., number of points per day), but velocity is something that should be discovered by measuring it as an average over a period of time.
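To make "discovered by measuring" concrete, here is a minimal sketch of velocity as an average over recent sprints. The function name, the three-sprint window, and the sprint history are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def measured_velocity(completed_per_sprint, window=3):
    """Average completed size per sprint over the most recent `window` sprints.

    `completed_per_sprint` is a list of totals, oldest first. The window
    size of 3 is an arbitrary assumption, not a recommendation.
    """
    if not completed_per_sprint:
        raise ValueError("need at least one finished sprint to measure velocity")
    recent = completed_per_sprint[-window:]
    return mean(recent)


# Hypothetical history: points completed in each of the last five sprints.
history = [28, 35, 27, 31, 29]
velocity = measured_velocity(history)
print(f"Measured velocity: {velocity} points per sprint")
```

Notice that nothing in the sketch asks how long a point takes. The number falls out of what the team actually finished.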

Secondly, story points are more specific than necessary. Have you ever had a Planning Poker session where the majority of the time is spent nitpicking over whether tickets are 2 points vs. 3 points, or 3 points vs. 5 points? That is an absolute waste of time. Those discussions aren’t valuable at all.

“But that will lead to more accurate forecasting!” you may argue. No, it won’t. Velocity and forecasting deal with tickets by the dozen and time spans in terms of weeks or months. The average is what matters, and our estimates aren’t nearly accurate enough that 2 points vs. 3 points is worth talking about.

So why t-shirt sizing?

T-shirt sizing solves both of those problems.

It’s much easier to think relatively rather than absolutely when you aren’t using numbers. It’s more difficult to make the leap from t-shirt sizes to days than it is to make the leap from points to days. So teams using t-shirt sizing naturally estimate the size rather than the duration of the work.
It eliminates the pointless nitpicking over specific estimates. The difference between an S and an M is much more worth talking about than the difference between 2 points and 3 points. It reduces waste in the form of pointless conversations that don’t actually provide any value.

But what about forecasting?

Here’s the sticky bit for most teams. If you aren’t counting points, then how can you measure velocity and forecast the next milestone?

The answer is simple: count the tickets. Instead of measuring velocity in terms of points, measure it in terms of tickets. Velocity is meant to be an average over a period of time. Because of that, velocity measured in terms of tickets is nearly always as accurate as velocity measured in terms of points.
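As a rough sketch of what a ticket-based forecast could look like, the snippet below divides the tickets left in a milestone by the average number of tickets finished per sprint. The function name and all of the numbers are made up for illustration.

```python
import math
from statistics import mean


def sprints_remaining(tickets_left, tickets_done_per_sprint):
    """Forecast how many sprints a milestone needs, counting tickets only."""
    velocity = mean(tickets_done_per_sprint)  # tickets per sprint
    if velocity <= 0:
        raise ValueError("velocity must be positive to forecast")
    # Round up: a partly used sprint still has to happen.
    return math.ceil(tickets_left / velocity)


# Hypothetical data: tickets finished in the last four sprints.
tickets_done = [11, 9, 12, 10]
forecast = sprints_remaining(40, tickets_done)
print(f"About {forecast} more sprints")
```

It is the same arithmetic as a points-based forecast, just with a different unit in the numerator and denominator.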

Try it. On your current project, create a graph of the number of tickets completed per sprint, and on that same graph, add the number of points completed per sprint. I bet you the two lines will go up and down together in almost perfect harmony.
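If you would rather check with a number than eyeball a graph, one option is the Pearson correlation between the two series. This is my own suggestion for a quick check, not a standard agile metric, and the sprint data below is invented; plug in your own project's history.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 values each")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical per-sprint totals for the same five sprints.
tickets = [11, 9, 12, 10, 13]
points = [31, 26, 35, 28, 36]
r = pearson(tickets, points)
print(f"Correlation between tickets and points per sprint: {r:.2f}")
```

A value close to 1 means the two lines rise and fall together, which suggests the points are not telling you anything the ticket count doesn't. A low value on your real data would be worth a conversation before dropping points.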

This is a beautiful thing because it means that you can throw the estimates away immediately when Planning Poker ends. You only need them to surface differences in assumptions, and once that happens, you’re done with them.

This brings an interesting side effect: lots of toxic anti-patterns are suddenly made impossible. For example, team members cannot have their productivity measured in terms of the number of points they complete per sprint, so there’s no point in padding your estimates to make your numbers look better.

Give it a shot

Suggest it as an experiment to your team. For the next sprint, do Planning Poker in terms of t-shirt sizing, switch to counting tickets for measuring velocity and forecasting, and see what happens. I bet the team will feel a weight lifted, and I bet nobody will miss story points.