A Closer Look at Scoring Streaks

I’ve scraped some data off our local Burwood Bruisers Football Club website. Take a look and see what trends you can identify:

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22
 .  W  W  .  .  .  W  .  .  W  .  W  .  .  .  W  .  W  W  .  W  .
Key: W = Win; . = Lose

You can see that early in the season they had a couple of wins, then choked under pressure and didn’t get their act together until round seven. Over time it looks like they have become more consistent in their playing style, as they win one, lose one, and so on in rounds 19 to 22 against equally experienced opponents.

Okay, I lied. There is no “Burwood Bruisers”. I generated the above data by tossing a coin 22 times (heads they win). Streaks inevitably occur in random data, and far more often than our intuition expects. As a reader of this data science blog, you probably have a more refined statistical intuition than the typical sports fan, but the effect can still catch us out.
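To get a feel for how streaky pure chance is, the coin-tossing experiment can be repeated many times in R. This is only a minimal sketch, not the process that produced the sequence above: the helper name `longest_streak`, the seed and the 10,000 repetitions are illustrative choices.

```r
# Length of the longest streak of identical results in a sequence.
# rle() collapses each streak into a single entry with its length.
longest_streak <- function(results) {
  max(rle(results)$lengths)
}

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# One simulated 22-round "season": fair coin, heads = win.
season <- sample(c("W", "."), size = 22, replace = TRUE)
cat(season, "\n")
cat("Longest streak this season:", longest_streak(season), "\n")

# Repeat many times to see how common long streaks are under pure chance.
longest <- replicate(
  10000,
  longest_streak(sample(c("W", "."), size = 22, replace = TRUE))
)
cat("Share of seasons with a streak of 4 or more:", mean(longest >= 4), "\n")
```

With a fair coin, roughly four in five simulated 22-round seasons should contain a winning or losing streak of at least four games, which is exactly the kind of streak that invites a story.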

As Gilovich, Vallone and Tversky explained in their seminal 1985 article “The hot hand in basketball: On the misperception of random sequences”, sports fans generate causal stories to explain the streaks, even when no pattern exists. Of course, the media also happily obliges in providing these stories.

So how can we approach this scientifically? An elegant statistical test to see if a persistent pattern actually exists is the “runs test”.

A runs test looks at the number of streaks in the data. Whenever the team goes from winning to losing (or vice versa), we start a new run.

W W W W - 1 streak AKA run
W W . . - 2 runs
W . . W - 3 runs
W . W . - 4 runs
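Counting runs is easy to do in R with the base function `rle` (run-length encoding). The sketch below reproduces the four examples above; the helper name `count_runs` is just an illustrative choice.

```r
# rle() collapses each streak into one entry, so the number of
# entries is the number of runs.
count_runs <- function(results) {
  length(rle(results)$lengths)
}

examples <- list(
  c("W", "W", "W", "W"),
  c("W", "W", ".", "."),
  c("W", ".", ".", "W"),
  c("W", ".", "W", ".")
)

for (games in examples) {
  cat(paste(games, collapse = " "), "-", count_runs(games), "run(s)\n")
}
```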

We can use statistics to calculate the number of runs expected under the null hypothesis of random chance. Specifically, our null hypothesis is that the chances of winning are akin to independent and identically distributed random weighted coin tosses. Thankfully mathematicians of the past have calculated this for us, so I won’t attempt to derive the formula here.
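For reference, the standard result (from the Wald-Wolfowitz runs test) for a sequence with n1 wins and n2 losses, where n = n1 + n2, is an expected 2·n1·n2/n + 1 runs, with variance 2·n1·n2·(2·n1·n2 − n) / (n²·(n − 1)). A minimal R sketch of those two formulas follows; the function name `expected_runs` and the tiny four-game season are purely illustrative.

```r
# Mean and variance of the number of runs under the null hypothesis,
# given n1 wins and n2 losses (Wald-Wolfowitz runs test).
expected_runs <- function(n1, n2) {
  n <- n1 + n2
  mu <- 2 * n1 * n2 / n + 1
  variance <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  list(mu = mu, var = variance)
}

# Example: a very short season with 2 wins and 2 losses.
short_season <- expected_runs(2, 2)
cat("Expected runs:", short_season$mu, "\n")
cat("Standard deviation:", sqrt(short_season$var), "\n")
```

For the four-game example you can check the mean by hand: the six possible orderings of two wins and two losses have 2, 2, 3, 3, 4 and 4 runs, which average to 3.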

If there really is a causal explanation going on, then we would expect streaks to persist longer than by chance, and so there will be fewer total runs.

Okay, now that we are armed with a statistical tool to distinguish real streaks from random chance, let’s use it to analyse the 2015 results for Collingwood Football Club (for our non-Australian readers, I swear that this time, Collingwood really is a team).

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 W  .  W  W  W  .  .  W  W  W  W NA  .  .  .  .  .  .  W  .  .  W  .

In round 12, Collingwood had a “bye” (a break where they don’t compete). When analysing the data, I’ll exclude round 12 and treat the season as a sequence of 22 games.

See that losing streak of 6 games from round 13 to 18? If you look at articles in the media, you will see analysis talking about the losing streak that Collingwood somehow needed to “break”, up until round 19 when they miraculously “rediscovered” how to win. But after our look at the streaks in coin toss data, we should be skeptical of the concepts suggested by this kind of language.

I’ve written an R script to perform the runs test for Collingwood 2015. The runs.test function is provided by the R randtests package.

install.packages("randtests", repos="http://cran.r-project.org")
library(randtests)

# Collingwood 2015 results (excluding 'Bye')
#         W . W W W . . W W W W . . . . . . W . . W .
wins <- c(1,0,1,1,1,0,0,1,1,1,1,0,0,0,0,0,0,1,0,0,1,0)

# 2-sided p-test of whether actual runs differ significantly from chance.
# For continuous data, the runs test needs a threshold value to create
# runs of low numbers and runs of high numbers (by default the median is
# used). In our case, we want runs of 0s and runs of 1s, so we set the
# threshold to 0.5.
results <- runs.test(wins, alternative='two.sided', threshold=0.5)

# 95% confidence interval => plus or minus 2 standard deviations
errorMargin <- sqrt(results$var) * 2
expected <- results$mu

sprintf("Expected Runs: %f ± %f", expected, errorMargin)
sprintf("Actual Runs: %d", results$runs)
sprintf("Significance: %f", results$p.value)

Here are the results of the runs test:

Expected Runs: 11.9±4.5 (95% confidence interval)
Actual Runs: 10
Significance: 0.4 (not significant)

With 10 actual runs sitting well inside the expected range, and a p-value of about 0.4, the runs test cannot distinguish Collingwood’s 2015 win/loss sequence from flips of a weighted coin.
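If you would rather not take the normal approximation on trust, a shuffle-based cross-check is one option. This is a different technique from `runs.test` (a simple permutation test), sketched here under my own assumptions: the helper `count_runs`, the seed and the 10,000 shuffles are illustrative choices.

```r
# Collingwood 2015 results (bye excluded): 1 = win, 0 = loss.
wins <- c(1,0,1,1,1,0,0,1,1,1,1,0,0,0,0,0,0,1,0,0,1,0)

# Number of runs = number of entries in the run-length encoding.
count_runs <- function(x) length(rle(x)$lengths)
observed <- count_runs(wins)

set.seed(2015)  # arbitrary seed, only so the sketch is reproducible

# Shuffling keeps the win/loss totals fixed but destroys any ordering,
# so the shuffled seasons show what chance alone produces.
shuffled_runs <- replicate(10000, count_runs(sample(wins)))

cat("Observed runs:", observed, "\n")
cat("Mean runs across shuffles:", mean(shuffled_runs), "\n")

# One-sided check: how often is a shuffled season at least this streaky?
cat("Share of shuffles with", observed, "or fewer runs:",
    mean(shuffled_runs <= observed), "\n")
```

If a sizeable share of shuffled seasons have as few runs as the real one, that agrees with the runs test: the real ordering is no streakier than a random ordering of the same wins and losses.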

Streaks have now been extensively studied both across multiple games (as we discuss here), and across individual scoring patterns within a single game (such as goal attempts of an individual player). The studies suggest that the perception of temporarily raised scoring ability during a winning streak (or lowered ability during a losing streak) is merely a widespread illusion.

Header image by Tom Reynolds, licensed CC BY 2.0