On reflection, perhaps they shouldn’t go this far. It’s a bit sad, isn’t it?

I saw this post about the graphs of National Novel Writing Month on my brother’s blog ages ago, but an RSS malfunction showed it to me again today, and I got to thinking about his final graph: words written on any given day, plotted against how far behind he was that day (or, more precisely, words left to write divided by days left, normalised to the same value on day one). He rightly spots “a hint of a positive correlation”, and notes that this may not be causation but could come from an external factor, such as his determination to show his doubting wife what for.

I have my own theory, arguably more prosaic but also more interesting, so I built a simulation to test it. I assigned a random number to each of the 30 days of November, using the formula RAND()*RAND() to get a distribution skewed towards low values. I then gave each day a word count by multiplying its random number by 50,000 and dividing by the total of all 30 random numbers. This produced a month of simulated writing with random ups and downs, but a guaranteed total output of exactly 50,000 words. (For these purposes I allowed non-integer numbers of words.) I worked out the cumulative word count and behindness index for each day, plotted words written against behindness as in his final graph, with a trendline (in red) and R² value, and ran simulation after simulation.
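The original was a spreadsheet, so what follows is only a rough Python re-creation under a few assumptions of my own: Python’s random.random() stands in for RAND(), “days left” counts the current day, each day’s behindness is measured before that day’s writing, and the helper names (simulate_month, behindness, pearson_r, run) are made up for illustration. It tallies how many runs produce a positive correlation and reports the mean R².

```python
import random

TARGET = 50_000  # words in a finished NaNoWriMo novel
DAYS = 30        # days in November


def simulate_month(rng):
    """One simulated November: 30 random daily word counts summing to exactly TARGET."""
    # RAND()*RAND(): product of two uniforms, skewed towards small values
    weights = [rng.random() * rng.random() for _ in range(DAYS)]
    total = sum(weights)
    # Scale so the month's output is exactly TARGET (non-integer words allowed)
    return [w * TARGET / total for w in weights]


def behindness(words):
    """Words left / days left, measured before each day's writing, divided by the day-one value."""
    day_one = TARGET / DAYS  # the daily target, so day one scores exactly 1
    result = []
    written = 0.0
    for day in range(DAYS):
        days_left = DAYS - day  # assumption: includes the current day
        result.append((TARGET - written) / days_left / day_one)
        written += words[day]
    return result


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient; R^2 for a linear trendline is r squared."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def run(trials=1000, seed=1):
    """Correlation of behindness (x) against words written (y) for many simulated months."""
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        words = simulate_month(rng)
        rs.append(pearson_r(behindness(words), words))
    return rs


if __name__ == "__main__":
    rs = run()
    positive = sum(r > 0 for r in rs)
    print(f"{positive} of {len(rs)} runs had a positive correlation")
    print(f"mean R^2 = {sum(r * r for r in rs) / len(rs):.3f}")
```

Changing the seed or the number of trials is the easy way to do the “simulation after simulation” part.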

My theory was proven: every single one had a positive correlation.

Of course it did: any deviation from the 1,666.7-word daily target is reflected in your behindness score on every subsequent day, and if you write your 50,000th word on day 30, that deviation must be balanced by an equal and opposite deviation spread unevenly across those same subsequent days. Fall behind and the remaining days must, on average, beat the target; get ahead and they must fall short of it. Any novel of exactly 50,000 words will have this hint of positive correlation between behindness and words written. I suspect this would hold even if we allowed some days to have a net deletion of words. Novels of just over 50,000 words (as almost all of them are) will show a slightly weaker version of the same effect. There are only two ways to avoid it. One is to write a wildly different number of words: say, 40,000 or 60,000. The other is to write exactly one thousand, six hundred and sixty-six and two-thirds words every single day, although that involves using a lot of three- and six-letter words.
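For anyone who wants the “of course” spelled out, here is a sketch of why the correlation comes out positive every time. It assumes the behindness index on day d is words left divided by days left (counting day d itself), normalised by the daily target m = T/N so that B_1 = 1, and that the month totals exactly T words; the symbols are my own. Because the total is T, the mean daily output is exactly m.

$$
\begin{aligned}
e_d &= w_d - m, \qquad S_d = \sum_{k=1}^{d} e_k, \qquad S_0 = S_N = 0 \\
B_d &= \frac{T - \sum_{k<d} w_k}{(N-d+1)\,m} = 1 - a_d\,\frac{S_{d-1}}{m}, \qquad a_d = \frac{1}{N-d+1} \\
\sum_{d=1}^{N} (B_d - \bar{B})(w_d - m) &= \sum_{d=1}^{N} B_d\, e_d = -\frac{1}{m}\sum_{d=1}^{N} a_d\, e_d\, S_{d-1} \\
&= \frac{1}{2m}\left[\sum_{d=1}^{N-1} (a_{d+1} - a_d)\, S_d^2 + \sum_{d=1}^{N} a_d\, e_d^2\right] > 0
\end{aligned}
$$

The last step uses e_d S_{d-1} = (S_d² − S_{d−1}² − e_d²)/2 and summation by parts. Since a_{d+1} > a_d, both sums are non-negative, and the second is zero only if every day hits m exactly, which is the constant-output escape route. Nothing in the argument needs w_d ≥ 0, so net-deletion days don’t break it; it does lean on the total being exactly T.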

I’m not sure that someone who writes 60,000 words really worked to the 50,000-word target at all, so if you want to know if ‘behindness’ affects performance, you’ll have to examine the stats from people who failed to complete the novel, between day one and whenever they eventually gave up. And to be honest, how useful a sample are they to a study of productivity?

Anyway, I suppose I hope this can be a nice example of how something plausible and supported-by-the-data-looking can turn out to just be randomness viewed from a funny angle.