Our tech pubs group enjoyed Scott W. Ambler’s “Calculating Documentation Cruft” article in Dr. Dobb’s Journal related to Agile methods and documentation. We had some fun in an email thread and I’d like to capture some of our discussion here.
Scott cleverly turns CRUFT (a geek term that either combines crust with fluff or refers to a lab where useless objects piled up in the hallways) into a measuring stick for the effectiveness of documentation:
- Correct = The percentage of the document that is currently “correct”.
- Read = The chance that the document will be read by the intended audience.
- Understood = The percentage of the document that is actually understood by the intended audience.
- Followed = The chance that the material contained in the document will be followed.
- Trusted = The chance that the document will be trusted.
Typically, I focus on Agile and end-user documentation, but Scott dwells on the type of documentation that you use while developing the software. Given that the Agile manifesto values working software over comprehensive documentation, the fact that he’s calculating cruft in a document at all is refreshing and unexpected. However, my co-worker Shannon Greywalker and I both thought his algorithm could use some refinement. Shannon, being a math whiz, was able to re-mix it and gave me permission to post the idea here.
Shannon says, “His formula is essentially 1 – (C * R * U * F * T) = a flat percentage ‘cruft’ rating, wherein higher values are worse.
Let’s assume a perfect score for all 5 values (100% or 1.0 for each). The document is 100% correct; has a 100% chance of being read; and will be 100% understood, followed, and trusted.
1 – (1 * 1 * 1 * 1 * 1) = 0.0 = 0% CRUFT. The lowest possible score, so far so good.
Now let’s assume a 99% score for all 5 values:
1 – (.99 * .99 * .99 * .99 * .99) = 0.049 = 4.9% CRUFT. Our score has jumped almost +5% from 0. Still looks okay, mostly…
But now let’s assume uniform scores of 40%, and then 30%, across all 5 values:
1 – (.4 * .4 * .4 * .4 * .4) = 0.990 = 99.0% CRUFT.
1 – (.3 * .3 * .3 * .3 * .3) = 0.998 = 99.8% CRUFT.
The curve produced by his formula is sharply nonlinear, because the five factors compound multiplicatively; even moderate individual scores drive the result toward 100%. Compounding scales are the hardest for most of us to use in everyday life, because most of us can’t natively compare one such value to another.
The classic example of a scale that is not human-intuitive, and therefore not all that useful to a layperson, is the Richter scale for earthquakes. Most people have no idea what those numbers really mean, because 6.0 doesn’t sound that much worse than 5.0 – but it is much worse, since the Richter scale is logarithmic and each whole number represents roughly ten times the measured shaking amplitude.
A more straightforward, intuitive, and linear results curve would be expressed by the formula:
1 – ((C + R + U + F + T ) / 5)
Examples:
With a 60% score in all 5 factors: 1 – ((.6 + .6 + .6 + .6 + .6) / 5) = 0.4 = 40% CRUFT
With an 80% score in all 5 factors: 1 – ((.8 + .8 + .8 + .8 + .8) / 5) = 0.2 = 20% CRUFT
With the scores from Scott’s example: 1 – ((.65 + .9 + .8 + 1.0 + .9) / 5) = 0.15 = 15% CRUFT.”
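Shannon’s comparison is easy to verify in code. Here’s a minimal Python sketch of the two formulas (the function names are my own, not from Scott’s or Shannon’s write-ups):

```python
def cruft_multiplicative(c, r, u, f, t):
    """Scott's original formula: 1 - (C * R * U * F * T).
    Each argument is a 0.0-1.0 score; higher results mean cruftier docs."""
    return 1 - (c * r * u * f * t)

def cruft_linear(c, r, u, f, t):
    """Shannon's re-mix: 1 minus the average of the five scores,
    which yields a linear, intuitive results curve."""
    return 1 - (c + r + u + f + t) / 5

# A uniform 60% score in every factor:
print(round(cruft_multiplicative(.6, .6, .6, .6, .6), 3))  # 0.922 -- over 92% cruft
print(round(cruft_linear(.6, .6, .6, .6, .6), 3))          # 0.4   -- a more intuitive 40%
```

Running both on the same inputs makes Shannon’s point concrete: the multiplicative version punishes merely mediocre scores almost as hard as terrible ones, while the linear version degrades in step with the inputs.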
I would take it one step further, with these descriptive ratings:
40% CRUFT, “document is getting a little crufty” rating
20% CRUFT, “the cruft hasn’t yet overtaken your doc” rating
15% CRUFT, “this document is approaching cruft-free” rating
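Those descriptive ratings could even be automated. A small sketch, using the three thresholds above as illustrative cutoffs (they’re my reading of the examples, not an official scale):

```python
def cruft_label(score):
    """Map a linear CRUFT score (0.0-1.0) to a descriptive rating.
    The cutoffs come from the three example ratings above; they're
    illustrative, not an official scale."""
    if score <= 0.15:
        return "this document is approaching cruft-free"
    if score <= 0.20:
        return "the cruft hasn't yet overtaken your doc"
    return "document is getting a little crufty"

print(cruft_label(0.15))  # this document is approaching cruft-free
print(cruft_label(0.40))  # document is getting a little crufty
```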
Shannon’s argument is a good one: the best way to increase the understandability of the CRUFT rating for documentation is to make the calculation linear.
Shannon also argues that each of the five factors should carry its own weight in the cruft rating. I’d counter that people would disagree about which factors matter most in a document, so weighting might overly complicate the calculation.
I also find weighting too subjective, because the factors aren’t independent: Correct influences Trusted (docs that are found to be incorrect are then distrusted). And Understood is elusive for a writer to measure, let alone target specifically – if a selected audience member doesn’t understand what you’ve written, is that always the fault of the writer?
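Still, for anyone who wants to experiment with Shannon’s weighting idea, here’s what a weighted linear variant might look like. The weights below are made up purely for illustration – choosing them is exactly the subjective part I’m objecting to:

```python
def cruft_weighted(scores, weights):
    """Weighted linear CRUFT: 1 - sum(w_i * score_i), with the weights
    normalized so they sum to 1. Both arguments are equal-length
    sequences ordered C, R, U, F, T."""
    total = sum(weights)
    return 1 - sum(w / total * s for w, s in zip(weights, scores))

# Hypothetical weights that value Correct and Trusted most:
scores  = [.65, .9, .8, 1.0, .9]   # C, R, U, F, T from Scott's example
weights = [3, 1, 1, 1, 3]
print(round(cruft_weighted(scores, weights), 3))  # 0.183 -- vs 0.15 with equal weights
```

With equal weights the function reduces to the plain linear formula, so it’s a strict generalization; whether the extra knob is worth the arguments it would cause is another matter.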
I do like Scott’s tips for creating less crufty documentation, most of which also translate to end-user doc. Here’s my attempt at that translation.
Scott says “Write stable, not speculative, concepts.” Scott’s talking about writing conceptual information only when it’s factual and can be trusted. For end-user doc, though, writing the concepts out sometimes helps you uncover problems, so I would argue that it’s better to write some concepts down even early on.
Scott says “Prefer executable work products over static ones.” Scott’s talking about using executable tests for code rather than documenting the code and its limitations. For end-user doc, an executable work product could be a screencast, a live demo sandbox that lets people try out a feature safely, or a link in the online help that opens a dialog box in context. Windows 98 Help did this: a task about setting display resolution could open the very Control Panel it was describing.
Scott says “Document only what people need.” Amen, brother, preaching to the minimalism choir. Know your audience before you write down words that will be cruft.
Scott says “Take an iterative approach.” All writers know that re-writing is as essential as writing.
Scott says “Be concise.” Agreed.
What are your thoughts about calculating documentation cruft? Are we taking it way too seriously, or is there a good tool in this (possibly crufty) post?