OddThinking

A blog for odd things and odd thoughts.

Stability Metrics Drive Behaviour

The manager in charge of the test team was gossiping to me one day about the pain involved in running a stability test.

The test specification required the system to be booted up and left running for 24 hours straight. The trouble was, someone had interpreted a local fire-safety requirement to mean that the machines could not be left running unattended. He was talking about bringing a cot into the machine room and spending the night there, but someone else had pointed out that local council regulations forbade that.

He sorted out the regulations, and ran the test unattended. Unfortunately, it failed with an error, which annoyed him because it meant he needed to run it again.

The test manager was fairly diligent. Not only did he report the error to the development team involved, but he dug around in the code to see if he could spot the cause. He found it: the foolish developer was counting the number of seconds, starting at zero when the program started, and storing the count in an integer variable. When the number of seconds exceeded the largest value the integer could hold (around the 18-hour mark), the thread aborted with an exception.

A fix was promptly made, and the test was run again. This time it passed the 24-hour test.

Just out of curiosity, the test manager had another look to see how the developer had fixed the problem so quickly. He was enraged by what he saw.

The developer had simply started at MIN_INT instead of zero; now the program could count to 36 hours before it died, thus passing the 24-hour test!
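The arithmetic behind that "fix" is easy to check. The sketch below is an illustration only: the post does not say how wide the integer was or how often it was incremented, so the 16-bit signed range and the `hours_until_overflow` helper are assumptions made up for this example, not the original code.

```python
# Assumed limits of a 16-bit signed integer (the post does not state the width).
INT16_MIN = -32768
INT16_MAX = 32767


def hours_until_overflow(start: int, max_value: int,
                         seconds_per_tick: float = 1.0) -> float:
    """Hours until a counter that starts at `start` and is incremented
    once per tick would be pushed past `max_value`."""
    # After k increments the counter holds start + k; the first k for
    # which that exceeds max_value is max_value - start + 1.
    ticks = max_value - start + 1
    return ticks * seconds_per_tick / 3600.0


if __name__ == "__main__":
    from_zero = hours_until_overflow(0, INT16_MAX)
    from_min = hours_until_overflow(INT16_MIN, INT16_MAX)
    print(f"starting at zero:    {from_zero:.1f} hours")
    print(f"starting at minimum: {from_min:.1f} hours")
    print(f"ratio:               {from_min / from_zero:.1f}x")
```

Under these assumed values the counter lasts about 9.1 hours from zero and about 18.2 hours from the minimum; the figures in the story would correspond to a different width or tick rate. Whatever the width, starting at the minimum of a two's-complement integer exactly doubles the time to overflow, which postpones the crash rather than removing it.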


Comment

  1. I can just imagine the fix once it was noticed that the 36-hour test failed: increment the time counter once every two seconds. And so on.
