{"id":135,"date":"2005-11-22T19:25:55","date_gmt":"2005-11-22T09:25:55","guid":{"rendered":"http:\/\/www.somethinkodd.com\/oddthinking\/?p=135"},"modified":"2005-11-22T19:26:42","modified_gmt":"2005-11-22T09:26:42","slug":"hunting-intermittent-bugs","status":"publish","type":"post","link":"https:\/\/www.somethinkodd.com\/oddthinking\/2005\/11\/22\/hunting-intermittent-bugs\/","title":{"rendered":"Hunting Intermittent Bugs"},"content":{"rendered":"<h2>The Situation<\/h2>\n<p>I was testing my real-time code. The problem with testing real-time systems is that errors can be intermittent. I dutifully ran the unit test repeatedly to make sure it was stable.<\/p>\n<p>It wasn&#8217;t until the sixth iteration that an odd error showed up.<\/p>\n<p>I spent some time on it, made a change to fix it, and tried again&#8230;<\/p>\n<p>After six iterations of the test, I hadn&#8217;t seen the error.<\/p>\n<p>But that&#8217;s hardly good enough. The bug had roughly an 83% chance of simply not occurring in each run. If the bug <em>wasn&#8217;t<\/em> fixed, there would be about a 33% chance (0.83<sup>6<\/sup>) that it simply didn&#8217;t show for six straight runs.<\/p>\n<h2>The Puzzle<\/h2>\n<p>Here was an interesting real-world puzzle: After fixing an intermittent bug, how many test runs do you need to do in order to be convinced that it has gone?<\/p>\n<h2>Fumbling for an Answer<\/h2>\n<p>The answer depends on several factors:<\/p>\n<ol>\n<li>How often did the intermittent bug occur?<\/li>\n<li>How often do you <em>think<\/em> you have fixed a bug, only to find you haven&#8217;t?<\/li>\n<li>How sure do you need to be that the bug is gone?<\/li>\n<\/ol>\n<p>I think the formula is:<\/p>\n<pre><code>P(bug still exists | you think you solved it)\r\n  * (P(test run passes | bug still exists) ^ <em>n<\/em>)\r\n    &lt;= P<sub>Acceptable<\/sub>(bug still exists)\r\n\r\nSolve for <em>n<\/em>.<\/code><\/pre>\n<p>For the first question, I had to assume that the initial sample of six runs was representative. 
Why? Because I didn&#8217;t have the maths skills to work out what value I should use, and because I wasn&#8217;t about to re-run the old known-faulty code just to get a better idea of the <acronym title=\"Mean Time Between Failure\">MTBF<\/acronym>.<\/p>\n<p>The second question I could estimate. Maybe 2% of the time? If you include &#8220;introducing a new bug&#8221; the figure is probably much higher, but this covers only the case where you think the bug has gone but it is still there.<\/p>\n<p>How sure do you need to be that the bug is gone? This was a commercial project &#8211; I was a junior developer, so I figured that decision wasn&#8217;t mine. I asked my Project Manager. She looked at me, bemused by the nerdy question, and hazarded &#8220;99% sure?&#8221; I was horrified &#8211; that seemed to be far too risky. By the formula above, with those figures, that required only four iterations.<\/p>\n<h2>A Pragmatic Solution<\/h2>\n<p>Rather than debate it further, I went back to my desk and quietly ran it over and over again, until I couldn&#8217;t handle it any more.<\/p>\n<p>On the 35th run, it failed! I hadn&#8217;t found the bug!<\/p>\n<h2>Coda<\/h2>\n<p>More careful examination of the code revealed the true problem. I fixed it, more confident than before. But now I had a further dilemma. If the bug only occurred once in 35 runs, how many times would I need to run it this time to reassure myself?<\/p>\n<p>I spent a few hours producing a far more elaborate test harness, and let it run in a corner for several days straight, many thousands of times. 
This time I was going to be <em>sure<\/em> the bug was gone!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s an interesting real world puzzle: After fixing an intermittent bug, how many test runs do you need to do in order to be convinced that it has gone?<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_s2mail":"","footnotes":""},"categories":[23,33,34],"tags":[],"class_list":["post-135","post","type-post","status-publish","format-standard","hentry","category-based-on-a-true-story","category-puzzle-solving","category-software-development"],"_links":{"self":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/135","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/comments?post=135"}],"version-history":[{"count":0,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/posts\/135\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/media?parent=135"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/categories?post=135"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.somethinkodd.com\/oddthinking\/wp-json\/wp\/v2\/tags?post=135"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}