Testing and Fault Localization

One of the goals of software testing is to produce evidence that a piece of software works as intended.

That evidence might take the form of hard, deterministic data, or of stochastic results that statistically demonstrate the correctness of the software system to some degree of confidence.

However, with modern software systems becoming increasingly complex, it can be difficult and time-consuming to narrow down the exact location of a detected fault.

Fault localization is important for several reasons: isolating a fault, recreating it, fixing it, and documenting it appropriately when necessary (such as when the fault cannot be fixed).

Fault localization is all the more important when applying manual testing methodologies because automation might not be present to easily record (or script) and replay the steps necessary to recreate previously encountered faults.

Indeed, this is one of the biggest limitations of exploratory testing methods, because they aim to loosely describe what to test (say, as charters) rather than specifically prescribe both what and how to test. For this reason, bugs found through exploratory testing are often difficult to recreate accurately.

All forms of software testing (and development) therefore benefit greatly from fast, effective fault localization methods.


If one test has failed, then all of them have failed

An additional and related concern shared by many practitioners of software testing (including developers) is whether their tests can be trusted once a few tests have failed within the same execution.

In other words, there is skepticism from many practitioners — especially novices — regarding whether their remaining tests are indeed “telling them the truth” about their systems under test once a few failed tests have been observed.

This is due to the belief that faulty states from failing tests somehow carry over or cascade into passing ones, without practitioners being able to determine exactly where those cascades happened.

This perception is best summarized by the following statement from one such anonymous practitioner:


When a tester completes a test, there is some kind of an expected result or post-condition.


There is also an unwritten post-condition. It is: “Nothing else that could threaten the value of this product happened.”


It is impossible to know if that post-condition is met, and even if we encounter a problem down the line due to a memory leak, or extraneous data in the database, or corrupted files, then we will not know which test it was that “failed”.


And yet…

Yet, even with this apparent limitation, veteran practitioners of software testing are able to use their skills to successfully find such points of failure in software components.

With respect to tests, such information is then used to localize faults and to distinguish genuine failures from cascaded ones across failing and passing tests.

In turn, this information allows them to determine which state, if any, carried over to passing tests, and which tests need to be re-run.


What is to come?

Over the next several months, I will provide a methodology by which such veteran practitioners are able to successfully localize faults in their systems — and, likewise, perform test design — with a high degree of effectiveness.

My upcoming blog posts will therefore show you how to test more effectively, reduce confounders at work, and become a better tester overall.

Stay tuned.