Testing and Fault Localization Part 3


How can one be effective at fault localization and testing?

While it is true that it is difficult to know the complete set of states of any software system or even to know the current (complete) state transition of a particular SUT at a given time, it is not true that we can never isolate any causal relationships in a SUT to a high degree of certainty.

Moreover, we can further use this causal information – along with the scientific method – to guide us to the actual causal relationships.

In other words, even with the inherent limitations hindering our knowledge of the underlying workings of a software system, we can still trust our tests in many (if not most) cases with the information they provide to us about the SUT. In turn, this implies that we need not worry about entering permanent scenarios where we do not know “which test it was that ’failed’”.

The reason for this is because the unwritten and unobservable post-condition (“Nothing else that could threaten the value of this product happened”) is fundamentally at odds with the foundation of software testing – the scientific method – which has allowed all engineering and scientific fields to flourish.

While anything can indeed happen that “could threaten the value of this product”, the key observation is that not everything can happen that would threaten the value of a product with respect to the intent of each particular test and its set of oracles.

Therefore, once we start targeting pieces of any software system, we can begin to isolate relationships that may be at work for those pieces and then gradually rule out irrelevant variables that we originally might have considered important.

Once those pieces are understood and irrelevant variables thrown out of our model, we can gradually move our view of the SUT to higher levels, all the while remaining certain that the relationships we found before are not disturbed simply by including more of the SUT in our model.

 

Axioms for effective fault localization and testing

The following two axioms illuminate the foundation that helps us understand how this is done.

 

Axiom 1: System Behavior-State Mapping

Any behavior of a system can be mapped to some sequence of states of the system,

 

Axiom 2: Monotonic Knowledge of System Behavior

To an ideal knowledge base, the amount of knowledge about a system’s behavior never decreases and is never contradictory.

That is, let KBt1 represent a fact about the SUT at time t1. Further, let KBt2 represent a fact about the SUT at some future time t2 . Then the following two conditions hold:

  • |KBt2| ≥ |KBt1| ∀t1 , t2 ∈ TIME such that t2 ≥ t1
  • if qKBti , then  !∈ KBti, where q is a fact about the SUT at any time ti

More details about the axioms can be found in the official fault localization paper.

 

Abstract States

Given these two axioms, we can then understand the SUT in terms of abstract states. I’ve written about abstract states before, here.

Additionally, we require the usage of another device called a reachability graph, which I have also written about before, here.

 

Isolation of Irrelevant Abstract State Properties

With the tools of abstract states and reachability graphs, we are in great shape to begin isolating irrelevant (abstract) state properties.

The details, as well as the proof, are contained in the published fault localization paper. However, we can summarize the usage of abstract states and reachability graphs here, as a theorem:

An abstract state property, p, is irrelevant to an abstract state if and only if there exists a reachability path from two (or more) abstract states, where one abstract state contains p and the other contains p’s negation.

The power of reachability graphs and abstract states is that it does not matter whether property p and its negation are reachable from the same abstract state at different logical times!

If the same abstract state can reach both p and its negation (at different logical times), then we can safely label p as an irrelevant property in our testing or fault-localization. This reduces our search space to reduce, and us to focus on relevant properties.

 

A Powerful Corollary

The theorem above is a necessary condition to rule out irrelevant properties as it never labels irrelevant properties as relevant. In other words, it never declares a property p as relevant to an abstract state when in fact it is not.

We can summarize this corollary as:

The theorem above yields no false positives

 

Examples

The paper contains several useful examples that ellucidate the usage and power of the theorem explained here, as well as of abstract states and reachability graphs. I invite you to read it.

 

But Beware

I mention above that the theorem is necessary to rule out irrelevant properties. Unfortunately, the theorem is not sufficient for fault localization.

The insufficiency of the theorem implies that it may yield false negatives, meaning that it might not be able to detect certain relevant properties. In other words, it might label relevant properties as irrelevant in certain situations.

We will discuss those situations, as well as an extended fault-localization methodology, in the next post.

Stay tuned!