“Testing vs Checking” : ‘Tomato’ vs ‘Tomato’


Apologies to those of you awaiting the next post of the test case equivalence series. The series will continue in a subsequent post, but this blog post must be published now.

So what is so important as to interrupt a series? It is to put the silly matter of “testing vs checking” to rest.

For many years now, I’ve been voicing to my colleagues (and recently to the world, via this blog) the private observation that the terminology we use in our industry, the testing industry, is lacking and murky, to say the least.

Most of the mainstream terminology is unhelpful on a daily basis. For instance, see my posts about duplicates (which, incidentally, is the series of posts that was interrupted).

This vagueness in definitions has spawned much confusion and noise from consultants and other testing loud-mouths (I say this with kindness, as I am a loud-mouth too) regarding what certain terms mean and the heuristics for applying them.

Unfortunately, certain noise has ended up being echoed back down to us by the sphere of ignorance encapsulating the industry and has amplified certain discussions that merit no energy.

Before I get to the issue, I wish to make clear that I commend Michael Bolton et al. for their vigorous attempts at chipping away at this unfortunate imprecision in our industry. However, I need to voice my concern about their efforts, and do so urgently.

In my view, we’re dangerously close to creating ghost definitions that will only hurt future generations of testers, regardless of whether such definitions benefit the consultants cashing in on them at present.

Put another way: the chipping away is being done on something that was not a problem to begin with.

The “testing vs checking” dilemma that is plaguing the minds of testers world-wide is fueling countless expensive courses by many a consultant, and, most importantly and sadly, false knowledge in our industry.

“Why false knowledge?” you ask. Because the problem with the “testing vs checking” nonsense is that it takes a very dim view of the world (i.e. as consultants see it) and presupposes that technology will be in the future as it is right now.

Let me give you a very real example. I’ll start with my personal, practical definition of a test:

 

a test attempts to prove something about a system

 

Whether it is a security vulnerability or the fact that a sequence of states will lead to the infamous “screen of death” doesn’t matter. What matters is that we want to prove that the SUT exhibits some property or behavior (or lack thereof), whatever that happens to be.

Now, fast-forward 50 or 100 years from now, when theorem-proving algorithms become fast enough to be practical (i.e. cost-effective) in the industry. Many applications will then be specified in a rigorous language like first-order logic.

In fact, many applications are at present being specified in some rigorous language like Z [1] or even the powerful method of ASMs (abstract state machines) [2].

Such languages are called formal systems.

It turns out that a basic, natural property of any formal system (including first-order logic) is that its theorems (its truths) can be mechanically enumerated [3].

That is, all the truths of a formal system can be deduced by a Turing machine.
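To make that claim concrete, here is a minimal sketch in Python of such a mechanical enumeration. It uses a deliberately tiny, made-up string-rewriting system (one axiom, two rules) rather than first-order logic, but the procedure is the same idea: a machine grinding out every theorem, breadth-first.

from collections import deque

# A toy "formal system": one axiom and two string-rewriting rules.
# This is not first-order logic; it only illustrates the claim above,
# namely that the theorems of a formal system can be mechanically
# enumerated by a machine.

AXIOM = "I"

def successors(theorem):
    """Apply every rule to a theorem and yield the resulting theorems."""
    yield theorem + "U"      # Rule 1: any theorem may be suffixed with U
    yield theorem + theorem  # Rule 2: any theorem may be doubled

def enumerate_theorems(limit=10):
    """Enumerate the first `limit` theorems, breadth-first."""
    seen, queue, theorems = {AXIOM}, deque([AXIOM]), []
    while queue and len(theorems) < limit:
        t = queue.popleft()
        theorems.append(t)
        for s in successors(t):
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return theorems

if __name__ == "__main__":
    print(enumerate_theorems())  # ['I', 'IU', 'II', 'IUU', 'IUIU', ...]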

When applications are specified with such formal systems, all the testing is done via a machine. Theorem-provers [4] are able to reach conclusions about the software that we never thought possible, regardless of how many “rapid software testing” courses or “session-based test management” courses a tester has attended (and paid dearly for).

Algorithms exist that can prove properties of an application without ever having to touch the UI or concern themselves with “sapience” (or other such consultant-speak).

In other words, bugs can be, and have been, discovered by machine that humans were not able to find.
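For a taste of what such a machine proof can look like with tools that exist today, here is a small, hypothetical sketch using the Python bindings of the Z3 solver (the z3-solver package). The “application” being checked, an absolute-value computation, is invented for the example; the point is only that the property is proved for every integer input without the program ever being executed or a UI ever being touched.

from z3 import Int, If, Solver, Not, unsat

# Symbolic model of a hypothetical implementation: abs(x) = x if x >= 0 else -x.
x = Int("x")
abs_x = If(x >= 0, x, -x)

# Ask the prover for a counterexample to the property "abs(x) >= 0".
s = Solver()
s.add(Not(abs_x >= 0))

if s.check() == unsat:
    # No counterexample exists: the property holds for all integers x.
    print("Proved: abs(x) >= 0 for every integer x")
else:
    print("Counterexample:", s.model())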

I’d like to make it very, very clear that my point is not aimed at the silly dichotomy between humans and machines. Rather, my point is that the whole “testing vs checking” distinction, along with its refinements of “human checking,” “machine testing” and any combinations thereof, are made altogether useless precisely because of the vacuous premise implied within those definitions.

Our industry is already plagued by plasters of useless terminology that have taken us nowhere, except toward avenues where people create dubious definitions to fill in gaps and their own pockets.

It is personally very sad to see so many otherwise smart people worldwide subscribing to such vacuous definitions brought into our industry. This is especially true when consultants use the following device to turn people’s judgment off immediately, something that will forever astonish me:

“We sometimes make provocative statements to get people to question their beliefs.”

How true that isn’t.

I admire and respect anyone who truly questions the nonsense currently permeating our industry. I’d like to hear from you, if you’re out there.

 

[1] The Way of Z. Jacky.

[2] Abstract State Machines. Börger & Stärk.

[3] Gödel’s Theorem. Torkel Franzén.

[4] Computational Complexity. Oded Goldreich.


14 thoughts on ““Testing vs Checking” : ‘Tomato’ vs ‘Tomato’”

  • Iain

    Mario,
    A significant division in the testing world is between those who believe that it is (or someday will be) possible to /solve/ testing mechanically, and those who do not. I count myself in the latter group, and suspect – based on this post – that you belong to the former.
    The oracle problem is more complex than you represent in this post: you focus on starting state. What about complete specification of every interaction that takes place during the test, including the timing and sequencing of those interactions? What about specification of every item that must be observed upon completion of the test? Or your ability to perform those observations without changing the state of the system and compromising subsequent observations? What about the limitations of the oracle itself, be it scope, accuracy, or precision? Waving the magic wand of technology does not remove these problems, even if some might wish that it did.
    As to formal systems, I’m sure this will be of great benefit to the world of software engineering, assuming that it is ever widely adopted. It will not however remove the need for testing (as opposed to checking). Claiming that it will assumes:

    -That the problem that we are seeking to solve is fully understood and can be articulated to those who will solve it, without loss or distortion of information
    -That a solution to the problem, and the consequences of that solution, can be fully defined and articulated to those who will implement it without loss or distortion of information
    -That all knowledge as to what constitutes quality (value to some person) can be formally and symbolically represented.

    I’ve yet to see any evidence to suggest that any of these assumptions are sound. Claiming that testing is technically solvable is rather like suggesting that robots from the future have been sent to kill you: it is a work of science fiction.

    -Iain

  • Mario (post author)

    Iain, this kind of reaction is not unexpected. Yet it is welcome, as it leads to the kind of good discussion that builds a better industry, which is what we all need.

    I’ll go through each of your points. As I do, please note that my responses are aimed at the “general you” who might have similar reactions. As such, it is nothing personal against you in particular.

    “””
    A significant division in the testing world is between those who believe that it is (or someday will be) possible to /solve/ testing mechanically, and those who do not. I count myself in the latter group, and suspect – based on this post – that you belong to the former.
    “””

    I don’t know of anyone who makes such a distinction (except perhaps those who don’t understand testing to begin with.) In particular, you speak of testing as if it’s a problem needing a solution.

    As far as I know, testing is not a problem. It’s a *process* and, as such, you cannot “solve” a process. Using my definition of test above, testing is the process of proving something about a system.

    So, no, I don’t belong to any of those camps you mention and, I believe, neither does any worthwhile tester.

    “””
    The oracle problem is more complex than you represent in this post: you focus on starting state. What about complete specification of every interaction that takes place during the test, including the timing and sequencing of those interactions? What about specifi…
    “””

    Well, hold on. Are you voicing real concerns or are you simply repeating what you’ve been told? If you can provide them, I’d like to see specific instances of real-world domains that impose such requirements on your testing.

    For the meantime, I’ll say that if it’s important to your test that you include the “complete specification of every interaction that takes place during the test, including the timing and sequencing of those interactions,” then you should do so and (pardon my bluntness) not whine about it.

    However, do you really need to do so in practice? If you are truly having this problem and it’s not just a “thought experiment,” then I think you have bigger problems on your hands:
    – a non-testable system
    – and/or a horribly (un)designed test
    – and/or a tester who needs more training

    You need to address the first issue with the makers of the system you’re testing. For the other two, you also need to *know your system* (an analog to *know your data*) to know what you can leave out and ignore. Systems are full of orthogonalities that can be leveraged to your advantage.

    On a personal note: I’ve tested systems ranging from small applications (think microchip controller), to webapps, to huge multi-site enterprise storage systems with DR and tight RPO requirements at IBM, and have not run into any issues of the kind you mention (nor have other teams I’ve worked with). I guess we’ve been lucky in that the systems were testable, and, additionally, we had the courage to speak up when they weren’t, thus improving their testability.

    “””
    As to formal systems, I’m sure this will be of great benefit to the world of software engineering, assuming that it is ever widely adopted.
    “””

    Formal methods are already widely used. For instance, many makers of medical devices use such methods to minimize the possibility of their devices hurting people (incidentally, there’s a great story about this in the ASM book which you should read.)

    They need to use such methods not only to prevent injury, but also to reduce liability. To me, that’s a great benefit. It’s also a great benefit to the users of those devices because they *know* they won’t get hurt.

    Additionally, several other industries use formal methods: transportation and aviation, elevator makers, compiler makers, etc.

    “””
    It will not however remove the need for testing (as opposed to checking).
    “””

    I think you missed the point entirely here. The claim isn’t that it removes the need for testing. Rather, the point of this post is that such formal systems enable us to use other systems to test our applications by proving things about them, which makes the entire “testing vs. checking” distinction worthless.

    “””
    -That the problem that we are seeking to solve is fully understood and can be articulated to those who will solve it, without loss or distortion of information
    -That a solution to the problem, and the consequences of that solution, can be fully defined and articulated to those who will implement it without loss or distortion of information
    -That all knowledge as to what constitutes quality (value to some person) can be formally and symbolically represented.

    I’ve yet to see any evidence to suggest that any of these assumptions are sound.
    “””

    I completely agree with you here, because those assumptions you just made are completely unfounded, and I certainly don’t see those three assumptions made anywhere in this post. Do you?

    Thanks for your remarks, Iain.

  • Michael Bolton

    Hi, Mario…

    I haven’t talked to James about the nature of his reply, but I think it indicates a sense of hopelessness about the possibility of a reasonable conversation. That’s because we really are working from different paradigms, like Ptolemaic astronomers and modern-day cosmologists. I’m not saying who’s who; for all I know, you could be a visitor from the distant future (or perhaps some time 50 to 100 years hence), here to show us the way forward. There’s no question to me, though, that we’re working from completely different world views.

    Your claim appears to be that you can do this:

    Create a description of the product in Z.
    Set up a copy of VMWare, and take a snapshot of the system.
    ???
    Completely tested system.*

    Now, that seems like a swell idea, and easy too. Why hasn’t anyone thought of this before? In fact, many people have thought of testing like this for quite a while, at least since 1972 when similar ideas were presented at the Computer Program Test Methods Symposium at the University of North Carolina at Chapel Hill. Those ideas haven’t worked. They are unlikely ever to work (until software is developed by and for our robot overlords), because they are rooted in a premise that I consider invalid (and that I will identify): a computer program is a set of instructions for a computer, and that’s all there is to it.

    Instead (and I’m sorry to invoke your apparent nemesis here), I concur with Cem Kaner, who says: “a computer program is a communication among several people and computers, who are distributed over space and time, that contains instructions that can be executed by a computer”; and he adds that the purpose of the program is to provide value to stakeholders. I would argue—and I think Cem would agree—that one of the key points of testing is to identify problems that threaten stakeholder value. Some of those problems can be expressed in terms of formal systems, and for those problems, I support the idea of checking the product against those formal systems. We can anticipate other problems, though:

    in the translation of the product description from natural language to formal language;
    in the translation of the natural or formal languages to code;
    in interactions between the product and the operating system, the file system, third-party libraries, in-house libraries, interoperating programs, concurrent processes that are not specified in the formal language; and, most importantly,
    in the interactions between the product and its users—who behave and make decisions about quality in a decidedly informal way.

    (Also, we can expect those problems now, because our customers are apparently unwilling to wait 50 to 100 years until “everything” can be expressed economically in a formal language.)

    But here’s the real problem for formal languages: much of the knowledge required to develop a formal description of the product is rooted in informal work and tacit knowledge (I strongly urge that you read Tacit and Explicit Knowledge, by Harry Collins.) We start with ideas about the product, expressing them in natural language to one another or to ourselves. We refine our understanding of the communication to the degree that we can interpret the communication and specify it in at least one kind of formal language: a programming language.

    (Somewhere along the way, we might also include stages of translating natural language to formal languages other than program code. Not many people do this. The Z language was proposed in 1974, and first presented in 1977. It took 25 years for Z to achieve ISO standardization. Why didn’t they write the standard for Z in Z? Then everyone would have understood it and agreed, right? The problems with formal languages of various kinds are not new; they’ve been encountered at least since the 17th century, from Wilkins and The Philosophical Language, to Russell and Whitehead with Principia Mathematica, to Donald Knuth who famously remarked to a colleague, “Beware of bugs in the above code; I have only proved it correct, not tried it.” You cite both Gödel and Turing, and so are doubtless aware of the incompleteness theorem and the halting problem. These are not a problem for checking, since checking is by definition algorithmically decidable. Incompleteness and the halting problem are problems in testing, but you can make the problem invisible to yourself if you limit your idea of testing to our idea of checking. Sorry; long parenthetical, there.)

    As we develop the product further, we refine our understanding of the product and the problem that it’s trying to solve. This happens naturally as we gain experience with the product and the problem space. We recognize problems in the product or in our models of it; we recognize shortcomings in our formal models and improve upon them; we run the product and encounter problems, and we fix them. You know the degree of informal work that goes into formal work, because you follow this process as you develop your product (or perhaps you don’t know this, perhaps because you don’t notice that you’re applying and developing knowledge as you develop the product). Your attention appears to be on the explicit, formal parts of the work. That’s cool. Our attention is on the tacit, informal part (and converting some of that relational tacit knowledge to explicit knowledge), but you’ll see this as meaningless or nonsensical or pointless if (as it appears) you focus only on the formal.

    One note: you offer something that you call “a proof by construction”.

    Take a snapshot of your system (VMware does this excellently).
    Save a copy and call it “snapshot”.
    Write a test that says either in its pre-conditions or its post-conditions: the state of the system must equal “snapshot”.

    There’s a difference between a specification and a snapshot. The implicit claim in your “proof” is that the snapshot captures everything that is relevant and important about the state of the system, and that the snapshot misses nothing relevant or important about the state of the system. So this is not a proof by construction; it’s proof by tautology, which is to say not a proof at all. Preconditions and (especially) postconditions are not about what’s there, but about what you want to be there, what matters. A specification tells you that (albeit always incompletely to some degree). A snapshot does not distinguish between what does matter and what doesn’t, nor why it matters.
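    To put that contrast into something like code (a hypothetical sketch, with every name and type invented for the purpose), compare a snapshot-equality postcondition with a specification-style postcondition that names what matters:

from dataclasses import dataclass

# Hypothetical illustration only: all names and types here are invented.
# The contrast is between a postcondition that demands the whole captured
# state be identical, and one that states the properties that matter.

@dataclass
class Account:
    balance: int

def post_snapshot(state_before: bytes, state_after: bytes) -> bool:
    """Snapshot style: the entire captured state must match, byte for byte.
    Nothing here distinguishes what matters from what doesn't, or says why."""
    return state_after == state_before

def post_transfer(src: Account, dst: Account, amount: int,
                  src_before: int, dst_before: int) -> bool:
    """Specification style: only the properties that matter for a transfer,
    stated explicitly (and still incomplete, as every specification is)."""
    return (src.balance == src_before - amount
            and dst.balance == dst_before + amount
            and src.balance >= 0)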

    It’s tempting to think of testing in terms of demonstrating repeatability, but I’d like you to consider this story: http://www.developsense.com/blog/2010/05/why-we-do-scenario-testing/ I would especially encourage you to read the paragraph—the passage from Computer Programming Fundamentals at the end of the piece. (In fact, I’d encourage you to read Weinberg—especially An Introduction to General Systems Thinking and the Quality Software Management series, but I fear that you wouldn’t be interested in them either; different paradigm.)

    All that said, I must say that I disagree with James in this case. I don’t think your argument is wrong. My contention is that your argument is not even wrong. That is, while I understand that you’re apparently upset about something, you haven’t identified “the vacuous premise implied within those definitions”, and how and why it causes harm.

    I suspect that this is because, to you, testing is checking the functionality of the product against formal models. If that’s the case, then you will see any attempt to divide checking and testing as preposterous, because to you there is no difference. To me, testing is an empirical investigation of the products, systems, and people, and the relationships between all of those; that includes investigation of parafunctional (some say non-functional, but I say that sounds weird) aspects of the quality of the product; and part of that investigation might well include checking the product against formal models. But I must say, you lose me completely when you refer to “the silly dichotomy between humans and machines”. Confusion between humans and their tools is, in my world, a dehumanizing mistake. In particular, machines cannot identify threats to value (or health, or safety): they can only extend the ability of humans to do that. To say that machines can perform this action is to ignore the role that humans play in preparing the machine and interpreting its results; but again, ignoring the role of humans isn’t hard if your paradigm is rooted in emphasis on the formal and mechanical.

    I encourage you to continue your work in developing a tool that can identify all of the salient details of preconditions and postconditions for a check, and can apply all of the tacit knowledge and perform all of the actions of testing that find all of the problems that matter to people. Although I’m bound to offer this: once you believe you’ve finished work on that, drop me a line and perhaps we can help you to test it.

    —Michael B.

    *That’s a little like this:

    1. Become a consultant.
    2. Divide some aspect of testing with a meaningless distinction.
    3. ???
    4. Profit.

    Believe me, that’s not how it works in my experience. If that worked, I’d be publishing meaningless distinctions every day and firing them off to Lulu for printing and sale. Ka-ching!

  • Joe Harter

    Hi Mario,

    I want to second Iain’s concern that you aren’t understanding the Oracle Problem, though if Doug Hoffman (via your link to Cem’s site) wasn’t able to convince you then I may not be able to. The problem may truly be because we have “opposed paradigms” as James Bach said.

    I will ignore “testing v checking” in this comment, and just call it all testing. When a tester completes a test there is some kind of an expected result or post condition. There is also an unwritten post-condition. It is: “Nothing else that could threaten the value of this product happened.” It is impossible to know if that post-condition is met, and even if we encounter a problem down the line due to a memory leak, or extraneous data in the database, or corrupted files, then we won’t know which test it was that “failed”. Do you agree with that statement? Do you consider that a problem?

  • Mario (post author)

    Hi Joe,

    No question about it. Such is the inherent challenge when testing most (if not all) systems/applications. There is no question that we are limited by not knowing everything that happens in the SUT as we execute our tests. Further, that limitation precludes us from saying, with 100% certainty every single time, that there was nothing else affecting the behavior of the SUT at the time of running our test.

    However, there are also parts or areas of a SUT that are completely independent and, therefore, irrelevant to the results of many tests. This is what I mean by orthogonalities in software, whereby such parts can be immediately pruned from our models (or phase space) because they are completely irrelevant in answering the question: Can we trust the result of this particular test?

    So not knowing everything about a SUT does not mean that our tests (either automated or manual) cannot be trusted at all. Yes, there might be many things we do not know about a SUT (we might not even know what we do not know), but there are also many unconfounded relationships in software that we can take advantage of.

    Otherwise, the practice and profession of software testing would be completely pointless.

  • Mario (post author)

    Michael, you make good points. My replies are below.

    “””
    Your claim appears to be that you can do this:

    Create a description of the product in Z.
    Set up a copy of VMWare, and take a snapshot of the system.
    ???
    Completely tested system.*
    “””

    That’s not at all my point. I’m starting to see a pattern in the replies where people mistakenly believe that if there’s one theorem-prover for one formal system, then that same theorem-prover can be used for all of them, and testing can be done away with. That is not the premise at all. If only all SUTs were complete in the sense of complexity (think NP-complete or even P-complete), then by testing one SUT we would have tested all, but this is not so.

    My use of theorem-provers is simply as a device to refute the entire “checking/testing” distinction and point out that it is a useless distinction.

    Further, my argument is that the “checking vs testing” division forces a useless and primitive distinction over what we all know and practice as *testing*. Such a distinction ends up overloading concepts in our terminology-diluted industry for the benefit of, most certainly, certain consultants, and they do not benefit the community at all.

    “””
    But here’s the real problem for formal languages: much of the knowledge required to develop a formal description of the product is rooted in informal work and tacit knowledge (I strongly urge that you read Tacit and Explicit Knowledge, by Harry Collins.)
    “””

    Ok, I’ll add that to my queue. Sounds like an interesting read. I can assert from experience, however, that even tacit knowledge can be successfully codified (this was exactly what I introduced to one team I worked with at IBM to test the Information Archive and SoNAS), but that’s a topic for another post.

    “””
    You cite both Godel and Turing, and so are doubtless aware of the incompleteness theorem and the halting problem. These are not a problem for checking, since checking is by definition algorithmically decidable. Incompleteness and the halting problem are problems in testing, but you can make the problem invisible to yourself if you limit your idea of testing to our idea of checking. Sorry; long parenthetical, there.)
    “””


    Here, we must be careful when talking about incompleteness and not generalize, as doing so leads to spectacular logical failures. In particular, there are two incompleteness theorems and they both apply *ONLY* to the arithmetical aspects of certain formal systems, and nothing else. That being said, which theorem do you have in mind in your reply?

    I think I’m beginning to see where you’re coming from (your paradigm, like you said). It seems to me that, by the association you see between (your definition of) “testing” and incompleteness or the halting problem — the first being a consequence of consistency in formal systems that have a certain amount of arithmetic, while the latter being generally undecidable — that you define “testing” as the task of verifying special cases. Whereas “checking,” defined by you as “algorithmically decidable,” aims to verify the cases where there is a solid yes/no answer.

    Does that summarize your view of “checking vs testing”?

  • Kim Engel

    Hi Mario,

    “For the meantime, I’ll say that if it’s important to your test that you include the “complete specification of every interaction that takes place during the test, including the timing and sequencing of those interactions,” then you should do so and (pardon my bluntness) not whine about it. ”

    I find this comment interesting. In my experience, I have come across anomalies or bugs which are very difficult to reproduce. Upon investigation, these are usually caused by race conditions which the tester could not have predicted in advance. It would be impossible to write all of these timings and sequences into test cases, particularly to discover problems which we’re not yet aware of, and if it were possible, it would be so time-consuming that it would be of no value compared to the amount of testing which could have been done in its place. Or have I misunderstood your statement?

    For one such difficult-to-reproduce bug which had a critical impact on the SUT, our development team wrote a tool which would reproduce the offending actions in sequence, repeatedly. For weeks this tool had no luck in reproducing the issue, yet testers would encounter the defect approximately twice per week during their manual testing. So for me, exact repeatability is a benefit of ‘checking’, but checking is not valuable in isolation without testing.

    Cheers,
    Kim


  • Mario (post author)

    Now this is an interesting comment! 🙂

    Kim,

    I have had similar experiences with race conditions and people trying to recreate them in an automated fashion. Granted, sometimes it’s easy to reproduce the problem, especially in situations when enabling debugging (like different kinds of tracing) affects the timing so much that you can easily isolate where the problem is. But, as you said, attempts to recreate race conditions are usually done by hand because a posteriori automation is brittle for this.

    And that was exactly my tongue-in-cheek retort to Iain’s comment. Why would you even bother worrying about specifying the “complete specification of every interaction that takes place during the test, including the timing and sequencing of those interactions,” when that is not even the problem in the realm of testing?!

    It’s just nonsense to even worry about specifying *every single thing and interaction* that happens in the SUT. It’s nonsense because the argument is being played as a wild card, as the premise for hollow definitions. Further, it’s nonsense because SUTs are full of orthogonalities that we can always take advantage of to reduce our testing space (this is what professional testers, like you and me, do with every SUT, and I’m sure you’ll agree).

    Now, going back to my point about theorem-provers, the reason I call the automation in your comment above “a posteriori automation” is that it’s automation written after the failing part of the SUT is developed. However, theorem-provers and model-checkers, being a priori automation (but still “checkers” as per Bolton et al’s definition), are able to show you (via a proof) that you have a race condition coming at you.
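    To make the a priori point concrete, here is a minimal sketch (not a real model checker like TLC or SPIN, just an exhaustive enumeration written for this comment) that explores every interleaving of two non-atomic increments of a shared counter and exhibits the lost-update race before a single test is run against a real SUT:

from itertools import permutations

def run(schedule):
    """Execute one interleaving of the two threads' read/write steps."""
    counter = 0
    regs = {0: None, 1: None}
    for tid, step in schedule:
        if step == "read":
            regs[tid] = counter          # thread reads the shared counter
        else:
            counter = regs[tid] + 1      # thread writes back its stale value + 1
    return counter

def interleavings():
    # Every ordering of the four steps that preserves each thread's own order.
    steps = [(0, "read"), (0, "write"), (1, "read"), (1, "write")]
    for perm in permutations(steps):
        if (perm.index((0, "read")) < perm.index((0, "write"))
                and perm.index((1, "read")) < perm.index((1, "write"))):
            yield perm

if __name__ == "__main__":
    schedules = list(interleavings())
    racy = [s for s in schedules if run(s) != 2]
    # Prints "4 of 6 interleavings lose an update": the race is exhibited
    # before any test is ever run against the real system.
    print(f"{len(racy)} of {len(schedules)} interleavings lose an update")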

    Do you now see the contradiction in their definitions?

    Let’s recap Bolton et al’s definition: theorem-provers fall on their “checking” camp. However, theorem-provers would be able to alert you a priori (i.e. before you even ran any test [as per their definition of “testing”] against the SUT and eventually spend precious time trying to recreate the problem with a posteriori automation or manual effort) about the problem itself.

    You see my point?

    Their entire premise for their definition of “checking” is based on a posteriori automation, and they either willfully ignore, or are completely ignorant of, any other kind of automation.

    This is why I just cannot subscribe to such a simple-minded, consultant-speak “checking vs testing” distinction: it (conveniently) ignores actual “algorithmic decision rules” (as per their definition of “checking”) that would show you problems without you even having to do any “testing” (again, as per the definition of “testing” from that circle of consultants).

    Let’s face it, these consultants are very good at coming up with definitions that merely *sound* important, scientific and truthful, while their aim is just to keep their classes and their pockets full.

    Thanks for your comment!
    – Mario G.

  • Michael Bolton

    That’s not at all my point. I’m starting to see a pattern in the replies where people mistakenly believe that if there’s one theorem-prover for one formal system, then that same theorem-prover can be used for all of them, and testing can be done away with. That is not the premise at all. If only all SUTs were complete in the sense of complexity (think NP-complete or even P-complete), then by testing one SUT we would have tested all, but this is not so.

    My use of theorem-provers is simply as a device to refute the entire “checking/testing” distinction and point out that it is a useless distinction.

    You have not done this successfully, and not just because your logic is shaky. You have not done this successfully because your premise is unfounded. Testing is about far more than theorem-proving.

    Further, my argument is that the “checking vs testing” division forces a useless and primitive distinction over what we all know and practice as *testing*. Such a distinction ends up overloading concepts in our terminology-diluted industry for the benefit of, most certainly, certain consultants, and they do not benefit the community at all.

    I don’t understand what you mean by “primitive” in this context. More importantly, you haven’t made clear who you mean by “we”, “all”, and “know”, and that’s important because you seem not to know and practice what I call testing—only a part of that. You haven’t made clear how declaring a distinction benefits a consultant, and many people in my community have embraced the distinction as helpful and beneficial to them. Finally, you cannot declare your community to be “the” community any more than I can declare my community to be “the” community—and in the original post, James and I made it clear that these are our terms for use in Rapid Software Testing. I’ve made it clear all the way along that I have no authority over the way people speak. To the extent that you’re seeing people use this language, perhaps you’re identifying a different school of thought.

    Ok, I’ll add [Tacit and Explicit Knowledge, by Harry Collins] to my queue. Sounds like an interesting read. I can assert from experience, however, that even tacit knowledge can be successfully codified (this was exactly what I introduced to one team I worked with at IBM to test the Information Archive and SoNAS), but that’s a topic for another post.

    Collins’ thesis is that some kinds of tacit knowledge (what he calls relational tacit knowledge, or RTK) can be codified. RTK is contained in a human’s head, but remains tacit for logistical reasons, or perhaps because a person has not bothered to make it explicit. You know where the corner store is, and you know that it’s likely you can get a cold drink there, even though you’ve never seen their inventory list or the manifests of the deliveries. He describes two other kinds of tacit knowledge. One is somatic tacit knowledge, the knowledge based on what it’s like to reside inside a human body. He refines the way that people have described tacit knowledge. A canonical example is that of riding a bicycle; the usual claim from a long way back is that machines can’t do that. Collins suggests that machines probably could ride a bike, with sufficient sensing and computing power—or more accurately machines could probably balance a bike. But riding a bike is not only a physical activity; it’s a social activity too. Riding a bike successfully in the real world depends on a third kind of tacit knowledge, one that Collins calls collective tacit knowledge (CTK). CTK isn’t embedded in a human brain or a human body, but in the collective, in relationships between people. We don’t have a clue how to make collective tacit knowledge explicit, at least in part because it’s constantly changing along with the social groups and cultures in which it’s embedded. If you don’t believe me, ride a bike to work in rush-hour traffic in Utrecht—not just car traffic, but bike traffic—and negotiate a left-hand turn at an intersection where there are no signals. Now do “the same thing” in New York, and in Toronto, and in Delhi, and in Beijing. You’ll find that it’s different in each place—and in certain parts of Beijing

    I think I’m beginning to see where you’re coming from (your paradigm, like you said). It seems to me that, by the association you see between (your definition of) “testing” and incompleteness or the halting problem — the first being a consequence of consistency in formal systems that have a certain amount of arithmetic, while the latter being generally undecidable — that you define “testing” as the task of verifying special cases. Whereas “checking,” defined by you as “algorithmically decidable,” aims to verify the cases where there is a solid yes/no answer.

    Does that summarize your view of “checking vs testing”?

    No, not entirely. You’re pretty close with your description of checking, but your description of testing misses the mark with respect to my idea of what testing is. You seem to be focused on the notion of testing as verification. Testing usually contains verification, but testing is not only verification. That is, testing is not only an attempt to show that the product is consistent with a particular set of explicit expectations. Testing is an open-ended and incompletely specified investigation, for the purpose of refining our understanding of what we’re building and what we’ve built.

    Jerry Weinberg puts it more simply. He says that testing is gathering information with the intention of informing a decision. In the past, James and I have said that testing is questioning a product in order to evaluate it. “Questioning” a product in this context means asking it to perform certain behaviours, which we then observe and evaluate. In testing, we perform experiments with the product to discover how it might be inconsistent with expectations that have not yet been made explicit.

    Some of those experiments are the checks themselves. Some of those experiments are attempts to run checks. It’s my experience (and I presume yours too) that our attempts to perform automated checks are rarely successful on the first run. Something happens: the checks don’t work, or the checks don’t pass, or we realize that the checks are incomplete in some sense, or we realize that checks are checking the wrong things, or checking the right things in the wrong way. Since preparing checks is software development, we should expect the occasional error or misunderstanding or misapprehension. Perhaps there’s something wrong with the specification, or with the application under test, or the test harness, or the environment, or with the coding of the check.

    Checks are very good for analyzing specific outputs of a given function. They might also be helpful for programmatically evaluating some aspect of the product’s performance. There’s a kind of evaluation that checks cannot handle, though: an observation of some aspect of the program, and a decision about value. A check can only tell you whether some particular, explicit claim has been met or not met. A check cannot make an assessment about whether some person will be happy or upset (“Well, we got the right page, but geez, it took a long time to come up. That’s… surprising. I didn’t expect that.” “Well, we got the right page, but on THIS display, the text seems to spill outside the bounding box. Weird.” “Everything seems to have happened okay, but… when I scan through the log files, I notice that this procedure posted the ‘Completed’ message before the last transaction. What’s up with that?” “Given this problem in the product, how would we suggest tech support person help a customer to work around it?” “The documentation for this third-party API doesn’t say we can call this function in this way—but it doesn’t say we can’t either. I wonder what would happen if we did? And then, if it appeared to work, what other variations would we consider to see if it worked in other circumstances.” “Dang, this bug doesn’t seem to reproduce consistently. What should I vary to try to make it more likely that I’ll see it again?”)

    In your reply to Iain, you asked if the instances that he brought up were grounded in reality or merely based on what he had been told. I can assure you that each of the preceding cases comes directly from my own experience developing and managing and testing software over (oh my God) 25 years, and unless you got into programming last week or have been almost catastrophically unobservant over a long career—both of which I sincerely doubt—these things have happened for you too. You may have made these discoveries entirely accidentally, serendipitously. Discoveries like this were sometimes like that, for me, to be sure. But I made some of the discoveries because I sought them out, because I tested. I engaged in a search for problems, not just in my planning, but also based on engaging with the product or some part of it. I did not simply use a program to check the program against a preconception.

    SUTs are full of orthogonalities that we can always take advantage of to reduce our testing space (this is what professional testers, like you and me, do with every SUT, and I’m sure you’ll agree).

    Professional testers like me recognize that what you call “reducing our testing spaces” is in fact reducing our models of the testing space. There’s a big difference between the model and the thing it represents. Taleb points out that models, Platonic representations of reality, are not wrong; they’re only wrong in specific applications, and we do not know a priori how they are wrong. In that, he says, they are like very powerful medicines with severe and unpredictable side effects. In testing, one side effect of over-reliance on a model is this: being blind-sided by a problem that has not been modeled mathematically, but that affects customers nonetheless.

    Even though machinery and code can provide awesome assistance to humans, the simple fact is that machinery is incapable of making observations independently. Machinery can’t prove that something is good; it can only help a human to identify consistency or inconsistency with a proposition. A check cannot make an observation other than one that has been programmed. But humans can. For that stuff, we need another concept that includes the concept of checking: testing.

    Testing is not about verification—certainly not exclusively; testing is about understanding the product we have, to see how it differs from the product we hoped we’d get. Testing is not only about verifying special cases; it’s also about interacting with the product and considering how we’d identify a special case in the first place. Testing is about more than using machinery to check a result produced by other machinery. It’s about investigating the product, looking for threats to value. Testing is not only about confirming a predicted, expected result. It’s about exposing ourselves to the unpredicted and the unexpected. Testing isn’t only about writing and running theorem provers; it’s also about discovering how your first round of theorem provers missed things that were important to you or your customers. Testing is not only about functional correctness; it’s also about asking and answering the question “is there a problem here?”

    Let’s face it, these consultants are very good at coming up with definitions that merely *sound* important, scientific and truthful, while their aim is just to keep their classes and their pockets full.

    This is something you’ve claimed repeatedly. It includes the premise that you can read my mind and understand my goals. That premise is false. Note that none of Iain, Petteri, Joe, or Kim are consultants that would profit from the distinction; they’re people who actively practice and study a practice that is wider than your narrow conception of it, in which they have found the distinction useful. You don’t find the distinction useful. That’s okay. Your arguments present a canonical example of why I (and they, no doubt) find the distinction useful: as a mathematician and programmer, you ignore even the possibility of problems that cannot be found by mathematics and programming, but that nonetheless threaten customer value. To you, testing IS checking. To them, and to me, testing is more than checking. Good luck with your thing.

    —Michael B.

  • Mario (post author)

    Michael,

    Thanks for your reply.

    Even though we clearly disagree on many things, I still think it is worthwhile to have these kinds of discussions. They at least let us understand each other and grow from different points of view, which is what our industry sorely needs.

    On the other hand, I’m sure Bach would have been glad to be “open” about things and willing to talk had there been some money involved… isn’t that the point of his courses anyway? To “teach” his way of thinking to people who don’t already think like him?

    By the way, the “testing vs checking” distinction reminds me of the Glib continuum that Kathy Sierra talks about in this awesome post: http://headrush.typepad.com/creating_passionate_users/2006/04/when_only_the_g.html

    At any rate, I appreciate your comments.

  • Michael Bolton

    On the other hand, I’m sure Bach would have been glad to be “open” about things and willing to talk had there been some money involved… isn’t that the point of his courses anyway? To “teach” his way of thinking to people who don’t already think like him?

    You have your reply to that on James’ blog. Note that neither he nor I receive money for the time we take in replying to you. Nor, I might add, do we receive money from the people that we help in email, at conferences, via Skype coaching, after class time, face to face when we’re at home or on the road.

    By the way, the “testing vs checking” distinction reminds me of the Glib continuum that Kathy Sierra talks about in this awesome post.

    So I’ve spent the better part of 10 years thinking about testing and checking. In 2009 I articulated it, with James’ help. In 2013, James and I refined it, after recognizing that we had achieved only shallow agreement on the subject. Who knows what might happen another three years from now? No matter; this isn’t a meeting, and no one is on the spot, so I don’t think this is an instance of what Kathy is talking about in her awesome post.

    —Michael B.

  • Mario (post author)

    Michael,

    There really is no relevance between how long you’ve been thinking about the subject and whether it’s correct (or not). You can think about an ill-defined subject for as long as you like, and it will not make it correct simply by the passage of time.

    For instance, for a very long time (more than one thousand years, in fact: from the 2nd century AD up to the 16th), many people, including scholars, philosophers, astronomers, clerics, and royalty, spent time thinking, discussing and asserting that the Ptolemaic model precisely and correctly defined the entire universe. And most of these were smart people. In fact, they even tested (using your definition), time and time again, the Ptolemaic model and kept getting positive results, thus reinforcing their beliefs.

    But it turned out to be dead wrong.

    The same thing happened with the anatomical model of the human body by Galen. The model seemed to be the right one, and it lived unquestioned for over a millennium. However, it was eventually found faulty by Andreas Vesalius much, much later (also the 16th century, in fact).

    Now, I’m not questioning your experience. This is not what it’s about. If you’ve been thinking about this for a long time, good for you. Many people prefer to take whatever other people say as if it was written in stone and defend it like Nitrian monks; now whether it is because they don’t like to think for themselves or are simply born to be followers, I will not speculate.

    But I have to say that, just as the Ptolemaic model ignored crucial observations that made it a flawed model, so the “testing vs checking” distinction ignores actual mathematical results that make it ultimately unfounded; and it does not magically become correct because of how many people defend it or find it useful.

    Glib, therefore, it very much is.

  • J. Michael Hammond

    Mario – is it possible that this whole argument is based on different models of a software system?

    Your arguments all hold true – in fact they’re pretty ironclad – if we assume a software system is an implementation of a known, specified Von Neumann machine. If we all agree that the underlying Von Neumann machine is “correct” and what we’re doing is demonstrating that the software system is a correct implementation of it, certain principles apply, certain theorems may be uttered and proven, and so on. There is truth (and, whatever you’d call it, “falsity”) in this world.

    But the context-driven guys don’t make this assumption. They have chosen to study a different beast: a software system built on a specification that may be incompletely elaborated, that is actually incomplete, that is believed to be different things by different people, and so on. In this world, attempting to utter and prove certain theorems just doesn’t even make sense.

    I lived in the former world in my undergraduate and graduate classrooms. I live in the latter world “out here”. Out here, the context-driven guys’ arguments are more useful.

    –JMike

