Ian Gilchrist
23 Jun

Test Automation How Far Should It Go?

23 June 2011 by Ian Gilchrist

In the old days (and I have been in the profession for about 25 years) testing of code modules was an intensively manual process. Test harnesses needed setting up by hand, and then populating with carefully chosen input values and laboriously calculated expected outputs. If you wanted to monitor test coverage, this generally had to be done by use of Trace statements and some kind of manual analysis after the event. Results were output in formats that needed skilled decryption before a verdict could be reached as to whether the test had passed or not.

The end result was (usually) code of good quality because it had had all this care and attention lavished on it, but also a sense that there were better things to do, and a productivity ratio of 3:1 test:coding was not considered good! As the kind of work that novice programmers were often put onto, it could also be considered damaging from a morale point of view. But we could all see the potential for a high degree of automation...

Surveying the scene now, 25 years on, it is possible to see massive changes: test harnesses can be generated in a matter of seconds, results are output in a clean and readable form, coverage analysis is there at the end of the test run, and most crucially the opportunity to generate tests which 'pass' is now on the table. What I am referring to here is the option to have the test tool automatically select inputs to drive code down a full set of paths, and even calculate what the expected results should be. Is this going too far? Surely the point of testing code is to demonstrate that it does (or doesn't) do what it is supposed to do. Does not the process described above have the potential to lead to an entirely mistaken sense of confidence? Are there circumstances when testing in this way has validity?

In a world where we see lives and livelihoods depending on decisions made within computers, driven in turn by their software, is it a good or a bad thing to be trusting software to test software?


Ian Gilchrist entered the software profession as an Assembler language programmer in the early 1980s. He has worked at various levels, including project manager, in a variety of environments since then, using a variety of languages including Fortran, Ada and C. Most recently he has been involved in the production of IPL's testing tools and their application to safety-related and mission-critical projects.




5 comment(s) for “Test Automation How Far Should It Go?”

  1. thiago kzao Says:
    Very interesting article.
  2. Tom du Pré Says:
    There is a danger here of testing that the code does what the code does, rather than testing that the code does what the code is supposed to do. Whilst clearly there is some scope for automating the generation of test inputs, it is likely that the result would be a very large number of equivalent tests that exercise less than their volume would suggest. Hello false confidence. I would also be very sceptical of letting a tool tell me what the results should be. The only way I can imagine it would be able to do that would be for it to examine the paths in the code under test. Again, this is just testing that the code does what it does, proving little other than the existence of some code. This sounds a bit like letting the inmates run the asylum.
    In my experience of testing (admittedly not as lengthy as Mr. Gilchrist's, but still pretty substantial) most bugs that testers miss and embarrass us later are the result of poor test design and an inadequate understanding of the business requirements. When bugs get through and testers are asked to explain themselves, the most common response is "OMG I never even thought to test that scenario." You seldom hear "I tested that scenario, but I didn't put enough test inputs through." A computer is great at generating thousands of variants of test data inputs and there will be a time and a place for that, but only a human can really understand what the system is for, how it will be used and misused in real life, what external factors could impact it, how internal errors are handled and what knock-on effects they could have, and so on and so forth. There seems to be a growing emphasis in the testing industry on testers being test automators and producers of yet more code that happens to be test code. Although we should of course use the tools that make us efficient, isn't testing supposed to be about bug hunting and taking a gleeful and only slightly guilty pleasure in breaking something?
  3. David Ramsay Says:
    Having been testing software since 1978 I can only concur with what Tom has indicated.

    The best tools for the tester are BVA and EP (boundary value analysis and equivalence partitioning, just in case), and for these to work you need defined boundaries in a spec that can be checked against. Unless an automated tool can read the spec, and/or the tool in use can take input from the test automator, then throwing loads of data at the code is no more than a performance test.

    You have to start small and prove the product does what is expected of it but then exercise all the out of bounds conditions to ensure that the code exists to prevent the software doing something unexpected. This assumes that you can identify those boundaries.
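
    A minimal sketch of BVA and EP, assuming a hypothetical spec that says a percentage is valid only in the inclusive range 0 to 100. The name `is_valid_percentage` and the chosen values are illustrative, not taken from any real project. The expected verdicts are written by hand from the spec, not derived from the code:

```python
# Hypothetical spec (assumed for illustration): a percentage is valid
# only when it lies in the inclusive range 0..100.
LOWER, UPPER = 0, 100


def is_valid_percentage(value):
    """Code under test: accept values within the specified bounds."""
    return LOWER <= value <= UPPER


# Boundary value analysis: each boundary and its immediate neighbours,
# paired with the verdict the spec demands.
BVA_CASES = [(-1, False), (0, True), (1, True),
             (99, True), (100, True), (101, False)]

# Equivalence partitioning: one representative per partition
# (below range, in range, above range).
EP_CASES = [(-50, False), (50, True), (150, False)]


def failing_inputs(func, cases):
    """Return the inputs for which func disagrees with the spec."""
    return [value for value, expected in cases if func(value) != expected]


failures = failing_inputs(is_valid_percentage, BVA_CASES + EP_CASES)
print("inputs that disagree with the spec:", failures)
```

    If the code under test used `<` instead of `<=` at the lower bound, the boundary case 0 would appear in the returned list, while the three partition representatives on their own would not notice.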

    Where a tool might help is in identifying the conditionals and the paths through the code and what values it is checking for. This would then allow the tester to 'read' what is going on in the code to verify that the correct conditional tests are being applied.

    People who know me know that I am against just testing automatically without thought. Whether it is appropriate or cost effective is a value judgement for the Test Lead, since they have to justify the budget and therefore the quality against time that is acceptable for the product.

    Test automation is still in its infancy (sorry Dot) and management still feel that if they automate they can save money - why else would they automate?

    What they are actually asking for is 2 development teams and 2 test teams (who tests the tests, after all?). Test automation costs up front but saves downstream. You need to balance the cost/benefit here.

    I do agree with Ian tho with regard to the output of the tests being 'readable' these days, but would it not be better if the development team developed testable code? That means being involved as a Test Analyst in the product definition and being able to challenge the design at a very early stage.

    I would take issue with Ian regarding the 'opportunity to generate tests that 'pass''. Surely the purpose of the tester is to make sure that the tests fail (AKA find defects). A test that passes gives a feeling of comfort, but it does not assure the end user that it will work under all its expected conditions.
  4. Sanju Pillai Says:
    To me the question is not about how far test automation should be taken but how much - how much automation is automation enough? Did we stop to ask why we are automating one application and not the other?

    Automating test execution is just the beginning. There are key benefits to be had by automating aspects including test data creation and management, test infrastructure management, test processes, and so on.
    The truth is, most organizations that have not progressed to effective use of test automation, and do not understand how much to automate and how much to test manually, don't perform good testing. That is, testing that gives them what they expect their applications to give them – cost savings, punctuality and quality.

    God, grant me the serenity to manually test what is IMPOSSIBLE to automate,
    Courage, knowledge and the tools to automate ALL that one can,
    And wisdom to know the difference.
  5. Ian Gilchrist Says:
    Thanks for your various comments. My background is one where I always believed that the only purpose of testing is to find bugs and then retest after changing to ensure the fix has worked. The reason for my original post was that I am becoming aware of new types of testing where automation is possible, but where the merits are still worth debating.

    One 'new' form of testing is robustness, which is where large volumes of test data input can be generated automatically solely from knowledge of the input type. For example, for a float input parameter there might be generated test cases with values FLT_MIN, 0, and FLT_MAX. If the test succeeds in running without crashing then this is a small demonstration that the code is robust i.e. probably free from obvious divide by zero or overflow problems. There does seem to be merit in this, though noting:
    (1) similar assurance can be reached using static analysis tools;
    (2) this approach won't work with all types e.g. address/pointer types.
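
    A rough sketch of such a robustness run, written in Python for brevity. Python floats are C doubles, so `sys.float_info.min` and `sys.float_info.max` stand in for FLT_MIN and FLT_MAX here; the function `reciprocal` is a hypothetical unit under test with a latent divide by zero. No expected outputs are involved, only whether each call returns at all:

```python
import sys

# Test inputs chosen purely from knowledge of the input type.
TYPE_EXTREMES = [
    sys.float_info.min,  # smallest positive normalised value
    0.0,
    sys.float_info.max,  # largest finite value
]


def reciprocal(x):
    """Hypothetical unit under test with a latent divide by zero."""
    return 1.0 / x


def robustness_run(func, inputs):
    """Call func on every input and record the ones that raise."""
    crashes = []
    for value in inputs:
        try:
            func(value)  # the returned value is deliberately ignored
        except Exception as exc:
            crashes.append((value, type(exc).__name__))
    return crashes


crashes = robustness_run(reciprocal, TYPE_EXTREMES)
print(crashes)
```

    An empty list would be the "small demonstration" of robustness described above. It says nothing about whether the values returned are the right ones.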

    The other form of testing is what we in IPL call 'baseline testing'. This is applicable (we argue) in cases where you have a large volume of trusted code, possibly many 1000s of modules and KLoCs, but where no unit testing was done or exists in a useable form. In this situation it is rather difficult to make changes to code modules and retain confidence that the change has had the wanted effect and that no unwanted side-effects have been introduced. System-level regression tests are not dependable because they don't always test everything, and it would be uneconomical to start creating new unit tests by traditional means (even if semi-automated).

    Baseline tests could (if you agree) be generated from the trusted modules using information derived from the code on:
    a) input and output types;
    b) the logical branching structure of the code and the coverage level wanted;
    c) the set of external calls made and which need simulation.
    The output of this baseline test generation exercise is a set of module tests, one for each module, which 'pass'. This can then be used in a two-part process: firstly to check that changes in a module have been correctly implemented, and secondly in unit/regression test mode to ensure that no unwanted changes have occurred in all the other modules.
    Of course I accept that such a baseline test facility could be mis-used. It could for example be used to generate tests on new code, thus giving a completely false sense of confidence that modules work as they should. But if used appropriately on code which has independent evidence that it does work, then the possibilities offered by baseline testing seem to offer a reasonable solution to the problem of maintaining a baseline of trusted code.
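
    The baseline idea can be sketched in a few lines, under loud assumptions: `trusted_charge` is a hypothetical trusted module, `modified_charge` is the same module after a maintenance change, and the inputs are hand-picked here to reach each branch, where a real tool would derive them from the code's branching structure. This is not how any particular tool works, only an illustration of recording current behaviour and replaying it later:

```python
def trusted_charge(amount):
    """Hypothetical trusted module: its current behaviour is accepted as correct."""
    if amount < 0:
        return 0
    if amount <= 1000:
        return amount // 10
    return 100 + (amount - 1000) // 5


def generate_baseline(func, inputs):
    """Record current outputs as expected values, so every test 'passes' today."""
    return [(value, func(value)) for value in inputs]


def changed_cases(func, baseline):
    """Return (input, expected, actual) for every case whose output has changed."""
    return [(value, expected, func(value))
            for value, expected in baseline
            if func(value) != expected]


# One input per branch, plus the values either side of the 1000 boundary.
INPUTS = [-5, 0, 1000, 1001, 5000]
baseline = generate_baseline(trusted_charge, INPUTS)


def modified_charge(amount):
    """The same module after a change to the rate applied above 1000."""
    if amount < 0:
        return 0
    if amount <= 1000:
        return amount // 10
    return 100 + (amount - 1000) // 4  # was // 5


print(changed_cases(trusted_charge, baseline))
print(changed_cases(modified_charge, baseline))
```

    The first call confirms only that the generated tests pass on the code they were generated from, which is all a baseline can claim. The second reports the input 5000, whose output moved from 900 to 1100; whether that difference is the intended change or a defect is still for a human to decide.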
