Some Notes on Software Testing (using Java examples)

Alex Borgida
Dept. of Computer Science
Rutgers University

I. Principles

As Dijkstra said, testing only shows the presence of errors in a program, not their absence, because usually there are infinitely many possible inputs. So the goal of testing is to reveal errors. Therefore a successful test run is one that finds an error!! (Strange idea, isn't it - that's why testing is often best done by someone else than the programmer.)

A second principle of testing is that you need to know what the expected output is for some test input, otherwise it is a waste to do the test run!

A third principle is not to do ``testing on the fly'' by typing in input from the terminal, or editing a previous test file. Always prepare and save test data either in a file or a program so you can re-run the test easily later. Why? Well, after you change your program to fix an error, you need to go back and redo the previous tests - some things that worked before may not work now!!! (In general code that is a bug-fix is much more likely to be buggy itself!)

So now we are ready to talk about how to test. One approach is to use a random number generator to produce test data and keep on testing as long as you can. This is not as easy as it sounds. Most systematic approaches to testing are based on the idea of dividing the class of inputs into subclasses so that if you run a test for one member of the subclass, you assume you would get the same result for the other inputs in the same subclass (and so you do not have to run them).

II. Black-box/specification-based testing

Suppose you are given the specification of the program, but you do not know the code implementing it. You can still prepare quite a good set of test data for it. (This is what we do whenever we are grading the programs you hand in to us for assignments.) We shall consider two separate cases: (a) testing a single method; (b) testing a class, or collection of classes.

Testing a Method/Procedure

Here are three approaches that are widely known:
  1. Equivalence Partitioning:
    For each of the inputs and outputs of the method, consider equivalence classes of values; for inputs, consider in addition the possibility of invalid values (this is especially important for programs that interact with users). Here are some possible types of values, and their equivalence classes:
  2. Boundary values:
    Experience indicates that errors are likely to occur at the 'boundaries' of the above equivalence classes. For example, with the employee age set, values like 15,16,17, 79,80 and 81.
  3. Error Guessing: Experience indicates that with any numeric input, problems arise when the value is 0, 1 or (if possible), maxInt. Similarly, with any collection, errors arise when the collection's size is 0, 1 or maximal possible. This means that strings should be tested for being the empty string string, and the one-character string; arrays should be tested when they are empty, or have one value, or are full; etc.

    Of course, this is a catch-all category, so anything else you suspect could cause problems goes here.

    An example of black box testing

    Consider a program  mss, which, given a sequence of values a[1],  [a2],..., finds the subsequence a[i], a[i+1],..,a[j] with the largest possible sum.  Here is the precise specification for that program. (It is good to learn to give such precise specifications. The refernce by Liskov and Guttag at the end, teaches this material.)
    public class MCS{
      public int mcs(int [] A, int N ) throws IllegalN; 
    
     /* Find some subsequence of a given list, which has maximal sum.
    
       parameters
             N:  the number of elements in the list         
             A:  the list,as an array with indices staring at 0;
                    (notation will be A[0..N])
       returns
             the sum of values A[seqStart],...A[seqEnd]  
                    (abbreviated A[seqStart..seqEnd])
       modifies
              int seqStart //the starting index of the subsequence
              int seqEnd   // the ending index of the subsequence 
       exceptions
             IllegalN when not(0 <= N-1 <= A.length - 1)
    
    
       ASSUMES  A[0..N-1] is initialized                      
                                                            
       EFFECTS:
                 At least one non-negative value in A[0..N-1] and
                    0 <= seqStart <= seqEnd <= N-1  and 
                 return >= SUM{A[j..k]} for any 0<=j<=k<=N-1 
                           **OR**
                all A[i] < 0  and  returns=0,  seqStart=seqEnd=N+1
     */
    }
    In this case the inputs are array A and integer N, and the outputs are two ``global'' variables, seqStart and seqEnd, plus the value returned. The ``ASSUMES'' clause indicates what conditions are assumed and not checked by the method. If you call the method without the assumption being true, all bets are off. Note that this is different from error conditions checked, for which exceptions are raised. (For your information, I include at the end a way of presenting the above kind of information as part of JavaDoc format.) But the above was considering only the inputs. The same kind of analysis should be applied to the outputs!!

    For valid outputs, we expect 0 <= seqStart <= SeqEnfd <= N-1.  Boundary analysis suggests trying { seqStart=seqEnd=0 }, { seqStart=seqEnd=N-1 }, and { seqStart=0, seqEnd=N-1 }. (The other combination: { seqStart=N-1,seqEnd=0 } is not possible.)
        If possible, one can also try to find ways to get invalid outputs: return value less than 0, mixed up seqStart and end values, etc.

    To put these all together:

    1. you have a separate run for every invalid input class
    2. you try to make test cases to cover as many valid subclasses as possible. Ideally, one should have all possible combinations of valid input and output sets


    Here then would be one collection of test-data for the above example:

       N  |A.length|  A[0],A[1],...  |  seqStart  |  seqEnd  |  result 
    ------------------------------------------------------------------
       1  |  1     |  -3 
        
          |        |   0
       
          |        |   3
     ----------------------------------------------------
       3  |  4     |  -3,-3,-3       |
       
          |        |  -3,0,-3        |
       
          |        |  4,-5,1         |  0      |  0
       
          |        |  1,-5,4         |  2      |  2
       
          |        |  5,-4,1         |  0      |  2
    ----------------------------------------------------
    invalid:
    ----------------------------------------------------
       0  | 
     ----------------------------------------------------
       -1 |
     ----------------------------------------------------
       2  |    1  |  3
     -----------------------------------------------------

    II. White-box/code-based testing

    Note that we had chosen our test-data in a manner totally independent of an implementation. The textbook by Weiss, referenced at the end, shows two different algorithms, one of complexity O(n^3),  the other of complexity O(n). Presumably there will be some additional differences on how to test these. Among the systematic techniques for white-box test data generation we mention
    1. Statement Coverage The minimum requirement is that every statement of your program be executed at least once on some test run.
    2. Condition Coverage For the statement if (x<0) {y+=2;}, the statement coverage would allow a single test-run, with a negative x. Clearly, we want something stronger: every condition should be true and false in some test run. For loops, it is considered adequate if the loop is executed 0 times, once, and more than once.
    3. Data-flow testing Every possible ``control path'' between a way of setting a variable and a use of it should be executed. This is made hard by loops, which represent an infinite set of possible paths! So we settle for executing a loop 0, 1 or 2+ times.

    III. White-box testing of mss

    In general, this kind of testing is harder to do because you have to work back from the desired behavior to the input that will produce it!!

    One thing to do is to first start with the black-box tests and see what remains to be done. At the very least, you need to add test data to achieve ``statement coverage''.

    Consider the O(n) algorithm that has the following structure

      maxsume=0;thissum=0;
      for (i=0,j=0; j<a.length; j++) {
            thissum += a[j];
            if( thissume=maxsum ) {                 // test1
               maxsum=thissum; seqStart=i; seqEnd=j; }
            else 
              if( thissum < 0 )                     //test2
                    {i=j+1; thissum=0;}
          }
    If you look carefully, the original set of black-box test data was good enough to do statement coverage. To do more, your test sets should cause the loop to execute on some run 0 times. On three other runs the loop would be executed 1 time, with three different cases: { test1=test2=false }, { test1=true,test2=false }, { test1=test2=true }. On nine other runs, the loop would be executed twice, each of these independetly having the three test combinations above. This is getting to be labor intensive, but then who said it will be easy?!

    IV. TESTING CLASSES

    If you have an abstract data type, distinguish between two kinds of operations for an ADT: constructors/mutators which create/change the ADT, and observer/accessors, which let you examine the state of the ADT. Without some of each, you have a useless ADT!
        One minimal test of the ADT is to combine every possible way of constructing/mutating followed by every possible observation on the ADT. But first, one tries to identify a basic set of constructors/mutators, which allow you to achieve every possible state of the ADT.
        So for a stack, with constructors create,push,pop, and observers isEmpty, top, the basic set of constructors is create,push (since any stack you got to with pop, you could have gotten to without the pops, by omitting the pushes that got wiped out by the pops). [Mixed functions like topAndPop can usually be treated as observers.] One therefore needs to try every possible way of building a stack, using create and push, and then look at each such stack with every observer. Also, apply to it each non-basic constructor, like pop, and observe the result in all possible ways.

    This would lead to the following tests with stack

    1.   s=new stack(); s.isEmpty(); s.top() ;
    ---------------------------------------------
    2.   s=new stack(); s.pop();  s.isEmpty(); s.top() ;
    ---------------------------------------------
    3.   [..set up a stack..] s.push("A");  s.isEmpty(); s.top() ;
    ---------------------------------------------
    4.   [..set up a stack..] s.push("A"); s.pop();  s.isEmpty(); s.top() ;
    As usual, error cases should be run separately. And you may need to do more "set-up" before each sequence. Note that the principles mentioned in the first section apply here too. So, one would like to do different stack set ups, so that, if possible, isEmpty() returns both true and false in each of the above test categories. (Category 4 is the only one where you can do this: you can have the set up be just s=new stack(); or s=new stack(); s.push("B");.) Also, test changes/lookups occuring at the beginning/end of data structures.

    V. Support for testing classes

    A serious problem with object-oriented programs is that the methods of a class often call each other, and you seem to get caught in a vicious cycle where no procedure can work correctly till others do so. Here are some things to do.

    A good toString() method

    If at all possible, implement right from the beginning a toString() method for the class, which is the most complete 'observer' for your internal data structure. So, for a stack implemented as an array, StackAr, one might write something that displayed the values of the two variables, topOfStack and theArray.
        public string toString() {
          String res = "aStack<topOfStack=" + topOfStack + "; theArray=[";
          for(int j=0;j<=topOfStack; j++){
             res += theArray[j].toString() + "; "   ;
          res+= "]; array length ="  +  theArray.length + ">";
          return res;
          }}
    Note that we even include the array length in the "state display".

    Stubs

    Stubs are trivialized versions of programs that do something very specific, so you can be sure it is right. So in the above case, how are you to test the toString method without having push working also??? Well, you can make a stub for the push operation which ignores its parameter and always does the same thing, but right. For example,
       public void push(Object x){
           // a stub 
           theArray[0]="A"; theArray[1]= "B";
           topOfStack = 1;
       }
    makes it so that Stack s = new Stack(); s.push("anything"); leaves you in a precisely known situation, in which you can test your toString() function, to make sure it returns the right thing:
    "aStack<topOfStack=1;theArray=["A";"B";]; array length=10>".

    Some stubs compute or return the same thing all the time, like push above. Others, can be made to return a series of different values, depending on whether this is the first, second, etc. call to it. For example, suppose that for testing some other class, you need a stack, for which the first topAndPop returns "B", and the second one "F". Well, you can add to the class Stack a little helper array to store the "canned" responses of your function, as below:

    class Stack... {
      ...
      Object [] popResults = ("B" , "E");
      int popCt =0;
      public Object topAndPop()
        {//a stub
          return popResults[popCt++];
        }
      ...
    Note that this can be done to *any* implementation of Stack -- even if the final stack will use linked lists, a stub can use an array to save its responses..

    VII. Final Words

    Like many aspects of programming, testing is an art, learned through experience and practice. The above is just a tip of the iceberg, but should be a useful start. The following references are ones I am partial to. There are many others.
    1. B.Liskov and J.Guttag, Abstraction and Specification in Program Development, McGraw Hill Book Company, 1986. (Regretfully out of print.)
    2.  G. Myers, The Art of Software Testing, Wiley-Interscience, 1979 (ISBN 0-471-042328-1) (Also out of print and hard to find in libraries.)
    3.  B. Beizer, Software Testing Techniques, Van Nostrand Reinhold, 1990.
    4. M.A.Weiss, "Data Strutcures and Problem Solving using Java", Addisoin Wesley, 1998

    VIII JavaDoc specification of procedures

    public class MCS{
     /**
     * Find some subsequence of a given list, which has maximal sum.
     * @param A  the list,as an array with indices staring at 0;
     * @param N  the number of elements in the list         <pre> 
     * MODIFIES  int seqStart //the starting index of the subsequence
     * MODIFIES  int seqEnd  // the ending index of the subsequence </pre>
     * @exception IllegalN when not(0 <= N-1 <= A.length - 1))   <pre>
     * ASSUMES  A[0..N-1] is initialized                      </pre>
     * @return  the sum of values A[seqStart..seqEnd]  
     *                                                      <pre>
     * EFFECTS:
     *           At least one non-negative value in A[0..N-1] and
     *              0 <= seqStart <= seqEnd <= N-1
     *           return >= SUM{A[j..k]} for any 0<=j<=k<=N-1 
     *                    *OR*
     *          A[0..N-1] < 0  and  returns=0,  seqStart=seqEnd=N+1
     *                                                       </pre>
     */
     public int mcs(int [] A, int N ) throws IllegalN;
    }



    Alex

    Tue May 11 13:41:40 EDT 1999