
Taking the 'test' out of 'Contest' - by Shmuel Gershon

08 October 2010 by Shmuel Gershon

Yesterday my company (Intel Corp) held a conference for Software Professionals in Israel. One of the activities, along with the technical lectures, was a day-long coding contest, in which contenders submitted code to solve given problems and data sets. I was one of the 1st place winners (it was a tie between some competitors), perhaps the only tester on the list.

During the preparation of the conference there was talk of promoting a testing contest, and similar comments were heard during the conference itself: if there's a coding contest, why not hold a testing contest? In fact, I volunteered to help if such a contest was carried out. I don't particularly like testing contests, because you can't pick a real testing winner by looking at definite criteria (see my posts on evaluation, here or here).

Then, throughout the day, as I noticed how I worked during the coding contest, I started to understand more about why a testing contest wouldn't work.
It's not only because testing is hard :). It's more because testing is hard to define.

A coding contest has some very definite outputs. You write code, and it either compiles or it doesn't. You run the code over the given test data sets, and it either computes a correct answer or it doesn't (the answer is compared automatically against an oracle solution; this judging software suffers from the halting problem's undecidability just like any other). You either get these right or you get them wrong. Software, with all its abstraction, still has tangible characteristics.
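The judging loop of such a contest can be sketched in a few lines. This is a hypothetical judge, not the one used at the conference: each submission is run over the test inputs, and its answers are compared against the oracle's precomputed answers.

```python
# Minimal sketch of an automated contest judge (hypothetical):
# a submission is a function, and its output on every test input
# is compared against the oracle's expected answer.

def judge(solution, test_cases):
    """Return (passed, total) for one submission.

    solution   -- the contestant's function
    test_cases -- list of (input, expected) pairs, where
                  'expected' comes from the oracle solution
    """
    passed = 0
    for given, expected in test_cases:
        try:
            if solution(given) == expected:
                passed += 1
        except Exception:
            pass  # a crashing submission simply scores a wrong answer
    return passed, len(test_cases)

# Example: judging a (correct) submission for "sum a list of numbers"
cases = [([1, 2, 3], 6), ([], 0), ([-5, 5], 0)]
print(judge(sum, cases))  # -> (3, 3)
```

Either the answers match or they don't; nothing in the loop requires human judgment, which is exactly what makes this kind of contest easy to score.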

But testing stands in a more intangible space.
On my way back home I told a programmer (who got 1st place as well) about the testing contest idea, and he asked: "A testing contest? How? You can't rate testing activities."
I was very happy that he said that, because I was half expecting the same trivial answer we are used to receiving: "You can count bugs and pick the winner with the most bugs". This programmer understands that testing is not about finding bugs (although bugs are part of testing, as one of the information types we uncover). Michael Bolton once wrote that the best programmers he's ever seen have been great critical thinkers. Checks.

Back to the intangibility. Far from having clear-cut results (a source code file that compiles and answers a question with the required answer), testing is a service. You can't measure a service by counting the discrete actions taken to provide it.
Testing happens at so many levels and dimensions at once that you can't keep score in any fair way.

It would be funny if at a "Clinical Psychology Professionals" conference there were a "Psychologist Contest", in which doctors have to provide mental health care to patients by asking as many questions as they can. The doctor who asks the most questions wins.
Or a "People Managing Contest", where managers have to quickly hand out as many compliments as they can while a "Morale Boost Gauge" measures which manager has the most engaged team?

So why do we still want testing contests?
I think that part of the problem is that we still think software testing is closely related to computers, or software, or computer science, or programming. But testing has very little to do with these (surprise!). Testing is about studying value, and value is related to people.
So instead of thinking that "testing is like computer programming, but from another point of view", we need to think that "testing is like studying people, but from a computer software point of view". That will help us measure testing in a more relevant way.


But what about the contests that exist out there? Every company holds a testing contest once in a while; uTest holds public Bug Battles, for example.
These should not be called "Testing Contests", in my new humble opinion. They are "Bugging Contests" or "Bug Reporting Contests", and there's limited testing happening in them. My experience with in-company contests (I've participated in some) and with uTest Bug Battles (participated too, and got a Best Feedback award) confirms that.

So "Bug Reporting Contests" are possible. But there's a lot of intangibility about what counts as a bug and what exposure it gets, which makes such contests less fair and gets people angry (that's from experience too).
So if we do want to make contests that somewhat relate to bugs or testing, we have to pick testing activities that are clear-cut:

  • Maybe a "Crashing Contest", where testers have to find as many ways as possible to crash a piece of software, would work well -- a crash is not open to discussion (it either crashes or it doesn't), crashes can be counted, and they are great fun to search for.
  • "Security Penetration Test" contests are definite as well: testers try to crash, halt, or extract secret internal data from a system. This can be counted and scored.
  • "Misspelling Contests" would work as well, as misspellings can be counted and there's little discussion about them (we can take the Oxford dictionary and a style manual as oracles).

But a "Testing Contest"? Testing cannot be weighed or measured in a one-day, context-detached situation.


An alternative to contests could be collaborative (or at least interactive) testing sessions, where testers can approach people who can tell them about value and give them feedback on the information they come up with.
Or Testing Dojos, like the ones held recently at Agile Testing Days.
But in these everybody wins, so there's no one big prize. Maybe that's what people don't like?


What do you think? What value have you derived from testing contests?
How do you think we can build a contest with substance?
. Shmuel

Ps1> Coding contests also can't measure very important traits of programming, like maintainability, testability, elegance, or efficiency... So they've got limited value as well.
12 comment(s) for “Taking the 'test' out of 'Contest' - by Shmuel Gershon”

  1. Ajay Balamurugadas Says:
    Excellent post. I do agree that testing is indeed a service and that it has a lot to do with value and people. I liked the psychology example. Excellent.
  2. Shmuel Gershon Says:
    Thanks for the comment, Ajay!
    We have to extend this view whenever possible; it helps smash wrong assumptions about testing :)
  3. David O'Dowd Says:
    I have an idea of one way to hold a testing contest which might be of interest to you. Please see my blog post. Liked your post!

    http://www.agiletester.info/2010/09/kobayashi-maru.html
  4. Shmuel Gershon Says:
    Hi David, thanks for commenting.
    Yes, I like exploratory games -- they're like the dojos I mentioned in the text -- everybody learns, but they're not done to pick one winner.
  5. Issi Hazan Says:
    That will be true for running contests too.
    For example, the ability to run 100 meters is only a small part of walking and running in the real world; nevertheless, we human beings like to gather together and watch the contest.
    While it would be dangerous to use contests like that to evaluate employees, I think that contests for the sake of fun which relate to the real world are a good idea, as long as we keep remembering that they are a very limited barometer for the real-world context.
    And most important -- if we limit the contests at Intel only to coding, I will never have a chance to win a MacBook Air... :-)
  6. Shmuel Gershon Says:
    Issi, good comment.
    Let me disagree on the running contest. 100m runners are hired/evaluated for the sprints they do, and the contest fits that in a 1:1 match. Usain Bolt is a great sprinter, but maybe he's not a great marathoner (who am I to say that? :)...). No one thinks he's a great marathoner, and no one tries to have him run marathons or trains him for that. His job is to sprint 200m. He does that better than anyone, and the competition measures this capacity.
    There are other people who are hired and measured for pedestrianism. They walk. Probably better than Usain.

    You can do a bugging contest and call it a testing contest. But then you are encouraging the trivialization of the craft, and also getting testers angry over bad criteria.

    Fun is all right, I like fun too. But contests are dangerous too. Any motivator is dangerous when it motivates the wrong skill.
    Wouldn't Dojos be fun, maybe even more?
  7. ravit Says:
    Hi Shmuel,
    1) I mostly agree with what you wrote. I have one reservation -- I am not sure there can be a good programming contest either. Who is the best programmer? The fastest? The most efficient? The one that produces fewer bugs/crashes?
    The situation you described is really well defined, and I think that is the reason the competition was doable. Other situations will be more complex -- if, for example, a group of programmers were asked to implement a certain idea, how would we judge who is the overall "best"?
    So, I agree with you about the inability to do a testing contest, but I also think that this problem is much bigger. The problem of comparing quality occurs every time we don't have a set of defined criteria, and this happens a lot because it is often hard to find criteria for what it is to be "good".

    2) About the substance of contests -- if we are talking about an in-company contest, I think the added value can be a focus on a certain element of testing. For example, if the aim of the contest is to find crashes, it is likely that after the contest the testers in the company will be more crash-oriented.

    3) Another type of test-contest that sounds good to me is a “test-plan” contest.
  8. Michael Stahl Says:
    Gershon -
    I beg to differ.
    Ravit’s post actually says it all: The contest is for SOME aspect of the whole coding landscape. If it’s OK for coding, it’s OK for testing.

    In a testing contest you'd craft a piece of code with specific bugs, and look to see whether people manage to find the bugs you planted in the code.

    Assuming the faults you implanted are "fair" (that is, representing a reasonable real-life coding mistake), and not something like


    Read (a, b)
    If (a = 1234345534324) then crash
    Else return a+b

    then I think you can have a testing contest -- which tells you about SOME aspect of the testing capabilities of the contestants.

    Michael.
  9. Shmuel Gershon Says:
    @ Ravit and @ Michael.

    First, thanks for taking the time to reply. And thanks for disagreeing -- as with Issi's comment above, disagreements make us think.

    I agree we can pick particular aspects and build a contest around them -- for example, the crashing contest or misspelling contest we mentioned above. It looks similar to picking one aspect of programming (coding) and making a contest of it.

    OTOH, that aspect of programming is a significant one, while these aspects of testing are not so much. So by picking less important traits, we are trivializing testing (again).

    So, yes, it can be done. Fact is, such contests are held often. But at what cost?

    Michael, in your model of implanted faults, I believe that a team or tester that finds many bugs, but none of them on the fault list, does not get a prize. That is fair, as long as it is disclosed at the beginning. But aren't we missing skill recognition here?

    Anyway, this gives me a good idea. What if, instead of finding bugs, testers followed the app in a 'treasure hunt' of sorts? The application prints a message when a desired spot is reached, and gives a hint about which technique will find the next one.
    Like:
    * "Intrinsic non-trivial boundary value tested, 15 points. The next point can be found by exploring the error messages."
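    A rough sketch of that treasure-hunt mechanism (spot names, point values, and hints below are all made up for illustration): the app keeps a table of planted spots, and when a tester reaches one it awards the points and prints the hint toward the next spot.

    ```python
    # Hypothetical sketch of the 'treasure hunt' contest: the app knows a
    # table of planted spots; reaching one scores points and reveals a
    # hint about the technique that finds the next spot.

    TREASURES = {
        "boundary_value": (15, "Next point: explore the error messages."),
        "error_message": (20, "Next point: try malformed input encodings."),
    }

    score = 0

    def reached(spot):
        """Called by the instrumented app when a tester hits a planted spot."""
        global score
        points, hint = TREASURES[spot]
        score += points
        print(f"{spot} tested, {points} points. {hint}")

    reached("boundary_value")  # awards 15 points and prints the next hint
    ```

    Because the spots are planted in advance, the scoring is as clear-cut as in a coding contest, while each hint nudges contestants toward exercising a specific testing technique.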

    Hmm. That's something to think about. I will take the weekend to think about it.
    What do you think of it?
  10. DebiZ Says:
    Great discussion, Gershon,
    I agree with Michael, and I like his suggestion and your example of preparing code with specific bugs in it.
    Of course, I also enjoy looking for the most ways to crash an application, although "crash" can have more than one interpretation. For example, does "complete database deleted" count as an application crash?
  11. Shmuel Gershon Says:
    @ DebiZ, thanks for your comment.
    I consider it a crash when the application gets out of control and exits in an unexpected way. That's an unhandled exception, or a core dump, or a BSoD for drivers...
    Data corruptions are good too, and may be even more serious/impactful than a crash (a crash is momentary, while data corruptions, like diamonds, are forever).
  12. Jailen Says:
    Good point. I hadn't thought about it quite that way. :)
