Sunday, March 15, 2015

To automate or not – A cost perspective



Last week, in a meeting, a QA manager proposed that we needed to make up for some automation test cases that were dropped in the last release due to feature changes and difficulty in maintaining them. My boss asked point-blank, “Why? What is the point of automation?” At first, I thought the answer was obvious; we all know the benefits of automation: 
  •  Catch regressions
  •  Provide fast feedback; the later a bug is discovered, the more it costs to fix
But he is my boss, so I didn’t want to throw these easy answers at him. Back at my desk, I couldn’t get rid of this seemingly stupid question. When I was a DEV manager, I was guilty of pointing fingers at QA for letting stupid bugs slip through; now that I have a chance to look at both sides, the more I think about it, the more profound the question becomes.

Think about how a tester typically writes test cases:
  1. First he manually tests a feature
  2. He discovers bugs, prompts DEV to fix them, and redoes the manual test; then he discovers other bugs and repeats the process
  3. Finally, when he thinks he has gotten rid of most bugs, he starts to write automation tests
  4. Writing automation tests is just like writing DEV code; he may need to write, debug, and test several times before the automation tests are ready
  5. Because the tester is busy writing automation tests, he cannot invest time in manually testing other features.

If we assume the time cost to run a test manually is 1, writing an automation test may cost 10 or even more. And running the automation test at this point doesn’t bring any value: manual testing has already caught most of the bugs, so running the automation test won’t find any new ones. 

As an extreme case, if all automation tests are written after the system under test (SUT) is stabilized, running them won’t find any new bugs, and the investment in automation brings no benefit at all for this release. 

So when and how does automating test cases pay for itself?

 

Cost of repeatedly running tests

 

Running a test manually once may take little time, but if the test has to be run many times, the time cost adds up. Writing an automation test may take 10 times as long or more, but running it 100 times doesn’t take any more human time. So the cost is amortized if the automation test case has to be run many times.
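This amortization argument can be sketched with the article’s own illustrative numbers (1 unit of human time per manual run, 10 units to automate; the exact figures are assumptions, not measurements):

```python
# Back-of-the-envelope amortization model. Automated re-runs are treated as
# free in human time, matching the simplification in the text above.
MANUAL_RUN_COST = 1        # human time units per manual run (assumed)
AUTOMATION_WRITE_COST = 10  # one-off human time to automate (assumed)

def manual_cost(runs: int) -> int:
    """Total human time for running the test manually `runs` times."""
    return MANUAL_RUN_COST * runs

def automated_cost(runs: int) -> int:
    """Total human time: only the one-off writing cost; re-runs are free."""
    return AUTOMATION_WRITE_COST

def break_even_runs() -> int:
    """Smallest run count at which automation is no more expensive."""
    runs = 1
    while manual_cost(runs) < automated_cost(runs):
        runs += 1
    return runs

print(break_even_runs())  # 10: from the 10th run on, automation is cheaper
```

Under these assumptions, a test run fewer than 10 times never pays back its automation cost, which is exactly the trap described above.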

What tests need to be run many times? Examples are:

  • Smoke tests. These tests need to be run after each build to make sure no serious bugs have been introduced by code changes.
  • Compatibility tests, such as tests across different operating systems, different databases, etc. 
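Compatibility tests multiply the run count quickly: one logical test becomes one execution per cell of the support matrix, on every build. A tiny sketch (the listed OSes and databases are assumptions for illustration):

```python
# One logical test fans out across the whole support matrix on each build,
# which is why manual execution stops scaling here.
OPERATING_SYSTEMS = ["windows", "linux"]
DATABASES = ["postgres", "mysql", "sqlite"]

def runs_per_build() -> int:
    """Executions of a single logical test per build: one per OS x DB cell."""
    return len(OPERATING_SYSTEMS) * len(DATABASES)

print(runs_per_build())  # 6 executions per build for one logical test
```

With daily builds, that single test is already run dozens of times a week, well past the break-even point of the amortization model.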


In the extreme case where all automation tests are written after the SUT is stabilized, the investment in automation will be redeemed in the next release, with the tests acting as regression tests to ensure that changes made in that release do not unintentionally change old behaviors.

Maintenance cost

 

Writing an automation test case may already take 10 times longer than running the test manually; maintaining it may take even more time. In fact, many automation test cases are dropped, or reduced to near uselessness, because of high maintenance cost. The life cycle of an automation test case looks like this:



If a test becomes useless or dead before it has been run enough times to justify its creation and maintenance cost, the investment in automating it is not recouped. If a test case breaks for bad reasons, the cost of fixing it repeatedly diminishes its value as well. 

Let us consider why an automation test case breaks.

 

Bad break

 

If a test case breaks because of elements unrelated to its intention, it is a bad break. GUI automation test cases are often fragile and suffer many bad breaks. 

For example, take this user story: a user has 50$ in his account; he types in 100$, clicks the “withdraw” button, and is notified on screen “Sorry, you can’t withdraw 100$”. The intention of the test case is to verify that users can’t overdraw their accounts. If the “withdraw” button is relabeled “WITHDRAW”, the intention of the test case is unchanged, and it shouldn’t fail. If the on-screen prompt is changed to “Sorry, you are not allowed to withdraw 100$”, that usually shouldn’t break the test case either – the SUT should return an ERROR code, which is more stable, and the test case should look for the ERROR code instead of the error message. 
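The “assert on the code, not the message” point can be made concrete with a minimal sketch. The `withdraw` function and the error-code name below are hypothetical stand-ins for the SUT, not a real API:

```python
# Hypothetical SUT interface: withdraw() returns (error_code, message).
# All names and codes here are assumptions for illustration.
ERR_INSUFFICIENT_FUNDS = "INSUFFICIENT_FUNDS"

def withdraw(balance: int, amount: int):
    if amount > balance:
        # The human-readable message may be reworded at any time...
        return ERR_INSUFFICIENT_FUNDS, f"Sorry, you can't withdraw {amount}$"
    return None, "OK"

def test_cannot_overdraw():
    code, _message = withdraw(balance=50, amount=100)
    # Assert on the stable error code, NOT on the message wording,
    # so copy changes do not cause a bad break.
    assert code == ERR_INSUFFICIENT_FUNDS

test_cannot_overdraw()
```

If product later rewords the prompt, this test still passes; only a change to the actual behavior (the returned code) breaks it.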

There is another kind of bad break, related to data. Ideally, every test case should be isolated and start from a clean slate. For example, a user deposits 100$ into an account that originally held 100$, then withdraws 50$; he should have 150$ left. If, before he withdraws, another test case runs in which the same user withdraws 50$, the first test case will fail because the balance will be 100$. This is a simple example and can be dealt with in many ways: data shouldn’t be persisted to the real database, or each test case can create a unique account, or each test case can run in its own transaction, etc. But in reality, the reason to have GUI test cases in the first place is that it is hard (or even impossible) to do modularized tests, and the test cases have to run against the real database. 
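The “unique account per test case” mitigation can be sketched as follows. The in-memory `accounts` dict stands in for the real SUT; everything here is an assumption for illustration:

```python
import uuid

# Minimal in-memory "bank" standing in for the real SUT (an assumption).
accounts = {}

def create_account(initial: int) -> str:
    """Each test gets its own account, so tests cannot corrupt each other's data."""
    account_id = str(uuid.uuid4())
    accounts[account_id] = initial
    return account_id

def deposit(account_id: str, amount: int) -> None:
    accounts[account_id] += amount

def withdraw(account_id: str, amount: int) -> None:
    accounts[account_id] -= amount

def test_deposit_then_withdraw():
    acct = create_account(100)   # clean slate owned by this test alone
    deposit(acct, 100)
    withdraw(acct, 50)
    assert accounts[acct] == 150  # cannot be broken by another test's withdrawals

def test_other_user_withdraws():
    acct = create_account(100)   # a different account: the tests are isolated
    withdraw(acct, 50)
    assert accounts[acct] == 50

test_deposit_then_withdraw()
test_other_user_withdraws()
```

Both tests pass in any execution order, which is exactly the property the shared-account version lacks.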

Too many bad breaks cause teams to lose faith in the automated test cases and to take shortcuts, for example by commenting out the parts that often fail, or by commenting out assertions. Thus, although the test cases run green, they become simpler and simpler until they are useless. Worse still, they give teams a false sense of safety, because it looks as if the SUT has passed the automation tests.

 

Break caused by feature change

 

If the feature under test is changed, the test case of course will break. For example, suppose the bank now allows overdrawing up to a limit of 100$; withdrawing 100$ from a 50$ balance can now succeed. The test case that previously expected an ERROR code now breaks. 
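Continuing the hypothetical sketch from the bad-break example, the feature change forces the test’s expectation to flip (the overdraft limit and function below are assumptions for illustration):

```python
OVERDRAFT_LIMIT = 100  # new business rule from the example above (assumed value)

def withdraw(balance: int, amount: int):
    # Overdrawing is now allowed down to -OVERDRAFT_LIMIT.
    if balance - amount < -OVERDRAFT_LIMIT:
        return "INSUFFICIENT_FUNDS", balance
    return None, balance - amount

def test_overdraw_within_limit_succeeds():
    # The old test asserted an error code here; the feature change means the
    # test itself must be rewritten, not just the SUT.
    code, new_balance = withdraw(balance=50, amount=100)
    assert code is None
    assert new_balance == -50

test_overdraw_within_limit_succeeds()
```

This is the legitimate kind of break: the test fails because the specified behavior changed, and updating it is part of the feature’s cost.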

In reality, feature changes can be very big, and it may be more cost-effective to rewrite test cases than to tweak the old ones. 

This happened to my team. The previous release changed some features greatly, and the old test cases were so hard to maintain that we simply dropped them.

 

Break because of bugs

 

Finally, this is the reason we hope a test case breaks! The more such breaks, the more valuable the test cases are. 

We especially expect such breaks when we are changing common code that is used in many places. In a tightly coupled system (in other words, in real-world systems whose architecture inevitably decays), a seemingly small change may ripple through the whole system, and nobody can tell exactly what the impact is. In this situation, automation test cases act as:

  •  A safety net to catch regressions
  •  Documentation of the impact  

Increase cost-effectiveness

 

From the above analysis, it is clear that to increase cost-effectiveness, we need to:

  • Reduce cost by making automation tests easier to write and maintain
  • Add value by automating in critical areas. Test cases in these critical areas will be run many times and guard against regressions

1 comment:

  1. As already outlined in this article, creating automated test cases is a software development process. So there must be rules/guidelines for creating automated test cases, like:
    test cases must not depend on hard-coded texts but on ids
    test data has to be separated from the test cases (also true for manual test cases)
    test cases should be based on components, e.g. a login template, etc.

    One fact that has not been covered by this article is that executing automated test cases usually also creates costs, because if a test case breaks, someone has to analyze the reason for the break. Even a well-designed test case might break, for example because of problems in the infrastructure, like the temporary unavailability of a QA system.
    In order to keep maintenance costs low, only regression test cases should be automated (those test cases which are not affected by change requests to the SUT)
