Last week, in a meeting, a QA manager proposed that we
make up for some automation test cases that had been dropped in the last
release due to feature changes and the difficulty of maintaining them. My boss
asked point-blank, “Why? What is the point of automation?” At first, I thought
the answer was obvious; we all know the benefits of automation:
- Catch regressions
- Provide fast feedback; the later a bug is discovered, the more it costs to fix
But he is my boss, so I didn’t want to toss these easy answers
back at him. Back at my desk, I couldn’t shake this seemingly stupid question. When
I was a DEV manager, I was guilty of pointing fingers at QA for letting
stupid bugs slip out; now that I have a chance to look at both sides, the more I think about
it, the more profound the question becomes.
Think about how a tester typically writes test cases:
- First, he manually tests a feature
- He discovers bugs, prompts DEV to fix them, and redoes the manual test; then he discovers other bugs and repeats the process
- Finally, he thinks he has gotten rid of most bugs, and he starts to write automation tests
- Writing automation tests is just like writing DEV code; he might need to write, debug, and test several times before the automation tests are ready
- Because the tester is busy writing automation tests, he cannot invest time in manually testing other features.
If we assume the time cost of running a test manually is 1,
writing an automation test may cost 10 or even more. And running the automation
test at this point brings no value: manual testing has already caught most of the
bugs, so running the automation test won’t find any new ones.
As an extreme case, if all automation tests are written
after the system under test (SUT) is stabilized, running all of them
won’t find any new bugs, and all the investment in automation for this release brings no benefit at all.
So when and how does the investment in automating test cases redeem itself?
Cost of repeatedly running tests
Running a test manually once may take little time, but if
the test has to be run many times, the time cost adds up. Writing an automation
test may take 10 times as long or more, but running it 100 times costs hardly any
additional human time. So the cost is amortized when the automated test has to be
run many times.
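A rough break-even sketch in Python, using the assumed relative costs above (1 unit of human time per manual run, 10 units to automate, and near-zero human time per automated run):

```python
MANUAL_COST_PER_RUN = 1     # assumed relative human time for one manual run
AUTOMATION_WRITE_COST = 10  # assumed one-off cost to write the automated test

def human_cost(runs, automated):
    # Once written, an automated test is assumed to cost ~0 human time per run.
    return AUTOMATION_WRITE_COST if automated else MANUAL_COST_PER_RUN * runs

print(human_cost(1, automated=True), human_cost(1, automated=False))      # 10 vs 1: automation loses
print(human_cost(100, automated=True), human_cost(100, automated=False))  # 10 vs 100: automation wins
```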
Which tests need to be run many times? Examples include:
- Smoke tests. These need to be run after each build to make sure no serious bugs have been introduced by code changes (see the sketch after this list).
- Compatibility tests, such as tests against different operating systems, different databases, and so on.
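As one possible way to keep such suites cheap to re-run, pytest’s markers and parametrization can tag and multiply tests; the test names below are made up for illustration:

```python
import pytest

@pytest.mark.smoke
def test_user_can_log_in():
    ...  # placeholder: a real smoke test would drive the SUT here

@pytest.mark.parametrize("database", ["postgres", "mysql"])
def test_report_generation(database):
    ...  # the same test run against several backends, i.e. a compatibility test
```

A build pipeline could then run only the smoke subset with `pytest -m smoke` (registering the `smoke` marker in `pytest.ini` keeps pytest from warning about it).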
Even in the extreme case where all automation tests are written
after the SUT is stabilized, the investment in automation will be redeemed in
the next release, when the tests act as regression tests ensuring that changes made in that
release do not unintentionally break old behaviors.
Maintenance cost
Writing an automation test case may already take 10 times
longer than running the test manually; maintaining it may
take even more time. In fact, many automation test cases are dropped or reduced
to near uselessness because of their high maintenance cost. The life cycle of an
automation test case runs from being written, through repeated fixes as the SUT changes,
to eventually being dropped or going dead.
If a test becomes useless or dead before it has been run
enough times to justify its creation and maintenance cost, the investment in
automating this test is not recouped. If
a test case breaks for bad reasons, the cost of fixing it again and again
diminishes its value as well.
Let us consider why an automation test case breaks.
Bad break
If a test case breaks because of elements unrelated to the
intention of the test case, it is a bad break. GUI automation test cases are
especially fragile and suffer bad breaks more often.
For example, a user story says that a user with $50 in his
account can type in $100, click the “withdraw” button, and get notified on the
screen “Sorry, you can’t withdraw $100”. The intention of the test case is to
verify that users can’t overdraw their accounts. If the “withdraw” button is relabeled
“WITHDRAW”, that doesn’t change the intention of the test case, and it
shouldn’t fail. If the prompt on the screen is changed to “Sorry, you are not
allowed to withdraw $100”, that usually shouldn’t break the test case either:
the SUT should return an ERROR code, which is more stable, and the test case
should check the ERROR code instead of the error message.
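A minimal sketch of such an assertion, assuming a hypothetical bank client that exposes a stable error code alongside the display message (none of these names come from a real library or from the post):

```python
from dataclasses import dataclass

@dataclass
class WithdrawResult:
    error_code: str  # stable, machine-readable outcome
    message: str     # human-readable text, free to change

class FakeBankApi:
    """Illustrative in-memory stand-in for the system under test."""
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            return WithdrawResult("OVERDRAW_NOT_ALLOWED",
                                  f"Sorry, you can't withdraw ${amount}")
        self.balance -= amount
        return WithdrawResult("OK", "Withdrawal complete")

def test_cannot_overdraw():
    api = FakeBankApi(balance=50)
    result = api.withdraw(100)
    # Assert on the stable error code, not on the button label or message text.
    assert result.error_code == "OVERDRAW_NOT_ALLOWED"
```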
There is another kind of bad break that is related to data.
Ideally, every test case should be isolated, and every test case should start from a
clean slate. For example, a user deposits $100 into his account,
which originally has $100, and then withdraws $50; he should have $150 left.
If, before he withdraws, another test case runs in which the same user withdraws
$50, the first test case will fail because the balance will be $100. This is a
simple example, and it can be dealt with in many ways: for example, data shouldn’t
be persisted to a real database, each test case creates a unique account, or
each test case runs in its own transaction. But in reality, the
reason to have GUI test cases in the first place is often that modularized tests are
hard (or even impossible), and the test cases have to run against the real database.
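A minimal sketch of the “unique account per test” option, using a pytest fixture and an illustrative in-memory bank (the names are assumptions, not the post’s real system):

```python
import uuid
import pytest

class FakeBank:
    """Illustrative stand-in for the shared backend all tests run against."""
    def __init__(self):
        self.accounts = {}

    def create_account(self, account_id, balance):
        self.accounts[account_id] = balance

    def deposit(self, account_id, amount):
        self.accounts[account_id] += amount

    def withdraw(self, account_id, amount):
        self.accounts[account_id] -= amount

    def balance(self, account_id):
        return self.accounts[account_id]

BANK = FakeBank()  # shared state, much like a real database would be

@pytest.fixture
def account():
    # Each test gets its own unique account, so one test's withdrawals
    # can never change the balance another test is asserting on.
    account_id = f"acct-{uuid.uuid4()}"
    BANK.create_account(account_id, balance=100)
    return account_id

def test_deposit_then_withdraw(account):
    BANK.deposit(account, 100)
    BANK.withdraw(account, 50)
    assert BANK.balance(account) == 150

def test_withdraw_only(account):
    BANK.withdraw(account, 50)
    assert BANK.balance(account) == 50
```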
Too many bad breaks cause teams
to lose faith in the automated test cases and resort to shortcuts,
for example commenting out the parts that often fail, or commenting out
assertions. Although the test cases still run green, they become simpler and
simpler until they are reduced to being useless. Worse still, they give the team a
false sense of safety, because it looks as though the SUT has passed the automation tests.
Break caused by feature change
If the feature under test is changed, the test case will of
course break. For example, if the bank now allows overdrawing up to a limit of $100,
withdrawing $100 from a $50 account can succeed.
The test case that previously expected an ERROR code now breaks.
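Continuing the earlier sketch (same hypothetical names, now with an assumed overdraft limit), the expectation itself has to change with the feature, not just the test’s plumbing:

```python
from dataclasses import dataclass

@dataclass
class WithdrawResult:
    error_code: str
    message: str

class FakeBankApi:
    """Same illustrative client as before, now with an overdraft limit."""
    def __init__(self, balance, overdraft_limit=0):
        self.balance = balance
        self.overdraft_limit = overdraft_limit

    def withdraw(self, amount):
        if amount > self.balance + self.overdraft_limit:
            return WithdrawResult("OVERDRAW_NOT_ALLOWED",
                                  f"Sorry, you can't withdraw ${amount}")
        self.balance -= amount
        return WithdrawResult("OK", "Withdrawal complete")

def test_overdraw_within_limit_now_succeeds():
    # The old test expected OVERDRAW_NOT_ALLOWED here; after the feature
    # change the expected outcome itself is different.
    api = FakeBankApi(balance=50, overdraft_limit=100)
    result = api.withdraw(100)
    assert result.error_code == "OK"
    assert api.balance == -50
```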
In reality, feature changes might be very big, and it might be
more cost-effective to rewrite the test cases rather than tweak the old ones.
This happened to my team. The previous release changed some features so greatly, and the old test cases were so hard to maintain, that we simply dropped them.
Break because of bugs
Finally, this is the reason we hope a test case breaks! The
more such breaks, the more valuable the test cases are.
We would especially expect such breaks when we are changing
some common code that is used in many places. In a tightly coupled system (or,
in other words, in real-world systems where the architecture inevitably decays), a
seemingly small change may ripple through the whole system, and nobody can tell
exactly what the impact is. In this situation, automation test cases act as:
- A safety net that catches regressions
- Documentation of the change’s impact
Increase cost-effectiveness
From the above analysis, it is clear that to increase
cost-effectiveness, we need to:
- Reduce cost by making automated tests easier to write and maintain
- Add value by automating critical areas. Test cases covering these critical areas will be run many times and will guard against regressions