It is not your job to hide application/environment issues
Let’s say that a Selenium test fails because a page of the site loads only partially, even after waiting for 30 seconds, or because the page does not display some of its data.
What should you do about it?
You could:
1. Add a static wait of 10 seconds so the test waits longer before interacting with the page.
Oh no, it is still not working? What do you do now?
Increase the static wait to 30 seconds.
2. Add a @Retry annotation so the test is executed again.
Oh no, it is still not working? What do you do now?
Change @Retry to @Retry(2).
3. Reload the page, since it loads correctly most of the time after being reloaded.
Oh no, it is still not working? What do you do now?
Reload the page in a loop until it loads fully.
4. Re-run only the failed tests through TestNG.
Oh no, some tests are still failing after being re-run? What do you do now?
Re-run the failed tests again.
5. Re-run the CI/CD job.
Oh no, it is still failing? What do you do now?
Re-run it one more time.
6. Verify the test manually.
All these “fixes” try to rescue the test from failing.
They also hide a problem, either in the application or in the environment, so no one will ever know about it.
What is missing from the above list?
Looking for the root cause.
The root cause may be an application issue or poor environment performance.
Once you confirm that this is why the test fails, please ask the dev team to join the discussion.
It is not the SDET's responsibility to "rescue" failing tests so that the automated tests "pass 100%".
If you notice issues, they should not be "managed"; they should be reported to the dev team.
Whatever issues you see while running tests in CI/CD will, if left unreported to the dev team, simply spill over to live users.
Thanks for reading.