Use retries to detect flaky tests in Playwright

What do I mean by a flaky test?

It's a test that is unreliable and unpredictable, not necessarily due to what it is testing, but due to how the test is written.

You can detect them by enabling retries:

    npx playwright test --retries=3

Then if a test fails, Playwright will retry it up to 3 times. At the end of the run, the report tells you whether a test failed only some of the time (Playwright labels it "flaky") or on every attempt. It's then up to you to investigate.
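To make the retry behaviour concrete, here is a minimal sketch of the bookkeeping in plain TypeScript. It is not Playwright's implementation: `runWithRetries` and `Outcome` are hypothetical names used only for illustration. The three labels mirror the ones Playwright uses in its report (passed, flaky, failed).

```typescript
type Outcome = "passed" | "flaky" | "failed";

// Hypothetical helper: runs a test once, then up to `retries` more times
// while it keeps failing, and stops at the first passing attempt.
function runWithRetries(runOnce: () => boolean, retries: number): Outcome {
  for (let attempt = 0; attempt <= retries; attempt++) {
    if (runOnce()) {
      // A pass on the first attempt is a clean pass; a pass on a retry means flaky.
      return attempt === 0 ? "passed" : "flaky";
    }
  }
  return "failed";
}

// Simulated test that fails on its first two attempts and then passes.
let calls = 0;
const failsTwiceThenPasses = (): boolean => ++calls > 2;

console.log(runWithRetries(failsTwiceThenPasses, 3)); // flaky
```

Note that a test which passes on its first attempt is never rerun, so retries only cost time when something has already failed.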

Some more thoughts

There are many reasons why test code can be written in a way that is unpredictable, but the one that bites me most often is timing-related. For example: a test that passes in one browser but not in another. Or a test that passes when the browser UI is visible, but not in headless mode.

These are almost always due to differences in timing. One browser might take a little longer than another to run a step, and that extra time is just enough for something to happen that does not happen quickly enough in the other browser.

The problem is not the browser being faster or slower. The problem is that I forgot to await in the test code.
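Here is a small simulation of that mistake in plain TypeScript, so it runs without a browser. `FakePage` and `waitForText` are hypothetical stand-ins I made up for illustration, not Playwright APIs. In a real Playwright test the awaited call would be an assertion such as `await expect(locator).toHaveText(...)`.

```typescript
// Hypothetical stand-in for a page whose content renders after a delay.
class FakePage {
  text = "Loading";
  constructor(renderDelayMs: number) {
    if (renderDelayMs <= 0) {
      this.text = "Done"; // already rendered by the time the test looks
    } else {
      setTimeout(() => { this.text = "Done"; }, renderDelayMs);
    }
  }
}

// Hypothetical stand-in for an auto-waiting assertion: polls until the text
// matches and rejects if the timeout expires first.
async function waitForText(page: FakePage, expected: string, timeoutMs = 500): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (page.text !== expected) {
    if (Date.now() >= deadline) throw new Error(`text never became "${expected}"`);
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
  }
}

// Buggy: the promise is not awaited, so the next line runs immediately and
// the result depends on whether the page happened to be ready already.
async function testWithoutAwait(page: FakePage): Promise<boolean> {
  waitForText(page, "Done").catch(() => {}); // missing await
  return page.text === "Done";
}

// Fixed: the test waits for the text before checking anything else.
async function testWithAwait(page: FakePage): Promise<boolean> {
  await waitForText(page, "Done");
  return page.text === "Done";
}

async function demo(): Promise<void> {
  console.log(await testWithoutAwait(new FakePage(0)));  // true: page was already ready
  console.log(await testWithoutAwait(new FakePage(20))); // false: the check ran too early
  console.log(await testWithAwait(new FakePage(20)));    // true: the test waited
}

demo().catch(console.error);
```

The version without await gives a different answer depending on how fast the page is, which is exactly the behaviour that shows up as "passes in one browser, fails in another".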

Once you get bitten by a few of these, you start to suspect them early and quickly figure out the fix (usually adding the missing await in front of the expect). But until you get there, there are two ways to avoid introducing those errors in the first place:

  1. tooling in your editor or pipeline to detect the flaky code as early as possible. For example, setting up ESLint to warn you when you have not awaited something you probably should have.
  2. tooling in your pipeline to make Playwright actively search for flaky tests when it runs the tests.
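For option 1, one rule that targets this mistake is `@typescript-eslint/no-floating-promises`. Below is a sketch of what the setup might look like, not a drop-in config. It assumes the tests are written in TypeScript, live under `tests/`, and that the project uses ESLint's flat config with a recent version of the `typescript-eslint` package. The file name and the glob are assumptions.

```typescript
// eslint.config.mjs (assumed file name; requires the typescript-eslint package)
import tseslint from "typescript-eslint";

export default tseslint.config({
  // Assumed location of the Playwright tests; adjust to your project.
  files: ["tests/**/*.ts"],
  languageOptions: {
    parser: tseslint.parser,
    // The rule needs type information to know which calls return promises.
    parserOptions: { projectService: true },
  },
  plugins: { "@typescript-eslint": tseslint.plugin },
  rules: {
    // Reports promises that are neither awaited nor otherwise handled.
    "@typescript-eslint/no-floating-promises": "warn",
  },
});
```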

As for option 1, I don't always set up that kind of linting, and it can produce false positives (what if I really don't want to await something?). I think option 2 can be more set-and-forget, because it doesn't depend on linters running, only on the tests running. Which is what you want to do anyway!

It's quite fascinating: even though these timing issues normally show up when running the same code in different browsers, they also tend to be reproducible by running the test twice or more in the same browser. That's all it takes, because we're talking about really small timing differences. So let Playwright actively look for those on your behalf.

TIP: I also add the retries parameter to the test script in package.json, something like this:

    //...
    "scripts": {
        "test:e2e": "npx playwright test --retries=3",
        //...
    }

... so that way, whoever checks out the project will run the tests the same way I do. If they modify the tests and they become flaky, this will tell them.
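Another way to get the same effect, assuming the project has a `playwright.config.ts`, is to set the `retries` option there. This is only a minimal sketch showing that one option; a real config will usually contain more.

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Same number of retries as passing --retries=3 on the command line.
  retries: 3,
});
```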

Of course this adds execution time, as a failing test is rerun up to three more times (tests that pass on the first attempt are not rerun)! But it can be valuable, both for them and for Future You the next time you run the tests: you don't always have the same amount of context about the project in your head.

See also the Playwright documentation page for retries for more details and advice on how to avoid introducing other types of flakiness (e.g. writing tests that do not depend on each other).