Reproducing a test failure

make roachprod-stress and make roachprod-stressrace are your friends. They remote-execute the stress tool, which is also your friend: see make stress PKG=... TEST=... TESTFLAGS=... STRESSFLAGS=... and make stressrace if you’re on a beefy machine or on your gceworker. Use the STRESSFLAGS variable to pass options to stress (see bin/stress --help).

One common mistake is forgetting to unskip the test that you’re stressing. It may be skipped to begin with, since the test-infra community service team does that to flaky tests. It happens to everyone. A skipped test returns immediately, so if the number of iterations climbs very quickly, consider whether this is happening to you.
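One way to catch a forgotten skip before burning machine time is to search the package's test files for skip calls. The sketch below assumes skips are written either as a helper call of the form skip.Something(...) or as a plain t.Skip...(...); the function name find_skips and the pattern are illustrative, so adjust them to match how the test in question is actually skipped.

```shell
# find_skips DIR: print every line in DIR's *_test.go files that looks like a
# skip call. The regex is an assumption about how skips are spelled.
find_skips() {
  grep -rnE --include='*_test.go' '(skip\.[A-Za-z]+|t\.Skip[A-Za-z]*)\(' "$1"
}
```

For example, find_skips pkg/something lists candidate skip lines with file names and line numbers; no output means nothing matched the pattern.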

stress{,race}

Run ./dev test --stress pkg/something --filter '^MyTestName$', ideally on a gceworker (to avoid clogging your workstation).

If this doesn’t yield a reproduction in a reasonable amount of time, you could try running under the race detector (add the --race flag) or adjusting the --stress-args (see go run ./vendor/github.com/cockroachdb/stress --help).

roachprod-stress{,race}

When a gceworker won’t do, you can farm out the stressing to a number of roachprod machines. First create a cluster:

roachprod create $USER-stress -n 20 --gce-machine-type=n1-standard-8 --local-ssd=false

Then invoke

make roachprod-stress CLUSTER=$USER-stress PKG=... TESTS=... [STRESSFLAGS=...]

Once you know how to reproduce a failure, you can think about bisecting it or reproducing it with additional logging.
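For bisecting a flaky failure, a single run per commit is usually not enough evidence. The sketch below is a hypothetical helper (repro_check is not part of any existing tooling) that runs a command a fixed number of times and reports failure as soon as one run fails.

```shell
# repro_check N CMD...: run CMD up to N times.
# Returns 1 on the first failing run, 0 if all N runs pass.
repro_check() {
  rc_n=$1
  shift
  rc_i=0
  while [ "$rc_i" -lt "$rc_n" ]; do
    "$@" || return 1
    rc_i=$((rc_i + 1))
  done
  return 0
}
```

A script that calls repro_check with your reproduction command and exits with its status can be handed to git bisect run, which treats exit status 0 as good and 1 as bad. Pick N large enough that a passing result is convincing; a flaky test that passes N times can still be broken at that commit.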