...
Reproducing a test failure
make roachprod-stress and make roachprod-stressrace are your friends. They remote-execute the stress tool, which is also your friend: if you’re on a beefy machine or on your gceworker, see make stress PKG=... TESTS=... TESTFLAGS=... STRESSFLAGS=... and make stressrace. You can use the STRESSFLAGS variable to pass options to stress (see bin/stress --help).
One common mistake is forgetting to unskip the test that you’re stressing (it may be skipped to begin with, since the test-infra community service team skips flaky tests). It happens to everyone. Just keep in mind that this is a thing that happens, and if the number of iterations climbs very quickly, consider whether it’s currently happening to you.
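As a quick sanity check before a long stress run, something like the following could tell you whether the test body still contains a skip call. This is only a sketch: test_is_skipped is a hypothetical helper, not part of the repo, and it assumes that skips are written as calls on a package named skip (for example skip.WithIssue) inside a top-level func TestName(...) whose closing brace sits in column one, as gofmt produces.

```shell
# Hypothetical helper: succeeds if the named test function contains a skip.* call.
# usage: test_is_skipped <file> <TestName>
test_is_skipped() {
  awk -v name="$2" '
    # Start scanning at the line that declares the test function.
    $0 ~ ("^func " name "\\(") { in_test = 1 }
    # Remember any call such as skip.WithIssue( inside the function body.
    in_test && /skip\.[A-Za-z]+\(/ { found = 1 }
    # A closing brace in column one ends the function (gofmt layout assumed).
    in_test && /^}/ { in_test = 0 }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Example call; the file path is a placeholder:
# test_is_skipped pkg/something/my_test.go MyTestName && echo "still skipped"
```

The check is deliberately crude (it will also flag conditional skips, such as those that apply only under stress or race), so treat a hit as a prompt to read the test rather than as a verdict.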
stress{,race}
Run ./dev test --stress pkg/something --filter '^MyTestName$', ideally on a gceworker (to avoid clogging your workstation).
If this doesn’t yield a reproduction in due time, you could try under race (add the --race flag) or adjust the --stress-args (see go run ./vendor/github.com/cockroachdb/stress --help).
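The plain and race invocations can be sketched with a small wrapper. stress_cmd is a hypothetical helper that only prints the command rather than running it; it anchors the test name with ^ and $ so that other tests sharing the same prefix are not stressed as well. The only flags used are the ones named in the text.

```shell
# stress_cmd is a hypothetical helper: it prints (does not run) a ./dev stress command.
# usage: stress_cmd <pkg> <TestName> [extra flags...]
stress_cmd() {
  pkg=$1
  name=$2
  shift 2
  # Anchor the filter so only the exact test name matches.
  printf "./dev test --stress %s --filter '^%s\$'" "$pkg" "$name"
  # Append any extra flags, e.g. --race.
  for flag in "$@"; do
    printf ' %s' "$flag"
  done
  printf '\n'
}

stress_cmd pkg/something MyTestName          # plain stress run
stress_cmd pkg/something MyTestName --race   # same, under the race detector
```

Copy the printed line into a shell at the repo root to run it.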
roachprod-stress{,race}
When a gceworker won’t do, you can farm out the stressing to a number of roachprod machines. First create a cluster:
roachprod create $USER-stress -n 20 --gce-machine-type=n1-standard-8 --local-ssd=false
Then invoke:
make roachprod-stress CLUSTER=$USER-stress PKG=... TESTS=... [STRESSFLAGS=...]
Once you know how to reproduce a failure, you can think about bisecting it or reproducing it with additional logging.