Nightly tests
Overview
Every night, a set of tests runs as part of the TeamCity project Nightlies. These tests have a few common characteristics:
- They set up a temporary CockroachDB cluster and run load against it.
- Their runtime is too long for them to be included in CI.
All nightly tests, except for Jepsen, use Terraform to create and destroy their temporary cluster. It may be wise to remove Terraform in the future, given the cognitive overhead of using a tool that provides much more functionality than we need.
Manually Running Nightly Tests
- The simplest way to run a nightly test is to go to the Nightlies project, find the test you want to run, and click the Run button.
- To specify flags for cockroach for a single test run, click the ... button next to the Run button for a test. Then, go to the Parameters tab and specify a value for env.COCKROACH_EXTRA_FLAGS.
- To launch a test locally, use the appropriate build/teamcity-*.sh script. For many nightlies, this is build/teamcity-nightly-acceptance.sh. See the comments at the top of that script for setup steps.
Test Entry Points
TeamCity jobs execute various bash scripts that, in turn, run the relevant tests. These files are named build/teamcity-*.sh. Key files include:
- build/teamcity-nightly-acceptance.sh - Wrapper script for running allocator, continuous load, and backup & restore tests.
- build/teamcity-jepsen.sh - Runs Jepsen tests.
Key Source Files
- pkg/acceptance/terrafarm - terrafarm.(*Farmer) is our thin wrapper around Terraform. It's used by most nightlies to set up, interact with, and destroy the temporary cluster for the test.
- pkg/acceptance/terraform/azure - Contains the Terraform config files. Reference: Terraform configuration docs.
- pkg/acceptance/allocator_test.go - Allocator tests, including the schema change test and the "steady 6 nodes" test.
- pkg/acceptance/continuous_load_test.go - Continuous load tests.
Allocator tests
The allocator tests stress the replica allocator under load. At a high level, they do the following:
- Create a temporary cluster.
- Restore tarballs of test data (TPC-H data sets with various scale factors) onto each node in the cluster.
- Add new nodes to the cluster. The only current exception to this is the "steady 6 nodes" test.
- Start load generators.
- Wait until the replica allocators reach equilibrium (no replicas added/removed in the last N minutes).
- The test passes only if the standard deviation of range counts is lower than the threshold (set to 5% of the mean range count). This must happen before TESTTIMEOUT elapses.
- Destroy the temporary cluster.
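The equilibrium and pass conditions described in the steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in allocator_test.go: the function names, the use of timestamps in minutes, and the choice of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def replicas_settled(event_times, now, quiet_minutes):
    """Return True if no replica was added or removed in the last quiet_minutes.

    event_times holds hypothetical timestamps (in minutes) of replica
    add/remove events; now is the current time in the same unit.
    """
    return all(now - t >= quiet_minutes for t in event_times)


def range_counts_balanced(range_counts, threshold_fraction=0.05):
    """Return True if the spread of per-node range counts is below the threshold.

    The threshold is a fraction of the mean range count (5% by default).
    Assumption: population standard deviation; the real test may differ.
    """
    avg = mean(range_counts)
    return pstdev(range_counts) < threshold_fraction * avg
```

For example, per-node range counts of 100, 101, 99, 100, 100, 100 have a mean of 100, so the threshold is 5; the standard deviation is well under 1, so this cluster would count as balanced.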
Continuous load tests
These are straightforward tests that set up test clusters and run load against them. They pass if TESTTIMEOUT elapses with no crashes and no periods with 0 QPS.
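As a rough illustration of that pass condition, the sketch below evaluates a list of QPS readings collected over the test window. The function name, the idea of one QPS sample per measurement period, and the crashed flag are hypothetical; the actual logic lives in continuous_load_test.go and may differ.

```python
def continuous_load_passed(qps_samples, crashed):
    """Return True if the run saw no crash and no period with 0 QPS.

    qps_samples: hypothetical list of QPS readings, one per period,
    covering the whole TESTTIMEOUT window.
    crashed: whether any node crashed during the run.
    """
    if crashed:
        return False
    # Assumption: a run with no samples at all is treated as a failure.
    return len(qps_samples) > 0 and all(q > 0 for q in qps_samples)
```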
Gotchas
- Care should be taken when upgrading Terraform. Various backward-incompatible changes have been introduced over time (e.g. terraform init).
- The cloud provider Terraform uses for the temporary clusters is independent of the cloud provider used by TeamCity agents. For example, at the time of writing, TeamCity agents run on GCE, and most Terraform clusters run on Azure.
- Azure-based tests can take a long time to iterate on. Azure VM startup and destruction times (4-5 minutes) are much longer than GCE (~1 minute).
- Core dumps aren't enabled for cockroach.
- Pressing control-C at certain times will leak cloud resources. Fortunately, there is a nightly script to clean up resource leaks.
- Terraform is not needed for our relatively simple needs. It'd be helpful to replace it with roachprod and roachperf.
- If you're hoping to take advantage of (or have an issue with) probabilistic multi-tenant testing, have a look at this document which describes how it works.
Copyright (C) Cockroach Labs.