Overview

TestServer and TestCluster are two frameworks we built to create Go unit tests for CockroachDB:

TestServer simulates a single CockroachDB node.
TestCluster simulates a multi-node CockroachDB cluster. Each node is simulated using one TestServer.

By default, TestServer uses in-RAM storage (not persisted) to make tests faster.

For integration tests, consider using roachtest instead. See the page “Roachtest vs TestServer” for details.

Here are the main differences between a regular CockroachDB node (e.g. one started via cockroach start) and one simulated via TestServer:

Component | Behavior in cockroach start | Behavior in TestServer
SQL, HTTP and KV layers | Identical | Identical
Coordination glue around SQL/HTTP/KV that creates an overall running server (topLevelServer in code) | Identical | Identical
Server configuration | Via CLI flags; defaults suited to production deployments. | Via the TestServerArgs struct; defaults suited to testing. For example, KV storage is in-RAM by default (no persistence) to make tests faster, and there is a predefined TLS configuration with a self-signed CA.

Of note, cockroach demo uses TestServer under the hood, so all the properties and limitations of TestServer also apply to cockroach demo.

Introduction to TestServer and TestCluster in Go unit tests

Tests mostly use the following programming pattern:

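import (
  "context"
  "testing"

  "github.com/cockroachdb/cockroach/pkg/base"
  "github.com/cockroachdb/cockroach/pkg/testutils/serverutils"
  "github.com/cockroachdb/cockroach/pkg/util/leaktest"
  "github.com/cockroachdb/cockroach/pkg/util/log"
)

func TestSomething(t *testing.T) {
  defer leaktest.AfterTest(t)()  // verify no goroutine leaks
  defer log.Scope(t).Close(t)   // capture test logs intelligently

  ctx := context.Background()
  srv := serverutils.StartServerOnly(t, base.TestServerArgs{})  // initialize and start a TestServer
  defer srv.Stopper().Stop(ctx) // ensure the TestServer gets cleaned up at the end of the test

  ts := srv.ApplicationLayer()  // see below for an explanation

  // ... use ts in test code ...
}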

Alternatively, the following is also possible:

    srv, db, kvDB := serverutils.StartServer(t, base.TestServerArgs{})

as it is equivalent to:

    srv := serverutils.StartServerOnly(t, base.TestServerArgs{})
    db := srv.ApplicationLayer().SQLConn(t, "")  // access to SQL
    kvDB := srv.ApplicationLayer().DB()          // access to KV

When a test needs to exercise a cluster of 2 or more nodes connected together, it can use TestCluster:

func TestSomething(t *testing.T) {
  defer leaktest.AfterTest()()  // verify no goroutine leaks
  defer log.Scope(t).Close(t)   // capture test logs intelligently

  ctx := context.Background()
  const numNodes = 3
  tc := serverutils.StartCluster(t, numNodes, base.TestClusterArgs{})  // initialize and start a TestCluster
  defer tc.Stopper().Stop(ctx) // ensure the TestCluster gets cleaned up at the end of the test
  
  ts0 := tc.Server(0).ApplicationLayer()  // see below for an explanation
  ts1 := tc.Server(1).ApplicationLayer()  // see below for an explanation  
  
  // ... use ts0 and ts1 in test code ...
}

In a nutshell, the result of StartCluster (TestClusterInterface) has a Server(nodeIdx) method which returns a different TestServer for each node in the simulated cluster.

Note: there is no benefit to using StartCluster with just 1 node. Prefer StartServer in that case.
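As a mental model of the Server(nodeIdx) contract, the sketch below represents a cluster as a slice holding one server per node. It is a hypothetical illustration only: fakeServer, fakeCluster and startFakeCluster are invented names that are not part of serverutils, and no real nodes are started.

```go
package main

import "fmt"

// fakeServer is a hypothetical stand-in for one simulated node (a TestServer).
type fakeServer struct {
	nodeID int
}

// fakeCluster is a hypothetical stand-in for a TestCluster: one server per node.
type fakeCluster struct {
	servers []*fakeServer
}

// startFakeCluster builds numNodes independent servers, with node IDs
// starting at 1 as in CockroachDB.
func startFakeCluster(numNodes int) *fakeCluster {
	tc := &fakeCluster{}
	for i := 0; i < numNodes; i++ {
		tc.servers = append(tc.servers, &fakeServer{nodeID: i + 1})
	}
	return tc
}

// Server returns the server for the node at the given 0-based index,
// mirroring the shape of TestClusterInterface.Server(nodeIdx).
func (tc *fakeCluster) Server(nodeIdx int) *fakeServer {
	return tc.servers[nodeIdx]
}

func main() {
	const numNodes = 3
	tc := startFakeCluster(numNodes)
	for i := 0; i < numNodes; i++ {
		fmt.Printf("index %d -> node %d\n", i, tc.Server(i).nodeID)
	}
}
```

The index passed to Server is 0-based, while the node ID of the server it returns starts at 1 in this model.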

High-level TestServer API

Let’s inspect the interface of serverutils.StartServerOnly():

func StartServerOnly(t TestFataler, params base.TestServerArgs) TestServerInterface

type TestServerInterface interface {
  // ApplicationLayer returns the interface to the application layer that is
  // exercised by the test.
  ApplicationLayer() ApplicationLayerInterface

  // StorageLayer returns the interface to the storage layer.
  StorageLayer() StorageLayerInterface

  ...
}

The result of StartServerOnly is a TestServerInterface, whose two main methods are ApplicationLayer() and StorageLayer(). They correspond to the two layers of the following architectural diagram:

In short, ApplicationLayer() gives the test code a handle to the application layer of CockroachDB, which contains the SQL and HTTP components, while StorageLayer() gives the test a handle to the storage layer, which contains KV (replication, transactions) and lower-level storage (Pebble).

For example, these two interfaces include the following methods:

// ApplicationLayerInterface defines accessors to the application
// layer of a test server.
type ApplicationLayerInterface interface {
  ...
  // SQLConn returns a handle to the server's SQL interface, opened
  // with the 'root' user.
  // The connection is closed automatically when the server is stopped.
  SQLConn(t TestFataler, dbName string) *gosql.DB
  ...
}

// StorageLayerInterface defines accessors to the storage layer of a
// test server. See ApplicationLayerInterface for the relevant
// application-level APIs.
type StorageLayerInterface interface {
  ...
  // LookupRange looks up the range descriptor which contains key.
  LookupRange(key roachpb.Key) (roachpb.RangeDescriptor, error)
  ...
}

Finally, a few methods are implemented by TestServerInterface directly because it embeds TestServerController; for example, Stopper() is part of TestServerController. These methods provide “orchestration-level” control of the server.
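Go interface embedding is what makes a method such as Stopper() callable directly on the value returned by StartServerOnly. The minimal sketch below models that shape with hypothetical, heavily simplified types (stopper, controller, appLayer, storageLayer, testServer, startServerOnly); it is not the real serverutils code, only an illustration of how the methods of an embedded interface become part of the outer interface's method set.

```go
package main

import "fmt"

// stopper is a hypothetical, simplified stand-in for the real stopper.
type stopper struct{ stopped bool }

func (s *stopper) Stop() { s.stopped = true }

// controller models the orchestration-level methods (like TestServerController).
type controller interface {
	Stopper() *stopper
}

// appLayer and storageLayer model the return types of the two layer accessors.
type appLayer interface{ SQLAddr() string }
type storageLayer interface{ StoreCount() int }

// testServer models TestServerInterface: embedding controller puts Stopper()
// in its method set, next to the two layer accessors.
type testServer interface {
	controller
	ApplicationLayer() appLayer
	StorageLayer() storageLayer
}

type fakeApp struct{}

func (fakeApp) SQLAddr() string { return "fake-sql-addr" }

type fakeStorage struct{}

func (fakeStorage) StoreCount() int { return 1 }

// server is a hypothetical concrete implementation of testServer.
type server struct{ st *stopper }

func (s *server) Stopper() *stopper { return s.st }

func (s *server) ApplicationLayer() appLayer { return fakeApp{} }

func (s *server) StorageLayer() storageLayer { return fakeStorage{} }

func startServerOnly() testServer { return &server{st: &stopper{}} }

func main() {
	srv := startServerOnly()
	defer srv.Stopper().Stop() // orchestration-level call, no layer accessor needed
	fmt.Println(srv.ApplicationLayer().SQLAddr(), srv.StorageLayer().StoreCount())
}
```

Because testServer embeds controller, any testServer value can also be used wherever only a controller is expected.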

Automatic cluster virtualization

Background - quick intro to cluster virtualization

CockroachDB has supported cluster virtualization since v20.x, as the underlying technology for CC Serverless. Cluster virtualization is the virtualization of the application layer (as per the diagram above): the SQL and HTTP components can be fully encapsulated into a “virtual cluster”, so that multiple virtual clusters can exist side by side on top of the same storage layer.

This corresponds to the following diagram (theoretical):

In this diagram, the “System” box is a special SQL interface to the storage layer which exists outside of virtualization; it acts as the “control layer” of the virtualization system and can be used to set parameters across all virtual clusters.

TestServer and automatic cluster virtualization

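By default, TestServer automatically randomizes the server architecture between two choices: either no virtual cluster is started and the application layer is the system interface itself, or a virtual cluster is started inside the TestServer.

That is, a virtual cluster is started probabilistically inside the TestServer, and when it is, the TestServer’s .ApplicationLayer() accessor is configured to point to it.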

Tests should primarily access SQL/HTTP via the ApplicationLayerInterface returned by .ApplicationLayer(), so that they automatically get coverage both with and without cluster virtualization.

Meanwhile, regardless of the randomization, the following invariants are always true:

.StorageLayer() always points to the storage layer inside TestServer.
.SystemLayer() always points to the special system interface inside TestServer (known as the “system tenant” in previous versions of CockroachDB).

Temporary API pitfall and misdesign

This section is the target of a redirect from a warning printed in tests when the test code uses an ApplicationLayerInterface method without calling .ApplicationLayer() first. For example:

ts := serverutils.StartServerOnly(t, ...)
defer ts.Stopper().Stop(ctx)

addr := ts.RPCAddr()  // prints warning, linking to this section

The recommended way to remove the warning is to make the test intent explicit by adding the missing call to .ApplicationLayer(), for example:

srv := serverutils.StartServerOnly(t, ...)
defer srv.Stopper().Stop(ctx)
ts := srv.ApplicationLayer()

addr := ts.RPCAddr()

More detailed explanation

The reason for this warning is that, as of this writing, TestServerInterface also contains the following interface embedding:

type TestServerInterface interface {
  ...
  // ApplicationLayerInterface is implemented by TestServerInterface
  // for backward-compatibility with existing test code.
  //
  // It is CURRENTLY equivalent to .SystemLayer() however
  // this results in poor test semantics.
  //
  // New tests should spell out their intent clearly by calling
  // the .ApplicationLayer() (preferred) or .SystemLayer() methods directly.
  ApplicationLayerInterface
  ...
}

This was done because many tests had already been written by the time .ApplicationLayer() was introduced.

However, this status quo is problematic, for two reasons.

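The first is incoherence: the following two excerpts are not equivalent, which is counter-intuitive and can be the source of bugs:

srv, db, _ := StartServer(t, ...)
// vs.
srv := StartServerOnly(t, ...)
db := srv.SQLConn(t, ...)

StartServer obtains db via .ApplicationLayer(), whereas srv.SQLConn() goes through the implicit ApplicationLayerInterface, which currently points to the system layer. The two handles therefore differ whenever a virtual cluster was started.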

The second problem is insufficient test coverage. If the test code only ever uses the implicit ApplicationLayerInterface, it will never be exposed to cluster virtualization via the randomization described above. As a result, it will not exercise the relevant code paths, and cluster virtualization will be undertested.
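Both problems can be reproduced with plain Go struct embedding. In this minimal sketch, all names (appLayer, systemLayer, virtualCluster, testServer, startServer) are hypothetical; it models only the behavior described above: the embedded interface currently resolves to the system layer, while startServer, like StartServer, hands out the SQL handle of .ApplicationLayer().

```go
package main

import "fmt"

// appLayer models ApplicationLayerInterface with a single method.
type appLayer interface {
	SQLConn() string
}

// systemLayer and virtualCluster are two hypothetical implementations.
type systemLayer struct{}

func (systemLayer) SQLConn() string { return "conn-to-system" }

type virtualCluster struct{}

func (virtualCluster) SQLConn() string { return "conn-to-virtual-cluster" }

// testServer embeds appLayer for backward compatibility. The embedded value
// is always the system layer, mirroring the current behavior.
type testServer struct {
	appLayer          // implicit interface: currently the system layer
	app      appLayer // what ApplicationLayer() returns
}

func (s *testServer) ApplicationLayer() appLayer { return s.app }
func (s *testServer) SystemLayer() appLayer      { return s.appLayer }

// startServer mimics StartServer: its SQL handle comes from ApplicationLayer().
func startServer(virtualized bool) (*testServer, string) {
	s := &testServer{appLayer: systemLayer{}, app: systemLayer{}}
	if virtualized {
		s.app = virtualCluster{}
	}
	return s, s.ApplicationLayer().SQLConn()
}

func main() {
	srv, db := startServer(true)
	fmt.Println(db)            // conn-to-virtual-cluster
	fmt.Println(srv.SQLConn()) // conn-to-system: not the same handle
}
```

Note that srv.SQLConn() returns the system handle regardless of the randomization, which is exactly the coverage gap described above.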

Bottom line: use .ApplicationLayer()!

In a later version, we plan to redirect the implicit ApplicationLayerInterface to the result of .ApplicationLayer() automatically, so that this entire section becomes moot. Follow along here: https://github.com/cockroachdb/cockroach/pull/110001