Note: these are a work in progress; check with #technical-leads-council for questions/clarification.

Cluster settings

Appropriateness

Guideline: Consider appropriateness of a cluster setting versus some other configuration mechanism.

Organization and naming for cluster settings

Guideline: A name is composed of three main parts, the middle of which could have sub-parts, joined by dots: 

Example: sql.catalog.descriptor_lease_renewal.cross_validation.enabled

Guideline: Always use a separate suffix to identify the aspect of a behavior being configured, even when it is the only aspect being configured.

Guideline: Use only ASCII lower-case letters and numbers, avoiding any special characters or punctuation other than dot to separate parts of the name and underscore to separate words within a part.

This ensures that setting names can appear as "bare" unquoted identifiers in our SQL grammar, e.g. SET … a.b_c.d = x
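The naming rules above can be checked mechanically. The following is a minimal sketch of such a check, not an existing CockroachDB API: the helper name validSettingName is hypothetical, and the "at least three dot-separated parts" rule is an assumption based on the three main parts mentioned above. It checks only the character set and the separators.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// partRE matches one dot-separated part: lower-case ASCII letters and digits,
// with single underscores separating words inside the part.
var partRE = regexp.MustCompile(`^[a-z0-9]+(_[a-z0-9]+)*$`)

// validSettingName is a hypothetical lint helper. It returns nil when the name
// follows the character and separator rules, and an error describing the first
// violation otherwise.
func validSettingName(name string) error {
	parts := strings.Split(name, ".")
	// Assumption: prefix, middle (possibly with sub-parts) and suffix means
	// at least three dot-separated parts.
	if len(parts) < 3 {
		return fmt.Errorf("%q: want at least 3 dot-separated parts, got %d", name, len(parts))
	}
	for _, p := range parts {
		if !partRE.MatchString(p) {
			return fmt.Errorf("%q: part %q must use only a-z, 0-9 and underscores between words", name, p)
		}
	}
	return nil
}

func main() {
	for _, n := range []string{
		"sql.catalog.descriptor_lease_renewal.cross_validation.enabled",
		"sql.Catalog.enabled", // upper-case letter
		"sql.enabled",         // too few parts
	} {
		fmt.Printf("%s -> %v\n", n, validSettingName(n))
	}
}
```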

Default values

Guideline: Review and adjust defaults to be appropriate out of the box.

The more settings each cluster sets, the harder clusters are to support: their behaviors become increasingly unique and dependent on who set them up, which doc or guide they followed, on what version, and so on, all of which can complicate subsequent operation or support. If you see docs, customers, or field teams setting a particular setting often, stop and ask why. Then see whether its default can be adjusted, or the behavior reworked with additional smarts/adaptiveness, so that it no longer needs to be set manually.

A setting set via automation should smell like a bug most of the time (except in CC, where we’re OK with custom defaults).

A setting can use a sentinel such as zero or “” for its default and then document that this value triggers some special-case or dynamic behavior instead, for example “0 = GOMAXPROCS” or “0 = no limit”. However, the definition of the default itself should use a constant, not a value derived from an env var, flag, or runtime/compiler value that could differ between nodes (though making the constant metamorphic for testing is allowed and encouraged).
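As an illustration of the sentinel pattern, here is a minimal sketch in plain Go. The names defaultWorkers and effectiveWorkers are hypothetical and do not come from the CockroachDB code base. The default is a constant that is identical on every node, and the node-dependent value is resolved only where the setting is read.

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultWorkers is the default for a hypothetical setting. It is a plain
// constant, so every node registers the same default. 0 = GOMAXPROCS.
const defaultWorkers = 0

// effectiveWorkers resolves the sentinel at the point of use. The
// node-dependent value never leaks into the definition of the default.
func effectiveWorkers(configured int) int {
	if configured == 0 {
		// Calling runtime.GOMAXPROCS with 0 queries the current value
		// without changing it.
		return runtime.GOMAXPROCS(0)
	}
	return configured
}

func main() {
	fmt.Println("configured:", defaultWorkers, "effective:", effectiveWorkers(defaultWorkers))
	fmt.Println("configured:", 8, "effective:", effectiveWorkers(8))
}
```

The anti-pattern this avoids would be registering runtime.GOMAXPROCS(0) itself as the default, which could give each node a different default.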

Visibility

Guideline: Do not make a setting “public” unless:

It is okay to add a setting without doing the above so long as it remains non-public.

Guideline: Tread carefully around unsafe configuration. Use “unsafe” in the name AND description / help texts.

This applies to settings that are known to have the potential to lead to data loss or corruption.

CLI configuration for server commands

In server commands (cockroach start), we use a combination of CLI flags and environment variables for knobs that either:

There are two general categories described in the following sub-sections.

User-visible CLI configuration

We generally prefer command-line flags for user-visible configuration.

User-visible CLI configuration always applies according to a common schema:

Guideline: Ensure any addition or change to user-visible CLI configuration is documented in release notes and has a documentation follow-up project.

Guideline: Use descriptive names for CLI flags that pertain to the mechanism, not the use case. For example, we use the flag name --clock-device to make CockroachDB work with VMware PTP clocks, not --vmware-ptp-device, because the mechanism is more generic than the use case.

Guideline: Don’t define CLI flags such that the user must pass PII or secrets as the flag’s value: CLI flags can be inspected from other unprivileged processes on the same machine. In those use cases that require it, make the CLI flag point to a file path and load the PII/secret from there.
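A minimal sketch of the file-path pattern, using only the Go standard library. The flag name --api-token-file and the helper names are made up for illustration and are not existing CockroachDB flags or functions.

```go
package main

import (
	"bytes"
	"errors"
	"flag"
	"fmt"
	"os"
)

// parseSecret strips trailing newlines (editors often add one) and rejects
// an empty secret.
func parseSecret(raw []byte) (string, error) {
	s := bytes.TrimRight(raw, "\r\n")
	if len(s) == 0 {
		return "", errors.New("secret file is empty")
	}
	return string(s), nil
}

// loadSecretFromFile reads the secret from the file named on the command
// line, so that only the path appears in the process arguments.
func loadSecretFromFile(path string) (string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("reading secret file: %w", err)
	}
	return parseSecret(raw)
}

func main() {
	// Hypothetical flag: it carries a path, never the secret itself.
	tokenFile := flag.String("api-token-file", "", "path to a file containing the API token")
	flag.Parse()
	if *tokenFile == "" {
		fmt.Println("no --api-token-file given; nothing to load")
		return
	}
	secret, err := loadSecretFromFile(*tokenFile)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Report only that the secret was loaded; do not print it.
	fmt.Printf("loaded secret (%d bytes)\n", len(secret))
}
```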

Guideline: Use env var aliases for user-visible CLI server flags extremely sparingly. Currently only 4 CLI flags have env var aliases, mostly for historical purposes. We should shy away from using env vars for CLI server configuration.

Guideline: Tread carefully around CockroachDB version upgrades.

A user may have built automation that embeds specific CLI flags and env vars. During an upgrade, they will use the same automation to run both previous and new version nodes. Therefore:

Guideline: Tread carefully around unsafe configuration. Use “unsafe” in the name AND description / help texts.

Guideline: If the same aspect of a behavior must be controllable by both a cluster setting and a per-node flag/env var, the flag/env var should override the cluster setting.

Ad-hoc back-end configuration overrides

We use environment variables for server configuration that is ad-hoc to specific deployments, that is, configuration so specific that very few users will ever need to change it. This includes:

Like other CLI config for server commands, env vars are only used for behavior that can differ between nodes, or that needs to apply before a cluster is fully initialized.

Guideline: Ensure that the definition of env vars in code has detailed documentation next to it that explains its impact.

Guideline: Don’t define env vars such that the user must pass PII or secrets as the variable’s value: env vars can be inspected from other unprivileged processes on the same machine. In those use cases that require it, make the env var point to a file path and load the PII/secret from there.

CLI configuration for client commands

In client commands (e.g. cockroach node, cockroach sql) we primarily use CLI flags for all configuration: both documented, user-visible configuration and internal, ad-hoc configuration.

The main difference from server commands is that we provide slightly more env var aliases, for client configuration that is expected to be the same across many invocations of client commands and across multiple client commands. For example: COCKROACH_HOST, COCKROACH_PORT, COCKROACH_URL.
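One way to picture an env var alias is as the source of a flag's default value. The sketch below uses the Go standard flag package and is not how the cockroach CLI is actually implemented. The parseHost helper and the "localhost" fallback are assumptions, as is the precedence shown here (an explicit flag wins over the env var).

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseHost resolves the host in this order: explicit --host flag, then the
// COCKROACH_HOST env var alias, then a built-in fallback. getenv is injected
// so the lookup can be exercised without touching the real environment.
func parseHost(args []string, getenv func(string) string) (string, error) {
	def := "localhost" // assumed fallback, for illustration only
	if v := getenv("COCKROACH_HOST"); v != "" {
		def = v
	}
	fs := flag.NewFlagSet("client", flag.ContinueOnError)
	host := fs.String("host", def, "server host to connect to (env: COCKROACH_HOST)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *host, nil
}

func main() {
	host, err := parseHost(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("connecting to", host)
}
```

With this shape, a user can export COCKROACH_HOST once in their shell and still override it for a single invocation with --host.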

Review

TBD