Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Original author: Nikhil Benesch

This document is a long-winded guide to preparing and submitting your first contribution to CockroachDB. It's primarily intended as required reading for new Cockroach Labs engineers, but may prove useful to external contributors too.

Table of Contents
outlinetrue

The development cycle

At the time of this writing, CockroachDB is several hundred thousand lines of Go code, plus a smattering of C++ and TypeScript. This section is a whirlwind tour of what lives where.

...

Since CockroachDB is open source, most of the instructions for hacking on CockroachDB live in this repo, cockroachdb/cockroach. You should start by reading the top level of this section of the wiki

Then, look at our Go (Golang) coding guidelines and the Google Go Code Review guide it links to. These are linked to from the other pages in this wiki, but they're easy to miss. If you haven't written any Go before CockroachDB, you may want to hold off on reviewing the style guide until you've written your first few functions in go.

...

Here's what's in each top-level directory in this repository:

  • build/ Build support scripts. You'll likely only need to interact with [build/builder.sh]. See "Building on Linux" below.

  • c-deps/ Glue to convince our build system to build non-Go dependencies. At the time of writing, "non-Go dependencies" means C or C++ dependencies. The most important of these is RocksDB, our underlying persistent key-value store.

  • cloud/kubernetes/ Kubernetes configuration to auto-launch CockroachDB clusters.

  • docs/ Documentation for CockroachDB developers. See "Internal documentation" below.

  • monitoring/ Configuration to integrate monitoring frameworks, namely Prometheus and Grafana, with CockroachDB. This configuration powers our internal monitoring dashboard as well.

  • pkg/ First-party Go code. See "Internal documentation" below for details.

  • scripts/ Handy shell scripts that aren't part of the build process. You'll likely interact with scripts/gceworker.sh most, which spins up a personal Linux VM for you to develop on in the GCE cloud.

  • vendor/ A Git submodule that contains the right version of our dependent libraries. For many years, the Go answer to dependency management was "write backwards-compatible code." That strategy is as dangerous as it sounds, so since v1.6, Go will automatically load packages in a subdirectory named vendor, if it exists. See build/README.md for details on how we maintain this folder.

Other important repositories

...

Other important repositories

Besides cockroachdb/cockroach, the cockroachdb GitHub organization is home to several other important open-source components of Cockroach:

  • cockroachdb/docs, which houses the code behind our user-facing documentation at cockroachlabs.com/docs. At the time of writing, our stellar docs team handles essentially all documentation.

  • cockroachdb/examples-go, which contains small, self-contained Go programs that exercise CockroachDB via the PGWire protocol. You're likely to hear most about block_writer, which writes uniformly random values into a table, and photos, which simulates a more-realistic workload of a photo-sharing site, where some photos and users are orders of magnitude more popular. The other example programs are of a similar scope and purpose, but block_writer and photos are deemed important enough to run constantly against our production clusters.

  • cockroachdb/examples-orms, which showcases ORMs a toy API that uses an ORM to prepare its responses in several different languages.

...

The internal documentation that we do have lives in cockroachdb/cockroach/docs. At the time of writing, most of this documentation covers the high-level architecture of the system. Only a few documents hone in on specifics, and even those only cover the features that were found to cause significant developer frustration. For most first-party packages, you'll need to read the source for usage instructions.

Protip: If you prefer Go-generated HTML documentation to reading the source directly, take a look at godoc.org/github.com/cockroachdb/cockroach.

For our For our internal docs, I recommend the following reading order.

...

The one exception to this rule is docs/tech-notes/contexts.md. It's worth learning why so many of our function signatures take a context as their first parameter, like so:

Code Block
func doOperation(ctx context.Context, ...)

Otherwise, plumbing contexts everywhere will feel like a chore with no upsides.

...

It's time to fire up your editor and get to work! The other pages on this wiki do a good job of describing the basic development commands you'll need. In short, you'll use make to drive builds.

To produce a cockroach binary:

$ make build

To run all the tests in ./pkg/rest/of/path:

$ make test PKG=./pkg/rest/of/path

Is there a Go debugger?

Yes! It

Is there a Go debugger?

Yes! It's called delve.

You can use delve on its own, but the easiest way is to use it via Goland. Goland is a Jetbrains IDE for Go. It has built in support for delve, and will "just work" if you click the standard debug buttons, set breakpoints, etc.

...

If you're debugging an issue that spans several nodes, you could consider using our distributed tracing. Ask your Roachmate to either walk you through OpenTracing/LightStep or point you at someone who can.

To use a debugger, you will probably have to run ./dev gen go to generate Go code.

A note on adding a new file

You'll notice that all source files in this repository have a license notification at the top. Be sure to copy this license into any files that you create .(adjust the year to be the current year).

When to submit

Code at Cockroach Labs is not about perfection. It's too easy to misinterpret "build it right" as "build it perfectly," but striving for perfection can lead to over-engineered code that took longer than necessary to write. Especially as a startup that moves quickly, building it right often means building the simplest workable solution, then iterating as necessary. What we try to avoid at Cockroach Labs is the quick, dirty, gross hack—but even that can be acceptable to plug an urgent leak.

...

You have a few options for choosing when to submit:

  1. You can open a PR with an initial prototype

...

Especially on your first PR, you can do everything short of opening a GitHub PR by sending someone a link to your local branch. Alternatively, open a PR against your fork. This won't send an email notification to anyone watching this repository, so it's a good way to let your Roachmate (or the lead reviewer on your project). This works well when you have a reviewer in mind and you want to avoid the onslaught of

...

For small changes where the approach seems obvious, you can open a PR with what you believe to be production-ready or near-production-ready code. As you get more experience with how we develop code, you'll find that larger and larger changesets will begin falling into this category.

PRs are assumed to be production-ready (option 3) unless you say otherwise in the PR description. Be sure to note if your PR is a WIP to save your reviewer time.

...

  1. using the “Draft” feature. (“Create Draft PR” in the Github interface). This won't send an email notification to anyone watching this repository.

  2. For small changes where the approach seems obvious, you can open a PR with what you believe to be production-ready or near-production-ready code. As you get more experience with how we develop code, you'll find that larger and larger changesets will begin falling into this category.

PRs are assumed to be production-ready (option 3) unless you create it as “Draft” or say otherwise in the PR description. Be sure to note if your PR is a WIP to save your reviewer time.

If you find yourself with a PR (git diff --stat) that exceeds roughly 500 line of code, it's time to consider splitting the PR in two, or at least introducing multiple commits. This is impossible for some PRs—especially refactors—but "the feature is only partially complete" should never be a reason to balloon a PR.

...

Fix your style violations

First, read the Go (Golang) coding guidelines again, looking for any style violations. It's easier to remember a style rule once you've violated it.

Then, run our suite of linters:

Code Block
$ 

...

./dev lint

This is not a fast command. On my machine, at the time of writing, it takes about a full minute to run. You can You can instead run

Code Block
$ 

...

./dev lint --short

which omits the slowest linters.

Where to push

In the interest of branch tidiness, we ask that even contributors with the commit bit avoid pushing directly to this contributors should not push directly to the upstream repository. That deserves to be properly called out:

Protip: don't push to cockroachdb/cockroach directly!

Instead, create yourself a fork on GitHub. This will give you a semi-private personal workspace at github.com/YOUR-HANDLE/cockroach. Code you push to your personal fork is assumed to be a work-in-progress (WIP), dangerously broken, and otherwise unfit for consumption. You can view @bdarnell's WIP code, for example, by browsing the branches on his fork, bdarnell/cockroach.

...

It's far more important to give the commit message a descriptive title and message than getting the branch name right.

Split your change into logical chunks

Be kind to other Note that your branch name does not have to be unique among all branch names in the organization.

Split your change into logical chunks

Be kind to other developers, including your future self, by splitting large commits. Each commit should represent one logical change to the codebase with a descriptive message that explains the rationale for the change.This is explained in our PR organization philosophy.

On your first PR, it's worth re-reading the Git Commit Messages page to ensure your commit messages follow the guidelines.

...

Deciding who should review a PR requires context that you just won't have in your first month. For your starter project, your Roachmate is the obvious choice. For little bugs that you fix along the way, ask your Roachmate or your neighbor if there's someone who's obviously maintaining a particular area of the codebase. (E.g., if you had a question about the code that interfaces with RocksDB, I would immediately direct you to @petermattis.)

If you're unable to get a reviewer recommendation, the "Author" listed at the top of the file may be your best bet. You should check that the hardcoded author agrees with the Git history of the file, as the hardcoded author often gets stale. Use you can check a recent author from the git log. Use git log FILE to see all commits that have touched a particular file or directory sorted by most recent first, or use git blame FILE to see the last commit that touched a particular line. The GitHub web UI or a Git plugin for your text editor of choice can also get the job done. You might also try scripts/authors.sh FILE, which will show you a list of contributors who have touched the file, sorted by the number of commits by that author that touched the file. If there's a clear winner, ask them for a review; if they're not the right reviewer, they'll suggest someone else who is. In cases that are a bit less clear, you may want to note in a comment on GitHub that you're not confident they're the right reviewer and ask if they can suggest someone better.

If you're still unsure, just ask in chat! You'll usually get a response in under a minute.Slack!

It's important to have a tough reviewer. In the two months I've been here, I've seen a few PRs land that absolutely shouldn't have landed, each time because the person who understood why the PR was a bad idea wasn't looped into the review. There's no shame in this—it happens! I say this merely as a reminder that a tough review is a net positive, as it can save a lot of pain down the road.

Also, note that GitHub allows for both "assignees" and "reviewers" on a pull request. It's not entirely clear what the distinction between these fields is, and we're currently split at Cockroach Labs on what we prefer. Choose the field you like most and move on. Ever since GitHub began auto-suggesting reviewers, there seems to be a slight preference for using the reviewer field instead of the assignee fieldIn general, prefer to assign a reviewer as “reviewer” instead of “assignee”.

Write your PR description

Nearly everything you're tempted to put in a PR description belongs in a Git commit message. (As an extension, nearly everything you put in a commit message belongs in code comments too.)

...

Otherwise, something simple like "see commit messages for details" is good enough. On PRs with just one commit, GitHub will automatically include that one commit's message in the description; it's totally fine to leave that in the PR description.

TeamCity

We run our own continuous integration server called TeamCity, which runs unit tests and acceptance tests against all open pull requests.

GitHub displays the status of the latest TeamCity run at the bottom of every pull request. You can click the "Details" link to get insight into a failed build or view real-time status of an in-progress build. Occasionally, a build will fail because of flaky tests. You can verify by running a new build and seeing if the problem disappears; just hit the "Run" button on the top right of the page GitHub links to queue a new build. Less frequently, TeamCity will entirely fail to notice a PR, and GitHub will display "waiting for status to be reported" forever. The "TeamCity Continuous Integration" wiki page describes how to fix this.

Unfortunately, a full tutorial on TeamCity is outside the scope of this document. Ask your Roachmate for an overview if you're confused, or post specific questions in #teamcity in slack.

That's it! You're done. Sit back and get ready for the impending code review.

The impending code review

We take code reviews very seriously at Cockroach Labs. Please don't be deterred if you feel like you've received some hefty feedback. That's completely normal and expected—and, if you're an external contributor, we very much appreciate your contribution!

...

CI

Monitor your PR statuses for CI jobs labeled “GitHub Actions Essential CI”. Each of these jobs runs a set of checks on your PR, like unit tests, lints, or integration tests. Check out the /wiki/spaces/devinf/pages/3418980354 for tips on how to navigate CI and address failures.

Note: Older release branches (release-23.2 and prior) use TeamCity for CI. If you’re confused, check which status checks are considered to be “Required” by GitHub.

That's it! You're done. Sit back and get ready for the impending code review.

The code review

Code reviews are one of the main ways for Cockroach Labs to guarantee quality in the product. We use them to ensure that all code written meets our strict standards for production-ready code.

A code review is like a discussion. The reviewer can have questions, and can even sometimes request the author to write a deeper justification for changes. The goal of that discussion is to end up with the highest quality product, and create a track record of why and how a change occurred, so we can later look back and understand the choices.

No one is expected to write perfect code on the first try. That's why we have code reviews in the first place!

By humans for humans

Probably everyone has at some time had an unpleasant interaction during a code review, and with code reviews being by humans for humans, one or both of these humans could be having a bad day. As if that weren't enough already, text-based communication is fraught with miscommunication. Consider being a minimally nice maintainer), reflect upon mistakes that will inevitably be made, and take inspiration from the constructive and friendly reviews that are the daily staple of our repository.

For more details see For more information on what you might see as part of a code review, see What to expect when you’re expecting (someone to review your code at CRL).

As you advance in your work on CockroachDB, you’ll inevitably be asked to review someone else’s code. For more information on what’s expected of reviewers, see Working as a reviewer.

Reviewable

Except for the smallest of PRs, we many teams eschew GitHub reviews in favor of a third-party app called Reviewable (demo video).

Reviewable has an incredibly dense UI: you could probably master a new musical instrument faster than you could master Reviewable.Still, Reviewable is far better ; still, some teams have decided that Reviewable offers a better UI than GitHub for large changes that undergo multiple rounds of feedback. The basic model is the same: a diff of your change where your reviewers can leave inline notes. In Reviewable, unlike GitHub, every comment spawns a new thread of discussion, and the UI will highlight any threads that you haven't acknowledged/responded to.

Reviewable also takes great pains to handle force pushes. When you push a new revision, Reviewable will do its best to match up comment threads to the logically-equivalent location in the new diff. Even if this algorithm fails, Reviewable records each force push as a "revision," and will allow you and your reviewers to track how your PR has evolved as you address review feedbackand your reviewers to track how your PR has evolved as you address review feedback. GitHub’s built-in tooling does not support this and force-pushes will often cause previous iterations of the code to be “lost”.

You won't need to do anything special to enable Reviewable for your PR; one of our bots will be along shortly after the PR is opened to post a link to the review interfaceinterface.

Check with someone on your team if to see what your team prefers, but even if your team does not use reviewable for sending each other reviews in your day to day work, you will still want to be familiar with it if for reviews on changes that cross team boundaries.

Addressing feedback

If someone leaves line comments on your PR without leaving a top-level "looks good to me" (LGTM), it means they feel you should address their line comments before merging. That is, without an LGTM, all comments are roughly in the discussing disposition, even if the Reviewable disposition marker in the bottom right corner hasn't been set as such. If you agree with their feedback, you should make the requested change; if you disagree, you must leave a comment with your counterargument and wait to hear back.

If someone leaves a top-level LGTM/accepts your PR using Github but also leaves line comments, you can assume all line comments are in the satisfied disposition, even if the Reviewable toggle hasn't been set as such. If you disagree with any of their feedback, you should still let them know (e.g., "actually, I think my way is better because reasons"), but you don't need to wait to hear back from them to merge.

...

Rarely, someone will express a sentiment like "I feel very strongly that we shouldn't merge this." Disagreements like these are often easier to resolve outside of Reviewable, via an in-person discussion or via chat.As you gain confidence in Go, you'll find that some of the nitpicky style feedback you get does not make for obviously better code. Don't be afraid to stick to your guns and push back. Much of coding style is subjective. As I was learning Go, though, I found it helpful to try all style suggestions. For the first month, essentially every style suggestion I received made for far more idiomatic Golike these are often easier to resolve outside of Reviewable, via an in-person discussion or via chat.

Reviewers may also have comments about subjective matters, for example what code presentation should be considered idiomatic. When in doubt, if the approach suggested by the reviewer doesn't change functionality, it may be a learning opportunity to follow the suggestion—if the reviewer cared to give the suggestion, probably someone else would trip up in the same way too. However, remember that a review is also a two-way discussion and you should feel free to ask your own questions to the reviewer, ask them to motivate their suggestions, and you can also weigh their considerations against your own. Sometimes, an appropriate resolution is to copy a summary of a conversation during the review into a comment inside the source code.

Once you've accumulated a set of changes you need to make to address review feedback, it's time to force-push a new revision. Be sure to amend your prior commits—you don't want to tack on a bunch of fixup commits to address review feedback. If you've split your PR into multiple commits, it can be tricky to ensure your review feedback changes make it into the right commit. A tutorial on how to do so is outside the scope of this document, but the magic keywords to Google for are "git interactive rebase."

...

These are the acronyms you'll frequently encounter during code reviews.

  • CR, "code review"

  • PR, "pull request"—you probably knew that one.

  • PTAL, "please take another look"

  • RFAL, "ready for another look"—as in, I made some changes, PTAL.

  • LGTM, "looks good to me"—i.e., ship it!

  • woman_scientistImage ModifieddogImage Modified

    , "science dog"—i.e., I have no idea what this code does.

  • TF[YT]R, "thanks for your/the review"

A list of even more Cockroach jargon lives on this repository's wiki.

...

To merge code into any CockroachDB repository, you need at least one LGTM (or Github PR approval). Some LGTMs matter more than others; you want to make sure you solicit an LGTM from whoever "owns" the code you're touching.

...

When your PR is ready to go, request a merge from our build bot Craig. Craig is a Bors merge bot, whose only job is to be gatekeeper to the repository. Add a comment on the PR of the form bors r=<reviewer>, replacing <reviewer> with the GitHub username of the person who gave you the LGTM. Once approved, your PR will be batched up with other PRs approved around the same time. Craig will try to build them as though they were merged to the target branch, and if successful, will merge them. This limits your exposure by ensuring you don't accidentally merge a commit with an obviously broken build or merge skew. (If Craig says "Permission denied", see the FAQ.)We call merging locally and pushing to master the "nuclear option" or "god merging." It's not disabled, but you shouldn't do it unless the repo is on fire.

Congratulations on landing your first PR! It only gets easier from here.

...

Here's a checklist of action items to keep you sane: