[draft] SQL Code Lab: Add the STEP statement modifier

Hello, intrepid hackers! This code lab is designed to help you learn some of the ins and outs of extending various parts of CockroachDB's SQL layer. It's made for new and old employees alike. It will require several (UPDATE 2021: probably many! SQL has gotten more complex since this was written, and as such, this lab will take more time than originally expected) hours of your time to complete, start to finish.

This lab will wend its way through all of the major parts of CockroachDB's SQL subsystem, touching the lexer, parser, AST, optimizer, plan tree, distributed runtime, as well as the column-oriented runtime.

By the time you complete it, you will understand the basic programming APIs of many of these subsystems, as well as how they fit together. Buckle up - we're about to write an entirely new (and plausibly useful) relational operator from scratch! It's called STEP.

Problem Statement: Implement STEP

The ultimate goal of this lab will be to add a completely novel SQL statement modifier, called `STEP n`, that changes a query to return every nth row. Here's what it would look like, from the perspective of cockroach demo:

[16:40]% ./cockroach demo startrek
root@> SELECT * FROM episodes ORDER BY season ASC, num ASC LIMIT 5 OFFSET 5 STEP 2;
  id | season | num |             title              | stardate
   6 |      1 |   6 | Mudd's Women                   |   1329.8
   8 |      1 |   8 | Miri                           |   2713.5
  10 |      1 |  10 | The Corbomite Maneuver         |   1512.2
  12 |      1 |  12 | The Menagerie, Part II         |   3013.1
  14 |      1 |  14 | Balance of Terror              |   1709.2
(5 rows)

For obvious contrast, here's what it looks like today:

root@> SELECT * FROM episodes ORDER BY season ASC, num ASC LIMIT 5 OFFSET 5 STEP 2;
invalid syntax: statement ignored: syntax error at or near "step"
DETAIL: source SQL:

If you've played with SQL before, you'll notice that our brand new STEP operator is similar to LIMIT and OFFSET - it comes at the end of a statement and modifies which rows of that statement are returned. This is a good thing! It means we can follow some of the patterns of how LIMIT and OFFSET are implemented throughout the codebase, and that's exactly what we'll be doing during this lab.

Parsing: Adding STEP to the Grammar and AST

The first thing we need to do is add our new keyword to the grammar that CockroachDB uses to parse statements. The lack of this keyword is what's causing the error message we saw above: a syntax error! We get a syntax error here because, until now, STEP wasn't a keyword that CockroachDB understood at all.

CockroachDB's grammar is a YACC grammar. It's defined in pkg/sql/parser/sql.y, so go ahead and open that file up in your editor - I'll wait.

If you haven't seen a YACC grammar before, don't worry! The file is certainly large, but it's nothing you can't wrap your head around. If you like, you can go ahead and read up about the basics of YACC parsers. https://www.cs.utexas.edu/users/novak/yaccpaper.htm is one intro that's pretty good. But rest assured that it's not important to master YACC to finish this code lab, or to do most work in CockroachDB's SQL layer.

Like we discussed, we're going to be following the pattern of LIMIT and OFFSET, so go ahead and search for those keywords in the file. Do you see what's going on? We've defined LIMIT and OFFSET as tokens, or parseable words, added rules for them, defining where the tokens can go in a statement, and finally associated those rules with actions, telling the YACC parser generator what code to generate when it sees them. For offset, the rule looks like this:

  OFFSET a_expr
    $$.val = &tree.Limit{Offset: $2.expr()}
  // SQL:2008 syntax
  // The trailing ROW/ROWS in this case prevent the full expression
  // syntax. c_expr is the best we can do.
| OFFSET c_expr row_or_rows
    $$.val = &tree.Limit{Offset: $2.expr()}

This is saying that an offset_clause can either be the word OFFSET followed by an a_expr, the rule for a general relational expression in this grammar, or the word OFFSET followed by a c_expr (a restricted form of a_expr) and row_or_rows, a rule that expands to either the word ROW or the word ROWS, a testament to just how flexible SQL's syntax can sometimes be.


  • $$ represents the output value of the current rule, $1 the value of the first symbol in the current rule, $2 the value of the second symbol, etc.

  • STEP only needs to be added to SELECT statements, not DELETE or UPDATE.
  • Remember to add the STEP keyword to one of the keyword category lists at the end of the grammar (this has to be done whenever a new keyword is added).

Type-Checking: Ensuring STEP's arguments are typed as integers

Planning: Adding STEP to the Optimizer, so that it can produce the correct query plan

Execution: Implementing the STEP algorithm in a DistSQL Processor

Now, it's time to implement the STEP algorithm in a DistSQL processor. DistSQL is what we call the row-at-a-time, distributed SQL execution in CockroachDB. Implementing a DistSQL processor for an relational operator, like STEP, allows the system to schedule SQL execution on a node that's closest to where the data lives.

LIMIT and OFFSET are both somewhat special in DistSQL - they're implemented as part of the post-process spec for a processor. Post-processing is a phase that does rendering (calculation of scalar expressions), projections (shuffling or removing of columns) and limit and offset calculations. Because of this, we won't be following the LIMIT and OFFSET template too closely.

To implement STEP, you should try to copy the noopProcessor, a simple DistSQL processor that simply passes rows unmodified from its parent, and modify it to implement the algorithm. The noopProcessr is located in pkg/sql/distsqlrun/noop.go. As you can see, the basic loop of a DistSQL processor is implemented in its Next method - and that's the method we'll need to copy, with a few modifications, to implement STEP.

Go ahead and make a new file in distsqlrun called step.go. You can copy what's in noop.go to start, renaming the noopProcessor to stepProcessor. You'll also need to extend the constructor to include the argument to STEP - for now, it's okay to just make it an integer. Later, when we implement DistSQL physical planning for STEP, we'll change that integer to be a full protobuf message called  StepSpec, but there's no need to do that for now.

Now, implement the algorithm! You'll need to add a little bit of state, since Next is called once per output row, like a generator. You can put the state that you need into the stepProcessor struct you added.

Testing the Processor

How'd it go? The STEP algorithm is relatively simple, but as we all know, writing software is hard! We should write some unit tests for our new processor. To unit test the processor, add a new file called step_test.go in the same distsqlrun directory that you put step.go into. Write some tests there, following the model of another processor test suite like distinct_test.go. Try to think of some corner cases (what if k is bigger than the total number of rows in the input? Smaller? What if the input has no rows? etc.) and do your best to thoroughly test the processor.

Benchmarking the Processor

No processor implementation is complete without a microbenchmark - an isolated test that evaluates the throughput of the processor when acting on in-memory data. The purpose of such a test is to gather the "speed-of-light" throughput for a processor - in other words, the amount of data that it can process on a single thread without any overhead from other processors, network or disk latency.

For this, you should copy and modify the example benchmark in noop_test.go . Add benchmarks with a couple of different values of k, so it's possible to reason about what k might do to the performance of our processor.

Columnar Execution: Implementing the STEP algorithm in a columnar operator

Next, let's add an implementation of STEP as a columnar operator in the pkg/sql/exec package. You might be wondering - why do we need to add two implementations of STEP? The reason is that (as of Jan 2019) the columnar execution engine is still an experimental feature, so it's implemented as a wholly separate engine to the default DistSQL engine. The columnar model does have the potential to make operators much more efficient, though, so we'll implement STEP in this model to showcase the potential speed increases.

We'll implement STEP in a new file in pkg/sql/exec/, the directory that contains the columnar execution engine, called (surprise!) step.go. In the columnar model, OFFSET and LIMIT are implemented as normal operators, so you can clone them to start. Go ahead and start by cloning limit.go, and changing limitOp into stepOp instead.

You'll notice that the structure of limitOp is somewhat similar to the DistSQL processor we wrote earlier - it consumes something from a parent operator, does some work, and returns something else. The difference is that, instead of consuming rows one at a time from a parent, and returning a single row per invocation, the operators in exec consume something called a ColBatch and return another ColBatch. The idea here is that, by processing a batch of rows at a time, oriented by column, we can waste a lot less processor time doing repeated interface lookups, type switches, and method calls. If you're not yet convinced, that's okay! Hopefully, you'll become more convinced after we write and run a benchmark for our new operator.

Go ahead and implement the operator now. One important detail that you'll need to learn about is the selection vector, which is how we filter out rows of data in the columnar model. The selection vector is briefly documented on the ColBatch interface, but the idea is that, since we're processing a batch of data at once, it would be great if we could avoid having to mutate that batch if we're just filtering out rows - preventing an O(n) re-ordering of the data if we decided to remove a row in the middle of the batch. We get around this by writing to this selection vector - a densely-packed vector of integers that contains the indexes of the rows that have not been filtered out already.

You'll also need to have a switch in your operator's Next method, unlike limit, that does something different depending on whether there was a selection vector set by a previous operator or not.

Finally, you'll also need to keep some state, like before, to keep track of the sequence number of the rows you've seen before across batches - go ahead and stick that state into stepOp.


Just like before, please write some tests for your new operator, following the framework example seen in limit_test.go. The test framework does some nice stuff for us like automatically test all of your cases with several batch sizes, to avoid bugs caused by inconvenient batch sizes.


Now, finally, it's time for some payoff! Write a benchmark for the operator, copying the example in distinct_test.go. Running it and comparing to the benchmark you made earlier should demonstrate some of the potential gains that the columnar model makes available. Note that the gains in this comparison are limited to what you get by batching data and using the selection vector - we don't get any performance gains from the power of type templating to avoid type switches, since STEP never has to introspect any columns.

Copyright (C) Cockroach Labs.
Attention: This documentation is provided on an "as is" basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose.