Database Background

This list is a compilation of readings that are valuable for a general understanding of how Cockroach operates. It is extensive (but not exhaustive), so don't feel you need to read everything here; it's provided as a way to drill down into topics you find interesting. The entries in each section are roughly organized in recommended order of consumption, but this is not a strict ordering.




Admission Control




SQL Execution

  • Volcano - An Extensible and Parallel Query Evaluation System (1994) (Graefe):

    • Introduces the general (iterator-based) execution model that the planNode and DistSQL engines in Cockroach use. Start here if you know nothing about how SQL statements are executed.

  • MonetDB (MIL Primitives for querying a fragmented world) (1999) (Boncz, Kersten):

    • This paper discusses MonetDB, a database system that uses column-at-a-time processing: each operator processes entire columns at once and materializes its full result, with no batching.

  • MonetDB/X100: Hyper-Pipelining Query Execution (2005) (Boncz, Zukowski, Nes):

    • This paper is the primary source of the idea to use batched, templated, column-at-a-time execution to avoid the interpretation and type-lookup overhead inherent in the Volcano model. CockroachDB's nascent vectorized execution engine follows the ideas in this paper closely.

    • This paper is really important! Read it if you're interested in CockroachDB's exec package and vectorized execution.

    • It's co-authored by Marcin Zukowski, who later co-founded Snowflake.

  • Everything you always wanted to know about compiled and vectorized queries but were afraid to ask (2018) (Kersten, Leis, Kemper, Neumann, Pavlo, Boncz):

    • A really good intro to vectorized execution and how it compares with JIT-compiled systems like HyPer or MemSQL. Andy Pavlo is a co-author.

  • Balancing vectorized query execution with bandwidth-optimized storage (2009) (Zukowski):

    • Zukowski's PhD thesis, covering the VectorWise database system that grew out of MonetDB/X100.

    • Chapters 4, 5, and 6 are especially relevant to CockroachDB.

  • The Design and Implementation of Modern Column-Oriented Database Systems (2012) (Abadi, Boncz, ...):

    • A massive survey paper. Good, but there's a lot of information in it.

  • Rethinking SIMD Vectorization for In-Memory Databases (2015) (Polychroniou, Raghavan, Ross):

    • Interesting material on how to actually use SIMD instructions in vectorized execution.

  • DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing (2008) (Zukowski, Nes, Boncz)
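
To make the contrast between the Volcano model and MonetDB/X100-style vectorized execution concrete, here is a minimal sketch. It is written in Python for brevity and says nothing about how CockroachDB's Go engines are actually implemented. It evaluates the hypothetical query SELECT sum(b) FROM t WHERE a > threshold twice. The first version uses pull-based operators that each return one row per next() call over row-oriented (NSM) data. The second version works over column-oriented (DSM) batches and uses a selection vector. All names (Scan, Filter, volcano_sum, vectorized_sum, select_gt, BATCH_SIZE) are illustrative assumptions, not CockroachDB APIs.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[int, int]  # (a, b) for a hypothetical table t(a, b)


# ---- Volcano model: every operator returns one tuple per next() call ----
class Scan:
    """Leaf operator over an in-memory row store (NSM layout)."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    """Pulls rows from its child and passes through those matching pred."""

    def __init__(self, child: Scan, pred: Callable[[Row], bool]) -> None:
        self.child = child
        self.pred = pred

    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.pred(row):  # one interpreted predicate call per tuple
                return row
        return None


def volcano_sum(rows: List[Row], threshold: int) -> int:
    """SELECT sum(b) FROM t WHERE a > threshold, one tuple at a time."""
    plan = Filter(Scan(rows), lambda r: r[0] > threshold)
    total = 0
    while (row := plan.next()) is not None:
        total += row[1]
    return total


# ---- X100-style vectorized model: operators work on column batches ----
BATCH_SIZE = 4  # tiny for illustration; real engines use much larger batches


def column_batches(
    col_a: List[int], col_b: List[int]
) -> Iterator[Tuple[List[int], List[int]]]:
    """Splits column-oriented (DSM) data into fixed-size batches."""
    for start in range(0, len(col_a), BATCH_SIZE):
        yield col_a[start:start + BATCH_SIZE], col_b[start:start + BATCH_SIZE]


def select_gt(col: List[int], threshold: int) -> List[int]:
    """Returns a selection vector: batch positions where col > threshold."""
    return [i for i, v in enumerate(col) if v > threshold]


def sum_selected(col: List[int], sel: List[int]) -> int:
    """Sums only the positions listed in the selection vector."""
    return sum(col[i] for i in sel)


def vectorized_sum(col_a: List[int], col_b: List[int], threshold: int) -> int:
    """Same query, with one call per operator per batch instead of per tuple."""
    total = 0
    for a, b in column_batches(col_a, col_b):
        sel = select_gt(a, threshold)
        total += sum_selected(b, sel)
    return total
```

For example, with rows = [(1, 10), (5, 20), (7, 30), (2, 40), (9, 50)], both volcano_sum(rows, 4) and vectorized_sum([1, 5, 7, 2, 9], [10, 20, 30, 40, 50], 4) return 100. Python won't show any speedup here. The point is the shape of the loops: the vectorized version makes one operator call per batch and does its per-row work in tight loops over a single column. In a compiled language, this is where the X100 approach avoids the per-tuple interpretation overhead of the Volcano model.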

SQL Optimization/Query Planning



This section is important for a slightly unexpected reason: people at Cockroach often talk about concepts in terms of the Google system that introduced them.


Copyright (C) Cockroach Labs.
Attention: This documentation is provided on an "as is" basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose.