This list is a compilation of readings that are valuable to a general understanding of how Cockroach operates. The list is extensive but not exhaustive. Don't feel you need to read everything here; it's provided as a way to drill down into topics you find interesting. The entries in each section are roughly organized in recommended reading order, but this is not a strict ordering.

...

Introduction

General

Storage

Admission Control

Transactions

Linearizability

Consensus

SQL Execution

  • Volcano: https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf

    • The introduction of the general execution model that the planNode and DistSQL engines in Cockroach use. Start here if you know nothing about how SQL statements are executed.

  • MonetDB (MIL Primitives for querying a fragmented world) (1999) (Boncz, Kersten): https://ir.cwi.nl/pub/11183/11183B.pdf

    • This paper discusses MonetDB, a database system that uses column-at-a-time processing. Entire columns are processed at once, with no batching. 

  • MonetDB/X100: Hyper-Pipelining Query Execution (2005) (Boncz, Zukowski, Nes): http://cidrdb.org/cidr2005/papers/P19.pdf

    • This paper is the primary source of the idea to use batched, templated, column-at-a-time execution to avoid the interpretation and type-lookup overhead inherent in the Volcano model. CockroachDB's nascent vectorized execution engine follows the ideas in this paper closely.

    • This paper is really important! Read it if you're interested in CockroachDB's exec package and vectorized execution.

    • Marcin Zukowski, a cofounder of Snowflake, is one of the authors.

  • Everything you always wanted to know about compiled and vectorized queries but were afraid to ask (2018) (Kersten, Leis, Kemper, Neumann, Pavlo, Boncz): http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf

    • A really good introduction to vectorized execution and how it compares with compiled (JIT) systems like HyPer or MemSQL. Andy Pavlo is a co-author.

  • Balancing vectorized query execution with bandwidth-optimized storage (2009) (Zukowski): https://dare.uva.nl/search?identifier=5ccbb60a-38b8-4eeb-858a-e7735dd37487

    • This is Zukowski's PhD thesis, about the VectorWise database system that came out of MonetDB/X100.

    • Chapters 4, 5, and 6 are especially relevant to CockroachDB.

  • The Design and Implementation of Modern Column-Oriented Database Systems (2012) (Abadi, Boncz, ...): http://db.csail.mit.edu/pubs/abadi-column-stores.pdf

    • A massive survey paper. It's good, but there is a lot of information in it.

  • Rethinking SIMD Vectorization for In-Memory Databases (2015) (Polychroniou, Raghavan, Ross): http://www.cs.columbia.edu/~orestis/sigmod15.pdf

    • Interesting stuff about how to actually utilize SIMD for vectorized execution.

  • DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing (2008) (Zukowski, Nes, Boncz): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.1243
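
To make the contrast between these papers concrete, here is a minimal Go sketch of the two execution styles they discuss: a Volcano-style pipeline that pulls one row per Next call, and a batched, column-at-a-time loop in the spirit of MonetDB/X100. Every name in it (Row, RowOperator, rowScan, rowFilter, SumRows, SumBatched) is hypothetical and made up for illustration. None of it is taken from CockroachDB's code, and it only approximates the ideas in the papers.

```go
package main

// Row is one tuple in the row-at-a-time (Volcano-style) model.
type Row []int64

// RowOperator is a hypothetical Volcano-style iterator: each Next call
// returns a single row, or ok=false once the input is exhausted.
type RowOperator interface {
	Next() (row Row, ok bool)
}

// rowScan yields the rows of an in-memory table one at a time.
type rowScan struct {
	rows []Row
	pos  int
}

func (s *rowScan) Next() (Row, bool) {
	if s.pos >= len(s.rows) {
		return nil, false
	}
	r := s.rows[s.pos]
	s.pos++
	return r, true
}

// rowFilter passes through rows whose column col is greater than min.
type rowFilter struct {
	input RowOperator
	col   int
	min   int64
}

func (f *rowFilter) Next() (Row, bool) {
	for {
		r, ok := f.input.Next()
		if !ok {
			return nil, false
		}
		if r[f.col] > f.min {
			return r, true
		}
	}
}

// SumRows drains a row operator and sums column col. Every row costs at
// least one interface call per operator in the pipeline.
func SumRows(op RowOperator, col int) int64 {
	var sum int64
	for r, ok := op.Next(); ok; r, ok = op.Next() {
		sum += r[col]
	}
	return sum
}

// SumBatched computes the same result over columnar data. It walks the
// columns in batches of batchSize values, first building a selection
// vector of matching positions and then summing in a second tight loop.
// It assumes filterCol and sumCol have the same length.
func SumBatched(filterCol, sumCol []int64, min int64, batchSize int) int64 {
	if batchSize <= 0 {
		batchSize = 1024 // arbitrary default for this sketch
	}
	var sum int64
	sel := make([]int, 0, batchSize) // selection vector, reused per batch
	for start := 0; start < len(filterCol); start += batchSize {
		end := start + batchSize
		if end > len(filterCol) {
			end = len(filterCol)
		}
		sel = sel[:0]
		for i, v := range filterCol[start:end] {
			if v > min {
				sel = append(sel, start+i)
			}
		}
		for _, i := range sel {
			sum += sumCol[i]
		}
	}
	return sum
}
```

Both functions answer the same query (sum one column over the rows where another column exceeds a threshold). The row-at-a-time version pays per-row call overhead in every operator, while the batched version amortizes that overhead across a whole batch and runs typed loops over plain slices, which is the trade-off the X100 paper examines.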

SQL Optimization/Query Planning

...

    • This is what our optimizer is based on

...

Systems

This section is surprisingly important because people at Cockroach talk about things in terms of the Google systems that introduced them.

Other