This list compiles readings that are valuable for a general understanding of how Cockroach operates. It is extensive (but not exhaustive), so don't feel you need to read everything here; it's provided as a way to drill down into topics you find interesting, if you so choose. The entries in each section are roughly in recommended reading order, but the ordering is not strict.
...
- Volcano: https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf
- Introduces the general execution model that the planNode and DistSQL engines in Cockroach use. Start here if you know nothing about how SQL statements are executed.
- MonetDB (MIL Primitives for querying a fragmented world) (1999) (Boncz, Kersten): https://ir.cwi.nl/pub/11183/11183B.pdf
- This paper discusses MonetDB, a database system that uses column-at-a-time processing. Entire columns are processed at once, with no batching.
- MonetDB/X100: Hyper-Pipelining Query Execution (2005) (Boncz, Zukowski, Nes) : http://cidrdb.org/cidr2005/papers/P19.pdf
- This paper is the primary source of the idea to use batched, templated, column-at-a-time execution to avoid the interpretation and type-lookup overhead inherent in the Volcano model. CockroachDB's nascent vectorized execution engine follows the ideas in this paper closely.
- This paper is really important! Read it if you're interested in CockroachDB's exec package and vectorized execution.
- It's also co-written by Marcin Zukowski, cofounder of Snowflake.
- Everything you always wanted to know about compiled and vectorized queries but were afraid to ask (2018) (Kersten, Leis, Kemper, Neumann, Pavlo, Boncz): http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf
- Really good intro to vectorized execution and how it compares with JIT (compiled) systems like HyPer or MemSQL. Andy Pavlo is a co-author.
- Balancing vectorized query execution with bandwidth-optimized storage (2009) (Zukowski): https://dare.uva.nl/search?identifier=5ccbb60a-38b8-4eeb-858a-e7735dd37487
- This is Zukowski's PhD thesis on the VectorWise database system, which grew out of MonetDB/X100.
- Chapters 4, 5, and 6 are especially relevant to CockroachDB.
- The Design and Implementation of Modern Column-Oriented Database Systems (2012) (Abadi, Boncz, ...) : http://db.csail.mit.edu/pubs/abadi-column-stores.pdf
- Massive survey paper. Good but a lot of info in there.
- Rethinking SIMD Vectorization for In-Memory Databases (2015) (Polychroniou, Raghavan, Ross): http://www.cs.columbia.edu/~orestis/sigmod15.pdf
- Interesting stuff about how to actually utilize SIMD for vectorized execution.
- DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing (2008) (Zukowski, Nes, Boncz): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.1243
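
To make the contrast between the Volcano and MonetDB/X100 entries above concrete, here is a minimal Go sketch of the two styles. It is an illustration only, not CockroachDB's code: every name in it (RowIterator, BatchIterator, rowFilter, batchFilter, sumRows, sumBatches) is hypothetical, it handles a single int64 column, and the batched filter simply copies surviving values into a reused buffer, which a real engine would likely avoid.

```go
package main

import "fmt"

// RowIterator is a hypothetical Volcano-style operator: every Next call
// returns one row, so interface dispatch is paid once per tuple.
type RowIterator interface {
	Next() (row []int64, ok bool)
}

type rowScan struct {
	rows [][]int64
	pos  int
}

func (s *rowScan) Next() ([]int64, bool) {
	if s.pos >= len(s.rows) {
		return nil, false
	}
	r := s.rows[s.pos]
	s.pos++
	return r, true
}

// rowFilter passes through rows whose column col is greater than min.
type rowFilter struct {
	input RowIterator
	col   int
	min   int64
}

func (f *rowFilter) Next() ([]int64, bool) {
	for {
		r, ok := f.input.Next()
		if !ok {
			return nil, false
		}
		if r[f.col] > f.min {
			return r, true
		}
	}
}

// BatchIterator is a hypothetical vectorized operator: every Next call
// returns a batch of values from one typed column.
type BatchIterator interface {
	Next() (col []int64, ok bool)
}

type batchScan struct {
	col       []int64
	batchSize int
	pos       int
}

func (s *batchScan) Next() ([]int64, bool) {
	if s.pos >= len(s.col) {
		return nil, false
	}
	end := s.pos + s.batchSize
	if end > len(s.col) {
		end = len(s.col)
	}
	b := s.col[s.pos:end]
	s.pos = end
	return b, true
}

// batchFilter keeps values greater than min. The inner loop runs over a
// typed slice with no per-value interface call. The returned slice is
// reused, so it is only valid until the next call to Next.
type batchFilter struct {
	input BatchIterator
	min   int64
	out   []int64
}

func (f *batchFilter) Next() ([]int64, bool) {
	for {
		b, ok := f.input.Next()
		if !ok {
			return nil, false
		}
		f.out = f.out[:0]
		for _, v := range b {
			if v > f.min {
				f.out = append(f.out, v)
			}
		}
		if len(f.out) > 0 {
			return f.out, true
		}
	}
}

// sumRows drains a row iterator and sums column col.
func sumRows(it RowIterator, col int) int64 {
	var total int64
	for r, ok := it.Next(); ok; r, ok = it.Next() {
		total += r[col]
	}
	return total
}

// sumBatches drains a batch iterator and sums every value it yields.
func sumBatches(it BatchIterator) int64 {
	var total int64
	for b, ok := it.Next(); ok; b, ok = it.Next() {
		for _, v := range b {
			total += v
		}
	}
	return total
}

func main() {
	vals := []int64{1, 5, 8, 3, 10}
	rows := make([][]int64, len(vals))
	for i, v := range vals {
		rows[i] = []int64{v}
	}
	// Both plans compute SUM(x) WHERE x > 4 over the same data.
	fmt.Println(sumRows(&rowFilter{input: &rowScan{rows: rows}, min: 4}, 0))
	fmt.Println(sumBatches(&batchFilter{input: &batchScan{col: vals, batchSize: 2}, min: 4}))
}
```

Both plans return the same sum; the difference is that the batched version makes one interface call per batch rather than one per row, which is the overhead the X100 paper targets.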
SQL Optimization/Query Planning
- Index Selection in CockroachDB: https://www.cockroachlabs.com/blog/index-selection-cockroachdb-2/
- Not particularly specific to Cockroach; great as an introduction to index selection
- SQL Query Planning: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171213_sql_query_planning.md
- Andy K on Optimizer: The Story So Far (April 2018): https://www.youtube.com/watch?v=wAfAVv9SFIc
- Andy Pavlo optimizer lectures
- Part 1: https://www.youtube.com/watch?v=qbfPpWnAP-4
- Part 2: https://www.youtube.com/watch?v=m7GxSvdV4NU
- The Cascades Framework for Query Optimization
- This is what our optimizer is based on, though we're not aware of a publicly available PDF.
- Fundamental Techniques for Order Optimization by Simmen et al.
- Explains how to manipulate orders; we're not aware of a publicly available PDF.
- pkg/sql/opt/doc.go: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/doc.go
- Optimization of Analytic Window Functions: http://vldb.org/pvldb/vol5/p1244_yucao_vldb2012.pdf
...
- BigTable: https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
- Spanner: https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
- F1: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41344.pdf
- Colossus/GFS: http://pages.cs.wisc.edu/%7Eremzi/Classes/736/Spring2000/Papers/gfs-sosp2003.pdf
- Online, Asynchronous Schema Change in F1: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41376.pdf
- Lovely discussion of how schema changes in F1/Spanner (and Cockroach) work
- PostgreSQL back-end flowchart: https://www.postgresql.org/developer/backend/
- The internals of PostgreSQL: http://www.interdb.jp/pg/index.html
...