- Created by Justin Jaffray, last modified by Raphael ‘kena’ Poss on Jan 23, 2019
This list is a compilation of readings that are valuable for a general understanding of how Cockroach operates. The list is extensive (but not exhaustive). Don't feel you need to read everything here; it's provided as a way to drill down into topics you find interesting, if you so choose. The entries in each section are roughly organized in recommended reading order, but this is not a strict ordering.
General
- Architecture docs: https://www.cockroachlabs.com/docs/stable/architecture/overview.html
- The hows and whys of a distributed SQL database by Alex: https://www.youtube.com/watch?v=6OFeuNy39Qg
- https://github.com/cockroachlabs/slack-convos
- There have been various productive conversations in the #transactions channel in Slack.
- What’s Really New With NewSQL: https://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf
- A 30k overview of distributed databases: ACID, CAP, NewSQL.pdf
Storage
- Kleppmann - Designing Data Intensive Applications: Chapter 3
- A Brief History of Log Structured Merge Trees: https://ristret.com/s/gnd4yr/brief_history_log_structured_merge_trees
Transactions
- A History of Transaction Histories: https://ristret.com/s/f643zk/history_transaction_histories
- ANSI Critique: https://arxiv.org/pdf/cs/0701157.pdf
- Yabandeh: https://drive.google.com/file/d/0B9GCVTp_FHJIMjJ2U2t6aGpHLTFUVHFnMTRUbnBwc2pLa1RN/edit
- An extremely well-written explanation of optimistic serializable transactions. I think this is one of the best resources for gaining intuition about transactions.
- How CockroachDB Does Distributed, Atomic Transactions: https://www.cockroachlabs.com/blog/how-cockroachdb-distributes-atomic-transactions/
- Serializable, Lockless, Distributed: Isolation in CockroachDB: https://www.cockroachlabs.com/blog/serializable-lockless-distributed-isolation-cockroachdb/
- Understanding Weak Isolation: http://www.bailis.org/blog/understanding-weak-isolation-is-a-serious-problem/
- Read-Only Transaction Anomaly: https://www.cs.umb.edu/~poneil/ROAnom.pdf
- Explanation of a surprising and unintuitive anomaly that can occur in Snapshot Isolation
- What Does Write Skew Look Like: http://justinjaffray.com/what-does-write-skew-look-like/
- A (hopefully) simplified explanation of Fekete 2005.
- Fekete 2005: https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/2009/Papers/p492-fekete.pdf
- Empirical Evaluation: http://www.cs.cmu.edu/~pavlo/papers/p781-wu.pdf
- Mostly relevant to in-memory databases (which CockroachDB is not), but a good overview of the main components of an MVCC system.
- Adya Thesis: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.1760&rep=rep1&type=pdf
- Very dense, formal treatment of isolation levels. Not really recommended unless you're trying to win an argument.
Linearizability
- Strong consistency models: https://aphyr.com/posts/313-strong-consistency-models
- Linearizability vs. Serializability: http://www.bailis.org/blog/linearizability-versus-serializability/
- While I think this is a good post, keep in mind when reading it that linearizability and serializability are mostly orthogonal concepts and I think conflating them in this way is confusing.
- A Mild Generalization of Linearizability: http://justinjaffray.com/a-mild-generalization-of-linearizability/
- Living Without Atomic Clocks: https://www.cockroachlabs.com/blog/living-without-atomic-clocks/
- Written by Spencer, comparing Cockroach's use of synchronized clocks with Spanner's.
- Herlihy+Wing: https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
- The formal genesis of linearizability. I think reading this is confusing because it's actually slightly different from, and much more formal than, how people talk about linearizability in the real world, but I still think it's valuable. It goes off the rails in section 4, though.
Consensus
- Why Consensus: http://justinjaffray.com/why-consensus/
- I feel quite strongly that people should start here. I don't think the problem that consensus solves is adequately motivated in most discussions, which is what I tried to rectify with this.
- Paxos Made Live: http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/paper2-1.pdf
- Describes the process of building a production system on Paxos. Illuminated for me the purpose of consensus in a lot of ways.
- Paxos Made Simple: https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
- In Search of an Understandable Consensus Algorithm: https://raft.github.io/raft.pdf
- Animation: The Secret Lives of Data: http://thesecretlivesofdata.com/raft/
- Scaling Raft: https://www.cockroachlabs.com/blog/scaling-raft/
- Flexible Paxos: Quorum Intersection Revisited: https://arxiv.org/pdf/1608.06696.pdf
SQL Execution
- Volcano: https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf
- The introduction of the general execution model that Cockroach uses. Start here if you know nothing about how SQL statements are executed.
- The Design and Implementation of Modern Column-Oriented Database Systems: http://db.csail.mit.edu/pubs/abadi-column-stores.pdf
SQL Optimization/Query Planning
- Index Selection in CockroachDB: https://www.cockroachlabs.com/blog/index-selection-cockroachdb-2/
- Not especially specific to Cockroach - great as an introduction to index selection
- SQL Query Planning: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171213_sql_query_planning.md
- Andy K on Optimizer: The Story So Far (April 2018): https://www.youtube.com/watch?v=wAfAVv9SFIc
- Andy Pavlo optimizer lectures
- Part 1: https://www.youtube.com/watch?v=qbfPpWnAP-4
- Part 2: https://www.youtube.com/watch?v=m7GxSvdV4NU
- The Cascades Framework for Query Optimization
- Not aware of a PDF, but this is what our optimizer is based on.
- Fundamental Techniques for Order Optimization by Simmen et al.
- Explains how to manipulate orders. Not aware of a publicly available PDF.
- pkg/sql/opt/doc.go: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/doc.go
- Optimization of Analytic Window Functions: http://vldb.org/pvldb/vol5/p1244_yucao_vldb2012.pdf
Systems
This section is important because people at Cockroach often talk about concepts in terms of the Google systems that introduced them.
- BigTable: https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
- Spanner: https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
- F1: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41344.pdf
- Colossus/GFS: http://pages.cs.wisc.edu/%7Eremzi/Classes/736/Spring2000/Papers/gfs-sosp2003.pdf
- Online, Asynchronous Schema Change in F1: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41376.pdf
- Lovely discussion of how the schema changes in F1/Spanner (and Cockroach) work
- The internals of PostgreSQL: http://www.interdb.jp/pg/index.html
Other
- The Promise, and Limitations, of Gossip Protocols: http://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/2007PromiseAndLimitations.pdf