This page outlines the relationship between:
- SQL syntax clauses, from the PostgreSQL dialect
- Algebraic-relational operators, provided by theory
- Nodes in the data structure (memo) used to represent logical plans during optimizations, that we call "relational operators" or "nodes"
They are more diverse than what provided by relational algebra, because they aim to represent everything that's possible in PostgreSQL's dialect.
This can be inspected with EXPLAIN (OPT). - Nodes in the tree used as data interface during optimizations and physical planning.
We aim to remote this data structure entirely at some point. It's not fundamentally needed.
These correspond semantically to a blend between abstract, relational "operators" and specifications of operational semantics which we call "processors" below. They are a historical artifact.
This can be currently inspected with EXPLAIN (PLAN).
We also colloquially call this "the logical plan" although be mindful that the other data structure used for optimizations above is also a logical plan. Say "planNode tree" in technical discussions to disambiguate. - Nodes that represent data processing logic in the physical plan graph, also called "
...
- processors"
This can be inspected with EXPLAIN (DISTSQL).
It might be helpful to look at this to understand what all of the words mean in an throughout various EXPLAIN plan outputs - and help you decode what's an efficient plan and what isn't.
SQL Syntax clause |
---|
Relational Operator |
---|
(planNode, how it appears in EXPLAIN)
Physical Operator
(DistSQL Processor, how it appears in EXPLAIN(DISTSQL))
Logical plan node (opt) | Logical plan node (planNode) | Execution processor | Notes | ||
---|---|---|---|---|---|
FROM <tablename> | atom | Scan | scan, revscan | TableReader | |
IndexJoin | index-join | IndexJoiner | Necessary when an index being selected from doesn't contain all of the columns that the SELECT needs. Jumps back to the primary index to retrieve the other necessary data. | ||
VirtualScan | virtual table | ??? | |||
ROWS FROM(. |
..) FROM srf_builtin() | N/A | project set | project set | ??? | |
WHERE | selection (σ) | Select | filter | (Embedded in every processor) | |
SELECT | projection (Π), rename (ρ) | Project | render | (Embedded in every processor) | |
JOIN FROM a,b WHERE EXISTS | natural join (⋈) anti join (▷) semi join (⋉) | InnerJoin, InnerJoinApply LeftJoin, LeftJoinApply RightJoin, RightJoinApply FullJoin, FullJoinApply SemiJoin, SemiJoinApply AntiJoin, AntiJoinApply | join (or others depending on join elimination and other xforms) | (Translated to either of the joins below depending on circumstances) | The "apply" variants are used as intermediate representation when there is a correlated subquery. |
HashJoiner | Less efficient than merge join - has to buffer an entire side in memory. | ||||
MergeJoin | join w/ specific algorithm | MergeJoiner | Efficient - can "stream rows" from each side. | ||
LookupJoin | join w/ specific algorithm | JoinReader |
ZigzagJoin | join w/ specific algorithm | ZigZagJoiner | |||
ORDER BY | (Embedded in logical plan nodes as required output ordering) | sort | (Translated to either of the sorters below depending on circumstances) | ||
sortAllProcessor | |||||
sortTopKProcessor | |||||
sortChunksProcessor | |||||
GROUP BY | aggregation | GroupBy, ScalarGroupBy | groupby | (Translated to either of the groupers below depending on circumstances) | |
orderedAggregator | |||||
hashAggregator | |||||
UNION | set union (∪) | Union | ??? | ||
EXCEPT | set difference (\) | Except | ??? | ||
INTERSECT | set intersection (∩) | Intersect | ??? | ||
subquery aka (SELECT... ) | N/A | subquery |
N/A | ||||
DISTINCT | Distinct | distinct | Distinct |
DISTINCT ON |
DistinctOn | distinct (with "distinct-on" attribute) | Distinct | |||
OrderedDistinct | |||||
LIMIT | Limit | limit w/ limit field | ??? | ||
OFFSET | Offset | limit w/ offset field | ??? | ||
OVER / PARTITION BY | Not implemented yet | window | windower | ||