Query Engines: Push vs. Pull (2021)


Push-based query engines push results to downstream operators, improving cache efficiency and enabling efficient processing of DAG-shaped plans.
Pull-based query engines use the Volcano (iterator) model, in which consumers drive execution by requesting rows from the operators that feed them.
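To make the contrast concrete, here is a minimal sketch of the same filter-then-map plan written both ways. All names (`scan`, `filter_pull`, `map_push`, `run_pull`, `run_push`, and so on) are hypothetical and chosen for illustration; this is not the code of any particular engine.

```python
# Pull (Volcano/iterator) style: each operator is a generator that its
# consumer drains. Nothing runs until the consumer asks for a row.
def scan(rows):
    for row in rows:
        yield row

def filter_pull(child, pred):
    for row in child:
        if pred(row):
            yield row

def map_pull(child, fn):
    for row in child:
        yield fn(row)

# Push style: each operator is a callback that hands its output rows to a
# downstream callback. The source drives by calling the top of the chain.
def filter_push(pred, downstream):
    def on_row(row):
        if pred(row):
            downstream(row)
    return on_row

def map_push(fn, downstream):
    def on_row(row):
        downstream(fn(row))
    return on_row

def run_pull(rows):
    plan = map_pull(filter_pull(scan(rows), lambda x: x % 2 == 0),
                    lambda x: x * 10)
    return list(plan)  # the consumer pulls rows out of the plan

def run_push(rows):
    out = []
    plan = filter_push(lambda x: x % 2 == 0,
                       map_push(lambda x: x * 10, out.append))
    for row in rows:  # the source pushes rows into the plan
        plan(row)
    return out
```

Both functions compute the same result; the difference is which end of the plan owns the loop.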
In push-based systems, producers do work as data arrives rather than waiting for a consumer to request it, which makes them suitable for streaming systems like Flink or Materialize.
DAG-shaped plans, where one operator feeds several consumers, are handled more efficiently in push-based systems because scheduling and managing the lifetime of rows are simpler.
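A rough sketch of why fan-out is straightforward when pushing: an operator with two consumers simply hands each row to both, so no row has to be buffered until a slower consumer asks for it. The names `fan_out`, `filter_push`, and `run_dag` are hypothetical and used only for illustration.

```python
def filter_push(pred, downstream):
    # Push-style filter: forward a row downstream only if it matches.
    def on_row(row):
        if pred(row):
            downstream(row)
    return on_row

def fan_out(*downstreams):
    # One producer, several consumers: each row is delivered to every
    # consumer immediately and is not retained afterwards.
    def on_row(row):
        for downstream in downstreams:
            downstream(row)
    return on_row

def run_dag(rows):
    evens, large = [], []
    shared = fan_out(
        filter_push(lambda x: x % 2 == 0, evens.append),
        filter_push(lambda x: x > 2, large.append),
    )
    for row in rows:
        shared(row)
    return evens, large
```

In a pull-based plan, two consumers reading from one shared iterator would generally need some buffering (for example, what `itertools.tee` does) so that neither misses rows the other has already consumed.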
Push-based systems naturally unroll into simpler code when compiled, which can improve performance.
Some algorithms, like merge join and LIMIT operators, are more challenging to implement in push-based systems.
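As an illustration of the LIMIT case: when pulling, the operator just stops asking for rows, while a push-based operator needs some way to tell its producer to stop. The sketch below assumes one possible convention, where a callback returns `False` once it wants no more rows; the names `limit_pull`, `limit_push`, and `run_push_limit` are hypothetical.

```python
from itertools import islice

def limit_pull(child, n):
    # Pull: stop requesting rows after n; upstream never runs further.
    return islice(child, n)

def limit_push(n, downstream):
    # Push: the operator cannot stop the producer by itself, so in this
    # sketch it returns False to signal "send no more rows".
    remaining = n
    def on_row(row):
        nonlocal remaining
        if remaining <= 0:
            return False
        downstream(row)
        remaining -= 1
        return remaining > 0
    return on_row

def run_push_limit(rows, n):
    out = []
    produced = 0
    sink = limit_push(n, out.append)
    for row in rows:
        produced += 1
        if not sink(row):  # the producer must check the signal and stop
            break
    return out, produced
```

Without the stop signal, the producer would keep generating rows that the LIMIT operator silently discards.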
Cyclic graphs are nontrivial in both models, but push systems like Naiad and Timely Dataflow have made progress in this area.
Modern analytic systems are increasingly exploring push models, though direct comparisons with pull models are rare.
