Gen4 Tracking #7280

Closed
13 tasks done
systay opened this issue Jan 11, 2021 · 1 comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)

systay commented Jan 11, 2021

This issue is meant to track the ongoing work on the Gen4 planner.

The Gen4 planner is a new planner in Vitess that explores many different join alternatives and uses a little bit of cost analysis to pick the cheapest plan. In contrast, the V3 planner merges and joins tables from left to right, which made it important for the user to list tables in a good order so that the planner could produce an efficient route.

The Gen4 algorithm that we will start implementing is based on the GOO paper (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.737), but the infrastructure for it can be reused for other models.

This is a larger rewrite of the vtgate planner. It introduces new passes and intermediate representations of the query.

The old code used these passes over the query:

| Pass | Struct transformation |
|---|---|
| Parsing | String -> AST |
| Rewriting (normalization) | AST -> AST |
| Planning | AST -> logicalPlan (builder) |
| WireUp | logicalPlan -> engine.Primitive |

This refactored planner now uses the following passes:

| Pass | Struct transformation |
|---|---|
| Parsing | String -> AST |
| Rewriting (normalization) | AST -> AST |
| Semantic Analysis | AST -> AST" |
| Build Operator Tree | AST" -> OperatorTree |
| Optimize/Merge | OperatorTree -> joinTree |
| Horizon Planning | joinTree -> logicalPlan |
| WireUp | logicalPlan -> engine.Primitive |

By splitting the planning process into smaller pieces, each part can be simplified and extended to do more.

Here follows a short description of each new pass.

Semantic Analysis

Responsibilities: Scoping, Binding

Walks the AST and does scoping and binding, so whenever a column name is found, the planner has information about which table is being referenced. Tables are given a TableSet identifier - a bitmask struct that allows the planner to quickly find what dependencies every expression has.
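To make the bitmask idea concrete, here is a minimal sketch of what such an identifier could look like. The type and method names below are illustrative assumptions rather than the planner's actual API, and the sketch uses a plain `uint64` (so at most 64 tables) instead of a struct.

```go
package main

import (
	"fmt"
	"math/bits"
)

// TableSet is a hypothetical bitmask: bit i is set when table i is referenced.
// Assumption for this sketch: a query uses at most 64 tables.
type TableSet uint64

// SingleTable returns the set containing only the table at the given index.
func SingleTable(index int) TableSet { return TableSet(1) << uint(index) }

// Merge returns the union of two sets.
func (ts TableSet) Merge(other TableSet) TableSet { return ts | other }

// IsSolvedBy reports whether every table in ts is also present in other.
func (ts TableSet) IsSolvedBy(other TableSet) bool { return ts&other == ts }

// NumberOfTables counts the tables in the set.
func (ts TableSet) NumberOfTables() int { return bits.OnesCount64(uint64(ts)) }

func main() {
	users, orders, items := SingleTable(0), SingleTable(1), SingleTable(2)

	// The predicate `users.id = orders.user_id` depends on two tables.
	pred := users.Merge(orders)

	fmt.Println(pred.NumberOfTables())
	fmt.Println(pred.IsSolvedBy(users.Merge(orders).Merge(items)))
	fmt.Println(pred.IsSolvedBy(users))
}
```

With this representation, checking whether a sub-plan can evaluate an expression is a single AND and compare, which is what makes dependency lookups cheap.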

Extract Query Graph

Responsibilities: Extract Subqueries, Create Query Graph

The query graph is an intermediate representation that is designed to allow the route planner to quickly consider many different solutions for the query. Instead of keeping the query in the AST, which is limited by its tree structure, we produce a graph-like representation with all used tables (nodes) in one list, and edges between them in a separate list.

In this pass, subqueries are extracted into a list of queries and the relationships between them. This makes it easier for later passes to plan fully without having to switch back and forth between passes - when doing route planning, we can do all of route planning in one go and don't have to wait for SELECT expressions to be considered before planning subqueries used in SELECT expressions.
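As a rough illustration of the node list plus edge list layout described above, the following sketch files each predicate under a node or an edge depending on how many tables it touches. All type and function names here are hypothetical, and subquery extraction is left out.

```go
package main

import "fmt"

// TableSet is a hypothetical bitmask of table indexes (bit i = table i).
type TableSet uint64

// QueryTable is a node: one table used by the query.
type QueryTable struct {
	ID    TableSet
	Name  string
	Preds []string // predicates that touch only this table
}

// JoinEdge is an edge: predicates that connect the tables in Deps.
type JoinEdge struct {
	Deps  TableSet
	Preds []string
}

// QueryGraph keeps nodes and edges in two flat lists.
type QueryGraph struct {
	Tables []QueryTable
	Edges  []JoinEdge
}

// AddPredicate stores a predicate on a node when its dependencies match a
// single table, and otherwise on the edge with the same dependency set.
func (qg *QueryGraph) AddPredicate(deps TableSet, pred string) {
	for i := range qg.Tables {
		if qg.Tables[i].ID == deps {
			qg.Tables[i].Preds = append(qg.Tables[i].Preds, pred)
			return
		}
	}
	for i := range qg.Edges {
		if qg.Edges[i].Deps == deps {
			qg.Edges[i].Preds = append(qg.Edges[i].Preds, pred)
			return
		}
	}
	qg.Edges = append(qg.Edges, JoinEdge{Deps: deps, Preds: []string{pred}})
}

// EdgesBetween returns the edges that touch both a and b and nothing else.
func (qg *QueryGraph) EdgesBetween(a, b TableSet) []JoinEdge {
	var out []JoinEdge
	for _, e := range qg.Edges {
		if e.Deps&(a|b) == e.Deps && e.Deps&a != 0 && e.Deps&b != 0 {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	qg := &QueryGraph{Tables: []QueryTable{
		{ID: 1, Name: "users"},
		{ID: 2, Name: "orders"},
		{ID: 4, Name: "items"},
	}}
	qg.AddPredicate(1, "users.id = 5")
	qg.AddPredicate(1|2, "users.id = orders.user_id")
	qg.AddPredicate(2|4, "orders.item_id = items.id")

	fmt.Println(len(qg.Tables[0].Preds), len(qg.Edges))
	fmt.Println(len(qg.EdgesBetween(1, 2)), len(qg.EdgesBetween(1, 4)))
}
```

Because the edges are not tied to a tree shape, a planner can ask for the predicates between any two groups of tables without caring about the order in which they appeared in the FROM clause.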

Route planning

Responsibilities: Plan how to route the query - plan FROM and WHERE

This pass uses dynamic programming to consider all combinations of tables in order to find the optimal plan. Optimal here means the minimal number of route primitives in the plan.

At the end of this stage, we have a tree structure that represents all the route primitives needed and how they should be joined.
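The following toy sketch shows the shape of such a dynamic program over table subsets, next to a left-to-right join for comparison. It rests on a deliberately simplified assumption made up for this example: two single-route plans can be merged into one route whenever they have the same target string. The `plan`, `join`, `planRoutes` and `leftToRight` names are hypothetical.

```go
package main

import "fmt"

// plan is a hypothetical, simplified plan node.
type plan struct {
	routes int    // number of route primitives in this plan
	target string // set only when the plan is a single route
	desc   string // readable shape of the plan
}

// join combines two plans. Assumption: two single routes with the same target
// merge into one route; anything else becomes a join keeping all routes.
func join(l, r plan) plan {
	if l.routes == 1 && r.routes == 1 && l.target == r.target {
		return plan{routes: 1, target: l.target, desc: l.desc + "+" + r.desc}
	}
	return plan{routes: l.routes + r.routes, desc: "Join(" + l.desc + ", " + r.desc + ")"}
}

// planRoutes finds, for every subset of tables, the plan with the fewest routes.
func planRoutes(names, targets []string) plan {
	n := len(names)
	best := make([]plan, 1<<n)
	for i := 0; i < n; i++ {
		best[1<<i] = plan{routes: 1, target: targets[i], desc: names[i]}
	}
	// Subsets are visited in increasing numeric order, so every proper
	// subset of `set` has already been solved when `set` is reached.
	for set := 1; set < 1<<n; set++ {
		if set&(set-1) == 0 {
			continue // single table, already seeded
		}
		// Enumerate every split of `set` into two non-empty halves.
		for left := (set - 1) & set; left > 0; left = (left - 1) & set {
			cand := join(best[left], best[set^left])
			if best[set].routes == 0 || cand.routes < best[set].routes {
				best[set] = cand
			}
		}
	}
	return best[(1<<n)-1]
}

// leftToRight joins the tables strictly in the order given.
// It expects at least one table.
func leftToRight(names, targets []string) plan {
	p := plan{routes: 1, target: targets[0], desc: names[0]}
	for i := 1; i < len(names); i++ {
		p = join(p, plan{routes: 1, target: targets[i], desc: names[i]})
	}
	return p
}

func main() {
	names := []string{"users", "orders", "items"}
	targets := []string{"ks1", "ks2", "ks1"}

	fmt.Println(leftToRight(names, targets).routes)
	best := planRoutes(names, targets)
	fmt.Println(best.routes, best.desc)
}
```

In this example the left-to-right order ends up with three routes, because `users` and `items` are separated by `orders`, while the subset search pairs `users` with `items` first and needs only two. The search is exponential in the number of tables, so this form is only practical for small joins.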

Horizon planning

Responsibilities: Plan projections, aggregations, grouping and ordering

Once we have a plan for how to route queries, we plan what projections we need from each route, and how to do ORDER BY, GROUP BY, LIMIT and so on.
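One small piece of this, deciding which route should produce a given column, can be sketched as follows. This is an illustrative assumption about how projections might be pushed down, not the planner's real logic; `Route`, `column` and `pushProjection` are hypothetical names.

```go
package main

import "fmt"

// TableSet is a hypothetical bitmask of table indexes (bit i = table i).
type TableSet uint64

// Route is a simplified route: the tables it covers and the columns it returns.
type Route struct {
	Tables  TableSet
	Columns []string
}

// column is an expression together with the tables it depends on.
type column struct {
	Expr string
	Deps TableSet
}

// pushProjection adds the column to the first route that covers all of its
// dependencies and returns that route's index and the column's offset.
// An expression the route already returns is reused instead of added twice.
// ok is false when no single route can produce the column.
func pushProjection(routes []Route, c column) (routeIdx, offset int, ok bool) {
	for i := range routes {
		if c.Deps&routes[i].Tables != c.Deps {
			continue
		}
		for j, existing := range routes[i].Columns {
			if existing == c.Expr {
				return i, j, true
			}
		}
		routes[i].Columns = append(routes[i].Columns, c.Expr)
		return i, len(routes[i].Columns) - 1, true
	}
	return 0, 0, false
}

func main() {
	routes := []Route{{Tables: 1 | 4}, {Tables: 2}}
	cols := []column{
		{Expr: "users.name", Deps: 1},
		{Expr: "orders.total", Deps: 2},
		{Expr: "users.name", Deps: 1}, // e.g. needed again for ORDER BY
	}
	for _, c := range cols {
		r, off, ok := pushProjection(routes, c)
		fmt.Println(c.Expr, r, off, ok)
	}
}
```

Returning the offset lets a later step, such as an ordering or grouping operator, refer to the column by position in the route's result.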

Positive outcomes from this refactoring.

Why do this non-trivial piece of work?

We still have a number of query types that are not supported. To support more queries, we needed to extend the planner. Rather than adding to the legacy planner, which is not very easy to work with, we felt it was time to introduce this new design. It not only allows us to support these queries, it also sets us up to do more optimisations in the future.

Known tasks:

@GuptaManan100

Should we close this issue? It seems very old and redundant now.
