gen4: Faster analysis #8938

Merged
merged 4 commits into from
Oct 6, 2021

systay
Collaborator

@systay systay commented Oct 6, 2021

Description

We profiled the gen4 planner to find out where time was being spent, and it turned out that our analyzer was inefficient in how it examined the AST.

During semantic analysis, we do a number of things, and these things depend on each other. We do:

  • scoping - which tables are available at the point where an expression is used?
  • checking - we check the AST for illegal constructs. A query might pass the parser but still make no sense.
  • expand star - expand * expressions to the columns in the tables. Rewriting the AST while doing analysis is not ideal, but the alternative is worse: because this step introduces columns into the AST, they would need to be bound separately unless the expansion happens before binding.
  • binding - figuring out which table a column is referencing. To do this well, we need the scoper to have done its thing, so we know the current scope and which tables are available.
  • typing - given an expression, we calculate the type we expect the expression to have. This needs columns to already be bound to their table, so we can look up the schema info.

Before, we did this in three passes over the AST:

  1. scoping and checking
  2. rewriting
  3. binding and typing

This PR changes that: all of the analysis and rewriting now happens in a single pass over the AST.
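
To make the difference concrete, here is a minimal sketch of running several steps in one tree walk instead of one walk per step. It is a toy model under stated assumptions, not the code in this PR: `node`, `pass` and `walk` are hypothetical, and the hooks are no-ops so that only the number of node visits is measured.

```go
package main

import "fmt"

// node is a toy AST node.
type node struct {
	name     string
	children []*node
}

// pass is one analysis step with optional hooks that run on the way
// down the tree (before the children) and on the way up (after them).
type pass struct {
	name string
	down func(n *node)
	up   func(n *node)
}

// walk visits every node exactly once and runs the hooks of all given
// passes at each node. It returns the number of nodes visited.
func walk(n *node, passes []pass) int {
	visited := 1
	for _, p := range passes {
		if p.down != nil {
			p.down(n)
		}
	}
	for _, c := range n.children {
		visited += walk(c, passes)
	}
	for _, p := range passes {
		if p.up != nil {
			p.up(n)
		}
	}
	return visited
}

func main() {
	tree := &node{name: "select", children: []*node{
		{name: "from"},
		{name: "where", children: []*node{{name: "col"}}},
	}}

	noop := func(*node) {}
	scoping := pass{name: "scoping and checking", down: noop}
	rewriting := pass{name: "rewriting", up: noop}
	binding := pass{name: "binding and typing", up: noop}

	// Old shape: one full traversal per step.
	separate := 0
	for _, p := range []pass{scoping, rewriting, binding} {
		separate += walk(tree, []pass{p})
	}

	// New shape: all steps share a single traversal.
	combined := walk(tree, []pass{scoping, rewriting, binding})

	fmt.Printf("three walks: %d visits, one walk: %d visits\n", separate, combined)
	// prints "three walks: 12 visits, one walk: 4 visits"
}
```

The order of the hooks inside a single walk still has to respect the dependencies listed earlier: scope information must be in place before a column is bound, and binding must be done before typing.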

Performance-wise, the new analyzer makes the Gen4 planner roughly 12% to 17% faster than with the old analyzer, and semantic analysis on its own about 22% faster. Below is the output of benchstat comparing Gen4's micro-benchmarks using the old and new analyzer.

name                                         old time/op    new time/op    delta
OLTP/gen4-48                                    495µs ± 0%     436µs ± 1%  -11.79%  (p=0.008 n=5+5)
TPCC/gen4-48                                   3.49ms ± 1%    2.97ms ± 0%  -14.86%  (p=0.008 n=5+5)
TPCH/gen4-48                                   10.6ms ± 1%     9.4ms ± 1%  -11.75%  (p=0.008 n=5+5)
Planner/from_cases.txt-gen4-48                 8.67ms ± 1%    7.53ms ± 1%  -13.06%  (p=0.008 n=5+5)
Planner/filter_cases.txt-gen4-48               10.9ms ± 1%     9.2ms ± 2%  -14.80%  (p=0.008 n=5+5)
Planner/large_cases.txt-gen4-48                 385µs ± 0%     340µs ± 0%  -11.90%  (p=0.016 n=5+4)
Planner/aggr_cases.txt-gen4-48                 9.26ms ± 1%    7.94ms ± 2%  -14.20%  (p=0.008 n=5+5)
Planner/select_cases.txt-gen4-48               6.00ms ± 1%    5.00ms ± 1%  -16.76%  (p=0.008 n=5+5)
Planner/union_cases.txt-gen4-48                3.40ms ± 1%    2.82ms ± 1%  -17.11%  (p=0.008 n=5+5)
SemAnalysis-48                                 26.8ms ± 1%    20.9ms ± 2%  -22.27%  (p=0.008 n=5+5)
SelectVsDML/DML_(random_sample,_N=32)-48       1.13ms ± 1%    1.09ms ± 1%   -3.46%  (p=0.008 n=5+5)
SelectVsDML/Select_(random_sample,_N=32)-48    1.50ms ± 1%    1.40ms ± 2%   -6.66%  (p=0.008 n=5+5)

name                                         old alloc/op   new alloc/op   delta
OLTP/gen4-48                                    101kB ± 0%      91kB ± 0%  -10.21%  (p=0.008 n=5+5)
TPCC/gen4-48                                    760kB ± 0%     669kB ± 0%  -11.99%  (p=0.008 n=5+5)
TPCH/gen4-48                                   2.74MB ± 0%    2.55MB ± 0%   -6.90%  (p=0.008 n=5+5)
Planner/from_cases.txt-gen4-48                 2.11MB ± 0%    1.90MB ± 0%   -9.97%  (p=0.008 n=5+5)
Planner/filter_cases.txt-gen4-48               2.65MB ± 0%    2.38MB ± 0%  -10.29%  (p=0.008 n=5+5)
Planner/large_cases.txt-gen4-48                 112kB ± 0%     105kB ± 0%   -6.45%  (p=0.008 n=5+5)
Planner/aggr_cases.txt-gen4-48                 1.72MB ± 0%    1.52MB ± 0%  -12.11%  (p=0.008 n=5+5)
Planner/select_cases.txt-gen4-48               1.27MB ± 0%    1.12MB ± 0%  -12.02%  (p=0.008 n=5+5)
Planner/union_cases.txt-gen4-48                 712kB ± 0%     625kB ± 0%  -12.24%  (p=0.008 n=5+5)
SelectVsDML/DML_(random_sample,_N=32)-48        279kB ± 0%     277kB ± 0%   -0.69%  (p=0.008 n=5+5)
SelectVsDML/Select_(random_sample,_N=32)-48     327kB ± 0%     305kB ± 0%   -6.58%  (p=0.008 n=5+5)

name                                         old allocs/op  new allocs/op  delta
OLTP/gen4-48                                    2.27k ± 0%     1.99k ± 0%  -12.66%  (p=0.008 n=5+5)
TPCC/gen4-48                                    16.9k ± 0%     14.2k ± 0%  -15.57%  (p=0.008 n=5+5)
TPCH/gen4-48                                    41.6k ± 0%     35.8k ± 0%  -13.95%  (p=0.008 n=5+5)
Planner/from_cases.txt-gen4-48                  43.0k ± 0%     36.7k ± 0%  -14.63%  (p=0.008 n=5+5)
Planner/filter_cases.txt-gen4-48                53.8k ± 0%     45.5k ± 0%  -15.42%  (p=0.008 n=5+5)
Planner/large_cases.txt-gen4-48                 1.97k ± 0%     1.75k ± 0%  -11.12%  (p=0.008 n=5+5)
Planner/aggr_cases.txt-gen4-48                  39.2k ± 0%     32.8k ± 0%  -16.12%  (p=0.029 n=4+4)
Planner/select_cases.txt-gen4-48                29.0k ± 0%     24.5k ± 0%  -15.66%  (p=0.008 n=5+5)
Planner/union_cases.txt-gen4-48                 16.9k ± 0%     14.2k ± 0%  -15.71%  (p=0.008 n=5+5)
SelectVsDML/DML_(random_sample,_N=32)-48        4.70k ± 0%     4.61k ± 0%   -1.98%  (p=0.008 n=5+5)
SelectVsDML/Select_(random_sample,_N=32)-48     6.22k ± 0%     5.86k ± 0%   -5.86%  (p=0.008 n=5+5)
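
For readers checking the numbers, the delta column is a relative change from old to new. A minimal sketch of that arithmetic follows; the `delta` helper is hypothetical. Recomputing from the rounded values shown in the tables gives slightly different percentages than benchstat prints, presumably because benchstat works from the unrounded measurements.

```go
package main

import "fmt"

// delta returns the relative change from oldVal to newVal in percent.
// Negative values mean the new measurement is smaller (faster, or fewer allocations).
func delta(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

func main() {
	// Rounded values taken from the select_cases.txt time/op row (6.00ms -> 5.00ms).
	fmt.Printf("%.2f%%\n", delta(6.00, 5.00))
}
```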

Related Issue(s)

#7280

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

systay and others added 3 commits October 5, 2021 19:04
Co-authored-by: Florent Poinsard <florent.poinsard@outlook.fr>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Co-authored-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr>
Co-authored-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr>
Signed-off-by: Andres Taylor <andres@planetscale.com>
@systay systay merged commit e00cf5b into vitessio:main Oct 6, 2021
@systay systay deleted the faster-analysis branch October 6, 2021 15:08
3 participants