-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve python3 performance by adding __slots__ #3017
Conversation
@@ -652,6 +652,7 @@ CaptureNextTokenType(d) ::= "<d.varName> = self._input.LA(1)" | |||
|
|||
StructDecl(struct,ctorAttrs,attrs,getters,dispatchMethods,interfaces,extensionMembers) ::= << | |||
class <struct.name>(<if(contextSuperClass)><contextSuperClass><else>ParserRuleContext<endif>): | |||
__slots__ = ('parser', '__dict__') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding __dict__
here as a catch-all for any other class members (labeled grammar rule members?)
Ideally these would be emitted here too, but I'm not familiar enough with how the parser generator templates work to implement this, so using __dict__
is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be as simple as:
slots = ('parser', <attrs:{a | ''}; separator=",">)
can you try ?then we can leverage your perf improvement measures
@@ -58,7 +63,7 @@ def nextTokensNoContext(self, s:ATNState): | |||
if s.nextTokenWithinRule is not None: | |||
return s.nextTokenWithinRule | |||
s.nextTokenWithinRule = self.nextTokensInContext(s, None) | |||
s.nextTokenWithinRule.readonly = True | |||
s.nextTokenWithinRule.readOnly = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixing what appears to be a capitalization typo. The IntervalSet
class initializes this using camelCase instead of lowercase. Not sure which is preferred since I see both usages in the runtime.
@@ -89,7 +92,6 @@ def copyState(self, simulator:LexerATNSimulator ): | |||
self.startIndex = simulator.startIndex | |||
|
|||
def match(self, input:InputStream , mode:int): | |||
self.match_calls += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing a debug counter that conflicts with the ability to use __slots__
since it was initialized as a class variable but referenced as an instance variable. Nothing else references this.
Hi, looks great but it currently breaks the Travis CI build... |
am I correct in thinking this improvement would also work for Python 2.x? |
Yeah. Slots have been around since early python 2: https://docs.python.org/2.7/reference/datamodel.html#__slots__ |
I've merged this PR to have it part of 4.9.1 being released this weekend. |
Will do! I'll try to prep another PR today |
Summary
Python isn't exactly known for its performance, but this PR is my attempt at squeezing out a few more percent of speedup from the Python3 runtime.
This change adds the
__slots__
variable to most classes in the Antlr runtime in order to explicitly declare class members and avoid the use of the slower__dict__
-based member storage mechanism. This results in faster Python class instance creation, as well as member access.A benchmark I ran showed roughly 10% improvement in performance.
Compatibility considerations
For normal "sane" uses of the Antlr runtime, this change is backwards compatible with the current runtime API and should have no effect on behavior.
The side effect of
__slots__
usage is that one is no longer able to dynamically assign new variables to class instances. There are also some caveats with Python weakref usage. I can't quite imagine why someone may need to stuff their own variables on-top of runtime classes, but then again, you never know what other users will do 😄.This limitation can be removed by adding
__dict__
and__weakref__
back into the__slots__
definition. (possibly at a small performance cost)The current implementation omits this for now.
See python docs for more details: https://docs.python.org/3/reference/datamodel.html#slots
Other changes made
The following other changes were made to correct issues and allow
__slots__
to be used:ATN.py
, fixed capitalization of an assignment to<IntervalSet>.readOnly
IntervalSet
initializes a memberreadOnly
, but this was being assigned asreadonly
inATN.py
LexerATNSimulator.py
, removed thematch_calls
counter.__slots__
.Benchmark
I ran some comparison benchmarks on some sample code from one of my projects. Also including stats using the speedy-antlr-tool accelerator.
Inputs are using the lexer/parser for SystemRDL as well as one of the testcase files. Measured time elapsed for 1000 iterations via timeit.
It isn't much, but the addition of
__slots__
improves performance by roughly 10% in both the baseline Python3 parser, as well as the accelerated version. I expect this also depends somewhat on the grammar used.Other notes
I ran the unit-tests in
runtime/Python3/test/run.py
without issues. Not sure how exhaustive these are.My name is already signed in
contributors.txt
which is why you don't see it in this PR's diff.