
Go runtime performance fixup #2916

Closed

Conversation

dmitrys99

The main performance issue in the Go runtime, compared to the Java runtime, is the absence of a hashCode() method in the Go world. It is emulated with a hash() method, which is called, directly or indirectly, a significant number of times. But instead of returning a precalculated hash, it recomputes the actual hash on every call. This leads to significant performance degradation.
This PR reduces the time spent on hash calculation by roughly a factor of 7.

@dmitrys99
Author

Further investigation showed the real problem is the GC and pointers: https://syslog.ravelin.com/further-dangers-of-large-heaps-in-go-7a267b57d487
So, to eliminate the performance issues we have to rework the Go target runtime and decrease (or remove, if possible) the use of pointers in the runtime data structures.

@exander77

@dmitrys99 Is it in a viable state for testing? I see Golang performance issues as well.

@dmitrys99
Author

> @dmitrys99 Is it in a viable state for testing? I see Golang performance issues as well.

I added several commits; compilation now goes significantly faster.
The issue is that ATNConfig has 3 pointers and the GC goes crazy checking those pointers.
I eliminated one of the 3 pointers, but have no clue how to deal with the other two.

You can try testing with this PR; it does things faster, but not as fast as the Java runtime does.

@exander77

exander77 commented Nov 12, 2020

@dmitrys99 I have tested it and it seems to cut 30% from the parsing time (in my case).

@exander77

@parrt Can this be pulled into master? I can confirm a significant speed-up, and maybe other people can join the effort.

@parrt
Member

parrt commented Nov 12, 2020

@davesisson I think you're the Go guy. Can you take a look?

@exander77

exander77 commented Nov 12, 2020

This pull makes it much better, but if there is a Golang expert, I would like an opinion on why it is spending so much time in the garbage collector. Can the garbage collector be disabled during parsing and the memory cleared once after parsing?

runtime.scanobject itself is 11.02%.

      flat  flat%   sum%        cum   cum%
    14.33s 11.02% 11.02%     27.54s 21.18%  runtime.scanobject
     9.18s  7.06% 18.08%     60.66s 46.65%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork
     7.13s  5.48% 23.56%     23.98s 18.44%  runtime.mallocgc
     4.84s  3.72% 27.28%     25.85s 19.88%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).getEpsilonTarget
     4.69s  3.61% 30.89%      6.54s  5.03%  runtime.findObject
     4.06s  3.12% 34.01%      5.41s  4.16%  runtime.heapBitsSetType
     3.61s  2.78% 36.79%      6.89s  5.30%  runtime.cgocall
     3.25s  2.50% 39.29%      3.25s  2.50%  runtime.memclrNoHeapPointers
     2.37s  1.82% 41.11%      2.40s  1.85%  syscall.Syscall
     2.33s  1.79% 42.90%      2.33s  1.79%  runtime.nanotime (inline)
     2.31s  1.78% 44.68%      2.31s  1.78%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfig).GetState
     2.04s  1.57% 46.25%      2.04s  1.57%  runtime.nextFreeFast
     1.94s  1.49% 47.74%      1.94s  1.49%  runtime.memmove
     1.92s  1.48% 49.22%        13s 10.00%  github.com/antlr/antlr4/runtime/Go/antlr.NewBaseATNConfig
     1.80s  1.38% 50.60%      5.56s  4.28%  runtime.greyobject
     1.75s  1.35% 51.95%      1.75s  1.35%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNState).GetTransitions
     1.63s  1.25% 53.20%     28.95s 22.26%  runtime.gcDrain
     1.60s  1.23% 54.43%      7.73s  5.94%  runtime.mapassign_fast64
     1.55s  1.19% 55.62%      1.79s  1.38%  runtime.heapBitsForAddr (inline)
     1.39s  1.07% 56.69%      1.52s  1.17%  runtime.(*pallocBits).summarize
     1.33s  1.02% 57.71%      5.15s  3.96%  bufio.(*Reader).ReadSlice
     1.33s  1.02% 58.74%      1.33s  1.02%  runtime.markBits.isMarked (inline)
     1.32s  1.02% 59.75%      1.80s  1.38%  runtime.spanOf
     1.31s  1.01% 60.76%     60.66s 46.65%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState
     1.31s  1.01% 61.77%      1.77s  1.36%  runtime.mapaccess1_fast64

@dmitrys99
Author

I tried turning off the GC; unfortunately, it does not help. It consumes about 7 GB of memory on the test case.

@exander77

exander77 commented Nov 12, 2020

@dmitrys99 I think the problem is not the pointers but the large amount of heap allocation.

Each of these calls allocates memory and thus creates another item for the garbage collector to track:

      flat  flat%   sum%        cum   cum%
 8704.53kB 36.09% 36.09%  8704.53kB 36.09%  github.com/antlr/antlr4/runtime/Go/antlr.NewBaseATNConfig
 4608.14kB 19.11% 55.20%  4608.14kB 19.11%  github.com/antlr/antlr4/runtime/Go/antlr.NewBaseSingletonPredictionContext
 3100.11kB 12.85% 68.06%  3100.11kB 12.85%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add

I made a modification like this (basically allocating memory in bulk):

+var prealocatedATNConfigCount = 512
+var prealocatedATNConfigMax = prealocatedATNConfigCount*2*2*2*2
+var prealocatedATNConfigIndex = prealocatedATNConfigCount
+var prealocatedATNConfigArray []BaseATNConfig
+
 func NewBaseATNConfig(c ATNConfig, state int, context PredictionContext, semanticContext SemanticContext) *BaseATNConfig {
 	if semanticContext == nil {
 		panic("semanticContext cannot be nil")
 	}
 
-	return &BaseATNConfig{
+	prealocatedATNConfigIndex++
+
+	if prealocatedATNConfigIndex >= prealocatedATNConfigCount {
+		if prealocatedATNConfigCount < prealocatedATNConfigMax {
+			prealocatedATNConfigCount *= 2
+		}
+		prealocatedATNConfigArray = make([]BaseATNConfig, prealocatedATNConfigCount)
+		prealocatedATNConfigIndex = 0
+	}
+
+	prealocatedATNConfigArray[prealocatedATNConfigIndex] = BaseATNConfig{
 		state:                      state,
 		alt:                        c.GetAlt(),
 		context:                    context,
@@ -101,6 +116,8 @@ func NewBaseATNConfig(c ATNConfig, state int, context PredictionContext, semanti
 		reachesIntoOuterContext:    c.GetReachesIntoOuterContext(),
 		precedenceFilterSuppressed: c.getPrecedenceFilterSuppressed(),
 	}
+
+	return &prealocatedATNConfigArray[prealocatedATNConfigIndex]
 }

And it cut another 25% off the time on top of your patch, but now I leak memory for some reason. It seems prealocatedATNConfigArray does not get garbage collected. Any ideas?

@exander77

Hm, maybe it does not leak but just has a slightly larger memory footprint. You can check whether it works better for you.

@dmitrys99
Author

> Hm, maybe it does not leak but just has a slightly larger memory footprint. You can check whether it works better for you.

Thank you for your attempt to fix the issue!

It might be a solution, yet I see a couple of drawbacks.

1. Such a fix will not work if you try to run several parsers simultaneously, which is true in my case.
2. You get a speedup because, in fact, you do not produce garbage: you store links to all objects, which is fine, but it leads to additional memory consumption.

The real solution should be based on knowledge of the Antlr internals. I have questions which I have no idea how to answer.

1. When we enter the AdaptivePredict procedure, which is the high-level consumer of memory here, can we cache ATNConfigs at this level? We can definitely try, but I do not know whether some of the ATNConfigs "leak" into the ATN network. If that is true, then the whole scheme will fail, or we will have to rework the ATN internals to deal with an additional cache.

2. When is cleanup actually called, and how can it be modelled in Go? C++ uses smart pointers (i.e. destructors); Java, C#, JS and PHP use a generational GC, which does not have the "pointer" issue I mentioned above.

3. What is the scope of cleanup? What exact objects can be dropped during cleanup?

Knowing this will allow us to model a cleanup procedure in Go in a way that is performant yet different from Java's. Since @parrt is on the thread, maybe these questions are for you.

@exander77

> > Hm, maybe it does not leak but just has a slightly larger memory footprint. You can check whether it works better for you.
>
> Thank you for your attempt to fix the issue!
>
> It might be a solution, yet I see a couple of drawbacks.
>
> 1. Such a fix will not work if you try to run several parsers simultaneously, which is true in my case.
> 2. You get a speedup because, in fact, you do not produce garbage: you store links to all objects, which is fine, but it leads to additional memory consumption.

I think it would work with several parsers running in parallel, but nodes from all of them would mix into a single allocated block, and that memory would be released only once every node from every parser in it had been released. This could be reworked so that the memory is allocated per parser and all nodes in a block belong to the same parser. I was basically testing different approaches, so it is not a perfect solution.

> The real solution should be based on knowledge of the Antlr internals. I have questions which I have no idea how to answer.
>
> 1. When we enter the AdaptivePredict procedure, which is the high-level consumer of memory here, can we cache ATNConfigs at this level? We can definitely try, but I do not know whether some of the ATNConfigs "leak" into the ATN network. If that is true, then the whole scheme will fail, or we will have to rework the ATN internals to deal with an additional cache.
> 2. When is cleanup actually called, and how can it be modelled in Go? C++ uses smart pointers (i.e. destructors); Java, C#, JS and PHP use a generational GC, which does not have the "pointer" issue I mentioned above.
> 3. What is the scope of cleanup? What exact objects can be dropped during cleanup?
>
> Knowing this will allow us to model a cleanup procedure in Go in a way that is performant yet different from Java's. Since @parrt is on the thread, maybe these questions are for you.

I agree that knowledge of the Antlr internals and of the Go garbage collector's behaviour is the key here.

@KvanTTT
Member

KvanTTT commented Nov 7, 2021

@jjeffcaii could you take a look at whether this has already been fixed by your improvements, or whether it's a different problem?

@jjeffcaii

> @jjeffcaii could you take a look at whether this has already been fixed by your improvements, or whether it's a different problem?

They look similar, but I'm not sure. I think they both aim to resolve the hash problem; the modification of the murmur hash is the same.

@dmitrys99
Author

I have tested it with 4.9.3. On my side the performance increase is about 2x (21 sec vs 49 sec).

This is a good improvement.

But my previous experience shows it could be 7x. I'll try to combine the two approaches and see.

@dmitrys99
Author

dmitrys99 commented Nov 12, 2021

I have tested the test case from #2888. Here are the times:

| Item | Time, sec |
| --- | --- |
| Original issue (Antlr 4.8) | 37.2 |
| This MR (Antlr 4.8) | 17.8 |
| Antlr 4.9.3 | 22.3 |

So, while there is an improvement, I still think the problem with the Golang target is fundamental. Please take a look at #2888 (comment); there is an explanation there.

Moreover, the performance is unstable (previously I got better timings), because it depends on the Go GC implementation.

I see no option other than a Go target redesign.

@KvanTTT
Member

KvanTTT commented Nov 12, 2021

It seems like this MR should have been merged instead of the newest one.

@parrt
Member

parrt commented Dec 27, 2021

@jcking Could you take a look at this and see how we might combine efforts on performance? @dmitrys99, could you provide the test rig that checked performance?

@jimidle
Collaborator

jimidle commented Aug 22, 2022

@parrt It might be worth someone checking the performance against the current dev branch, after my performance fixes. I suspect that this is now fixed, as the need for garbage collection has been reduced drastically. We can then close this issue.

@dmitrys99
Author

> @parrt It might be worth someone checking the performance against the current dev branch, after my performance fixes. I suspect that this is now fixed, as the need for garbage collection has been reduced drastically. We can then close this issue.

I did. I can confirm the issue's test case executes in 1.6 s on the dev branch of 4.10.2, against 25.7 s on version 4.8.

@dmitrys99
Author

> @parrt It might be worth someone checking the performance against the current dev branch, after my performance fixes. I suspect that this is now fixed, as the need for garbage collection has been reduced drastically. We can then close this issue.

It would be very interesting to hear how you did that!

@jimidle
Collaborator

jimidle commented Aug 22, 2022 via email

@parrt
Member

parrt commented Aug 23, 2022

Should we close as @jimidle's PRs fixed performance or is this still needed?

@jimidle
Collaborator

jimidle commented Sep 12, 2022

#2888

@parrt I think we can close this now. Any further performance-related work will go into new issues, although I have some tweaks to do that will gradually improve both the code and the performance.

@dmitrys99
Author

Closed. Development continues in #2888.

@dmitrys99 dmitrys99 closed this Sep 13, 2022