fix: use 32bit murmur3 same with java, implement array2dhashset which… #3243

Merged
merged 2 commits into from
Sep 24, 2021

@jjeffcaii jjeffcaii commented Aug 5, 2021

Hi All:

This PR fixes several serious performance problems in the Go runtime:

  1. The murmur3 implementation in the Go runtime is not consistent with Java's.
  2. Some hash code implementations are incorrect, and some are not consistent with Java's.
  3. The Set implementation, which is based on map[int][]interface{}, performs poorly and is not consistent with Java. I implemented Array2DHashSet instead, which matches the Java implementation.
  4. The hash cache for PredictionContext is now fixed.
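
To make item 1 concrete, here is a minimal sketch of a word-at-a-time 32-bit murmur3, following the initialize/update/finish shape of the Java runtime's MurmurHash class. The function names (murmurInit, murmurUpdate, murmurFinish) are hypothetical and chosen for illustration; this is not the code in this PR.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Mixing constants of the 32-bit MurmurHash3 variant.
const (
	c1 uint32 = 0xCC9E2D51
	c2 uint32 = 0x1B873593
	n  uint32 = 0xE6546B64
)

// murmurInit starts a running hash from a seed.
func murmurInit(seed uint32) uint32 { return seed }

// murmurUpdate mixes one 32-bit word into the running hash.
// uint32 arithmetic wraps on overflow, like Java's int.
func murmurUpdate(h, value uint32) uint32 {
	k := value * c1
	k = bits.RotateLeft32(k, 15)
	k *= c2

	h ^= k
	h = bits.RotateLeft32(h, 13)
	return h*5 + n
}

// murmurFinish applies the final avalanche step. numberOfWords is the
// number of 32-bit words that were passed to murmurUpdate.
func murmurFinish(h uint32, numberOfWords int) uint32 {
	h ^= uint32(numberOfWords) * 4
	h ^= h >> 16
	h *= 0x85EBCA6B
	h ^= h >> 13
	h *= 0xC2B2AE35
	h ^= h >> 16
	return h
}

func main() {
	h := murmurInit(0)
	h = murmurUpdate(h, 42)
	h = murmurUpdate(h, 7)
	fmt.Printf("%#08x\n", murmurFinish(h, 2))
}
```

Keeping every intermediate value in uint32 is what makes the result line up bit for bit with a Java int computation; a 64-bit or platform-sized int would produce different hashes.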

Several related problems have been reported, and I think this PR fixes them. Please check these issues:
#2888
antlr/grammars-v4#2272
#2152

The core problem in the Go runtime is that it wastes too much time computing hash codes because of the incorrect Murmur3, Set, and hasher implementations. Comparing the Java runtime with the original Go runtime, Go computes murmur3 ~80,000,000 times, but the Java runtime only ~600,000 times.
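
As a rough illustration of the Set change, the sketch below shows a bucketed hash set with a caller-supplied hash and equality function, which is the general idea behind Array2DHashSet. The type bucketSet and its methods are hypothetical and heavily simplified (it assumes Go 1.18+ generics for brevity); this is not the implementation in this PR.

```go
package main

import "fmt"

// bucketSet is a hypothetical, simplified stand-in for Array2DHashSet:
// a slice of buckets indexed by hash code, with pluggable hash/equality.
type bucketSet[T any] struct {
	buckets [][]T
	size    int
	hash    func(T) int
	equals  func(a, b T) bool
}

func newBucketSet[T any](hash func(T) int, equals func(a, b T) bool) *bucketSet[T] {
	return &bucketSet[T]{buckets: make([][]T, 16), hash: hash, equals: equals}
}

// index maps a hash code to a bucket. The bucket count is always a power
// of two, so masking also handles negative hash codes.
func (s *bucketSet[T]) index(v T) int {
	return s.hash(v) & (len(s.buckets) - 1)
}

// GetOrAdd returns the stored element equal to v, or stores v and returns it.
func (s *bucketSet[T]) GetOrAdd(v T) T {
	if s.size > len(s.buckets)*3/4 {
		s.grow()
	}
	i := s.index(v)
	for _, e := range s.buckets[i] {
		if s.equals(e, v) {
			return e
		}
	}
	s.buckets[i] = append(s.buckets[i], v)
	s.size++
	return v
}

// grow doubles the bucket count and redistributes the stored elements.
func (s *bucketSet[T]) grow() {
	old := s.buckets
	s.buckets = make([][]T, len(old)*2)
	for _, b := range old {
		for _, e := range b {
			i := s.index(e)
			s.buckets[i] = append(s.buckets[i], e)
		}
	}
}

// Len reports the number of distinct elements stored.
func (s *bucketSet[T]) Len() int { return s.size }

func main() {
	intHash := func(v int) int { return v }
	intEq := func(a, b int) bool { return a == b }
	s := newBucketSet(intHash, intEq)
	for i := 0; i < 100; i++ {
		s.GetOrAdd(i % 40)
	}
	fmt.Println(s.Len())
}
```

Each lookup hashes the element once and then compares only within one short bucket, so the cost of a lookup depends mostly on how well the hash function spreads elements.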

Here's my benchmark on my 2019 16-inch MacBook Pro. The g4 files are copied from https://github.com/antlr/grammars-v4/tree/master/sql/mysql/Positive-Technologies. The original Go runtime takes 4.8s, and this PR takes 171ms.

Benchmark:

	begin := time.Now()
	var sql string

	sql = "SELECT DATE_FORMAT(IF(COUNT(*)>0,'2021-01-02','2020-01-01'),'%Y') AS cc FROM student WHERE uid BETWEEN 1 AND 5"
	//sql = "SELECT 1"

	is := antlr.NewInputStream(sql)
	lexer := p.NewMySqlLexer(antlrx.NewCaseChangingStream(is, true))
	//lexer := p.NewMySqlLexer(is)

	ts := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

	mp := p.NewMySqlParser(ts)
	//mp.GetInterpreter().SetPredictionMode(antlr.PredictionModeSLL)
	mp.RemoveErrorListeners()
	mp.SetErrorHandler(antlr.NewBailErrorStrategy())

	_ = mp.Root()

	fmt.Println("cost:", time.Since(begin))
	// COST:
	// BEFORE: 4.853549529s
	// AFTER: 171ms
	// JAVA: 20ms

@parrt parrt requested a review from davesisson August 5, 2021 16:17
@parrt
Member

parrt commented Aug 5, 2021

Can somebody in the Go area review? @davesisson @easonlin404 @pboyer ?

@jjeffcaii
Author

Is anyone looking at this?

@parrt
Member

parrt commented Aug 23, 2021

Can any of the Go folks take a look? @davesisson @easonlin404 @pboyer I'm afraid I'm not qualified.

@KvanTTT
Member

KvanTTT commented Sep 10, 2021

Maybe somebody else can review, not necessarily Go folks? @ericvergnaud ? It seems like a critical problem.

@ericvergnaud
Contributor

I'm a bit skeptical about the inconsistent use of MurmurHash (why is it not used by hashATNConfig?).
That said, I believe in tests, and as long as the tests pass and there is a performance improvement, I can't think of any reason to reject this PR.

@jjeffcaii
Author

@ericvergnaud Thanks for the review.

The main change to murmur3 is that I switched it to 32-bit, like the Java code. hashATNConfig is copied from the Java runtime:

	public static final class ConfigEqualityComparator extends AbstractEqualityComparator<ATNConfig> {
		public static final ConfigEqualityComparator INSTANCE = new ConfigEqualityComparator();

		private ConfigEqualityComparator() {
		}

		@Override
		public int hashCode(ATNConfig o) {
			int hashCode = 7;
			hashCode = 31 * hashCode + o.state.stateNumber;
			hashCode = 31 * hashCode + o.alt;
			hashCode = 31 * hashCode + o.semanticContext.hashCode();
			return hashCode;
		}

		@Override
		public boolean equals(ATNConfig a, ATNConfig b) {
			if ( a==b ) return true;
			if ( a==null || b==null ) return false;
			return a.state.stateNumber==b.state.stateNumber
				&& a.alt==b.alt
				&& a.semanticContext.equals(b.semanticContext);
		}
	}
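
For readers on the Go side, the Java comparator above might translate roughly as follows. This is only a sketch under stated assumptions: the types atnConfig, semanticContext, and constCtx are hypothetical stand-ins, not the runtime's real types, and the functions are not the ones in this PR. It uses int32 so that overflow wraps the same way as a Java int.

```go
package main

import "fmt"

// semanticContext is a hypothetical stand-in for the runtime's type.
type semanticContext interface {
	Hash() int32
	Equals(other semanticContext) bool
}

// constCtx is a toy semanticContext whose hash is its own value.
type constCtx int32

func (c constCtx) Hash() int32 { return int32(c) }

func (c constCtx) Equals(other semanticContext) bool {
	o, ok := other.(constCtx)
	return ok && o == c
}

// atnConfig is a hypothetical, stripped-down configuration record.
type atnConfig struct {
	stateNumber int32
	alt         int32
	semCtx      semanticContext
}

// hashConfig mirrors ConfigEqualityComparator.hashCode: a 31-based
// polynomial over state number, alt, and semantic context hash.
func hashConfig(c *atnConfig) int32 {
	h := int32(7)
	h = 31*h + c.stateNumber
	h = 31*h + c.alt
	h = 31*h + c.semCtx.Hash()
	return h
}

// equalConfig mirrors ConfigEqualityComparator.equals.
func equalConfig(a, b *atnConfig) bool {
	if a == b {
		return true
	}
	if a == nil || b == nil {
		return false
	}
	return a.stateNumber == b.stateNumber &&
		a.alt == b.alt &&
		a.semCtx.Equals(b.semCtx)
}

func main() {
	a := &atnConfig{stateNumber: 1, alt: 2, semCtx: constCtx(3)}
	b := &atnConfig{stateNumber: 1, alt: 2, semCtx: constCtx(3)}
	fmt.Println(hashConfig(a), equalConfig(a, b))
}
```

Note that this comparator deliberately ignores the prediction context, so two configurations that differ only in context hash and compare as equal here.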

To be honest, I am not familiar with the core ANTLR implementation. Most of the code was translated from the Java runtime as best I could. I hope it helps those who are using the Go runtime.

@parrt
Member

parrt commented Sep 24, 2021

I concur with @ericvergnaud. The tests pass and the PR improves performance; it also brings the Go runtime in line with what the ANTLR core does. Sam Harwell worked really hard to get that performance up, and using that hash function was important. Merging.

4 participants