Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random ArrayIndexOutOfBounds exceptions coming from ParserATNSimulator #804

Closed
parrt opened this issue Jan 24, 2015 · 18 comments · Fixed by #805
Closed

Random ArrayIndexOutOfBounds exceptions coming from ParserATNSimulator #804

parrt opened this issue Jan 24, 2015 · 18 comments · Fixed by #805

Comments

@parrt
Copy link
Member

parrt commented Jan 24, 2015

It looks like a transient problem, even in the environment where we saw the issue originally it hasn't been reproducible. Only happens in our automation environment (rehat 5.7 running on open stack running on I dunno what) and only occasionally.

It happens in here:

synchronized (from) {
    if ( from.edges==null ) { 
        from.edges = new DFAState[atn.maxTokenType+1+1];
    }   

    from.edges[t+1] = to; // connect
}   

so in some case atn.maxTokenType+1 > t must be true.

I checked and it looks like the field "edges" is always protected by its parent DFAState object except for in one suspicious place where it appears to be guarded by an instance of DFA in DFA.java's setPrecedenceDfa method. (all in 4.3)

It would seem on the face of it that you need to synchronize inside setPrecedenceDfa on the DFAState rather than the DFA (or perhaps in addition too?).

Maybe we should just add an assert there.

assert t +1 < from.edges.length : "array index out of bounds!, t = " + t + ", edges.length=" + from.edges.length;

And perhaps add something about which threads are active and their stacks if it fails?

@sharwell
Copy link
Member

Since it's always possible for users to assign types to a token which fall outside the normal bounds defined by [EOF, atn.maxTokenType], the solution is to guard the assignment of the DFA edge. using the length of the edges array.

int index = t + 1;
if (index >= 0 && index < from.edges.length) {
    // add the edge to the DFA
    from.edges[index] = to;
}

I had not observed the issue myself because one of the memory optimizations I needed required the inclusion of this range check quite a while back. Which brings us to the actual cause....

When we added code to generate rule bypass transitions during ATN deserialization, we allowed the deserializer to define tokens which are greater than atn.maxTokenType. When one of these edges is added to the DFA, the exception reported here gets thrown.

One final note - the use of edges in setPrecedenceDfa does not require additional synchronization because the DFAState that owns it is a local variable that has not been assigned to any location visible from other threads.

@parrt
Copy link
Member Author

parrt commented Jan 25, 2015

Wow! You found it. excellent. I'll explore when I finish class prep.

@ericvergnaud
Copy link
Contributor

Not sure that’s the explanation:

  • such a guard is already in place just before the allocation and exception:
    if (from == null || t < -1 || t > atn.maxTokenType) {
    return to;
    }
  • the exception would occur deterministically

Le 25 janv. 2015 à 08:14, Terence Parr notifications@github.com a écrit :

Wow! You found it. excellent. I'll explore when I finish class prep.


Reply to this email directly or view it on GitHub #804 (comment).

@sharwell
Copy link
Member

@ericvergnaud makes a good point.

I looked at a few other possibilities but haven't found any clear problems so far. The two areas of focus for me are:

  • The s0 state for precedence DFAs, which uses an edges array indexed by precedence instead of atn.maxTokenType. However, this state is never added to the states map for the DFA so it will never be the target of a transition within the DFA.

  • The ERROR state. If you ever fail to stop when the ERROR sentinel DFA state is reached, it will cause a big problem (it's never supposed to have outgoing edges, so it's shared across all DFAs). You could add the following to addDFAEdge, but so far I couldn't find any way for this to occur.

    assert from != ERROR;

@ericvergnaud
Copy link
Contributor

Given the context in which it occurs, I don’t think the issue lies in the algorithm itself.

I’m more suspicious with the 2 different allocation strategies for edges depending on whether the DFA is a precedence one or not, especially given the below:

if (!dfa.isPrecedenceDfa() && dfa.atnStartState instanceof StarLoopEntryState) {
    if (((StarLoopEntryState)dfa.atnStartState).precedenceRuleDecision) {
        dfa.setPrecedenceDfa(true);
    }
}

This clearly changes an existing DFA from being a non precedence one to being a precedence one, which leads to change the allocation strategy.

In a multi threaded context, you can therefore simultaneously have 1 thread executing the below in addDFAEdge:

if ( from.edges==null ) {
    from.edges = new DFAState[atn.maxTokenType+1+1];
}

from.edges[t+1] = to; // connect

and another thread executing the below in dfa.setPrecedenceStartState:

s0.edges = Arrays.copyOf(s0.edges, precedence + 1);

if the latter occurs between the former allocation and assignment, we could observe the behaviour.

This could happen as a consequence of: dfa.s0 = s0; in adaptivePredict, which is done after checking for the precedence nature of the DFA, but might have been changed in between by another thread.

In brief, I would delegate s0 assignment to DFA so it can be reliably protected and see if it improves.
I would also delegate edges allocation to DFA.

Le 25 janv. 2015 à 08:55, Sam Harwell notifications@github.com a écrit :

@ericvergnaud https://github.com/ericvergnaud makes a good point.

I looked at a few other possibilities but haven't found any clear problems so far. The two areas of focus for me are:

The s0 state for precedence DFAs, which uses an edges array indexed by precedence instead of atn.maxTokenType. However, this state is never added to the states map for the DFA so it will never be the target of a transition within the DFA.
The ERROR state. If you ever fail to stop when the ERROR sentinel DFA state is reached, it will cause a big problem. You could add the following to addDFAEdge, but so far I couldn't find any way for this to occur.

assert from != ERROR;

Reply to this email directly or view it on GitHub #804 (comment).

@parrt
Copy link
Member Author

parrt commented Jan 25, 2015

Great job, guys! Thanks. sounds suspicious as you say.

@sharwell
Copy link
Member

This is one of the more... "interesting" portions of the algorithm for sure. The above situation can only be a problem if from variable in the first section and s0 variable of the second section refer to the same instance. By design, the DFAState created for a precedence DFA is not added to the DFA.states collection, which means the only way to reach this particular DFAState instance is by accessing the DFA.s0 field directly. The code path in adaptivePredict which uses the DFA.s0 field for a precedence DFA always immediately follows an outgoing edge according to the current precedence level in the parser, so there is no way for that instance to end up in addDFAEdge as the from state.

You'll find that many of these sections are not synchronized, but rather rely on deterministic algorithms to produce usable results. In other words, in places where the algorithm could read a stale value and then write a new one over the top of it, the new value is equivalent to the old value for purposes of this algorithm.

@sharwell
Copy link
Member

Note that while adaptivePredict can call dfa.setPrecedenceDfa(true), there is no case where an existing DFA instance will transition from being a precedence DFA back to a normal DFA. This is a vital condition for the code flow around the block in adaptivePredict which assigns dfa.s0 = s0.

@ericvergnaud
Copy link
Contributor

An I correct in observing that adaptivePredict is only called during testing/profiling?
If so, this may be an almost non problem i.e. we could prevent concurrent execution of adaptivePredict without impacting performance.

Le 25 janv. 2015 à 10:24, Sam Harwell notifications@github.com a écrit :

Note that while adaptivePredict can call dfa.setPrecedenceDfa(true), there is no case where an existing DFA instance will transition from being a precedence DFA back to a normal DFA. This is a vital condition for the code flow around the block in adaptivePredict which assigns dfa.s0 = s0.


Reply to this email directly or view it on GitHub #804 (comment).

@sharwell
Copy link
Member

No, adaptivePredict is the entry point to the prediction algorithm and (along with the methods it calls) the primary focus point of almost every performance-related issue in ANTLR 4.

@sharwell
Copy link
Member

adaptivePredict has two key "modes" of operation. The "warmup" path is when DFA states are getting calculated and edges are being added to the dynamically-generated DFA. The "fast" path is when future input follows edges of the DFA which were previously computed and cached. During triage for a large customer¹ working with ANTLR, it became clear that it's extremely important that no use of synchronized appear anywhere on the "fast" path.

Unlike some systems where "warmup" is strictly before the "fast" path, ANTLR 4 is generally observed to continue adding DFA states as new input is received throughout the parsing process. However, the bulk of this occurs during the "first few files", after which point the majority of decisions end up going through the DFA.

¹ The customer was implementing search functionality and using over 1,000 threads for parsing on a large multi-processor machine. With this scale, a single synchronized instance on the fast path can dominate the observed performance characteristics even if it's restricted to fine-grain locking.

@ericvergnaud
Copy link
Contributor

Yeah it’s all over the place in generated code, but my IDE of course only sees it in non generated code.
It would still be useful to temporarily mark it as synchronized and see if the problem disappears.
This would increase our confidence on where the issue lies.

Le 25 janv. 2015 à 13:38, Sam Harwell notifications@github.com a écrit :

No, adaptivePredict is the entry point to the prediction algorithm and the primary focus point of almost every performance-related issue in ANTLR 4.


Reply to this email directly or view it on GitHub #804 (comment).

@martint
Copy link

martint commented Jan 26, 2015

FWIW, we're seeing this issue intermittently in our system, too:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.antlr.v4.runtime.atn.ParserATNSimulator.addDFAEdge(ParserATNSimulator.java:1838)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.computeTargetState(ParserATNSimulator.java:609)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.execATN(ParserATNSimulator.java:483)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:423)
    at com.facebook.presto.sql.parser.SqlBaseParser.queryTerm(SqlBaseParser.java:1319)
    at com.facebook.presto.sql.parser.SqlBaseParser.queryNoWith(SqlBaseParser.java:1173)
    at com.facebook.presto.sql.parser.SqlBaseParser.query(SqlBaseParser.java:1039)
    at com.facebook.presto.sql.parser.SqlBaseParser.statement(SqlBaseParser.java:743)
    at com.facebook.presto.sql.parser.SqlBaseParser.singleStatement(SqlBaseParser.java:147)
    at com.facebook.presto.sql.parser.SqlParser$$Lambda$3/926727210.apply(Unknown Source)
    at com.facebook.presto.sql.parser.SqlParser.invokeParser(SqlParser.java:93)
    at com.facebook.presto.sql.parser.SqlParser.createStatement(SqlParser.java:66)
    ...

@parrt
Copy link
Member Author

parrt commented Jan 26, 2015

Hi Martin. wow, it's a party! Anything you can do to provide a (smallest possible) test that intermittently causes this issue would be much appreciated and would facilitate a quick fix. This bug concerns me greatly. Sam and Eric have already thought about this quite a bit I will jump into it as soon as I catch up after this first week of school, hopefully.

@sharwell
Copy link
Member

Thanks @martint. The specific index included in the exception message made me realize what the problem is. The failure sequence is:

  1. Thread 1 checks dfa.isPrecedenceDfa()
  2. Thread 2 calls dfa.setPrecedenceDfa(true)
  3. Thread 1 reads s0 as the start state, but it should have instead followed a precedence edge to find the start state

I'll send a fix later today.

@parrt
Copy link
Member Author

parrt commented Jan 26, 2015

Hooray! Crowd-sourcing at it's finest! :)

sharwell added a commit to sharwell/antlr4 that referenced this issue Jan 26, 2015
parrt added a commit that referenced this issue Jan 31, 2015
Fixes #804 potential misuse of the DFA start state when initializing a decision
@electrum
Copy link
Contributor

electrum commented Feb 4, 2015

Thanks for fixing this! Any chance of getting a new release with this fix? Or if that's a big deal, getting a point release with just this fix? We hit this error quite frequently.

@parrt
Copy link
Member Author

parrt commented Feb 4, 2015

well it’s kind of a major transaction cost to get out a new release. how about if I provide a jar? ;)
Ter
On Feb 3, 2015, at 5:24 PM, David Phillips notifications@github.com wrote:

Thanks for fixing this! Any chance of getting a new release with this fix? Or if that's a big deal, getting a point release with just this fix? We hit this error quite frequently.


Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants