Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report InputMismatchException with original context information #1969

Merged
merged 1 commit into from
Aug 1, 2017

Conversation

sharwell
Copy link
Member

@sharwell sharwell commented Jul 26, 2017

Fixes #1922

📝 This needs more test cases, particularly for LL(k) failures where k>1. However, my current understanding is these cases have been impacted by this bug for much longer (since 4.2.2) than the issue being addressed for the k=1 case.

@sharwell
Copy link
Member Author

sharwell commented Jul 26, 2017

@Et999 @michaelpj @PayneSun @marcospassos It should be possible to apply this change to your code by extending DefaultErrorStrategy and overriding sync and reportInputMismatch. Can one of you try this and let me know if it resolves the issue in your test suites? Since the InputMismatchException constructor used in this code is not available, you'll have to do a bit of trickery to make this work right:

int priorState = recognizer.getState();
ParserRuleContext priorContext = recognizer.getContext();
try {
  recognizer.setState(nextTokensState);
  recognizer._ctx = nextTokensContext;
  throw new InputMismatchException(recognizer);
} finally {
  recognizer.setState(priorState);
  recognizer._ctx = priorContext;
}

⚠️ Note that the message does still change in some cases relative to 4.6 - you get "mismatched input" instead of "extraneous token". However, the expected set should be reporting good information again.

@michaelpj
Copy link
Contributor

Thanks enormously for looking into this! I'll give this a try and see how it looks.

@michaelpj
Copy link
Contributor

Built locally, tested. Fixes the regressions in 4.7, and actually improves the errors in several other cases. Big 👍 from me.

@parrt
Copy link
Member

parrt commented Jul 29, 2017

@Et999 and @marcospassos can you build locally and confirm this works for you?

@sharwell
Copy link
Member Author

@parrt After confirmation, I'll start the work to implement this for other targets.

@marcospassos
Copy link
Contributor

I'll test it by tomorrow :)

@parrt
Copy link
Member

parrt commented Jul 30, 2017

@antlr/antlr-targets please note Sam's improvements to the error reporting in this PR.

@parrt
Copy link
Member

parrt commented Jul 30, 2017

ugh. we appear to be getting random travis fluctuations

@marcospassos
Copy link
Contributor

@sharwell we caught 160+ tests failing due to the BC break introduced in 4.7, even applying the patch you provided.

The errors can be grouped into the following cases:

  • No viable alternative at <EOF> is now Mismatched input <EOF> (no big deal)
  • Missing token ) is now Mismatched input <EOF> (worst error reporting)
  • Mismatched input <EOF> is now Extraneous token <EOF> (really weird)

Besides this, we found several cases where error recovery got worst. Given the following grammar and invalid input, notice how the error recovery produces worst guesses:

root: variableDeclarationList? expression;

variableDeclarationList
    : variableDeclaration + ;

variableDeclaration
    : 'let' variableDeclarator (',' variableDeclarator)* ';' ;

variableDeclarator
    : identifier '=' expression ;

expression: ...;

Invalid input:

let a = , b = 2; true

In 4.6, the previously input was recognized as:

let a = b; true

While in 4.7 the same input is recognized as:

let a = b;  2

This difference is probably related to the same issue, right?

@sharwell
Copy link
Member Author

sharwell commented Aug 1, 2017

@marcospassos If you are testing error messages, I would not expect this change to reduce the number of "regressing tests", since the messages will still change. Is your project open source so I can help triage the differences?

Also, before we go too far we should make sure the change was properly applied since the result discrepancy between @michaelpj and @marcospassos is so large.

💭 I notice that your examples all mention EOF. Do all of your failing tests involve EOF handling?

@marcospassos
Copy link
Contributor

marcospassos commented Aug 1, 2017

@sharwell the project is not open source. However, we're not testing only error messages, but also the generated parse tree. Why don't you consider the error message a BC break? This fix not only changed the error messages but also made the parser produce different parse trees, which is a serious BC break.

Btw, is "Extraneous token <EOF>" supposed to be a valid error message?

@sharwell
Copy link
Member Author

sharwell commented Aug 1, 2017

@marcospassos I sent you an email 👍

@sharwell
Copy link
Member Author

sharwell commented Aug 1, 2017

After talking with @marcospassos, it appears the negative outcomes from this experiment were likely influenced by uncommon characteristics of the language being parsed. In particular:

  1. Input text is typically short, so error handling strategies which favor error locality to error recovery are favorable
  2. Input is closer to natural language than most, so distinctive tokens that tend to yield good error recovery in many languages (e.g. the braces in Java) don't help for this one

For this case, I suggested creating a type derived from DefaultErrorStrategy using the implementation of sync prior to my change, since it yields better results for this language.

@marcospassos
Copy link
Contributor

Thank you @sharwell!

@parrt
Copy link
Member

parrt commented Aug 1, 2017

Ok, sounds like we have no objections to this improvement.

@parrt parrt merged commit a042180 into antlr:master Aug 1, 2017
@parrt
Copy link
Member

parrt commented Aug 1, 2017

Thanks, @sharwell !!

@sharwell sharwell deleted the propagate-error-sets branch August 1, 2017 23:25
@michaelpj michaelpj mentioned this pull request Nov 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants