Worse expected token output in ANTLR 4.7 #1922
Comments
This is a serious problem for us, since it degrades almost every parse error message.
I'm not sure what's going on here - it looks like either I note that
I would need to see some additional examples here. The error message in this case is less than optimal, but it's definitely the intended outcome of that change. Since
💭 It may be possible to remember the ATNConfigSet following the last matched character, and use that position to calculate the expected set in deferred error handling scenarios. In other words, while we don't change the location where the error is detected, we can change the message that gets calculated there.
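A minimal sketch of this suggestion, using a toy recursive-descent parser in plain Python rather than the ANTLR runtime. The grammar, the CallParser class, and its pending_expected field are hypothetical: pending_expected stands in for the remembered ATNConfigSet, whereas a real implementation would compute the set from the ATN. The parser still detects the error after leaving args (as 4.7 does), but the deferred report merges in the tokens that were viable at the last loop decision.

```python
# Sketch only: a toy recursive-descent parser, not the ANTLR runtime.
# Hypothetical grammar:
#   call : ID '(' args ')' ;
#   args : ID (',' ID)* ;


class ParseError(Exception):
    pass


def display(expected):
    """Format an expected set roughly the way ANTLR prints it."""
    names = [t if t == "<EOF>" else f"'{t}'" for t in sorted(expected)]
    return names[0] if len(names) == 1 else "{" + ", ".join(names) + "}"


class CallParser:
    def __init__(self, tokens, remember_expected):
        self.tokens = list(tokens) + ["<EOF>"]
        self.pos = 0
        self.remember_expected = remember_expected
        # Tokens that were viable at the last decision that exited without consuming.
        self.pending_expected = set()

    def kind(self, tok):
        # Punctuation is its own token type; every other token is an ID.
        return tok if tok in {"(", ")", ",", "<EOF>"} else "ID"

    def match(self, expected):
        tok = self.tokens[self.pos]
        if self.kind(tok) == expected:
            self.pos += 1
            self.pending_expected = set()  # a token was consumed; forget the old decision
            return
        reported = {expected}
        if self.remember_expected:
            reported |= self.pending_expected  # deferred report: merge the remembered set
        raise ParseError(f"mismatched input '{tok}' expecting {display(reported)}")

    def call(self):
        self.match("ID")
        self.match("(")
        self.args()
        self.match(")")

    def args(self):
        self.match("ID")
        while self.kind(self.tokens[self.pos]) == ",":
            self.match(",")
            self.match("ID")
        # The loop exited without reporting (4.7-style); remember that ',' was viable.
        self.pending_expected = {","}


def parse(tokens, remember_expected):
    try:
        CallParser(tokens, remember_expected).call()
        return "ok"
    except ParseError as err:
        return str(err)


# Missing comma between 'a' and 'b':
print(parse(["f", "(", "a", "b", ")"], remember_expected=False))
# mismatched input 'b' expecting ')'
print(parse(["f", "(", "a", "b", ")"], remember_expected=True))
# mismatched input 'b' expecting {')', ','}
```

The error location is identical in both runs; only the reported expected set changes, which is the point of the suggestion.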
Good to know that it's intentional. That makes this more of a feature request, then. Consider the following slightly more realistic grammar:
On input However, if we uninline
then we get the worse There are two reasons this feels undesirable:
Conceptually, I would expect error recovery to operate before leaving
(The point of the different grammar above is that the problem is indeed not specific to
As a general observation, the deeper in the tree you are when recovery is attempted, the worse recovery is in practice. Conversely, the deeper in the tree you are when recovery is attempted, the better the error messages are (in both locality and accuracy). The preference to step out of a rule before attempting recovery was originally implemented as part of the work for #529. I still think that was the right move for the quality of both the parse tree and recovery, but we could improve the messages produced for cases like the one you've described. The simplest improvement is to keep track of the ATNConfigSet at or after decisions so it can be used in error reporting. For even better error reporting, I suggested to @RobertvanderHulst that two parsing passes could be used: in the first pass the default error strategy is used, and in the second pass subtrees containing parse errors are re-parsed using a "bail after first error" strategy combined with a prediction algorithm that takes the first viable alternative until an error appears as the LL(1) symbol. The goal was to provide precise and accurate error messages as well as good recovery behavior, but I haven't yet heard back about how that worked in practice.
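The two-pass idea can be sketched on a toy statement language. This is plain Python, not ANTLR: pass1_locate stands in for the recovering default strategy and only records which statements are damaged, while parse_stmt_bail stands in for the "bail after first error" re-parse of each damaged subtree. The grammar and all function names are hypothetical, and the sketch does not model the first-viable-alternative prediction.

```python
# Sketch only: plain Python, not ANTLR's error strategies.
# Hypothetical statement language:  stmt : ID '=' NUM ';' ;
# Tokens are (kind, text) pairs.

STMT = ["ID", "=", "NUM", ";"]


class Bail(Exception):
    """Raised by the pass-2 'bail after first error' parse."""


def show(kind):
    return kind if kind in ("ID", "NUM", "<EOF>") else f"'{kind}'"


def pass1_locate(tokens):
    """Pass 1 (recovering): keep going after errors by resynchronizing at ';',
    and only record the (start, end) span of each damaged statement."""
    damaged, start = [], 0
    for i, (kind, _) in enumerate(tokens):
        if kind == ";":
            if [k for k, _ in tokens[start:i + 1]] != STMT:
                damaged.append((start, i + 1))
            start = i + 1
    if start < len(tokens):  # trailing statement without ';'
        damaged.append((start, len(tokens)))
    return damaged


def parse_stmt_bail(tokens, start, end):
    """Pass 2: match one statement exactly and stop at the first mismatch."""
    for offset, expected in enumerate(STMT):
        i = start + offset
        kind, text = tokens[i] if i < end else ("<EOF>", "<EOF>")
        if kind != expected:
            raise Bail(f"mismatched input '{text}' at token {i}, expecting {show(expected)}")


def pass2_explain(tokens, damaged):
    """Re-parse each damaged span in isolation to get a precise first error."""
    messages = []
    for start, end in damaged:
        try:
            parse_stmt_bail(tokens, start, end)
        except Bail as err:
            messages.append(str(err))
    return messages


tokens = [("ID", "x"), ("=", "="), ("NUM", "1"), (";", ";"),
          ("ID", "y"), ("NUM", "2"), (";", ";"),          # missing '='
          ("ID", "z"), ("=", "="), ("NUM", "3"), (";", ";")]
damaged = pass1_locate(tokens)
print(damaged)                          # [(4, 7)]
print(pass2_explain(tokens, damaged))   # ["mismatched input '2' at token 5, expecting '='"]
```

Pass 1 keeps the later statements in the parse, and pass 2 only spends effort on the spans that actually contain errors.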
I defer to your expertise on whether it's better to step out of a rule before recovery; I'm spitballing here 😁 However, I do want to reiterate that this is a real problem for us, since for whatever reason a lot of the constructs in our grammar have this problem: it seems to hit pretty much any rule that ends with a Here's another similar case reduced from our grammar that has the same problem:
On the input In this case
@sharwell We are still working on this. In our X# language, we see most problems in the statementBlock rule, which is declared as What we see is that if an error occurs in an expression rule, the parser leaves the expression rule, leaves the expression statement rule, and synchronizes at the end of the statementBlock. This usually produces very cryptic error messages, and many lines of code are not checked by the parser at all. So when the user fixes one typo (such as a missing comma or closing parenthesis), the parser is likely to detect another, and then another. The ANTLR reference suggests that the parser should either 'eat' unexpected tokens or add missing tokens and continue, but the current version of ANTLR does not seem to do that. We have looked at the method you suggested but do not have a working solution yet.
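Eating an unexpected token and adding a missing one correspond to single-token deletion and single-token insertion. The sketch below applies both to a fixed token sequence in plain Python; the rule and function names are hypothetical, and the checks are simplified compared with ANTLR's DefaultErrorStrategy, which derives the expected and follow sets from the ATN. The message wording follows the "extraneous input", "missing", and "mismatched input" forms discussed in this thread.

```python
# Sketch only: simplified single-token deletion/insertion, not ANTLR's
# DefaultErrorStrategy (which derives these sets from the ATN).
# Hypothetical rule:  stmt : ID '(' ID ')' ';' ;

RULE = ["ID", "(", "ID", ")", ";"]


def token_kind(tok):
    return tok if tok in {"(", ")", ";", "<EOF>"} else "ID"


def show(kind):
    return kind if kind in ("ID", "<EOF>") else f"'{kind}'"


def parse_with_inline_recovery(tokens):
    """Match RULE token by token, repairing single-token errors; return the messages."""
    toks = list(tokens) + ["<EOF>"]
    pos, errors = 0, []
    for step, expected in enumerate(RULE):
        cur = toks[pos]
        if token_kind(cur) == expected:
            pos += 1
            continue
        nxt = toks[pos + 1] if pos + 1 < len(toks) else "<EOF>"
        follow = RULE[step + 1] if step + 1 < len(RULE) else "<EOF>"
        if token_kind(nxt) == expected:
            # Single-token deletion: 'eat' the unexpected token, then match the next one.
            errors.append(f"extraneous input '{cur}' expecting {show(expected)}")
            pos += 2
        elif token_kind(cur) == follow:
            # Single-token insertion: act as if the missing token were present.
            errors.append(f"missing {show(expected)} at '{cur}'")
        else:
            # Neither repair applies; a real parser would now resynchronize.
            errors.append(f"mismatched input '{cur}' expecting {show(expected)}")
            return errors
    return errors


print(parse_with_inline_recovery(["f", "(", "a", ")", ")", ";"]))  # extra ')'
print(parse_with_inline_recovery(["f", "(", "a", ";"]))            # missing ')'
print(parse_with_inline_recovery(["f", "(", "a", "b", ";"]))       # mismatched
```

When either repair succeeds, parsing continues inside the same rule instead of unwinding to an outer synchronization point.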
I can confirm that ANTLR 4.7 introduced backward-compatibility (BC) breaks. We just updated from 4.6 to 4.7 and 300+ unit tests are now broken. We have hundreds of error cases that reported "missing token" in 4.6 and now report "mismatched input" in 4.7.
Probably related to #1967
🔗 For my own reference, the new behavior here was introduced in #1546.
Hi all. The fix put in by @sharwell was important if I remember correctly. It looks like Sam is working on fixing the error messages. I wonder if this isn't something to do with the "sync" check before any EBNF construct. I remember we messed with that as well. I haven't looked deeply into this. I'm neck deep teaching a boot camp for the next three weeks. |
My solution requires that a call to
Port 0803c74 from the Java runtime to Swift. This was issue antlr#1922.
Port 0803c74 from the Java runtime to Swift. This was issue antlr#1922. This also changes DefaultErrorStrategy.reportInputMismatch to handle when values are missing.
Port 0803c74 from the Java runtime to Swift. This was issue antlr#1922. Enable the corresponding tests for Swift.
Using the following grammar
on input
baa
we get
mismatched input 'b' expecting <EOF>
with ANTLR 4.7, but
extraneous input 'b' expecting {<EOF>, 'a'}
with ANTLR 4.6. The output of 4.6 is preferable because 'a' was a valid input at that position.
The recovered parse tree is also worse: 4.6 recovers the later members, but 4.7 generates an empty body and error nodes for the later members.
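The difference between the two versions can be reproduced with a toy parser, assuming a hypothetical grammar that fits the description: file : body EOF ; body : member* ; member : 'a' ; With sync_in_body=True the parser checks the token at the member* decision and deletes a single bad token, roughly like 4.6; with sync_in_body=False it leaves body and fails on EOF, roughly like 4.7. This is a plain-Python simplification, not the ANTLR runtime, and parse_file is a hypothetical name.

```python
# Sketch only: plain Python, not the ANTLR runtime.
def parse_file(tokens, sync_in_body):
    """Toy parser for the hypothetical grammar
         file : body EOF ;   body : member* ;   member : 'a' ;
    Returns (members, errors, error_nodes)."""
    toks = list(tokens) + ["<EOF>"]
    pos, members, errors, error_nodes = 0, [], [], []
    expected = "{<EOF>, 'a'}"  # viable at the member* decision, in ANTLR's display order

    # body : member* ;
    while True:
        cur = toks[pos]
        if cur == "a":
            members.append(cur)
            pos += 1
        elif cur == "<EOF>" or not sync_in_body:
            break  # 4.7-style: leave body without reporting anything
        elif toks[pos + 1] in ("a", "<EOF>"):
            # 4.6-style sync before the loop: single-token deletion of the bad token.
            errors.append(f"extraneous input '{cur}' expecting {expected}")
            pos += 1
        else:
            break

    # file : body EOF ;
    if toks[pos] != "<EOF>":
        errors.append(f"mismatched input '{toks[pos]}' expecting <EOF>")
        # Simplified recovery: everything left over becomes error nodes.
        error_nodes = toks[pos:-1]
    return members, errors, error_nodes


print(parse_file("baa", sync_in_body=True))
# (['a', 'a'], ["extraneous input 'b' expecting {<EOF>, 'a'}"], [])
print(parse_file("baa", sync_in_body=False))
# ([], ["mismatched input 'b' expecting <EOF>"], ['b', 'a', 'a'])
```

Checking inside body keeps both the richer expected set and the later members; leaving body first produces the empty body and error nodes described above.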
This happens in general when there is a rule that can match the empty input. In this case we can get better results by inlining body into file, but that significantly increases the size of the grammar in the non-reduced example and makes the Java code for converting the tree more complex.