Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect childs generation with array of element label #1163

Closed
KvanTTT opened this issue Apr 7, 2016 · 11 comments
Closed

Incorrect childs generation with array of element label #1163

KvanTTT opened this issue Apr 7, 2016 · 11 comments

Comments

@KvanTTT
Copy link
Member

KvanTTT commented Apr 7, 2016

I have the following rule in grammar:

multiply_expression
    : expression (op=('+' | '-') expression)*
    ;

But after parser generation I got the following childs in Multiply_expressionContext (C# runtime):

public partial class Multiply_expressionContext : ParserRuleContext {
    public IToken op;   // Incorrect - should be array of IToken.
    ...
    public ExpressionContext[] expression() {    // Correct - array of Expression.
        return GetRuleContexts<Expression>();
    }
}
@beardlybread
Copy link
Contributor

Is multiply_expression defined the same way as additive_expression in the grammar?

@KvanTTT
Copy link
Member Author

KvanTTT commented Apr 8, 2016

Sorry, it's my fault. I replaced additive_expression with multiply_expression.

@beardlybread
Copy link
Contributor

I fed this to the Java runtime, and I get analogous results:

grammar Agrammar;
additiveExpression : expression (op=OP expr=expression)* ;
expression : Number ;
Number : [0-9]+ ;
OP : '+' | '-' ;

Looking at the code, though, I think this is the way it's intended to work. Both symbols (op and expr) are only referenced inside what appears to be the loop that is giving the behavior of the Kleene star:

    public final AdditiveExpressionContext additiveExpression() throws RecognitionException {
        ...
        try {
            enterOuterAlt(_localctx, 1);
            {
            setState(4);
            expression();  // match the first expression
            setState(9);
            _errHandler.sync(this);
            _la = _input.LA(1);  // look ahead (I'm guessing) for an operation
            while (_la==OP) {
                {
                {
                setState(5);
                ((AdditiveExpressionContext)_localctx).op = match(OP); // grab op
                setState(6);
                ((AdditiveExpressionContext)_localctx).expr = expression();  // grab expr
                }
                }
                setState(11);
                _errHandler.sync(this);
                _la = _input.LA(1);  // fall out of loop if we don't see another OP
            }
            }
        }

so in a sense they're kind of behaving like thing in

for (Object thing: things) {
...
}

I'll play with this some more and see, but that's what my gut is telling me.

EDIT:

Yeah, the variables are being masked from use in the listeners because there aren't any callbacks exposed for the * inside the rule. Modifying Antlr to make it work how you want would probably be a bit tricky, though.

EDIT AGAIN:

If you do

additiveExpression : expression (opExpression)* ;
opExpression : op=OP expr=expression ;
...

you get the behavior and the listeners.

@KvanTTT
Copy link
Member Author

KvanTTT commented Apr 9, 2016

I know about this workaround. But it's yet another rule. And this what I would like to avoid.

@ericvergnaud
Copy link
Contributor

I believe it is indeed the way it's supposed to work.
If you think about it, OP being a token, it is an invariant from the parser's point of view, so there's no need to distinguish between '+' and '-' since that is what was written in the lexer (the 2 are equivalent).
If you need to distinguish between them, then you might want to try using 2 different tokens, or a grammar rule.
(I haven't tried it, so I might be wrong)

@KvanTTT
Copy link
Member Author

KvanTTT commented Apr 9, 2016

There are cases in which tokens should be distinguished. In this case I want to use the value of operator in Visitor:

for (int i = 1; i < context.expression().Length; i++)
{
    string opValue = context.op(i - 1).GetText();
}

Without op label it would be tricky.
Anyway such behaviour should be considered as bug.

@ericvergnaud
Copy link
Contributor

Have you tried the proposed alternatives?

@beardlybread
Copy link
Contributor

I did one more thing and tried to assign a label to the star expression directly:

grammar Agrammar;

additiveExpression : expression star=(opExpression)* ;
opExpression : op=OP expr=expression ;
expression : Number ;
Number : [0-9]+ ;
OP : '+' | '-' ;

which gives the error label star assigned to a block which is not a set. This seems to imply that SetAST and BlockAST would need to be semantically identical from the perspective of code generation. Even if we grant that this should be considered a bug, it looks like it would require the code to fundamentally change, so I doubt it will happen. Maybe it could be considered as a major feature addition?

@ericvergnaud
Copy link
Contributor

In the above grammar, if you visit opExpression you'll find that op contains the actual token text, so there is neither a bug or a need for a feature

@beardlybread
Copy link
Contributor

@ericvergnaud Just to clarify, I was speaking hypothetically. I wasn't actually suggesting a fundamental restructuring of the Antlr AST classes would be a good "feature" to add. 😄

@parrt
Copy link
Member

parrt commented Dec 10, 2016

intended behavior. += or repeated references (not parses) give arrays.

@parrt parrt closed this as completed Dec 10, 2016
@parrt parrt added this to the 4.6 milestone Dec 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants