Allow paths as tokenVocab option #1311

Merged: 1 commit, merged on Nov 19, 2016


sebkur (Contributor) commented on Oct 25, 2016

As reported on the mailing list in 2007, and more recently in the Gradle forums, it is currently impossible to define two grammars in different packages, each split into separate lexer and parser files, and import the lexers via tokenVocab.

Ultimately, the problem is that the referenced tokenVocab cannot be resolved when processing a grammar that does not reside in the default package. The -lib option works around this for a single package, but it fails for multiple grammars in different packages because -lib can only be specified once.

This patch allows tokenVocab to specify a path instead of only a file name, so it becomes possible to write tokenVocab='com/test/html/HTMLLexer';

The ANTLR parser already accepts quoted strings for tokenVocab, and resolving the files works when a path is specified. However, processing a set of files can still fail because the dependency graph is built incorrectly: the dependency graph builder does not recognize that HTMLLexer and 'com/test/html/HTMLLexer' refer to the same grammar.

To fix the dependency graph builder, I added code that strips the quotes and extracts only the file name from the specified tokenVocab.
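A minimal sketch of that normalization, assuming the dependency graph is keyed by bare grammar names and that paths use '/' as the separator. The class TokenVocabNames, the method toGraphKey, and the dependsOn map are hypothetical illustrations, not ANTLR's actual code; the patch itself may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

final class TokenVocabNames {

    // Hypothetical helper: reduce a tokenVocab option value to the bare grammar
    // name used as a dependency graph key, e.g. 'com/test/html/HTMLLexer' -> HTMLLexer.
    static String toGraphKey(String tokenVocab) {
        String name = tokenVocab;
        int len = name.length();
        // Strip surrounding single quotes only when both ends are quoted.
        if (len >= 2 && name.charAt(0) == '\'' && name.charAt(len - 1) == '\'') {
            name = name.substring(1, len - 1);
        }
        // Keep only the last path segment (assumes '/' as the separator).
        int slash = name.lastIndexOf('/');
        return slash >= 0 ? name.substring(slash + 1) : name;
    }

    public static void main(String[] args) {
        // Hypothetical dependency map: grammar name -> grammar it imports via tokenVocab.
        Map<String, String> dependsOn = new HashMap<>();
        dependsOn.put("HTMLParser", toGraphKey("'com/test/html/HTMLLexer'"));

        // The lexer grammar is registered under its bare name, so the keys now match.
        System.out.println(dependsOn.get("HTMLParser").equals("HTMLLexer")); // true
        System.out.println(toGraphKey("HTMLLexer"));                        // HTMLLexer
    }
}
```

Reducing both spellings to the same key is what lets the graph builder see that the parser depends on the lexer, regardless of whether the option was written as a plain name or a quoted path.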

I have created a testing repository that demonstrates the Tool's new behavior on two example grammars from the grammars-v4 repository, which have been moved into packages to fit this use case.

parrt (Member) commented on Nov 19, 2016

Thanks @sebkur. A minimal fix to a real problem.

parrt merged commit 10ce5d2 into antlr:master on Nov 19, 2016
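String vocabName = tokenVocabNode.getText();
int len = vocabName.length();
// Strip the quotes only if the value is fully quoted; checking len first
// avoids calling charAt on an empty string.
if (len >= 2 && vocabName.charAt(0) == '\'' && vocabName.charAt(len - 1) == '\'') {
    vocabName = vocabName.substring(1, len - 1);
}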
Contributor

Removing the quotes could be done with less code:

String vocabName = tokenVocabNode.getText().replaceAll("\\A'|'\\Z", "");

It could be worthwhile to add a helper to misc.Utils. The Maven plugin also needs to strip quotes.
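A minimal sketch of such a shared helper, built on the regex proposed above. The class name QuoteUtils and the method stripQuotes are hypothetical; this does not claim that misc.Utils contains this method.

```java
final class QuoteUtils {
    private QuoteUtils() {}

    // Hypothetical helper: remove a single quote at the very start of the input
    // and/or at the very end, using the regex suggested in the review.
    // \A matches the start of input; \Z matches the end of input
    // (before a final line terminator, if there is one).
    static String stripQuotes(String s) {
        return s.replaceAll("\\A'|'\\Z", "");
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("'com/test/html/HTMLLexer'")); // com/test/html/HTMLLexer
        System.out.println(stripQuotes("HTMLLexer"));                 // HTMLLexer
    }
}
```

Putting the logic in one place would let both the Tool and the Maven plugin strip quotes the same way.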

sebkur (Contributor Author) commented on Nov 20, 2016

I'm not sure exactly what your regex does, but it also strips the ' from 'test (leading quote only) or test' (trailing quote only). I think it makes sense to make sure we're dealing with a properly quoted string, i.e. that there is a quote character at both the beginning and the end of the input.

Contributor

But 'test and test' are invalid input and are rejected by ANTLR. So we can expect the string to be either fully quoted or not quoted at all, and the regex removes the single quotes as desired.
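To make the trade-off concrete, here is a small sketch comparing the two approaches on the inputs discussed in this thread. The class and method names are hypothetical. Both approaches agree on fully quoted and unquoted names and differ only on one-sided quotes, which, per the comment above, ANTLR rejects before this code would run.

```java
final class QuoteComparison {
    // Regex approach from the review: strips a leading and a trailing quote independently.
    static String stripWithRegex(String s) {
        return s.replaceAll("\\A'|'\\Z", "");
    }

    // Explicit approach from the patch: strips quotes only when both ends are quoted.
    static String stripIfFullyQuoted(String s) {
        int len = s.length();
        if (len >= 2 && s.charAt(0) == '\'' && s.charAt(len - 1) == '\'') {
            return s.substring(1, len - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String[] inputs = {"'HTMLLexer'", "HTMLLexer", "'test", "test'"};
        for (String in : inputs) {
            System.out.println(in + " -> regex: " + stripWithRegex(in)
                    + ", explicit: " + stripIfFullyQuoted(in));
        }
        // 'HTMLLexer' -> regex: HTMLLexer, explicit: HTMLLexer
        // HTMLLexer -> regex: HTMLLexer, explicit: HTMLLexer
        // 'test -> regex: test, explicit: 'test
        // test' -> regex: test, explicit: test'
    }
}
```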

Contributor Author

Good point


3 participants