Allow paths as tokenVocab option #1311

sebkur · 2016-10-25T10:13:58Z

As reported on the mailing list in 2007 and also in the Gradle forums in recent years, it is currently impossible to define two different grammars in different packages using separate lexer and parser files and import the lexer files using tokenVocab.

Ultimately, the problem is that the referenced tokenVocab cannot be resolved when processing a grammar that does not reside in the default package. The -lib option makes it possible to work around this problem for a single package, but it does not work for multiple grammars in different packages because -lib can only be specified once.

This patch allows the tokenVocab to specify paths instead of only file names, i.e. it is then possible to write tokenVocab='com/test/html/HTMLLexer';

The ANTLR parser already allows quoted strings for the tokenVocab, and resolving the files just works when specifying a path. However, processing the fileset can still fail because the dependency graph is not built correctly: the dependency graph builder does not recognize that HTMLLexer and 'com/test/html/HTMLLexer' are references to the same thing.

To make the dependency graph builder work, I added some code that removes the quotes and extracts only the filename from the specified tokenVocab.
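The normalization described above can be sketched roughly as follows. This is a standalone illustration, not the code in Tool.java: the class name `VocabNameSketch` and the method `normalize` are hypothetical, and the sketch assumes that paths in tokenVocab always use forward slashes.

```java
// Hypothetical standalone sketch; names are illustrative and not part of ANTLR.
final class VocabNameSketch {

    /**
     * Removes one pair of surrounding single quotes (only if both are present)
     * and then keeps the part after the last forward slash.
     */
    static String normalize(String vocabName) {
        int len = vocabName.length();
        // Only strip when the value is fully quoted: a quote at both ends.
        if (len >= 2 && vocabName.charAt(0) == '\'' && vocabName.charAt(len - 1) == '\'') {
            vocabName = vocabName.substring(1, len - 1);
        }
        // Assumption: path segments are separated by '/'.
        int lastSlash = vocabName.lastIndexOf('/');
        return lastSlash >= 0 ? vocabName.substring(lastSlash + 1) : vocabName;
    }

    public static void main(String[] args) {
        // Both spellings reduce to the same name, so a dependency graph
        // keyed on that name would treat them as the same vocabulary.
        System.out.println(normalize("'com/test/html/HTMLLexer'"));
        System.out.println(normalize("HTMLLexer"));
    }
}
```

With this kind of normalization, a quoted path and a bare grammar name compare equal when used as graph keys, which is the property the dependency graph builder needs.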

I have created a testing repository that demonstrates the new behavior of the Tool for two example grammars from the grammars-v4 repository, which have been moved into packages to fit the use case.

parrt · 2016-11-19T23:30:16Z

Thanks @sebkur. A minimal fix to a real problem.

marcohu · 2016-11-20T13:03:51Z

tool/src/org/antlr/v4/Tool.java

+				int lastChar = vocabName.charAt(len - 1);
+				if (len >= 2 && firstChar == '\'' && lastChar == '\'') {
+					vocabName = vocabName.substring(1, len-1);
+				}


Removing the quotes could be done with less code:

String vocabName = tokenVocabNode.getText().replaceAll("\\A'|'\\Z", "");

It could be worthwhile to add a helper to misc.Utils. The Maven plugin also needs to strip quotes.

sebkur · 2016-11-20

Not sure what your regex does exactly, but it also strips the ' from 'test or test'. I think it makes sense to ensure that we're dealing with a properly quoted string, i.e. that there's a quote character at both the beginning and the end of the input.

marcohu · 2016-11-20

But 'test or test' is invalid input and rejected by ANTLR. So we can expect the string to be either fully quoted or not quoted at all, and the regex removes the single quotes as desired.
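To make the difference discussed in this thread concrete, here is a small comparison of the two approaches. It is only a sketch: the class `QuoteStripComparison` and its method names are hypothetical, and whether half-quoted input can ever reach this code depends on the ANTLR parser rejecting it earlier, as stated above.

```java
// Hypothetical comparison; neither method is an existing ANTLR helper.
final class QuoteStripComparison {

    /** Regex variant from the review: removes a leading and/or a trailing quote independently. */
    static String stripWithRegex(String s) {
        return s.replaceAll("\\A'|'\\Z", "");
    }

    /** Explicit variant: removes the quotes only if the string is quoted at both ends. */
    static String stripIfFullyQuoted(String s) {
        int len = s.length();
        if (len >= 2 && s.charAt(0) == '\'' && s.charAt(len - 1) == '\'') {
            return s.substring(1, len - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String[] inputs = { "'test'", "test", "'test", "test'" };
        for (String in : inputs) {
            // The two variants agree on the first two inputs and differ on the half-quoted ones.
            System.out.println(in + " -> regex: " + stripWithRegex(in)
                    + ", explicit: " + stripIfFullyQuoted(in));
        }
    }
}
```

For fully quoted and unquoted input the two variants return the same result. They differ only for half-quoted strings such as 'test, so the choice matters only if such input can get past the grammar parser.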

Allow paths as tokenVocab option

961f087

parrt added this to the 4.6 milestone Nov 19, 2016

parrt added comp:build comp:tool type:improvement labels Nov 19, 2016

parrt merged commit 10ce5d2 into antlr:master Nov 19, 2016

marcohu reviewed Nov 20, 2016

marcohu mentioned this pull request Nov 20, 2016

Grammar dependency management for Maven plugin #1353

Merged
