Compiling separate lexer and parser in subdirectory fails #638

Closed
arthurfabre opened this issue Jun 19, 2014 · 21 comments

@arthurfabre

If one defines a separate parser and lexer as follows:

Lexer.g4:

lexer grammar Lexer;

tokens {INDENT, DEDENT}

INT     : [0-9]+ ;

Parser.g4:

parser grammar Parser;

options { tokenVocab=Lexer; }

main    
        : INT* EOF
        ;

With the following directory layout:

-project
    -src
        -Lexer.g4
        -Parser.g4
    -build
    -antlr-4.2.2-complete.jar

Compiling from the project directory with the following command:
java -jar antlr-4.2.2-complete.jar -o build src/*.g4

Fails with: error(3): cannot find tokens file 'build/Lexer.tokens'

Lexer.tokens is generated correctly, but it ends up at build/src/Lexer.tokens, as expected.

This works fine if the grammar files are in the current directory when compiling them (i.e. one runs java -jar ../antlr-4.2.2-complete.jar -o ../build *.g4), or if an extra -lib build/src/ option is used.

It seems this shouldn't be required and antlr should know where to find the .tokens file it generates.
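The mismatch can be illustrated with a small Python sketch. It is not ANTLR's code: the function names are hypothetical, and it only models the behavior reported above for relative grammar paths (generated files land under the -o directory plus the grammar's relative directory, while the lookup goes directly to the -o directory).

```python
import posixpath


def generated_dir(out_dir, grammar_path):
    """Directory where the generated files appear to be written (assumed model):
    the -o directory joined with the grammar's relative directory."""
    rel_dir = posixpath.dirname(grammar_path)
    return posixpath.normpath(posixpath.join(out_dir, rel_dir))


def searched_tokens_file(out_dir, vocab):
    """File named in the error message: the lookup goes straight to -o."""
    return posixpath.join(out_dir, vocab + ".tokens")


written = posixpath.join(generated_dir("build", "src/Lexer.g4"), "Lexer.tokens")
searched = searched_tokens_file("build", "Lexer")
print(written)   # build/src/Lexer.tokens
print(searched)  # build/Lexer.tokens
```

When the grammar is given without a directory part (Lexer.g4 with -o ../build), both paths coincide, which matches the case that works.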

UPDATE:

Here are some additional details about the proposed feature:

For the following, assume this directory layout:

-project
    -src
        -foo
            -lexer.g4
            -metarParser.g4
            -tafParser.g4
        -bar
            -notamParser.g4
    -build

With the following dependency between the grammars:

lexer.g4:
metarParser.g4: lexer.g4
notamParser.g4: lexer.g4
tafParser.g4: notamParser.g4

In all the following cases it is assumed that all of the grammar files are compiled in a single invocation of antlr, and that we want the generated files to be in the build directory (optionally in subfolders).

1st case: All grammars in working directory.

  • We wish to compile metarParser.g4

  • Our working directory is project/src/foo/

  • Run command: antlr *.g4 -o ../../build/

    Observed behavior:

    • Antlr compiles lexer.g4
    • lexer.g4 has a relative path of ., the generated token file is therefore in build/
    • Antlr adds the ../../build/ directory to its "lib" search path
    • Antlr compiles metarParser.g4
    • Antlr searches the "lib" directories for the required lexer.tokens
    • Antlr correctly finds the lexer.tokens file in ../../build/

    Desired behavior:

    • Same.

2nd case: All grammars are in a given subdirectory.

  • We wish to compile metarParser.g4

  • Our working directory is project/

  • Run command: antlr src/foo/*.g4 -o build/

    Observed behavior:

    • Antlr compiles lexer.g4
    • lexer.g4 has a relative path of src/foo/, the generated token file is therefore in build/src/foo/
    • Antlr adds the build/ directory to its "lib" search path
    • Antlr compiles metarParser.g4
    • Antlr searches the "lib" directories for the required lexer.tokens
    • Antlr fails to find lexer.tokens as build/src/foo/ is not in its search path

    Desired behavior:

    • Antlr compiles lexer.g4
    • lexer.g4 has a relative path of src/foo/, the generated token file is therefore in build/src/foo/
    • Antlr adds the build/src/foo/ directory to its "lib" search path
    • Antlr compiles metarParser.g4
    • Antlr searches the "lib" directories for the required lexer.tokens
    • Antlr correctly finds the lexer.tokens file in build/src/foo/

3rd case: All grammars are in (potentially) different subdirectories.

  • We wish to compile tafParser.g4

  • Our working directory is project/

  • Run command: antlr src/foo/*.g4 src/bar/notamParser.g4 -o build/

    Observed behavior:

    • Antlr compiles lexer.g4
    • lexer.g4 has a relative path of src/foo/, the generated token file is therefore in build/src/foo/
    • Antlr adds the build/ directory to its "lib" search path
    • Antlr compiles notamParser.g4
    • Antlr searches the "lib" directories for the required lexer.tokens
    • Antlr fails to find lexer.tokens as build/src/foo/ is not in its search path

    Desired behavior:

    • Antlr compiles lexer.g4
    • lexer.g4 has a relative path of src/foo/, the generated token file is therefore in build/src/foo/
    • Antlr adds the build/src/foo/ directory to its "lib" search path
    • Antlr compiles notamParser.g4
    • notamParser.g4 has a relative path of src/bar/, the generated token file is therefore in build/src/bar/
    • Antlr adds the build/src/bar/ directory to its "lib" search path
    • Antlr searches the "lib" directories for the required lexer.tokens
    • Antlr correctly finds the lexer.tokens file in build/src/foo/
    • Antlr compiles tafParser.g4
    • Antlr searches the "lib" directories for the required notamParser.tokens
    • Antlr correctly finds the notamParser.tokens file in build/src/bar/

For the last two cases, all that is required is to keep track of the actual output directory (ie the directory specified with -o switch or the working directory, with the correct subdirectory) for every file, and to search those.
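A minimal Python sketch of that bookkeeping, assuming relative grammar paths and that grammars are processed in dependency order. The function name and the (path, vocab) input format are made up for illustration; this is not how the ANTLR tool is structured internally.

```python
import posixpath


def resolve_vocab_dirs(out_dir, grammars):
    """grammars: list of (grammar path, required vocab name or None),
    given in the order they are compiled.
    Returns {grammar path: directory where its required .tokens file is found}."""
    produced = {}  # vocab name -> actual output directory of that grammar
    found = {}
    for path, vocab in grammars:
        if vocab is not None:
            if vocab not in produced:
                raise FileNotFoundError("cannot find tokens file for " + vocab)
            found[path] = produced[vocab]
        # Remember the real output directory of this grammar for later lookups.
        name = posixpath.splitext(posixpath.basename(path))[0]
        actual_dir = posixpath.join(out_dir, posixpath.dirname(path))
        produced[name] = posixpath.normpath(actual_dir)
    return found


# 3rd case from the list above, run from project/ with -o build/
found = resolve_vocab_dirs("build/", [
    ("src/foo/lexer.g4", None),
    ("src/bar/notamParser.g4", "lexer"),
    ("src/foo/tafParser.g4", "notamParser"),
    ("src/foo/metarParser.g4", "lexer"),
])
for path, directory in found.items():
    print(path, "->", directory)
# src/bar/notamParser.g4 -> build/src/foo
# src/foo/tafParser.g4 -> build/src/bar
# src/foo/metarParser.g4 -> build/src/foo
```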

@parrt
Member

parrt commented Jun 19, 2014

Hi. It could go either way. .tokens files could come from some totally different grammar or area of the disk. Different runs of the tool could also be used to generate them, and one run would not know where another had put its output.

@parrt parrt closed this as completed Jun 19, 2014
@arthurfabre
Author

Hi,

I agree that there's no way to necessarily know where to search.
It just seems that when compiling several grammars, ANTLR automatically includes the output directory in its search path, and this could possibly be extended to include the "correct" output directory (i.e. the full path when grammars aren't in the working directory).

I'd be more than happy to implement this and submit a pull request if you think it could be useful.

@sharwell
Member

I recommend updating your initial post to include a concrete proposal describing the exact desired behavior according to the command line. Make sure to address the following:

  1. Multiple files specified, which are all in the current working directory.
  2. Multiple files specified, which are all in the same directory, but not the current working directory.
  3. Multiple files specified in different directories, e.g. a.g4 b.g4 d1/w.g4 d1/x.g4 d2/y.g4 d2/z.g4.

You can then reopen the issue for consideration.

@parrt parrt reopened this Jun 19, 2014
@arthurfabre
Author

I've updated my initial post with some more details :)

@mike-lischke
Member

mike-lischke commented Apr 19, 2017

I'd like to increase priority for this problem, so that it finally gets solved. I'm also hit by it and I wonder why this is still open given that a simple

java -jar <path>xxx.jar test/TLexer.g4 test/TParser.g4

call fails already (error: error(160): TParser.g4:4:14: cannot find tokens file ./TLexer.tokens). It also fails when an output path is specified. To me it looks fundamentally broken! A working search strategy could be this:

  1. Search .tokens files where the grammar is, if no output path is specified. This is also the folder(s) where the generated files end up in this case.
  2. If an output path is given then look there.
  3. If file not found try the lib folder (as it is now).

Sounds simple, right? But this only works if ANTLR would stop creating weird paths (e.g. by automatically combining output and grammar subpaths). Make the output path imperative. It's the ultimate target where to look. If a package-like folder structure is required one can easily construct the correct output path before invoking ANTLR.
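The three steps could be sketched in Python as follows. This is only an illustration of the proposed lookup order, not the tool's implementation; the function name and parameters are hypothetical, and the output path is treated as final (no grammar subpaths are appended to it).

```python
import posixpath


def tokens_search_dirs(grammar_path, out_dir=None, lib_dir=None):
    """Ordered list of directories to search for a .tokens file."""
    dirs = []
    if out_dir is None:
        # Step 1: no output path, so look where the grammar is.
        dirs.append(posixpath.dirname(grammar_path) or ".")
    else:
        # Step 2: an output path was given, so look there.
        dirs.append(out_dir)
    if lib_dir is not None:
        # Step 3: fall back to the lib folder.
        dirs.append(lib_dir)
    return dirs


print(tokens_search_dirs("test/TParser.g4"))                  # ['test']
print(tokens_search_dirs("test/TParser.g4", out_dir="gen"))   # ['gen']
print(tokens_search_dirs("TParser.g4", lib_dir="shared"))     # ['.', 'shared']
```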

@parrt
Member

parrt commented Apr 19, 2017

what about -lib on that java command? Won't that work?

@mike-lischke
Member

Well, yes, it would probably help in this particular case, but if you have your generated files spread over various subfolders in the output folder then a single lib path doesn't improve the situation. Also, wouldn't it be a misuse of the lib setting? I consider that as a fall back for everything that doesn't belong directly to the current grammar (e.g. shared grammars).

After having looked at the way the paths are constructed I feel the need to make this simpler. Automatic subfolder structure creation is surprising and makes management difficult. I love KISS: if an output folder is given, put the generated files there, otherwise put them where the source files are. This is how most people would expect it, I believe.

@mike-lischke
Member

The -lib parameter does not help, since we would need to specify the (possibly not yet existing) output folder there. ANTLR4 will error out if that's the case.

@parrt
Member

parrt commented Apr 20, 2017

I'm pretty sure that -o generates files into that specific directory. Are you confusing this with maven plug-in? That thing definitely adds the package. -lib should be where it finds .tokens and such. why won't that work? Have you seen that bug about maven versus antlr tool behavior differing? There was a bunch of explanation there

@davesisson
Contributor

davesisson commented Apr 20, 2017 via email

@parrt
Member

parrt commented Apr 20, 2017

I examined where the tool vs. the mvn plugin dumps stuff in #1593 and antlr/intellij-plugin-v4#293. I wrote "(mvn plugin) uses a custom Tool that alters standard tool behavior to write generated code using dir structure it sees for .g4 files." but I seem to contradict myself on the 2nd link: "Ok, confirming mvn plugin shoves stuff in output dir not package subdir." Ugh. All this directory stuff is painful. OK, a test shows the following:

varmint:/tmp/test $ mkdir -p us/parr
varmint:/tmp/test $ vi us/parr/T.g4
varmint:/tmp/test $ tree
.
└── us
    └── parr
        └── T.g4

2 directories, 1 file
varmint:/tmp/test $ a4.7.1 us/parr/T.g4 
varmint:/tmp/test $ tree
.
└── us
    └── parr
        ├── T.g4
        ├── T.tokens
        ├── TBaseListener.java
        ├── TLexer.java
        ├── TLexer.tokens
        ├── TListener.java
        └── TParser.java

2 directories, 7 files
varmint:/tmp/test $ mkdir build
varmint:/tmp/test $ a4.7.1 -o build us/parr/T.g4 
varmint:/tmp/test $ tree
.
├── build
│   └── us
│       └── parr
│           ├── T.tokens
│           ├── TBaseListener.java
│           ├── TLexer.java
│           ├── TLexer.tokens
│           ├── TListener.java
│           └── TParser.java
└── us
    └── parr
        ├── T.g4
        ├── T.tokens
        ├── TBaseListener.java
        ├── TLexer.java
        ├── TLexer.tokens
        ├── TListener.java
        └── TParser.java

5 directories, 13 files

The tool (not mvn plugin) appears to write things based upon the relative path using -o as the root. Jumping into that dir shows

varmint:/tmp/test/us/parr $ a4.7.1 -o /tmp/test/build T.g4 
varmint:/tmp/test/us/parr $ tree /tmp/test
/tmp/test
├── build
│   ├── T.tokens
│   ├── TBaseListener.java
│   ├── TLexer.java
│   ├── TLexer.tokens
│   ├── TListener.java
│   └── TParser.java
└── us
    └── parr
        └── T.g4

And this still puts stuff in build:

$ a4.7.1 -o /tmp/test/build -package us.parr T.g4 

So it's listening to the relative dirs not the package per se.

@parrt
Member

parrt commented Apr 20, 2017

Seems like it'd be hard to change all this given the software that likely depends on this behavior.

@davesisson
Contributor

davesisson commented Apr 20, 2017 via email

@parrt
Member

parrt commented Apr 20, 2017

Ah. Good thought. Seems @mike-lischke has the right idea.

If no -o search for grammars and .tokens in directory where grammar is; if not found, look in -lib.

If -o search for grammars and .tokens in output dir then where grammar is then -lib.

Does that sound right? Then -Xwe-finally-fixed-that-weird-output-issue-with-options 😮 to activate new functionality. Deprecate old functionality with warning it'll be gone sometime soon.

Relative path on grammar files is not relevant to where stuff gets generated. Gen'd files either go in . or output dir specifically.

Actually if they specify -lib that should override so it'd look there before where grammar is (which is the default location).
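One possible reading of this scheme, as a Python sketch. The function name is hypothetical, and the exact precedence is an assumption: the comment only says -lib should be consulted before the grammar's directory, so the sketch keeps the output directory first and puts -lib second.

```python
import posixpath


def search_order(grammar_path, out_dir=None, lib_dir=None):
    """Directories to search for grammars and .tokens files, in order."""
    grammar_dir = posixpath.dirname(grammar_path) or "."
    order = []
    if out_dir is not None:
        order.append(out_dir)      # -o given: output dir comes first
    if lib_dir is not None:
        order.append(lib_dir)      # assumed: -lib overrides the grammar dir
    order.append(grammar_dir)      # default location: where the grammar is
    return order


print(search_order("test/TParser.g4"))                        # ['test']
print(search_order("test/TParser.g4", lib_dir="lib"))         # ['lib', 'test']
print(search_order("test/TParser.g4", "build", "lib"))        # ['build', 'lib', 'test']
```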

@mike-lischke
Member

mike-lischke commented Apr 20, 2017

I'm fine with a deprecation step here, since the patch changes behavior. However, it only changes behavior for relative grammar paths. Everything else stays as it is (because it works fine). For absolute grammar paths the output dir structure is very flat (no subdirs from the grammar paths), which is what I prefer all the time. If no grammar path is given but an output path, everything ends up (again) in the output folder. If there is no output folder then generated files go to the same folder where the grammars are.

About -lib: the problem here is that ANTLR4 checks if the lib path exists. Now when you use the output path as the lib path too and that path doesn't exist yet, ANTLR4 throws an error and stops. A workaround is to manually create the output dir before invoking the jar.
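The workaround could look roughly like this in a Python build script. The helper name is made up; -o and -lib are the tool options discussed in this thread, and the jar and grammar names are taken from the original report. The helper only creates the directories and builds the argument list; running it (for example with subprocess.run) is left to the caller.

```python
import os


def prepare_antlr_command(jar, out_dir, lib_dir, grammars):
    """Create the output and lib directories up front, then return the
    java command line, so that -lib never points at a missing folder."""
    os.makedirs(out_dir, exist_ok=True)
    os.makedirs(lib_dir, exist_ok=True)
    return ["java", "-jar", jar, "-o", out_dir, "-lib", lib_dir, *grammars]
```

For the layout in the original report this would be called as prepare_antlr_command("antlr-4.2.2-complete.jar", "build", "build/src", ["src/Lexer.g4", "src/Parser.g4"]).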

@parrt
Member

parrt commented Oct 21, 2017

@mike-lischke is this something you would like to see inserted for 4.7.1?

@mike-lischke
Member

mike-lischke commented Oct 21, 2017

Yes, absolutely. I'm currently using a hand-crafted jar with that fix, and having this patch in the official jar is certainly way better. The corresponding PR is #1905.

Thanks.

@parrt
Member

parrt commented Oct 21, 2017

Roger that. I will implement the scheme I outlined above. I need to tweak the -lib functionality per your comment above, right? It should ignore errors from -lib foo if foo doesn't exist, right? It seems that you and I differ here: "If there is no output folder then generated files go to the same folder where the grammars are." I'm trying to refresh my memory, but my comment above seems to say they go either in the current directory or in the output directory. The relative path on a grammar file reference has no effect other than to say where to get the grammar. Can you clarify for me?

@martinda

I think this will also help to fix gradle/gradle#2565.

@mike-lischke
Member

There are actually 2 aspects here:

  1. Search path strategy (and I think we agree here)
  2. Output path strategy, where I changed code so that relative paths no longer dictate the output path. The source path should not have any influence in the output path selection. Also I'd prefer to put generated files close to the grammars (if the -o is not given) to make it easier to find them. This is particularly important for automated generation as part of a larger process (where you might have special needs for the current workdir).
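To make the difference in the output path strategy concrete, here is a small Python comparison. Both functions are illustrative models with hypothetical names, not the tool's code: the first follows the behavior shown in the test runs earlier in this thread, the second follows the preference stated in point 2 (parrt's recollection differs for the case without -o, where files would go to the current directory instead).

```python
import posixpath


def legacy_output_dir(grammar_path, out_dir=None):
    """Observed behavior: a relative grammar path is appended to -o."""
    grammar_dir = posixpath.dirname(grammar_path) or "."
    if out_dir is None:
        return grammar_dir
    if posixpath.isabs(grammar_path):
        return out_dir  # absolute grammar paths give a flat output dir
    return posixpath.normpath(posixpath.join(out_dir, grammar_dir))


def exact_output_dir(grammar_path, out_dir=None):
    """Preferred behavior: -o is final; without -o, write next to the grammar."""
    if out_dir is not None:
        return out_dir
    return posixpath.dirname(grammar_path) or "."


print(legacy_output_dir("us/parr/T.g4", "build"))  # build/us/parr
print(exact_output_dir("us/parr/T.g4", "build"))   # build
print(exact_output_dir("us/parr/T.g4"))            # us/parr
```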

Note: I had to recreate the PR I created to fix this issue, because in the previous one all changes got lost (not the first time, thanks git). Please take a look at that.

parrt added a commit to parrt/antlr4 that referenced this issue Nov 4, 2017
…ct if you use a new option `-Xexact-output-dir`. Fixes antlr#1087, Fixes antlr#753, Fixes antlr#638.
@parrt
Member

parrt commented Nov 4, 2017

Closed by #2065
