Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[modules] Seamlessly fuse hand-written and generated code #1716

Closed
11 tasks
danieldietrich opened this issue Nov 27, 2016 · 6 comments
Closed
11 tasks

[modules] Seamlessly fuse hand-written and generated code #1716

danieldietrich opened this issue Nov 27, 2016 · 6 comments

Comments

@danieldietrich
Copy link
Contributor

danieldietrich commented Nov 27, 2016

Prerequisites:

Spike:

  • Re-utilize the existing 'block-parser' of xtext-protexted-regions, i.e. re-implement it based on javaslang in a separate project, code-name: 'javaslang-generator'. (A future version of javaslang-generator might be based on javaslang-parser instead of using a proprietary block-parser.)
  • In a first step scan src/main/java/** for *.java files and use the block-parser to parse regions. The result will be [non-generated, region-start, generated, region-end, non-generated, region-start, generated, region-end, non-generated, ...]. The regions need to have internal, intermediate unique ids (e.g. a sequential number).
  • Copy the relevant parts of region-start sections to a Scala file. These regions need to be wrapped in a block containing the unique id mentioned above (because we need to substitute the generated regions of the original file in a later step). Also include the base-generator into that scala file (currently our Generator.scala). The include may be a native Scala include or a copy-paste include, depending on how the generator is run (as Scala script or as compiled Scala file).
  • Run the dynamically built Scala generator (as script or as compiled Scala file)
  • Parse the output of the Scala generator with our block-parser. Write the previously parsed contents [non-generated, region-start, generated, region-end, non-generated, region-start, generated, region-end, non-generated, ...] back to the original file and substitute the generated parts on the way.

Next steps:

  • Document the code generator and make it public available by announcement
  • Add support for locales / character encodings
  • Make source folders configurable
  • Create a Maven plugin
  • Create a Gradle plugin

We already commit generated code, it is located in src-gen/. It would be much 'nicer' if the code completely resides in src/ and specific 'regions' are filled-in by the code generator. For those who are already familiar with code generator techniques, these could be called reverse-protected-regions.

Normally protected regions contain code, labeled with a unique id (within a comment), that should not be overwritten on subsequent generator calls. Each generator call overwrites all files but restores the protected regions.

We should reverse that and keep all files but replace the content of generated code blocks.

Benefit: Keeping track of the imports would be simple because we would only need to organize the imports. Currently we use an import manager to keep track of imports while generating code.


The Generated Code

Example:

// file Tuple.java
interface Tuple {

    // region[tuple-of] {
    static <T1> Tuple1<T1> of(T1 t1) {
        return new Tuple1<>(t1);
    }
    static <T1, T2> Tuple2<T1, T2> of(T1 t1, T2 t2) {
        return new Tuple2<>(t1, t2);
    }
    static <T1, T2, T3> Tuple3<T1, T2, T3> of(T1 t1, T2 t2, T3 t3) {
        return new Tuple3<>(t1, t2, t3);
    }
    // }

}

Region markers:

  • I think JSON-like start/end markers do not fit becaue the end-marker// } can clash with other comments:
// region[tuple-of] {

// generated code

// }
  • XML-like markers would fit but XML is considered 'uncool' and 'noisy'. Also we don't benefit from a schema or type information:
// <region id="tuple-of">

// generated code

// </region>
  • I like the idea of using easy to spot and hard to confuse proprietary generated region tags:
// GENERATED tuple-of >>>

// generated code

// <<< GENERATED

The Code Generator

  1. Read source files
  2. Replace GENERATED regions with generated code (matched by unique id)
  3. (Over)write source files

We could use our existing code generator to generate code and write an addition that reads/substitutes/writes the source files.

But another idea is, to place the code generator directly to the peace of code that will be generated:

// file Tuple.java
interface Tuple {

    /* GENERATED >>>
        (1 to 3).gen(i => xs"""
          static <${gen("T", i)}> Tuple$i<${gen("T", i)}>(${gen("T", i)}) {
              return new Tuple$i(${gen("t", i)});
          }
        """)("\n\n")
    */
    static <T1> Tuple1<T1> of(T1 t1) {
        return new Tuple1<>(t1);
    }
    static <T1, T2> Tuple2<T1, T2> of(T1 t1, T2 t2) {
        return new Tuple2<>(t1, t2);
    }
    static <T1, T2, T3> Tuple3<T1, T2, T3> of(T1 t1, T2 t2, T3 t3) {
        return new Tuple3<>(t1, t2, t3);
    }
    // <<< GENERATED

}

The code generator would have to perform the following steps:

  1. Read source files
  2. Read code generator sections from comments and build a runtime-code generator by copying these regions into a code generator file that uses standard code generation utility imports
  3. Replace GENERATED regions with generated code
  4. (Over)write source files

Pros:

  • No need for unique ids any more

Cons:

  • No IDE help when developing the code generators for partial code sections
@danieldietrich danieldietrich added this to the 2.1.0 milestone Nov 27, 2016
@danieldietrich danieldietrich changed the title Partial code generation Seamlessly fuse hand-written and generated code Nov 27, 2016
@emmanueltouzery
Copy link
Contributor

But if we modify the source files, it'll require annoying discipline not to commit processed files (when we must commit the original, with the generation code).

The workflow would be what then, generate to work, then git reset these files before commiting? Even with a script, it seems very kludgy and unintuitive.

Also what do we ship in -src.jar? Presumably processed source, or it will confuse IDEs who know how to download sources.

Notice that this applies to both proposals. Seems to me like we really need a distinction like src and src-gen?

@emmanueltouzery
Copy link
Contributor

Hmm no i was wrong. We always commit with generated code, but keeping the markers. If the generator code is inline we don't remove it, but keep it, commented out.

I think keeping the generator code out is better. We can get syntax highlighting, and possibly in rhe future, we can avoid regenerating by comparing timestamps of the partially generated file and the generator source.

@danieldietrich
Copy link
Contributor Author

Yes, committing the generated code has benefits. We can see the effects by looking at the 'unfolded' code. This makes sense when performing code reviews for PRs. There are no surprises.

Additionally we need to ensure that generated code isn't changed directly. We do this by integrating the code-unfolding of our source files into the build process. I.e. manual changes are overwritten.

Ruslan wrote a script that automatically runs on CI build, it checks whether generated code was manually changed. If so, the build breaks.


There are still details to solve, e.g. declaring java comments within java comments etc. But first I will create a technical spike that solves our problem.

@danieldietrich
Copy link
Contributor Author

danieldietrich commented Nov 28, 2016

Once I wrote xtext-protected-regionst, a protected region support for Xtext code generators.

Some of that code can be re-used, e.g. the heuristic to parse protected regions from a source file etc.


I already have more 'features' in mind:

  • By default generated code is a block with a start tag GENERATED >>> and an end tag <<< GENERATED (the white-space is optional)
  • The start and end tags are denoted within a java comment:
    • single line comment //
    • multiline comment /*, */
    • javadoc /**, */
  • We will support code generation within the same line. It is determined by the existence of a newline after the start-tag resp. before the end-tag.
// ok
package test;
class Test {
    /*GENERATED>>>s"Hello ${fqn}!"*/Hello test.Test!/*<<<GENERATED*/
}

// ok!
package test;
class Test {
    /*GENERATED>>>s"Hello ${fqn}!"*/
    Hello test.Test!
    /*<<<GENERATED*/
}
  • The start tag GENERATED [unique-id] >>> may contain a unique id that connects this block with a generator function located outside of the file (arguments for that function have to be defined, i.e. file name etc.)
  • The start tag GENERATED >>> scala code may contain the code generator in form of scala code. We may call functions located somewhere outside in a base code generator.
  • It is possible to add comments to GENERATED regions (, e.g. to describe why we generated code). For that reason we put generator code inside a <code> tag. This enables syntax highlighting when viewed in a broweser (e.g. using prismjs).
interface Tuple {

    /* GENERATED >>>
    text text text text text text text text text text text text text text text
    text text text text text text text text text text text text text text text
    text text text text text text text text text text text text text text text
    ------------------------------------------------<pre><code language="scala">
        (1 to 3).gen(i => xs"""
          static <${gen("T", i)}> Tuple$i<${gen("T", i)}>(${gen("T", i)}) {
              return new Tuple$i(${gen("t", i)});
          }
        """)("\n\n")
    </code></pre>---------------------------------------------------------------
    text text text text text text text text text text text text text text text
    text text text text text text text text text text text text text text text
    text text text text text text text text text text text text text text text
    */
    static <T1> Tuple1<T1> of(T1 t1) {
        return new Tuple1<>(t1);
    }
    
    static <T1, T2> Tuple2<T1, T2> of(T1 t1, T2 t2) {
        return new Tuple2<>(t1, t2);
    }
    
    static <T1, T2, T3> Tuple3<T1, T2, T3> of(T1 t1, T2 t2, T3 t3) {
        return new Tuple3<>(t1, t2, t3);
    }
    // <<< GENERATED

}

Please note that <pre><code> content within a javadoc comment may cause doclint to produce errors/warnings, e.g. when html entities like <, > are used. Therefore we suggest to use singleline und multiline comments.

@danieldietrich
Copy link
Contributor Author

danieldietrich commented Dec 6, 2016

This might also solve maintenance issues, namely

  1. repeating code that has to be maintained on each collection type separately, for example static methods like narrow(). This could be simplified by having a unique scala generator method that is referenced and parameterized from within the new inline code generator regions:
interface Map {

    // GENERATED >>> genStaticTraversableMethods("Map", isMap = true)

    // Empty before first call of generator ;-) This comment will be overwritten!

    // <<< GENERATED
}

And the underlying code generator contains the method referenced above:

genStaticTraversableMethods(name: String, isMap: Boolean = false): String = {

    val type = im.getType("javaslang.collection." + name)
    val generics = if (isMap) "<K, V>" else "<T>"
    val widenedGenerics = if (isMap) "<? extends K, ? extends V>" else "<T>"

    xs"""
        @SuppressWarnings("unchecked")
        public static $generics $type$generics($type$widenedGenerics it) {
            return ($type$generics) it;
        }
    """

}

Note: In this example only narrow() is generated but the same method may also generate n other methods like flatten(), ...

There may be generator methods, depending on where to start within the type hierarchy:

  • genSeqMethods
  • genSetMethods
  • genMapMethods

The generator is able to contain the logic on its own when to generate abstract methods, overridden abstract methods and overridden implementations. This will greatly simplify the maintained code because we pull the logic to a single place!

Update:

This will also be the key to solve #1326 (removing abstract types like AbstractMap, AbstractMultimap etc.). The code will simply be generated by shared generator methods instead of using (internal) inheritance.

Maybe it will also be the key to keep types (like Future/Try and Map/Multimap) in sync? (see #1551, #1532)

@danieldietrich
Copy link
Contributor Author

Out-of-scope. To keep efforts minimal, we will stick with the existing Scala code generator.

If we need to check conformity of methods across distinct Types (like Option and Try when removing Value), we could implement unit tests that check method existence using reflection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants