[modules] Seamlessly fuse hand-written and generated code #1716

danieldietrich · 2016-11-27T22:32:06Z

Prerequisites:

Find a simple way to run Scala source code from within Java

Spike:

Re-utilize the existing 'block-parser' of xtext-protexted-regions, i.e. re-implement it based on javaslang in a separate project, code-name: 'javaslang-generator'. (A future version of javaslang-generator might be based on javaslang-parser instead of using a proprietary block-parser.)
In a first step scan src/main/java/** for *.java files and use the block-parser to parse regions. The result will be [non-generated, region-start, generated, region-end, non-generated, region-start, generated, region-end, non-generated, ...]. The regions need to have internal, intermediate unique ids (e.g. a sequential number).
Copy the relevant parts of region-start sections to a Scala file. These regions need to be wrapped in a block containing the unique id mentioned above (because we need to substitute the generated regions of the original file in a later step). Also include the base-generator into that scala file (currently our Generator.scala). The include may be a native Scala include or a copy-paste include, depending on how the generator is run (as Scala script or as compiled Scala file).
Run the dynamically built Scala generator (as script or as compiled Scala file)
Parse the output of the Scala generator with our block-parser. Write the previously parsed contents [non-generated, region-start, generated, region-end, non-generated, region-start, generated, region-end, non-generated, ...] back to the original file and substitute the generated parts on the way.

Next steps:

Document the code generator and make it public available by announcement
Add support for locales / character encodings
Make source folders configurable
Create a Maven plugin
Create a Gradle plugin

We already commit generated code, it is located in src-gen/. It would be much 'nicer' if the code completely resides in src/ and specific 'regions' are filled-in by the code generator. For those who are already familiar with code generator techniques, these could be called reverse-protected-regions.

Normally protected regions contain code, labeled with a unique id (within a comment), that should not be overwritten on subsequent generator calls. Each generator call overwrites all files but restores the protected regions.

We should reverse that and keep all files but replace the content of generated code blocks.

Benefit: Keeping track of the imports would be simple because we would only need to organize the imports. Currently we use an import manager to keep track of imports while generating code.

The Generated Code

Example:

// file Tuple.java
interface Tuple {

    // region[tuple-of] {
    static <T1> Tuple1<T1> of(T1 t1) {
        return new Tuple1<>(t1);
    }
    static <T1, T2> Tuple2<T1, T2> of(T1 t1, T2 t2) {
        return new Tuple2<>(t1, t2);
    }
    static <T1, T2, T3> Tuple3<T1, T2, T3> of(T1 t1, T2 t2, T3 t3) {
        return new Tuple3<>(t1, t2, t3);
    }
    // }

}

Region markers:

I think JSON-like start/end markers do not fit becaue the end-marker// } can clash with other comments:

// region[tuple-of] {

// generated code

// }

XML-like markers would fit but XML is considered 'uncool' and 'noisy'. Also we don't benefit from a schema or type information:

// <region id="tuple-of">

// generated code

// </region>

I like the idea of using easy to spot and hard to confuse proprietary generated region tags:

// GENERATED tuple-of >>>

// generated code

// <<< GENERATED

The Code Generator

Read source files
Replace GENERATED regions with generated code (matched by unique id)
(Over)write source files

We could use our existing code generator to generate code and write an addition that reads/substitutes/writes the source files.

But another idea is, to place the code generator directly to the peace of code that will be generated:

// file Tuple.java
interface Tuple {

    /* GENERATED >>>
        (1 to 3).gen(i => xs"""
          static <${gen("T", i)}> Tuple$i<${gen("T", i)}>(${gen("T", i)}) {
              return new Tuple$i(${gen("t", i)});
          }
        """)("\n\n")
    */
    static <T1> Tuple1<T1> of(T1 t1) {
        return new Tuple1<>(t1);
    }
    static <T1, T2> Tuple2<T1, T2> of(T1 t1, T2 t2) {
        return new Tuple2<>(t1, t2);
    }
    static <T1, T2, T3> Tuple3<T1, T2, T3> of(T1 t1, T2 t2, T3 t3) {
        return new Tuple3<>(t1, t2, t3);
    }
    // <<< GENERATED

}

The code generator would have to perform the following steps:

Read source files
Read code generator sections from comments and build a runtime-code generator by copying these regions into a code generator file that uses standard code generation utility imports
Replace GENERATED regions with generated code
(Over)write source files

Pros:

No need for unique ids any more

Cons:

No IDE help when developing the code generators for partial code sections

The text was updated successfully, but these errors were encountered:

emmanueltouzery · 2016-11-28T05:38:07Z

But if we modify the source files, it'll require annoying discipline not to commit processed files (when we must commit the original, with the generation code).

The workflow would be what then, generate to work, then git reset these files before commiting? Even with a script, it seems very kludgy and unintuitive.

Also what do we ship in -src.jar? Presumably processed source, or it will confuse IDEs who know how to download sources.

Notice that this applies to both proposals. Seems to me like we really need a distinction like src and src-gen?

emmanueltouzery · 2016-11-28T06:50:24Z

Hmm no i was wrong. We always commit with generated code, but keeping the markers. If the generator code is inline we don't remove it, but keep it, commented out.

I think keeping the generator code out is better. We can get syntax highlighting, and possibly in rhe future, we can avoid regenerating by comparing timestamps of the partially generated file and the generator source.

danieldietrich · 2016-11-28T12:25:11Z

Yes, committing the generated code has benefits. We can see the effects by looking at the 'unfolded' code. This makes sense when performing code reviews for PRs. There are no surprises.

Additionally we need to ensure that generated code isn't changed directly. We do this by integrating the code-unfolding of our source files into the build process. I.e. manual changes are overwritten.

Ruslan wrote a script that automatically runs on CI build, it checks whether generated code was manually changed. If so, the build breaks.

There are still details to solve, e.g. declaring java comments within java comments etc. But first I will create a technical spike that solves our problem.

danieldietrich · 2016-11-28T12:58:34Z

Once I wrote xtext-protected-regionst, a protected region support for Xtext code generators.

Some of that code can be re-used, e.g. the heuristic to parse protected regions from a source file etc.

I already have more 'features' in mind:

By default generated code is a block with a start tag GENERATED >>> and an end tag <<< GENERATED (the white-space is optional)
The start and end tags are denoted within a java comment:
- single line comment //
- multiline comment /*, */
- javadoc /**, */
We will support code generation within the same line. It is determined by the existence of a newline after the start-tag resp. before the end-tag.

// ok
package test;
class Test {
    /*GENERATED>>>s"Hello ${fqn}!"*/Hello test.Test!/*<<<GENERATED*/
}

// ok!
package test;
class Test {
    /*GENERATED>>>s"Hello ${fqn}!"*/
    Hello test.Test!
    /*<<<GENERATED*/
}

The start tag GENERATED [unique-id] >>> may contain a unique id that connects this block with a generator function located outside of the file (arguments for that function have to be defined, i.e. file name etc.)
The start tag GENERATED >>> scala code may contain the code generator in form of scala code. We may call functions located somewhere outside in a base code generator.
It is possible to add comments to GENERATED regions (, e.g. to describe why we generated code). For that reason we put generator code inside a <code> tag. This enables syntax highlighting when viewed in a broweser (e.g. using prismjs).

interface Tuple {

    /* GENERATED >>>
    text text text text text text text text text text text text text text text
    text text text text text text text text text text text text text text text
    text text text text text text text text text text text text text text text
    ------------------------------------------------<pre><code language="scala">
        (1 to 3).gen(i => xs"""
          static <${gen("T", i)}> Tuple$i<${gen("T", i)}>(${gen("T", i)}) {
              return new Tuple$i(${gen("t", i)});
          }
        """)("\n\n")
    </code></pre>---------------------------------------------------------------
    text text text text text text text text text text text text text text text
    text text text text text text text text text text text text text text text
    text text text text text text text text text text text text text text text
    */
    static <T1> Tuple1<T1> of(T1 t1) {
        return new Tuple1<>(t1);
    }
    
    static <T1, T2> Tuple2<T1, T2> of(T1 t1, T2 t2) {
        return new Tuple2<>(t1, t2);
    }
    
    static <T1, T2, T3> Tuple3<T1, T2, T3> of(T1 t1, T2 t2, T3 t3) {
        return new Tuple3<>(t1, t2, t3);
    }
    // <<< GENERATED

}

Please note that <pre><code> content within a javadoc comment may cause doclint to produce errors/warnings, e.g. when html entities like <, > are used. Therefore we suggest to use singleline und multiline comments.

danieldietrich · 2016-12-06T21:27:51Z

This might also solve maintenance issues, namely

repeating code that has to be maintained on each collection type separately, for example static methods like narrow(). This could be simplified by having a unique scala generator method that is referenced and parameterized from within the new inline code generator regions:

interface Map {

    // GENERATED >>> genStaticTraversableMethods("Map", isMap = true)

    // Empty before first call of generator ;-) This comment will be overwritten!

    // <<< GENERATED
}

And the underlying code generator contains the method referenced above:

genStaticTraversableMethods(name: String, isMap: Boolean = false): String = {

    val type = im.getType("javaslang.collection." + name)
    val generics = if (isMap) "<K, V>" else "<T>"
    val widenedGenerics = if (isMap) "<? extends K, ? extends V>" else "<T>"

    xs"""
        @SuppressWarnings("unchecked")
        public static $generics $type$generics($type$widenedGenerics it) {
            return ($type$generics) it;
        }
    """

}

Note: In this example only narrow() is generated but the same method may also generate n other methods like flatten(), ...

There may be generator methods, depending on where to start within the type hierarchy:

genSeqMethods
genSetMethods
genMapMethods

The generator is able to contain the logic on its own when to generate abstract methods, overridden abstract methods and overridden implementations. This will greatly simplify the maintained code because we pull the logic to a single place!

Update:

This will also be the key to solve #1326 (removing abstract types like AbstractMap, AbstractMultimap etc.). The code will simply be generated by shared generator methods instead of using (internal) inheritance.

Maybe it will also be the key to keep types (like Future/Try and Map/Multimap) in sync? (see #1551, #1532)

danieldietrich · 2018-08-07T01:44:11Z

Out-of-scope. To keep efforts minimal, we will stick with the existing Scala code generator.

If we need to check conformity of methods across distinct Types (like Option and Try when removing Value), we could implement unit tests that check method existence using reflection.

danieldietrich added build/infrastructure/ide «non-public» labels Nov 27, 2016

danieldietrich added this to the 2.1.0 milestone Nov 27, 2016

danieldietrich changed the title ~~Partial code generation~~ Seamlessly fuse hand-written and generated code Nov 27, 2016

danieldietrich mentioned this issue Nov 30, 2016

applicative lifting for Option, Try, Future... #1683

Closed

danieldietrich mentioned this issue Dec 12, 2016

Fix backward compatibilty issue regarding *Map.of(Object...) in 2.1.0-alpha #1757

Closed

danieldietrich mentioned this issue Jan 1, 2017

Implement Traversable.slideBy(classifier) #1775

Closed

danieldietrich modified the milestones: 3.0.0, 2.1.0 Mar 4, 2017

danieldietrich mentioned this issue Jul 6, 2017

Add unit test that ensures Map and Multimap keep in sync #1542

Closed

danieldietrich mentioned this issue Sep 13, 2017

Annotate generated classes with @Generated #2081

Closed

This was referenced Nov 5, 2017

Annotate generated classes with @Generated #2151

Merged

JDK9 does not (directly) support @Generated anymore, we should remove it again. #2154

Closed

danieldietrich changed the title ~~Seamlessly fuse hand-written and generated code~~ [modules] Seamlessly fuse hand-written and generated code Nov 26, 2017

danieldietrich closed this as completed Aug 7, 2018

danieldietrich added the reverted/closed/duplicate label Aug 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[modules] Seamlessly fuse hand-written and generated code #1716

[modules] Seamlessly fuse hand-written and generated code #1716

danieldietrich commented Nov 27, 2016 •

edited

Loading

emmanueltouzery commented Nov 28, 2016

emmanueltouzery commented Nov 28, 2016

danieldietrich commented Nov 28, 2016

danieldietrich commented Nov 28, 2016 •

edited

Loading

danieldietrich commented Dec 6, 2016 •

edited

Loading

danieldietrich commented Aug 7, 2018

[modules] Seamlessly fuse hand-written and generated code #1716

[modules] Seamlessly fuse hand-written and generated code #1716

Comments

danieldietrich commented Nov 27, 2016 • edited Loading

The Generated Code

The Code Generator

emmanueltouzery commented Nov 28, 2016

emmanueltouzery commented Nov 28, 2016

danieldietrich commented Nov 28, 2016

danieldietrich commented Nov 28, 2016 • edited Loading

danieldietrich commented Dec 6, 2016 • edited Loading

danieldietrich commented Aug 7, 2018

danieldietrich commented Nov 27, 2016 •

edited

Loading

danieldietrich commented Nov 28, 2016 •

edited

Loading

danieldietrich commented Dec 6, 2016 •

edited

Loading