Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support of Project CRaC #23950

Merged
merged 1 commit into from
May 13, 2022
Merged

Conversation

AntonKozlov
Copy link
Contributor

@AntonKozlov AntonKozlov commented Feb 24, 2022

There is a project CRaC in OpenJDK: https://openjdk.java.net/projects/crac/. A bit more verbose description is at https://github.com/crac/docs#readme. The idea of the project is to allow creating image (checkpoint) of an application and JVM state, and then start new instances from the image to reduce start-up and warm-up time. Applications are allowed to manage their state on the checkpoint and restore. Apps are also obligated to take care of external resources such as file handles and network connections. Such care naturally resides in frameworks, so from the very start, we pay special attention to popular ones, including Quarkus.

This change provides initial CRaC support to Quarkus, making an unmodified app https://github.com/CRaC/example-quarkus/tree/dev-quarkus to start on CRaC JDK in a matter of tenth of milliseconds. To avoid a hard dependency on the new features provided by CRaC, applications are suggested to use org-crac package https://github.com/CRaC/org.crac. So the new code should not harm the compatibility with existing runtimes.

However, I'm totally lost about how to document and provide tests. Could someone point or describe how it would need to look like? Native image mode seems to require passing all of the tests. Would a similar way of testing be sufficient for CRaC, in theory?

@quarkus-bot quarkus-bot bot added area/core area/dependencies Pull requests that update a dependency file area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins area/vertx labels Feb 24, 2022
@Sanne
Copy link
Member

Sanne commented Feb 25, 2022

Very interesting, thanks for the contribution!
Looking forward for thoughts from @stuartwdouglas , @n1hility & @dmlloyd

@@ -21,7 +23,7 @@
* The implementation also contains optimizations that allow the ClassLoader to keep a minimum number of jars open
* while also preventing the lookup of the entire classpath for missing resources in known directories (like META-INF/services).
*/
public final class RunnerClassLoader extends ClassLoader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must it implement the interface - it is not sufficient just having methods on the class ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, an object should implement the interface to be able to register. If this is a problem (RunnerClassLoader implementing Resource interface), I think it should be possible to create an inner class implementing the interface and keep the RunnerClassLoader signature unchanged. Should I change the code this way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; I think the less exposure we have of crac api the better as it (as I see it) just a internal implementation detail.

Ultimately if we could find a way to have this as an extension so it only "cost" when crac is possible to enable.

Also...does this build and run in native ? (Ie. I know crac won't be enabled but would need it to not prevent use of native - at least if available in core)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry about crac building in native, we can hint it to be seen as dead code during analysis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm changing this piece.

I've tried building my quarkus-example in native mode without any issue. This surprised me a bit :) (org-crac uses reflection to access API), but it went smoothly. Also, I'm relying on GHA to run test in native as well.

@AntonKozlov
Copy link
Contributor Author

I just spotted and fixed the link to the branch of example-quarkus repo, now it should be correct. Looking forward to the feedback!

@stuartwdouglas
Copy link
Member

I can't help thinking that we should probably add a SPI for this, rather than adding CRaC into the core. Basically a build item that can be used to provide pre-checkpoint/post-restore tasks. This would allow CraC support to be provided as part of Quarkiverse.

My initial thoughts around this when we were talking about CRIU were that it would be nice if Quarkus could provide the checkpoint as part of its build process, however the warm up step from your example where you apply load to warm up the JVM kinda complicates this, as this would be application dependent.

@cescoffier
Copy link
Member

I agree with Stuart. I would not put that in core (while being pretty cool).

@AntonKozlov
Copy link
Contributor Author

I can't help thinking that we should probably add a SPI for this, rather than adding CRaC into the core. Basically a build item that can be used to provide pre-checkpoint/post-restore tasks. This would allow CraC support to be provided as part of Quarkiverse.

Interesting, is there a good example of a similar approach implemented? I've tried to "fork" necessary components for Quarkiverse, but failed. Having an SPI, that is, some awareness in the core that can be extended by Quarkiverse should be a simpler way.

My initial thoughts around this when we were talking about CRIU were that it would be nice if Quarkus could provide the checkpoint as part of its build process, however the warm up step from your example where you apply load to warm up the JVM kinda complicates this, as this would be application dependent.

Yes, CRaC is intended to run on the application as a whole, to be able to capture complete initialized state (and JIT compilations, in the current implementation). Although warm-up phase is optional, it is one of the reasons apps may opt-in for CRaC. The CRaC image is somewhat similar to the native image -- a binary used for deployment, probably as a part of a container image.

@n1hility
Copy link
Member

n1hility commented Mar 1, 2022

+1 and Thank you @AntonKozlov !

Copy link
Member

@cescoffier cescoffier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said in my previous comment, it looks great, but having a dependency on Crac from the core may not be the best approach.
We need to implement a way to extract crac support in its own extension. It looks like @stuartwdouglas has some ideas to enable this.

stop(p);
CountDownLatch latch = new CountDownLatch(1);
p.future().onComplete(event -> latch.countDown());
latch.await();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that method is called on the event loop, you cannot await.
From where is this method called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the checkpoint is requested by the jcmd (as in example-quarkus), a separate thread is created to call j.c.C.checkpointRestore, which calls all beforeCheckpoint's (these methods are called in a particular order, sequentially). More info about this is https://crac.github.io/openjdk-builds/javadoc/api/java.base/jdk/crac/package-summary.html. Image is created only when all methods finish successfully. So the wait is here to ensure the server is stopped.

@@ -34,6 +34,7 @@

import javax.enterprise.event.Event;

import org.crac.Resource;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want a dependency on crac.

@gastaldi
Copy link
Contributor

gastaldi commented Apr 6, 2022

Marking as draft as the direct dependency on crac isn't desired and it needs tests

@gastaldi gastaldi marked this pull request as draft April 6, 2022 20:27
@geoand
Copy link
Contributor

geoand commented Apr 18, 2022

Should we move ahead with the SPI approach?

@AntonKozlov
Copy link
Contributor Author

I was not able to find a decent example of a SPI implemented in the core to get this right. The org-crac is a kind of SPI itself (provides org.crac.* classes and in the runtime detects the presence of jdk.crac.* that is the actual CRaC API implementation; if the latter is missing, org.crac does nothing), so the SPI we are discussing should be something well-suited to Quarkus.

Would it be fine to introduce io.quarkus.crac.* classes in the core, that will detect e.g. io.quarkus.crac.impl.* classes that will be provided by an extension? And to detect the presence by the reflection?

@geoand
Copy link
Contributor

geoand commented Apr 19, 2022

I just re-read how Crac is split and to be honest, I'm okay with having the dependency on the API if we can't find a more generic way of dealing with such functionality. We would however need buy in from others as well.

@stuartwdouglas
Copy link
Member

It does appear to only be 5 classes and is basically an SPI itself, maybe it is ok to add a dependency.

@AntonKozlov
Copy link
Contributor Author

I will happily change this back from Draft :)

For better understanding, what are the criteria to add a dependency to the core? Is it the difference in cumulative static footprint with dependencies on storage or something else?

@geoand
Copy link
Contributor

geoand commented May 4, 2022

We don't have any explicit criteria, but I think most of us will agree that such dependencies need to be very lightweight and also be rather stable.

I personally think we can move this ahead, but of course we'll need more buy in :)

@n1hility
Copy link
Member

n1hility commented May 6, 2022

I am +1 on the dep if its lightweight

@Sanne
Copy link
Member

Sanne commented May 6, 2022

I'm +1 as well.

One thing I'd remind is that core dependencies can't be loaded by our enhanced classloader; we've seen each jar present in the system classloader consumes extra memory - so I'm not concerned about the size of jars but more about the number of them.

To address this I'd propose that Quarkus actually combines all jars into a single bootstrap jar? If that's acceptable it could also be stripped of unneccessary code from the libraries we already depend on; not that it would be much, but the approach opens doors.

Not meaning to side-track this conversation though - just offering a way out: adding such things will be negligible, so not a reason of concern.
Let's merge it as soon as conflicts are resolved :)

@maxandersen
Copy link
Member

+1 let's go. Now we understand the implications better so all good for adding it to core as the 5 classes are simple with no deps.

@geoand
Copy link
Contributor

geoand commented May 6, 2022

Thanks for the feedback everyone.

@AntonKozlov can you please rebase onto main and take the PR out of draft?

@geoand
Copy link
Contributor

geoand commented May 7, 2022

To address this I'd propose that Quarkus actually combines all jars into a single bootstrap jar? If that's acceptable it could also be stripped of unneccessary code from the libraries we already depend on; not that it would be much, but the approach opens doors

It's certainly worth looking into

@geoand
Copy link
Contributor

geoand commented May 9, 2022

@AntonKozlov I hope you don't mind I squashed the commits, rebased the PR and force pushed.

@geoand geoand marked this pull request as ready for review May 9, 2022 10:03
@geoand
Copy link
Contributor

geoand commented May 9, 2022

What I think we'll need is some documentation showing users have to take advantage of this feature

@geoand
Copy link
Contributor

geoand commented May 10, 2022

Any final words before I merge this?

@AntonKozlov
Copy link
Contributor Author

Sorry, I had no chance to rebase and does it create any significant conflicts. Of course I don't mind :) Thanks a lot for the help and the support!

@geoand
Copy link
Contributor

geoand commented May 10, 2022

Thank you!

@geoand
Copy link
Contributor

geoand commented May 10, 2022

@AntonKozlov I have a question: Is it possible to have the checkpoint data generated on JVM exit (similar to AppCDS)?

@AntonKozlov
Copy link
Contributor Author

The easiest way is to call the checkpointRestore method somewhere in java code. The restored JVM instance will continue the execution right after the method return. But if the method is called right before the exit, then the restored instance will exit immediately then.

Another approach is to generate checkpoint after some timeout. That is to start the app, wait for a bit, and run jcmd <pid> JDK.checkpoint. Before the jcmd, it will be possible to apply some workload, so the JVM will warm-up before checkpoint -- so the restored one will be equally warmed-up. This looks kind of significant change compared to e.g. AppCDSBuildStep, but in this way we'll get the most benefit from the CRaC.

@geoand
Copy link
Contributor

geoand commented May 10, 2022

Understood, thanks!

@Sanne Sanne merged commit 292d89d into quarkusio:main May 13, 2022
@quarkus-bot quarkus-bot bot added this to the 2.10 - main milestone May 13, 2022
@kdubb kdubb modified the milestones: 2.10.0.CR1, 2.14 - main Oct 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core area/dependencies Pull requests that update a dependency file area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins area/vertx
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants