-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial support of Project CRaC #23950
Conversation
Very interesting, thanks for the contribution! |
@@ -21,7 +23,7 @@ | |||
* The implementation also contains optimizations that allow the ClassLoader to keep a minimum number of jars open | |||
* while also preventing the lookup of the entire classpath for missing resources in known directories (like META-INF/services). | |||
*/ | |||
public final class RunnerClassLoader extends ClassLoader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must it implement the interface - it is not sufficient just having methods on the class ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, an object should implement the interface to be able to register. If this is a problem (RunnerClassLoader implementing Resource interface), I think it should be possible to create an inner class implementing the interface and keep the RunnerClassLoader signature unchanged. Should I change the code this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah; I think the less exposure we have of crac api the better as it (as I see it) just a internal implementation detail.
Ultimately if we could find a way to have this as an extension so it only "cost" when crac is possible to enable.
Also...does this build and run in native ? (Ie. I know crac won't be enabled but would need it to not prevent use of native - at least if available in core)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry about crac building in native, we can hint it to be seen as dead code during analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'm changing this piece.
I've tried building my quarkus-example in native mode without any issue. This surprised me a bit :) (org-crac uses reflection to access API), but it went smoothly. Also, I'm relying on GHA to run test in native as well.
I just spotted and fixed the link to the branch of example-quarkus repo, now it should be correct. Looking forward to the feedback! |
I can't help thinking that we should probably add a SPI for this, rather than adding CRaC into the core. Basically a build item that can be used to provide pre-checkpoint/post-restore tasks. This would allow CraC support to be provided as part of Quarkiverse. My initial thoughts around this when we were talking about CRIU were that it would be nice if Quarkus could provide the checkpoint as part of its build process, however the warm up step from your example where you apply load to warm up the JVM kinda complicates this, as this would be application dependent. |
I agree with Stuart. I would not put that in core (while being pretty cool). |
Interesting, is there a good example of a similar approach implemented? I've tried to "fork" necessary components for Quarkiverse, but failed. Having an SPI, that is, some awareness in the core that can be extended by Quarkiverse should be a simpler way.
Yes, CRaC is intended to run on the application as a whole, to be able to capture complete initialized state (and JIT compilations, in the current implementation). Although warm-up phase is optional, it is one of the reasons apps may opt-in for CRaC. The CRaC image is somewhat similar to the native image -- a binary used for deployment, probably as a part of a container image. |
+1 and Thank you @AntonKozlov ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said in my previous comment, it looks great, but having a dependency on Crac from the core may not be the best approach.
We need to implement a way to extract crac support in its own extension. It looks like @stuartwdouglas has some ideas to enable this.
stop(p); | ||
CountDownLatch latch = new CountDownLatch(1); | ||
p.future().onComplete(event -> latch.countDown()); | ||
latch.await(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that method is called on the event loop, you cannot await.
From where is this method called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the checkpoint is requested by the jcmd (as in example-quarkus), a separate thread is created to call j.c.C.checkpointRestore, which calls all beforeCheckpoint's (these methods are called in a particular order, sequentially). More info about this is https://crac.github.io/openjdk-builds/javadoc/api/java.base/jdk/crac/package-summary.html. Image is created only when all methods finish successfully. So the wait is here to ensure the server is stopped.
@@ -34,6 +34,7 @@ | |||
|
|||
import javax.enterprise.event.Event; | |||
|
|||
import org.crac.Resource; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want a dependency on crac.
Marking as draft as the direct dependency on crac isn't desired and it needs tests |
Should we move ahead with the SPI approach? |
I was not able to find a decent example of a SPI implemented in the core to get this right. The org-crac is a kind of SPI itself (provides Would it be fine to introduce |
I just re-read how Crac is split and to be honest, I'm okay with having the dependency on the API if we can't find a more generic way of dealing with such functionality. We would however need buy in from others as well. |
It does appear to only be 5 classes and is basically an SPI itself, maybe it is ok to add a dependency. |
I will happily change this back from Draft :) For better understanding, what are the criteria to add a dependency to the core? Is it the difference in cumulative static footprint with dependencies on storage or something else? |
We don't have any explicit criteria, but I think most of us will agree that such dependencies need to be very lightweight and also be rather stable. I personally think we can move this ahead, but of course we'll need more buy in :) |
I am +1 on the dep if its lightweight |
I'm +1 as well. One thing I'd remind is that core dependencies can't be loaded by our enhanced classloader; we've seen each jar present in the system classloader consumes extra memory - so I'm not concerned about the size of jars but more about the number of them. To address this I'd propose that Quarkus actually combines all jars into a single bootstrap jar? If that's acceptable it could also be stripped of unneccessary code from the libraries we already depend on; not that it would be much, but the approach opens doors. Not meaning to side-track this conversation though - just offering a way out: adding such things will be negligible, so not a reason of concern. |
+1 let's go. Now we understand the implications better so all good for adding it to core as the 5 classes are simple with no deps. |
Thanks for the feedback everyone. @AntonKozlov can you please rebase onto main and take the PR out of draft? |
It's certainly worth looking into |
@AntonKozlov I hope you don't mind I squashed the commits, rebased the PR and force pushed. |
What I think we'll need is some documentation showing users have to take advantage of this feature |
Any final words before I merge this? |
Sorry, I had no chance to rebase and does it create any significant conflicts. Of course I don't mind :) Thanks a lot for the help and the support! |
Thank you! |
@AntonKozlov I have a question: Is it possible to have the checkpoint data generated on JVM exit (similar to AppCDS)? |
The easiest way is to call the checkpointRestore method somewhere in java code. The restored JVM instance will continue the execution right after the method return. But if the method is called right before the exit, then the restored instance will exit immediately then. Another approach is to generate checkpoint after some timeout. That is to start the app, wait for a bit, and run |
Understood, thanks! |
There is a project CRaC in OpenJDK: https://openjdk.java.net/projects/crac/. A bit more verbose description is at https://github.com/crac/docs#readme. The idea of the project is to allow creating image (checkpoint) of an application and JVM state, and then start new instances from the image to reduce start-up and warm-up time. Applications are allowed to manage their state on the checkpoint and restore. Apps are also obligated to take care of external resources such as file handles and network connections. Such care naturally resides in frameworks, so from the very start, we pay special attention to popular ones, including Quarkus.
This change provides initial CRaC support to Quarkus, making an unmodified app https://github.com/CRaC/example-quarkus/tree/dev-quarkus to start on CRaC JDK in a matter of tenth of milliseconds. To avoid a hard dependency on the new features provided by CRaC, applications are suggested to use org-crac package https://github.com/CRaC/org.crac. So the new code should not harm the compatibility with existing runtimes.
However, I'm totally lost about how to document and provide tests. Could someone point or describe how it would need to look like? Native image mode seems to require passing all of the tests. Would a similar way of testing be sufficient for CRaC, in theory?