Avoid field reload in ephe_get_field #11749
Thanks @jmid. The fix is correct. A shortened test case is useful. I was able to reliably repeat the crashes with the for loop going only until Since the program only has 3 domains running in parallel (one of which is quickly expected to wait on
OK, test case added with 100 iterations in a8f5da6.
The AppVeyor failure is on an unrelated test: https://ci.appveyor.com/project/avsm/ocaml/builds/45478262#L4785
Ouch, we knew this sort of race could appear in legacy code in theory but I did not expect that it would appear so quickly in practice. Looking at
I wrote tests of the low-level
To be clear, the Weak module is an almost complete reimplementation in Multicore. The buggy code is not legacy code. It is code that got left in when the weak.c file was refactored from the concurrent minor GC to the parallel minor GC. (Walking the git blame history on the deleted line should give you the entire picture.)
OK, I now understand this is a simple bug unrelated to backwards-compatibility issues. This also means that there is no reason to look for other issues of this kind in the same file.
Indeed. If we are looking for bugs introduced in legacy code due to the addition of parallelism, we will need to look elsewhere.
Avoid field reload in ephe_get_field (cherry picked from commit 82c6a8d)
Cherry-picked on 5.0 as 3061e71.
Consider the following program, performing unsynchronized Weak.get and Weak.set:
This can result in weird values being observed:
The problem is in ephe_get_field, where we should avoid reloading the field, as it may have been altered in parallel:
ocaml/runtime/weak.c, lines 244 to 256 in 408bba1
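The reload hazard described above can be sketched in isolation. The following is a minimal illustration in plain C11, not the actual code from runtime/weak.c: the type `slot_t`, the sentinel `SLOT_NONE`, and both function names are hypothetical and chosen only for this example. It contrasts reading a shared field twice (once for the check, once for the result) with loading it a single time into a local.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "empty slot" marker; not the runtime's real sentinel. */
#define SLOT_NONE ((intptr_t)-1)

/* A shared slot that another thread may overwrite at any time. */
typedef struct { _Atomic intptr_t field; } slot_t;

/* Racy pattern: the field is loaded once for the check and again for the
   result. A concurrent store between the two loads can make the function
   return a value the check never saw, including SLOT_NONE itself. */
static bool get_field_reloading(slot_t *s, intptr_t *out)
{
  if (atomic_load(&s->field) == SLOT_NONE) return false;
  *out = atomic_load(&s->field);  /* second load: may differ from the first */
  return true;
}

/* Safer pattern: load once into a local and use only that local afterwards,
   so the check and the returned value always agree. */
static bool get_field_single_load(slot_t *s, intptr_t *out)
{
  intptr_t v = atomic_load(&s->field);
  if (v == SLOT_NONE) return false;
  *out = v;
  return true;
}
```

Run on a single thread, both functions behave identically. They can only diverge when another thread stores to `field` between the two loads in `get_field_reloading`, which is the kind of interleaving that unsynchronized parallel access makes possible.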
I'm unsure
The PR was a tag-team effort with @kayceesrk who cooked up a quick fix after I had found the above test case.