-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Llvm location aliasing information #324
Llvm location aliasing information #324
Conversation
… llvm Does not yet include metadata, so this version will not compile
…equence Giving too much information causes unbearable slowdown in llvm, as the representation has n^2 size. There may be unforeseen correctness issues with aliasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments about the SimpleOper.t
datatype.
val fromOper = | ||
fn Operand.StackOffset _ => Stack | ||
| Operand.Offset _ => Offset | ||
| Operand.SequenceOffset _ => SequenceOffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't quite accurate. For Operand.Offset
and Operand.SequenceOffset
, we need to look at the type of the base
field, because both operands can be used for non-ML data (e.g., Operand.Offset
is used to access fields of GCState
and Operand.SequenceOffset
is used to implement the CPointer_{get,set}<ty>
primitives).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've filtered based on types and put non-objptrs into the Other category (which is disjoint, rather than unknown). That should be sound for our flows, as far as I can tell.
|
||
structure SimpleOper = struct | ||
|
||
datatype t = Stack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree that Offset
and SequenceOffset
would not alias, a simple Stack | Heap | Other
distinction seems simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I've added the offsets back in, and the complexity seems to have little effect on time for TBAA-based aliasing, I'd argue that there's little benefit to that simplicity any more.
Indeed using type-based analysis instead of the alias.scope infrastructure was more performant for llvm. With the full information, hamlet takes 32 seconds to compile, versus 37 with alias.scope, and 30 with no aliasing information (and about 32 as well for the simple information). I didn't see your comments when pushing the code above. |
Here's some results from testing on oxygen with the current version (Stack/Offset/SequenceOffset with offsets via TBAA). Mostly positive, some slowdowns are consistent and others (lexgen) appear to be noise. |
I forgot to upload this: Third column is with stack indices not distinguished. The current commit has the different stack indices |
0dca4b5
to
b2054be
Compare
Good point, applied the change. |
I started a preliminary version, which is a bit unsound (having 1 objptr disjoint from unknown, rather than nested, and missing the vector/array fixes). The overall benefit seems negligible from quick testing, and the compile time seems to actually worsen on hamlet (or other larger programs), so I think we don't get much more benefit than the Stack/Heap info we had. For master vs aliasing with types: Hamlet.MLton0.batch.0.ll has 938 lines of metadata for tbaa with these changes. Similar or more for other files in the batch. |
Benchmark results (sulfur; g357a28440)Specs:
Run-Time RatioThis table shows the effect of
There does seem to be some overlap in the effects of Unfortunately,
Compile-Time Ratio
Executable-Size Ratios
|
I've updated |
If I'm understanding |
Sorry about that, I'll allow that here. but...
This was the note I made in the comment above. 1af34cb is not fully sound. But it also has no real benefits and a lot of costs, so if it's expected to be more performant and faster compiled with unsoundness, then it would be even worse when fixed, so I wanted to drop that. I only left 1af34cb here for visibility. 95e8e37 is sound, and much better for compile time, with no real runtime differences. I would recommend rebasing back and force pushing. |
O.k. So, we should revert |
Oops, I missed your last sentence: rebasing back and force pushing. I'll do that. |
Change the object allocation sequence from CW(Frontier) = header; dst = Frontier + NORMAL_METADATA_SIZE; Frontier += size; to dst = Frontier + NORMAL_METADATA_SIZE; OW(dst, ~GC_HEADER_SIZE) = header; Frontier += size; This ensures that the write to the heap is through an `Offset` operand with an `Objptr` base, so that the LLVM tbaa will treat the store as one into the heap.
1af34cb
to
b787e91
Compare
Some notes for future improvements to the alias-analysis metadada:
|
Although the type information had very little effect, passing aliasing information about the stack and heap was sufficient to affect many programs.
The only information included is distinguishing the style of operand, only StackOffset/Offset/SequenceOffset are included. Including too much causes major slowdowns in compilation, even with relatively minor sets, as the !noalias information must be reported in sets of total size n^2. Given that the only ways to pass information to LLVM are this and type-alias information, it seems we would need to get an update into LLVM to support more without heavy slowdown, or to handle it as a performance bug.
It's a bit iffy on whether it's worth it. Both versions have considerable compile time costs (which feels like an LLVM bug, considering one of those has only four disjoint classes). There's a mix of slowdowns and speedups, with perhaps a slight bias towards speedups. I've attached two output files: the one with no indices, and the other with indices. Both are run on my machine, so there's some expected differences with e.g. I/O.
detailed-out.txt
simple-out.txt
Some code for object pointers is left in from attempting type-based-aliasing. RepType.deObjptrs does feel a gap, and I ran into the need for it in other code. Otherwise though, we can revert and squash to reduce churn.