
Tracking how much memory is committed per object heap #36731

Merged

Conversation

@cshung cshung (Member) commented May 20, 2020

The overall goal of this PR (and more to come) is to provide a per-object-heap level hard limit.

In particular, this PR introduces a new field named committed_by_oh with 4 entries, corresponding to the small object heap, the large object heap, the pinned object heap, and the supporting data structures. The values are the amounts of memory committed, in bytes.

The key change here is passing an oh parameter to the various virtual_* methods, which tells each method which object heap the virtual operation is meant for. Each call site needs to provide that value; most of the time the call's purpose can be determined from the locals or from the flags on the associated segment.
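As a rough illustration of the shape of that change — committed_by_oh and check_commit_cs are names from the PR discussion, but everything else here (the enum values, the function bodies) is simplified and invented; the real code lives in gc.cpp and uses the GC's own OS abstraction:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical, simplified names for the object heap kinds.
enum gc_oh_num
{
    soh = 0,          // small object heap
    loh = 1,          // large object heap
    poh = 2,          // pinned object heap
    bookkeeping = 3,  // supporting data structures
    total_oh_count = 4
};

static size_t committed_by_oh[total_oh_count]; // bytes committed per object heap
static std::mutex check_commit_cs;             // guards the shared counters

// Each call site now tells virtual_commit which object heap the
// operation is for, so the per-object-heap counter can be maintained.
bool virtual_commit(void* address, size_t size, gc_oh_num oh)
{
    (void)address;
    std::lock_guard<std::mutex> guard(check_commit_cs);
    committed_by_oh[oh] += size;
    // ... the actual OS commit (e.g. VirtualAlloc with MEM_COMMIT) goes here ...
    return true;
}

bool virtual_decommit(void* address, size_t size, gc_oh_num oh)
{
    (void)address;
    std::lock_guard<std::mutex> guard(check_commit_cs);
    assert(committed_by_oh[oh] >= size);
    committed_by_oh[oh] -= size;
    // ... the actual OS decommit goes here ...
    return true;
}
```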

The rest of the code is meant to make sure these passed values are correct. In particular, all the code under the VERIFY_COMMITTED_BY_OH definition is optional and meant for testing only. The amount of memory committed for an object heap should match the committed memory calculated from the segments associated with that object heap. However, that is not always true: from the time a virtual_* operation starts until the segment objects are updated (or threaded), the two values disagree. The unsaved_virtual_operation_count makes sure we do not assert on the values being identical when we know a virtual_* operation has already happened but the data structures are not yet updated. The begin_virtual_operation and complete_virtual_operation pair updates that variable, and also pairs up these operations via the character argument.
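A minimal sketch of that verification bracket — the begin/complete pair, the pending-operation counter, and the character-argument pairing are all described above, but the bodies here are invented for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Verification-only sketch (the PR guarded this under VERIFY_COMMITTED_BY_OH).
// Between a virtual_* call and the update of the segment objects, the tracked
// counter and the segment-derived sum legitimately disagree; the pending
// operation count suppresses the assert during that window.
static int unsaved_virtual_operation_count = 0;

// The char argument exists only to pair up a begin with its complete.
void begin_virtual_operation(char tag)    { (void)tag; unsaved_virtual_operation_count++; }
void complete_virtual_operation(char tag) { (void)tag; unsaved_virtual_operation_count--; }

// Compare the tracked counter against the value recomputed from segments,
// but only when no virtual operation is in flight.
void verify_committed(size_t tracked, size_t computed_from_segments)
{
    if (unsaved_virtual_operation_count == 0)
        assert(tracked == computed_from_segments);
}
```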

The testing was a bit more ambitious than it had to be. We only need to make sure the passed oh parameter values (and hence the committed_by_oh values) are correct in the heap_hard_limit case, but I expanded the check so that the values hold even when heap_hard_limit is not set. This is to avoid accidental misuse of those parameter values in the future. A slight twist: if VERIFY_COMMITTED_BY_OH is not defined, committed_by_oh is not maintained at all unless heap_hard_limit is on. This is for performance; I wanted to avoid taking the critical section unnecessarily.
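That conditional maintenance could be sketched as follows; heap_hard_limit, VERIFY_COMMITTED_BY_OH, and the critical section are concepts from the PR, but the wiring here is a simplified assumption:

```cpp
#include <cstddef>
#include <mutex>

// Sketch of the perf consideration described above: without
// VERIFY_COMMITTED_BY_OH, committed_by_oh is only maintained when a hard
// limit is configured, so the critical section is not taken unnecessarily.
static size_t heap_hard_limit = 0; // 0 means no hard limit configured
static size_t committed_by_oh[4];
static std::mutex check_commit_cs;

void account_commit(int oh, size_t size)
{
#ifdef VERIFY_COMMITTED_BY_OH
    bool track = true;                   // always track when verifying
#else
    bool track = (heap_hard_limit != 0); // otherwise only under a hard limit
#endif
    if (track)
    {
        std::lock_guard<std::mutex> guard(check_commit_cs);
        committed_by_oh[oh] += size;
    }
}
```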

@ghost ghost commented May 20, 2020

Tagging subscribers to this area: @Maoni0
Notify danmosemsft if you want to be subscribed.

@VSadov VSadov (Member) commented May 20, 2020

I am wondering about the ultimate purpose of this accounting. Is it for diagnostics, or do we expect users to actually specify separate individual limits per kind of allocation?

I.e. a user could allow XX megs for small objects, YY megs for large, ZZ megs for pinned...

@cshung cshung (Member, Author) commented May 20, 2020

I am wondering about the ultimate purpose of this accounting. Is it for diagnostics, or do we expect users to actually specify separate individual limits per kind of allocation?

I.e. a user could allow XX megs for small objects, YY megs for large, ZZ megs for pinned...

Yes, the overall goal of this PR (and more to come) is to provide (i.e. let the user specify) a per-object-heap hard limit.

@Maoni0 Maoni0 (Member) commented May 21, 2020

This verification code seems unnecessarily complicated. I would propose a much simpler way to verify.

The reason virtual_commit and virtual_decommit take a global critical section is that they need to care about the global use of committed memory so we don't exceed the hard limit. But for verification you can just verify per heap. The methods that could modify a heap_segment's committed are already synchronized per heap: they are either PER_HEAP or take a heap number or a heap. So you can keep a committed_by_oh_per_heap, and you don't need to take any locks because these operations are already guarded by the per-heap lock when needed. You can get rid of begin/end_virtual_operation and just put the verification after each update of committed. Deleting segments is a non-issue for hard limit, but if you really want to verify the non-hard-limit scenario too, you can manually update committed_by_oh_per_heap when we delete segments.

You can also verify, in a GC, that committed_by_oh is the sum of committed_by_oh_per_heap over all heaps if you want.
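The suggested scheme might look roughly like this; only the committed_by_oh_per_heap idea and the lock-free-per-heap reasoning come from the comment, while the heap and segment layout below is invented for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the suggested per-heap scheme: each heap keeps its own
// committed_by_oh_per_heap, updated under that heap's existing lock, and
// verification just recomputes the sum from the heap's segments.
struct heap_segment
{
    size_t committed_bytes;
    heap_segment* next;
};

struct gc_heap
{
    size_t committed_by_oh_per_heap[4]; // per-heap, per-object-heap counters
    heap_segment* segments[4];          // segment list per object heap kind

    // No lock needed: callers already hold this heap's lock.
    void verify(int oh) const
    {
        size_t from_segments = 0;
        for (heap_segment* s = segments[oh]; s != nullptr; s = s->next)
            from_segments += s->committed_bytes;
        assert(committed_by_oh_per_heap[oh] == from_segments);
    }
};

// Cross-check at GC time: the global counter should equal the per-heap sum.
size_t sum_over_heaps(const gc_heap* heaps, int n_heaps, int oh)
{
    size_t total = 0;
    for (int i = 0; i < n_heaps; i++)
        total += heaps[i].committed_by_oh_per_heap[oh];
    return total;
}
```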

What do you think?

@cshung cshung (Member, Author) commented May 21, 2020

what do you think?

I think that is a very interesting idea; it can kill three birds with one stone:

  • Simplify the logic,
  • Take check_commit_cs less frequently, and
  • Verify (in addition to the committed_by_oh values) that the heap (number) is also correct.

The verification code was placed under VERIFY_COMMITTED_BY_OH for a reason: I knew it was non-trivial (and therefore could have bugs or complexity issues), so it was designed to be optional.

In the interest of time, I think we can proceed with this plan:

  • Delete the code under VERIFY_COMMITTED_BY_OH
  • Merge the PR
  • File a tracking issue to do the testing work, and
  • Complete the per object heap limit config switch feature.

I ran the stress tests (for both workstation and server GC, both without heap_hard_limit) with VERIFY_COMMITTED_BY_OH turned on. No assert was recorded over 24 hours each, so I am fairly confident the code is correct.

@cshung cshung force-pushed the public/dev/andrew/track-commit-by-object-heap branch 2 times, most recently from b41e8da to a0c8a9b Compare May 21, 2020 18:34
@Maoni0 Maoni0 (Member) commented May 21, 2020

I would suggest not merging the PR as is. It's true that it's under an optional define, but we should not check in code, even under an optional define, when we know we would change it and the change doesn't require much effort (I think this should be much easier to implement than the one you did). I would estimate that, since you have now looked through the relevant code paths, it wouldn't take more than a few days, right?

@cshung cshung (Member, Author) commented May 21, 2020

I would suggest not merging the PR as is. It's true that it's under an optional define, but we should not check in code, even under an optional define, when we know we would change it

I think you might have misunderstood what I just said. I have pushed the change to delete the code under the definition.

it wouldn't take more than a few days, right?

I think it shouldn't take more than a week. I was just thinking it is probably a better idea to do the config switches first. Either way works fine for me.

@mjsabby mjsabby (Contributor) commented May 21, 2020

@cshung can you make sure Large Pages is tested in this case? Even manually? I'll be able to put it on bing.com as soon as your build shows up in rolling builds.

@Maoni0 Maoni0 (Member) commented May 21, 2020

Sorry @cshung, I misread. I'm totally fine with checking in the rest of the code, but I have not looked at it in detail yet; let me do that. I see the description of the PR still mentions the code you deleted, BTW.

Review comments (now outdated and resolved) were left on src/coreclr/src/gc/gc.cpp and src/coreclr/src/gc/gcpriv.h.
@cshung cshung force-pushed the public/dev/andrew/track-commit-by-object-heap branch from cfcab1f to a68babc Compare May 22, 2020 17:02
@cshung cshung (Member, Author) commented May 22, 2020

@cshung can you make sure Large Pages is tested in this case? even manually? I'll be able to put it on bing.com as soon as your build shows up in rolling builds.

@mjsabby I am very happy that Bing is going to try it out. This is going to be a very good validation of the work.

This PR is indeed trying to solve the 3x commit problem we discovered earlier.

However, this PR alone is insufficient, as it only tracks the committed memory per object heap. In particular, it doesn't have the new configuration switches yet.

I am currently working on introducing the new switches; with those, I will alter the initial reservation logic and add the per-object-heap commit limit checks. That will solve the 3x commit problem.
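As a purely hypothetical sketch of what such a per-object-heap limit check could look like — hard_limit_by_oh is an invented name, and this PR itself only adds the tracking, not the check:

```cpp
#include <cstddef>

// A commit is refused when it would push that object heap past its
// configured limit; a zero limit means "no limit configured".
static size_t committed_by_oh[4];
static size_t hard_limit_by_oh[4];

bool check_against_oh_limit(int oh, size_t size)
{
    if (hard_limit_by_oh[oh] != 0 &&
        committed_by_oh[oh] + size > hard_limit_by_oh[oh])
    {
        return false; // would exceed the per-object-heap hard limit
    }
    committed_by_oh[oh] += size;
    return true;
}
```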

I will make sure large pages are tested when I introduce the switches.

@Maoni0 Maoni0 (Member) left a review

LGTM!

@cshung cshung force-pushed the public/dev/andrew/track-commit-by-object-heap branch 3 times, most recently from ea0f6c7 to 8cd12c8 Compare May 26, 2020 14:45
@cshung cshung force-pushed the public/dev/andrew/track-commit-by-object-heap branch from 8cd12c8 to 5fdb407 Compare May 26, 2020 14:47
@cshung cshung force-pushed the public/dev/andrew/track-commit-by-object-heap branch from 5fdb407 to f50de72 Compare May 26, 2020 14:58
@cshung cshung merged commit 68fd4fb into dotnet:master May 26, 2020
@cshung cshung deleted the public/dev/andrew/track-commit-by-object-heap branch May 26, 2020 18:09
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
4 participants