Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a managed implementation of assembly binder #91400

Closed
wants to merge 143 commits into from

Conversation

huoyaoyuan
Copy link
Member

#85558 (comment)

It should be possible to move the whole assembly loader to C#. We would need a special unmanaged path to load CoreLib to bootstrap, but the rest can be managed. It is too big to do it all at once. I think a good place to start with this refactoring would be BINDER_SPACE::*.

I've ported the types under BINDER_SPACE::, and a large portion of binding logic in assemblybindercommon.cpp. I also created a disabled feature switch to guard the code.

The logic is ported closely from C++ and aims line-to-line match, to keep the behavior. Initially I want to convert all the HRESULTs to exceptions, but it's not practical to use exception in control logic of binder, so I kept HRESULTs in AssemblyBinderCommon.

/cc @jkotas

Is this in the desired direction? I haven't do anything around managed/unmanaged boundary. I think code should be ported to managed, until something we don't want.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 31, 2023
@ghost
Copy link

ghost commented Aug 31, 2023

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

#85558 (comment)

It should be possible to move the whole assembly loader to C#. We would need a special unmanaged path to load CoreLib to bootstrap, but the rest can be managed. It is too big to do it all at once. I think a good place to start with this refactoring would be BINDER_SPACE::*.

I've ported the types under BINDER_SPACE::, and a large portion of binding logic in assemblybindercommon.cpp. I also created a disabled feature switch to guard the code.

The logic is ported closely from C++ and aims line-to-line match, to keep the behavior. Initially I want to convert all the HRESULTs to exceptions, but it's not practical to use exception in control logic of binder, so I kept HRESULTs in AssemblyBinderCommon.

/cc @jkotas

Is this in the desired direction? I haven't do anything around managed/unmanaged boundary. I think code should be ported to managed, until something we don't want.

Author: huoyaoyuan
Assignees: -
Labels:

area-AssemblyLoader-coreclr

Milestone: -

@jkotas
Copy link
Member

jkotas commented Aug 31, 2023

This is good raw material. Creating a clean managed/unmanaged boundary is probably going to be the harder part. Some thoughts:

  • The unmanaged binder calls out to managed to log tracing messages and to fire resolve events. This back and forth between managed and unmanaged should go away, the unmanaged code should just do a single call to managed to do everything.
  • The binding using TPA list should be folded into DefaultAssemblyLoadContext, no need for it to live in separate type.
  • The managed equivalent of TextualIdentityParser exists as System.Reflection.AssemblyNameFormatter. We should just use the managed AssemblyNameFormatter instead of duplicating it as TextualIdentifyParser.
  • TextualIdentifyParser may be used by debugger or in places that are not able to run managed code. If you run into situations like this, leave it alone in C++ for now.
  • We avoid C# record in core libraries. C# record generates too much hidden code bloat that the trimmer is not able to delete.
  • I am not sure what is the purpose of the binding failure HRESULT error cache. The VM caches the binding failures with the exception details (look for AppDomain::AddExceptionToCache). I would think that the binding failure HRESULT error cache should be unnecessary.

int* dwPAFlags = stackalloc int[2];
using IMdInternalImport pIMetaDataAssemblyImport = BinderAcquireImport(pPEImage, dwPAFlags);

Architecture = AssemblyBinderCommon.TranslatePEToArchitectureType(dwPAFlags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The managed binder should not need to deal with image architectures. The unmanaged runtime should validate very early whether the image is good for current architecture (and reject it if it is not). By the time we get to the managed binder, we should know that the image architecture is good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with focusing the data structures, and did not look at the caller. Leaving the cleanup later should be fine, since cleaning managed code is easy.

@vitek-karas
Copy link
Member

@elinor-fung FYI

@huoyaoyuan
Copy link
Member Author

Good points for the behaviors that we don't need in managed binder. I'm aiming for a working POC first, and not tweaking yet. Records are just simplification since I don't want to write GetHashCode etc. first.

Rough thought of steps:

  • Explore the scope of managed binder, create shape for things to port, and use placeholder for boundary and complex functions.
  • Fulfill implementation and boundary incrementally, to get the POC working.
  • Create feature switch guard for unmanaged code replaced, and filter out things we can't remove, for example which used by debugger.
  • Tweak boundary and optimise managed code.
  • Turn on the feature switch eventually.

Questions:

  • Should I create a tracking issue for this? This is likely not a single-PR work.
  • Should we get some code merged with feature switch disabled first? How complete should it be?
  • Will this affect bootstrapping on new platform/architecture?

@huoyaoyuan
Copy link
Member Author

huoyaoyuan commented Mar 10, 2024

I've done some measurement about working set:

It's about 3.5MB of total regression comparing to main for a given run. Break down by vmmap:
Heap (unmanaged?): +0.8MB
Image: +1.5MB. The working set of CoreLib, coreclr, clrjit and icu.dll has all increased.
Managed: vmmap can't show this category of PR build, but successes for main branch build. It's about 1.5MB for main.
Mapped file: +0.6MB for icudtl.dat
Page table: -0.3MB
Private data: +276KB, a block of R/W data
Sharable: +2MB. Not sure what it represents.

So one obvious conclusion is that some globalization related code are touched unintentionally.

Setting globalization invariant mode doesn't change the regression though.

Copy link
Contributor

@ANahr ANahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple cases in the code that might trigger globalization code

Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@huoyaoyuan huoyaoyuan reopened this Apr 12, 2024
@huoyaoyuan
Copy link
Member Author

Keep open since it's functionally complete.

Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@huoyaoyuan huoyaoyuan reopened this May 12, 2024
@huoyaoyuan
Copy link
Member Author

Status report:

Working Set:

About 15.2 MB -> 16.3 MB under default configuration.
The difference is mainly +0.6MB in mapped image of CoreLib, and +0.3MB of managed heap, +0.2MB of sharable page.
There is reduction of mapped image of coreclr.dll, but the difference is much smaller.

Start time:

Measured with Measure-Command in PowerShell as an external source. The startup time range shifts from 23.6 ~ 24.8ms to 25.3 ~ 26.7ms. The regression is clearly measurable.

File size:

-39KB in coreclr.dll, +92KB in CoreLib (R2R).

There are still some space of dead code for optimization, but I kept this PR as close to native code as possible to help reviewing.
Cleanup around managed->native calls may also improve performance in the future.

Comment on lines +171 to +176
if (outPath.EndsWith(".ni.dll", StringComparison.OrdinalIgnoreCase)
|| outPath.EndsWith(".ni.exe", StringComparison.OrdinalIgnoreCase))
{
simpleName = outPath[iSimpleNameStart..^7];
isNativeImage = true;
}
Copy link
Member Author

@huoyaoyuan huoyaoyuan May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ni.dll still a thing? I remember it's output of NGen.

Even if it's still a thing, we should probably remove this too because the shared framework (TPA) does not contain any ni files.
The performance difference is negligible though.

@huoyaoyuan
Copy link
Member Author

I'm going to abandon this for now.

Per offline discussion with @jkotas , there's no net improvement for this PR. The newly created managed-native boundaries are quite a lot. It's not easy to eliminate them since many of the native components are used by diagnostics interfaces like DAC, or bootstrap path for executing managed application. There's also duplications between managed and native code, and potential discrepancies with CoreLib loading.

If anyone want to revisit this in the future, consider to create a branch in runtimelab. There are also improvable small points discovered during this.

Although not a successful experiment, still great thanks to @jkotas for assisting on this!

@huoyaoyuan huoyaoyuan closed this Jun 12, 2024
@agocke
Copy link
Member

agocke commented Jun 12, 2024

@huoyaoyuan thanks for your effort! I’m sure it will be useful for the next changes in this area.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-AssemblyLoader-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants