Provide option to shorten symbol names by replacing them with a digest #705

Closed
h1467792822 opened this issue Dec 19, 2023 · 5 comments
Labels
major-change A proposal to make a major change to rustc major-change-accepted A major change proposal that was accepted T-compiler Add this label so rfcbot knows to poll the compiler team

Comments

@h1467792822

h1467792822 commented Dec 19, 2023

Proposal

Add an optimization option that lets users replace full mangled symbol names with hash digests, greatly reducing the length of symbol names in a dylib. At the cost of debugging aids such as readable symbol names, this option removes the space bottleneck encountered when Rust is used to replace existing C/C++ modules in resource-constrained scenarios.

Motivation

The average length of symbol names in the Rust standard library is about 100 bytes, while in the C++ standard library it is about 50 bytes. In some embedded environments where dynamic libraries are widely used, the space taken by symbol names in Rust dynamic libraries has become one of the key bottlenecks for applications, especially when existing C/C++ modules are rewritten in Rust.

The standard library is a typical example. We compared the share of the whole ELF file occupied by the .dynstr section in the Rust standard library and in the C++ standard library, and then compared the specific symbols stored in .dynstr. The data is as follows:

The share of .dynstr in the Rust standard library is about twice that in C++:

Library                                        .dynstr (bytes)   Total file (bytes)   .dynstr / total
Rust libstd.so                                 265559            804896               0.33
Rust libstd.so (symbol_mangling_version=v0)    318710            858144               0.37
C++ libstdc++.so                               267295            1594864              0.17

Remarks:

  1. The build environment is Ubuntu 18.04 LTS.
  2. The Rust standard library build options include panic="abort", opt-level="z", codegen-units=1, strip=true, and debug=true, and the .rustc section is removed.
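As a quick check of the "about twice" claim, the ratios can be recomputed from the raw sizes in the table. This is a minimal sketch; `MEASUREMENTS`, `dynstr_share`, and `rust_vs_cpp_share` are illustrative names that simply restate the numbers above.

```rust
/// Fraction of an ELF file occupied by its `.dynstr` section.
fn dynstr_share(dynstr_bytes: u64, total_bytes: u64) -> f64 {
    dynstr_bytes as f64 / total_bytes as f64
}

/// (library, .dynstr size, total file size) in bytes, copied from the table.
const MEASUREMENTS: [(&str, u64, u64); 3] = [
    ("rust libstd.so", 265_559, 804_896),
    ("rust libstd.so (v0)", 318_710, 858_144),
    ("c++ libstdc++.so", 267_295, 1_594_864),
];

/// How many times larger the Rust share (default legacy mangling, first row)
/// is than the C++ share (last row).
fn rust_vs_cpp_share() -> f64 {
    let (_, rust_dynstr, rust_total) = MEASUREMENTS[0];
    let (_, cpp_dynstr, cpp_total) = MEASUREMENTS[2];
    dynstr_share(rust_dynstr, rust_total) / dynstr_share(cpp_dynstr, cpp_total)
}
```

With these figures the Rust share (about 0.330) is roughly 1.97 times the C++ share (about 0.168).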

In C++, the average length of a mangled symbol name is about 50 bytes, while in Rust it is about 100 bytes:

Library                                        Symbols                  Size (bytes)   Count   Average (bytes)
Rust libstd.so                                 starting with _ZN        263106         2722    96
                                               not starting with _ZN    2314           184     12
Rust libstd.so (symbol_mangling_version=v0)    starting with _R         316371         2722    116
                                               not starting with _R     2314           184     12
C++ libstdc++.so                               starting with _ZN        218957         4187    52.3
                                               not starting with _ZN    49331          1362    36.2
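The table groups symbol names by their mangling prefix. The sketch below shows how such figures could be computed from a list of exported names (for example, as printed by nm -D); `Stats` and `split_by_prefix` are hypothetical helpers, not part of any existing tool, and lengths are counted without the NUL terminator that .dynstr stores after each name.

```rust
/// Aggregate size statistics for a group of symbol names.
#[derive(Debug, Default)]
struct Stats {
    total_bytes: usize,
    count: usize,
}

impl Stats {
    /// Account for one name (its byte length, excluding any NUL terminator).
    fn add(&mut self, name: &str) {
        self.total_bytes += name.len();
        self.count += 1;
    }

    /// Average name length in bytes (0.0 for an empty group).
    fn average(&self) -> f64 {
        if self.count == 0 {
            0.0
        } else {
            self.total_bytes as f64 / self.count as f64
        }
    }
}

/// Split names into those starting with `prefix` (e.g. "_ZN" for legacy or
/// C++ mangling, "_R" for v0) and all others.
fn split_by_prefix<'a>(names: impl IntoIterator<Item = &'a str>, prefix: &str) -> (Stats, Stats) {
    let mut matching = Stats::default();
    let mut other = Stats::default();
    for name in names {
        if name.starts_with(prefix) {
            matching.add(name)
        } else {
            other.add(name)
        }
    }
    (matching, other)
}
```

Note that 263106 / 2722 is about 96.7, so the Rust averages in the table appear to be truncated rather than rounded.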

Finding a way to shorten the symbol names of Rust dynamic libraries is therefore of great value.

Design

Shorter symbol names based on digests

The solution is to replace each symbol's full mangled name with a digest computed from that name by a chosen hash algorithm. This can greatly reduce the size of the .dynstr section, even below that of C++.
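To illustrate the idea, the sketch below maps a full mangled name to a fixed-length digest. It uses 64-bit FNV-1a only because it is short and deterministic; rustc uses its own internal hasher, and `fnv1a_64` and `digest_name` are hypothetical names. Whatever the input length, the output here is always 16 hexadecimal characters.

```rust
/// 64-bit FNV-1a: a simple, deterministic stand-in hash.
/// This is NOT the hash function rustc uses.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64; // mix in one byte
        hash = hash.wrapping_mul(PRIME); // then multiply, modulo 2^64
    }
    hash
}

/// Replace a full mangled name with a fixed-length digest of it
/// (16 lowercase hex digits, purely for illustration).
fn digest_name(full_mangled: &str) -> String {
    format!("{:016x}", fnv1a_64(full_mangled.as_bytes()))
}
```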

Could a post-processing tool such as objcopy do this instead? Unfortunately, objcopy --redefine-syms cannot rename or shorten the symbol names in .dynsym, and reducing the size of .dynstr with post-processing tools turns out to be much more difficult than expected. If rustc itself solves this problem, it will be the simplest and most convenient solution for users, and it will help Rust be adopted in a wider range of scenarios.

Usage Constraints

For debugging: once full symbol names are replaced with digests, it is difficult to find the code corresponding to a symbol name in the dynamic library, so users must keep the debugging information and the full source code. Because crates are the central unit of code organization in Rust, a final symbol name consisting of the crate name plus a digest is a reasonable scheme.

What happens if a hash collision causes two symbol names to conflict? Hash collisions are theoretically unavoidable. A conflict can arise in two scenarios: inside a single dylib, or between multiple dylibs.

  1. Inside a dylib: rustc detects and reports the conflict. The user can then choose not to shorten symbol names, or provide a new salt value for the internal hash algorithm and recompute, which eliminates such a low-probability collision.
  • If an rlib is a dependency of a dylib, the symbols of all public APIs in the rlib are included in the dylib, so any hash collision among these symbols is detected immediately.

  • Generic functions are different. The symbols of a generic function may be scattered across multiple downstream dylibs. If such a symbol still contained the name of the crate that defines the generic function, a hash collision between it and other symbols of that crate could not be detected at build time; the conflict would only show up at run time. Using the name of the instantiating crate instead of the defining crate completely eliminates this potential collision risk.

  2. Between dylibs: because the final symbol name already includes the crate name, a conflict can occur only when different dylibs depend on different versions of the same rlib, so that the same symbol name refers to implementations from different versions. Generally, if an rlib is used by multiple dylibs, it should be replaced by a dylib build of the same crate. If multiple dylibs must depend on the same rlib for other reasons, the symbols of the upstream rlib should not be exported. This is similar to GCC's -Wl,--exclude-libs option; in Rust, -C link-arg=-Wl,--exclude-libs=libfoo.rlib can be used to avoid exporting the symbols of the upstream rlib.
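The detect-and-retry approach for collisions inside a dylib can be sketched as follows. The digest function is passed in as a parameter, so the sketch makes no assumption about rustc's actual hash; `find_collisions` and `pick_salt` are hypothetical helpers, and trying salts 0, 1, 2, ... in order is only one possible strategy (in the final design the user supplies the salt).

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Return pairs of distinct full names that map to the same digest.
fn find_collisions<'a>(
    full_names: &[&'a str],
    digest: impl Fn(&str) -> u64,
) -> Vec<(&'a str, &'a str)> {
    let mut seen: HashMap<u64, &'a str> = HashMap::new();
    let mut collisions = Vec::new();
    for &name in full_names {
        match seen.entry(digest(name)) {
            // Same digest already taken by a different name: a real collision.
            // (The same name listed twice is not a collision.)
            Entry::Occupied(e) => {
                if *e.get() != name {
                    collisions.push((*e.get(), name));
                }
            }
            Entry::Vacant(e) => {
                e.insert(name);
            }
        }
    }
    collisions
}

/// Try salts 0, 1, 2, ... (up to `max_tries`) and return the first salt
/// for which all digests are distinct, or None if none works.
fn pick_salt(
    full_names: &[&str],
    max_tries: u64,
    salted_digest: impl Fn(&str, u64) -> u64,
) -> Option<u64> {
    (0..max_tries).find(|&salt| {
        find_collisions(full_names, |n| salted_digest(n, salt)).is_empty()
    })
}
```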

By comparing the exported symbols of each dynamic library, users can detect symbol conflicts between dylibs in advance. Although a symbol name conflict between dynamic libraries can lead to undefined behavior at run time, the risk is manageable in practice once users are aware of this constraint.
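Comparing exported symbols across dylibs amounts to finding names exported by more than one library. A sketch, assuming the per-dylib export lists have already been extracted (for example with nm -D --defined-only); `cross_dylib_conflicts` is a hypothetical helper. A name exported twice is not necessarily a hash collision (it may be the same rlib linked into both dylibs, as described above), so the result is a list of candidates to review.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Given (dylib name, exported symbol names) pairs, return every symbol that
/// is exported by more than one dylib, mapped to the set of dylibs exporting it.
fn cross_dylib_conflicts<'a>(
    exports: &[(&'a str, Vec<&'a str>)],
) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut owners: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for (dylib, symbols) in exports {
        for &sym in symbols {
            owners.entry(sym).or_default().insert(*dylib);
        }
    }
    // Keep only symbols with more than one exporting dylib.
    owners.retain(|_, libs| libs.len() > 1);
    owners
}
```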

In addition, this option is not compatible with the existing -C instrument-coverage option.

Final Design Scheme

A new value, hashed, is added to -C symbol-mangling-version to support shortened symbol names:

  1. For now, it is available only as an unstable option: -C symbol-mangling-version=hashed -Z unstable-options.
  2. For non-generic functions, the final symbol name has the format _RNxC{length}{crate name}{length}H{64-bit hash}. For generic functions, it has the format _RNxC{length}{instantiating-crate name}{length}H{64-bit hash}. Both comply with the existing v0 mangling specification (https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html#syntax-of-mangled-names).
  3. The 64-bit hash is base-62 encoded, and the trailing _ terminator is omitted because it does not help prevent hash collisions.
  4. A salt value can be passed with -C metadata=<salt> to eliminate rare hash collisions.
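The naming scheme in items 2 and 3 can be sketched as follows. This is an illustration, not rustc's code: `base62` performs a plain base-62 encoding with the v0 digit alphabet (0-9, a-z, A-Z) and no terminator, whereas ordinary v0 base-62 numbers encode n - 1 followed by _; which convention rustc applies to the hash is not stated in this proposal. The hash is taken as an input, and `hashed_symbol` is a hypothetical helper.

```rust
/// v0 base-62 digit alphabet: 0-9, then a-z, then A-Z.
const BASE62: &[u8; 62] = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/// Plain base-62 encoding of `n`, most significant digit first,
/// with no trailing `_` terminator.
fn base62(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        digits.push(BASE62[(n % 62) as usize]);
        n /= 62;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is ASCII")
}

/// Build `_RNxC{len}{crate}{len}H{hash}`. Generic functions use the
/// instantiating crate instead of the defining crate.
/// Assumes the crate name does not start with a digit or `_` (v0 would then
/// require an extra `_` separator after the length).
fn hashed_symbol(defining_crate: &str, instantiating_crate: &str, is_generic: bool, hash: u64) -> String {
    let krate = if is_generic { instantiating_crate } else { defining_crate };
    let ident = format!("H{}", base62(hash));
    format!("_RNxC{}{}{}{}", krate.len(), krate, ident.len(), ident)
}
```

With a 64-bit hash the base-62 part is at most 11 digits, so for the crate std every hashed symbol in this sketch is at most 23 bytes long, compared with the averages of 96 to 116 bytes measured above.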

Test Data

According to the test data, this option reduces the total size of the dylib by about 20%. For details, see the PR: rust-lang/rust#118636

Mentors or Reviewers

@m-ou-se is willing to do the review work. Many thanks for their help!

Process

The main points of the Major Change Process are as follows:

  • File an issue describing the proposal.
  • A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
    • Finding a "second" suffices for internal changes. If, however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
    • Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
  • Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

@h1467792822 h1467792822 added major-change A proposal to make a major change to rustc T-compiler Add this label so rfcbot knows to poll the compiler team labels Dec 19, 2023
@rustbot
Collaborator

rustbot commented Dec 19, 2023

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

Concerns or objections to the proposal should be discussed on Zulip and formally registered here by adding a comment with the following syntax:

@rustbot concern reason-for-concern 
<description of the concern> 

Concerns can be lifted with:

@rustbot resolve reason-for-concern 

See documentation at https://forge.rust-lang.org

cc @rust-lang/compiler @rust-lang/compiler-contributors

@rustbot rustbot added the to-announce Announce this issue on triage meeting label Dec 19, 2023
@m-ou-se
Member

m-ou-se commented Dec 19, 2023

m-ou-se is willing to do the review work

I'm happy to give advice where I can, but I'm not part of the compiler team.

@michaelwoerister
Member

@rustbot second

Let's definitely implement this as an unstable option so we can gather experience about performance and how to expose it to users.

@rustbot rustbot added the final-comment-period The FCP has started, most (if not all) team members are in agreement label Dec 19, 2023
@michaelwoerister
Member

I'd suggest renaming the MCP to something like "Provide option to shorten symbol names by replacing them with a digest"

@h1467792822 h1467792822 changed the title new option to reduce the size of dylib Provide option to shorten symbol names by replacing them with a digest Dec 19, 2023
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Dec 28, 2023
@apiraino
Contributor

@rustbot label -final-comment-period +major-change-accepted

@rustbot rustbot added major-change-accepted A major change proposal that was accepted to-announce Announce this issue on triage meeting and removed final-comment-period The FCP has started, most (if not all) team members are in agreement labels Feb 13, 2024
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Feb 15, 2024