Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo should skip rerun-if-changed paths to files in published crate sources #11083

Closed
sfackler opened this issue Sep 13, 2022 · 7 comments · Fixed by #11613
Closed

Cargo should skip rerun-if-changed paths to files in published crate sources #11083

sfackler opened this issue Sep 13, 2022 · 7 comments · Fixed by #11613
Labels
A-rebuild-detection Area: rebuild detection and fingerprinting C-bug Category: bug

Comments

@sfackler
Copy link
Member

Problem

When running CI builds in CircleCI, I've noticed that caching the Cargo registry and target directory works most, but not all of the time to avoid rebuilding dependencies. In particular, tikv-jemalloc-sys is rebuilt on every CI run.

From what I can tell from reading the cargo::core::compiler::fingerprint log output, it looks like the issue is tikv-jemalloc-sys's build script emits a rerun-if-changed directive for the jemalloc source directory published with the crate. It seems like Circle's caching logic does not preserve file modification times, and while that seems to normally be fine it forces rebuilds in this one particular case.

I believe the same should be true for basically any crate that does something similar (openssl-src, etc) but haven't checked.

Steps

  1. Depend on a published crate that reports a rerun-if-changed on files bundled with the crate.
  2. Cache the Cargo source and target directories in a manner that does not preserve mtimes.
  3. Set that most dependencies in the crate graph (cc, etc) are correctly detected as not needing a rebuild, but the rerun-if-changed crate does not.

I've put together a simple project that reproduces the issue: https://github.com/sfackler/target-cache-test

A Circle build with fingerprint trace logging: https://app.circleci.com/pipelines/github/sfackler/target-cache-test/2/workflows/1dee508c-bd46-4729-a6a6-2715fff14c5a/jobs/2

A copy of the logs if you don't want to log into circle:

[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/target-cache-test-ad795130651e176c/bin-target-cache-test
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] new local fingerprints deps "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132"
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132" is "/root/project/target/debug/build/libc-875143ba2a651140/build_script_build-875143ba2a651140" 1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] all paths up-to-date relative to "/root/project/target/debug/.fingerprint/libc-875143ba2a651140/dep-build-script-build-script-build" mtime=1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] filesystem up-to-date "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132"
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132" is "/root/project/target/debug/build/libc-06ac331ace17f261/output" 1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max dep mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132" is "/root/project/target/debug/build/libc-875143ba2a651140/build_script_build-875143ba2a651140" 1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] all paths up-to-date relative to "/root/project/target/debug/build/libc-06ac331ace17f261/output" mtime=1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] filesystem up-to-date "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132"
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132" is "/root/project/target/debug/deps/liblibc-dadc43429530f2e4.rlib" 1663091921.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max dep mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132" is "/root/project/target/debug/build/libc-06ac331ace17f261/output" 1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] all paths up-to-date relative to "/root/project/target/debug/.fingerprint/libc-dadc43429530f2e4/dep-lib-libc" mtime=1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] filesystem up-to-date "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.132"
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] new local fingerprints deps "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched"
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/cc-1.0.73" is "/root/project/target/debug/deps/libcc-a23a0e4ab5ac45ce.rlib" 1663091921.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] all paths up-to-date relative to "/root/project/target/debug/.fingerprint/cc-a23a0e4ab5ac45ce/dep-lib-cc" mtime=1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] filesystem up-to-date "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/cc-1.0.73"
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/fs_extra-1.2.0" is "/root/project/target/debug/deps/libfs_extra-d1409e0b5297afde.rlib" 1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] all paths up-to-date relative to "/root/project/target/debug/.fingerprint/fs_extra-d1409e0b5297afde/dep-lib-fs_extra" mtime=1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] filesystem up-to-date "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/fs_extra-1.2.0"
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched" is "/root/project/target/debug/build/tikv-jemalloc-sys-2b39192ec04e05b2/build_script_build-2b39192ec04e05b2" 1663091921.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max dep mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched" is "/root/project/target/debug/deps/libcc-a23a0e4ab5ac45ce.rlib" 1663091921.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max dep mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched" is "/root/project/target/debug/deps/libfs_extra-d1409e0b5297afde.rlib" 1663091920.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] all paths up-to-date relative to "/root/project/target/debug/.fingerprint/tikv-jemalloc-sys-2b39192ec04e05b2/dep-build-script-build-script-build" mtime=1663091921.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] filesystem up-to-date "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched"
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched" is "/root/project/target/debug/build/tikv-jemalloc-sys-8101985b937c9e9f/output" 1663091921.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max dep mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched" is "/root/project/target/debug/build/tikv-jemalloc-sys-2b39192ec04e05b2/build_script_build-2b39192ec04e05b2" 1663091921.000000000s
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint] stale: changed "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched/jemalloc"
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint]           (vs) "/root/project/target/debug/build/tikv-jemalloc-sys-8101985b937c9e9f/output"
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint]                FileTime { seconds: 1663091921, nanos: 0 } != FileTime { seconds: 1663092020, nanos: 432004673 }
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched" is "/root/project/target/debug/deps/libtikv_jemalloc_sys-812ba7dac85e5ef8.rmeta" 1663091951.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max dep mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched" is "/root/project/target/debug/deps/liblibc-dadc43429530f2e4.rmeta" 1663091921.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemallocator-0.5.0" is "/root/project/target/debug/deps/libtikv_jemallocator-c2826f4f246f2c52.rmeta" 1663091951.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max dep mtime for "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemallocator-0.5.0" is "/root/project/target/debug/deps/liblibc-dadc43429530f2e4.rmeta" 1663091921.000000000s
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] max output mtime for "/root/project" is "/root/project/target/debug/deps/target_cache_test-ad795130651e176c" 1663091951.000000000s
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint] fingerprint error for target-cache-test v0.1.0 (/root/project)/Build/TargetInner { name: "target-cache-test", doc: true, ..: with_path("/root/project/src/main.rs", Edition2021) }
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/tikv-jemallocator-c2826f4f246f2c52/lib-tikv-jemallocator
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint] fingerprint error for tikv-jemallocator v0.5.0/Build/TargetInner { tested: false, benched: false, ..: lib_target("tikv-jemallocator", ["lib"], "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemallocator-0.5.0/src/lib.rs", Edition2018) }
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/libc-dadc43429530f2e4/lib-libc
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/libc-06ac331ace17f261/run-build-script-build-script-build
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/libc-875143ba2a651140/build-script-build-script-build
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/tikv-jemalloc-sys-812ba7dac85e5ef8/lib-tikv-jemalloc-sys
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint] fingerprint error for tikv-jemalloc-sys v0.5.1+5.3.0-patched/Build/TargetInner { ..: lib_target("tikv-jemalloc-sys", ["lib"], "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched/src/lib.rs", Edition2018) }
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/tikv-jemalloc-sys-8101985b937c9e9f/run-build-script-build-script-build
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint] fingerprint error for tikv-jemalloc-sys v0.5.1+5.3.0-patched/RunCustomBuild/TargetInner { ..: custom_build_target("build-script-build", "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched/build.rs", Edition2018) }
[2022-09-13T18:00:20Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/tikv-jemalloc-sys-2b39192ec04e05b2/build-script-build-script-build
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/cc-a23a0e4ab5ac45ce/lib-cc
[2022-09-13T18:00:20Z DEBUG cargo::core::compiler::fingerprint] fingerprint at: /root/project/target/debug/.fingerprint/fs_extra-d1409e0b5297afde/lib-fs_extra
   Compiling tikv-jemalloc-sys v0.5.1+5.3.0-patched
[2022-09-13T18:00:55Z DEBUG cargo::core::compiler::fingerprint] new local fingerprints deps "/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tikv-jemalloc-sys-0.5.1+5.3.0-patched"
[2022-09-13T18:00:55Z DEBUG cargo::core::compiler::fingerprint] write fingerprint (b16608c139b4fa80) : /root/project/target/debug/.fingerprint/tikv-jemalloc-sys-8101985b937c9e9f/run-build-script-build-script-build
   Compiling tikv-jemallocator v0.5.0
[2022-09-13T18:00:55Z DEBUG cargo::core::compiler::fingerprint] write fingerprint (9dc3e3c9592e9add) : /root/project/target/debug/.fingerprint/tikv-jemalloc-sys-812ba7dac85e5ef8/lib-tikv-jemalloc-sys
[2022-09-13T18:00:55Z DEBUG cargo::core::compiler::fingerprint] write fingerprint (65cbcef45534a05) : /root/project/target/debug/.fingerprint/tikv-jemallocator-c2826f4f246f2c52/lib-tikv-jemallocator
   Compiling target-cache-test v0.1.0 (/root/project)
[2022-09-13T18:00:56Z DEBUG cargo::core::compiler::fingerprint] write fingerprint (52394158d0809914) : /root/project/target/debug/.fingerprint/target-cache-test-ad795130651e176c/bin-target-cache-test
    Finished dev [unoptimized + debuginfo] target(s) in 35.90s

Possible Solution(s)

It seems like Cargo already assumes that the source of published crates will never change given that we don't see the same rebuilding behavior for most crates. It could potentially extend that to ignoring rerun-if-changed paths that point into the published crate source directory.

Notes

No response

Version

No response

@sfackler sfackler added the C-bug Category: bug label Sep 13, 2022
@sfackler
Copy link
Member Author

Ah - I've looked into things a bit more - Circle's caching does preserve mtimes, but I was only caching ~/.cargo/registry/cache but not ~/.cargo/registry/src. It seems like unifying the special casing of the src directory would still be nice here, but obviously not required!

@ehuss ehuss added the A-rebuild-detection Area: rebuild detection and fingerprinting label Sep 26, 2022
@arriven
Copy link
Contributor

arriven commented Dec 31, 2022

I've been hitting similar issue and decided to investigate a bit. I was only able to reproduce it with cranelift-isle (due to this), grpcio-sys (due to this), and librocksdb-sys (due to this) but at the same time other similar packages don't seem to trigger this behavior (i.e. libsqlite3-sys here or zstd-sys here) so it's something more specific than just published crate sources

The easiest way to test this locally is to have one of those packages in dependencies (note: for libsqlite3-sys you need to enable bundled feature and in case of zstd-sys you'd need to build for wasm32-unknown-unknown, otherwise it doesn't output rerun-if-changed for files), run cargo build or similar, remove $CARGO_HOME/registry/src, and then re-run cargo build

I don't have a clue on how cargo works internally yet so my initial assumptions are:
a) rerun-if-changed path containing file extensions unknown to cargo (cranelift-isle uses a custom extension while rocksdb, grpcio-sys, and tikv-jemalloc-sys contain large submodules with a lot of different files some of which might hit this)
b) rerun-if-changed path points to a directory instead of specific files

I also had an assumption about this behavior being triggered by pointing rerun-if-changed to a submodule dir (based on tikv-jemalloc-sys, grpcio-sys, and librocksdb-sys) but it doesn't fit cranelift-isle case since it just has a plain subdir in the repo

Additional note is that similar behavior happens if you have git-based dependencies and don't cache $CARGO_HOME/git/checkout folder but it triggers on any rerun-if-changed file (so libsqlite3-sys and zstd-sys get affected too) and also forces cargo to re-download git submodules of the package if present (I should probably open a separate issue about submodules)

Until this is resolved it would probably be wise to warn people about this in the cargo book

@weihanglo
Copy link
Member

Your investigation is fantastic! Thank you for the thorough review :)

b) rerun-if-changed path points to a directory instead of specific files

I believe that's because when Cargo walks the filesystem to collect mtimes, it also takes mtimes of directories into account. However, when cargo packages and uploads .crate file, it doesn't handle a directory as a tar entry. As a result, directories in an unpacked source in ~/.cargo/registry/src of a crate always have a mtime with the same timestamp of unpacking.

pub fn mtime_recursive(path: &Path) -> Result<FileTime> {

And yep, to solve this, skipping rerun-if-changed is a more correct way IMO. Alternatively, I guess one can also make Cargo ignore mtime from directories. That may be more feasible but not sure if anyone depends on this behavior (which is not recommended as their mtimes are unstable).

@arriven
Copy link
Contributor

arriven commented Jan 1, 2023

@weihanglo thanks for the feedback, I already thought that b) seems more plausible but having detailed breakthrough of how cargo works definitely helps

I have a feeling that another way to address this would be to implement #6529 (except that we would need to hash all the contents, not just rustc output Edit: current approach proposed there doesn't cover rerun-if-changed for now, that would require allowing package maintainers to specify which digest to use for the check which would lead to a slow rollout due to backward compatibility)

However, when cargo packages and uploads .crate file, it doesn't handle a directory as a tar entry

I also wonder what would be the impact of changing this part instead? it seems like the least invasive change since no tool should probably depend on the content of .crate file (I may be missing something, though)

Anyhow, I'd be interested in trying to fix this myself once we decide on the approach

@arriven
Copy link
Contributor

arriven commented Jan 14, 2023

A small update on this: I've started hacking around these approaches, changing mtime_recursive to ignore directories leads to an issue where cargo no longer detects if a file gets removed from the directory (the corresponding test also fails on cargo not detecting the creation of empty directory but I doubt it would've brought any issues on its own). Unless ignoring file removals and empty directory creations is acceptable I think it's indeed better to make rerun-if-changed ignore published crate sources since changing the packaging to also include directories to the tarball would only affect newer versions of those packages and would lead to a slower adoption of the fix

@arriven
Copy link
Contributor

arriven commented Jan 14, 2023

At the same time, I can definitely see a scenario where someone wants to debug a registry crate by modifying the source in $CARGO_HOME (i.e. by add some debug prints to the source code) and then gets extremely confused when the crate doesn't get rebuilt. Not sure if this use case should be considered by it's worth mentioning

@weihanglo
Copy link
Member

Have yet got time to figure out where the best place is for this change, but I created test for hacking on that.

Click to expand the test
#[cargo_test]
fn rerun_if_changed_directory() {
    // build script of a dependency contains a `rerun-if-changed` pointing to a directory
    //
    // TODO: this currently asserts the buggy behavior described in rust-lang/cargo#11083
    Package::new("mylib-sys", "1.0.0")
        .file("mylib/balrog.c", "")
        .file("src/lib.rs", "")
        .file(
            "build.rs",
            r#"
                fn main() {
                    // Changing to mylib/balrog.c will not trigger a rebuild
                    println!("cargo:rerun-if-changed=mylib");
                }
            "#,
        )
        .publish();

    let p = project()
        .file(
            "Cargo.toml",
            r#"
                [package]
                name = "foo"
                version = "0.0.1"

                [dependencies]
                mylib-sys = "1.0.0"
            "#,
        )
        .file("src/main.rs", "fn main() {}")
        .build();

    p.cargo("check").run();

    // Delete regitry src to make directories being recreated with the latest timestamp.
    cargo_home().join("registry/src").rm_rf();

    p.cargo("check --verbose")
        .with_stderr(
            "\
[FRESH] mylib-sys v1.0.0
[FRESH] foo v0.0.1 ([CWD])
[FINISHED] dev [unoptimized + debuginfo] target(s) in [..]
",
        )
        .run();
}

At the same time, I can definitely see a scenario where someone wants to debug a registry crate by modifying the source in $CARGO_HOME (i.e. by add some debug prints to the source code) and then gets extremely confused when the crate doesn't get rebuilt. Not sure if this use case should be considered by it's worth mentioning

Reiterate what was discussed on Zulip. #9455 was an attempt to making registry/src readonly. Thus I don't think it is an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rebuild-detection Area: rebuild detection and fingerprinting C-bug Category: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants