Disable redirect_dir for avoiding incorrect diff #2491

Merged
merged 1 commit into from
Jan 5, 2022

ktock
Collaborator

@ktock ktock commented Nov 29, 2021

#2490

The overlayfs differ cannot calculate the diff correctly if redirect_dir is enabled.
This commit originally fixed this by falling back to the walking differ when redirect_dir is enabled, as done in moby (moby/moby#34342). It now fixes the issue by disabling redirect_dir when the kernel supports it.

This will cause a performance drawback on such kernels, so we should eventually fix the overlayfs differ to handle redirect_dir correctly on them.

cc @sipsma

Member

@tonistiigi tonistiigi left a comment

I'm not familiar with this, but on my system (Docker Desktop) I have /sys/module/overlay/parameters/redirect_dir=N, and when I mv a directory I get no error and a regular whiteout without xattrs for the source dir. In what case should an error appear?

If no error appears, then I'd rather use a mount option that always disables this and overrides the global default until the differ can handle it.

return diffSupported
}

func isDiffSupported() error {
Member

Can we use /sys/module/overlay/parameters/redirect_dir?

Member

And check for a user namespace: torvalds/linux@2d2f2d7

Disable redirect_dir and metacopy options, because these would allow
privilege escalation through direct manipulation of the
"user.overlay.redirect" or "user.overlay.metacopy" xattrs.
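The patch later calls `userns.RunningInUserNS()` for this check. As an illustration only, here is a standalone sketch of a common heuristic for detecting a user namespace (an assumption, not a copy of that function): read `/proc/self/uid_map` and treat anything other than the full identity mapping `0 0 4294967295` as a nested user namespace. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inUserNSFromUIDMap applies the heuristic to the contents of
// /proc/self/uid_map: the initial user namespace maps the full 32-bit ID
// range ("0 0 4294967295"); any other first mapping is treated as a
// nested user namespace.
func inUserNSFromUIDMap(contents string) bool {
	lines := strings.Split(strings.TrimSpace(contents), "\n")
	fields := strings.Fields(lines[0])
	if len(fields) != 3 {
		return false // empty or unparsable: assume the initial namespace
	}
	return !(fields[0] == "0" && fields[1] == "0" && fields[2] == "4294967295")
}

// runningInUserNS reads the uid_map of the current process.
func runningInUserNS() bool {
	b, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		return false // e.g. kernel built without user namespace support
	}
	return inUserNSFromUIDMap(string(b))
}

func main() {
	fmt.Println("in user namespace:", runningInUserNS())
}
```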

Collaborator Author

So we should always append redirect_dir=off to the overlayfs mount option (if kernel supports redirect_dir)?

Member

@AkihiroSuda AkihiroSuda Nov 30, 2021

So we should always append redirect_dir=off to the overlayfs mount option (if kernel supports redirect_dir)?

s/always/when in UserNS/

Collaborator

The behavior with userxattr is really weird. The kernel forces redirect_dir off when userxattr is provided, but it also throws an error if you specify any options for redirect_dir, even setting it to off: https://github.com/torvalds/linux/blob/cb690f5238d71f543f4ce874aa59237cf53a877c/fs/overlayfs/super.c#L726-L745

So I think we should skip appending any redirect_dir option when in a user ns to get the behavior we want.

Collaborator Author

Using nofollow (redirects are not created and not followed) didn't return errors.

Based on the discussion with @tonistiigi (#2491 (review)), we should always append redirect_dir=nofollow when /sys/module/overlay/parameters/redirect_dir exists?

If no error appears, then I'd rather use a mount option that always disables this and overrides the global default until the differ can handle it.

Collaborator

Using nofollow (redirects are not created and not followed) didn't return errors.

Oh cool, good catch. The only downside to nofollow is that pre-existing snapshots created while the kernel was defaulting to redirect_dir=on will no longer appear correctly (at least I don't think they will; I haven't tested it). This is only an issue if a user is already running on a kernel with that default and then upgrades to a commit from the master branch, so it's probably not a huge deal until the next release. But if it's easy to skip appending any redirect_dir= option at all when userxattr is present, that might be slightly safer, IMO.

Collaborator Author

But if it's easy to skip appending any redirect_dir= option at all when userxattr is present that might be slightly safer imo.

@AkihiroSuda SGTY?

Collaborator Author

Thank you for the review. Fixed the PR based on the comments:

  • Disable redirect_dir using the mount option redirect_dir=off when /sys/module/overlay/parameters/redirect_dir exists.
  • If BuildKit is in a user namespace, we disable redirect_dir using redirect_dir=nofollow (using off results in an error).
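The two rules in that list can be summarized as a small decision function. This is a hypothetical sketch (`chooseRedirectDirOption` is not BuildKit's API); it assumes the caller has already checked whether the sysfs parameter exists and whether the daemon runs in a user namespace.

```go
package main

import "fmt"

// chooseRedirectDirOption returns the value for the overlayfs
// "redirect_dir=" mount option, or "" when no option should be added.
//
//	paramExists: /sys/module/overlay/parameters/redirect_dir exists
//	inUserNS:    the daemon is running inside a user namespace
func chooseRedirectDirOption(paramExists, inUserNS bool) string {
	if !paramExists {
		return "" // kernel has no redirect_dir knob: leave options alone
	}
	if inUserNS {
		return "nofollow" // per this thread, "off" results in an error here
	}
	return "off"
}

func main() {
	for _, exists := range []bool{false, true} {
		for _, userns := range []bool{false, true} {
			fmt.Printf("paramExists=%v inUserNS=%v -> %q\n",
				exists, userns, chooseRedirectDirOption(exists, userns))
		}
	}
}
```

Returning "" rather than an explicit value on old kernels keeps the mount options unchanged where the option may not be understood.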

Collaborator

sipsma commented Nov 30, 2021

I'm not familiar with this, but on my system (Docker Desktop) I have /sys/module/overlay/parameters/redirect_dir=N, and when I mv a directory I get no error and a regular whiteout without xattrs for the source dir. In what case should an error appear?

If no error appears, then I'd rather use a mount option that always disables this and overrides the global default until the differ can handle it.

'mv' is most likely getting EXDEV and then falling back to doing a recursive copy, same as it would if you tried to mv between separate filesystems. This was causing me a ton of confusion for a while too.

Member

tonistiigi commented Nov 30, 2021

'mv' is most likely getting EXDEV and then falling back to doing a recursive copy, same as it would if you tried to mv between separate filesystems. This was causing me a ton of confusion for a while too.

Ah, indeed. But given that most systems have redirect_dir=N and I haven't seen many reports of people hitting the EXDEV error, would it still make sense to just always disable it instead? Especially if we plan to fix it later in the differ. I'd rather give most people with the default settings access to the faster differ than try to optimize rename(2) speed for a small number of people with custom settings.

edit: also, the optimized rename doesn't matter for the layer blobs anyway. When we switch to walking differ we still get the same file duplication as we would get with the recursive directory mv.

cache/blobs.go Outdated
@@ -142,6 +142,11 @@ func computeBlobChain(ctx context.Context, sr *immutableRef, createIfNeeded bool
// TODO: add support for fuse-overlayfs
enableOverlay = false
}
if enableOverlay {
if isOverlayDiffSupported() {
Collaborator

Is this supposed to be !isOverlayDiffSupported()?

Member

@AkihiroSuda AkihiroSuda left a comment

Collaborator

sipsma commented Nov 30, 2021

edit: also, the optimized rename doesn't matter for the layer blobs anyway. When we switch to walking differ we still get the same file duplication as we would get with the recursive directory mv.

This is true for mv /foo /foo2 cases but it does make a difference for the layer blob content in cases like mv /foo /foo2 && mv /foo2 /foo. In those cases, you will surprisingly get a diff when redirect_dir is off but not if it's on or if you are using the native snapshotter and busybox's mv. This is because busybox will attempt to preserve file attrs when falling back to a copy in mv after getting EXDEV, but for some reason it will always truncate the subsecond timestamp fields to 0. If the original file timestamp wasn't truncated, the differ sees that they are not equal and considers them changed, whereas a successful rename syscall will preserve the full timestamp and thus not indicate a diff.

However, this is not exactly a common case. Also, we can fix it by updating the differ to compare file content when it sees that either timestamp is truncated (right now it only does that when both are) or when it sees that the upper timestamp is older than the lower one. That would only fix this particular case, not necessarily all possible ones, but that's probably good enough.

@tonistiigi
Member

but for some reason it will always truncate the subsecond timestamp fields to 0.

That looks like a busybox bug, right? Is coreutils correct?

Collaborator

sipsma commented Nov 30, 2021

that looks like busybox bug, right? coreutils is correct?

Yeah based on quick test it looks like coreutils does the right thing (at least in version 8.32).

@ktock ktock marked this pull request as ready for review December 6, 2021 07:00
}

type fromContainerd struct {
name string
snapshots.Snapshotter
idmap *idtools.IdentityMapping
redirectDirOption string
Member

I don't quite see the need for this property. Why not just add a public method for this in the overlay package, with sync.Once ensuring the proc read happens only once? Then the places (two, I think) that need to overwrite the mounts can call into these functions.

Collaborator Author

Thank you for the review. Fixed to avoid adding the property.

@ktock ktock changed the title Fallback to walking differ if redirect_dir is enabled Disable redirect_dir for avoiding incorrect diff Dec 7, 2021
return redirectDirOption
}

func mountsRedirectDirOption(mounts []mount.Mount, redirectDirOption string) (ret []mount.Mount) {
Member

@tonistiigi tonistiigi Dec 7, 2021

setRedirectDir might be a better name.

Or maybe even setMountOption(mounts, k, v)

Collaborator

@sipsma sipsma left a comment

LGTM

@tonistiigi
Member

ping @AkihiroSuda

@tonistiigi
Member

@AkihiroSuda ping

@ktock Needs rebase

return
}
if userns.RunningInUserNS() {
redirectDirOption = "nofollow" // follow is not allowed in user ns
Member

Kernels before 4.15 seem to lack this: torvalds/linux@438c84c

While the upstream kernel didn't support overlayfs with userns at that time, Ubuntu XX.XX did, so maybe we should just set redirectDirOption=""

@ktock ktock marked this pull request as draft December 22, 2021 14:01
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Collaborator Author

ktock commented Dec 24, 2021

Fixed the patch according to the review comments.

PTAL 🙏

@ktock ktock marked this pull request as ready for review December 24, 2021 07:52
@tonistiigi
Member

@ktock We have a failure in the buildx repo (built with the master BuildKit image) that looks related.

#22 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#22 ERROR: process "/bin/sh -c set -x; xx-go build -ldflags \"$(cat /tmp/.ldflags) ${LDFLAGS}\" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx" did not complete successfully: failed to mount /tmp/buildkit-mount2802602399: [{Type:overlay Source:overlay Options:[index=off workdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/24/work upperdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/24/fs lowerdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/15/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/14/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/13/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/12/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/11/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/10/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/9/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/6/fs redirect_dir=off]}]: no such file or directory

#18 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#18 ...

#34 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#34 0.279 container process is already dead
#34 CANCELED

#58 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#58 0.355 container process is already dead
#58 CANCELED

#50 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#50 0.411 container process is already dead
#50 CANCELED

#38 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#38 0.415 container process is already dead
#38 CANCELED

#46 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#46 0.414 container process is already dead
#46 CANCELED

#42 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#42 0.415 container process is already dead
#42 CANCELED

#18 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#18 0.413 container process is already dead
#18 CANCELED

#54 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#54 0.416 container process is already dead
#54 CANCELED

#26 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#26 0.417 container process is already dead
#26 CANCELED

#30 [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx
#30 0.438 container process is already dead
#30 CANCELED
------
 > [linux/amd64 buildx-build 1/1] RUN --mount=type=bind,target=.   --mount=type=cache,target=/root/.cache   --mount=type=cache,target=/go/pkg/mod   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) -w -s" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx:
------
Dockerfile:29
--------------------
  28 |     ARG TARGETPLATFORM
  29 | >>> RUN --mount=type=bind,target=. \
  30 | >>>   --mount=type=cache,target=/root/.cache \
  31 | >>>   --mount=type=cache,target=/go/pkg/mod \
  32 | >>>   --mount=type=bind,source=/tmp/.ldflags,target=/tmp/.ldflags,from=buildx-version \
  33 | >>>   set -x; xx-go build -ldflags "$(cat /tmp/.ldflags) ${LDFLAGS}" -o /usr/bin/buildx ./cmd/buildx && \
  34 | >>>   xx-verify --static /usr/bin/buildx
  35 |     
--------------------
error: failed to solve: process "/bin/sh -c set -x; xx-go build -ldflags \"$(cat /tmp/.ldflags) ${LDFLAGS}\" -o /usr/bin/buildx ./cmd/buildx &&   xx-verify --static /usr/bin/buildx" did not complete successfully: failed to mount /tmp/buildkit-mount2802602399: [{Type:overlay Source:overlay Options:[index=off workdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/24/work upperdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/24/fs lowerdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/15/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/14/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/13/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/12/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/11/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/10/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/9/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/6/fs redirect_dir=off]}]: no such file or directory
make: *** [Makefile:28: release] Error 1

https://github.com/docker/buildx/runs/4777886890?check_suite_focus=true#step:7:135

PTAL

Collaborator Author

ktock commented Jan 17, 2022

#2562 will fix #2491 (comment)

@crazy-max crazy-max added this to the v0.10.0 milestone Feb 4, 2022