overlay differ: Do file comparison in some cases. #2480

sipsma · 2021-11-23T22:37:56Z

This change results in the overlay differ comparing files to determine
if they are actually part of the diff. This is needed to resolve
differences between the blobs created by the overlay differ and the
double-walking differ.

Before this change, the overlay differ always just assumed that if a
file was in the upperdir it must be part of the diff and included it as
an add or a modify change. However, there are situations in which files
can appear in the upperdir without having been modified or even opened.
For example, if "foo" is a file or dir present in the lowerdirs of an
overlay mount and you run "mv foo footmp; mv footmp foo", then the
upperdir will contain foo (in addition to any files found under foo if
it's a dir). In this situation, the double-walking differ would not
include foo as part of the diff, but the overlay differ would.

This meant that the overlay differ would potentially include extra files
in each blob for such diffs relative to the double-walking differ. As of
now, while this does increase image size, it doesn't result in any
inconsistencies in terms of the contents of images because it just
results in files/dirs getting duplicated on top of their equivalents.

However, for the upcoming DiffOp support, this inconsistency could
actually result in the same operation producing mounts with different
contents depending on which differ is used. This change is therefore
necessary in order to enforce DiffOp consistency (on top of the possible
improvements to exported image size).

The main concern here is that this could undo the performance benefits
that the overlay differ was intended to fix. However, in practice the
situations where this has worse performance are quite obscure and the
benefits should still be present.

First, consider the case where foo is a directory and the user does the
equivalent of "mv foo footmp; mv footmp foo". Even before this change,
the overlay differ would see that foo is marked as opaque and thus fall
back to using the double-walking differ. So there's no performance
regression in this case as the double-walking differ does the same
file comparisons as were added in this commit.

For the case where the user shuffles a file back and forth, there will
potentially be a slow file content based comparison if the underlying
file has a truncated nanosecond timestamp (i.e. it was unpacked from a
tar file). However, the situations in which you shuffle an individual
file without changing it (or open it for writing but then write nothing)
and it's large enough in size for content comparisons to be slow are
obscure. Additionally, while the content comparison may be slow, there
will be time saved during export because the file won't be included
unnecessarily in the exported blob, so it's a tradeoff rather than a
pure loss.

In situations where the user actually did change a file and it shows up
in the upperdir, it should be extremely rare that the content comparison
code path is followed. It would require that the user changed no other
metadata of the file, including size, and both mod timestamps were the
same (which could only really happen if their underlying filesystem
lacked support for nanosecond precision and they modified the file
within 1 second of its modification in the lowerdir or they manually
changed the modtime with chtimes).

Signed-off-by: Erik Sipsma erik@sipsma.dev

@tonistiigi @ktock This is mostly just preparation for DiffOp, but may be desirable anyways. I decided to split it out from the DiffOp PR in case there's any discussion needed on the possible performance impact, but like I explain in the commit message I don't think that should be a major concern. Let me know your thoughts.

tonistiigi · 2021-11-23T23:43:20Z

util/overlay/overlay_linux.go

+		if err2 != nil && err2 != io.EOF {
+			return false, err2
+		}
+		if n1 != n2 || !bytes.Equal(b1[:n1], b2[:n2]) {


how is the n1 != n2 comparison correct in here?

Well n1 != n2 if one hit EOF before the other (in which case they are different sizes) or if a read got cut short. AFAIK the go runtime is supposed to transparently handle reads that were interrupted by signals and would otherwise return a non-nil error, but I don't see that 100% guaranteed in writing anywhere. I guess in the very obscure situation where something went wrong and a read got cut short before filling the buffer or hitting EOF but err is nonetheless nil, it's probably fine to just say the files aren't equal and leave here.

That being said, it's redundant in that bytes.Equal inherently checks this anyways, so we can remove it if you prefer.

Also, while thinking about this I realized that the only thing stopping our code from trying to read non-regular files is the check in sameDirent for f1.Size() == 0, which should always return true for fifos, sockets, chardevs, etc. That is a very fragile check though and more just implied by the linux manpages rather than fully guaranteed, so I'm also updating this function to verify f1 and f2 are regular files before trying to read them. Trying to compare contents of anything else could lead to pretty nasty bugs, so it's worth guarding against it from future changes to this code.

There's no guarantee that Read returns a specific number of bytes. It just returns whatever single syscall returned.

Yes, and the read syscall is made with count parameter set to the len of the provided buffer, which is the same for both files, so the situations where n1 != n2 are either:

The files aren't the same size (in which case they aren't equal)

They aren't regular files (which we guard against)

The syscall was interrupted (which should be handled by the go runtime or otherwise result in a non-nil err being returned)

The underlying filesystem is behaving extremely strangely and returning less than count bytes for a regular file even though EOF hasn't been reached and nothing interrupted it.

We can probably turn 4 into an error condition rather than just returning false, nil, but I'd prefer to do that along with updating continuity to have the same behavior so the differs are consistent.

behaving extremely strangely

I don't think this case is extremely strange. It's part of the contract. Network-based filesystems definitely do it. I have not looked at the internals of every possible linux filesystem and even when some of them guarantee it atm they can change it any time. There is no guarantee for it even in the vfs-to-syscall layer iiuc.

Can't we use io.ReadFull here?

Yeah, I always thought the rule of thumb was that regular files only return short reads on EOF, signal interrupt or conditions where errno is set, but you're right that it's not technically in violation of the spec to do otherwise and network filesystems or FUSE filesystems are much more likely to do things outside the norm, so we should handle this under the assumption that eventually someone will do something very weird (like you've said previously).

Can't we use io.ReadFull here?

Yeah good call, that handles the complication for us. Updated with that. I will send out a PR to continuity to have the double-walking differ do the same thing since that's where this code was originally from.

By the way, I left in n1 != n2 in the update because it should be correct now that ReadFull is used. It's technically also asserted by byte.Equal but I'm not sure how efficiently so I figured it didn't hurt to leave it here.

AkihiroSuda · 2021-11-24T02:14:37Z

Does this have performance drawback?

sipsma · 2021-11-24T02:27:20Z

Does this have performance drawback?

I go into detail about this in the commit message, but the TL;DR is that the only possible performance drawback is in very obscure cases, and even in those cases it's more of a tradeoff rather than a performance hit. The use cases described in the issues linked to in the PR that added the overlay differ should not be impacted AFAICT (#1704 and #1192).

An example of an obscure case where this changes the performance is say you have a large file foo which was unpacked from a tarfile that truncated nanosecond timestamps. Then you run mv foo footmp; mv footmp foo, which causes foo to show up in the upperdir even though it wasn't modified. Before this commit, foo would be considered part of the diff and thus included in the layer's blob even though it didn't change. Now, foo will be checked to see if it's actually different.

This causes the overlay differ to create the same blobs as the double-walking differ, which is my main priority here. But I think it's also good in that it prevents that layers from including unneeded files, which saves time+bandwidth pushing and pulling the layers. And in the vast majority of the cases, the overlay differ retains its performance benefits over the double-walking differ anyways.

ktock · 2021-11-24T13:38:28Z

It sounds reasonable to me that overlayfs-differ should avoid the additional redundancy that isn't created by walking-differ.
However, in terms of the following:

This is needed to resolve differences between the blobs created by the overlay differ and the double-walking differ.

I am a bit worried about whether we can guarantee that every differ always creates the exact same blob from the same set of snapshots. e.g. What happens if the tar library is fixed to emit bits different from the older version?

sipsma · 2021-11-24T18:07:24Z

I am a bit worried about whether we can guarantee that every differ always creates the exact same blob from the same set of snapshots. e.g. What happens if the tar library is fixed to emit bits different from the older version?

Ah, I should have said "differences between the diffs in the blobs" rather than "differences between the blobs". It's totally fine for the bits of the blobs to be different, we just want to ensure the contents resulting from unpacking the blobs are the same between the two differs. Different versions of tar, different compressions, etc. are fine.

tonistiigi · 2021-11-24T18:48:25Z

util/overlay/overlay_linux.go

+		return false, errors.Errorf("%s is not a regular file", p2)
+	}
+
+	b1 := make([]byte, compareChunkSize)


Would be good if this used sync.Pool as well.

This change results in the overlay differ comparing files to determine if they are actually part of the diff. This is needed to resolve differences between the blobs created by the overlay differ and the double-walking differ. Before this change, the overlay differ always just assumed that if a file was in the upperdir it must be part of the diff and included it as an add or a modify change. However, there are situations in which files can appear in the upperdir without having been modified or even opened. For example, if "foo" is a file or dir present in the lowerdirs of an overlay mount and you run "mv foo footmp; mv footmp foo", then the upperdir will contain foo (in addition to any files found under foo if it's a dir). In this situation, the double-walking differ would not include foo as part of the diff, but the overlay differ would. This meant that the overlay differ would potentially include extra files in each blob for such diffs relative to the double-walking differ. As of now, while this does increase image size, it doesn't result in any inconsistencies in terms of the contents of images because it just results in files/dirs getting duplicated on top of their equivalents. However, for the upcoming DiffOp support, this inconsistency could actually result in the same operation producing mounts with different contents depending on which differ is used. This change is therefore necessary in order to enforce DiffOp consistency (on top of the possible improvements to exported image size). The main concern here is that this could undo the performance benefits that the overlay differ was intended to fix. However, in practice the situations where this has worse performance are quite obscure and the benefits should still be present. First, consider the case where foo is a directory and the user does the equivalent of "mv foo footmp; mv footmp foo". Even before this change, the overlay differ would see that foo is marked as opaque and thus fall back to using the double-walking differ. So there's no performance regression in this case as the double-walking differ does the same file comparisons as were added in this commit. For the case where the user shuffles a file back and forth, there will potentially be a slow file content based comparison if the underlying file has a truncated nanosecond timestamp (i.e. it was unpacked from a tar file). However, the situations in which you shuffle an individual file without changing it (or open it for writing but then write nothing) that is large enough in size for content comparisons to be slow are obscure. Additionally, while the content comparison may be slow, there will be time saved during export because the file won't be included unnecessarily in the exported blob, so it's a tradeoff rather than a pure loss. In situations where the user actually did change a file and it shows up in the upperdir, it should be extremely rare that the content comparison code path is followed. It would require that the user changed no other metadata of the file, including size, and both mod timestamps were the same (which could only really happen if their underlying filesystem lacked support for nanosecond precision and they modified the file within 1 second of its modification in the lowerdir or they manually changed the modtime with chtimes). Signed-off-by: Erik Sipsma <erik@sipsma.dev>

sipsma requested review from ktock and tonistiigi November 23, 2021 22:39

tonistiigi reviewed Nov 23, 2021

View reviewed changes

sipsma force-pushed the overlay-diff-file-fix branch from d8ba7b0 to 3484ad5 Compare November 24, 2021 02:14

sipsma force-pushed the overlay-diff-file-fix branch from 3484ad5 to a4bbbf2 Compare November 24, 2021 18:00

tonistiigi reviewed Nov 24, 2021

View reviewed changes

sipsma force-pushed the overlay-diff-file-fix branch from a4bbbf2 to 1829291 Compare November 24, 2021 19:40

tonistiigi approved these changes Nov 25, 2021

View reviewed changes

ktock approved these changes Nov 25, 2021

View reviewed changes

sipsma merged commit 159bb1e into moby:master Nov 25, 2021

crazy-max added this to the v0.10.0 milestone Feb 4, 2022

vladaionescu mentioned this pull request Mar 1, 2022

SAVE ARTIFACT not updating local file with AS LOCAL after RUN --no-cache earthly/earthly#1695

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

overlay differ: Do file comparison in some cases. #2480

overlay differ: Do file comparison in some cases. #2480

sipsma commented Nov 23, 2021

tonistiigi Nov 23, 2021

sipsma Nov 24, 2021

tonistiigi Nov 24, 2021

sipsma Nov 24, 2021

tonistiigi Nov 24, 2021

ktock Nov 24, 2021

sipsma Nov 24, 2021 •

edited

Loading

sipsma Nov 24, 2021

AkihiroSuda commented Nov 24, 2021

sipsma commented Nov 24, 2021 •

edited

Loading

ktock commented Nov 24, 2021

sipsma commented Nov 24, 2021

tonistiigi Nov 24, 2021

sipsma Nov 24, 2021

overlay differ: Do file comparison in some cases. #2480

overlay differ: Do file comparison in some cases. #2480

Conversation

sipsma commented Nov 23, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sipsma Nov 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AkihiroSuda commented Nov 24, 2021

sipsma commented Nov 24, 2021 • edited Loading

ktock commented Nov 24, 2021

sipsma commented Nov 24, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sipsma Nov 24, 2021 •

edited

Loading

sipsma commented Nov 24, 2021 •

edited

Loading