Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overlay differ: Do file comparison in some cases. #2480

Merged
merged 1 commit into from
Nov 25, 2021

Conversation

sipsma
Copy link
Collaborator

@sipsma sipsma commented Nov 23, 2021

This change results in the overlay differ comparing files to determine
if they are actually part of the diff. This is needed to resolve
differences between the blobs created by the overlay differ and the
double-walking differ.

Before this change, the overlay differ always just assumed that if a
file was in the upperdir it must be part of the diff and included it as
an add or a modify change. However, there are situations in which files
can appear in the upperdir without having been modified or even opened.
For example, if "foo" is a file or dir present in the lowerdirs of an
overlay mount and you run "mv foo footmp; mv footmp foo", then the
upperdir will contain foo (in addition to any files found under foo if
it's a dir). In this situation, the double-walking differ would not
include foo as part of the diff, but the overlay differ would.

This meant that the overlay differ would potentially include extra files
in each blob for such diffs relative to the double-walking differ. As of
now, while this does increase image size, it doesn't result in any
inconsistencies in terms of the contents of images because it just
results in files/dirs getting duplicated on top of their equivalents.

However, for the upcoming DiffOp support, this inconsistency could
actually result in the same operation producing mounts with different
contents depending on which differ is used. This change is therefore
necessary in order to enforce DiffOp consistency (on top of the possible
improvements to exported image size).

The main concern here is that this could undo the performance benefits
that the overlay differ was intended to fix. However, in practice the
situations where this has worse performance are quite obscure and the
benefits should still be present.

First, consider the case where foo is a directory and the user does the
equivalent of "mv foo footmp; mv footmp foo". Even before this change,
the overlay differ would see that foo is marked as opaque and thus fall
back to using the double-walking differ. So there's no performance
regression in this case as the double-walking differ does the same
file comparisons as were added in this commit.

For the case where the user shuffles a file back and forth, there will
potentially be a slow file content based comparison if the underlying
file has a truncated nanosecond timestamp (i.e. it was unpacked from a
tar file). However, the situations in which you shuffle an individual
file without changing it (or open it for writing but then write nothing)
and it's large enough in size for content comparisons to be slow are
obscure. Additionally, while the content comparison may be slow, there
will be time saved during export because the file won't be included
unnecessarily in the exported blob, so it's a tradeoff rather than a
pure loss.

In situations where the user actually did change a file and it shows up
in the upperdir, it should be extremely rare that the content comparison
code path is followed. It would require that the user changed no other
metadata of the file, including size, and both mod timestamps were the
same (which could only really happen if their underlying filesystem
lacked support for nanosecond precision and they modified the file
within 1 second of its modification in the lowerdir or they manually
changed the modtime with chtimes).

Signed-off-by: Erik Sipsma erik@sipsma.dev

@tonistiigi @ktock This is mostly just preparation for DiffOp, but may be desirable anyways. I decided to split it out from the DiffOp PR in case there's any discussion needed on the possible performance impact, but like I explain in the commit message I don't think that should be a major concern. Let me know your thoughts.

if err2 != nil && err2 != io.EOF {
return false, err2
}
if n1 != n2 || !bytes.Equal(b1[:n1], b2[:n2]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is the n1 != n2 comparison correct in here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well n1 != n2 if one hit EOF before the other (in which case they are different sizes) or if a read got cut short. AFAIK the go runtime is supposed to transparently handle reads that were interrupted by signals and would otherwise return a non-nil error, but I don't see that 100% guaranteed in writing anywhere. I guess in the very obscure situation where something went wrong and a read got cut short before filling the buffer or hitting EOF but err is nonetheless nil, it's probably fine to just say the files aren't equal and leave here.

That being said, it's redundant in that bytes.Equal inherently checks this anyways, so we can remove it if you prefer.

Also, while thinking about this I realized that the only thing stopping our code from trying to read non-regular files is the check in sameDirent for f1.Size() == 0, which should always return true for fifos, sockets, chardevs, etc. That is a very fragile check though and more just implied by the linux manpages rather than fully guaranteed, so I'm also updating this function to verify f1 and f2 are regular files before trying to read them. Trying to compare contents of anything else could lead to pretty nasty bugs, so it's worth guarding against it from future changes to this code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no guarantee that Read returns a specific number of bytes. It just returns whatever single syscall returned.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and the read syscall is made with count parameter set to the len of the provided buffer, which is the same for both files, so the situations where n1 != n2 are either:

  1. The files aren't the same size (in which case they aren't equal)
  2. They aren't regular files (which we guard against)
  3. The syscall was interrupted (which should be handled by the go runtime or otherwise result in a non-nil err being returned)
  4. The underlying filesystem is behaving extremely strangely and returning less than count bytes for a regular file even though EOF hasn't been reached and nothing interrupted it.

We can probably turn 4 into an error condition rather than just returning false, nil, but I'd prefer to do that along with updating continuity to have the same behavior so the differs are consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

behaving extremely strangely

I don't think this case is extremely strange. It's part of the contract. Network-based filesystems definitely do it. I have not looked at the internals of every possible linux filesystem and even when some of them guarantee it atm they can change it any time. There is no guarantee for it even in the vfs-to-syscall layer iiuc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use io.ReadFull here?

Copy link
Collaborator Author

@sipsma sipsma Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I always thought the rule of thumb was that regular files only return short reads on EOF, signal interrupt or conditions where errno is set, but you're right that it's not technically in violation of the spec to do otherwise and network filesystems or FUSE filesystems are much more likely to do things outside the norm, so we should handle this under the assumption that eventually someone will do something very weird (like you've said previously).

Can't we use io.ReadFull here?

Yeah good call, that handles the complication for us. Updated with that. I will send out a PR to continuity to have the double-walking differ do the same thing since that's where this code was originally from.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I left in n1 != n2 in the update because it should be correct now that ReadFull is used. It's technically also asserted by byte.Equal but I'm not sure how efficiently so I figured it didn't hurt to leave it here.

@AkihiroSuda
Copy link
Member

Does this have performance drawback?

@sipsma
Copy link
Collaborator Author

sipsma commented Nov 24, 2021

Does this have performance drawback?

I go into detail about this in the commit message, but the TL;DR is that the only possible performance drawback is in very obscure cases, and even in those cases it's more of a tradeoff rather than a performance hit. The use cases described in the issues linked to in the PR that added the overlay differ should not be impacted AFAICT (#1704 and #1192).

An example of an obscure case where this changes the performance is say you have a large file foo which was unpacked from a tarfile that truncated nanosecond timestamps. Then you run mv foo footmp; mv footmp foo, which causes foo to show up in the upperdir even though it wasn't modified. Before this commit, foo would be considered part of the diff and thus included in the layer's blob even though it didn't change. Now, foo will be checked to see if it's actually different.

This causes the overlay differ to create the same blobs as the double-walking differ, which is my main priority here. But I think it's also good in that it prevents that layers from including unneeded files, which saves time+bandwidth pushing and pulling the layers. And in the vast majority of the cases, the overlay differ retains its performance benefits over the double-walking differ anyways.

@ktock
Copy link
Collaborator

ktock commented Nov 24, 2021

It sounds reasonable to me that overlayfs-differ should avoid the additional redundancy that isn't created by walking-differ.
However, in terms of the following:

This is needed to resolve differences between the blobs created by the overlay differ and the double-walking differ.

I am a bit worried about whether we can guarantee that every differ always creates the exact same blob from the same set of snapshots. e.g. What happens if the tar library is fixed to emit bits different from the older version?

@sipsma
Copy link
Collaborator Author

sipsma commented Nov 24, 2021

I am a bit worried about whether we can guarantee that every differ always creates the exact same blob from the same set of snapshots. e.g. What happens if the tar library is fixed to emit bits different from the older version?

Ah, I should have said "differences between the diffs in the blobs" rather than "differences between the blobs". It's totally fine for the bits of the blobs to be different, we just want to ensure the contents resulting from unpacking the blobs are the same between the two differs. Different versions of tar, different compressions, etc. are fine.

return false, errors.Errorf("%s is not a regular file", p2)
}

b1 := make([]byte, compareChunkSize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if this used sync.Pool as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

This change results in the overlay differ comparing files to determine
if they are actually part of the diff. This is needed to resolve
differences between the blobs created by the overlay differ and the
double-walking differ.

Before this change, the overlay differ always just assumed that if a
file was in the upperdir it must be part of the diff and included it as
an add or a modify change. However, there are situations in which files
can appear in the upperdir without having been modified or even opened.
For example, if "foo" is a file or dir present in the lowerdirs of an
overlay mount and you run "mv foo footmp; mv footmp foo", then the
upperdir will contain foo (in addition to any files found under foo if
it's a dir). In this situation, the double-walking differ would not
include foo as part of the diff, but the overlay differ would.

This meant that the overlay differ would potentially include extra files
in each blob for such diffs relative to the double-walking differ. As of
now, while this does increase image size, it doesn't result in any
inconsistencies in terms of the contents of images because it just
results in files/dirs getting duplicated on top of their equivalents.

However, for the upcoming DiffOp support, this inconsistency could
actually result in the same operation producing mounts with different
contents depending on which differ is used. This change is therefore
necessary in order to enforce DiffOp consistency (on top of the possible
improvements to exported image size).

The main concern here is that this could undo the performance benefits
that the overlay differ was intended to fix. However, in practice the
situations where this has worse performance are quite obscure and the
benefits should still be present.

First, consider the case where foo is a directory and the user does the
equivalent of "mv foo footmp; mv footmp foo". Even before this change,
the overlay differ would see that foo is marked as opaque and thus fall
back to using the double-walking differ. So there's no performance
regression in this case as the double-walking differ does the same
file comparisons as were added in this commit.

For the case where the user shuffles a file back and forth, there will
potentially be a slow file content based comparison if the underlying
file has a truncated nanosecond timestamp (i.e. it was unpacked from a
tar file). However, the situations in which you shuffle an individual
file without changing it (or open it for writing but then write nothing)
that is large enough in size for content comparisons to be slow are
obscure. Additionally, while the content comparison may be slow, there
will be time saved during export because the file won't be included
unnecessarily in the exported blob, so it's a tradeoff rather than a
pure loss.

In situations where the user actually did change a file and it shows up
in the upperdir, it should be extremely rare that the content comparison
code path is followed. It would require that the user changed no other
metadata of the file, including size, and both mod timestamps were the
same (which could only really happen if their underlying filesystem
lacked support for nanosecond precision and they modified the file
within 1 second of its modification in the lowerdir or they manually
changed the modtime with chtimes).

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants