
Kaniko Performance Testing. #1305

Open
tejal29 opened this issue Jun 8, 2020 · 8 comments
Labels
area/performance (issues related to kaniko performance), enhancement, categorized, interesting

Comments

@tejal29
Member

tejal29 commented Jun 8, 2020

Hello Kaniko Users,

A lot of kaniko users have mentioned that snapshotting has been a bottleneck, especially for npm, yarn, and apt projects which end up downloading a large number of dependencies (see issues #1282 and #972).

We have new Kaniko images with a bunch of improvements, described below:

gcr.io/kaniko-project/executor:perf
gcr.io/kaniko-project/executor:debug-perf

Over the last month, we made a number of improvements. To measure them, we benchmarked with the following Dockerfile:

```Dockerfile
FROM bash:4.4

ARG NUM
COPY context.txt .
COPY make.sh .
SHELL ["/usr/local/bin/bash", "-c"]
RUN ./make.sh $NUM
RUN ls -al /workdir | wc
```

Note: This does not mimic real-world npm or apt scenarios, since the files created are random text files in one directory.
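For reference, here is a hypothetical sketch of what a make.sh for this benchmark might look like; the actual script is not included in this issue, and this version simply creates $NUM random text files in one directory, as the note above describes:

```bash
#!/usr/local/bin/bash
# Hypothetical benchmark helper (not the actual make.sh from the issue):
# create $1 small random text files in a single directory.
mkdir -p /workdir
for ((i = 1; i <= $1; i++)); do
  head -c 1024 /dev/urandom | base64 > "/workdir/file-${i}.txt"
done
```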

To further decrease the snapshotting time, we added 2 new features:

  1. A new snapshot strategy based on D. J. Bernstein's Redo tool
    In this snapshot strategy, unlike the full snapshot mode, file contents are not hashed to detect changes; only file attributes are hashed (a sketch of the idea follows the flag below).

Using the redo snapshotter reduced the build time from 30 minutes to 12 minutes, about a 56% improvement.
The time spent computing hashes for all the files in the filesystem dropped from 15 minutes to a mere 18 seconds.
Note: these are small text files; in full snapshot mode the time to hash a file depends on the file size.

You can enable the new redo snapshotter with the following command-line argument:
--snapshotMode=redo
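To make the idea concrete, here is a rough shell sketch of attribute-based change detection. It is only an illustration of the concept, not kaniko's actual code; the attribute set shown (path, size, mtime, mode, uid, gid) is an assumption, and /tmp/attrs.prev stands for the same listing taken at the previous snapshot:

```bash
# Record one line of attributes per file instead of hashing file contents,
# so the per-file cost does not depend on file size.
find /workdir -xdev -printf '%p|%s|%T@|%m|%U|%G\n' | sort > /tmp/attrs.now

# Any added, removed, or changed line corresponds to a changed file.
diff /tmp/attrs.prev /tmp/attrs.now
```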

  2. A newer implementation of the RUN command which does not rely on snapshotting at all.
    In this approach, we detect which files were changed by creating a marker file before executing the RUN command.
    Then we walk the entire filesystem, which takes about 1 to 3 seconds for 700K files, to find all files whose ModTime is greater than the marker file's (see the sketch below).
    With this new RUN command implementation, the total build time is reduced from 29 minutes to 7 minutes, an improvement of roughly 75%.

To use the new RUN command implementation, pass the --use-new-run=true flag.
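Reduced to a shell sketch, the marker-file approach looks roughly like this; the marker path and the command are placeholders, not what kaniko uses internally:

```bash
touch /tmp/.kaniko-marker                         # marker created just before the RUN step
./make.sh "$NUM"                                  # the RUN command itself (placeholder)
find / -xdev -type f -newer /tmp/.kaniko-marker   # files whose mtime is newer than the marker
```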

I am looking forward to all kaniko users trying these new flags; please let us know if they help make your builds faster.

@SaschaSchwarze0
Contributor

Hi @tejal29, we were investigating the performance image for the https://github.com/redhat-developer/build project. The improvements are great. I also investigated the exact changes that were done.

Introducing a file system snapshotting mode that works without looking at the file content is foreseeably a great improvement. The fact that the algorithm seems to be proven through previous usage in other projects gives us confidence.

The use-new-run flag, on the other hand, I do not feel comfortable with. Conceptually, one issue is that there are ways to explicitly set the mtime to some value, using touch for example (unlikely to be an issue in our Dockerfile builds), but tar on extraction does this as well (according to https://apenwarr.ca/log/20181113, search for "Is mtime monotonically increasing?" on that page). And extracting some tarball in a Dockerfile feels like a typical use case; the existing issues opened because of this imo prove that. The other issue is that it claims to work without filesystem snapshotting at all, and I do not understand how it would detect deleted files. Can you explain this?
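A small illustration of that concern (the file names here are made up): tar restores the archived mtime on extraction, and touch can set it explicitly, so a purely marker-based comparison would miss such files:

```bash
touch /tmp/.marker                    # marker created before the "RUN" step
tar -xf vendor.tar.gz                 # extracted files keep their archived, older mtimes
touch -d '2001-01-01' generated.txt   # mtime can also be set to an arbitrary past value
find . -type f -newer /tmp/.marker    # neither of the files above shows up here
```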

So, my current assessment for https://github.com/redhat-developer/build is to only use the new snapshot mode. The performance improvements also look good. Is there an outlook for when this will become officially supported as part of a Kaniko release?

@SaschaSchwarze0
Contributor

Hi @tejal29. I noticed the v0.24.0 release. Thank you for doing that. What about my concerns about the use-new-run flag mentioned above? Do you have any comments on them?

@tejal29
Member Author

tejal29 commented Aug 11, 2020

@SaschaSchwarze0, Thanks for all your questions.

The other issue is that it claims to work without filesystem snapshotting at all and I do not understand how it would detect deleted files. Can you explain this?

The way we detect file deletions with --use-new-run is by traversing the filesystem.
With performant file-walking libraries, the time it takes to walk the filesystem is within milliseconds.

We build a dictionary of the files in the previous layer and one of the current filesystem, and detect deletions by comparing the two dictionaries.
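In shell terms, and with made-up file-list names, that comparison amounts to something like this:

```bash
# prev-layer-files.txt: paths recorded for the previous layer
# current-fs-files.txt: paths found by walking the current filesystem
sort prev-layer-files.txt  > /tmp/prev.sorted
sort current-fs-files.txt  > /tmp/curr.sorted
comm -23 /tmp/prev.sorted /tmp/curr.sorted   # present before, missing now: deleted files
```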

Re:

The use-new-run flag, on the other hand, I do not feel comfortable with. Conceptually, one issue is that there are ways to explicitly set the mtime to some value, using touch for example (unlikely to be an issue in our Dockerfile builds), but tar on extraction does this as well (according to https://apenwarr.ca/log/20181113, search for "Is mtime monotonically increasing?" on that page). And extracting some tarball in a Dockerfile feels like a typical use case; the existing issues opened because of this imo prove that.

I think you may be right. I haven't had time to investigate further. It was an experiment which I put out there. It might not work for cases like the ones you mentioned.

@SaschaSchwarze0
Contributor

SaschaSchwarze0 commented Aug 12, 2020

Thank you @tejal29 for your answer:

We build a dictionary of the files in the previous layer and one of the current filesystem, and detect deletions by comparing the two dictionaries.

Okay, in my interpretation this is then still some kind of snapshot of the filesystem, but creating a dictionary of existing files is obviously faster than computing a hash for each file, even if the file content is ignored.

So, we continue with --snapshotMode=redo alone, https://github.com/redhat-developer/build/blob/master/samples/buildstrategy/kaniko/buildstrategy_kaniko_cr.yaml#L34. Great improvement.

@tejal29
Member Author

tejal29 commented Aug 12, 2020

So, we continue with --snapshotMode=redo alone, https://github.com/redhat-developer/build/blob/master/samples/buildstrategy/kaniko/buildstrategy_kaniko_cr.yaml#L34. Great improvement.

Yes! --snapshotMode=redo should suffice and be more accurate than --use-new-run. Thanks a lot for confirming, and for all your feedback.

@The-Compiler

Using the redo snapshotter reduced the build time from 30 minutes to 12 minutes, about a 56% improvement.

As another data point, for a build based on a Jupyter image: 20 to 55 minutes before (on a self-hosted GitLab CI runner), 2 minutes after. Definitely a massive improvement.

@Abdellwahed

I managed to resolve the problem in my GitLab CI configuration by incorporating the enhancement discussed in GitHub issue #1680. All you have to do is use version 1.8.0 or any newer version and set the --compressed-caching flag to false, like this: --compressed-caching=false.
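For context, here is a hedged sketch of the relevant part of such a GitLab CI job script; the context, Dockerfile, and destination values are placeholders, and only --compressed-caching=false comes from the comment above:

```bash
/kaniko/executor \
  --context "$CI_PROJECT_DIR" \
  --dockerfile "$CI_PROJECT_DIR/Dockerfile" \
  --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" \
  --compressed-caching=false
```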

@anutator

--compressed-caching=false helped
