
Kaniko Performance Testing. #1305

Open
tejal29 opened this issue Jun 8, 2020 · 8 comments
Labels
area/performance (issues related to kaniko performance), enhancement, categorized, interesting

Comments

@tejal29
Member

tejal29 commented Jun 8, 2020

Hello Kaniko Users,

A lot of kaniko users have mentioned that snapshotting has been a bottleneck, especially for npm, yarn, and apt projects which end up downloading a large number of dependencies (see issues #1282 and #972).

We have new Kaniko images with a bunch of improvements, described below:

gcr.io/kaniko-project/executor:perf
gcr.io/kaniko-project/executor:debug-perf

Over the last month, we made a number of improvements. To measure them, we benchmarked with the following Dockerfile:

```Dockerfile
FROM bash:4.4

ARG NUM
COPY context.txt .
COPY make.sh .
SHELL ["/usr/local/bin/bash", "-c"]
RUN ./make.sh $NUM
RUN ls -al /workdir | wc
```

Note: This does not mimic real-world npm or apt scenarios, since the files created are random text files in one directory.
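For reference, here is a hypothetical sketch of what a make.sh for this benchmark might look like; the actual script is not included in this issue, and this version simply creates $NUM random text files in one directory, as the note above describes:

```bash
#!/usr/local/bin/bash
# Hypothetical benchmark helper (not the actual make.sh from the issue):
# create $1 small random text files in a single directory.
mkdir -p /workdir
for ((i = 1; i <= $1; i++)); do
  head -c 1024 /dev/urandom | base64 > "/workdir/file-${i}.txt"
done
```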

To further decrease the snapshotting time, we added 2 new features:

  1. A new snapshot strategy based on D. J. Bernstein's Redo tool
    In this snapshot strategy, unlike the full snapshot mode, file contents are not hashed to detect changes; only file attributes are hashed (a sketch of the idea follows the flag below).

Using the redo snapshotter reduced the build time from 30 minutes to 12 minutes, about a 56% improvement.
The time spent computing hashes for all the files in the filesystem dropped from 15 minutes to a mere 18 seconds.
Note: these are small text files; in full snapshot mode the time to hash a file depends on the file size.

You can enable the new redo snapshotter with the following command-line argument:
--snapshotMode=redo
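To make the idea concrete, here is a rough shell sketch of attribute-based change detection. It is only an illustration of the concept, not kaniko's actual code; the attribute set shown (path, size, mtime, mode, uid, gid) is an assumption, and /tmp/attrs.prev stands for the same listing taken at the previous snapshot:

```bash
# Record one line of attributes per file instead of hashing file contents,
# so the per-file cost does not depend on file size.
find /workdir -xdev -printf '%p|%s|%T@|%m|%U|%G\n' | sort > /tmp/attrs.now

# Any added, removed, or changed line corresponds to a changed file.
diff /tmp/attrs.prev /tmp/attrs.now
```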

  2. A newer implementation of the RUN command which does not rely on snapshotting at all.
    In this approach, we detect which files were changed by creating a marker file before executing the RUN command.
    Then we walk the entire filesystem, which takes about 1 to 3 seconds for 700K files, to find all files whose ModTime is greater than the marker file's (see the sketch below).
    With this new RUN command implementation, the total build time is reduced from 29 minutes to 7 minutes, an improvement of roughly 75%.

To use the new RUN command implementation, pass the --use-new-run=true flag.
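Reduced to a shell sketch, the marker-file approach looks roughly like this; the marker path and the command are placeholders, not what kaniko uses internally:

```bash
touch /tmp/.kaniko-marker                         # marker created just before the RUN step
./make.sh "$NUM"                                  # the RUN command itself (placeholder)
find / -xdev -type f -newer /tmp/.kaniko-marker   # files whose mtime is newer than the marker
```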

I am looking forward to all kaniko users trying these new flags; please let us know if they help make your builds faster.

@SaschaSchwarze0
Contributor

Hi @tejal29, we were investigating the performance image for the https://github.com/redhat-developer/build project. The improvements are great. I also investigated the exact changes that were done.

Introducing a file system snapshotting mode that works without looking at the file content is foreseeably a great improvement. The fact that the algorithm seems to be proven through previous usage in other projects gives us confidence.

The use-new-run flag, on the other hand, I do not feel comfortable with. Conceptually, one issue is that there are ways to explicitly set the mtime to some value, using touch for example (unlikely to be an issue in our Dockerfile builds), but tar on extraction does this as well (according to https://apenwarr.ca/log/20181113, search for "Is mtime monotonically increasing?" on that page). And extracting some tarball in a Dockerfile feels like a typical use case; the existing issues opened because of this imo prove that. The other issue is that it claims to work without filesystem snapshotting at all, and I do not understand how it would detect deleted files. Can you explain this?
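A small illustration of that concern (the file names here are made up): tar restores the archived mtime on extraction, and touch can set it explicitly, so a purely marker-based comparison would miss such files:

```bash
touch /tmp/.marker                    # marker created before the "RUN" step
tar -xf vendor.tar.gz                 # extracted files keep their archived, older mtimes
touch -d '2001-01-01' generated.txt   # mtime can also be set to an arbitrary past value
find . -type f -newer /tmp/.marker    # neither of the files above shows up here
```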

So, my current assessment for https://github.com/redhat-developer/build is to only use the new snapshot mode. The performance improvements also look good. Is there an outlook for when this will become officially supported as part of a Kaniko release?

@SaschaSchwarze0
Contributor

Hi @tejal29. I noticed the v0.24.0 release. Thank you for doing that. What about my concerns about the use-new-run flag mentioned above? Do you have any comments on them?

@tejal29
Member Author

tejal29 commented Aug 11, 2020

@SaschaSchwarze0, Thanks for all your questions.

The other issue is that it claims to work without filesystem snapshotting at all and I do not understand how it would detect deleted files. Can you explain this?

The way we detect file deletions with --use-new-run is by traversing the filesystem.
With performant file-walking libraries, the time it takes to walk the filesystem is within milliseconds.

We build a dictionary of the files in the previous layer and one of the current filesystem, and detect deletions by comparing the two dictionaries.
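In shell terms, and with made-up file-list names, that comparison amounts to something like this:

```bash
# prev-layer-files.txt: paths recorded for the previous layer
# current-fs-files.txt: paths found by walking the current filesystem
sort prev-layer-files.txt  > /tmp/prev.sorted
sort current-fs-files.txt  > /tmp/curr.sorted
comm -23 /tmp/prev.sorted /tmp/curr.sorted   # present before, missing now: deleted files
```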

Re:

The use-new-run flag, on the other hand, I do not feel comfortable with. Conceptually, one issue is that there are ways to explicitly set the mtime to some value, using touch for example (unlikely to be an issue in our Dockerfile builds), but tar on extraction does this as well (according to https://apenwarr.ca/log/20181113, search for "Is mtime monotonically increasing?" on that page). And extracting some tarball in a Dockerfile feels like a typical use case; the existing issues opened because of this imo prove that.

I think you may be right. I haven't had time to investigate further. It was an experiment which I put out there. It might not work for cases like the ones you mentioned.

@SaschaSchwarze0
Contributor

SaschaSchwarze0 commented Aug 12, 2020

Thank you @tejal29 for your answer:

We build a dictionary of the files in the previous layer and one of the current filesystem, and detect deletions by comparing the two dictionaries.

Okay, in my interpretation this is then still some kind of snapshot of the filesystem, but creating a dictionary of existing files is obviously faster than computing a hash for each file, even if the file content is ignored.

So, we continue with --snapshotMode=redo alone, https://github.com/redhat-developer/build/blob/master/samples/buildstrategy/kaniko/buildstrategy_kaniko_cr.yaml#L34. Great improvement.

@tejal29
Member Author

tejal29 commented Aug 12, 2020

So, we continue with --snapshotMode=redo alone, https://github.com/redhat-developer/build/blob/master/samples/buildstrategy/kaniko/buildstrategy_kaniko_cr.yaml#L34. Great improvement.

Yes! --snapshotMode=redo should suffice and be more accurate than --use-new-run. Thanks a lot for confirming, and for all your feedback.

@The-Compiler

Using the redo snapshotter reduced the build time from 30 minutes to 12 minutes, about a 56% improvement.

As another data point, for a build based on a Jupyter image: 20 to 55 minutes before (on a self-hosted GitLab CI runner), 2 minutes after. Definitely a massive improvement.

@Abdellwahed

I managed to resolve the problem in my GitLab CI configuration by incorporating the enhancement discussed in GitHub issue #1680. All you have to do is use version 1.8.0 or any newer version and set the --compressed-caching flag to false, like this: --compressed-caching=false.
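For context, here is a hedged sketch of the relevant part of such a GitLab CI job script; the context, Dockerfile, and destination values are placeholders, and only --compressed-caching=false comes from the comment above:

```bash
/kaniko/executor \
  --context "$CI_PROJECT_DIR" \
  --dockerfile "$CI_PROJECT_DIR/Dockerfile" \
  --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" \
  --compressed-caching=false
```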

@anutator

--compressed-caching=false helped
