Skip to content
This repository has been archived by the owner on Oct 2, 2023. It is now read-only.

Failed while running push-all #525

Closed
hanglethao opened this issue Sep 21, 2018 · 6 comments · Fixed by #1658
Closed

Failed while running push-all #525

hanglethao opened this issue Sep 21, 2018 · 6 comments · Fixed by #1658

Comments

@hanglethao
Copy link

Hi

I am using push_all to push a container_bundle which contains more than 50 images.
And I got the error below.

line 134: wait: pid 2438 is not a child of this shell

I think it relates to this wait.
https://github.com/bazelbuild/rules_docker/blob/master/contrib/push-all.sh.tpl#L37

@smukherj1
Copy link
Collaborator

Hello hanglethao,

Thanks for reporting this issue. We need more information about the platform where you observed this issue. Specifically the following:-

  1. Operating System and Version (eg. Ubuntu 16.04 64-bit or Debian 9 64 bit, etc)
  2. Bazel version (eg. 0.15.1)
  3. Version of your execution shell (eg. bash 4.3.12)

Please also provide a test case to reproduce this issue. You could add a test case under //rules_docker/tests in a forked repo or simply a tarball of an individual bazel project that shows the issue.

Also, if you have any ideas on why the issue might be happening, don't hesitate to submit a pull request with the proposed fix. The error above indicates that the process the script is being asked to wait for wasn't launched from the same shell.

@hanglethao
Copy link
Author

Hi @smukherj1

Operating System and Version

ubuntu:16.04

Bazel version

0.16.1

rules_docker version

0.5.1

Version of your execution shell

version 4.3.48(1)-release (x86_64-pc-linux-gnu)

I just created 50 binaries and 50 go_images. Then I used the following docker_push to push all the images and got that error.

  1. BUILD.bazel
load(":image.bzl", "all_images")
load("@io_bazel_rules_docker//container:container.bzl", "container_bundle")

container_bundle(
    name = "bundle_to_push",
    images = all_images(),
    stamp = True,
)

load("@io_bazel_rules_docker//contrib:push-all.bzl", "docker_push")

docker_push(
    name = "push_all_images",
    bundle = ":bundle_to_push",
)

  1. image.bazel
def all_images():
    images = {}
    for i in range (1, 50):
        cmd = "example-%s" % i
        images["$(REGISTRY)/$(PROJECT_ID)/%s:{STABLE_GIT_COMMIT}" %cmd ] = "//cmd/%s:image" % cmd
    return images

@nlopezgi
Copy link
Contributor

Thanks for providing the repro details, unfortunately we do not have tests that use a large number of images at the moment, so this might be a regression or a long standing issue (or even a flawed design that prevents this from working with a large number of images).

Have you tried to do any debugging to find more about the root cause (e.g., trying out with a different number of images to see if there's a specific number at which container_bundle starts to fail, tried modifying https://github.com/bazelbuild/rules_docker/blob/master/contrib/push-all.sh.tpl#L37 to see if its indeed the wait statement that causes the issue)?

I'm particularly interested in knowing with how many images the error starts to occur for you. This will help us assess what is the impact of this issue to better estimate priority to fix it. If this is only occurring with a very large number of images (lets say, 15 plus, or 40 plus), I think priority to fix for us would be relatively low (if its happening with 5 or 10 images, then we do probably need to fix it sooner).

Overall, I think supporting the use case of a large number of images in container bundle is one we should support but its not a high priority for us atm (unless we start getting lots of customers chiming in on this thread and reporting this issue is also important for them) so it might take a few weeks before we can debug in detail.

We would be super happy to help review PRs if you can debug and find a fix for the issue!

@nlopezgi
Copy link
Contributor

nlopezgi commented Dec 4, 2018

Hi @hanglethao,
Any interest in submitting a PR to fix this (and add a test that uses 50+ images)?

@imjasonh
Copy link
Contributor

I'm also hitting this today, pushing 178 images 😅

Would it be reasonable to remove the async + wait behavior, and instead just do each push in sequence? I think that would likely make this whole process easier to reason about, though possibly somewhat slower. It would also enable us to inject retries more sanely, if we decide to.

If that's reasonable I can probably send a PR for that.

cc @mattmoor

@mattmoor
Copy link
Contributor

Honestly doing all in parallel at high image counts seems perilous, but sequencing them seems like it'd make things slow 😬

In ko we throttle pushes to GOMAXPROCS, so perhaps some sane upper-bound on concurrent pushes is called for?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants