PR81 Breaks builds on Intel Macs #82

Open
tostanoski opened this issue Jan 17, 2023 · 11 comments

Comments

@tostanoski

tostanoski commented Jan 17, 2023

We've been using your image for our Jekyll project, and since 1/13 our Intel Mac users have had issues building. Either the build seems to hang and never completes, or it completes with a lot of warnings after a very long time. One of my writers ran with --verbose and saw that it was indeed rendering pages, but much, much slower than before. On an M1 Max I had no issues.

I've attached screenshots of top running in the Docker container and most of the warnings that occurred before the build completes:
[Screenshot: container top]
[Screenshot: warnings]

Dunno if this is an "us" issue or not. As some background, we're using your image as jekyll/jekyll still has the webrick issue that you took care of. I don't have a lot of Docker knowledge; I inherited this project. Already this year I've had to find a suitable replacement for jekyll/jekyll-minimal as envygeeks images are all AMD64 and we're transitioning to Apple Silicon now. Please let me know if you need anything more from me.

@tostanoski
Author

Also: using this image works on both Apple and Intel CPUs, albeit slower than yours did: ahmetozer/jekyll

@BretFisher
Owner

Hmm, well, I made two changes in the last week. Moved from Ruby 2 to 3 (now that the latest Jekyll has added webrick by default)... and removed bundler 1.x so that it only has 2.x.

I'll assume your issue is related to Ruby 3, though seeing it behave differently on a certain platform and not the other is very odd. There's a post that's saying the latest Ruby 3.2 has an issue with Liquid, which I didn't dig into, but I made you some things to help in testing.

Luckily I tag builds and keep old versions so that we can compare them. I recommend when testing that you pin the stable-<date> tags, so you're positive you are testing the same image build-time across platforms.

Here's a unique tag for the latest build, 3 days ago: https://github.com/BretFisher/jekyll-serve/pkgs/container/jekyll/63555853?tag=stable-20230115120431

Here's the tag for last month's build: https://github.com/BretFisher/jekyll-serve/pkgs/container/jekyll/59034762?tag=stable-20221215120509

I've also created two PRs for testing (PRs are how I get GitHub Actions to make us some images to play with). PR builds won't be in Docker Hub, so you'll need to use the GHCR names:

One uses Ruby 3.1 rather than 3.2: #83

One uses Ruby 2.7 like it did before last week: #84

Lastly, with Docker Desktop you can force one platform's image to run on a different platform. In this case, if you still have the issue with ghcr.io/bretfisher/jekyll:stable-20230115120431 on the Intel box, try forcing it to download the arm64 image and test again. It should be way slower (QEMU emulation will eat 10-30% I think), but it might clue you in to where the issue lies: docker run --platform linux/arm64 ghcr.io/bretfisher/jekyll:stable-20230115120431

I'd also do the inverse on the M1, forcing it to run the Intel (amd64) image: docker run --platform linux/amd64 ghcr.io/bretfisher/jekyll:stable-20230115120431
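
The two forced-platform runs above could be wrapped in a small helper so that the same pinned tag is exercised on each architecture. This is only a sketch under assumptions: the `run_on_platform` name and the `DOCKER` override variable are made up for illustration, and it assumes the site lives in the current directory, that `/site` is the right mount point, and that `jekyll` is on the image's PATH.

```shell
#!/bin/sh
# Hypothetical helper: run one Jekyll build of the pinned tag on a given platform.
# DOCKER can be overridden (for example DOCKER=echo) to preview the command
# without running it.
DOCKER="${DOCKER:-docker}"
IMAGE="ghcr.io/bretfisher/jekyll:stable-20230115120431"

run_on_platform() {
  # $1 is a platform string such as linux/amd64 or linux/arm64
  "$DOCKER" run --rm --platform "$1" \
    -v "$PWD":/site -w /site \
    --entrypoint jekyll "$IMAGE" build
}
```

Calling `run_on_platform linux/amd64` and then `run_on_platform linux/arm64` on the same machine gives a like-for-like comparison, since only the platform changes between runs.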

I hope that helps with your testing.

Also, if you get me a reproducible test, I'm happy to help.

@tostanoski
Author

This helps quite a bit, thank you :) I now have a better plan of attack for when images get the better of me again.

Looking at things more closely now, builds aren't failing, but what used to take around 300 seconds is now taking hours. To the writers, this looks like a failure, particularly without --verbose on. I have a feeling their builds on our primary project were just taking so long that they appeared to be failing (we have about 4 megs of Markdown files in the VVRN project and 11 megs in VVH, our primary project; both have the same infrastructure). Some reported that builds completed after hours of letting them sit, but they didn't seem to be reporting issues with the smaller project, likely because it was taking longer but not so much longer that it looked like it didn't work.

I probably should have mentioned that we primarily use the jekyll-serve container. I'm not entirely sure why I'm not using your jekyll container for our production builds; maybe I'll reevaluate that after I get all of this sorted. I was able to find the tags for older builds of jekyll-serve so I did some testing on my M1 Max. Attached is the smaller of our two repositories; please don't judge my amateurish Jekyll skills :) Github doesn't like Bzipped files, so I Zipped a Bzipped Tarball, and now I have a headache ;)
vvrn-develop.tar.bz2.zip

Here're the results of my testing our clean dev server build (serve.sh or docker-compose build --no-cache && docker-compose up) on the VVRN project:

FROM bretfisher/jekyll-serve 18 seconds
FROM ahmetozer/jekyll 30 seconds
FROM ghcr.io/bretfisher/jekyll:stable-20230115120431 90 seconds
FROM ghcr.io/bretfisher/jekyll-serve:stable-20230113194043 91 seconds
FROM ghcr.io/bretfisher/jekyll-serve:stable-20220916052939 15 seconds
FROM --platform=linux/amd64 ghcr.io/bretfisher/jekyll-serve:stable-20220916052939 28 seconds
FROM --platform=linux/arm64 ghcr.io/bretfisher/jekyll-serve:stable-20230113194043 92 seconds
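
Numbers like these could also be collected with a small timing helper. The sketch below is an assumption-laden alternative, not the method used for the list above: it times a one-off `jekyll build` through `docker run` rather than the docker-compose flow, the `time_build` name and `DOCKER` override are hypothetical, and it assumes the site is in the current directory and `jekyll` is on the image's PATH.

```shell
#!/bin/sh
# Hypothetical helper: time a single Jekyll build for one image tag.
DOCKER="${DOCKER:-docker}"

time_build() {
  image="$1"
  start=$(date +%s)
  # Build output is discarded; only the duration and exit status are reported.
  "$DOCKER" run --rm -v "$PWD":/site -w /site \
    --entrypoint jekyll "$image" build >/dev/null 2>&1
  status=$?
  end=$(date +%s)
  printf '%s\t%ss\texit=%s\n' "$image" "$((end - start))" "$status"
}
```

Running `time_build` once per tag (for example the `stable-20220916052939` and `stable-20230113194043` tags) prints one tab-separated line per image, which makes the results easy to paste into a thread like this one.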

I haven't the foggiest as to why the supposed current Docker Hub version of your image is much faster than the latest tagged version and is on par with the September version. The latest Docker Desktop has Rosetta for Linux support, so it's handling Intel code a lot faster than QEMU did.

This occurred while I was traveling to HQ for the first time to meet up with my team, so I only brought the M1 Mac with me. Why would I have needed two laptops for this trip? :) Once my coworkers have some bandwidth, I can get some hard numbers on their builds.

@tostanoski
Author

First set of Intel numbers. This is from the older MacBook Pro with an 8-core Core i9 2.3 GHz.
FROM bretfisher/jekyll-serve: ~4 minutes

FROM ahmetozer/jekyll: ~1 minute

FROM ghcr.io/bretfisher/jekyll-serve:stable-20230113194043: ~3 minutes

FROM ghcr.io/bretfisher/jekyll-serve:stable-20220916052939: ~3 minutes

Color me confused. I'll have a few more coworkers test, and see if I can get someone at home to pop open Zoom on my Intel Mac to do more involved testing with the larger repository.

I was able to test PR83 on the M1 and found no real differences from current stable in build time, but I did see those deprecation messages for the first time. They did not occur with jekyll:stable-20221215120509 (tested like for like just to be sure). I see that PR84 didn't seem to build, so I'm curious if whatever issue caused that is related. We're probably not that lucky :)

@BretFisher
Owner

Thanks for the info. I've never had to worry about the performance of this repo, so I have no idea if a specific type of Jekyll install is faster than others. I believe this issue comes down to the version of Ruby, Jekyll, and its dependencies, which I don't track or usually have to deal with.

But in your case, with the desire to do large builds as fast as possible, you'll likely need to start caring about, testing, and ultimately tracking your own version changes before upgrading. Sadly, I don't do any of that in this repo 'cus it's simple and not something I would use for a team's production workflow.

But I'm still happy to help where I can, and I'm learning here too :)

The ahmetozer/jekyll image is two years old and runs on an Alpine base image. Here's the dockerfile. It may be fast, but I wouldn't recommend using it since it lacks anything new since 2020.

I don't see you testing the PRs I built, which I think will shine the real light on this issue. You can use pr-based tags to test my 3.1 and 2.7 Ruby builds against latest, which is 3.2 as of this month:

bretfisher/jekyll:pr-84 Moved back to Ruby 2.7
bretfisher/jekyll:pr-83 Moved back to Ruby 3.1

Lastly, I've updated these PRs just now to also update system gems, which, in the case of Ruby 2, are outdated enough in the official images to cause build failures when installing the Jekyll gem. I've never had to do that, but it seems it's getting more complicated to keep things working between the Ruby version, Jekyll version, and Liquid templating version.

So test those PRs and LMK please.

@tostanoski
Author

Got it! I wasn't sure if there was much difference between your stock Jekyll image and the Jekyll-serve image. I'm not seeing much difference between PR-83 and PR-84 on the M1 Max, but I'm getting to the point where I think this thing can handle anything I throw at it. I just wish it wasn't so heavy.

I'm going to have my team test both PRs on both our projects and I'll report back.

It seems like by the end of this I'm going to be better off rolling my own image, as I can't seem to rely on what Jekyll considers their "official" images, and your images seem to be fitter for your purposes or smaller projects. I wouldn't even know where to start with that, so you'll likely find me in your class sooner than later :)

@BretFisher
Owner

If I were you, I'd find the fastest build, have it spit out the versions for ruby, jekyll, and liquid (at minimum), and pin to those with apt and gem, in the dockerfile. Then build a new image and test. Hopefully, I'm right and the speed difference is in the older versions of these many dependencies, and I guess the latest versions are slower and give you linting warnings like you're seeing (which are likely valid warnings, meaning you have outdated code).

Ideally, the latest version of everything is always the fastest, but that's not always the case, and the mix of various versions seems to be what's giving you all the different build speeds. So your "fastest build" was likely accidental because every image someone ran had different versions than the month before (I've been rebuilding latest every month for over a year).

Once you find the versions of ruby, jekyll, and liquid, you'll likely want to pin with apt and gem. I didn't see a gemfile in your zip, so you'll want that too, to control gem versions.
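
One way to have an image "spit out" its versions is sketched below. It is a rough illustration only: the `print_versions` name and the `DOCKER` override variable are hypothetical, and it assumes the image has `sh`, `ruby`, `jekyll`, and `gem` on its PATH.

```shell
#!/bin/sh
# Hypothetical helper: print the Ruby, Jekyll, Liquid, and Kramdown versions
# found inside an image.
DOCKER="${DOCKER:-docker}"

print_versions() {
  # $1 is the image to inspect; the entrypoint is replaced with sh so that
  # several version commands can run in one container.
  "$DOCKER" run --rm --entrypoint sh "$1" -c \
    'ruby -v; jekyll -v; gem list liquid; gem list kramdown'
}
```

Running `print_versions` against the fastest image and against a slow one, then comparing the two outputs, shows which versions are candidates for pinning.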

This article seems good on speeding up builds by replacing slower tools and keeping things updated.

Good luck!

@BretFisher
Owner

BretFisher commented Jan 20, 2023

I was curious about the wildly different build times so I did some version comparisons with the build times.

Your best-case image was ahmetozer/jekyll, and we know that was last built in 2020 with an Alpine base image. Note that Alpine uses a different C standard library (musl) than Debian, Ubuntu, and nearly all other image types (which use glibc). It's sometimes a problem, with things being slower or not working at all with Ruby/Python/Node, but in this case it seems that a 2020 version is the "sweet spot of versions".

Note that different package managers will install different versions and builds of Jekyll, so when you're not pinning versions, everything seems random between different Docker base images (and apt vs. apk vs. gem)

This is what that old image is running:

ahmetozer/jekyll:latest

Alpine 3.13-sh
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [aarch64-linux-musl]
jekyll 4.2.0
liquid 4.0.3
kramdown 2.4.0

**30 seconds on jekyll build**

Next, I tried to use ruby:alpine (latest Alpine, latest Ruby), and Jekyll latest won't even install. That's why I don't usually use/support Alpine images anymore, it's not always possible to get them working.

ruby:alpine

Alpine 3.17
ruby 3.2.0 (2022-12-25 revision a528908271) [aarch64-linux-musl]

Jekyll won't install with apk or gem

Next, I tried ruby:2-alpine (latest Alpine, Ruby 2.7 latest)

ruby:2-alpine

Alpine 3.17
ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [aarch64-linux-musl]
jekyll 4.2.2
liquid 4.0.3
kramdown 2.3.2

**16 seconds on jekyll build** -- NICE

Next, I tried ruby:2 and tried installing Jekyll through apt. The jekyll build failed with a weird Liquid exception.

ruby:2 (debian-based)
ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [aarch64-linux]
apt based jekyll 3.9.0
liquid 4.0.3
kramdown 2.3.0

Would finish jekyll build with error:
  Liquid Exception: Could not find document '{{ include.path }}' in tag 'link'. Make sure the document exists and the path is correct. in /site/content/_rn_gr/22r3/22r3-data-model-changes-clinical-operations.md

Next, I used the latest ruby:2 image and tried installing Jekyll through gem. Note this is the only one that gave me the Dart Sass 2.0.0 warnings.

ruby:2
ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [aarch64-linux]
jekyll 4.3.1
liquid 4.0.4
kramdown 2.4.0

**137 seconds on jekyll build**

Lastly, I wondered, "is it as simple as pinning to Jekyll 4.2.x?" So I tried it on the Debian ruby:2 and Debian ruby:3, using gem install jekyll -v 4.2.2 to install only the older Jekyll that's two minor versions out of date.

With ruby:2, it was 15.5 seconds!
With ruby:3 it errored out about an invalid date. I noticed, though, that Liquid and Kramdown were newer versions, like in the 137-second build above, so who knows.

Which should you use?

I think the answer lies in you needing to pin to the latest Jekyll 4.2, which is 4.2.2 released about a year ago, and only use that on a Ruby 2.7 latest image. It seems either Ruby 3 or Jekyll 4.3+ will cause warnings, slow builds, or errors on your site.
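
For a site that has no Gemfile yet, the Jekyll pin could be recorded with something like the following. This is a minimal sketch: the `write_gemfile` helper is hypothetical, it overwrites any existing Gemfile in the target directory, and it pins only Jekyll, leaving Liquid and Kramdown to be resolved as dependencies.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Gemfile that pins Jekyll to 4.2.2.
write_gemfile() {
  # $1 is the site directory (defaults to the current directory)
  dir="${1:-.}"
  cat > "$dir/Gemfile" <<'EOF'
source "https://rubygems.org"

# Exact pin; loosen to "~> 4.2.2" to accept later 4.2.x patch releases
gem "jekyll", "4.2.2"
EOF
}
```

After `write_gemfile .`, running `bundle install` inside a Ruby 2.7 container should produce a Gemfile.lock that records the resolved versions of the remaining dependencies.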

I tested both Alpine-based and Debian-based images and Alpine was 16s and Debian was 15.5s. Since Debian is easier to use IMO, and can pin all things if you want, this is the Dockerfile I recommend. I've updated the Ruby:2 PR to be this Dockerfile.

If you just wanted a simpler jekyll-serve you could use a trimmed down version of that one above like this:

FROM ruby:2.7

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

RUN gem install bundler && gem install jekyll -v 4.2.2

EXPOSE 4000

WORKDIR /site

CMD [ "bundle", "exec", "jekyll", "serve", "--force_polling", "-H", "0.0.0.0", "-P", "4000" ]

So I did this and came out on M1 Pro with 15-16s:

docker run -v $(pwd):/site -it --entrypoint bash --pull always bretfisher/jekyll:pr-84

Once you digest all this, I recommend storing your own Dockerfile (or forking this repo) and building your own image so I don't accidentally break your builds by updating my versions. The Docker Mastery course can fill in the gaps of building, running, compose, GitHub Actions, etc. https://www.bretfisher.com/courses

@ryandaugherty7

ryandaugherty7 commented Mar 14, 2023

@BretFisher thanks so much for the above research and the simplified ruby:2.7 Dockerfile. You just saved my life with something I've been working on, as I had been trying to track down why jekyll serve was running so slow on my container on M1 mac. 15-16 sec is about accurate now that I'm using ruby:2.7.

@BretFisher
Owner

Great!

@BretFisher
Owner

I should add a build example so we can track times in each PR/commit so this doesn't happen again.
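
A build-time check of that kind might look roughly like the sketch below. It is not taken from the repo: the `within_budget` name is hypothetical, and any time budget passed to it is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical CI helper: fail when a command runs longer than a time budget.
within_budget() {
  # $1 = budget in whole seconds; remaining arguments = command to time
  budget="$1"; shift
  start=$(date +%s)
  "$@" || return 2   # the timed command itself failed
  elapsed=$(( $(date +%s) - start ))
  echo "elapsed=${elapsed}s budget=${budget}s"
  # Exit status 0 when within budget, 1 when over
  [ "$elapsed" -le "$budget" ]
}
```

In a PR workflow this could wrap the example build, for instance `within_budget 60 docker run --rm -v "$PWD":/site --entrypoint jekyll bretfisher/jekyll:pr-84 build`, where the 60-second budget and the assumption that `jekyll` is on the image's PATH would both need checking against the real setup.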
