
Update for new flyctl, add instructions, fix media volume #2

Merged · 35 commits · Nov 21, 2022

Conversation

@indirect (Collaborator) commented Nov 17, 2022

This set of changes updates the steps to reflect the changes in flyctl options since the guide was originally written.

It also uses Overmind to run both Rails and Sidekiq inside the same VM, so they can share the volume with cached media. Without a shared volume, network errors during inline fetches enqueue retry jobs, and Sidekiq then downloads the media into a container the web server can't reach, causing random broken image tags.
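As a sketch, the Procfile that Overmind runs inside the VM could look like this (the exact commands are assumptions based on a standard Mastodon checkout, not copied from this PR):

```
web: bundle exec puma -C config/puma.rb
sidekiq: bundle exec sidekiq
```

Because both processes live in the same VM, they see the same mounted media volume.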

It also adds notes on setting up SMTP, upgrading Mastodon, and using custom domains with SSL.

Fixes #1.

- This lets us update to newer versions of Mastodon by updating the Docker image and deploying.
- With binstubs you can run e.g. `bin/production status` or `bin/production-redis status` without having to remember or type `-c fly.whatever.toml`.
- `db:setup` actually generates an error, since Fly has already created the database when we ran `fly pg create`. We just need to load the schema before the first deploy, so that we don't have to migrate from zero during the release command.
- Including how to run migrations before the new code is live via a second, temporary app.
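The binstub itself can be a tiny wrapper; here's a sketch (the filename `bin/production` and the config name `fly.production.toml` are assumptions, not necessarily what this repo uses):

```shell
# Sketch: generate bin/production, a flyctl wrapper that always appends the
# production config flag, so `bin/production status` runs
# `flyctl status -c fly.production.toml`.
mkdir -p bin
cat > bin/production <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
exec flyctl "$@" -c fly.production.toml
EOF
chmod +x bin/production
```

A matching `bin/production-redis` would do the same thing with the Redis app's config file.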
@tmm1 (Owner) commented Nov 18, 2022

> Without sharing a volume, network errors generate background jobs to try again later, and Sidekiq downloads the media into a container that the web server can't get to, causing random broken image tags.

Ah, interesting. This is when you're not using S3 etc for attachments?

@indirect (Collaborator, Author)

Yes. If you use Fly volumes to hold fetched remote media, requests for files downloaded by Sidekiq become random 404s, because on Fly, volumes are 1-1 with VMs.

@tmm1 (Owner) commented Nov 18, 2022

Gotcha. Thanks for working on this!

I just remembered there's one more piece missing: the streaming Node.js server.

IIRC, you have to run nginx inside Docker too and tell it to send /api/v1/streaming requests to the Node.js server. We'll have to add the Node.js streaming entry to the Procfile as well.

See jesseplusplus/decodon#7

Puma uses less memory when it is completely out of cluster mode, which
we actively want while running in a tiny Fly VM. This does not reduce
the number of Puma processes, since we were already only running one.
This is the default on Heroku, to save RAM: https://devcenter.heroku.com/changelog-items/1683

Also recommended by Nate Berkopec, noted Rails performance optimizer, at https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html.

Also recommended by Mike Perham, author of Sidekiq, at https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/.
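As a sketch, "completely out of cluster mode" corresponds to something like this in config/puma.rb (the thread counts here are assumptions, not values from this repo):

```ruby
# config/puma.rb (sketch): `workers 0` puts Puma in single mode, so there is no
# cluster master process at all; we were already running only one worker anyway.
workers 0
threads 5, 5

# The linked articles additionally recommend setting MALLOC_ARENA_MAX=2 in the
# environment to curb glibc malloc arena growth.
```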
@indirect (Collaborator, Author)

Thanks for calling that out. I added the Node.js streaming server to the Procfile, and added Caddy as a reverse proxy for just that one URL.
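For reference, a minimal sketch of that Caddy setup (the ports are assumptions: Mastodon's streaming server conventionally listens on 4000 and Puma on 3000, and the listen port is a placeholder):

```
:8080 {
    reverse_proxy /api/v1/streaming* localhost:4000
    reverse_proxy localhost:3000
}
```

Caddy orders reverse_proxy directives by path-matcher specificity, so only streaming traffic reaches the Node.js server and everything else falls through to Puma.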

@tmm1 (Owner) commented Nov 19, 2022

Awesome, this is looking great.

Re: #1, since the issue there is specific to local assets, it'd be nice to still allow separate Sidekiq workers that can be scaled up independently of the web app. Maybe we can document the choices in the README? So if you choose to store assets locally, you have to run in the combined mode; otherwise you can separate them via env vars or something.

I would guess most Mastodon instances are using S3 or other cloud storage. It might make sense to make that the default here too, so you're not stuck with a single Puma/Sidekiq VM if you need to scale.

@indirect (Collaborator, Author)

Yeah, I think that's probably true. Part of the reason I removed the [processes] directive is that it's a little half-baked: you can never remove process groups once they've been created, and VMs seem to stop correctly associating with regions. In my original testing, Sidekiq kept starting up in a random far-away region, even though it was supposedly only allowed to use the same region as the Redis and web servers.

Let me think about how to implement an option or transition plan and get back to you later today or tomorrow.

@indirect (Collaborator, Author)

Okay, I think I've figured out a reasonable plan that includes a way to transition from volumes to cloud storage, and from all-in-one to "web" and "sidekiq" apps that can scale their count separately. You can even keep image URLs identical via Caddy, if you want that. I've tested this out on my own instance, migrating from a Fly Volume to Wasabi (which is API-identical to S3), and then migrating the sidekiq worker out into a separate app so it can scale on its own.
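For anyone following along, Mastodon's S3-compatible storage is configured entirely through env vars; a hedged sketch of the Wasabi cutover via Fly secrets (bucket name, endpoint, and keys are placeholders, not values from this PR):

```
fly secrets set \
  S3_ENABLED=true \
  S3_BUCKET=your-bucket \
  S3_ENDPOINT=https://s3.wasabisys.com \
  AWS_ACCESS_KEY_ID=your-key \
  AWS_SECRET_ACCESS_KEY=your-secret
```

Setting S3_ALIAS_HOST to the Caddy host is one way to keep the image URLs identical after the migration.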

However, Mastodon seems very shouty about how you must never run more than one sidekiq process with the scheduler job, and it's not at all clear to me how to scale a sidekiq process or app while keeping that true. I would love to hear how you have solved this elsewhere.

@indirect (Collaborator, Author)

After a bunch more fiddling, I think [processes] actually is the way to go for separately scalable worker processes that share env vars and secrets; we just have to be sure only one process ever runs the scheduler queue. I've updated the README and the fly.toml file with pre-written sections that you can uncomment to get horizontally scalable Rails and non-scheduler-queue Sidekiq VMs.
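A hedged sketch of what such a [processes] split could look like in fly.toml (queue names follow Mastodon's standard set; the exact commands are assumptions, not copied from this repo):

```toml
[processes]
  # the all-in-one group keeps the ONLY Sidekiq that runs the scheduler queue
  app = "overmind start -f Procfile"
  # horizontally scalable workers handle every queue except scheduler
  worker = "bundle exec sidekiq -q default -q push -q pull -q mailers -q ingress"
```

You could then scale with something like `fly scale count worker=2` without ever duplicating the scheduler.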

@tmm1 (Owner) commented Nov 20, 2022

cc @jsierles @tqbf

@tmm1 tmm1 merged commit df9ee64 into tmm1:main Nov 21, 2022
Merging this pull request closed: Sidekiq and Puma can't run in separate processes/VMs (#1)