Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

At/optimizations #135

Open
wants to merge 2 commits into
base: 1.x
Choose a base branch
from
Open

At/optimizations #135

wants to merge 2 commits into from

Conversation

adamtobey
Copy link

@adamtobey adamtobey commented Jan 10, 2024

Testing some ideas to reduce the overhead of the Go wrapper:

  • use linkname to directly link the C functions into the binary instead of calling through cgo
  • compile with -o 3

It looks like linkname may not be reliable (panics in some tests?), and -o 3 may not be relevant with external libzstd?

Benchmark results

M1 max (macbook pro)

Without -o 3

BenchmarkCompress
BenchmarkCompress/cgo
BenchmarkCompress/cgo-10         	  827443	      1357 ns/op	 754.45 MB/s
BenchmarkCompress/direct
BenchmarkCompress/direct-10      	  968805	      1231 ns/op	 831.84 MB/s

With -o 3

BenchmarkCompress
BenchmarkCompress/cgo
BenchmarkCompress/cgo-10         	  916027	      1295 ns/op	 790.86 MB/s
BenchmarkCompress/direct
BenchmarkCompress/direct-10      	 1000000	      1167 ns/op	 877.32 MB/s

//
//go:linkname CompressCtx ZSTD_compressCCtx
//go:noescape
func CompressCtx(ctx *C.ZSTD_CCtx, dst unsafe.Pointer, dc C.size_t, src unsafe.Pointer, sc C.size_t, level C.int) C.size_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea 😅 .

I haven't verified this, but if this completely bypasses the runtime's cgo machinery, I could imagine a number of bad things to happen:

  • Long GC STW pauses because the runtime is unable to preempt goroutines that are currently in a zstd call.
  • The runtime incorrectly trying to do async preemption on a goroutine running C code, which might cause data corruption, crashes, etc.
  • All Ps (GOMAXPROCs) calling zstd, causing significant latency on other goroutines.

Generally speaking, I'd says there is no presumption of innocence here. This is code likely to by guilty until we can proof otherwise 🙈.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these specific concerns! I'd definitely like to understand this more - obviously CGO exists for a reason 😄

Copy link
Member

@felixge felixge Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not specific to cgo, but this talk explains the P-M-G abstraction in the scheduler and why/how non-go function calls (syscalls/cgo - they are treated mostly the same) require special treatment: https://youtu.be/-K11rY57K7k?t=980&si=wFEQWUPe3NayCjBh

tl;dr: a syscall (or cgo call) might block off-cpu (network, sleep, etc.), so if a lot of goroutine would make such calls, it would cause a lot of wasted CPU. Therefor cgo/syscalls get special treatement by the runtime so they don't end up blocking other goroutines from executing.

I can try to explain his a bit more when I find time, but this might be a good starting point for finding articles on the subject.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remembered that I had this bookmarked which you'll probably find interesting: https://words.filippo.io/rustgo/

I also realized I forgot to mention one of the most important issues (that might explain the crashes you get in CI) 😅:

C requires large immovable stacks. When using cgo, the Go runtime takes care of switching from a goroutine stack (which is small and movable) to a system stack (which is large and immovable).

Without this switch, you're running C code on a goroutine stack, which will lead to stack overflows and other nasty problems.

So yeah, I think this is likely a dead-end 😞

Copy link

@nsrip-dd nsrip-dd Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't verified this, but if this completely bypasses the runtime's cgo machinery

In addition to the issues you already mentioned, this is also bypassing the ABI translation. Go has an internal calling convention which is different than the C calling convention, and is not guaranteed to be stable across releases.

I can cook up examples that break on my laptop, e.g. make a C function which takes 9 parameters and call it this way. On my arm64 laptop, the 9th argument is junk since the C and Go ABIs use different registers after the 8th argument.

@@ -1,6 +1,7 @@
package zstd

/*
#cgo CFLAGS: -O3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 If we aren't already compiling (our external?) zstd library with -O3, this is probably a good idea. But I don't know if that's the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants