Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SparseMatrixCSC in primal leads to excessive allocations #931

Open
niklasschmitz opened this issue Mar 26, 2021 · 7 comments
Open

SparseMatrixCSC in primal leads to excessive allocations #931

niklasschmitz opened this issue Mar 26, 2021 · 7 comments

Comments

@niklasschmitz
Copy link

Having a SparseMatrixCSC A as part of the primal f seems to currently incur a very large performance penalty, which also came up here: JuliaNLSolvers/NLsolve.jl#205
Below I tried a minimum working example (tested with Julia 1.6 and Zygote 0.6.6):

using Zygote
using SparseArrays
using BenchmarkTools

const N = 10000
const A = spdiagm(0 => fill(10.0, N), 1 => fill(-1.0, N-1), -1 => fill(-2.0, N-1))

f(x) = x'A*x
∇f(x) = A'x + A*x

x0 = rand(N)
@assert isapprox(Zygote.gradient(f, x0)[1], ∇f(x0))
@btime ∇f($x0) # 124.375 μs (6 allocations: 234.61 KiB)
@btime Zygote.gradient($f, $x0) # 397.048 ms (30 allocations: 763.32 MiB)
@DhairyaLGandhi
Copy link
Member

Can you try with #762

@niklasschmitz
Copy link
Author

I just tried #762 on a GH codespace and get about the same timings as above (albeit now with 16 instead of 30 allocs, of the same total size):

@btime ∇f($x0)  # 89.601 μs (6 allocations: 234.61 KiB)
@btime Zygote.gradient($f, $x0)  # 408.552 ms (16 allocations: 763.32 MiB)

@DhairyaLGandhi
Copy link
Member

That to me says that the materialization happens elsewhere. Could you check whether we are doing the correct thing in #762?

@niklasschmitz
Copy link
Author

If I understand correctly, 762 so far is about adjoints of sparse constructors, whereas in my example above A should be treated as a constant and spdiagm not be differentiated through (?) Could this be about the adjoint of * ?

@DhairyaLGandhi
Copy link
Member

We aren't actually doing any transforms in that pr, so yeah.

Mul is a very probable place to check as well, but ideally the dispatches in base should have taken care of that. So if we are hitting suboptimal methods we should investigate

@niklasschmitz
Copy link
Author

niklasschmitz commented Apr 1, 2021

I think what happens here is that

So elimination of unused thunks will solve the above example too, pending on #603. In other cases where the derivative w.r.t. a sparse A is needed, this is probably harder to solve due to #163
Does this make sense?

@learning-chip
Copy link

I came across this NiLang sparse matrix example, which shows quite small overhead of sparse AD, compare to the forward pass. It can be used with Zygote. Maybe useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants