SparseMatrixCSC in primal leads to excessive allocations #931

niklasschmitz · 2021-03-26T10:14:12Z

Having a SparseMatrixCSC A as part of the primal f currently seems to incur a very large performance penalty, which also came up in JuliaNLSolvers/NLsolve.jl#205.
Below is a minimal working example (tested with Julia 1.6 and Zygote 0.6.6):

```julia
using Zygote
using SparseArrays
using BenchmarkTools

const N = 10000
# Tridiagonal sparse matrix; a constant in f, not a differentiation input
const A = spdiagm(0 => fill(10.0, N), 1 => fill(-1.0, N-1), -1 => fill(-2.0, N-1))

f(x) = x'A*x
# Hand-written gradient of x'A*x, i.e. (A' + A)*x
∇f(x) = A'x + A*x

x0 = rand(N)
@assert isapprox(Zygote.gradient(f, x0)[1], ∇f(x0))
@btime ∇f($x0) # 124.375 μs (6 allocations: 234.61 KiB)
@btime Zygote.gradient($f, $x0) # 397.048 ms (30 allocations: 763.32 MiB)
```
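The 763.32 MiB figure can be sanity-checked by hand: a dense N×N array of Float64 with N = 10000 takes N²·8 bytes. The sketch below (plain Julia, no Zygote) compares that with the storage of the tridiagonal A. Attributing the Zygote allocation to a single dense N×N array is an inference from the numbers, not something the benchmark reports directly.

```julia
using SparseArrays

N = 10_000
A = spdiagm(0 => fill(10.0, N), 1 => fill(-1.0, N-1), -1 => fill(-2.0, N-1))

# A stores only its 3N - 2 structural nonzeros (plus index arrays)
nnz_A = nnz(A)

# A dense N×N Float64 array needs N^2 * 8 bytes
dense_bytes = N^2 * sizeof(Float64)
dense_mib = dense_bytes / 2^20   # bytes -> MiB

println("nnz(A) = ", nnz_A)                        # nnz(A) = 29998
println("dense N×N Float64: ", dense_mib, " MiB")  # dense N×N Float64: 762.939453125 MiB
```

That is within about 0.4 MiB of the reported 763.32 MiB, while A itself holds fewer than 30,000 values.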

The text was updated successfully, but these errors were encountered:

DhairyaLGandhi · 2021-03-26T10:54:52Z

Can you try with #762?

niklasschmitz · 2021-03-26T11:37:28Z

I just tried #762 in a GitHub Codespace and get about the same timings as above (albeit now with 16 instead of 30 allocations, of the same total size):

```julia
@btime ∇f($x0)  # 89.601 μs (6 allocations: 234.61 KiB)
@btime Zygote.gradient($f, $x0)  # 408.552 ms (16 allocations: 763.32 MiB)
```

DhairyaLGandhi · 2021-03-26T14:01:04Z

That to me says that the materialization happens elsewhere. Could you check whether we are doing the correct thing in #762?

niklasschmitz · 2021-03-26T14:19:42Z

If I understand correctly, #762 so far is about adjoints of sparse constructors, whereas in my example above A should be treated as a constant and spdiagm should not be differentiated through. Could this be about the adjoint of *?
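For reference, the reverse-mode rule for y = A*x with output cotangent dy gives dx = A'*dy and dA = dy*x'. The second term is an outer product, so it is dense even when A is sparse; for f(x) = x'A*x the cotangent for A works out to x*x'. A minimal hand-written sketch; mul_pullback is a hypothetical helper for illustration, not Zygote's or ChainRules' actual rule.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: reverse-mode rule for y = A*x.
# Given the output cotangent dy, return the cotangents for A and x.
function mul_pullback(A, x, dy)
    dx = A' * dy   # cotangent for x: a sparse mat-vec, O(nnz(A))
    dA = dy * x'   # cotangent for A: an outer product, dense N×N
    return dA, dx
end

N = 5
A = spdiagm(0 => fill(10.0, N), 1 => fill(-1.0, N-1), -1 => fill(-2.0, N-1))
x = rand(N)
dy = ones(N)

dA, dx = mul_pullback(A, x, dy)
println(typeof(A))    # a SparseMatrixCSC
println(typeof(dA))   # a dense Matrix, even though A is sparse
println(count(!iszero, A), " stored nonzeros in A vs. ", length(dA), " entries in dA")
# 13 stored nonzeros in A vs. 25 entries in dA
```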

DhairyaLGandhi · 2021-03-26T14:35:58Z

We aren't actually doing any transforms in that PR, so yeah.

Mul is a very likely place to check as well, but ideally the dispatches in Base should have taken care of that. So if we are hitting suboptimal methods, we should investigate.

niklasschmitz · 2021-04-01T10:17:21Z

I think what happens here is that:

- the rrule of * gets called,
- the derivative w.r.t. A is thunked (and ideally should never be instantiated later),
- Zygote currently unthunks all thunks by default (as @mzgubic pointed out to me on Slack),
- the derivative w.r.t. A then gets instantiated as a dense matrix, due to #163 (Support sparse arrays/matrices as output?).
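The thunking step in the list above can be illustrated with a plain closure: the pullback computes the cheap cotangent eagerly and wraps the expensive one in a zero-argument function, so the dense N×N array is only allocated if something calls it. This is a hypothetical stand-in (lazy_mul_pullback is made up for illustration); ChainRulesCore's actual mechanism is @thunk / unthunk, and unthunking every thunk by default amounts to always calling the closure.

```julia
using SparseArrays

# Hypothetical pullback of y = A*x that defers the cotangent for A.
function lazy_mul_pullback(A, x, dy)
    dx = A' * dy                # needed for the gradient w.r.t. x: compute now
    dA_thunk = () -> dy * x'    # dense N×N outer product: built only if called
    return dA_thunk, dx
end

N = 10_000
A = spdiagm(0 => fill(10.0, N), 1 => fill(-1.0, N-1), -1 => fill(-2.0, N-1))
x = rand(N)
dy = ones(N)

dA_thunk, dx = lazy_mul_pullback(A, x, dy)   # first call also compiles
# Only dx is used, so the N^2 * 8-byte outer product is never allocated
bytes = @allocated lazy_mul_pullback(A, x, dy)
println(bytes, " bytes allocated")
```

Calling dA_thunk() would allocate the full dense matrix, which is what eager unthunking does.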

So elimination of unused thunks, pending #603, would solve the above example too. In other cases where the derivative w.r.t. a sparse A is needed, this is probably harder to solve due to #163.
Does this make sense?
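When the cotangent for a sparse A really is needed, one way to avoid the dense array is to keep only the entries of dy*x' that lie on A's stored sparsity pattern, which costs O(nnz(A)) instead of O(N²). A minimal sketch; sparse_outer_on_pattern is a hypothetical helper, and restricting the gradient to the stored pattern is a modelling choice (it treats the structural zeros of A as fixed), not what Zygote did at the time of this issue.

```julia
using SparseArrays

# Hypothetical helper: the entries of dy * x' restricted to the stored
# pattern of A, i.e. dA[i, j] = dy[i] * x[j] for every stored (i, j).
function sparse_outer_on_pattern(A::SparseMatrixCSC, dy::AbstractVector, x::AbstractVector)
    rows, cols, _ = findnz(A)
    return sparse(rows, cols, dy[rows] .* x[cols], size(A)...)
end

N = 10_000
A = spdiagm(0 => fill(10.0, N), 1 => fill(-1.0, N-1), -1 => fill(-2.0, N-1))
x = rand(N)
dy = ones(N)

# Same sparsity pattern as A; no N×N array is ever formed
dA = sparse_outer_on_pattern(A, dy, x)
println(nnz(dA), " stored entries")
```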

learning-chip · 2021-12-07T07:05:03Z

I came across this NiLang sparse matrix example, which shows quite a small overhead for sparse AD compared to the forward pass. It can be used with Zygote. Maybe useful?

ToucheSir mentioned this issue on Dec 7, 2021 in Support sparse arrays/matrices as output? #163 (closed).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SparseMatrixCSC in primal leads to excessive allocations #931

SparseMatrixCSC in primal leads to excessive allocations #931

niklasschmitz commented Mar 26, 2021

DhairyaLGandhi commented Mar 26, 2021

niklasschmitz commented Mar 26, 2021

DhairyaLGandhi commented Mar 26, 2021

niklasschmitz commented Mar 26, 2021

DhairyaLGandhi commented Mar 26, 2021

niklasschmitz commented Apr 1, 2021 •

edited

Loading

learning-chip commented Dec 7, 2021

SparseMatrixCSC in primal leads to excessive allocations #931

SparseMatrixCSC in primal leads to excessive allocations #931

Comments

niklasschmitz commented Mar 26, 2021

DhairyaLGandhi commented Mar 26, 2021

niklasschmitz commented Mar 26, 2021

DhairyaLGandhi commented Mar 26, 2021

niklasschmitz commented Mar 26, 2021

DhairyaLGandhi commented Mar 26, 2021

niklasschmitz commented Apr 1, 2021 • edited Loading

learning-chip commented Dec 7, 2021

niklasschmitz commented Apr 1, 2021 •

edited

Loading