
Cannot exceed rate or quality #7

Closed
lindstro opened this issue Mar 24, 2022 · 5 comments

Comments

@lindstro
Collaborator

I am doing some compression experiments with a 3D data set generated from a memoryless Gaussian source with zero mean and unit variance, i.e., the data set has no autocorrelation and is essentially incompressible. Using the target PSNR setting (-p option), I cannot get TTHRESH to deliver a PSNR of more than 190 dB, and the corresponding rate does not exceed 31 bits.
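The setup described above can be sketched as follows (a minimal reconstruction, not the exact script used; the seed, file name, and the reduced grid size are my assumptions):

```python
import numpy as np

# Sketch of the test data: a memoryless Gaussian field with zero mean and
# unit variance. I.i.d. samples have no autocorrelation, so the data is
# essentially incompressible. The issue uses n = 256; a smaller grid is
# used here only to keep the sketch quick.
n = 64
rng = np.random.default_rng(1)          # arbitrary seed (assumption)
data = rng.standard_normal((n, n, n))   # i.i.d. N(0, 1) doubles
data.tofile("gauss.bod")                # raw doubles, as `tthresh -t double` expects

# Sanity check: zero mean, unit variance (up to sampling error)
print(data.mean(), data.std())
```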

The plot below shows the accuracy gain as a function of rate, where the accuracy gain is defined as

α = log₂(σ / E) - R

Here σ = 1 is the standard deviation of the source data, E is the RMS error, and R is the rate. Rate-distortion theory says that we cannot encode such data using an error less than E in fewer than log₂(σ / E) bits/value, and hence we expect α ≤ 0. Moreover, for each additional bit stored, we expect E to halve, so the accuracy gain ought to be constant. At high rates, finite roundoff errors (e.g., when converting from a compressed representation to IEEE floating point) may cause α to dip as E converges to some small, finite value, as exhibited by the zfp curve. The other three compressors all show surprising behavior, though for the sake of this discussion, I am interested only in why TTHRESH gets stuck close to R = 31 regardless of the PSNR setting.
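For concreteness, the accuracy gain can be computed as below. The (R, E) pairs are hypothetical values chosen only to illustrate the formula; they are not measurements from the experiment:

```python
import numpy as np

# Accuracy gain: alpha = log2(sigma / E) - R, for a unit-variance source.
# Rate-distortion theory implies alpha <= 0 for incompressible data, and a
# flat alpha means each extra bit of rate halves the RMS error E.
sigma = 1.0
R = np.array([4.0, 8.0, 16.0])       # rate in bits/value (hypothetical)
E = np.array([0.08, 0.005, 2e-5])    # RMS error (hypothetical)
alpha = np.log2(sigma / E) - R
print(alpha)
```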

[Figure: accuracy gain vs. rate for the four compressors on the Gaussian data (gauss)]

@rballester
Owner

Thanks for finding this. After reproducing the scenario, I found the SVD computation to be the culprit. TTHRESH approximates the left singular vectors of a matrix M using the eigendecomposition of M*M^T, which is a fast trick but known to introduce some inaccuracies. I just tried using the true SVD and was able to surpass 190 dB with it; however, that approach is significantly slower. Is this scenario (which happens at relative error = 1e-9) relevant enough to warrant using the SVD? Sure, the program could choose either method on the fly based on the level of error requested, but including Eigen's SVD source makes compilation slower (+60% or so on my machine).
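The accuracy loss of the Gram-matrix trick can be demonstrated in a few lines of NumPy (a sketch with hypothetical matrix sizes, not TTHRESH's actual Eigen-based code). Squaring M squares its condition number, so singular values below roughly sqrt(machine epsilon) relative to the largest one are drowned in roundoff:

```python
import numpy as np

# Build a matrix with known singular values spanning 1 .. 1e-9, i.e. the
# relative-error regime where the discrepancy shows up.
rng = np.random.default_rng(0)
s_true = np.logspace(0, -9, 10)
U0, _ = np.linalg.qr(rng.standard_normal((10, 10)))
V0, _ = np.linalg.qr(rng.standard_normal((20, 10)))
M = (U0 * s_true) @ V0.T              # M = U0 diag(s_true) V0^T

# Route 1: full SVD -- absolute error ~ eps * s_max per singular value.
s_svd = np.linalg.svd(M, compute_uv=False)

# Route 2: eigendecomposition of the Gram matrix M M^T -- its eigenvalues
# are the squared singular values, so 1e-18 is lost next to eigenvalue 1.
w = np.linalg.eigvalsh(M @ M.T)[::-1]   # sort descending to match the SVD
s_gram = np.sqrt(np.maximum(w, 0.0))    # clip tiny negative roundoff

print(abs(s_svd[-1] - 1e-9) / 1e-9)    # small: the SVD still resolves 1e-9
print(abs(s_gram[-1] - 1e-9) / 1e-9)   # large: the Gram route does not
```

The same effect applies to the singular vectors associated with the small singular values, which is presumably what limits the achievable PSNR.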

@lindstro
Collaborator Author

Given that running TTHRESH on this data (256^3) takes on the order of 5 minutes, I'm not too concerned about adding a few seconds of compile time. :-) But this would be an issue if the SVD computation is also significantly slower.

So is the issue that computing the HOSVD only approximately gives an "incorrect" basis that results in an incorrect error bound? If so, could we address this by using whichever orthonormal basis actually gets computed in the estimation of the error bound? I'm not sure I fully understand the issues involved.

@lindstro
Collaborator Author

lindstro commented Apr 1, 2022

I have run into a similar yet seemingly different issue, where at some point the rate spikes when making smallish, 6.02 dB changes to the target PSNR, thus halving the target RMS error. In the plot below, the jump in rate from 6.82 to 28.1 bits occurs when increasing the target PSNR from 204.7 to 210.7 dB. In the previous case above, the rate simply hits a wall, whereas below it makes a large jump. Observed PSNR, however, increases as expected, though increasing the target PSNR above this level does nothing. Could this, too, be related to SVD accuracy? Why the dramatic drop in accuracy gain?
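The 6.02 dB figure comes directly from the definition of PSNR, as this one-liner shows:

```python
import math

# PSNR = 20*log10(peak/RMSE), so halving the RMS error raises PSNR by
# 20*log10(2) ~= 6.0206 dB. The jump reported above spans
# 210.7 - 204.7 = 6.0 dB, i.e. exactly one halving of the target error.
db_per_halving = 20 * math.log10(2)
print(db_per_halving)
```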

FYI, the test data below is the Miranda viscosity field from SDRBench, where the indices along each dimension have been randomly shuffled. Such shuffling has no impact on TTHRESH but obviously impacts smoothness for the other compressors.

[Figure: accuracy gain vs. rate for the shuffled viscosity field (shuffle)]

@lindstro
Collaborator Author

lindstro commented Jul 5, 2022

So the latest commit 4727dc9 does seem to address the error floor but also appears to have some unintended consequences for some (not all) data sets. The accuracy gain for the 256x256x256 double-precision Gaussian error data set (the first one presented in this issue) has now dropped by roughly 3 bits/value (see below). Another concern is the slow leakage in accuracy gain, where the slope is negative rather than zero over most of the plot. Any ideas?

[Figure: accuracy gain vs. rate for the Gaussian data after commit 4727dc9 (gauss)]

The shuffled viscosity field does not suffer from this.

[Figure: accuracy gain vs. rate for the shuffled viscosity field (shuffle)]

@lindstro
Collaborator Author

lindstro commented Jul 5, 2022

Here's an example of running TTHRESH cd6bb29 on the Gaussian data:

```
tthresh -t double -s 256 256 256 -p 100 -i gauss.bod -o /tmp/gauss.bod -c /tmp/gauss.tthresh
oldbits = 1073741824, newbits = 267064400, compressionratio = 4.02054, bpv = 15.9183
eps = 3.42735e-05, rmse = 3.42735e-05, psnr = 104.026
```

With the latest version, TTHRESH does not reach the PSNR target, yet the compression ratio is largely unaffected:

```
tthresh -t double -s 256 256 256 -p 100 -i gauss.bod -o /tmp/gauss.bod -c /tmp/gauss.tthresh
k=-15 m=6.10352e-05=0x1p-14 err=0.000000e+00
k=-14 m=0.00012207=0x1p-13 err=2.082926e-02
oldbits = 1073741824, newbits = 265485712, compressionratio = 4.04444, bpv = 15.8242
eps = 0.000246452, rmse = 0.000246452, psnr = 86.8902
```
