
Cannot exceed rate or quality #7

Closed
lindstro opened this issue Mar 24, 2022 · 5 comments

Comments

@lindstro
Collaborator

I am doing some compression experiments with a 3D data set generated from a memoryless Gaussian source with zero mean and unit variance, i.e., the data set has no autocorrelation and is essentially incompressible. Using the target PSNR setting (-p option), I cannot get TTHRESH to deliver a PSNR of more than 190 dB, and the corresponding rate does not exceed 31 bits.
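The setup described above can be sketched as follows (a minimal reconstruction, not the exact script used; the seed, file name, and the reduced grid size are my assumptions):

```python
import numpy as np

# Sketch of the test data: a memoryless Gaussian field with zero mean and
# unit variance. I.i.d. samples have no autocorrelation, so the data is
# essentially incompressible. The issue uses n = 256; a smaller grid is
# used here only to keep the sketch quick.
n = 64
rng = np.random.default_rng(1)          # arbitrary seed (assumption)
data = rng.standard_normal((n, n, n))   # i.i.d. N(0, 1) doubles
data.tofile("gauss.bod")                # raw doubles, as `tthresh -t double` expects

# Sanity check: zero mean, unit variance (up to sampling error)
print(data.mean(), data.std())
```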

The plot below shows the accuracy gain as a function of rate, where the accuracy gain is defined as

α = log₂(σ / E) - R

Here σ = 1 is the standard deviation of the source data, E is the RMS error, and R is the rate. Rate-distortion theory says that we cannot encode such data using an error less than E in fewer than log₂(σ / E) bits/value, and hence we expect α ≤ 0. Moreover, for each additional bit stored, we expect E to halve, so the accuracy gain ought to be constant. At high rates, finite roundoff errors (e.g., when converting from a compressed representation to IEEE floating point) may cause α to dip as E converges to some small, finite value, as exhibited by the zfp curve. The other three compressors all show surprising behavior, though for the sake of this discussion, I am interested only in why TTHRESH gets stuck close to R = 31 regardless of the PSNR setting.
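For concreteness, the accuracy gain can be computed as below. The (R, E) pairs are hypothetical values chosen only to illustrate the formula; they are not measurements from the experiment:

```python
import numpy as np

# Accuracy gain: alpha = log2(sigma / E) - R, for a unit-variance source.
# Rate-distortion theory implies alpha <= 0 for incompressible data, and a
# flat alpha means each extra bit of rate halves the RMS error E.
sigma = 1.0
R = np.array([4.0, 8.0, 16.0])       # rate in bits/value (hypothetical)
E = np.array([0.08, 0.005, 2e-5])    # RMS error (hypothetical)
alpha = np.log2(sigma / E) - R
print(alpha)
```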

[Figure: accuracy gain vs. rate for the four compressors on the Gaussian data (gauss)]

@rballester
Owner

Thanks for finding this. After reproducing the scenario, I found the SVD computation to be the culprit. TTHRESH approximates the left singular vectors of a matrix M using the eigendecomposition of M*M^T, which is a fast trick but known to introduce some inaccuracies. I just tried using the true SVD and was able to surpass 190 dB with it; however, that approach is significantly slower. Is this scenario (which happens at relative error = 1e-9) relevant enough to warrant using the SVD? Sure, the program could choose either method on the fly based on the level of error requested, but including Eigen's SVD source makes compilation slower (+60% or so on my machine).
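The accuracy loss of the Gram-matrix trick can be demonstrated in a few lines of NumPy (a sketch with hypothetical matrix sizes, not TTHRESH's actual Eigen-based code). Squaring M squares its condition number, so singular values below roughly sqrt(machine epsilon) relative to the largest one are drowned in roundoff:

```python
import numpy as np

# Build a matrix with known singular values spanning 1 .. 1e-9, i.e. the
# relative-error regime where the discrepancy shows up.
rng = np.random.default_rng(0)
s_true = np.logspace(0, -9, 10)
U0, _ = np.linalg.qr(rng.standard_normal((10, 10)))
V0, _ = np.linalg.qr(rng.standard_normal((20, 10)))
M = (U0 * s_true) @ V0.T              # M = U0 diag(s_true) V0^T

# Route 1: full SVD -- absolute error ~ eps * s_max per singular value.
s_svd = np.linalg.svd(M, compute_uv=False)

# Route 2: eigendecomposition of the Gram matrix M M^T -- its eigenvalues
# are the squared singular values, so 1e-18 is lost next to eigenvalue 1.
w = np.linalg.eigvalsh(M @ M.T)[::-1]   # sort descending to match the SVD
s_gram = np.sqrt(np.maximum(w, 0.0))    # clip tiny negative roundoff

print(abs(s_svd[-1] - 1e-9) / 1e-9)    # small: the SVD still resolves 1e-9
print(abs(s_gram[-1] - 1e-9) / 1e-9)   # large: the Gram route does not
```

The same effect applies to the singular vectors associated with the small singular values, which is presumably what limits the achievable PSNR.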

@lindstro
Collaborator Author

Given that running TTHRESH on this data (256^3) takes on the order of 5 minutes, I'm not too concerned about adding a few seconds of compile time. :-) But this would be an issue if the SVD computation is also significantly slower.

So is the issue that computing the HOSVD only approximately gives an "incorrect" basis that results in an incorrect error bound? If so, could we address this by using whichever orthonormal basis actually gets computed in the estimation of the error bound? I'm not sure I fully understand the issues involved.

@lindstro
Collaborator Author

lindstro commented Apr 1, 2022

I have run into a similar yet seemingly different issue, where at some point the rate spikes when making smallish, 6.02 dB changes to the target PSNR, thus halving the target RMS error. In the plot below, the jump in rate from 6.82 to 28.1 bits occurs when increasing the target PSNR from 204.7 to 210.7 dB. In the previous case above, the rate simply hits a wall, whereas below it makes a large jump. Observed PSNR, however, increases as expected, though increasing the target PSNR above this level does nothing. Could this, too, be related to SVD accuracy? Why the dramatic drop in accuracy gain?
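The 6.02 dB figure comes directly from the definition of PSNR, as this one-liner shows:

```python
import math

# PSNR = 20*log10(peak/RMSE), so halving the RMS error raises PSNR by
# 20*log10(2) ~= 6.0206 dB. The jump reported above spans
# 210.7 - 204.7 = 6.0 dB, i.e. exactly one halving of the target error.
db_per_halving = 20 * math.log10(2)
print(db_per_halving)
```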

FYI, the test data below is the Miranda viscosity field from SDRBench, where the indices along each dimension have been randomly shuffled. Such shuffling has no impact on TTHRESH but obviously impacts smoothness for the other compressors.

[Figure: accuracy gain vs. rate for the shuffled viscosity field (shuffle)]

@lindstro
Collaborator Author

lindstro commented Jul 5, 2022

So the latest commit 4727dc9 does seem to address the error floor but also appears to have some unintended consequences for some (not all) data sets. The accuracy gain for the 256x256x256 double-precision Gaussian error data set (the first one presented in this issue) has now dropped by roughly 3 bits/value (see below). Another concern is the slow leakage in accuracy gain, where the slope is negative rather than zero over most of the plot. Any ideas?

[Figure: accuracy gain vs. rate for the Gaussian data after commit 4727dc9 (gauss)]

The shuffled viscosity field does not suffer from this.

[Figure: accuracy gain vs. rate for the shuffled viscosity field (shuffle)]

@lindstro
Collaborator Author

lindstro commented Jul 5, 2022

Here's an example of running TTHRESH cd6bb29 on the Gaussian data:

```
tthresh -t double -s 256 256 256 -p 100 -i gauss.bod -o /tmp/gauss.bod -c /tmp/gauss.tthresh
oldbits = 1073741824, newbits = 267064400, compressionratio = 4.02054, bpv = 15.9183
eps = 3.42735e-05, rmse = 3.42735e-05, psnr = 104.026
```

With the latest version, TTHRESH does not reach the PSNR target, yet the compression ratio is largely unaffected:

```
tthresh -t double -s 256 256 256 -p 100 -i gauss.bod -o /tmp/gauss.bod -c /tmp/gauss.tthresh
k=-15 m=6.10352e-05=0x1p-14 err=0.000000e+00
k=-14 m=0.00012207=0x1p-13 err=2.082926e-02
oldbits = 1073741824, newbits = 265485712, compressionratio = 4.04444, bpv = 15.8242
eps = 0.000246452, rmse = 0.000246452, psnr = 86.8902
```
