HLIBpro  3.0
Approximation and Truncation Algorithms

Introduction

Three different dense approximation and low-rank truncation algorithms are implemented in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 based on

  • Singular Value Decomposition (SVD),
  • Randomized SVD (Rand-SVD) and
  • Rank Revealing QR (RRQR).

SVD always computes a best approximation with respect to the predefined accuracy or rank. However, it has complexity \(\mathcal{O}(n^3)\) for dense approximation and \(\mathcal{O}(k^2 n + k^3)\) for truncation, with \(n\) being the matrix size and \(k\) the rank.
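
The optimality of the SVD is the classical Eckart–Young result: if \(M_k\) denotes the matrix obtained from \(M\) by keeping only the \(k\) largest singular values \(\sigma_1 \ge \ldots \ge \sigma_k\), then

\[ \min_{\mathrm{rank}(B) \le k} \| M - B \|_2 = \| M - M_k \|_2 = \sigma_{k+1}, \qquad \min_{\mathrm{rank}(B) \le k} \| M - B \|_F = \| M - M_k \|_F = \Big( \sum_{i > k} \sigma_i^2 \Big)^{1/2}, \]

i.e. no other rank-\(k\) matrix yields a smaller error in the spectral or Frobenius norm.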

For Rand-SVD, the complexity of dense approximation is \(\mathcal{O}(n k^2 + k^3)\), where \(k\) is the rank of the approximant. Truncation has the same complexity but saves one QR factorization compared to SVD. However, as this is a randomized procedure, the prescribed accuracy is not guaranteed.
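
For reference, the basic randomized procedure works roughly as follows; the actual implementation in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 may differ in details such as oversampling or power iteration:

\[
\begin{aligned}
&1.\ \text{draw a random matrix } \Omega \in \mathbb{R}^{n \times (k+p)} \text{ with a small oversampling parameter } p,\\
&2.\ \text{form the sample matrix } Y = M \Omega \text{ and compute an orthonormal basis } Q \text{ of its columns (QR factorization)},\\
&3.\ \text{compute the SVD } Q^{\mathsf{T}} M = \tilde U \Sigma V^{\mathsf{T}} \text{ and set } U = Q \tilde U,\\
&4.\ \text{truncate } U \Sigma V^{\mathsf{T}} \text{ to rank } k.
\end{aligned}
\]

The costs of steps 2 and 3 explain the \(\mathcal{O}(n k^2 + k^3)\) complexity, while the random sampling in step 1 is the reason why the prescribed accuracy only holds with high probability.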

RRQR uses a QR factorization with column pivoting instead of an SVD, which has complexity \(\mathcal{O}(n^2 k)\) with destination rank \(k\) for dense approximation. The truncation procedure again saves one QR factorization compared to SVD based truncation and slightly reduces the complexity from \(\mathcal{O}(k^3)\) to \(\mathcal{O}(k'^2 k)\), with \(k'\) being the rank of the input matrix. Furthermore, RRQR offers the same error control as SVD, usually only at a slightly larger rank.
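
In its basic form, a rank-revealing QR factorization determines a column permutation \(P\) such that

\[ M P = Q \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}, \qquad R_{11} \in \mathbb{R}^{k \times k}, \]

and dropping the block \(R_{22}\) yields a rank-\(k\) approximation with error \(\| R_{22} \|_2\). Since \(\| R_{22} \|_2 \ge \sigma_{k+1}\), the rank needed to reach a prescribed accuracy is at least as large as for the SVD, which explains the slightly larger ranks mentioned above.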

Leaving error control aside, the advantage of each of these methods depends on the rank of the resulting approximation, which in turn determines the complexity of the H-arithmetic. Hence, different combinations of approximation and truncation methods may provide the best performance in different applications.

Activation

The approximation algorithm is controlled in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 with the parameter CFG::BLAS::approx_method, which is of type BLAS::approx_t:

enum approx_t : int
{
    use_svd  = 0,
    use_rrqr = 1,
    use_rand = 2
};

The same type is used for the truncation method parameter CFG::BLAS::trunc_method.

Both parameters may then either be set directly

Hpro::CFG::BLAS::approx_method = Hpro::BLAS::use_svd;
Hpro::CFG::BLAS::trunc_method = Hpro::BLAS::use_rrqr;

or via environment variables

export HPRO_BLAS_approx_method=0
export HPRO_BLAS_trunc_method=1

or in the configuration file .hlibpro.conf

[BLAS]
approx_method=0
trunc_method=1

(see also Parameters).
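
As an illustration of how these settings might be wrapped in application code, the following sketch maps method names, e.g. taken from a command line option, onto the corresponding enum values. The helper functions and the umbrella header name <hpro.hh> are assumptions for this example and not part of the 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 interface; adjust them to your installation:

#include <string>
#include <stdexcept>

#include <hpro.hh>  // assumed umbrella header; adjust to the local HLIBpro installation

// map a method name to the corresponding enum value (illustration only)
Hpro::BLAS::approx_t
to_approx_t ( const std::string &  name )
{
    if ( name == "svd"     ) return Hpro::BLAS::use_svd;
    if ( name == "rrqr"    ) return Hpro::BLAS::use_rrqr;
    if ( name == "randsvd" ) return Hpro::BLAS::use_rand;

    throw std::invalid_argument( "unknown low-rank method: " + name );
}

// set dense approximation and low-rank truncation method in one call
void
set_lowrank_methods ( const std::string &  approx,
                      const std::string &  trunc )
{
    Hpro::CFG::BLAS::approx_method = to_approx_t( approx );
    Hpro::CFG::BLAS::trunc_method  = to_approx_t( trunc );
}

With this, e.g. set_lowrank_methods( "randsvd", "rrqr" ) selects Rand-SVD for dense approximation and RRQR for low-rank truncation.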

Benchmarks

We run H-LU benchmarks (without exploiting symmetry) for three different H-matrices, using all combinations of approximation and truncation algorithms.

Results are presented for runtime, memory consumption and approximation error.

All benchmarks are performed on an Intel i5-4670 CPU using a single core.
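
A complete sweep over all nine combinations can be scripted directly on top of the parameters from the previous section. In the following sketch, run_benchmark() stands for a user supplied driver; it is purely hypothetical and only prints the chosen combination, where a real driver would build the H-matrix, perform the H-LU factorization and record time, memory and error. The header name is again an assumption:

#include <iostream>
#include <string>
#include <utility>

#include <hpro.hh>  // assumed umbrella header; adjust to the local HLIBpro installation

// hypothetical driver: here the H-matrix would be built and the H-LU factorization
// performed; in this sketch only the selected combination is printed
void
run_benchmark ( const std::string &  approx_name,
                const std::string &  trunc_name )
{
    std::cout << "approx = " << approx_name << ", trunc = " << trunc_name << std::endl;
}

int
main ()
{
    const std::pair< Hpro::BLAS::approx_t, std::string >  methods[] = {
        { Hpro::BLAS::use_svd,  "SVD"      },
        { Hpro::BLAS::use_rrqr, "RRQR"     },
        { Hpro::BLAS::use_rand, "Rand-SVD" }
    };

    for ( const auto &  [ approx, approx_name ] : methods )
        for ( const auto &  [ trunc, trunc_name ] : methods )
        {
            // select dense approximation and truncation method for this run
            Hpro::CFG::BLAS::approx_method = approx;
            Hpro::CFG::BLAS::trunc_method  = trunc;

            run_benchmark( approx_name, trunc_name );
        }

    return 0;
}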

Laplace

This benchmark uses the Laplace SLP operator on a spherical grid with 32768 triangles and a piecewise linear ansatz (see Boundary Element Methods).

Runtime

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            32.69s       29.80s       29.20s
  RRQR                           24.81s       22.04s       21.54s
  Rand-SVD                       33.22s       30.46s       29.89s

Memory

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            385 MB       385 MB       385 MB
  RRQR                           387 MB       388 MB       387 MB
  Rand-SVD                       385 MB       385 MB       385 MB

Error

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            1.47·10⁻³    1.47·10⁻³    1.47·10⁻³
  RRQR                           1.45·10⁻³    1.45·10⁻³    1.45·10⁻³
  Rand-SVD                       1.56·10⁻³    1.74·10⁻³    1.62·10⁻³

The best runtime and also the best approximation are obtained with RRQR truncation. The dense approximation technique has a smaller effect on the total runtime, although the Rand-SVD method shows a slight advantage here without sacrificing accuracy.

However, using Rand-SVD for low-rank truncation clearly results in higher runtime and reduced accuracy.

All methods show very similar behaviour in terms of memory consumption. However, despite having the best runtime, RRQR based truncation needs slightly more memory, indicating larger ranks after truncation compared to SVD or Rand-SVD.

Helmholtz

This benchmark uses the Helmholtz SLP operator on a spherical grid with 32768 triangles and a piecewise constant ansatz (see Boundary Element Methods).

Runtime

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            69.91s       64.98s       62.28s
  RRQR                           58.24s       53.68s       51.00s
  Rand-SVD                       85.70s       81.01s       77.64s

Memory

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            861 MB       862 MB       861 MB
  RRQR                           877 MB       878 MB       877 MB
  Rand-SVD                       861 MB       862 MB       861 MB

Error

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            6.73·10⁻⁴    6.76·10⁻⁴    6.72·10⁻⁴
  RRQR                           6.56·10⁻⁴    6.59·10⁻⁴    6.56·10⁻⁴
  Rand-SVD                       6.87·10⁻⁴    6.93·10⁻⁴    6.93·10⁻⁴

The overall picture is similar to the Laplace example with RRQR producing the fastest runtime, again in combination with Rand-SVD for dense approximation.

The accuracy is again best with RRQR based truncation, while RRQR dense approximation shows a slight disadvantage.

Also in terms of memory, RRQR truncation shows the same behaviour as in the Laplace example, with slightly higher ranks in the low-rank matrices.

Convection-Diffusion

The third benchmark uses a sparse matrix from a 3D convection diffusion equation, partitioned using nested dissection (see Solving a Sparse Equation System). The number of unknowns is 132651.

Runtime

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            40.10s       30.72s       29.44s
  RRQR                           34.98s       24.83s       23.33s
  Rand-SVD                       41.08s       31.06s       29.54s

Memory

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            1310 MB      1320 MB      1310 MB
  RRQR                           1320 MB      1330 MB      1320 MB
  Rand-SVD                       1320 MB      1320 MB      1320 MB

Error

  trunc_method \ approx_method   SVD          RRQR         Rand-SVD
  SVD                            1.74·10⁻³    2.11·10⁻³    5.55·10⁰
  RRQR                           2.04·10⁻³    2.46·10⁻³    5.03·10⁰
  Rand-SVD                       1.19·10⁰     2.75·10⁰     5.92·10⁰

In terms of runtime and memory, the behaviour stays the same as in the previous examples.

However, Rand-SVD obviously has an approximation problem with the data involved in this example. The difference here is that the matrix blocks are no longer densely populated since the original matrix is sparse. This seems to cause problems for the randomized algorithm.

Conclusion

The SVD algorithm remains the default option in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 for dense approximation and low-rank truncation since it yields the best approximation. Its runtime is also within reasonable bounds compared to the other options.

However, for optimal performance all options should be tested for a given application as significant savings can be obtained, especially with RRQR, which also guarantees error control.

See also Accumulator Arithmetic for other parameters influencing arithmetic.