HLIBpro
2.7

Three different dense approximation and lowrank truncation algorithms are implemented in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 based on
SVD always computes a best approximation with respect to the predefined accuracy or rank. However, it has complexity \(\mathcal{O}(n^3)\) for dense approximation and \(\mathcal{O}(k^2 n + k^3 )\) complexity for truncation with \(n\) being the matrix size and \(k\) the rank.
The complexity for RandSVD is \(\mathcal{O}(n k^2 + k^3)\) for dense approximation where \(k\) is the rank of the approximant. Truncation has the same complexity but saves a QR factorization compared to SVD. However, as this is a randomized procedure, approximation is not guaranteed.
RRQR uses a QR factorization with column pivoting instead of a SVD, which has complexity \(\mathcal{O}(n^2 k)\) with destination rank \(k\) for dense approximation. The truncation procedure again saves a QR factorization compared to SVD based truncation and slightly reduces complexity from \(\mathcal{O}(k^3)\) to \(\mathcal{O}(k'^2 k)\) with \(k'\) being the rank of the input matrix. Furthermore, RRQR has the same error control as SVD only usually with a slightly larger rank.
Leaving error control aside, the advantage of each of these methods depends on the resulting rank of the approximation, which furthermore determines the complexity of the Harithmetic. Hence, in different applications different combinations of approximation and truncation methods may provide the best performance.
The approximation algorithm is controlled in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 with the parameter CFG::BLAS::approx_method
which is of type BLAS::approx_t:
The same type is used for the truncation method parameter CFG::BLAS::trunc_method
.
Both parameter may then either be set directly
or via environment variables
or in the configuration file .hlibpro.conf
(see also Parameters).
We will run HLU benchmarks (without using symmetry) with three different Hmatrices with all combinations of approximation and truncation algorithms.
Results are presented for runtime, memory consumption and approximation error.
All benchmarks are performed on a Intel i54670 CPU using a single CPU core.
This benchmark uses the Laplace SLP operator with a spherical with 32768 triangles and piecewise linear ansatz (see Boundary Element Methods).
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  32.69s  29.80s  29.20s 
RRQR  24.81s  22.04s  21.54s 
RandSVD  33.22s  30.46s  29.89s 
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  385 MB  385 MB  385 MB 
RRQR  387 MB  388 MB  387 MB 
RandSVD  385 MB  385 MB  385 MB 
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  1.47₁₀3  1.47₁₀3  1.47₁₀3 
RRQR  1.45₁₀3  1.45₁₀3  1.45₁₀3 
RandSVD  1.56₁₀3  1.74₁₀3  1.62₁₀3 
The best runtime and also the best approximation is obtained with RRQR truncation. The dense approximation technique has a smaller effect on the total runtime, albeit the RandSVD method show a slight advantage here without sacrificing accuracy.
However, using RandSVD for lowrank truncation clearly results in higher runtime and reduced accuracy.
All methods show very similar behaviour in terms of memory consumption. However, although having the smallest numbers for runtime, memory consumption is slightly increased for RRQR based truncation, indicating increased ranks after truncation compared to SVD or RandSVD.
This benchmark uses the Helmholtz SLP operator with a spherical with 32768 triangles and piecewise constant ansatz (see Boundary Element Methods).
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  69.91s  64.98s  62.28s 
RRQR  58.24s  53.68s  51.00s 
RandSVD  85.70s  81.01s  77.64s 
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  861 MB  862 MB  861 MB 
RRQR  877 MB  878 MB  877 MB 
RandSVD  861 MB  862 MB  861 MB 
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  6.73₁₀4  6.76₁₀4  6.72₁₀4 
RRQR  6.56₁₀4  6.59₁₀4  6.56₁₀4 
RandSVD  6.87₁₀4  6.93₁₀4  6.93₁₀4 
The overall picture is similar to the Laplace example with RRQR producing the fastest runtime, again in combination with RandSVD for dense approximation.
The accuracy is also best with RRQR based truncation but also shows a slight disadvantage for RRQR approximation.
Also for memory, RRQR shows the same behaviour as for Laplace with higher ranks in the lowrank matrices.
The third benchmark uses a sparse matrix from a 3D convection diffusion equation, partitioned using nested dissection (see Solving a Sparse Equation System). The number of unknowns is 132651.
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  40.10s  30.72s  29.44s 
RRQR  34.98s  24.83s  23.33s 
RandSVD  41.08s  31.06s  29.54s 
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  1310 MB  1320 MB  1310 MB 
RRQR  1320 MB  1330 MB  1320 MB 
RandSVD  1320 MB  1320 MB  1320 MB 
approx_method  

trunc_method  SVD  RRQR  RandSVD 
SVD  1.74₁₀3  2.11₁₀3  5.55₁₀0 
RRQR  2.04₁₀3  2.46₁₀3  5.03₁₀0 
RandSVD  1.19₁₀0  2.75₁₀0  5.92₁₀0 
In terms of runtime and memory, the behaviour stays the same as in the previous examples.
However, obviously RandSVD has an approximation problem with data data involved in these examples. The difference here is, that the matrix blocks are no longer densely populated since the original matrix is sparse. This seems to lead to problems with the randomized algorithm.
The SVD algorithm for dense approximation and lowrank truncation is still the default option in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 due to best approximation. The runtime compared to the other options is also within bounds.
However, for optimal performance all options should be tested for a given application as significant savings can be obtained, especially with RRQR, which also guarantees error control.
See also Accumulator Arithmetic for other parameters influencing arithmetic.