HLIBpro
3.0

The following parameters of 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 are defined in the Hpro::CFG
namespace with corresponding sub namespaces.
You may either read/write the parameters within your source code, e.g.,
or via environment variables. In the latter case, "::" is replaced by "_" and "CFG" is omitted, e.g., instead of
the environment variable
is used with boolean values defined by 0 or 1.
Furthermore, a list of all parameters with values may be printed via
If the file .hlibpro.conf is present in the current working directory, it is read for setting the parameter values. The syntax of the file is equal to typical configuration files, e.g.
Each of the below described parameter topics (BLAS, Cluster, Build, Arith, Solver, BEM and IO) defines a config section. Within a section the parameter name is equal to the C++ name.
An example for such a configuration file looks like
Comments are supported with the prefix #
, e.g.,
In the C bindings the corresponding functions for setting, reading and printing the parameters are
Please note, that the names and values correspond to the names and values of the environment variables and not of the C++ parameters.
Name  Type  Description  Default Value 

approx_method  int  use SVD (0), RRQR(1) or RandSVD (2) for dense approximation  0 
trunc_method  int  use SVD (0), RRQR(1) or RandSVD (2) for lowrank truncation  0 
power_steps  int  number of steps of power iteration in RandSVD  0 
sample_size  int  block size in adaptive RandSVD  8 
oversampling  int  oversampling size in fixed rank RandSVD  0 
use_double_prec  bool  use double precision SVD for single precision types  false 
use_gesvj  bool  use gesvj instead of gesvd/gesdd  false 
gesvd_limit  idx_t  upper matrix size limit for using *gesvd (*gesdd for larger matrices)  1000 
check_zeroes  bool  check for and remove zero rows/columns in input matrices of SVD  false 
check_inf_nan  bool  check for INF/NAN values in input data  false 
𝖧𝖫𝖨𝖡𝗉𝗋𝗈 implements three different approximation/truncation algorithms based on singular value decomposition (SVD), Rankrevealing QR (RRQR) and randomized SVD (RandSVD).
By default, SVD is used. This algorithm will always return the best approximation but but usually also has the longest runtime.
Rankrevealing QR (RRQR) uses column pivoted QR to determine the numerical rank of a given matrix. This is used in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 to truncated lowrank matrices with respect to a given precision or rank. In contrast to the SVD based algorithm, RRQR does usually not compute a best approximate, which normally results in a larger rank of the lowrank matrices or reduced accuracy in case of a fixed rank. However, the RRQR algorithm is faster than SVD, which results in a better runtime of the Halgorithm.
RandSVD uses randomized algorithms to compute matrix approximations. It may be used for fixed and adaptive rank computations. In the latter case, an iteration is used to compute a sufficient approximate. Various parameters influence the behaviour of RandSVD and also the possible outcome of the approximation.
Please note, that the behaviour of all algorithms is strongly depended on the given Hmatrix. While in some cases, RRQR or RandSVD may be faster compared to SVD, the latter may be faster in other cases due to higher ranks computed by RRQR/RandSVD.
If 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 is compiled with single precision value types, setting use_double_prec
to true
, results in using double precision for computing the SVD during truncation. This increases accuracy and may reduce memory due to more accurate rank estimates.
LAPACK provides an additional SVD algorithm (gesvj
), which is often more accurate compared to the standard algorithms (gesvd
, gesdd
). Especially for extreme cases, e.g., very small singular values, this may be a better choice. However, gesvj
is slower compared to the other algorithms (usually a factor of two).
The SVD based algorithm may have problems of the input matrix has zero rows/columns (more precise the LAPACK algorithms used by the truncation). To fix this, such rows/columns can be eliminated from the input matrices.
Name  Type  Description  Default Value 

nmin  uint  upper limit for minimal block size  60 
split_mode  default split mode during geometrical clusterung  adaptive_split_axis  
adjust_bbox  bool  adjust bounding box during clustering to local indices and not as defined by parent partitioning  true 
cluster_level_mode  permit block clusters with clusters from different levels or not  cluster_level_equal  
sync_interface_depth  bool  synchronise depth of interface clusters with domain clusters in ND case  true 
sort_wrt_size  bool  sort sub clusters according to size, e.g. larger clusters first  false 
build_scc  bool  compute strongly coupled components prior to algebraic clustering  true 
METIS_random  bool  use randomized seed for METIS  true 
The optimal value for nmin
depends on the given problem and what is to be optimised: memory or runtime. As a rule of thumb, the smaller nmin
the less memory is used and the larger nmin
the better the runtime.
The default value is chosen based on benchmarks for dense matrices. For sparse matrices a higher value is often better (nmin
= 100 ... 300).
When partitioning a given geometry using binary space partitioning (BSP), the split axis may be chosen regularly (regular_split_axis
), i.e., cycling through x, y and z axis or adpative (adaptive_split_axis
), i.e., choose the longest axis. Theory usually assumes regular splitting while adaptive splitting often results in a better (less memory, better runtime) partitioning.
During BSP the sub domains may be assigned a bounding box by splitting the bounding box of the parent domain along the split axis or by recomputing the bounding box based on the coordinate information of the indices in the corresponding sub domains. The latter method usually reduces bounding box sizes and leads to a coarser partitioning of the Hmatrix, which in practise is often more optimal (less memory, better runtime). However, theory usually assumes a nonadaptive bounding box.
When building block clusters, i.e., a product of two clusters, normally only clusters of the same level are permitted to form block clusters. For some applications it may result in a better partitioning when also allowing clusters from different levels during block cluster tree construction (see also sync_interface_depth
).
If nested dissection is used, the depth of the interface tree is normally adjusted to the depth of the domain trees. For some applications, switching this off may result in a better partitioning (see also cluster_level_mode
).
Name  Type  Description  Default Value 

recompress  bool  recompress result of lowrank approximation  true 
coarsen  real  agglomorate lowrank subblocks during construction  false 
to_dense  bool  convert large rank lowrank matrices to dense to save memory  true 
to_dense_ratio  double  defines ratio for dense conversion, i.e., convert if rank >= min(#rows,#cols) * ratio  0.5 
pure_dense  bool  always convert lowrank matrices to dense format (debugging)  false 
use_sparsemat  bool  use sparse matrices during construction  false 
use_zeromat  bool  use special zero matrices during construction  true 
use_ghostmat  use special ghost matrices for distributed computation  false  
check_cb_ret  bool  check data returned by callback functions during construction  true 
symmetrise  bool  symmetrise dense diagonal blocks of symmetric matrices  true 
aca_max_ratio  double  maximal rank ratio (lowrank/fullrank) before stopping  0.25 
When computing lowrank approximations for admissible blocks, the result may not have minimal rank if adaptive rank was chosen. In such cases, a recompression of the block is performed to ensure this property. As it only adds little to the whole computation time, it is on by default.
By joining lowrank subblocks into a single larger lowrank matrix, the overall memory consumption may be reduced. It can also be seens as an adaptive procedure for determining admissible blocks, e.g., if the admissibility criterion for constructing the block cluster tree was to pessimistic. However, the overall computation time typically increases by one third, so this is off by default. Also, in some cases the accuracy is slightly reduced by coarsening.
Lowrank matrices only save memory if the rank is less then half the number of rows/columns in a block. If the rank exceeds this limit, it is better to convert the matrix format to dense representation. By default to_dense
is on with to_dense_ratio
set to 0.5 to optimise matrix memory.
For arithmetic, this limit might be even less since dense arithmetic is more efficient than lowrank arithmetic (the latter usually involving truncations). To optimise the conversion limit, to_dense_ratio
might be set to smaller values than 0.5.
If this is on (default), the return values of callback functions, e.g., for computing matrix coefficients, is checked for illegal values, e.g., NaN or Inf.
In case of symmetric or hermitian matrices, this parameter controls whether dense diagonal blocks are enforced to be symmetric/hermitian. This is done by copying the coefficients from the lower triangular to the upper triangular part. By default, this is on.
If ACA detects convergence problems during lowrank approximation, this ratio defines the maximal rank at which the iteration is stopped by comparing with the full rank, e.g., unless
\[k < \operatorname{fullrank} \cdot \operatorname{aca\_max\_ratio}\]
ACA proceeds.
Name  Type  Description  Default Value 

use_dag  bool  use DAG based arithmetic  true 
dag_version  uint  DAG version to use (1: old, 2: new)  1 
use_accu  bool  use accumulator based arithmetic  false 
lazy_eval  bool  use lazy evaluation in arithmetic  false 
sort_updates  bool  sort updates based on norm before summation for lazy evaluation  false 
abs_eps  real  default absolute error bound for lowrank truncation  0 
recompress  bool  recompress matrix after lowrank approximation  true 
to_dense  bool  convert large rank lowrank matrices to dense to save memory  true 
to_dense_ratio  double  defines ratio for dense conversion, i.e., convert if rank >= min(#rows,#cols) * ratio  0.5 
eval_type  pointwise or blockwise evaluation of blocks  block_wise  
storage_type  storage type of diagonal blocks in factorisation  store_inverse  
max_seq_size  size_t  switching point from parallel to sequential mode  100 
max_seq_size_vec  size_t  switching point for vector based operations  250 
coarsen  bool  coarsen matrix during arithmetic  false 
check_cb_ret  bool  check data returned by callback functions during construction  true 
arith_max_check  uint  upper matrix size limit for addition tests during arithmetic  0 
pseudo_inversion  bool  use pseudo inversion instead of real inversion  false 
symmetrise  bool  symmetrise matrices after factorisation  true 
zero_sum_trunc  bool  truncation of lowrank matrices if one summand has zero rank  true 
For multi and many core chips, the DAG based Harithmetic is able to utilise the CPU resources much better than the classical recursive algorithms. Only for sequential computations, a slightly better runtime may be possible with the recursive approach due to overhead for DAG based computations.
Define the version of DAG to use during arithmetic. Version 1 corresponds to the previous version of DAGs generated directly from Hmatrix structures with global scope. Version 2 is the new approach based on refinement of DAG nodes recursively following the Harithmetic functions with automatic dependency generation.
If true, the accumulator based arithmetic is used, in which updates are accumulated before being applied to the destination blocks. This reduces the number of truncations and therefore the runtime of Harithmetic. However, in some applications the accuracy of the arithmetic is slightly reduced. By default, it is off.
In case of accumulator based arithmetic, updates to a specific block are only computed if this block is needed for further computations. The accuracy of lazy evaluation is inbetween standard arithmetic and eager computation. However, it is often slightly slower compared to eager but still faster compared to standard arithmetic.
If sort_updates
is true
, the updates are combined such that the updates with the smallest norm are summed up first.
Singular values smaller than abs_eps
are removed from the resulting matrix during truncation independent on the given accuracy or rank.
The lowrank approximations computed during Hmatrix construction may not be optimal with respect to the rank. A recompression using SVD (or RRQR) may reduce the rank and hence memorr without affecting accuracy.
The evaluation type affects the handling of diagonal leaf blocks. Either factorisation is also performed for such blocks (point_wise
) or the blocks are handled as a full block using inversion (block_wise
). The latter has several advantages, e.g., permits limited pivoting during inversion and increases performance due to better cache utilisation (BLAS level 3 functions).
When computing matrix factorisations, the diagonal leaf blocks may store either the actual result of the factorisation (store_normal
) or the inverse of such blocks (store_inverse
). Since for evaluation of the inverse operators, either during factorisation or during matrixvector multiplication, the inverse if the matrix blocks is needed, storing the inverse results in better performance.
However, if also the original matrix should be evaluated using the factorised form, normal storage may be more optimal in terms of runtime.
Matrices smaller than max_seq_size
are handled by Harithmetic sequentially since this is more efficient compared to parallel handling. Same holds for max_seq_size_vec
in case of matrixvector operations.
When computing the inverse of diagonal blocks in blockwise factorization algorithms, normal inversion might not be possible (since not regular) or unstable. In such cases, the pseudo inversion may be used to compute approximations to the inverse. By default this off.
If lowrank matrices are summed up, normally a truncation is performed afterwards. However, in case that one of the summands has a zero rank, this might not be necessary. The exception is, if the positive rank summand is an update which was not yet truncated to the desired accuracy. Therefore it is on by default.
Name  Type  Description  Default Value 

max_iter  uint  maximal number of iterations  100 
rel_res_red
 real  relative residual reduction, e.g., stop if \( r_n / r_0 < \varepsilon\)  1e8 abs_res_red
 real  absolute residual reduction, e.g., stop if \(r_n < \varepsilon\)  1e14 rel_res_growth
 real  relative residual growth (divergence), e.g., stop if \(r_n / r_0 > \varepsilon\)  1e6 gmres_restart
 uint  restart for GMRES iteration  20 initialise_start_value
 bool  initialise start value before iteration  true use_exact_residual
 bool  compute exact residual during iteration  false
This parameter stops the iteration in case of divergence.
Normally, the start value of the iteration is initialised as \(x_0 := W b\) with \(W\) being the preconditioner. In case of a good (or very good) preconditioner, e.g., \(W \sim A^{1}\), this may eliminate iteration at all.
However, if the user has a good guess for the start value this behaviour may be switched off.
Most iterations approximate the residual norm during computations. Especially in the preconditioned case, this may result in a large deviation from the real residual norm. At the expense of one (unpreconditioned case) or two (preconditioned case) additional matrixvector multiplications, the exact norm is computed instead.
Name  Type  Description  Default Value 

quad_order  uint  default quadrature order  4 
adaptive_quad_order  bool  use distance adaptive quadrature order  true 
use_simd  bool  use special vector functions if available  true 
use_simd_sse3  bool  use special SSE3 vector functions if available  true 
use_simd_avx  bool  use special AVX vector functions if available  true 
use_simd_avx2  bool  use special AVX2 vector functions if available  true 
use_simd_avx512  bool  use special AVX512 vector functions if available  true 
use_simd_mic  bool  use special MIC vector functions if available  true 
use_simd_vsx  bool  use special VSX vector functions if available  true 
For the evaluation of the bilinear forms a quadrature rule is used for which the default order is defined with this parameter. For high accuracy Harithmetic, a higher order may be used, e.g., 5 or 6.
For farfield evaluation, the quadrature order may be reduced for normal bilinear forms, resulting in much reduced runtime.
All implemented bilinear forms have special implementations using SIMD instructions of the CPU. Normally, this results in a faster computation compared to compiler generated code. So, in most cases this flag is only needed for comparison or for debugging.
Name  Type  Description  Default Value 

use_matlab_syntax  bool  use Matlab syntax for printing complex numbers, vectors and stdio  off 
color_mode  use color in terminal output if supported  true  
charset_mode  use ascii or unicode in normal terminal output  depends on terminal  
permute_save  bool  permute Hmatrix before saving as dense matrix  true 