Parameter Access

The following parameters of 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 are defined in the HLIB::CFG namespace with corresponding sub namespaces.

You may either read/write the parameters within your source code, e.g.,

HLIB::CFG::BLAS::use_rrqr_trunc = true;

or via environment variables. In the latter case, "::" is replaced by "_" and "CFG" is omitted, e.g., instead of

HLIB::CFG::BLAS::use_rrqr_trunc

the environment variable

HLIB_BLAS_use_rrqr_trunc

is used with boolean values defined by 0 or 1.

Furthermore, a list of all parameters with values may be printed via

HLIB::CFG::print_parameters();

Configuration Files

If the file .hlibpro.conf is present in the current working directory, it is read for setting the parameter values. The syntax of the file is equal to typical configuration files, e.g.

[section]

parameter=value

Each of the below described parameter topics (BLAS, Cluster, Build, Arith, Solver, BEM and IO) defines a config section. Within a section the parameter name is equal to the C++ name.

An example for such a configuration file looks like

[BLAS]
check_inf_nan=0
trunc_method=1
approx_method=2
[Build]
recompress=false
[Arith]
use_accu=1
lazy_fac=1

Comments are supported with the prefix #, e.g.,

[Arith]

#use_accu=1

C Bindings

In the C bindings the corresponding functions for setting, reading and printing the parameters are

void hlib_set_config       ( const char *             option,
                             const char *             value,
                             int *                    info );
void hlib_get_config       ( const char *             option,
                             char *                   value,
                             const size_t             len,
                             int *                    info );
void hlib_print_parameters ();

Please note, that the names and values correspond to the names and values of the environment variables and not of the C++ parameters.

Parameter Description

HLIB::CFG::BLAS

Name	Type	Description	Default Value
`approx_method`	int	use SVD (0), RRQR(1) or Rand-SVD (2) for dense approximation	0
`trunc_method`	int	use SVD (0), RRQR(1) or Rand-SVD (2) for low-rank truncation	0
`power_steps`	int	number of steps of power iteration in Rand-SVD	0
`sample_size`	int	block size in adaptive Rand-SVD	8
`oversampling`	int	oversampling size in fixed rank Rand-SVD	0
`use_double_prec`	bool	use double precision SVD for single precision types	false
`use_gesvj`	bool	use gesvj instead of gesvd/gesdd	false
`gesvd_limit`	idx_t	upper matrix size limit for using gesvd (gesdd for larger matrices)	1000
`check_zeroes`	bool	check for and remove zero rows/columns in input matrices of SVD	false
`check_inf_nan`	bool	check for INF/NAN values in input data	false

approx_method / trunc_method

𝖧𝖫𝖨𝖡𝗉𝗋𝗈 implements three different approximation/truncation algorithms based on singular value decomposition (SVD), Rank-revealing QR (RRQR) and randomized SVD (Rand-SVD).

By default, SVD is used. This algorithm will always return the best approximation but but usually also has the longest runtime.

Rank-revealing QR (RRQR) uses column pivoted QR to determine the numerical rank of a given matrix. This is used in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 to truncated low-rank matrices with respect to a given precision or rank. In contrast to the SVD based algorithm, RRQR does usually not compute a best approximate, which normally results in a larger rank of the low-rank matrices or reduced accuracy in case of a fixed rank. However, the RRQR algorithm is faster than SVD, which results in a better runtime of the H-algorithm.

Rand-SVD uses randomized algorithms to compute matrix approximations. It may be used for fixed and adaptive rank computations. In the latter case, an iteration is used to compute a sufficient approximate. Various parameters influence the behaviour of Rand-SVD and also the possible outcome of the approximation.

Please note, that the behaviour of all algorithms is strongly depended on the given H-matrix. While in some cases, RRQR or Rand-SVD may be faster compared to SVD, the latter may be faster in other cases due to higher ranks computed by RRQR/Rand-SVD.

use_double_prec

If 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 is compiled with single precision value types, setting use_double_prec to true, results in using double precision for computing the SVD during truncation. This increases accuracy and may reduce memory due to more accurate rank estimates.

use_gesvj

LAPACK provides an additional SVD algorithm (gesvj), which is often more accurate compared to the standard algorithms (gesvd, gesdd). Especially for extreme cases, e.g., very small singular values, this may be a better choice. However, gesvj is slower compared to the other algorithms (usually a factor of two).

check_zeroes

The SVD based algorithm may have problems of the input matrix has zero rows/columns (more precise the LAPACK algorithms used by the truncation). To fix this, such rows/columns can be eliminated from the input matrices.

HLIB::CFG::Cluster

Name	Type	Description	Default Value
`nmin`	uint	upper limit for minimal block size	60
`split_mode`		default split mode during geometrical clusterung	`adaptive_split_axis`
`adjust_bbox`	bool	adjust bounding box during clustering to local indices and not as defined by parent partitioning	true
`cluster_level_mode`		permit block clusters with clusters from different levels or not	`cluster_level_equal`
`sync_interface_depth`	bool	synchronise depth of interface clusters with domain clusters in ND case	true
`sort_wrt_size`	bool	sort sub clusters according to size, e.g. larger clusters first	false
`build_scc`	bool	compute strongly coupled components prior to algebraic clustering	true
`METIS_random`	bool	use randomized seed for METIS	true

nmin

The optimal value for nmin depends on the given problem and what is to be optimised: memory or runtime. As a rule of thumb, the smaller nmin the less memory is used and the larger nmin the better the runtime.

The default value is chosen based on benchmarks for dense matrices. For sparse matrices a higher value is often better (nmin = 100 ... 300).

split_mode

When partitioning a given geometry using binary space partitioning (BSP), the split axis may be chosen regularly (regular_split_axis), i.e., cycling through x, y and z axis or adpative (adaptive_split_axis), i.e., choose the longest axis. Theory usually assumes regular splitting while adaptive splitting often results in a better (less memory, better runtime) partitioning.

adjust_bbox

During BSP the sub domains may be assigned a bounding box by splitting the bounding box of the parent domain along the split axis or by recomputing the bounding box based on the coordinate information of the indices in the corresponding sub domains. The latter method usually reduces bounding box sizes and leads to a coarser partitioning of the H-matrix, which in practise is often more optimal (less memory, better runtime). However, theory usually assumes a non-adaptive bounding box.

cluster_level_mode

When building block clusters, i.e., a product of two clusters, normally only clusters of the same level are permitted to form block clusters. For some applications it may result in a better partitioning when also allowing clusters from different levels during block cluster tree construction (see also sync_interface_depth).

sync_interface_depth

If nested dissection is used, the depth of the interface tree is normally adjusted to the depth of the domain trees. For some applications, switching this off may result in a better partitioning (see also cluster_level_mode).

HLIB::CFG::Build

Name	Type	Description	Default Value
`recompress`	bool	recompress result of low-rank approximation	true
`coarsen`	real	agglomorate low-rank subblocks during construction	false
`to_dense`	bool	convert large rank low-rank matrices to dense to save memory	true
`to_dense_ratio`	double	defines ratio for dense conversion, i.e., convert if rank >= min(#rows,#cols) * ratio	0.5
`pure_dense`	bool	always convert low-rank matrices to dense format (debugging)	false
`use_sparsemat`	bool	use sparse matrices during construction	false
`use_zeromat`	bool	use special zero matrices during construction	true
`use_ghostmat`		use special ghost matrices for distributed computation	false
`check_cb_ret`	bool	check data returned by callback functions during construction	true
`symmetrise`	bool	symmetrise dense diagonal blocks of symmetric matrices	true
`aca_max_ratio`	double	maximal rank ratio (lowrank/fullrank) before stopping	0.25

recompress

When computing low-rank approximations for admissible blocks, the result may not have minimal rank if adaptive rank was chosen. In such cases, a recompression of the block is performed to ensure this property. As it only adds little to the whole computation time, it is on by default.

coarsen

By joining lowrank subblocks into a single larger lowrank matrix, the overall memory consumption may be reduced. It can also be seens as an adaptive procedure for determining admissible blocks, e.g., if the admissibility criterion for constructing the block cluster tree was to pessimistic. However, the overall computation time typically increases by one third, so this is off by default. Also, in some cases the accuracy is slightly reduced by coarsening.

to_dense / to_dense_ratio

Lowrank matrices only save memory if the rank is less then half the number of rows/columns in a block. If the rank exceeds this limit, it is better to convert the matrix format to dense representation. By default to_dense is on with to_dense_ratio set to 0.5 to optimise matrix memory.

For arithmetic, this limit might be even less since dense arithmetic is more efficient than lowrank arithmetic (the latter usually involving truncations). To optimise the conversion limit, to_dense_ratio might be set to smaller values than 0.5.

check_cb_ret

If this is on (default), the return values of callback functions, e.g., for computing matrix coefficients, is checked for illegal values, e.g., NaN or Inf.

symmetrise

In case of symmetric or hermitian matrices, this parameter controls whether dense diagonal blocks are enforced to be symmetric/hermitian. This is done by copying the coefficients from the lower triangular to the upper triangular part. By default, this is on.

aca_max_ratio

If ACA detects convergence problems during low-rank approximation, this ratio defines the maximal rank at which the iteration is stopped by comparing with the full rank, e.g., unless

\[k < \operatorname{fullrank} \cdot \operatorname{aca\_max\_ratio}\]

ACA proceeds.

HLIB::CFG::Arith

Name	Type	Description	Default Value
`use_dag`	bool	use DAG based arithmetic	true
`dag_version`	uint	DAG version to use (1: old, 2: new)	1
`use_accu`	bool	use accumulator based arithmetic	false
`lazy_eval`	bool	use lazy evaluation in arithmetic	false
`sort_updates`	bool	sort updates based on norm before summation for lazy evaluation	false
`abs_eps`	real	default absolute error bound for low-rank truncation	0
`recompress`	bool	recompress matrix after low-rank approximation	true
`to_dense`	bool	convert large rank low-rank matrices to dense to save memory	true
`to_dense_ratio`	double	defines ratio for dense conversion, i.e., convert if rank >= min(#rows,#cols) * ratio	0.5
`eval_type`		point-wise or block-wise evaluation of blocks	`block_wise`
`storage_type`		storage type of diagonal blocks in factorisation	`store_inverse`
`max_seq_size`	size_t	switching point from parallel to sequential mode	100
`max_seq_size_vec`	size_t	switching point for vector based operations	250
`coarsen`	bool	coarsen matrix during arithmetic	false
`check_cb_ret`	bool	check data returned by callback functions during construction	true
`arith_max_check`	uint	upper matrix size limit for addition tests during arithmetic	0
`pseudo_inversion`	bool	use pseudo inversion instead of real inversion	false
`symmetrise`	bool	symmetrise matrices after factorisation	true
`zero_sum_trunc`	bool	truncation of low-rank matrices if one summand has zero rank	true

use_dag

For multi and many core chips, the DAG based H-arithmetic is able to utilise the CPU resources much better than the classical recursive algorithms. Only for sequential computations, a slightly better runtime may be possible with the recursive approach due to overhead for DAG based computations.

dag_version

Define the version of DAG to use during arithmetic. Version 1 corresponds to the previous version of DAGs generated directly from H-matrix structures with global scope. Version 2 is the new approach based on refinement of DAG nodes recursively following the H-arithmetic functions with automatic dependency generation.

use_accu

If true, the accumulator based arithmetic is used, in which updates are accumulated before being applied to the destination blocks. This reduces the number of truncations and therefore the runtime of H-arithmetic. However, in some applications the accuracy of the arithmetic is slightly reduced. By default, it is off.

lazy_eval

In case of accumulator based arithmetic, updates to a specific block are only computed if this block is needed for further computations. The accuracy of lazy evaluation is inbetween standard arithmetic and eager computation. However, it is often slightly slower compared to eager but still faster compared to standard arithmetic.

If sort_updates is true, the updates are combined such that the updates with the smallest norm are summed up first.

abs_eps

Singular values smaller than abs_eps are removed from the resulting matrix during truncation independent on the given accuracy or rank.

recompress

The low-rank approximations computed during H-matrix construction may not be optimal with respect to the rank. A recompression using SVD (or RRQR) may reduce the rank and hence memorr without affecting accuracy.

eval_type

The evaluation type affects the handling of diagonal leaf blocks. Either factorisation is also performed for such blocks (point_wise) or the blocks are handled as a full block using inversion (block_wise). The latter has several advantages, e.g., permits limited pivoting during inversion and increases performance due to better cache utilisation (BLAS level 3 functions).

storage_type

When computing matrix factorisations, the diagonal leaf blocks may store either the actual result of the factorisation (store_normal) or the inverse of such blocks (store_inverse). Since for evaluation of the inverse operators, either during factorisation or during matrix-vector multiplication, the inverse if the matrix blocks is needed, storing the inverse results in better performance.

However, if also the original matrix should be evaluated using the factorised form, normal storage may be more optimal in terms of runtime.

max_seq_size (_vec)

Matrices smaller than max_seq_size are handled by H-arithmetic sequentially since this is more efficient compared to parallel handling. Same holds for max_seq_size_vec in case of matrix-vector operations.

pseudo_inversion

When computing the inverse of diagonal blocks in block-wise factorization algorithms, normal inversion might not be possible (since not regular) or unstable. In such cases, the pseudo inversion may be used to compute approximations to the inverse. By default this off.

zero_sum_trunc

If lowrank matrices are summed up, normally a truncation is performed afterwards. However, in case that one of the summands has a zero rank, this might not be necessary. The exception is, if the positive rank summand is an update which was not yet truncated to the desired accuracy. Therefore it is on by default.

HLIB::CFG::Solver

Name	Type	Description	Default Value
`max_iter`	uint	maximal number of iterations	100
`rel_res_red`	real	relative residual reduction, e.g., stop if \( \|r_n\| / \|r_0\| < \varepsilon\)	1e-8
`abs_res_red`	real	absolute residual reduction, e.g., stop if \(\|r_n\| < \varepsilon\)	1e-14
`rel_res_growth`	real	relative residual growth (divergence), e.g., stop if \(\|r_n\| / \|r_0\| > \varepsilon\)	1e6
`gmres_restart`	uint	restart for GMRES iteration	20
`initialise_start_value`	bool	initialise start value before iteration	true
`use_exact_residual`	bool	compute exact residual during iteration	false

rel_res_growth

This parameter stops the iteration in case of divergence.

initialise_start_value

Normally, the start value of the iteration is initialised as \(x_0 := W b\) with \(W\) being the preconditioner. In case of a good (or very good) preconditioner, e.g., \(W \sim A^{-1}\), this may eliminate iteration at all.

However, if the user has a good guess for the start value this behaviour may be switched off.

use_exact_residual

Most iterations approximate the residual norm during computations. Especially in the preconditioned case, this may result in a large deviation from the real residual norm. At the expense of one (unpreconditioned case) or two (preconditioned case) additional matrix-vector multiplications, the exact norm is computed instead.

HLIB::CFG::BEM

Name	Type	Description	Default Value
`quad_order`	uint	default quadrature order	4
`adaptive_quad_order`	bool	use distance adaptive quadrature order	true
`use_simd`	bool	use special vector functions if available	true
`use_simd_sse3`	bool	use special SSE3 vector functions if available	true
`use_simd_avx`	bool	use special AVX vector functions if available	true
`use_simd_avx2`	bool	use special AVX2 vector functions if available	true
`use_simd_avx512`	bool	use special AVX512 vector functions if available	true
`use_simd_mic`	bool	use special MIC vector functions if available	true
`use_simd_vsx`	bool	use special VSX vector functions if available	true

quad_order

For the evaluation of the bilinear forms a quadrature rule is used for which the default order is defined with this parameter. For high accuracy H-arithmetic, a higher order may be used, e.g., 5 or 6.

adaptive_quad_order

For far-field evaluation, the quadrature order may be reduced for normal bilinear forms, resulting in much reduced runtime.

use_simd

All implemented bilinear forms have special implementations using SIMD instructions of the CPU. Normally, this results in a faster computation compared to compiler generated code. So, in most cases this flag is only needed for comparison or for debugging.

HLIB::CFG::IO

Name	Type	Description	Default Value
`use_matlab_syntax`	bool	use Matlab syntax for printing complex numbers, vectors and stdio	off
`color_mode`		use color in terminal output if supported	true
`charset_mode`		use ascii or unicode in normal terminal output	depends on terminal
`permute_save`	bool	permute H-matrix before saving as dense matrix	true

Table of Contents

Parameter Access

Configuration Files

C Bindings

Parameter Description

HLIB::CFG::BLAS

approx_method / trunc_method

use_double_prec

use_gesvj

check_zeroes

HLIB::CFG::Cluster

nmin

split_mode

adjust_bbox

cluster_level_mode

sync_interface_depth

HLIB::CFG::Build

recompress

coarsen

to_dense / to_dense_ratio

check_cb_ret

symmetrise

aca_max_ratio

HLIB::CFG::Arith

use_dag

dag_version

use_accu

lazy_eval

abs_eps

recompress

eval_type

storage_type

max_seq_size (_vec)

pseudo_inversion

zero_sum_trunc

HLIB::CFG::Solver

rel_res_growth

initialise_start_value

use_exact_residual

HLIB::CFG::BEM

quad_order

adaptive_quad_order

use_simd

HLIB::CFG::IO