HLIBpro  2.9.1



  • added apply_add with BLAS::Matrix as argument to TLinearOperator classes
  • added absolute_prec to define TTruncAcc
  • added support for NEON instruction set (Apple M1)
  • added support for Mongoose graph partitioning library (TMongooseAlgPartStrat)
  • enhanced TAlgAdmCond to define maximal number of allowed connecting edges
  • fixed issue in BLAS::qrp for matrices with nrows < ncols
  • fixed coefficient tests in TPermCoeffFn (were missing before)
  • fixed issue in TMatrixProduct with single factor
  • fixed issue in TMaternCovCoeffFn with different row/column coordinates


  • replaced HLIB::complex by std::complex
  • removed old DAG interface
  • modified pivot strategy of standard ACA (better, more robust convergence)
  • using geqr2 instead of geqrf for QR factorization (slightly faster)
  • fixes
    • replaced deprecated features of TBB
    • TGeomGroupCTBuilder: fixed handling of offsets
    • missing instantiation of BLAS::random for BLAS::Vector added
    • wrong solving flags in TLDUInvMatrix
    • inefficient dependency handling in DAG construction for TLR/Tile-H
    • memory leak in recursive DAG construction fixed



  • fixed weak admissibility (was actually standard admissibility)
  • fixed MBLR clustering (ordering only in one dimension led to extremely rectangular clusters)
  • fixed bug in lowrank approximation (wrong type conversion)
  • fixed non-SIMD implementation of TExpBF (still had additional factor)
  • fixed TDenseCoeffFn in case given dense matrix had non-zero row/column offsets
  • replaced tbb::mutex and tbb::atomic by std versions since marked obsolete in recent TBB versions


  • improved new DAG generation system (better speed and parallel scalability) and made it the default system (old version still available with CFG::dag_version=1)
  • moved non-generic include files into hpro sub-directory for better separation from other libraries
  • added various approximation routines for sums of matrices and operators, e.g., SVD, pairwise SVD, Rand-SVD, Rand-LR and ACA for operator sums
  • expanded lazy accumulator arithmetic to move all updates to leaves only (evaluating all updates simultaneously; see CFG::Arith::lazy_eval and CFG::Arith::sum_approx)
  • added TZeroMatBuilder and build_zero_mat to construct an empty matrix for a given block cluster tree, e.g., as pre-initialized result of another H-matrix operation
  • parallelization of various routines during clustering, e.g., sorting (may result in slightly different clustering with a different number of CPU cores)
  • bug fixes:
    • permutation of dense matrix in TMatBuilder removed (behaviour was inconsistent with the case where an H-matrix is built)
    • fixed update of aux. data in H-matrix in copy_nearfield
    • fixed reading of old HLIB files (wrong processor sets)
    • fixed return value of Mem::usage
    • fixed various issues when using single precision



  • fixed various compiler issues with MS Visual C++
  • added build method for coefficient functions to return dense matrix for given index set
  • C bindings:
    • added hlib_matrix_to_dense/rank
    • added hlib_matrix_approx_rank to compute low-rank approximation of given matrix with different methods
  • fixed issue in TBSPNDCTBuilder when no interface is present
  • fixed issue in HLIB::Mem::usage


  • fixed issue with HDF5 library but removed support from binary distribution due to linking problems with newer versions of libHDF5
  • added missing functions to sequential NET interface
  • added functions to directly set LR matrices in TRkMatrix
  • using internal grid generation also in example code (laplace/helmholtz)
  • additional spherical grid (different start grid for "inbetween" steps)
  • improved coordinate visualization in PostScript format (better minimal distance estimate)


  • new DAG generation based on recursive algorithms with automatic deduction of dependencies between nodes (default: previous DAG; see CFG::dag_version),
  • new coefficient function for Matern kernel (TMaternCovCoeffFn) and exponential (\(e^{-|x-y|_2}\)) bilinear form (TExpBF),
  • functions for computing low-rank approximations for sums of matrices directly (using pair-wise SVD approx_sum_svd, randomized SVD approx_sum_randsvd or randomized low-rank approximation approx_sum_randlr),
  • ACA:
    • modified stop criterion for ACA (user controllable maximal rank CFG::Build::aca_max_ratio),
    • added dense fallback for ACA if it does not converge, computing only those coefficients not yet computed,
  • added MBLR cluster tree construction (TMBLRCTBuilder),
  • modified handling of matrix coefficient functions, especially TPermCoeffFn. Instead of
    TMyCoeffFn coeff_fn( ..., ct->perm_i2e(), ct->perm_i2e() );
    please use
    TMyCoeffFn my_coeff_fn( ... );
    TPermCoeffFn< ... > coeff_fn( & my_coeff_fn, ct->perm_i2e(), ct->perm_i2e() );
  • (limited) grid generation and refinement in BEM library (see Boundary Element Methods),
  • using libmvec for sin, cos and exp if available (glibc v2.22 and up) with significant speedups in complex valued computations.
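
  The TPermCoeffFn change above separates the user's coefficient function from the index permutation. The following is a minimal, self-contained sketch of that wrapper idea and not the HLIBpro implementation: coeff_fn_t, permuted_coeff_fn and example_entry are hypothetical names, and it assumes that perm_i2e maps internal (cluster tree) indices to external (application) indices.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a user coefficient function: returns the matrix
// entry for a pair of indices given in the external (application) numbering.
using coeff_fn_t = std::function< double ( std::size_t, std::size_t ) >;

// Hypothetical wrapper (NOT the HLIBpro class): translates internal indices
// to external ones before calling the wrapped function.
struct permuted_coeff_fn
{
    coeff_fn_t                  base;     // works in external numbering
    std::vector< std::size_t >  row_i2e;  // internal -> external, rows
    std::vector< std::size_t >  col_i2e;  // internal -> external, columns

    double operator () ( std::size_t  i, std::size_t  j ) const
    {
        assert( i < row_i2e.size() && j < col_i2e.size() );
        return base( row_i2e[i], col_i2e[j] );
    }
};

// Example base function: entry (i,j) = 10*i + j in external numbering.
inline double example_entry ( std::size_t  i, std::size_t  j )
{
    return 10.0 * double(i) + double(j);
}
```

  With row permutation { 2, 0, 1 } and column permutation { 1, 2, 0 }, the wrapped entry at internal position (0,0) is the external entry (2,1). The base function itself never sees internal indices, which is the point of the separation.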


  • Accumulator-based H-arithmetic reducing number of truncations with support for lazy and eager evaluation
  • added randomized SVD and implemented dense approximation and lowrank truncation for all types (SVD, RRQR and Rand-SVD)
  • also added lowrank approximation algorithms for RRQR and Rand-SVD for H-construction (TRRQRLRApx, TRandSVDLRApx)
  • support for special flat H-hierarchy with optimised arithmetic functions, e.g., in-place inversion.
  • support for block refinement during matrix construction, e.g., if admissibility gives false positives
  • added infinity matrix norm (TInfinityNorm and norm_inf())
  • implemented TOffDiagAdmCond with all off-diagonal blocks being admissible
  • massive code restructuring and cleanup
  • initial support for HDF5 matrix IO (dense and lowrank)
  • support for VSX instruction set (POWER CPUs)
  • special handling for all BLAS functions in case of parallel Intel MKL
  • added parameter configuration with config files
  • added some functions to simplify solver stopping criterion
  • changed behaviour/incompatibilities with previous versions:
    • removed all permutation handling from THMatrix and TNearfieldMulVec
    • no recompression in ACA/HCA (now only in matrix builders)
    • ptrcast() now consistent with cptrcast(), i.e., no * needed
    • some parameter reorganization (see Parameters)
    • previous TMatrix::copy_struct is renamed to TMatrix::copy_struct_from (TMatrix::copy_struct will now return a matrix copy without data)



  • added operator for matrix sum (TMatrixSum) in addition to matrix products
  • missing bindings in C interface (matrix product/sum, apply, apply_add)
  • fixed issue in clustering with predefined partition with many groups
  • support for more matrix formats and more robust IO (if files do not follow standard) for Harwell-Boeing/Matrix-Market format
  • re-enabled parallel block cluster tree construction on shared memory
  • bug fixes:
    • in solve_diag_left_block
    • matrix solves in TLLInvMatrix


  • implemented rank revealing QR based low-rank truncation
  • Solvers:
    • added CGS and TFQMR solvers
    • added support for matrix solves in linear iteration (also H-matrices!)
    • optional computation of exact residual during iteration
    • simplified handling of stop criterion parameters
    • using status field in TSolverInfo instead of exception if solver fails (e.g., breakdown)
    • some code restructuring
  • support for block-wise Jacobi and Gauss-Seidel operators
  • support for AVX512 instruction set
  • many new user controllable parameters (see Parameters)
  • misc.:
    • handling of diagonal in factorisation (inverse or normal) now a runtime option (default: inverse)
    • added optional distance for TWeakAlgAdmCond to support distances other than one
    • support for fixed rank 0
    • correct progress bar support for WAZ factorisation and inversion
    • some reorganization of source/header files
  • C bindings:
    • added hlib_admcond_geom_hilo for THiLoFreqGeomAdmCond
    • additional parameter for blockdiag functions (blocksize)
  • fixed serious issue with Intel TBB and with current Intel TBB-based Intel MKL
  • various bug fixes


  • Added factorization of inverse matrix \(WAZ = I\), enabling vector solves using matrix-vector multiplication instead of forward/backward solves, with much better parallel speedup.
  • Significant improvements in parallel performance of matrix inversion.
  • Improved performance of LU, matrix-vector multiplication and forward/backward solves.
  • Added function nearfield_sparse to extract H-matrix nearfield as sparse matrix.
  • Switched to adaptive_split_axis as default for clustering.
  • Minor Changes:
    • Additional options for matrix visualization (colormap, etc.)
    • Basic VTK output of block clusters.
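
  The reason the WAZ factorization listed above replaces triangular solves by multiplications can be seen from a short identity (a sketch, assuming \(W\) and \(Z\) are invertible):

    \( WAZ = I \;\Rightarrow\; A = W^{-1} Z^{-1} \;\Rightarrow\; A^{-1} = Z W \)

  Hence \(x = A^{-1} b = Z (W b)\), i.e., a solve reduces to two matrix-vector multiplications with the factors.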



  • fixed various bugs and race conditions
  • extended ctors of various matrix classes to accept optional value type field
  • added example on how to assemble block matrices


  • Fixed two bugs in point-wise LU.
  • Solver changes:
    • Refactored solver classes (no interface changes); added TRichardson to replace TSolver in the future.
    • Fixed inconsistent computation of residual norm in solvers. Now Richardson, CG and BiCG will compute standard residual norm, while MINRES and GMRES compute preconditioned residual norm.
    • Made initialisation of start vector in solver classes optional (function initialise_start_value)
  • Added function "diagonal" to extract diagonal of a matrix.
  • Added example "spectrum" to compute spectrum of graph Laplacian.
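
  The difference between the two residual norms mentioned in the solver changes above can be illustrated with a small sketch. It is not HLIBpro code: all names are hypothetical, the matrix is dense, and a diagonal (Jacobi-type) left preconditioner \(W\) is assumed purely for illustration. The standard norm is \(|b - Ax|_2\), the preconditioned norm is \(|W(b - Ax)|_2\).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec_t = std::vector< double >;
using mat_t = std::vector< vec_t >;   // dense matrix, stored row-wise

// r = b - A x
vec_t residual ( const mat_t &  A, const vec_t &  x, const vec_t &  b )
{
    assert( A.size() == b.size() );

    vec_t  r( b );

    for ( std::size_t  i = 0; i < A.size(); ++i )
    {
        assert( A[i].size() == x.size() );

        for ( std::size_t  j = 0; j < x.size(); ++j )
            r[i] -= A[i][j] * x[j];
    }

    return r;
}

// Euclidean norm of v
double norm2 ( const vec_t &  v )
{
    double  sum = 0.0;

    for ( double  v_i : v )
        sum += v_i * v_i;

    return std::sqrt( sum );
}

// standard residual norm |b - A x|_2
double std_res_norm ( const mat_t &  A, const vec_t &  x, const vec_t &  b )
{
    return norm2( residual( A, x, b ) );
}

// preconditioned residual norm |W (b - A x)|_2 with W = diag(w)
// (diagonal W is an assumption of this example)
double precond_res_norm ( const mat_t &  A, const vec_t &  w,
                          const vec_t &  x, const vec_t &  b )
{
    vec_t  r = residual( A, x, b );

    assert( w.size() == r.size() );

    for ( std::size_t  i = 0; i < r.size(); ++i )
        r[i] *= w[i];

    return norm2( r );
}
```

  For A = diag(2,4), b = (2,4), x = 0 and W = diag(1/2,1/4), the standard norm is \(\sqrt{20}\) while the preconditioned norm is \(\sqrt{2}\), so stopping tolerances are not directly comparable between the two groups of solvers.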


  • Fixed issues when solving dense matrices (used in new example for many RHSs).
  • Modified THiLoFreqGeomAdmCond: now maximal number of wavelengths per cluster is tested.
  • Refactored geometrical clustering classes and partitioning strategies, thereby fixing several issues.
  • C++11 changes:
    • most object creating functions now return std::unique_ptr,
    • replaced typedef by using,
    • added iterators for TIndexSet, TNodeSet, TGraph, TProcSet (for range based for).
    • Note: needs at least GCC v4.7 or equivalent!
  • Added parameter to algebraic clustering in C bindings to define partitioning algorithm (BFS, multi level, METIS or Scotch).
  • Fixed issues with progress bar during factorisation (wrong block count).
  • Removed BSP style communication functions (MPI only now).
  • Finished conversion to new packed_t SIMD type. Using SSE3 instead of SSE2.
  • Added lock to TScotchAlgPartStrat because Scotch is not thread-safe.


  • Removing implicit reordering of unknowns during matrix-vector multiplication to fix inconsistent behaviour. Please use permutations from cluster trees or H-matrices to reorder vectors or TPermMatrix to represent permuted matrices instead.
  • Speedup improvements for matrix inversion. Triangular inversion and matrix multiplication available in standard user interface.
  • Import/export from/to CCS/CRS matrices simplified.
  • Simplified (and faster) mutex wrapper.
  • Several C++11 changes.


  • Removing reference counters in BLAS interface due to major performance issue on multi-core (-socket) systems. See BLAS/LAPACK Interface on how to use the modified interface (and avoid errors).
  • New, scalable matrix-vector multiplication implemented.
  • Using generic datatype for SIMD instructions, thereby enabling generic SIMD algorithms, e.g. for BEM kernels, and fast adoption of new SIMD instructions, e.g. AVX2.
  • Started to use block-wise operations if dense matrices are combined with blocked matrices (e.g. during matrix multiplication) instead of vector operations.
  • Removed TVirtualVector (replaced by TScalarVector).
  • Fixed issue with MatrixMarket format (leading whitespaces).



  • fixed race condition in C bindings
  • fixed issue with initialisation of static variables


  • bug fixes


  • Major Changes
    • Switched from OpenMP to Threading Building Blocks as interface to shared memory parallelism, thereby also changing most algorithms to task-based parallelism.
    • Reducing dependency on external libraries by using C++11 features. Also replacing some classes by default C++ versions (finally removing old code).
    • Alternative, non-recursive, level-wise ℋ-LU factorisation based on explicit block dependencies, which provides far better speedup on many-core systems, e.g. Intel MIC architecture.
    • New ℋ-LU factorisation algorithm also applicable in distributed environments, yielding better load-balancing (albeit with limited speedup).
    • Added support for multiple CPUs to many algorithms, e.g. in clustering, norm computations, matrix-vector multiplication and solves, ℋ²-conversion.
  • Minor Changes
    • Optimised BEM kernels for Intel MIC architecture.
    • Introduced TLinearOperator for operators not supporting TMatrix functionality, e.g. factorised matrices.
    • HLIBpro file format changed due to internal changes and due to some bugs in the format. However, backward read compatibility for most files written with earlier versions is kept.
    • Added support for Cairo library, thereby providing PDF output.
  • And of course: many smaller feature upgrades and bug fixes.