v3.0

using generic value types for all major classes, e.g., matrices, vectors and linear operators
officially renamed namespace and header files to Hpro and hpro (old names still valid for compatibility)
C bindings split into separate functions per value type; also with hpro prefix now (old function names and functionality are still supported)
support for mixed precision computations in solver classes
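The following is only a rough sketch of what "generic value types" combined with mixed precision can look like in general. It is not taken from the library: `Vector`, `dot` and the split into a storage type `value_t` and an accumulation type `accum_t` are hypothetical names chosen for illustration, and the notes do not say how the solver classes actually combine precisions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic vector, parameterised by its value type
// (illustration only, not an Hpro class).
template < typename value_t >
struct Vector
{
    std::vector< value_t >  data;
};

// Dot product for vectors stored in value_t, accumulated in accum_t,
// e.g. float storage with double accumulation.
template < typename accum_t, typename value_t >
accum_t dot ( const Vector< value_t > &  x,
              const Vector< value_t > &  y )
{
    assert( x.data.size() == y.data.size() );

    accum_t  sum = accum_t( 0 );

    for ( std::size_t  i = 0; i < x.data.size(); ++i )
        sum += accum_t( x.data[i] ) * accum_t( y.data[i] );

    return sum;
}
```

With `Vector< float >` arguments, `dot< double >( x, y )` keeps the data in single precision while summing in double precision.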

v2.9

v2.9.1

added apply_add with BLAS::Matrix as argument to TLinearOperator classes
added absolute_prec to define TTruncAcc
added support for NEON instruction set (Apple M1)
added support for Mongoose graph partitioning library (TMongooseAlgPartStrat)
enhanced TAlgAdmCond to define maximal number of allowed connecting edges
fixed issue in BLAS::qrp for matrices with nrows < ncols
fixed coefficient tests in TPermCoeffFn (was missing before)
fixed issue in TMatrixProduct with single factor
fixed issue in TMaternCovCoeffFn with different row/column coordinates

v2.9

replaced HLIB::complex by std::complex
removed old DAG interface
modified pivot strategy of standard ACA (better, more robust convergence)
using geqr2 instead of geqrf for QR factorization (slightly faster)
fixes
- replaced deprecated features of TBB
- TGeomGroupCTBuilder: fixed handling of offsets
- missing instantiation of BLAS::random for BLAS::Vector added
- wrong solving flags in TLDUInvMatrix
- inefficient dependency handling in DAG construction for TLR/Tile-H
- memory leak in recursive DAG construction fixed

v2.8

v2.8.1

fixed weak admissibility (was actually standard admissibility)
fixed MBLR clustering (ordering only in one dimension led to extremely rectangular clusters)
fixed bug in lowrank approximation (wrong type conversion)
fixed non-SIMD implementation of TExpBF (still had additional factor)
fixed TDenseCoeffFn in case given dense matrix had non-zero row/column offsets
replaced tbb::mutex and tbb::atomic by std versions since marked obsolete in recent TBB versions

v2.8

improved new DAG generation system (better speed and parallel scalability) and made it the default system (old version still available with CFG::dag_version=1)
moved non-generic include files into hpro sub-directory for better separation with other libraries
added various approximation routines for sums of matrices and operators, e.g., SVD, pairwise SVD, Rand-SVD, Rand-LR and ACA for operator sums
expanded lazy accumulator arithmetic to move all updates to leaves only (evaluating all updates simultaneously; see CFG::Arith::lazy_eval and CFG::Arith::sum_approx)
added TZeroMatBuilder and build_zero_mat to construct empty matrix for given block cluster tree, e.g., as pre-initialized result of other H-matrix operations
parallelization of various routines during clustering, e.g., sorting, etc. (may result in slightly different clustering with different number of CPU cores)
bug fixes:
- permutation of dense matrix in TMatBuilder removed (was inconsistent with the behaviour when an H-matrix is built)
- fixed update of aux. data in H-matrix in copy_nearfield
- fixed reading of old HLIB files (wrong processor sets)
- fixed return value of Mem::usage
- fixed various issues when using single precision

v2.7

v2.7.2

fixed various compiler issues with MS Visual C++
added build method for coefficient functions to return dense matrix for given index set
C bindings:
- added hlib_matrix_to_dense/rank
- added hlib_matrix_approx_rank to compute low-rank approximation of given matrix with different methods
fixed issue in TBSPNDCTBuilder when no interface is present
fixed issue in HLIB::Mem::usage

v2.7.1

fixed issue with HDF5 library but removed support from binary distribution due to linking problems with newer version of libHDF5
added missing functions to sequential NET interface
added functions to directly set LR matrices in TRkMatrix
using internal grid generation also in example code (laplace/helmholtz)
additional spherical grid (different start grid for "in-between" steps)
improved coordinate visualization in PostScript format (better minimal distance estimate)

v2.7

new DAG generation based on recursive algorithms with automatic deduction of dependencies between nodes (default: previous DAG; see CFG::dag_version),
new coefficient function for Matern kernel (TMaternCovCoeffFn) and exponential (\(e^{-|x-y|_2}\)) bilinear form (TExpBF),
functions for computing low-rank approximations for sum of matrices directly (using pair-wise SVD approx_sum_svd, randomized SVD approx_sum_randsvd or randomized low-rank approximation approx_sum_randlr),
ACA:
- modified stop criterion for ACA (user controllable maximal rank CFG::Build::aca_max_ratio),
- added dense fallback for ACA if it does not converge, computing only those coefficients not yet computed,
added MBLR cluster tree construction (TMBLRCTBuilder),
modified handling of matrix coefficient functions, especially TPermCoeffFn. Instead of
    TMyCoeffFn coeff_fn( ..., ct->perm_i2e(), ct->perm_i2e() );

please use
    TMyCoeffFn my_coeff_fn( ... );

    TPermCoeffFn< ... > coeff_fn( & my_coeff_fn, ct->perm_i2e(), ct->perm_i2e() );
(limited) grid generation and refinement in BEM library (see Boundary Element Methods),
using libmvec for sin, cos and exp if available (glibc v2.22 and up) with significant speedups in complex valued computations.
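The TPermCoeffFn change above separates the user's coefficient function from the index permutation. The following self-contained sketch shows that idea only. `ExpCoeff` and `PermCoeff` are hypothetical stand-ins, not the library classes, and the kernel \(e^{-|x-y|}\) on 1D points is chosen purely to have something to evaluate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical user coefficient function: entries exp(-|x_i - y_j|)
// for 1D points, addressed by external (user) indices.
struct ExpCoeff
{
    std::vector< double >  x, y;

    double eval ( std::size_t  i, std::size_t  j ) const
    {
        return std::exp( - std::abs( x[i] - y[j] ) );
    }
};

// Hypothetical wrapper: takes internal (cluster tree) indices and maps
// them to external indices before calling the wrapped function.
template < typename coeff_t >
struct PermCoeff
{
    const coeff_t *             base;
    std::vector< std::size_t >  row_i2e;  // internal -> external (rows)
    std::vector< std::size_t >  col_i2e;  // internal -> external (columns)

    double eval ( std::size_t  i, std::size_t  j ) const
    {
        assert( i < row_i2e.size() && j < col_i2e.size() );

        return base->eval( row_i2e[i], col_i2e[j] );
    }
};
```

The wrapped function never sees permutations, so it can be written and tested in the user's own ordering. Only the wrapper needs to know the mapping from the cluster tree.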

v2.6

Accumulator based H-arithmetic reducing number of truncations with support for lazy and eager evaluation
added randomized SVD and implemented dense approximation and lowrank truncation for all types (SVD, RRQR and Rand-SVD)
also added lowrank approximation algorithms for RRQR and Rand-SVD for H-construction (TRRQRLRApx, TRandSVDLRApx)
support for special flat H-hierarchy with optimised arithmetic functions, e.g., in-place inversion.
support for block refinement during matrix construction, e.g., if admissibility gives false positives
added infinity matrix norm (TInfinityNorm and norm_inf())
implemented TOffDiagAdmCond with all off-diagonal blocks being admissible
massive code restructuring and cleanup
initial support for HDF5 matrix IO (dense and lowrank)
support for VSX instruction set (POWER CPUs)
special handling for all BLAS functions in case of parallel Intel MKL
added parameter configuration with config files
added some functions to simplify solver stopping criterion
changed behaviour/incompatibilities with previous versions:
- removed all permutation handling from THMatrix and TNearfieldMulVec
- no recompression in ACA/HCA (now only in matrix builders)
- ptrcast() now consistent with cptrcast(), i.e., no * needed
- some parameter reorganization (see Parameters)
- previous TMatrix::copy_struct is renamed to TMatrix::copy_struct_from (TMatrix::copy_struct will now return a matrix copy without data)
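For reference, the infinity matrix norm mentioned above is the maximal absolute row sum, \(\|A\|_\infty = \max_i \sum_j |a_{ij}|\). A minimal dense sketch follows. `dense_t` and `norm_inf_dense` are hypothetical names for illustration and say nothing about how TInfinityNorm evaluates the norm for H-matrices.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical dense matrix type: a vector of rows.
using dense_t = std::vector< std::vector< double > >;

// Infinity norm of a dense matrix: maximum over all rows of the
// sum of absolute values of the row entries.
double norm_inf_dense ( const dense_t &  A )
{
    double  result = 0.0;

    for ( const auto &  row : A )
    {
        double  sum = 0.0;

        for ( const double  a : row )
            sum += std::abs( a );

        result = std::max( result, sum );
    }

    return result;
}
```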

v2.5

v2.5.1

added operator for matrix sum (TMatrixSum) in addition to matrix products
added missing bindings in C interface (matrix product/sum, apply, apply_add)
fixed issue in clustering with predefined partition with many groups
support for more matrix formats and more robust IO (if files do not follow standard) for Harwell-Boeing/Matrix-Market format
re-enabled parallel block cluster tree construction on shared memory
bug fixes:
- in solve_diag_left_block
- matrix solves in TLLInvMatrix

v2.5.0

implemented rank revealing QR based low-rank truncation
Solvers:
- added CGS and TFQMR solvers
- added support for matrix solves in linear iteration (also H-matrices!)
- optional computation of exact residual during iteration
- simplified handling of stop criterion parameters
- using status field in TSolverInfo instead of exception if solver fails (e.g., breakdown)
- some code restructuring
support for block-wise Jacobi and Gauss-Seidel operators
support for AVX512 instruction set
many new user controllable parameters (see Parameters)
misc.:
- handling of diagonal in factorisation (inverse or normal) now a runtime option (default: inverse)
- added optional distance for TWeakAlgAdmCond to support distances other than one
- support for fixed rank 0
- correct progress bar support for WAZ factorisation and inversion
- some reorganization of source/header files
C bindings:
- added hlib_admcond_geom_hilo for THiLoFreqGeomAdmCond
- additional parameter for blockdiag functions (blocksize)
fixed serious issue with Intel TBB and with current Intel TBB-based Intel MKL
various bug fixes
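The switch from exceptions to a status field in the solver changes above can be pictured with the following sketch. Everything in it is hypothetical: `SolveInfo` is not the layout of TSolverInfo, and the scalar Richardson iteration only serves as a tiny solver that can converge, stagnate or break down.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical solver report (illustration only).
struct SolveInfo
{
    enum class Status { converged, not_converged, failed };

    Status       status     = Status::not_converged;
    std::size_t  iterations = 0;
};

// Damped Richardson iteration x += omega * ( b - a*x ) for the scalar
// equation a*x = b. Problems are reported via info.status, not thrown.
double richardson_scalar ( double       a,
                           double       b,
                           double       omega,
                           double       tol,
                           std::size_t  max_iter,
                           SolveInfo &  info )
{
    double  x = 0.0;

    info = SolveInfo{};

    for ( std::size_t  it = 0; it < max_iter; ++it )
    {
        const double  r = b - a * x;

        if ( ! std::isfinite( r ) )
        {
            // breakdown: record it and return instead of throwing
            info.status = SolveInfo::Status::failed;
            return x;
        }

        if ( std::abs( r ) <= tol )
        {
            info.status = SolveInfo::Status::converged;
            return x;
        }

        x += omega * r;
        info.iterations = it + 1;
    }

    return x;  // status stays not_converged
}
```

The caller inspects `info.status` after the call, which keeps a breakdown on the normal control path.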

v2.4

Added factorization of inverse matrix \(WAZ = I\), enabling vector solves using matrix vector mult. instead of forward/backward solves with much better parallel speedup.
Significant improvements in parallel performance of matrix inversion.
Improved performance of LU, matrix-vector mult., forward/backward solves.
Added function nearfield_sparse to extract H-matrix nearfield as sparse matrix.
Switched to adaptive_split_axis as default for clustering.
Minor Changes:
- Additional options for matrix visualization (colormap, etc.)
- Basic VTK output of block clusters.
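The identity \(WAZ = I\) above implies \(A^{-1} = ZW\), so a solve \(Ax = b\) reduces to two matrix-vector products, \(x = Z(Wb)\). The sketch below shows this for a 2x2 example. The types and function names are hypothetical, and the explicit dense factors merely stand in for the factorised H-matrices.

```cpp
#include <array>

using vec2 = std::array< double, 2 >;
using mat2 = std::array< vec2, 2 >;

// Dense 2x2 matrix-vector product.
vec2 mul_vec ( const mat2 &  M, const vec2 &  v )
{
    return { M[0][0] * v[0] + M[0][1] * v[1],
             M[1][0] * v[0] + M[1][1] * v[1] };
}

// Solve A x = b given factors W, Z with W A Z = I:
// only matrix-vector products, no forward/backward substitution.
vec2 solve_waz ( const mat2 &  W, const mat2 &  Z, const vec2 &  b )
{
    return mul_vec( Z, mul_vec( W, b ) );
}
```

For \(A = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}\) one valid pair is \(W = \begin{pmatrix} 1 & 0 \\ -0.5 & 1 \end{pmatrix}\) and \(Z = \begin{pmatrix} 0.25 & -0.25 \\ 0 & 0.5 \end{pmatrix}\), the inverses of the LU factors of \(A\).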

v2.3

v2.3.2

fixed various bugs and race conditions
extended ctors of various matrix classes to accept optional value type field
added example on how to assemble block matrices

v2.3.1

Fixed two bugs in point-wise LU.
Solver changes:
- Refactored solver classes (no interface changes); added TRichardson to replace TSolver in the future.
- Fixed inconsistent computation of residual norm in solvers. Now Richardson, CG and BiCG will compute standard residual norm, while MINRES and GMRES compute preconditioned residual norm.
- Made initialisation of start vector in solver classes optional (function initialise_start_value)
Added function "diagonal" to extract diagonal of a matrix.
Added example "spectrum" to compute spectrum of graph Laplacian.
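To make the distinction between the two residual norms in the solver changes above concrete, here is a small sketch assuming left preconditioning with a preconditioner \(W\): the standard norm is \(\|b - Ax\|_2\), the preconditioned one \(\|W(b - Ax)\|_2\). All names are hypothetical and the 2x2 types are for illustration only.

```cpp
#include <array>
#include <cmath>

using vec2 = std::array< double, 2 >;
using mat2 = std::array< vec2, 2 >;

// Dense 2x2 matrix-vector product.
vec2 mul_vec ( const mat2 &  M, const vec2 &  v )
{
    return { M[0][0] * v[0] + M[0][1] * v[1],
             M[1][0] * v[0] + M[1][1] * v[1] };
}

// Euclidean norm.
double norm2 ( const vec2 &  v )
{
    return std::sqrt( v[0] * v[0] + v[1] * v[1] );
}

// Residual r = b - A x.
vec2 residual ( const mat2 &  A, const vec2 &  x, const vec2 &  b )
{
    const vec2  Ax = mul_vec( A, x );

    return { b[0] - Ax[0], b[1] - Ax[1] };
}

// Standard residual norm |b - A x|.
double standard_residual_norm ( const mat2 &  A, const vec2 &  x, const vec2 &  b )
{
    return norm2( residual( A, x, b ) );
}

// Preconditioned residual norm |W (b - A x)| (left preconditioning assumed).
double precond_residual_norm ( const mat2 &  W, const mat2 &  A,
                               const vec2 &  x, const vec2 &  b )
{
    return norm2( mul_vec( W, residual( A, x, b ) ) );
}
```

The two values generally differ for the same iterate, so stop criteria are not directly comparable between the two groups of solvers.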

v2.3.0

Fixed issues when solving dense matrices (used in new example for many RHSs).
Modified THiLoFreqGeomAdmCond: now maximal number of wavelengths per cluster is tested.
Refactored geometrical clustering classes and partitioning strategies, thereby fixing several issues.
C++11 changes:
- most object creating functions now return std::unique_ptr,
- replaced typedef by using,
- added iterators for TIndexSet, TNodeSet, TGraph, TProcSet (for range based for).
- Note: needs at least GCC v4.7 or equivalent!
Added parameter to algebraic clustering in C bindings to define partitioning algorithm (BFS, multi level, METIS or Scotch).
Fixed issues with progress bar during factorisation (wrong block count).
Removed BSP style communication functions (MPI only now).
Finished conversion to new packed_t SIMD type. Using SSE3 instead of SSE2.
Added lock to TScotchAlgPartStrat because Scotch is not multi thread safe.
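The three C++11 changes listed above (std::unique_ptr return values, using instead of typedef, iterators for range based for) can be sketched together as follows. `IndexRange`, `make_range` and `sum_indices` are hypothetical and do not reproduce TIndexSet or any other library class.

```cpp
#include <cstddef>
#include <memory>

// Alias declaration with "using" instead of typedef.
using idx_t = std::size_t;

// Hypothetical index set { first, ..., last } with iterator support.
class IndexRange
{
public:
    class iterator
    {
    public:
        explicit iterator ( idx_t  pos ) : _pos( pos ) {}

        idx_t       operator *  () const { return _pos; }
        iterator &  operator ++ ()       { ++_pos; return *this; }
        bool        operator != ( const iterator &  other ) const
        {
            return _pos != other._pos;
        }

    private:
        idx_t  _pos;
    };

    IndexRange ( idx_t  first, idx_t  last )
            : _first( first ), _last( last )
    {}

    iterator  begin () const { return iterator( _first ); }
    iterator  end   () const { return iterator( _last + 1 ); }

private:
    idx_t  _first, _last;
};

// Object creating function returning std::unique_ptr, so ownership
// is explicit (C++11 compatible, hence no std::make_unique).
std::unique_ptr< IndexRange > make_range ( idx_t  first, idx_t  last )
{
    return std::unique_ptr< IndexRange >( new IndexRange( first, last ) );
}

// Range based for over the index set.
idx_t sum_indices ( const IndexRange &  range )
{
    idx_t  sum = 0;

    for ( auto  i : range )
        sum += i;

    return sum;
}
```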

v2.2

Removing implicit reordering of unknowns during matrix-vector multiplication to fix inconsistent behaviour. Please use permutations from cluster trees or H-matrices to reorder vectors or TPermMatrix to represent permuted matrices instead.
Speedup improvements for matrix inversion. Triangular inversion and matrix multiplication available in standard user interface.
Import/export from/to CCS/CRS matrices simplified.
Simplified (and faster) mutex wrapper.
Several C++11 changes.

v2.1

Removing reference counters in BLAS interface due to major performance issue on multi-core (-socket) systems. See BLAS/LAPACK Interface on how to use the modified interface (and avoid errors).
New, scalable matrix-vector multiplication implemented.
Using generic datatype for SIMD instructions, thereby enabling generic SIMD algorithms, e.g. for BEM kernels, and fast adoption of new SIMD instructions, e.g. AVX2.
Started to use block-wise operations if dense matrices are combined with blocked matrices (e.g. during matrix multiplication) instead of vector operations.
Removed TVirtualVector (replaced by TScalarVector).
Fixed issue with MatrixMarket format (leading whitespaces).

v2.0

v2.0.2

fixed race condition in C bindings
fixed issue with initialisation of static variables

v2.0.1

bug fixes

v2.0.0

Major Changes
- Switched from OpenMP to Threading Building Blocks as interface to shared memory parallelism, thereby also changing most algorithms to task-based parallelism.
- Reducing dependency on external libraries by using C++11 features. Also replacing some classes by default C++ versions (finally removing old code).
- Alternative, non-recursive, level-wise ℋ-LU factorisation based on explicit block dependencies, which provides far better speedup on many-core systems, e.g. Intel MIC architecture.
- New ℋ-LU factorisation algorithm also applicable in distributed environments, yielding better load-balancing (albeit with limited speedup).
- Added support for multiple CPUs to many algorithms, e.g. in clustering, norm computations, matrix-vector multiplication and solves, ℋ²-conversion.
Minor Changes
- Optimised BEM kernels for Intel MIC architecture.
- Introduced TLinearOperator for operators not supporting TMatrix functionality, e.g. factorised matrices.
- HLIBpro file format changed due to internal changes and due to some bugs in the format. However, backward read compatibility for most files written with earlier versions is kept.
- Added support for Cairo library, thereby providing PDF output.
And of course: many smaller feature upgrades and bug fixes.