# News

- using *generic value types* for all major types (matrices, vectors and linear operators)
- officially renamed namespace and header files to `Hpro` and `hpro` (old names still valid for compatibility)
- C bindings split into separate functions per value type; also with `hpro` prefix now (old function names and functionality are still supported)
- support for *mixed precision* computations for matrix-vector multiplication and in solver classes

- added `apply_add` with `BLAS::Matrix` as argument to `TLinearOperator` classes
- added `absolute_prec` to define `TTruncAcc`
- added support for NEON instruction set (Apple M1)
- added support for Mongoose graph partitioning library (`TMongooseAlgPartStrat`)
- enhanced `TAlgAdmCond` to define maximal number of allowed connecting edges
- fixes:
  - issue in `BLAS::qrp` for matrices with nrows < ncols
  - coefficient tests in `TPermCoeffFn` (were missing before)
  - issue in `TMatrixProduct` with single factor
  - issue in `TMaternCovCoeffFn` with different row/column coordinates


- replaced `HLIB::complex` by `std::complex`
- removed old DAG interface
- modified pivot strategy of standard ACA (better, more robust convergence)
- using `geqr2` instead of `geqrf` for QR factorization (slightly faster)
- fixes:
  - replaced deprecated features of TBB
  - `TGeomGroupCTBuilder`: fixed handling of offsets
  - missing instantiation of `BLAS::random` for `BLAS::Vector` added
  - wrong solving flags in `TLDUInvMatrix`
  - inefficient dependency handling in DAG construction for TLR/Tile-H
  - memory leak in recursive DAG construction fixed

- fixed weak admissibility (was actually standard admissibility)
- fixed MBLR clustering (ordering only in one dimension led to extremely rectangular clusters)
- fixed bug in lowrank approximation (wrong type conversion)
- fixed non-SIMD implementation of TExpBF (still had additional factor)
- fixed TDenseCoeffFn in case given dense matrix had non-zero row/column offsets
- replaced `tbb::mutex` and `tbb::atomic` by `std` versions since they are marked obsolete in recent TBB versions

- improved new DAG generation system (better speed and parallel scalability) and made it the default system (old version still available with `CFG::dag_version=1`)
- moved non-generic include files into `hpro` sub-directory for better separation from other libraries
- added various approximation routines for sums of matrices and operators, e.g., SVD, pairwise SVD, Rand-SVD, Rand-LR and ACA for operator sums
- expanded lazy accumulator arithmetic to move all updates to leaves only (evaluating all updates simultaneously; see `CFG::Arith::lazy_eval` and `CFG::Arith::sum_approx`)
- added `TZeroMatBuilder` and `build_zero_mat` to construct an empty matrix for a given block cluster tree, e.g., as pre-initialized result of another H-matrix operation
- parallelization of various routines during clustering, e.g., sorting (may result in slightly different clustering with different numbers of CPU cores)
- bug fixes:
  - permutation of dense matrix in `TMatBuilder` removed (inconsistent behaviour compared to the case where an H-matrix is built)
  - fixed update of aux. data in H-matrix in `copy_nearfield`
  - fixed reading of old HLIB files (wrong processor sets)
  - fixed return value of `Mem::usage`
  - fixed various issues when using single precision

- fixed various compiler issues with MS Visual C++
- added `build` method for coefficient functions to return dense matrix for given index set
- C bindings:
  - added `hlib_matrix_to_dense/rank`
  - added `hlib_matrix_approx_rank` to compute low-rank approximation of given matrix with different methods

- fixed issue in `TBSPNDCTBuilder` when no interface is present
- fixed issue in `HLIB::Mem::usage`
- fixed issue with HDF5 library but removed support from binary distribution due to linking problems with newer versions of libHDF5
- added missing functions to sequential `NET` interface
- added functions to directly set LR matrices in `TRkMatrix`

- using internal grid generation also in example code (laplace/helmholtz)
- additional spherical grid (different start grid for “in-between” steps)
- improved coordinate visualization in PostScript format (better minimal distance estimate)

- new DAG generation based on recursive algorithms with automatic deduction of dependencies between nodes (default: previous DAG; see `CFG::dag_version`)
- new coefficient function for Matern kernel (`TMaternCovCoeffFn`) and exponential bilinear form (`TExpBF`)
- functions for computing low-rank approximations for sums of matrices directly (using pair-wise SVD `approx_sum_svd`, randomized SVD `approx_sum_randsvd` or randomized low-rank approximation `approx_sum_randlr`)
- ACA:
  - modified stop criterion for ACA (user controllable maximal rank `CFG::Build::aca_max_ratio`)
  - added dense fallback for ACA if not converging, computing only those coefficients not yet computed
- added MBLR cluster tree construction (`TMBLRCTBuilder`)
- modified handling of matrix coefficient functions, especially `TPermCoeffFn`
- (limited) grid generation and refinement in BEM library
- using *libmvec* for sin, cos and exp if available (glibc v2.22 and up) with significant speedups in complex valued computations
- new academic license **without** any user/host or date limitation

- Accumulator based H-arithmetic reducing number of truncations with support for lazy and eager evaluation
- added randomized SVD and implemented dense approximation and lowrank truncation for all types (SVD, RRQR and Rand-SVD)
- also added lowrank approximation algorithms for RRQR and Rand-SVD for H-construction (`TRRQRLRApx`, `TRandSVDLRApx`)
- support for special flat H-hierarchy with optimised arithmetic functions, e.g., in-place inversion
- support for block refinement during matrix construction, e.g., if admissibility gives false positives
- added infinity matrix norm (`TInfinityNorm` and `norm_inf()`)
- implemented `TOffDiagAdmCond` with all off-diagonal blocks being admissible
- massive code restructuring and cleanup
- initial support for HDF5 matrix IO (dense and lowrank)
- support for VSX instruction set (POWER CPUs)
- special handling for all BLAS functions in case of parallel Intel MKL
- added parameter configuration with config files
- added some functions to simplify solver stopping criteria
- changed behaviour/incompatibilities with previous versions:
  - removed all permutation handling from `THMatrix` and `TNearfieldMulVec`
  - no recompression in ACA/HCA (now only in matrix builders)
  - `ptrcast()` now consistent with `cptrcast()`, i.e., no `*` needed
  - some parameter reorganization
  - previous `TMatrix::copy_struct` is renamed to `TMatrix::copy_struct_from` (`TMatrix::copy_struct` will now return a matrix copy without data)


- added operator for matrix sum (`TMatrixSum`) in addition to matrix products
- missing bindings in C interface (matrix product/sum, `apply`, `apply_add`)
- fixed issue in clustering with predefined partition with many groups
- support for more matrix formats and more robust IO (if files do not follow standard) for Harwell-Boeing/Matrix-Market format
- re-enabled parallel block cluster tree construction on shared memory
- bug fixes:
  - in `solve_diag_left_block`
  - matrix solves in `TLLInvMatrix`


- implemented rank revealing QR based low-rank truncation
- Solvers:
  - added CGS and TFQMR solvers
  - added support for matrix solves in linear iteration (also ℋ-matrices!)
  - optional computation of exact residual during iteration
  - simplified handling of stop criterion parameters
  - using status field in `TSolverInfo` instead of exception if solver fails (e.g., breakdown)
  - some code restructuring
- support for block-wise Jacobi and Gauss-Seidel operators
- support for AVX512 instruction set
- many new user controllable parameters
- misc.:
  - handling of diagonal in factorisation (inverse or normal) now a runtime option (default: inverse)
  - added optional distance for `TWeakAlgAdmCond` to support distances other than one
  - support for fixed rank 0
  - correct progress bar support for WAZ factorisation and inversion
  - some reorganization of source/header files
- C bindings:
  - added `hlib_admcond_geom_hilo` for `THiLoFreqGeomAdmCond`
  - additional parameter for `blockdiag` functions (blocksize)

- fixed serious issue with Intel TBB and with current Intel TBB-based Intel MKL
- various bug fixes

- Added factorization of inverse matrix *WAZ = I*, enabling vector solves using matrix-vector multiplication instead of forward/backward solves with much better parallel speedup.
- Significant improvements in parallel performance of matrix inversion.
- Improved performance of LU, matrix-vector multiplication and forward/backward solves.
- Added function `nearfield_sparse` to extract H-matrix nearfield as sparse matrix.
- Switched to `adaptive_split_axis` as default for clustering.
- Minor Changes:
  - Additional options for matrix visualization (colormap, etc.)
  - Basic VTK output of block clusters.

- fixed various bugs and race conditions
- extended ctors of various matrix classes to accept optional value type field
- added example on how to assemble block matrices

- Fixed two bugs in point-wise LU.
- Solver changes:
  - Refactored solver classes (no interface changes); added `TRichardson` to replace `TSolver` in the future.
  - Fixed inconsistent computation of residual norm in solvers. Now Richardson, CG and BiCG compute the standard residual norm, while MINRES and GMRES compute the preconditioned residual norm.
  - Made initialisation of start vector in solver classes optional (function `initialise_start_value`).

- Added function `diagonal` to extract diagonal of a matrix.
- Added example *spectrum* to compute spectrum of graph Laplacian (see also documentation).

- Fixed issues when solving dense matrices (used in new example for many RHSs).
- Modified `THiLoFreqGeomAdmCond`: now maximal number of wavelengths per cluster is tested.
- Refactored geometrical clustering classes and partitioning strategies, thereby fixing several issues.
- C++11 changes:
  - most object creating functions now return `std::unique_ptr`,
  - replaced `typedef` by `using`,
  - added iterators for `TIndexSet`, `TNodeSet`, `TGraph`, `TProcSet` (for range based `for`).

- Added parameter to algebraic clustering in C bindings to define partitioning algorithm (BFS, multi level, METIS or Scotch).
- Fixed issues with progress bar during factorisation (wrong block count).
- Removed BSP style communication functions (MPI only now).
- Finished conversion to new `packed_t` SIMD type. Using SSE3 instead of SSE2.
- Added lock to `TScotchAlgPartStrat` because Scotch is not multi-thread safe.

- Removing implicit reordering of unknowns during matrix-vector multiplication to fix inconsistent behaviour. Please use permutations from cluster trees or ℋ-matrices to reorder vectors or `TPermMatrix` to represent permuted matrices instead.
- Speedup improvements for matrix inversion. Triangular inversion and matrix multiplication available in standard user interface.
- Import/export from/to CCS/CRS matrices simplified.
- Simplified (and faster) mutex wrapper.
- Several C++11 changes.

- Removing reference counters in BLAS interface due to major performance issue on multi-core (-socket) systems. See documentation on how to use the modified interface (and avoid errors).
- New, scalable matrix-vector multiplication implemented.
- Using generic datatype for SIMD instructions, thereby enabling generic SIMD algorithms, e.g. for BEM kernels, and fast adoption of new SIMD instructions, e.g. AVX2.
- Removed `TVirtualVector` (replaced by `TScalarVector`).
- and, as usual: several bugs fixed

- fixed race condition in C bindings
- fixed issue with initialisation of static variables

- fixed some bugs

- Major Changes
  - Switched from OpenMP to Threading Building Blocks as interface to shared memory parallelism, thereby also changing most algorithms to *task-based* parallelism.
  - Reducing dependency on external libraries by using C++11 features. Also replacing some classes by default C++ versions (finally removing old code).
  - Alternative, non-recursive, level-wise ℋ-LU factorisation based on explicit block dependencies, which provides far better speedup on many-core systems, e.g. Intel MIC architecture.
  - New H-LU factorisation algorithm also applicable in distributed environments, yielding better load-balancing (albeit with limited speedup).
  - Added support for multiple CPUs to many algorithms, e.g. in clustering, norm computations, matrix-vector multiplication and solves, H²-conversion.
- Minor Changes
  - Optimised BEM kernels for Intel MIC architecture.
  - Introduced `TLinearOperator` for operators not supporting `TMatrix` functionality, e.g. factorised matrices.
  - HLIBpro file format changed due to internal changes and due to some bugs in the format. However, backward read compatibility for most files written with earlier versions is kept.
  - Added support for Cairo library, thereby providing PDF output.

- And of course: many smaller feature upgrades and bug fixes.

- Matrix Construction:
  - Switched to template based coefficient functions (`TCoeffFn` and derived) and all dependent classes, e.g. `TDenseMBuilder`, SVD and ACA low rank approximation.
  - Rewrote HCA:
    - Simpler interface containing all necessary functionality in single class.
    - Using template for value type.
    - Added base classes for permuted indices and for BEM applications using quadrature.
    - Added implementation for Laplace and Helmholtz also for linear ansatz spaces and with support for SSE2 and AVX.
  - Cleaned up ACA implementation.
  - Changed handling of recompression: should now be handled by default by the low rank approximation algorithm and not by the matrix construction class (to avoid recompression of optimal results).

- Clustering Changes:
  - Added `TNDBSPPartStrat` to be used in connection with nested dissection (trying various clusterings and choosing best for ND).
  - Modified `TNDBSPCTBuilder` to more closely resemble algebraic version, e.g. average depth for interface clusters instead of maximal.
  - Fixed bug in PCA based clustering and added version for cardinality based clustering.
  - Added various flags to modify clustering, e.g. synchronisation of interface depth, enforcing block clusters with same depth of corresponding clusters, using symmetrised weights in algebraic clustering.
- Input/Output and visualisation:
  - Fixed bug in reading dense matrices.
  - Changed order of dimension for coordinate IO using Matlab format: now ncoord × dimension (e.g. as also used by Sparse Matrix Collection).
  - Added VTK visualisation for coordinates (with various options, e.g. marking clusters or index connectivity) and BEM grids.
  - Added output of grids in HLIB format.
  - Added coordinate IO in MatrixMarket format.
- Changes in LAPACK wrapper:
  - added LAPACK workspace queries for optimal workspace size instead of using predefined block size
  - using `xGESDD` for *large* matrices
- various bug fixes.

- Deactivated default coarsening during matrix construction.
- Added special H² matrix builder with predefined cluster bases.

- changes in BEM code:
  - Added support for AVX.
  - Performance speedups in SSE2 implementation of Helmholtz and Maxwell kernels.
  - Runtime detection of SSE2/AVX availability and automatic choice of optimal kernel.
- Added `matrix_format` function to matrix coefficient functions to define whether unsymmetric, symmetric or hermitian (default: unsymmetric). Default `build` function in matrix builders now without format argument.
- Added support for ILP64 BLAS/LAPACK implementations (64bit integers).
- Added support for AMD-LibM (integrated in binary Linux distributions).
- Added vector IO in MatrixMarket format.
- Cleaned up C++ examples (thereby also removing Boost link dependency).
- several bug fixes

- OpenMP exception handling changed: now all threads will stop as soon as possible in case of an error
- fixed several, previously undetected, non-critical compiler warnings (MS Visual C++)
- bug fixes

HLIBpro v1.0 is a major rewrite/reorganisation of many of the H-matrix algorithms. The following list of changes covers only the main topics and is far from complete.

- added distributed computing via MPI for matrix construction and factorisation
- added H²-matrices
- added internal multi-level graph partitioning for blackbox clustering
- added support for piecewise linear basis functions and Maxwell EFIE/MFIE
- rewrote interface to BLAS/LAPACK
- rewrote C interface with better mapping of internal C++ and C types
- increased robustness of matrix factorisation in case of ill-conditioned matrices
- increased speedup of matrix factorisation in multi-threaded computations
- many performance improvements and bug fixes

- added optional diagonal scaling of H-matrices during LU factorisation
- added blockwise accuracy, e.g. accuracy depending on current matrix block
- rewrote accuracy handling in C bindings
- simplified BSP partitioning methods and added regular cardinality based and principal component based clustering
- added optional balancing of tree depth in cluster tree construction with predefined partitioning
- implemented optional double precision computation of matrix inversion and low-rank truncation in single precision mode
- fixed bug in calling single precision norm functions of LAPACK
- fixed bug in PostScript output and modified H-matrix output in PostScript format
- added support for Jacobi based SVD (`sgejsv` and `dgejsv`) in LAPACK v3.2

- removed ID based cluster tree computations in matrices
- always computing SCC in algebraic clustering, also in nested dissection clustering
- reordering clusters depending on size ratio (large first)
- fixed bug with filenames without directories
- fixed non-exception safe OpenMP usage
- added matrix reduction to nearfield part
- added dense low-rank multiplication if result is large dense matrix

- fixed solve functions in `TLU`, `TLDL` (checking for `NULL` blocks)
- fixed OpenMP call with zero threads in `TLU`, `TLDL` and `TMatrixInv`
- fixed `operator =` in `autoptr` (wrong const)
- removed unnecessary checks in `TArray::copy`
- fixed recursive call in `restrict_blockdiag`
- replaced fixed constants by type dependent constants in lapack.cc
- fixed `TMatrixInv::multiply_diag` when only D is dense

- fixed several warnings from Visual C++ and Intel C++ compilers
- moved all global variables and functions into `HLIB` namespace (except `xerbla` override)
- enabled user defined prefix for functions and types in C interface and added override for namespace name
- reactivated cardinality check when using `HLIB_BSP_AUTO`
- replaced threads and mutexes by OpenMP (thread start only, no scheduling)
- included log file support in addition to stdout
- added parallel LDLᵀ factorisation (DD and blockdiag only)
- added parallel blockdiag LU factorisation
- added zero approximation during matrix construction (for nearfield only)
- fixed bug in algebraic nested dissection clustering (wrong path length in interface)

- reduced memory consumption/fragmentation in ACA generated matrices with large rank
- added Fiduccia/Mattheyses bisection optimisation for BFS clustering
- added FFT for vectors by implementing support for FFTW3 (optional)
- fixed bug in TBSPPartCTBuilder when using more than two partitions
- fixed potential issues in sorting algorithms
- fixed type issues with `*_bytesize` functions in C interface
- fixed bug in PostScript visualisation of matrices if matrix norm is zero
- fixed issues with GCC-4.3
- fixed bug in command line parsing of configuration system
- minor modifications to SCons system to increase user-friendliness

**General Algorithmic Changes**

- support for single precision arithmetic; has to be decided *before* compiling HLIBpro
- made complete C++ functions and classes visible from outside instead of just C interface functions
- rewrote complex arithmetic to distinguish between symmetric and hermitian matrices; added LDLᴴ and LLᴴ factorisations
- inversion now based on LU, thereby reducing memory consumption (roughly halved)
- added computation of the diagonal of the inverse without computing the inverse
- added evaluation of LU, LDLᵀ factorisations (instead of just solving)
- removed point-wise LU and LDLᵀ factorisation (only blocked) to improve robustness with zeroes on diagonal
- added (optional) check and fix for singular sub matrices during inversion and factorisation
- added complex valued HCA
- new version of ACA+
- multiplication C = ADB with diagonal D implemented
- implemented bilinear forms for Helmholtz single and double layer potential
- implemented bilinear form for acoustic scattering
- rewrote algebraic clustering for sparse matrices; added support for Scotch and CHACO
- added support for periodic coordinates in clustering
- added clustering with user defined index partition on first level in cluster tree
- added standard admissibility for algebraic clustering
- added maximal level in clustering to prevent infinite recursion
- modified solvers to handle complex valued data
- added permutation of dense matrices without temporary storage (needed in IO)


**Parallel Arithmetic**

- added thread parallel algorithms for matrix construction, matrix multiplication, inversion and LU factorisation
- redesigned thread pool, thereby fixing race conditions
- added support for Windows threads
- fixed several issues with thread safety

**Input and Output**

- added general I/O functions with autodetection of file format
- added output of matrices in Harwell/Boeing format
- added MatrixMarket format
- added support for Ply and surface mesh format (NetGen) for Grid I/O
- fixed format errors in SAMG output
- conversion of arbitrary matrices to sparse format when writing in SAMG or Harwell/Boeing format
- fixed support for symmetric matrices in Harwell/Boeing format

**C interface**

- prefixed *all* functions, types and constants with `hlib_` (or `HLIB_`) to prevent collisions with other definitions (OS or libraries)
- added support for C99 complex types (if available)
- added `hlib_set_coarsening` to activate/deactivate coarsening during matrix construction (default: on) and matrix arithmetic (default: off)
- added `hlib_matrix_inv_diag` to return diagonal of inverse
- added `hlib_matrix_is_complex` to test for real or complex valued matrices
- added `hlib_set_nthreads` to set number of threads
- added `hlib_coord_t` as special type for coordinates
- separated stop criterion and solver in solver interface


**Miscellaneous**

- updated CPUflags and Rmalloc
- fixed optimisation issues (leading to infinite loops) in enclosed CLAPACK

**Algorithmic Changes**

- added (blocked) LDLᵀ factorisation (now default for symmetric matrices)
- no longer need extra matrix in matrix inversion
- using ACAFull in HCA (instead of SVD)
- adaptively choosing quadrature and interpolation order in ACA and HCA
- rewrote matrix addition to support general cases, e.g. low-rank to blocked
- rewrote low-rank truncation handling
- support for METIS in algebraic clustering routines
- added basic support for “dense” sparse matrices, e.g. with highly coupled indices
- added SSE2 based HCA algorithm
- added infinity norm for vectors
- using norm of preconditioned residual for all solvers if preconditioner is present
- added MINRES iteration
- using `ADM_AUTO` as default admissibility
- finally removed all asserts and replaced them by internal error checking


**Input and Output**

- VRML97 support
- added Matlab compression (Matlab v7) and structs support
- support for Harwell-Boeing matrix format (read-only)
- modified PostScript output of block-wise SVD; now scaled w.r.t. 2-norm of matrix

**OS and Library support**

- MS Windows support
- shared libraries for Linux and Windows
- changed configure system to better handle MS Windows environment
- added internal `xerbla` to handle LAPACK errors directly

**C interface**

- automatic choice of matrix building in `hlib_matrix_build_bem_grid`
- introduced `vector_t` as type for vectors (no more C arrays)
- added Gauss and Sauter triangle quadrature rules
- added functions to access matrix and vector entries
- added `copyto` and `copyto_eps` functions
- added `hlib_matrix_build_dense` to build H-matrix from dense matrix
- changed solver management


**Miscellaneous**

- several improvements and bug fixes
- cleaned up error codes
- updated CPUflags and Rmalloc

**Arithmetic**

- added ACA-Full
- added HCA (hybrid cross approximation)
- complex valued ACA and SVD
- added copy with coarsening for H-matrices
- added computation of spectral norm for the inverse of a matrix
- support for permutations in matrix-vector multiplication of sparse matrices
- added support for Laplace SLP/DLP and 3D triangle surface grids
- fixed issues with degenerate bounding boxes in geometrical clustering

**Input/Output**

- support for PLTMG matrix format

**Miscellaneous**

- replaced error handling with exceptions
- added modified CLAPACK as default implementation of LAPACK to HLIBpro
- integrated CPUFlags into configure system
- added function for fast reciprocal square root

**Arithmetic**

- initial support for complex arithmetic
- support for symmetric matrices in arithmetic
- implemented block LU factorisation
- implemented LDLᵀ factorisation
- added Frobenius norm for sparse matrices
- support for CRS format in sparse matrices
- added Jacobi and SOR matrix types (for matrix-vector multiplication)
- implemented hierarchical domain decomposition with parallel arithmetic

**Parallel Algorithms**

- thread-parallel Cholesky factorisation
- thread-parallel coarsening of H-matrices
- fixed thread-parallel LU and inversion
- fixed dead-locks in thread-pool
- added direct communication in BSP mode
- parallel addition of matrices and vectors via streams

**Input/Output**

- support for Matlab and SAMG format

**Miscellaneous**

- introduced C interface functions and types
- added configure system for Makefiles
- added progress meter support for arithmetic
- added internal RTTI system
- support for memory consumption query on HP-UX
- rewrote error handling

- first public version as PHI (**P**arallel **H**-matrix **I**mplementation)
- merged BSP-parallel and thread-parallel versions of H-matrix library