v2.6

Accumulator-based H-arithmetic, reducing the number of truncations, with support for lazy and eager evaluation
added randomized SVD and implemented dense approximation and low-rank truncation for all types (SVD, RRQR and Rand-SVD)
also added low-rank approximation algorithms for RRQR and Rand-SVD for H-construction (TRRQRLRApx, TRandSVDLRApx)
support for special flat H-hierarchy with optimised arithmetic functions, e.g., in-place inversion
support for block refinement during matrix construction, e.g., if admissibility gives false positives
added infinity matrix norm (TInfinityNorm and norm_inf())
implemented TOffDiagAdmCond with all off-diagonal blocks being admissible
massive code restructuring and cleanup
initial support for HDF5 matrix IO (dense and low-rank)
support for VSX instruction set (POWER CPUs)
special handling for all BLAS functions in case of parallel Intel MKL
added parameter configuration with config files
added some functions to simplify solver stopping criterion
changed behaviour/incompatibilities with previous versions:
- removed all permutation handling from THMatrix and TNearfieldMulVec
- no recompression in ACA/HCA (now only in matrix builders)
- ptrcast() now consistent with cptrcast(), i.e., no * needed
- some parameter reorganization (see Parameters)
- previous TMatrix::copy_struct is renamed to TMatrix::copy_struct_from (TMatrix::copy_struct will now return a matrix copy without data)
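
The infinity matrix norm listed above (TInfinityNorm, norm_inf()) is the maximum absolute row sum of a matrix. The following is a minimal sketch for a plain dense matrix, not HLIBpro's implementation; the function name dense_norm_inf and the row-major storage are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of HLIBpro: infinity norm of a dense
// row-major matrix, i.e. the largest sum of absolute values in any row.
double dense_norm_inf(const std::vector<double>& a,
                      std::size_t nrows, std::size_t ncols)
{
    assert(a.size() == nrows * ncols);

    double norm = 0.0;
    for (std::size_t i = 0; i < nrows; ++i)
    {
        double row_sum = 0.0;
        for (std::size_t j = 0; j < ncols; ++j)
            row_sum += std::abs(a[i * ncols + j]);
        norm = std::max(norm, row_sum);
    }
    return norm;
}
```

For the 2 x 2 matrix with rows (1, -2) and (3, 4) the row sums are 3 and 7, so the norm is 7.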

v2.5

added operator for matrix sum (TMatrixSum) in addition to matrix products
added missing bindings in C interface (matrix product/sum, apply, apply_add)
fixed issue in clustering with predefined partition with many groups
support for more matrix formats and more robust IO (if files do not follow standard) for Harwell-Boeing/Matrix-Market format
re-enabled parallel block cluster tree construction on shared memory
bug fixes:
- in solve_diag_left_block
- matrix solves in TLLInvMatrix

implemented rank revealing QR based low-rank truncation
Solvers:
- added CGS and TFQMR solvers
- added support for matrix solves in linear iteration (also H-matrices!)
- optional computation of exact residual during iteration
- simplified handling of stop criterion parameters
- using status field in TSolverInfo instead of exception if solver fails (e.g., breakdown)
- some code restructuring
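
Reporting solver failure through a status field rather than an exception, as described in the Solvers entries above, can be sketched as follows. This is an illustration under stated assumptions, not the TSolverInfo interface: the names solve_status, solve_info and cg are hypothetical, and a small unpreconditioned CG on a dense row-major matrix stands in for the real solvers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Hypothetical names for illustration; HLIBpro's TSolverInfo differs.
enum class solve_status { converged, max_iterations, breakdown };

struct solve_info
{
    solve_status status;
    std::size_t  iterations;
    double       residual_norm;
};

double dot(const vec& x, const vec& y)
{
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// y = A x for a dense row-major n x n matrix
vec matvec(const vec& a, const vec& x)
{
    const std::size_t n = x.size();
    vec y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += a[i * n + j] * x[j];
    return y;
}

// Unpreconditioned CG; failures are reported via the returned status
// instead of being thrown as exceptions.
solve_info cg(const vec& a, const vec& b, vec& x, double tol, std::size_t max_it)
{
    const std::size_t n = b.size();
    assert(a.size() == n * n && x.size() == n);

    vec r = b;
    const vec ax = matvec(a, x);
    for (std::size_t i = 0; i < n; ++i) r[i] -= ax[i];
    vec p = r;
    double rr = dot(r, r);

    for (std::size_t it = 0; it < max_it; ++it)
    {
        if (std::sqrt(rr) <= tol)
            return { solve_status::converged, it, std::sqrt(rr) };

        const vec ap = matvec(a, p);
        const double pap = dot(p, ap);
        if (pap == 0.0)  // breakdown: step length is undefined
            return { solve_status::breakdown, it, std::sqrt(rr) };

        const double alpha = rr / pap;
        for (std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }

        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }

    const bool ok = std::sqrt(rr) <= tol;
    return { ok ? solve_status::converged : solve_status::max_iterations,
             max_it, std::sqrt(rr) };
}
```

A caller then inspects the status field of the returned value after the solve and decides how to react, so a breakdown does not unwind the stack.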
support for block-wise Jacobi and Gauss-Seidel operators
support for AVX512 instruction set
many new user controllable parameters (see Parameters)
misc.:
- handling of diagonal in factorisation (inverse or normal) now a runtime option (default: inverse)
- added optional distance for TWeakAlgAdmCond to support distances other than one
- support for fixed rank 0
- correct progress bar support for WAZ factorisation and inversion
- some reorganization of source/header files
C bindings:
- added hlib_admcond_geom_hilo for THiLoFreqGeomAdmCond
- additional parameter for blockdiag functions (blocksize)
fixed serious issue with Intel TBB and with current Intel TBB-based Intel MKL
various bug fixes

Added factorization of inverse matrix \(WAZ = I\), enabling vector solves via matrix-vector multiplication instead of forward/backward solves, with much better parallel speedup.
Significant improvements in parallel performance of matrix inversion.
Improved performance of LU, matrix-vector multiplication and forward/backward solves.
Added function nearfield_sparse to extract H-matrix nearfield as sparse matrix.
Switched to adaptive_split_axis as default for clustering.
Minor Changes:
- Additional options for matrix visualization (colormap, etc.)
- Basic VTK output of block clusters.
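
Why the \(WAZ = I\) factorization turns a solve into matrix-vector products: from \(WAZ = I\) it follows that \(A^{-1} = ZW\), so \(Ax = b\) gives \(x = Z(Wb)\). The sketch below shows this with small dense factors; the helpers matvec and solve_waz are hypothetical and the dense row-major storage is an assumption, whereas the library works with H-matrix factors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// y = M x for a dense row-major n x n matrix (hypothetical helper)
vec matvec(const vec& m, const vec& x)
{
    const std::size_t n = x.size();
    assert(m.size() == n * n);

    vec y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += m[i * n + j] * x[j];
    return y;
}

// Given factors with W A Z = I, solve A x = b as x = Z (W b):
// two matrix-vector products, no forward/backward substitution.
vec solve_waz(const vec& w, const vec& z, const vec& b)
{
    return matvec(z, matvec(w, b));
}
```

As a worked case, A with rows (2, 1) and (4, 4) has the LU factors L with rows (1, 0), (2, 1) and U with rows (2, 1), (0, 2). Taking W as the inverse of L, rows (1, 0) and (-2, 1), and Z as the inverse of U, rows (0.5, -0.25) and (0, 0.5), the right-hand side b = (3, 8) yields x = (1, 1).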

Fixed two bugs in point-wise LU.
Solver changes:
- Refactored solver classes (no interface changes); added TRichardson to replace TSolver in the future.
- Fixed inconsistent computation of residual norm in solvers. Richardson, CG and BiCG now compute the standard residual norm, while MINRES and GMRES compute the preconditioned residual norm.
- Made initialisation of start vector in solver classes optional (function initialise_start_value).
Added function "diagonal" to extract diagonal of a matrix.
Added example "spectrum" to compute spectrum of graph Laplacian.
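
The difference between the two residual norms named in the solver changes above can be sketched as follows. This assumes the standard norm is \(\|b - Ax\|_2\) and the preconditioned norm is \(\|M^{-1}(b - Ax)\|_2\) with a left preconditioner M; the changelog does not state the exact form. The diagonal preconditioner and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// r = b - A x for a dense row-major n x n matrix
vec residual(const vec& a, const vec& x, const vec& b)
{
    const std::size_t n = b.size();
    assert(a.size() == n * n && x.size() == n);

    vec r = b;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            r[i] -= a[i * n + j] * x[j];
    return r;
}

// Euclidean norm of a vector
double norm2(const vec& v)
{
    double s = 0.0;
    for (double e : v) s += e * e;
    return std::sqrt(s);
}

// Standard residual norm ||b - A x||_2
double residual_norm(const vec& a, const vec& x, const vec& b)
{
    return norm2(residual(a, x, b));
}

// Preconditioned residual norm ||M^{-1} (b - A x)||_2, with the
// assumption M = diag(m_diag), i.e. a Jacobi preconditioner
double precond_residual_norm(const vec& a, const vec& x, const vec& b,
                             const vec& m_diag)
{
    vec r = residual(a, x, b);
    assert(m_diag.size() == r.size());
    for (std::size_t i = 0; i < r.size(); ++i) r[i] /= m_diag[i];
    return norm2(r);
}
```

Because the two values can differ by orders of magnitude for a badly scaled M, the same stopping tolerance means different things for the two groups of solvers.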

Fixed issues when solving dense matrices (used in new example for many RHSs).
Modified THiLoFreqGeomAdmCond: now maximal number of wavelengths per cluster is tested.
Refactored geometrical clustering classes and partitioning strategies, thereby fixing several issues.
C++11 changes:
- most object creating functions now return std::unique_ptr,
- replaced typedef by using,
- added iterators for TIndexSet, TNodeSet, TGraph, TProcSet (for range based for).
- Note: needs at least GCC v4.7 or equivalent!
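
The three C++11 changes above (std::unique_ptr return values, using instead of typedef, iterators for range-based for) can be illustrated with a small stand-in class. Everything here is hypothetical: index_set is not TIndexSet and make_index_set is not a library function; the sketch only shows the language features involved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical contiguous index set [first, last]; not HLIBpro's TIndexSet
class index_set
{
public:
    // minimal iterator, just enough for range-based for
    class iterator
    {
    public:
        explicit iterator(std::size_t pos) : pos_(pos) {}
        std::size_t operator*() const { return pos_; }
        iterator& operator++() { ++pos_; return *this; }
        bool operator!=(const iterator& other) const { return pos_ != other.pos_; }
    private:
        std::size_t pos_;
    };

    index_set(std::size_t first, std::size_t last) : first_(first), last_(last)
    {
        assert(first <= last);
    }

    iterator begin() const { return iterator(first_); }
    iterator end()   const { return iterator(last_ + 1); }

private:
    std::size_t first_, last_;
};

// alias declared with "using" instead of typedef
using index_set_ptr = std::unique_ptr<index_set>;

// object-creating function handing ownership to the caller
index_set_ptr make_index_set(std::size_t first, std::size_t last)
{
    return index_set_ptr(new index_set(first, last));  // C++11 has no std::make_unique
}

// sums all indices with a range-based for loop
std::size_t sum_indices(const index_set& is)
{
    std::size_t sum = 0;
    for (auto idx : is) sum += idx;
    return sum;
}
```

Returning std::unique_ptr makes ownership explicit: the object is released when the returned pointer goes out of scope, without a manual delete.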
Added parameter to algebraic clustering in C bindings to define partitioning algorithm (BFS, multi level, METIS or Scotch).
Fixed issues with progress bar during factorisation (wrong block count).
Removed BSP-style communication functions (MPI only now).
Finished conversion to new packed_t SIMD type. Using SSE3 instead of SSE2.
Added lock to TScotchAlgPartStrat because Scotch is not thread-safe.

Removed implicit reordering of unknowns during matrix-vector multiplication to fix inconsistent behaviour. Please use permutations from cluster trees or H-matrices to reorder vectors, or TPermMatrix to represent permuted matrices, instead.
Speedup improvements for matrix inversion. Triangular inversion and matrix multiplication available in standard user interface.
Import/export from/to CCS/CRS matrices simplified.
Simplified (and faster) mutex wrapper.
Several C++11 changes.
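
With implicit reordering removed from matrix-vector multiplication, vectors have to be reordered explicitly. The sketch below shows the general pattern of applying a permutation and undoing it again; it does not use the library's permutation classes, and the names permute and permute_back as well as the gather/scatter convention are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using vec    = std::vector<double>;
using perm_t = std::vector<std::size_t>;

// out[i] = v[perm[i]]: gather entries into the new ordering
vec permute(const vec& v, const perm_t& perm)
{
    assert(v.size() == perm.size());

    vec out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = v[perm[i]];
    return out;
}

// out[perm[i]] = v[i]: scatter entries back, undoing permute()
vec permute_back(const vec& v, const perm_t& perm)
{
    assert(v.size() == perm.size());

    vec out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[perm[i]] = v[i];
    return out;
}
```

A typical sequence would be: permute the input vector, multiply with the matrix in its internal ordering, then call permute_back on the result.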

Removed reference counters in BLAS interface due to a major performance issue on multi-core (and multi-socket) systems. See BLAS/LAPACK Interface on how to use the modified interface (and avoid errors).
New, scalable matrix-vector multiplication implemented.
Using generic datatype for SIMD instructions, thereby enabling generic SIMD algorithms, e.g. for BEM kernels, and fast adoption of new SIMD instructions, e.g. AVX2.
Started to use block-wise operations if dense matrices are combined with blocked matrices (e.g. during matrix multiplication) instead of vector operations.
Removed TVirtualVector (replaced by TScalarVector).
Fixed issue with MatrixMarket format (leading whitespace).

Major Changes
- Switched from OpenMP to Threading Building Blocks as interface to shared memory parallelism, thereby also changing most algorithms to task-based parallelism.
- Reducing dependency on external libraries by using C++11 features. Also replacing some classes by default C++ versions (finally removing old code).
- Alternative, non-recursive, level-wise ℋ-LU factorisation based on explicit block dependencies, which provides far better speedup on many-core systems, e.g. Intel MIC architecture.
- New ℋ-LU factorisation algorithm also applicable in distributed environments, yielding better load-balancing (albeit with limited speedup).
- Added support for multiple CPUs to many algorithms, e.g. in clustering, norm computations, matrix-vector multiplication and solves, ℋ²-conversion.
Minor Changes
- Optimised BEM kernels for Intel MIC architecture.
- Introduced TLinearOperator for operators not supporting TMatrix functionality, e.g. factorised matrices.
- HLIBpro file format changed due to internal changes and due to some bugs in the format. However, backward read compatibility for most files written with earlier versions is kept.
- Added support for Cairo library, thereby providing PDF output.
And of course: many smaller feature upgrades and bug fixes.