HLIBpro  2.7
Changes

v2.7

  • new DAG generation based on recursive algorithms with automatic deduction of dependencies between nodes (default: previous DAG; see CFG::dag_version),
  • new coefficient function for Matern kernel (TMaternCovCoeffFn) and exponential (\(e^{-|x-y|_2}\)) bilinear form (TExpBF),
  • functions for computing low-rank approximations of sums of matrices directly (using pair-wise SVD approx_sum_svd, randomized SVD approx_sum_randsvd or randomized low-rank approximation approx_sum_randlr),
  • ACA:
    • modified stop criterion for ACA (user controllable maximal rank CFG::Build::aca_max_ratio),
    • added dense fallback for ACA if it does not converge, computing only those coefficients not yet computed,
  • added MBLR cluster tree construction (TMBLRCTBuilder),
  • modified handling of matrix coefficient functions, especially TPermCoeffFn. Instead of
    TMyCoeffFn coeff_fn( ..., ct->perm_i2e(), ct->perm_i2e() );
    please use
    TMyCoeffFn my_coeff_fn( ... );
    TPermCoeffFn< ... > coeff_fn( & my_coeff_fn, ct->perm_i2e(), ct->perm_i2e() );
  • (limited) grid generation and refinement in BEM library (see Boundary Element Methods),
  • using libmvec for sin, cos and exp if available (glibc v2.22 and up) with significant speedups in complex valued computations.
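
The TPermCoeffFn change above separates the user's coefficient function from index permutation. The following is a minimal, self-contained sketch of that wrapper pattern only. MyCoeffFn, PermCoeffFn and eval are hypothetical stand-ins and not HLIBpro classes or signatures, and "i2e" is assumed to mean an internal-to-external index mapping:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a user coefficient function; NOT an HLIBpro class.
// It works purely in the original (external) numbering.
struct MyCoeffFn {
    // Example kernel a_ij = 1 / (1 + |i - j|), chosen only for illustration.
    double eval(std::size_t i, std::size_t j) const {
        const double d = (i > j) ? double(i - j) : double(j - i);
        return 1.0 / (1.0 + d);
    }
};

// Hypothetical wrapper mirroring the role of TPermCoeffFn: it maps internal
// (cluster tree) indices to external ones before calling the wrapped function.
template <typename CoeffFn>
class PermCoeffFn {
public:
    PermCoeffFn(const CoeffFn* fn,
                const std::vector<std::size_t>& row_i2e,
                const std::vector<std::size_t>& col_i2e)
        : fn_(fn), row_i2e_(row_i2e), col_i2e_(col_i2e) {}

    // Evaluate the coefficient for internal indices (i, j).
    double eval(std::size_t i, std::size_t j) const {
        return fn_->eval(row_i2e_.at(i), col_i2e_.at(j));
    }

private:
    const CoeffFn* fn_;                 // wrapped function (not owned)
    std::vector<std::size_t> row_i2e_;  // internal -> external, rows
    std::vector<std::size_t> col_i2e_;  // internal -> external, columns
};
```

With this split the user function never sees permutations. Under the sketch's assumptions, a wrapper built with the mapping {2, 0, 1} for rows and columns evaluates internal entry (0, 1) as external entry (2, 0).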


v2.6

  • Accumulator based H-arithmetic reducing number of truncations with support for lazy and eager evaluation
  • added randomized SVD and implemented dense approximation and lowrank truncation for all types (SVD, RRQR and Rand-SVD)
  • also added lowrank approximation algorithms for RRQR and Rand-SVD for H-construction (TRRQRLRApx, TRandSVDLRApx)
  • support for special flat H-hierarchy with optimised arithmetic functions, e.g., in-place inversion.
  • support for block refinement during matrix construction, e.g., if admissibility gives false positives
  • added infinity matrix norm (TInfinityNorm and norm_inf())
  • implemented TOffDiagAdmCond with all off-diagonal blocks being admissible
  • massive code restructuring and cleanup
  • initial support for HDF5 matrix IO (dense and lowrank)
  • support for VSX instruction set (POWER CPUs)
  • special handling for all BLAS functions in case of parallel Intel MKL
  • added parameter configuration with config files
  • added some functions to simplify solver stopping criterion
  • changed behaviour/incompatibilities with previous versions:
    • removed all permutation handling from THMatrix and TNearfieldMulVec
    • no recompression in ACA/HCA (now only in matrix builders)
    • ptrcast() now consistent with cptrcast(), i.e., no * needed
    • some parameter reorganization (see Parameters)
    • previous TMatrix::copy_struct is renamed to TMatrix::copy_struct_from (TMatrix::copy_struct will now return a matrix copy without data)


v2.5

v2.5.1

  • added operator for matrix sum (TMatrixSum) in addition to matrix products
  • missing bindings in C interface (matrix product/sum, apply, apply_add)
  • fixed issue in clustering with predefined partition with many groups
  • support for more matrix formats and more robust IO (if files do not follow standard) for Harwell-Boeing/Matrix-Market format
  • re-enabled parallel block cluster tree construction on shared memory
  • bug fixes:
    • in solve_diag_left_block
    • matrix solves in TLLInvMatrix

v2.5.0

  • implemented rank revealing QR based low-rank truncation
  • Solvers:
    • added CGS and TFQMR solvers
    • added support for matrix solves in linear iteration (also H-matrices!)
    • optional computation of exact residual during iteration
    • simplified handling of stop criterion parameters
    • using status field in TSolverInfo instead of exception if solver fails (e.g., breakdown)
    • some code restructuring
  • support for block-wise Jacobi and Gauss-Seidel operators
  • support for AVX512 instruction set
  • many new user controllable parameters (see Parameters)
  • misc.:
    • handling of diagonal in factorisation (inverse or normal) now a runtime option (default: inverse)
    • added optional distance for TWeakAlgAdmCond to support distances other than one
    • support for fixed rank 0
    • correct progress bar support for WAZ factorisation and inversion
    • some reorganization of source/header files
  • C bindings:
    • added hlib_admcond_geom_hilo for THiLoFreqGeomAdmCond
    • additional parameter for blockdiag functions (blocksize)
  • fixed serious issue with Intel TBB and with current Intel TBB-based Intel MKL
  • various bug fixes


v2.4

  • Added factorization of inverse matrix \(WAZ = I\), enabling vector solves using matrix-vector multiplication instead of forward/backward solves, with much better parallel speedup.
  • Significant improvements in parallel performance of matrix inversion.
  • Improved performance of LU, matrix-vector multiplication and forward/backward solves.
  • Added function nearfield_sparse to extract H-matrix nearfield as sparse matrix.
  • Switched to adaptive_split_axis as default for clustering.
  • Minor Changes:
    • Additional options for matrix visualization (colormap, etc.)
    • Basic VTK output of block clusters.


v2.3

v2.3.2

  • fixed various bugs and race conditions
  • extended ctors of various matrix classes to accept optional value type field
  • added example on how to assemble block matrices

v2.3.1

  • Fixed two bugs in point-wise LU.
  • Solver changes:
    • Refactored solver classes (no interface changes); added TRichardson to replace TSolver in the future.
    • Fixed inconsistent computation of residual norm in solvers. Now Richardson, CG and BiCG will compute standard residual norm, while MINRES and GMRES compute preconditioned residual norm.
    • Made initialisation of start vector in solver classes optional (function initialise_start_value)
  • Added function "diagonal" to extract diagonal of a matrix.
  • Added example "spectrum" to compute spectrum of graph Laplacian.
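
The two residual norms mentioned in the solver changes above can be written out as follows. This sketch assumes a left preconditioner \(W \approx A^{-1}\) and the Euclidean norm. Neither assumption is stated in the changelog.

```latex
\[
  \text{standard:}\quad \| r_k \|_2 = \| b - A x_k \|_2 ,
  \qquad
  \text{preconditioned:}\quad \| W r_k \|_2 = \| W ( b - A x_k ) \|_2 .
\]
```

Since the two quantities can differ by a large factor, the same tolerance may stop the two solver families after different numbers of iterations.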

v2.3.0

  • Fixed issues when solving dense matrices (used in new example for many RHSs).
  • Modified THiLoFreqGeomAdmCond: now maximal number of wavelengths per cluster is tested.
  • Refactored geometrical clustering classes and partitioning strategies, thereby fixing several issues.
  • C++11 changes:
    • most object creating functions now return std::unique_ptr,
    • replaced typedef by using,
    • added iterators for TIndexSet, TNodeSet, TGraph, TProcSet (for range based for).
    • Note: needs at least GCC v4.7 or equivalent!
  • Added parameter to algebraic clustering in C bindings to define partitioning algorithm (BFS, multi level, METIS or Scotch).
  • Fixed issues with progress bar during factorisation (wrong block count).
  • Removed BSP style communication functions (MPI only now).
  • Finished conversion to new packed_t SIMD type. Using SSE3 instead of SSE2.
  • Added lock to TScotchAlgPartStrat because Scotch is not multi thread safe.
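
As an illustration of the iterator support listed under the C++11 changes, here is a minimal sketch of what a class needs to provide to work with range-based for. IndexSet and sum_indices are hypothetical and not HLIBpro's TIndexSet or its interface:

```cpp
// Hypothetical contiguous index set [first, last]; NOT HLIBpro's TIndexSet.
class IndexSet {
public:
    // Minimal iterator: range-based for only needs *, prefix ++ and !=.
    class iterator {
    public:
        using value_type = long;  // "using" instead of typedef

        explicit iterator(long pos) : pos_(pos) {}
        long operator*() const { return pos_; }
        iterator& operator++() { ++pos_; return *this; }
        bool operator!=(const iterator& other) const { return pos_ != other.pos_; }

    private:
        long pos_;
    };

    IndexSet(long first, long last) : first_(first), last_(last) {}

    iterator begin() const { return iterator(first_); }
    iterator end() const { return iterator(last_ + 1); }  // one past the last index

private:
    long first_;
    long last_;
};

// Sum all indices of the set using a range-based for loop.
inline long sum_indices(const IndexSet& is) {
    long sum = 0;
    for (auto i : is) {
        sum += i;
    }
    return sum;
}
```

An empty set is represented here by last = first - 1, so that begin() and end() coincide and the loop body never runs.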


v2.2

  • Removing implicit reordering of unknowns during matrix-vector multiplication to fix inconsistent behaviour. Please use permutations from cluster trees or H-matrices to reorder vectors or TPermMatrix to represent permuted matrices instead.
  • Speedup improvements for matrix inversion. Triangular inversion and matrix multiplication available in standard user interface.
  • Import/export from/to CCS/CRS matrices simplified.
  • Simplified (and faster) mutex wrapper.
  • Several C++11 changes.


v2.1

  • Removing reference counters in BLAS interface due to major performance issue on multi-core (-socket) systems. See BLAS/LAPACK Interface on how to use the modified interface (and avoid errors).
  • New, scalable matrix-vector multiplication implemented.
  • Using generic datatype for SIMD instructions, thereby enabling generic SIMD algorithms, e.g. for BEM kernels, and fast adoption of new SIMD instructions, e.g. AVX2.
  • Started to use block-wise operations if dense matrices are combined with blocked matrices (e.g. during matrix multiplication) instead of vector operations.
  • Removed TVirtualVector (replaced by TScalarVector).
  • Fixed issue with MatrixMarket format (leading whitespaces).


v2.0

v2.0.2

  • fixed race condition in C bindings
  • fixed issue with initialisation of static variables

v2.0.1

  • bug fixes

v2.0.0

  • Major Changes
    • Switched from OpenMP to Threading Building Blocks as interface to shared memory parallelism, thereby also changing most algorithms to task-based parallelism.
    • Reducing dependency on external libraries by using C++11 features. Also replacing some classes by default C++ versions (finally removing old code).
    • Alternative, non-recursive, level-wise ℋ-LU factorisation based on explicit block dependencies, which provides far better speedup on many-core systems, e.g. Intel MIC architecture.
    • New ℋ-LU factorisation algorithm also applicable in distributed environments, yielding better load-balancing (albeit with limited speedup).
    • Added support for multiple CPUs to many algorithms, e.g. in clustering, norm computations, matrix-vector multiplication and solves, ℋ²-conversion.
  • Minor Changes
    • Optimised BEM kernels for Intel MIC architecture.
    • Introduced TLinearOperator for operators not supporting TMatrix functionality, e.g. factorised matrices.
    • HLIBpro file format changed due to internal changes and due to some bugs in the format. However, backward read compatibility for most files written with earlier versions is kept.
    • Added support for Cairo library, thereby providing PDF output.
  • And of course: many smaller feature upgrades and bug fixes.