News

v3.2

changed TGeomWeakAdmCond to support all dimensions (vertex, edge and face admissibility)
added special memory allocator for all BLAS data (BLAS matrices, vectors and workspace)
added basic support for CUDA and HIP for SVD and QR (in case of large matrices)
misc:
- renamed THLibGridIO to THproGridIO (as all other classes)
- added write function to TGMSHGridIO
- added data_byte_size returning size of numerical data without management data
fixes:
- TTruncAcc: if norm is less than tolerance, truncation rank was maximal
- BEM: segfault when loading unaligned data with AVX512

v3.1

v3.1.2 (released 2024-06-06)

added missing hpro_matrix_conjugate/transpose in C bindings and
simplified eigen functions in BLAS to prevent instantiation problems
re-added aca_dense_fallback parameter

v3.1.1 (released 2024-03-28)

fixed bug in Cholesky factorization

v3.1.0 (released 2024-03-12)

clustering based on space filling curves (using CGAL library) in TSFCCTBuilder
lowrank truncation support based on Frobenius norm
support for adaptive quadrature order in BEM kernels
added restriction to real/imaginary part of matrices (restrict_re/im)
added make_symmetric/hermitian to enforce symmetry status for given matrices
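
The Frobenius-norm based lowrank truncation mentioned above can be illustrated with a small sketch. It is not the library's implementation: the function name frobenius_trunc_rank and its parameters are hypothetical, and it assumes the singular values are already available in descending order. The rank is chosen as the smallest k for which the Frobenius norm of the discarded part stays below a relative tolerance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HLIBpro): given singular values sorted in
// descending order, return the smallest rank k such that the Frobenius norm
// of the discarded tail, sqrt(sum_{i>=k} sigma_i^2), is at most
// rel_eps * ||A||_F, where ||A||_F^2 is the sum of all sigma_i^2.
std::size_t frobenius_trunc_rank(const std::vector<double>& sigma, double rel_eps)
{
    assert(rel_eps >= 0.0);

    // squared Frobenius norm of the full matrix
    double total_sq = 0.0;
    for (double s : sigma)
        total_sq += s * s;

    const double bound_sq = rel_eps * rel_eps * total_sq;

    // Walk from the smallest singular value upwards and drop values as long
    // as the accumulated squared tail stays within the bound.
    std::size_t k       = sigma.size();
    double      tail_sq = 0.0;

    while (k > 0)
    {
        const double next = tail_sq + sigma[k - 1] * sigma[k - 1];

        if (next > bound_sq)
            break;

        tail_sq = next;
        --k;
    }

    return k;
}
```

Compared to a spectral-norm criterion, which only looks at the first discarded singular value, this criterion accounts for the whole discarded tail.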

v3.0 (released 2022-07-06)

using generic value types for all major types (matrices, vectors and linear operators)
officially renamed namespace and header files to Hpro and hpro (old names still valid for compatibility)
C bindings split into separate functions per value type; also with hpro prefix now (old function names and functionality are still supported)
support for mixed precision computations for matrix vector multiplication and in solver classes

v2.9

v2.9.1 (released 2021-05-19)

added apply_add with BLAS::Matrix as argument to TLinearOperator classes
added absolute_prec to define TTruncAcc
added support for NEON instruction set (Apple M1)
added support for Mongoose graph partitioning library (TMongooseAlgPartStrat)
enhanced TAlgAdmCond to define maximal number of allowed connecting edges
fixes:
- issue in BLAS::qrp for matrices with nrows < ncols
- coefficient tests in TPermCoeffFn (were missing before)
- issue in TMatrixProduct with single factor
- issue in TMaternCovCoeffFn with different row/column coordinates

v2.9 (released 2020-11-19)

replaced HLIB::complex by std::complex
removed old DAG interface
modified pivot strategy of standard ACA (better, more robust convergence)
using geqr2 instead of geqrf for QR factorization (slightly faster)
fixes:
- replaced deprecated features of TBB
- TGeomGroupCTBuilder: fixed handling of offsets
- missing instantiation of BLAS::random for BLAS::Vector added
- wrong solving flags in TLDUInvMatrix
- inefficient dependency handling in DAG construction for TLR/Tile-H
- memory leak in recursive DAG construction fixed

v2.8

v2.8.1 (released 2020-03-27)

fixed weak admissibility (was actually standard admissibility)
fixed MBLR clustering (ordering only in one dimension led to extremely rectangular clusters)
fixed bug in lowrank approximation (wrong type conversion)
fixed non-SIMD implementation of TExpBF (still had additional factor)
fixed TDenseCoeffFn in case given dense matrix had non-zero row/column offsets
replaced tbb::mutex and tbb::atomic by std versions since marked obsolete in recent TBB versions

v2.8 (released 2019-12-29)

improved new DAG generation system (better speed and parallel scalability) and made it the default system (old version still available with CFG::dag_version=1)
moved non-generic include files into hpro sub-directory for better separation from other libraries
added various approximation routines for sums of matrices, operators, e.g., SVD, pairwise SVD, Rand-SVD, Rand-LR and ACA for operator sums
expanded lazy accumulator arithmetic to move all updates to leaves only (evaluating all updates simultaneously; see CFG::Arith::lazy_eval and CFG::Arith::sum_approx)
added TZeroMatBuilder and build_zero_mat to construct empty matrix for given block cluster tree, e.g., as pre-initialized result of other H-matrix operation
parallelization of various routines during clustering, e.g., sorting, etc. (may result in slightly different clustering with different number of CPU cores)
bug fixes:
- permutation of dense matrix in TMatBuilder removed (inconsistent behaviour compared to building an H-matrix)
- fixed update of aux. data in H-matrix in copy_nearfield
- fixed reading of old HLIB files (wrong processor sets)
- fixed return value of Mem::usage
- fixed various issues when using single precision

v2.7

v2.7.2 (released 2019-07-03)

fixed various compiler issues with MS Visual C++
added build method for coefficient functions to return dense matrix for given index set
C bindings:
- added hlib_matrix_to_dense/rank
- added hlib_matrix_approx_rank to compute low-rank approximation of given matrix with different methods
fixed issue in TBSPNDCTBuilder when no interface is present
fixed issue in HLIB::Mem::usage

v2.7.1 (released 2019-02-18)

fixed issue with HDF5 library but removed support from binary distribution due to linking problems with newer version of libHDF5
added missing functions to sequential NET interface
added functions to directly set LR matrices in TRkMatrix
using internal grid generation also in example code (laplace/helmholtz)
additional spherical grid (different start grid for “inbetween” steps)
improved coordinate visualization in PostScript format (better minimal distance estimate)

v2.7 (released 2018-11-28)

new DAG generation based on recursive algorithms with automatic deduction of dependencies between nodes (default: previous DAG; see CFG::dag_version)
new coefficient function for Matern kernel (TMaternCovCoeffFn) and exponential bilinear form (TExpBF)
functions for computing low-rank approximations for sum of matrices directly (using pair-wise SVD approx_sum_svd, randomized SVD approx_sum_randsvd or randomized low-rank approximation approx_sum_randlr)
ACA:
- modified stop criterion for ACA (user controllable maximal rank CFG::Build::aca_max_ratio)
- added dense fallback for ACA if not converging, computing only those coefficients not yet computed
added MBLR cluster tree construction (TMBLRCTBuilder)
modified handling of matrix coefficient functions, especially TPermCoeffFn
(limited) grid generation and refinement in BEM library
using libmvec for sin, cos and exp if available (glibc v2.22 and up) with significant speedups in complex valued computations
new academic license without any user/host or date limitation

v2.6 (released 2017-10-13)

Accumulator based H-arithmetic reducing number of truncations with support for lazy and eager evaluation
added randomized SVD and implemented dense approximation and lowrank truncation for all types (SVD, RRQR and Rand-SVD)
also added lowrank approximation algorithms for RRQR and Rand-SVD for H-construction (TRRQRLRApx, TRandSVDLRApx)
support for special flat H-hierarchy with optimised arithmetic functions, e.g., in-place inversion
support for block refinement during matrix construction, e.g., if admissibility gives false positives
added infinity matrix norm (TInfinityNorm and norm_inf())
implemented TOffDiagAdmCond with all off-diagonal blocks being admissible
massive code restructuring and cleanup
initial support for HDF5 matrix IO (dense and lowrank)
support for VSX instruction set (POWER CPUs)
special handling for all BLAS functions in case of parallel Intel MKL
added parameter configuration with config files
added some functions to simplify solver stopping criterion
changed behaviour/incompatibilities with previous versions:
- removed all permutation handling from THMatrix and TNearfieldMulVec
- no recompression in ACA/HCA (now only in matrix builders)
- ptrcast() now consistent with cptrcast(), i.e., no * needed
- some parameter reorganization
- previous TMatrix::copy_struct is renamed to TMatrix::copy_struct_from (TMatrix::copy_struct will now return a matrix copy without data)

v2.5

v2.5.1 (released 2017-02-03)

added operator for matrix sum (TMatrixSum) in addition to matrix products
missing bindings in C interface (matrix product/sum, apply, apply_add)
fixed issue in clustering with predefined partition with many groups
support for more matrix formats and more robust IO (if files do not follow standard) for Harwell-Boeing/Matrix-Market format
re-enabled parallel block cluster tree construction on shared memory
bug fixes:
- in solve_diag_left_block
- matrix solves in TLLInvMatrix

v2.5 (released 2016-09-02)

implemented rank revealing QR based low-rank truncation
Solvers:
- added CGS and TFQMR solvers
- added support for matrix solves in linear iteration (also ℋ-matrices!)
- optional computation of exact residual during iteration
- simplified handling of stop criterion parameters
- using status field in TSolverInfo instead of exception if solver fails (e.g., breakdown)
- some code restructuring
support for block-wise Jacobi and Gauss-Seidel operators
support for AVX512 instruction set
many new user controllable parameters
misc.:
- handling of diagonal in factorisation (inverse or normal) now a runtime option (default: inverse)
- added optional distance for TWeakAlgAdmCond to support distances other than one
- support for fixed rank 0
- correct progress bar support for WAZ factorisation and inversion
- some reorganization of source/header files
C bindings:
- added hlib_admcond_geom_hilo for THiLoFreqGeomAdmCond
- additional parameter for blockdiag functions (blocksize)
fixed serious issue with Intel TBB and with current Intel TBB-based Intel MKL
various bug fixes

v2.4 (released 2015-10-28)

Added factorization of inverse matrix WAZ = I, enabling vector solves using matrix vector mult. instead of forward/backward solves with much better parallel speedup.
Significant improvements in parallel performance of matrix inversion.
Improved performance of LU, matrix-vector mult., forward/backward solves.
Added function nearfield_sparse to extract H-matrix nearfield as sparse matrix.
Switched to adaptive_split_axis as default for clustering.
Minor Changes:
- Additional options for matrix visualization (colormap, etc.)
- Basic VTK output of block clusters.

v2.3

v2.3.2 (released 2015-06-23)

fixed various bugs and race conditions
extended ctors of various matrix classes to accept optional value type field
added example on how to assemble block matrices

v2.3.1 (released 2014-11-12)

Fixed two bugs in point-wise LU.
Solver changes:
- Refactored solver classes (no interface changes); added TRichardson to replace TSolver in the future.
- Fixed inconsistent computation of residual norm in solvers. Now Richardson, CG and BiCG will compute standard residual norm, while MINRES and GMRES compute preconditioned residual norm.
- Made initialisation of start vector in solver classes optional (function initialise_start_value)
Added function diagonal to extract diagonal of a matrix.
Added example spectrum to compute spectrum of graph Laplacian (see also documentation).

v2.3 (released 2014-10-27)

Fixed issues when solving dense matrices (used in new example for many RHSs).
Modified THiLoFreqGeomAdmCond: now maximal number of wavelengths per cluster is tested.
Refactored geometrical clustering classes and partitioning strategies, thereby fixing several issues.
C++11 changes:
- most object creating functions now return std::unique_ptr,
- replaced typedef by using,
- added iterators for TIndexSet, TNodeSet, TGraph, TProcSet (for range based for).
Added parameter to algebraic clustering in C bindings to define partitioning algorithm (BFS, multi level, METIS or Scotch).
Fixed issues with progress bar during factorisation (wrong block count).
Removed BSP style communication functions (MPI only now).
Finished conversion to new packed_t SIMD type. Using SSE3 instead of SSE2.
Added lock to TScotchAlgPartStrat because Scotch is not multi thread safe.
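
The C++11 changes listed for this release (object creating functions returning std::unique_ptr, using instead of typedef, iterators for range-based for) can be illustrated with a small sketch. All names below (idx_t, IndexRange, make_range, collect) are hypothetical stand-ins chosen for illustration and are not the library's classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// All names in this sketch are hypothetical, not HLIBpro types.

// "using" alias replacing the older form "typedef std::size_t idx_t;"
using idx_t = std::size_t;

// Contiguous index range [first, last] providing begin()/end(),
// which is all a range-based for loop needs.
class IndexRange
{
public:
    class iterator
    {
    public:
        explicit iterator(idx_t pos) : _pos(pos) {}

        idx_t     operator*() const { return _pos; }
        iterator& operator++() { ++_pos; return *this; }
        bool      operator!=(const iterator& other) const { return _pos != other._pos; }

    private:
        idx_t _pos;
    };

    IndexRange(idx_t first, idx_t last) : _first(first), _last(last)
    {
        assert(first <= last);
    }

    iterator begin() const { return iterator(_first); }
    iterator end()   const { return iterator(_last + 1); }

private:
    idx_t _first, _last;
};

// Factory returning std::unique_ptr: ownership passes to the caller and the
// object is released automatically when the pointer goes out of scope.
std::unique_ptr<IndexRange> make_range(idx_t first, idx_t last)
{
    return std::unique_ptr<IndexRange>(new IndexRange(first, last));
}

// Collect all indices of a range with a range-based for loop.
std::vector<idx_t> collect(const IndexRange& range)
{
    std::vector<idx_t> result;

    for (idx_t i : range)
        result.push_back(i);

    return result;
}
```

With such an interface, a call like collect(*make_range(2, 5)) needs no explicit delete, and the loop body does not deal with iterator types directly.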

v2.2 (released 2014-07-15)

Removing implicit reordering of unknowns during matrix-vector multiplication to fix inconsistent behaviour. Please use permutations from cluster trees or ℋ-matrices to reorder vectors or TPermMatrix to represent permuted matrices instead.
Speedup improvements for matrix inversion. Triangular inversion and matrix multiplication available in standard user interface.
Import/export from/to CCS/CRS matrices simplified.
Simplified (and faster) mutex wrapper.
Several C++11 changes.
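
Since matrix-vector multiplication no longer reorders unknowns implicitly, vectors have to be permuted explicitly. The following minimal sketch shows the idea with plain std::vector; the helpers to_internal and to_external and the convention that perm[i] holds the internal position of external index i are assumptions made for this example and do not reflect the library's permutation classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers (not HLIBpro functions). Assumed convention:
// perm[i] is the internal position of the entry with external index i.

// Reorder from external to internal numbering: y[perm[i]] = x[i].
std::vector<double> to_internal(const std::vector<std::size_t>& perm,
                                const std::vector<double>&      x)
{
    assert(perm.size() == x.size());

    std::vector<double> y(x.size());

    for (std::size_t i = 0; i < x.size(); ++i)
        y[perm[i]] = x[i];

    return y;
}

// Reorder from internal back to external numbering: x[i] = y[perm[i]].
std::vector<double> to_external(const std::vector<std::size_t>& perm,
                                const std::vector<double>&      y)
{
    assert(perm.size() == y.size());

    std::vector<double> x(y.size());

    for (std::size_t i = 0; i < y.size(); ++i)
        x[i] = y[perm[i]];

    return x;
}
```

A typical pattern under these assumptions is to bring the input vector into internal ordering, multiply, and map the result back to external ordering.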

v2.1 (released 2014-05-08)

Removing reference counters in BLAS interface due to major performance issue on multi-core (-socket) systems. See documentation on how to use the modified interface (and avoid errors).
New, scalable matrix-vector multiplication implemented.
Using generic datatype for SIMD instructions, thereby enabling generic SIMD algorithms, e.g. for BEM kernels, and fast adoption of new SIMD instructions, e.g. AVX2.
Removed TVirtualVector (replaced by TScalarVector).
and, as usual: several bugs fixed

v2.0

v2.0.2 (released 2014-01-24)

fixed race condition in C bindings
fixed issue with initialisation of static variables

v2.0.1 (released 2013-12-05)

fixed some bugs

v2.0 (released 2013-09-18)

Major Changes
- Switched from OpenMP to Threading Building Blocks as interface to shared memory parallelism, thereby also changing most algorithms to task-based parallelism.
- Reducing dependency on external libraries by using C++11 features. Also replacing some classes by default C++ versions (finally removing old code).
- Alternative, non-recursive, level-wise ℋ-LU factorisation based on explicit block dependencies, which provides far better speedup on many-core systems, e.g. Intel MIC architecture.
- New H-LU factorisation algorithm also applicable in distributed environments, yielding better load-balancing (albeit with limited speedup).
- Added support for multiple CPUs to many algorithms, e.g. in clustering, norm computations, matrix-vector multiplication and solves, H²-conversion.
Minor Changes
- Optimised BEM kernels for Intel MIC architecture.
- Introduced TLinearOperator for operators not supporting TMatrix functionality, e.g. factorised matrices.
- HLIBpro file format changed due to internal changes and due to some bugs in the format. However, backward read compatibility for most files written with earlier versions is kept.
- Added support for Cairo library, thereby providing PDF output.
And of course: many smaller feature upgrades and bug fixes.

v1.2 (released 2012-02-23)

Matrix Construction:
- Switched to template based coefficient functions (TCoeffFn and derived) and all dependent classes, e.g. TDenseMBuilder, SVD and ACA low rank approximation.
- Rewrote HCA:
  - Simpler interface containing all necessary functionality in single class.
  - Using template for value type.
  - Added base classes for permuted indices and for BEM applications using quadrature.
  - Added implementation for Laplace and Helmholtz also for linear ansatz spaces and with support for SSE2 and AVX.
- Cleaned up ACA implementation.
- Changed handling of recompression: should now be handled by default for low rank approximation algorithm and not by matrix construction class (to avoid recompression of optimal results).
Clustering Changes:
- Added TNDBSPPartStrat to be used in connection with nested dissection (trying various clusterings and choosing best for ND).
- Modified TNDBSPCTBuilder to more closely resemble the algebraic version, e.g. average depth for interface clusters instead of maximal.
- Fixed bug in PCA based clustering and added version for cardinality based clustering.
- Added various flags to modify clustering, e.g. synchronisation of interface depth, enforcing block clusters with same depth of corresponding clusters, using symmetrised weights in algebraic clustering.
Input/Output and visualisation:
- Fixed bug in reading dense matrices.
- Changed order of dimension for coordinate IO using Matlab format: now ncoord × dimension (e.g. as also used by Sparse Matrix Collection).
- Added VTK visualisation for coordinates (with various options, e.g. marking clusters or index connectivity) and BEM grids.
- Added output of grids in HLIB format.
- Added coordinate IO in MatrixMarket format.
Changes in LAPACK wrapper:
- added LAPACK workspace queries for optimal workspace size instead of using predefined block size
- using xGESDD for large matrices
various bug fixes.

v1.1

v1.1.1 (released 2011-11-29)

Deactivated default coarsening during matrix construction.
Added special H² matrix builder with predefined cluster bases.

v1.1 (released 2011-11-28)

changes in BEM code:
- Added support for AVX.
- Performance speedups in SSE2 implementation of Helmholtz and Maxwell kernels.
- Runtime detection of SSE2/AVX availability and automatic choice of optimal kernel.
Added matrix_format function to matrix coefficient functions to define whether unsymmetric, symmetric or hermitian (default: unsymmetric).
Default build function in matrix builders now without format argument.
Added support for ILP64 BLAS/LAPACK implementations (64bit integers).
Added support for AMD-LibM (integrated in binary Linux distributions).
Added vector IO in MatrixMarket format.
Cleaned up C++ examples (thereby also removing Boost link dependency).
several bug fixes

v1.0

v1.0.1 (released 2011-09-28)

OpenMP exception handling changed: now all threads will stop as soon as possible in case of an error
fixed several, previously undetected, non-critical compiler warnings (MS Visual C++)
bug fixes

v1.0 (released 2011-07-01)

HLIBpro v1.0 is a major rewrite/reorganisation of many of the H-matrix algorithms. The following list of changes only covers the main topics and is far from complete.

added distributed computing via MPI for matrix construction and factorisation
added H²-matrices
added internal multi-level graph partitioning for blackbox clustering
added support for piecewise linear basis functions and Maxwell EFIE/MFIE
rewrote interface to BLAS/LAPACK
rewrote C interface with better mapping of internal C++ and C types
increased robustness of matrix factorisation in case of ill-conditioned matrices
increased speedup of matrix factorisation in multi-threaded computations
many performance improvements and bug fixes

v0.13

v0.13.6 (released 2009-01-30)

added optional diagonal scaling of H-matrices during LU factorisation
added blockwise accuracy, i.e. accuracy depending on current matrix block
rewrote accuracy handling in C bindings
simplified BSP partitioning methods and added regular cardinality based and principal component based clustering
added optional balancing of tree depth in cluster tree construction with predefined partitioning
implemented optional double precision computation of matrix inversion and low-rank truncation in single precision mode
fixed bug in calling single precision norm functions of LAPACK
fixed bug in PostScript output and modified H-matrix output in PostScript format
added support for Jacobi based SVD (sgejsv and dgejsv) in LAPACK v3.2

v0.13.5 (released 2008-09-23)

removed ID based cluster tree computations in matrices
always computing SCC in algebraic clustering, also in nested dissection clustering
reordering clusters depending on size ratio (large first)
fixed bug with filenames without directories
fixed non-exception safe OpenMP usage
added matrix reduction to nearfield part
added dense low-rank multiplication if result is large dense matrix

v0.13.4 (released 2008-04-24)

fixed solve functions in TLU, TLDL (checking for NULL blocks)
fixed OpenMP call with zero threads in TLU, TLDL and TMatrixInv
fixed operator = in autoptr (wrong const)
removed unnecessary checks in TArray::copy
fixed recursive call in restrict_blockdiag
replaced fixed constants by type dependent constants in lapack.cc
fixed TMatrixInv::multiply_diag when only D is dense

v0.13.3 (released 2008-03-27)

fixed several warnings from Visual C++ and Intel C++ compilers
moved all global variables and functions into HLIB namespace (except xerbla override)
enabled user defined prefix for functions and types in C interface and added override for namespace name
reactivated cardinality check when using HLIB_BSP_AUTO

v0.13.2 (released 2008-02-29)

replaced threads and mutexes by OpenMP (thread start only, no scheduling)
included log file support in addition to stdout
added parallel LDL^T factorisation (DD and blockdiag only)
added parallel blockdiag LU factorisation
added zero approximation during matrix construction (for nearfield only)
fixed bug in algebraic nested dissection clustering (wrong path length in interface)

v0.13.1 (released 2008-02-04)

reduced memory consumption/fragmentation in ACA generated matrices with large rank
added Fiduccia/Mattheyses bisection optimisation for BFS clustering
added FFT for vectors by implementing support for FFTW3 (optional)
fixed bug in TBSPPartCTBuilder when using more than two partitions
fixed potential issues in sorting algorithms
fixed type issues with *_bytesize functions in C interface
fixed bug in PostScript visualisation of matrices if matrix norm is zero
fixed issues with GCC-4.3
fixed bug in command line parsing of configuration system
minor modifications to SCons system to increase user-friendliness

v0.13(.0) (released 2007-12-19)

General Algorithmic Changes
- support for single precision arithmetic; has to be decided before compiling HLIBpro
- made complete C++ functions and classes visible from outside instead of just C interface functions
- rewrote complex arithmetic to distinguish between symmetric and hermitian matrices; added LDL^H and LL^H factorisations
- inversion now based on LU, thereby reducing memory consumption (roughly halved)
- added computation of the diagonal of the inverse without computing the inverse
- added evaluation of LU, LDL^T factorisations (instead of just solving)
- removed point-wise LU and LDL^T factorisation (only blocked) to improve robustness with zeroes on diagonal
- added (optional) check and fix for singular sub matrices during inversion and factorisation
- added complex valued HCA
- new version of ACA+
- multiplication C = ADB with diagonal D implemented
- implemented bilinear forms for Helmholtz single and double layer potential
- implemented bilinear form for acoustic scattering
- rewrote algebraic clustering for sparse matrices; added support for Scotch and CHACO
- added support for periodic coordinates in clustering
- added clustering with user defined index partition on first level in cluster tree
- added standard admissibility for algebraic clustering
- added maximal level in clustering to prevent infinite recursion
- modified solvers to handle complex valued data
- added permutation of dense matrices without temporary storage (needed in IO)
Parallel Arithmetic
- added thread parallel algorithms for matrix construction, matrix multiplication, inversion and LU factorisation
- redesigned thread pool, thereby fixing race conditions
- added support for Windows threads
- fixed several issues with thread safety
Input and Output
- added general I/O functions with autodetection of file format
- added output of matrices in Harwell/Boeing format
- added MatrixMarket format
- added support for Ply and surface mesh format (NetGen) for Grid I/O
- fixed format errors in SAMG output
- conversion of arbitrary matrices to sparse format when writing in SAMG or Harwell/Boeing format
- fixed support for symmetric matrices in Harwell/Boeing format
C interface
- prefixed all functions, types and constants with hlib_ (or HLIB_) to prevent collisions with other definitions (OS or libraries)
- added support for C99 complex types (if available)
- added hlib_set_coarsening to activate/deactivate coarsening during matrix construction (default: on) and matrix arithmetic (default: off)
- added hlib_matrix_inv_diag to return diagonal of inverse
- added hlib_matrix_is_complex to test for real or complex valued matrices
- added hlib_set_nthreads to set number of threads
- added hlib_coord_t as special type for coordinates
- separated stop criterion and solver in solver interface
Miscellaneous
- updated CPUflags and Rmalloc
- fixed optimisation issues (leading to infinite loops) in enclosed CLAPACK

v0.12 (released 2006-11-01)

Algorithmic Changes
- added (blocked) LDL^T factorisation (now default for symmetric matrices)
- no longer need extra matrix in matrix inversion
- using ACAFull in HCA (instead of SVD)
- adaptively choosing quadrature and interpolation order in ACA and HCA
- rewrote matrix addition to support general cases, e.g. low-rank to blocked
- rewrote low-rank truncation handling
- support for METIS in algebraic clustering routines
- added basic support for “dense” sparse matrices, e.g. with highly coupled indices
- added SSE2 based HCA algorithm
- added infinity norm for vectors
- using norm of preconditioned residual for all solvers if preconditioner is present
- added MINRES iteration
- using ADM_AUTO as default admissibility
- finally removed all asserts and replaced by internal error checking
Input and Output
- VRML97 support
- added Matlab compression (Matlab v7) and structs support
- support for Harwell-Boeing matrix format (read-only)
- modified PostScript output of block-wise SVD; now scaled w.r.t. 2-norm of matrix
OS and Library support
- MS Windows support
- shared libraries for Linux and Windows
- changed configure system to better handle MS Windows environment
- added internal xerbla to handle LAPACK errors directly
C interface
- automatic choice of matrix building in hlib_matrix_build_bem_grid
- introduced vector_t as type for vectors (no more C arrays)
- added Gauss and Sauter triangle quadrature rules
- added functions to access matrix and vector entries
- added copyto and copyto_eps functions
- added hlib_matrix_build_dense to build H-matrix from dense matrix
- changed solver management
Miscellaneous
- several improvements and bug fixes
- cleaned up error codes
- updated CPUflags and Rmalloc

v0.11 (released 2006-05-29)

Arithmetic
- added ACA-Full
- added HCA (hybrid cross approximation)
- complex valued ACA and SVD
- added copy with coarsening for H-matrices
- added computation of spectral norm for the inverse of a matrix
- support for permutations in matrix-vector multiplication of sparse matrices
- added support for Laplace SLP/DLP and 3D triangle surface grids
- fixed issues with degenerated bounding boxes in geometrical clustering
Input/Output
- support for PLTMG matrix format
Miscellaneous
- replaced error handling with exceptions
- added modified CLAPACK as default implementation of LAPACK to HLIBpro
- integrated CPUFlags into configure system
- added function for fast reciprocal square root

v0.10 (released 2006-04-05)

Arithmetic
- initial support for complex arithmetic
- support for symmetric matrices in arithmetic
- implemented block LU factorisation
- implemented LDL^T factorisation
- added Frobenius norm for sparse matrices
- support for CRS format in sparse matrices
- added Jacobi and SOR matrix types (for matrix-vector multiplication)
- implemented hierarchical domain decomposition with parallel arithmetics
Parallel Algorithms
- thread-parallel Cholesky factorisation
- thread-parallel coarsening of H-matrices
- fixed thread-parallel LU and inversion
- fixed dead-locks in thread-pool
- added direct communication in BSP mode
- parallel addition of matrices and vectors via streams
Input/Output
- support for Matlab and SAMG format
Miscellaneous
- introduced C interface functions and types
- added configure system for Makefiles
- added progress meter support for arithmetic
- added internal RTTI system
- support for memory consumption query on HP-UX
- rewrote error handling

v0.9 (released 2004-11-30)

first public version as PHI (Parallel H-matrix Implementation)
merged BSP-parallel and thread-parallel versions of H-matrix library