HLIBpro  3.0
Matrix Construction

Matrix construction in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 is the process of taking a given block cluster tree $$T$$ and constructing matrices for all leaves and all inner nodes of $$T$$. The leaf matrices define the actual content of the matrix and may be of various formats, e.g. dense (TDenseMatrix) or low-rank (TRkMatrix) matrices. Inner nodes, on the other hand, define the hierarchy of the H-matrix and are important for the matrix algebra. Such block matrices are usually of type TBlockMatrix.

A special version of a block matrix is THMatrix, which also stores the permutations supplied by the block cluster tree to map between internal and external ordering of the unknowns during matrix-vector multiplication. Standard block matrices, e.g. of type TBlockMatrix, only know about the internal ordering of unknowns.

Remarks
This mapping is only supported for matrix-vector multiplication, not for general matrix algebra.

All matrix types in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 are generic with respect to the value type, for which float, double, std::complex< float > and std::complex< double > are supported. We leave this unspecified by using the type value_t.

All matrix construction techniques are derived from the common base class TMatBuilder, which defines the basic build interface, consisting mainly of the function build:

std::unique_ptr< TMatrix< value_t > >
build ( const TBlockClusterTree *  cluster,             // ← full tree
        const TTruncAcc &          acc,
        TProgressBar *             progress = nullptr ) const;

std::unique_ptr< TMatrix< value_t > >
build ( const TBlockCluster *      cluster,             // ← sub tree
        const TTruncAcc &          acc,
        TProgressBar *             progress = nullptr ) const;

Both functions will accept a progress meter to indicate the current state of the process.

The difference between both functions comes from the different type of the cluster argument. In the first case a TBlockClusterTree object is given, and thereby also the mappings between internal and external ordering of the unknowns stored within this object. This allows the construction of a THMatrix object. In the second case this information is not available and therefore (usually) a TBlockMatrix object will be returned.

# Accuracy Control

The approximation accuracy of low-rank matrices during construction or truncation is controlled by TTruncAcc objects.

It provides different limits for the rank:

Fixed Rank

All low-rank matrices will have a fixed rank $$k$$. During truncation only the largest $$k$$ singular values are kept.
Fixed rank accuracy is defined by the function fixed_rank( k ), e.g.,

auto acc = fixed_rank( 5 );

Fixed Precision

The rank is defined adaptively, i.e., for a matrix $$M$$ it is ensured that the low-rank approximation $$\tilde M$$ satisfies $$|M - \tilde M| \le \varepsilon |M|$$, where $$\varepsilon$$ defines the precision. For truncation this means that all singular values $$s_i$$ with $$s_i / s_0 \ge \varepsilon$$ are kept.
Fixed precision accuracy is defined by the function fixed_prec( eps ), e.g.,

auto acc = fixed_prec( 1e-4 );
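The selection rule behind fixed precision can be made concrete with a small, self-contained sketch (plain C++ for illustration, not HLIBpro internals): given the singular values of a block in descending order, the kept rank is the number of values $$s_i$$ with $$s_i / s_0 \ge \varepsilon$$.

```cpp
#include <cstddef>
#include <vector>

// rank chosen by the fixed precision rule: keep all s_i with s_i / s_0 >= eps
// (sv is assumed to be sorted in descending order, as delivered by an SVD)
std::size_t
trunc_rank ( const std::vector< double > &  sv,
             const double                   eps )
{
    if ( sv.empty() || sv[0] == 0.0 )
        return 0;

    std::size_t  k = 0;

    while (( k < sv.size() ) && ( sv[k] / sv[0] >= eps ))
        ++k;

    return k;
}
```

For singular values { 1, 1e-2, 1e-5, 1e-9 } and eps = 1e-4, only the first two values pass the relative test, so the truncated rank is 2.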

Both may be coupled with

Absolute Precision

The rank is defined adaptively, i.e., for a matrix $$M$$ it is ensured that the low-rank approximation $$\tilde M$$ satisfies $$|M - \tilde M| \le \varepsilon$$, where $$\varepsilon$$ defines the precision. For truncation this means that all singular values $$s_i$$ with $$s_i \ge \varepsilon$$ are kept.
The absolute precision is an optional second argument to fixed_rank and fixed_prec, e.g.

auto acc1 = fixed_rank( 5, 1e-8 );
auto acc2 = fixed_prec( 1e-4, 1e-12 );

By default, the limit for the absolute precision is 0.

Furthermore, in case of fixed precision a maximal rank may be set with TTruncAcc::set_max_rank( k ), i.e., the rank may grow adaptively based on the predefined $$\varepsilon$$ but only up to $$k$$.
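Putting the three limits together, a singular value is kept only if it passes both the relative and the absolute threshold, and never more than the maximal rank. The following is an illustrative sketch of that combined rule in plain C++, not the actual TTruncAcc implementation:

```cpp
#include <cstddef>
#include <vector>

// illustrative rank selection combining relative precision (eps),
// absolute precision (abs_eps) and a maximal rank
std::size_t
select_rank ( const std::vector< double > &  sv,        // descending singular values
              const double                   eps,       // relative precision
              const double                   abs_eps,   // absolute precision (default 0)
              const std::size_t              max_rank ) // upper bound for the rank
{
    if ( sv.empty() || sv[0] == 0.0 )
        return 0;

    std::size_t  k = 0;

    while (( k < sv.size() ) && ( k < max_rank ) &&
           ( sv[k] / sv[0] >= eps ) && ( sv[k] >= abs_eps ))
        ++k;

    return k;
}
```

With abs_eps = 0 and an unbounded max_rank this reduces to the pure fixed precision rule; tightening either limit can only decrease the resulting rank.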

## Accuracy for Subblocks

With TBlockTruncAcc it is possible to specify different accuracies for different subblocks of a matrix. The subblocks are hereby addressed with their row and column index sets.

std::vector< TIndexSet >  row_idxs{ ris0, ris1, ris2 };
std::vector< TIndexSet >  col_idxs{ cis0, cis1 };
std::vector< TTruncAcc >  accs{ acc00, acc10, acc20,
                                acc01, acc11, acc21 };

TBlockTruncAcc  acc( row_idxs, col_idxs, accs );

In this example, a $$3 \times 2$$ partitioning of the global index set is defined by the row index sets row_idxs and the column index sets col_idxs. For each subblock of this partitioning a separate accuracy object is used (specified in column-major order).

For a given matrix block of an H-matrix, the accuracy object is determined by testing in which subblock of the above partitioning the matrix block lies. Because of this, the index partitioning of TBlockTruncAcc should follow the index partitioning of the H-matrix; otherwise errors may occur with matrix blocks belonging to several accuracy objects (intersecting more than one accuracy subblock).

In principle, this may be applied recursively and thereby a specific accuracy may be defined for each subblock of the H-matrix.
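The column-major layout of the accuracy vector can be illustrated with a minimal, self-contained sketch (plain C++, not the TBlockTruncAcc implementation): for a block in row slot i and column slot j of a p × q partitioning, the corresponding entry sits at offset j · p + i.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// look up the entry for block (i,j) in a column-major stored p×q grid
template < typename T >
const T &
block_entry ( const std::vector< T > &  entries,
              const std::size_t         p,    // number of row index sets
              const std::size_t         i,    // row slot of the block
              const std::size_t         j )   // column slot of the block
{
    return entries[ j * p + i ];
}
```

For the 3 × 2 example above, block (2,1) is therefore found at offset 1 · 3 + 2 = 5, i.e., the last entry of the vector.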

# Dense Matrices

For a dense matrix $$A$$, all leaf nodes of the block cluster tree will usually be represented by a non-zero matrix block. By default, a non-admissible leaf node will become a dense matrix, e.g. TDenseMatrix, and an admissible node will become a low-rank matrix, e.g. TRkMatrix. This construction is implemented in TDenseMatBuilder, which needs two additional objects to perform the H-matrix construction:

• a coefficient function, which returns the actual matrix coefficients of $$A$$ for a given sub block of the matrix and
• a low rank approximation object, which will construct low rank approximations of sub blocks of $$A$$.

## Coefficient Functions

The coefficient function has to be of type TCoeffFn or, to be precise, of a derived class (as TCoeffFn is abstract). The main function in TCoeffFn is:

void eval ( const std::vector< idx_t > &  rowis,
            const std::vector< idx_t > &  colis,
            value_t *                     matrix ) const;

Here, rowis and colis hold the row and column indices of the sub-matrix rowis × colis to evaluate, and matrix should be used to store the result in column-wise order (as usual in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈), i.e., the coefficient corresponding to the i'th row and j'th column index should be stored at offset j·nrows + i in matrix.

const size_t  nrows = rowis.size();
const size_t  ncols = colis.size();

for ( size_t  j = 0; j < ncols; ++j ) {
    const idx_t  col_idx = colis[j];

    for ( size_t  i = 0; i < nrows; ++i ) {
        const idx_t    row_idx = rowis[i];
        const value_t  f       = ... // compute coefficient (row_idx,col_idx)

        matrix[ j * nrows + i ] = f;
    }
}

TCoeffFn also implements another evaluation function:

virtual void eval ( const TIndexSet &  rowis,
                    const TIndexSet &  colis,
                    value_t *          matrix ) const;

which is based on TIndexSet. This function can be used if the index sets are contiguous, which is the case for the internal ordering of H-matrices, e.g., clusters.

const size_t  nrows = rowis.size();
const size_t  ncols = colis.size();

for ( size_t  j = 0; j < ncols; ++j ) {
    const idx_t  col_idx = colis.first() + j;

    for ( size_t  i = 0; i < nrows; ++i ) {
        const idx_t    row_idx = rowis.first() + i;
        const value_t  f       = ... // compute coefficient (row_idx,col_idx)

        matrix[ j * nrows + i ] = f;
    }
}

By default, the TIndexSet version of eval relies on the above eval function.

Furthermore, TPermCoeffFn implements the mapping from internal to external orderings and adds this permutation to the second eval function, calling the first eval function with index sets in external ordering.

Another property of the final H-matrix has to be defined by the coefficient function, namely the matrix format, i.e., whether it is unsymmetric, symmetric or hermitian. This is done by overloading the function

matform_t matrix_format () const;

A full example for a user defined coefficient function is described in Solving an Integral Equation.
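As a compact illustration of the interface, the following self-contained sketch mimics the index-vector eval function with a minimal stand-in base class (in HLIBpro one would derive from TCoeffFn< value_t > instead, which offers more functionality). The kernel $$1/(1+|i-j|)$$ is purely hypothetical, chosen only to show the column-wise filling of the coefficient array:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using idx_t = long;

// minimal stand-in for the TCoeffFn interface (illustration only)
template < typename value_t >
struct CoeffFnBase
{
    virtual ~CoeffFnBase () {}
    virtual void eval ( const std::vector< idx_t > &  rowis,
                        const std::vector< idx_t > &  colis,
                        value_t *                     matrix ) const = 0;
};

// example coefficient function for the hypothetical kernel 1/(1+|i-j|)
struct ExampleCoeffFn : public CoeffFnBase< double >
{
    void eval ( const std::vector< idx_t > &  rowis,
                const std::vector< idx_t > &  colis,
                double *                      matrix ) const override
    {
        const std::size_t  nrows = rowis.size();
        const std::size_t  ncols = colis.size();

        for ( std::size_t  j = 0; j < ncols; ++j )
            for ( std::size_t  i = 0; i < nrows; ++i )
                matrix[ j * nrows + i ] =
                    1.0 / ( 1.0 + std::fabs( double( rowis[i] - colis[j] ) ) );
    }
};
```

Note that the coefficient for row index rowis[i] and column index colis[j] lands at offset j · nrows + i, i.e., in column-wise order as required.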

## Low Rank Approximation

𝖧𝖫𝖨𝖡𝗉𝗋𝗈 provides several techniques for computing low rank approximations for dense matrices:

class          Description
TSVDLRApx      singular value decomposition,
TRandSVDLRApx  randomized singular value decomposition,
TRRQRLRApx     rank-revealing QR,
TACA           classical adaptive cross approximation (ACA),
TACAPlus       advanced ACA with additional tests,
TACAFull       ACA with full pivoting,
THCA           hybrid cross approximation (HCA),
TZeroLRApx     approximation by 0, i.e., build just the nearfield,
TDenseLRApx    no approximation; represent all blocks by dense matrices.

Although singular value decomposition will compute the best approximation, it is also the most time consuming with a computational complexity of $$\mathcal{O}(n^3)$$, where $$n$$ is the number of rows/columns.

Randomized SVD is usually faster but needs the full dense block for computations, i.e., complexity is at least $$\mathcal{O}(n^2)$$. The same holds for rank revealing QR.

ACA is a heuristic approximation technique based on the successive elimination of rank-1 matrices. In contrast, HCA will always return an approximation of a given accuracy. The advantage of ACA compared to HCA is that it only needs access to the matrix coefficients, whereas HCA needs further functionality, e.g. the evaluation of collocation integrals. Both ACA and HCA have a runtime of $$\mathcal{O}(n)$$. ACA-Full is a special variant of ACA in which the full matrix is searched for the best rank-1 matrix to eliminate. This also guarantees the user-defined accuracy, albeit with a higher complexity of $$\mathcal{O}(n^2)$$.
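To give an impression of the cross-approximation idea, here is a self-contained sketch of ACA with full pivoting on a plain row-major array (illustration only, not the TACAFull implementation): in each step the residual's entry of largest modulus is chosen as pivot and the corresponding rank-1 cross is subtracted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// ACA with full pivoting: compute rank-1 terms (u_k, v_k) with M ≈ Σ_k u_k v_kᵀ;
// R holds M on entry (row-major n×m) and is overwritten with the residual
std::size_t
aca_full ( std::vector< double > &                 R,
           const std::size_t                       n,
           const std::size_t                       m,
           const double                            eps,
           std::vector< std::vector< double > > &  U,
           std::vector< std::vector< double > > &  V )
{
    for ( std::size_t  step = 0; step < std::min( n, m ); ++step )
    {
        // search the full residual for the entry of largest modulus (full pivoting)
        std::size_t  pi = 0, pj = 0;

        for ( std::size_t  i = 0; i < n; ++i )
            for ( std::size_t  j = 0; j < m; ++j )
                if ( std::fabs( R[ i*m + j ] ) > std::fabs( R[ pi*m + pj ] ) )
                { pi = i; pj = j; }

        const double  pivot = R[ pi*m + pj ];

        if ( std::fabs( pivot ) < eps )  // residual small enough: stop
            break;

        // cross: u = R(:,pj), v = R(pi,:) / pivot
        std::vector< double >  u( n ), v( m );

        for ( std::size_t  i = 0; i < n; ++i ) u[i] = R[ i*m + pj ];
        for ( std::size_t  j = 0; j < m; ++j ) v[j] = R[ pi*m + j ] / pivot;

        // subtract the rank-1 update from the residual
        for ( std::size_t  i = 0; i < n; ++i )
            for ( std::size_t  j = 0; j < m; ++j )
                R[ i*m + j ] -= u[i] * v[j];

        U.push_back( u );
        V.push_back( v );
    }

    return U.size();
}
```

On an exactly rank-1 matrix a single cross reproduces the matrix and the residual vanishes; for smooth kernels the residual norm typically decays quickly with each step, which is what makes cross approximation attractive.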

Remarks
Unless specific problems are known, TACAPlus works in most applications and should therefore be the default choice.

Solving an Integral Equation contains a complete example for H-matrix construction based on dense matrices.

# Sparse Matrices

Representing sparse matrices using H-matrices will usually result in a much higher memory consumption since H-matrices have a significant overhead for storing data. Hence, it should only be performed if additional arithmetic is wanted for these matrices. The standard case is the computation of an H-LU factorisation of a sparse matrix.

The construction of the leaf matrix blocks, and hence the construction of the H-matrix out of a sparse matrix, is done by TSparseMatBuilder. Beside the actual sparse matrix, it also expects mappings for the row and column index sets between internal and external ordering, since the sparse matrix is often given in external ordering. If not, the mappings may be nullptr.

TSparseMatBuilder< value_t > builder( S, ct->perm_i2e(), ct->perm_e2i() );
A = builder.build( bct.get(), acc );

A complete example for H-matrix construction with sparse matrices can be found in Solving a Sparse Equation System.

Remarks
In most applications, admissible blocks will be zero in the resulting H-matrix, since non-zero coefficients only occur for neighbouring (geometrically or algebraically) indices, and hence for the nearfield.

A special mode can be activated in TSparseMatBuilder where, instead of using TDenseMatrix and TRkMatrix, all leaf matrices will be constructed as sparse matrices themselves. Although this significantly reduces the memory consumption of the final H-matrix, no arithmetic is supported for such a matrix aside from matrix-vector multiplication. This mode can be enabled/disabled by calling set_sparse_mode:

TSparseMatBuilder< value_t > builder( S );
builder.set_sparse_mode( true ); // enable sparse leaf matrices
A = builder.build( bct.get(), acc );