HLIBpro  2.7
Solving a Sparse Equation System

Given a sparse matrix $$A$$ and a right-hand side $$b$$, the solution $$x$$ of

$Ax = b$

is sought, whereby an H-LU factorisation is used to improve the convergence of an iterative solver. Since no geometrical data is assumed, algebraic clustering is performed to convert $$A$$ into an H-matrix. Furthermore, nested dissection is applied to improve the efficiency of the H-LU factorisation.

The program starts with the inclusion of the needed header files and the initialisation of 𝖧𝖫𝖨𝖡𝗉𝗋𝗈.

#include <cstdlib>
#include <iostream>
#include "hlib.hh"
using namespace std;
using namespace HLIB;
int
main ( int argc, char ** argv )
{
try
{
INIT();

The next step is the definition of the sparse matrix, which shall be read from a file. 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 supports various foreign matrix formats, e.g. Matlab, SAMG, Harwell-Boeing or MatrixMarket. In this example, the matrix is stored in Matlab format:

auto M = read_matrix( "M.mat" );

The function read_matrix performs autodetection of the file format and is therefore suited for all supported formats.

Remarks
In the case of Matlab, read_matrix reads in the first matrix found in the file. If you have stored several matrices in one file, you have to use TMatlabMatrixIO and provide the name of the matrix to read.

By using unique_ptr, the corresponding object is automatically deleted when the current block is left, thereby reducing the danger of memory leaks. This technique is extensively used in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈.

Since the given file may contain any kind of matrix, e.g. a dense one, while a sparse matrix is expected in the following, the type of the matrix is tested:

if ( ! IS_TYPE( M, TSparseMatrix ) )
{
cout << "given matrix is not sparse (" << M->typestr() << ")" << endl;
exit( 1 );
}
auto S = ptrcast( M.get(), TSparseMatrix );

The macro IS_TYPE returns true if the corresponding object is of the given type. This is an example of the runtime type information (RTTI) system in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 and works similarly to dynamic casts in C++. The advantage of the 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 RTTI is that further information can be gathered, e.g. the type name via the function typestr(), independent of the corresponding C++ compiler support.

Remarks
The 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 RTTI is also faster than the C++ RTTI, since only small integers are compared. Of course, the C++ approach is much more general and needs no special support by the programmer.

Finally, since we know by now that M contains a sparse matrix, we cast the pointer for easier use in the following. The macro ptrcast is just an abbreviation of a C++ cast and was introduced in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 for ease of use and for debugging purposes (ptrcast behaves differently depending on compilation options).

We now make use of the TSparseMatrix pointer S and may output some information about it:

cout << " matrix has dimension " << S->rows() << " x " << S->cols() << endl;
cout << " no of non-zeroes = " << S->n_non_zero() << endl;
cout << " matrix is " << ( S->is_complex() ? "complex" : "real" )
<< " valued" << endl;
cout << " format = ";
if ( S->is_nonsym() ) cout << "non symmetric" << endl;
else if ( S->is_symmetric() ) cout << "symmetric" << endl;
else if ( S->is_hermitian() ) cout << "hermitian" << endl;
cout << " size of sparse matrix = " << Mem::to_string( S->byte_size() ) << endl;
cout << " |S|_F = " << norm_F( S ) << endl;

This includes the dimension of the matrix, i.e. the number of rows and columns, as well as the number of non-zero elements. Furthermore, the value type, real or complex, and the matrix properties are printed. Finally, the storage size and the Frobenius norm are computed.

# Conversion to H-Format

Having a right-hand side, the equation system above could now be solved using an iterative scheme. Unfortunately, most sparse matrices need some form of preconditioning for an acceptable convergence behaviour. The preconditioner shall be in the form of an H-LU factorisation of $$A$$ and hence, of S. For this, we need a cluster tree and a block cluster tree.

Since, by assumption, no geometrical data is available, the clustering is performed purely algebraically by using the connectivity relation of the indices stored in the sparse matrix itself. Furthermore, nested dissection is to be used, which needs some special clustering.

Algebraic clustering uses graph partitioning as the underlying technique. 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 implements two graph partitioning algorithms (TBFSAlgPartStrat and TMLAlgPartStrat) and supports external algorithms, e.g. METIS and Scotch. Here we use the BFS-based partitioning strategy implemented by TBFSAlgPartStrat.

Nested dissection clustering is based on a standard bisection clustering algorithm, which therefore has to be defined explicitly in the form of TAlgCTBuilder, which in turn uses the graph partitioning strategy. Finally, TAlgNDCTBuilder computes the cluster tree for the index set defined by S:

TBFSAlgPartStrat part_strat;
TAlgCTBuilder ct_builder( & part_strat );
TAlgNDCTBuilder nd_ct_builder( & ct_builder );
auto ct = nd_ct_builder.build( S );

Having the cluster tree, the block cluster tree can be computed:

// admissibility condition ( constructor arguments assumed analogous to TSparseMBuilder below )
TWeakAlgAdmCond adm_cond( S, ct->perm_i2e(), ct->perm_e2i() );
TBCBuilder bct_builder;
auto bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );
cout << " sparsity constant = " << bct->compute_c_sp() << endl;

Here, the weak admissibility condition for graphs, implemented by TWeakAlgAdmCond, is used. Since the block clusters to test are based on the internal ordering of the H-matrix while the sparse matrix is defined in the external ordering, the corresponding mappings between both orderings are needed by the admissibility condition.

Remarks
See Parameters for nmin, cluster_level_mode and sync_interface_depth, which may be changed to optimise the partitioning.

The printed sparsity constant gives a hint about the complexity of the H-matrix constructed over the block cluster tree: the smaller, the better.

The final conversion of the sparse matrix into H-format is performed by TSparseMBuilder. Here, again, the mappings between internal and external ordering are needed. The accuracy can usually be neglected, since the low-rank blocks are typically empty (clusters with positive distance usually have no overlapping basis functions).

TSparseMBuilder h_builder( S, ct->perm_i2e(), ct->perm_e2i() );
TTruncAcc acc( real(0.0) );
auto A = h_builder.build( bct.get(), acc );
cout << " size of H-matrix = " << Mem::to_string( A->byte_size() ) << endl;
cout << " |A|_F = " << norm_F( A.get() ) << endl;
{
auto PA = make_unique< TPermMatrix >( ct->perm_i2e(), A.get(), ct->perm_e2i() );
cout << " |S-A|_2 = " << diff_norm_2( S, PA.get() ) << endl;
}

After matrix construction, the size of the H-matrix $$A$$ and its norm is printed together with the (relative) norm of the difference $$\|S-A\|_2/\|S\|_2$$, indicating any approximation error. For this, the matrix $$A$$ should operate with the same ordering of the unknowns as $$S$$ and is therefore wrapped in a TPermMatrix object.

# H-LU factorisation

The factorisation of A may be computed differently, depending on the format of A, e.g. whether it is symmetric or not. In the unsymmetric case the standard $$LU$$ factorisation, implemented in TLU, is used, while in the symmetric case the $$LDL^T$$ factorisation (or $$LDL^H$$ for hermitian matrices), implemented in TLDL, is computed.

The factorisation is performed in place, i.e. the data of A is overwritten by the corresponding factors. This implies that A is no longer a regular matrix object after the factorisation, since the stored data needs special handling. For this, linear operators are used, with special versions for each factorisation algorithm, to permit evaluation of the factorised matrix or of its inverse.

const TTruncAcc fac_acc( 1e-4 );
auto A_inv = factorise_inv( A.get(), fac_acc );

The factorisation is performed up to a block-wise accuracy of $$10^{-4}$$. Depending on the given matrix this has to be changed to obtain a direct solver (single matrix-vector mult.) or a good preconditioner (small number of iterations).

Some properties of the factorisation are printed next, most notably the inversion error $$\|I-(LU)^{-1}S\|_2$$ and the condition estimate $$\|LU\mathbf{1}\|_2$$. However, to compare $$(LU)^{-1}$$ with $$S$$, both need to have the same ordering of the indices. Instead of reordering $$S$$, the factorised matrix may be wrapped in an object representing a permuted matrix:

auto PA_inv = make_unique< TPermMatrix >( ct->perm_i2e(), A_inv.get(), ct->perm_e2i() );

Here, an operator was built, which first reorders w.r.t. the H-matrix, then applies the factorised matrix and finally reorders back.

Now, the inversion error may be computed:

cout << " size of LU factor = " << Mem::to_string( A->byte_size() ) << endl
<< " inv. error wrt S = " << inv_approx_2( S, PA_inv.get() )
<< endl
<< " condest(LU) = " << condest( PA_inv.get() ) << endl;

# Solving the Equation System

Finally, the equation system can be solved using the inverse of the matrix factorisation as a preconditioner. The right-hand side is assumed to be available in Matlab format and is read in using read_vector, which again performs file format autodetection. The solution vector is created using the col_vector function of S, which returns a vector defined over the column index set of S.

TAutoSolver solver;
TSolverInfo sinfo;
auto b = read_vector( "b.mat" );
auto x = S->col_vector();
solver.solve( S, x.get(), b.get(), PA_inv.get(), & sinfo );
if ( sinfo.converged() )
cout << " converged in " << sinfo.n_iter() << " steps"
<< " with rate " << sinfo.conv_rate()
<< ", |r| = " << sinfo.res_norm() << endl;
else
cout << " not converged in " << sinfo.n_iter() << " steps " << endl;
write_vector( x.get(), "x.mat" );

With write_vector, the solution vector is written to a file, again in Matlab format.

Remarks
File format autodetection when reading a file is performed by actually looking at the content of the file, yielding a robust autodetection algorithm. When writing a file, only the filename, in particular the filename extension, is used, which may not be unique between different formats (but works in most cases).

The program is finished with the finalisation of 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 and the closing of the try block:

DONE();
}
catch ( Error & e )
{
cout << e.to_string() << endl;
}
return 0;
}

# The Plain Program

#include <cstdlib>
#include <iostream>
#include "hlib.hh"
using namespace std;
using namespace HLIB;
int
main ( int argc, char ** argv )
{
try
{
INIT();
auto M = read_matrix( "M.mat" );
if ( ! IS_TYPE( M, TSparseMatrix ) )
{
cout << "given matrix is not sparse (" << M->typestr() << ")" << endl;
exit( 1 );
}
auto S = ptrcast( M.get(), TSparseMatrix );
cout << " matrix has dimension " << S->rows() << " x " << S->cols() << endl
<< " no of non-zeroes = " << S->n_non_zero() << endl
<< " matrix is " << ( S->is_complex() ? "complex" : "real" )
<< " valued" << endl
<< " format = ";
if ( S->is_nonsym() ) cout << "non symmetric" << endl;
else if ( S->is_symmetric() ) cout << "symmetric" << endl;
else if ( S->is_hermitian() ) cout << "hermitian" << endl;
cout << " size of sparse matrix = " << Mem::to_string( S->byte_size() ) << endl;
cout << " |S|_F = " << norm_F( S ) << endl;
TBFSAlgPartStrat part_strat;
TAlgCTBuilder ct_builder( & part_strat );
TAlgNDCTBuilder nd_ct_builder( & ct_builder );
auto ct = nd_ct_builder.build( S );
// admissibility condition ( constructor arguments assumed analogous to TSparseMBuilder below )
TWeakAlgAdmCond adm_cond( S, ct->perm_i2e(), ct->perm_e2i() );
TBCBuilder bct_builder;
auto bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );
cout << " sparsity constant = " << bct->compute_c_sp() << endl;
TSparseMBuilder h_builder( S, ct->perm_i2e(), ct->perm_e2i() );
TTruncAcc acc( real(0.0) );
auto A = h_builder.build( bct.get(), acc );
cout << " size of H-matrix = " << Mem::to_string( A->byte_size() ) << endl;
cout << " |A|_F = " << norm_F( A.get() ) << endl;
{
auto PA = make_unique< TPermMatrix >( ct->perm_i2e(), A.get(), ct->perm_e2i() );
cout << " |S-A|_2 = " << diff_norm_2( S, PA.get() ) << endl;
}
const TTruncAcc fac_acc( 1e-4 );
auto A_inv = factorise_inv( A.get(), fac_acc );
auto PA_inv = make_unique< TPermMatrix >( ct->perm_i2e(), A_inv.get(), ct->perm_e2i() );
cout << " size of LU factor = " << Mem::to_string( A->byte_size() ) << endl
<< " inv. error wrt S = " << inv_approx_2( S, PA_inv.get() )
<< endl
<< " condest(LU) = " << condest( PA_inv.get() ) << endl;
TAutoSolver solver;
TSolverInfo sinfo;
auto b = read_vector( "b.mat" );
auto x = S->col_vector();
solver.solve( S, x.get(), b.get(), PA_inv.get(), & sinfo );
if ( sinfo.converged() )
cout << " converged in " << sinfo.n_iter() << " steps"
<< " with rate " << sinfo.conv_rate()
<< ", |r| = " << sinfo.res_norm() << endl;
else
cout << " not converged in " << sinfo.n_iter() << " steps " << endl;
write_vector( x.get(), "x.mat" );
DONE();
}
catch ( Error & e )
{
cout << e.to_string() << endl;
}
return 0;
}