In this example, an integral equation is to be solved by representing the discretised operator with an H-matrix, whereby the H-matrix is constructed using a matrix coefficient function for entries of the equation system. Furthermore, for the iterative solver a preconditioner is computed using H-LU factorisation.

Problem Description

Given the integral equation

\[ \int_0^1 \log|x-y| \mathbf{u}(y) dy = \mathbf{f}(x), \quad x \in [0,1] \]

with \(\mathbf{u} : [0,1] \to \mathbf{R}\) being sought for a given right hand side \(\mathbf{f} : [0,1] \to \mathbf{R}\), the Galerkin discretisation with constant ansatz functions \(\phi_i, 0 \le i < n\)

\[ \phi_i(x) = \left\{ \begin{array}{ll} 1, & x \in \left[\frac{i}{n},\frac{i+1}{n} \right] \\ 0, & \mathrm{otherwise} \end{array} \right. \]

leads to a linear equation system \(A u = f\) where \(u\) contains the coefficients of the discretised \(\mathbf{u}\) and \(A\) is defined by

\begin{eqnarray*} a_{ij} & = & \int_0^1 \int_0^1 \phi_i(x) \log|x-y| \phi_j(y) dy dx \\ & = & \int_{\frac{i}{n}}^{\frac{i+1}{n}} \int_{\frac{j}{n}}^{\frac{j+1}{n}} \log|x-y| dy dx \nonumber . \end{eqnarray*}

The right hand side \(f\) is given by

\[ f_i = \int_0^1 \phi_i(x) f(x) dx. \]

Remarks: This example is identical to the introductory example in [BoerGraHac:2003b]}.

Coordinates and Cluster Trees

The code starts with the standard initialisation:

#include <iostream>
#include <hlib.hh>
using namespace HLIB;
using namespace std;
using  real_t = HLIB::real;
int main ( int argc, char ** argv ) 
{
  try
  {
    INIT();
    CFG::set_verbosity( 3 );

Together with the initialisation, the verbosity level of HLIBpro was increased to have additional output during execution. The corresponding function set_verbosity is available in the namespace HLIB::CFG.

Remarks: The type real_t is defined to prevent ambiguity errors with HLIB::real and other types of that name.

For clustering the unknowns, coordinate information is to be defined. For this 1d example, the indices are placed in the middle of equally sized intervals.

const size_t              n = 1024;
const double              h = 1.0 / double(n);
std::vector< double * >   vertices( n );
for ( size_t i = 0; i < n; i++ )
{
    vertices[i]    = new double;
    vertices[i][0] = h * double(i) + ( h / 2.0 ); // center of [i/h,(i+1)/h]
}
auto                      coord = make_unique< TCoordinate >( vertices, 1 );

Coordinates in HLIBpro are vectors of the corresponding dimension, e.g. one in for this problem. The set of all coordinates is stored in objects of type TCoordinate.

The usage of smart pointers, e.g. unique_ptr, is advised, and extensively used in HLIBpro, since it decreases the possibility of memory leaks by automatically deleting the objects upon destruction of the spart pointer variable, e.g. when leaving the local block.

Having coordinates for each index, the cluster tree and block cluster tree can be constructed.

TAutoBSPPartStrat         part_strat;
TBSPCTBuilder             ct_builder( & part_strat, 20 );
auto                      ct = ct_builder.build( coord.get() );

For the cluster tree, the partitioning strategy for the indices is determined automatically by TAutoBSPPartStrat. Furthermore, a \(n_{\min}\) of size 20 is used during construction.

TStdGeomAdmCond           adm_cond( 2.0 );
TBCBuilder                bct_builder;
auto                      bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );

For the block cluster tree, the standard admissibility condition (see {eqn:std_adm}) implemented in TStdGeomAdmCond is used with \(\eta = 2\).

Matrix Construction and Coefficient Function

Using the block cluster tree, finally the H-matrix can be constructed. For this, adaptive cross approximation (ACA, see Low Rank Approximation) is applied to compute low rank approximations of admissibly matrixblocks. In HLIBpro, an improved version of ACA is implemented in TACAPlus.

ACA only needs access to the matrix coefficients of the dense matrix. Those are provided by TLogCoeffFn, which will be discussed below. Finally, the block wise accuracy of the H-matrix approximation has to be defined using TTruncAcc objects. In this case, an accuracy of \(\epsilon = 10^{-4}\) is used.

TLogCoeffFn               coefffn( h, ct->perm_i2e(), ct->perm_i2e() );
TACAPlus< real_t >        aca( & coefffn );
TDenseMBuilder< real_t >  h_builder( & coefffn, & aca );
TTruncAcc                 acc( 1e-4, 0.0 );
auto                      A = h_builder.build( bct.get(), acc );

Remarks: Matrix construction is an example of a parallel algorithm in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈. On shared memory machines, e.g. standard PCs or compute servers, by default as many threads as there are processor cores in the system are used by 𝖧𝖫𝖨𝖡𝗉𝗋𝗈. The can be changed by using CFG::set_nthreads().

The coefficient function used during matrix construction is implemented in class TLogCoeffFn.

class TLogCoeffFn : public TPermCoeffFn< real_t > {

It is derived from TPermCoeffFn, where the interface is defined for coefficient evaluation with permuted indices. The reordering is necessary, since due to the construction of the cluster tree the numbering of the indices in the H-matrix is usually different from the corresponding numbering in the application. The latter is referred to as the external numbering, whereas the first is called the internal numbering.

Going on with TLogCoeffFn, the step width for the grid is stored as an internal variable for kernel evaluation:

private:

const double _h;

In the constructor, the permutation for mapping the internal to the external numbering are provided, both for rows and for columns, together with the step width:

public:
    TLogCoeffFn ( const double         h,
                  const TPermutation * row_perm,
                  const TPermutation * col_perm )
            : TPermCoeffFn< real_t >( row_perm, col_perm )
            , _h(h)
    {}

The actual interface for the evaluation of the matrix coefficients is implemented by the function eval, which is defined in TPermCoeffFn and has to be overloaded:

virtual void eval  ( const std::vector< idx_t > &  rowidxs,
                     const std::vector< idx_t > &  colidxs,
                     real_t *                      matrix ) const
{
      const size_t  n = rowidxs.size();
      const size_t  m = colidxs.size();
      for ( size_t  j = 0; j < m; ++j )
      {
          const int  idx1 = colidxs[ j ];
          for ( size_t  i = 0; i < n; ++i )
          {
              const int  idx0 = rowidxs[ i ];
              double     value;
              if ( idx0 == idx1 ) 
                  value = -1.5*_h*_h + _h*_h*std::log(_h);
              else
              {
                  const double dist = _h * ( std::abs( idx0 - idx1 ) - 1 );
                  const double t1   = dist+1.0*_h;
                  const double t2   = dist+2.0*_h;
                  value = ( - 1.5*_h*_h + 0.5*t2*t2*std::log(t2)
                            - t1*t1*std::log(t1) );
                  if ( std::abs(dist) > 1e-8 )
                      value += 0.5*dist*dist*std::log(dist);
              }
              matrix[ j*n + i ] = -value;
          }
      }
  }

This functions gets a set of indices for rows (rowidxs) and columns (colidxs) and a pointer to a memory block, where all the coefficients should be stored. The indices in both sets are already in the external numbering, hence the indices can directly be used for kernel evaluation. The mapping was performed by TPermCoeffFn. The memory layout of matrix is column wise, which holds for almost all data in HLIBpro. This is due to the memory layout of BLAS/LAPACK, originally implemented in Fortran, which uses column wise storage.

TPermCoeffFn is derived from TCoeffFn, which uses a slightly different evaluation interface based on index sets in the internal numbering. TPermCoeffFn overloads this method with the conversion between both numberings and implements the new eval method. To avoid compiler warnings about ``hidden functions'', the first version is brought into local scope by

using TPermCoeffFn< real_t >::eval;

By default, the format of the final matrix defined by the coefficients will be unsymmetric, e.g. lower and upper half of the matrix will be built. Please note, that this will not effect the actual algebraic matrix which may still be symmetric or hermtitian. Only the storage and subsequent algorithms are affected in terms of computational costs.

To change the matrix format, the function matrix_format has to be overloaded. In this case, as the matrix is symmetric, this looks like

virtual matform_t matrix_format () const { return symmetric; }

Finally, TLogCoeffFn has to signal the value type, e.g. HLIB::real or HLIB::complex valued, to the matrix construction object with the method is_complex.

virtual bool is_complex () const { return false; }

};

It should be noted, that for a complex coefficient function, to evaluation method is not called eval but ceval.

H-LU Factorisation

Having constructed the H-matrix, the corresponding equation system shall be solved. In most cases, a standard iterative solver will not be sufficient, e.g. either the convergence rate is to bad or there is no convergence at all. Therefore, a (good) preconditioner is needed, with the inverse of \(A\) being the best possible. Since iterative schemes only need matrix-vector multiplication, the corresponding functionality for the inverse is sufficient. This is provided by an (H-) LU factorisation of \(A\), or for the symmetric case, a LDL factorisation.

For this, either the factorisation classes may be used directly or, instead, the function factorise_inv be empoyed, which chooses uses a LU factorisation for unsymmetric matrices and a LDL factorisation otherwise. The return value is an object providing the functionality of a linear operator, e.g. evaluation of vectors. In this case, it corresponds to an operator for the evaluation of the inverse.

auto B( A->copy() );

auto A_inv = factorise_inv( B.get(), acc );

The copy of A is neccessary, since the matrix is modified during the factorisation. The accuracy is chosen to be the same as during matrix construction.

Iterative Solvers

Still missing is the right hand side of the equation system. In this example, it was chosen, such that \(\mathbf{u} \equiv \mathbf{1}\):

auto                    b = A->row_vector();
for ( size_t  i = 0; i < n; i++ )
    b->set_entry( i, rhs( i, n ) );

with rhs defined by

real_ rhs ( const idx_t  i,
           const idx_t  n )
{
    const real_t  a     = real_t(i)             / real_t(n);
    const real_t  b     = (real_t(i)+real_t(1)) / real_t(n);
    real_t        value = -1.5 * (b - a);
    if ( std::abs( b )       > 1e-8 ) value += 0.5*b*b*std::log(b);
    if ( std::abs( a )       > 1e-8 ) value -= 0.5*a*a*std::log(a);
    if ( std::abs( 1.0 - b ) > 1e-8 ) value -= 0.5*(1.0-b)*(1.0-b)*
                                               std::log(1.0-b);
    if ( std::abs( 1.0 - a ) > 1e-8 ) value += 0.5*(1.0-a)*(1.0-a)*
                                               std::log(1.0-a); 
    return value;
}

The RHS was built using the original ordering of the unknowns. To be used with H-arithmetic, it has to be reordered according to the internal numbering of the H-matrix as defined by the clustering process. The object for the cluster tree stores both kind of permutations, i.e., from external to internal (H) numbering and from internal to external numbering. To reorder the RHS, the external to internal (e2i) permutation is used:

ct->perm_e2i()->permute( b.get() );

Remarks: H-matrices also store the mappings between internal and external numbering for the row and column index sets.

For the iterative solver TAutoSolver is used, which automatically chooses a suitable solver for the given combination of matrix and preconditioner. Furthermore, statistics about the solution process is stored in an object of type TSolverInfo, e.g. convergence rate, number of steps.

TAutoSolver             solver( 1000 );
TSolverInfo             solve_info;
auto                    x = A->col_vector();
solver.solve( A.get(), x.get(), b.get(), A_inv.get(), & solve_info );

To check the accuracy of the computed solution, we compare it with the known exact solution. However, the exact solution uses external ordering, while the computed solution is still based on the internal ordering of the H-matrix. To compare both, the ordering has to be equal:

ct->perm_i2e()->permute( x.get() );
auto                    sol = b->copy();
sol->fill( 1.0 );
x->axpy( 1.0, sol.get() );
std::cout << "  |x-x~| = " << x->norm2() << std::endl;

Remarks: Since the solution vector in this example is constant one, applying a permutation is redundant. However, in other examples, this would be important and was therefore also applied here.

The standard finalisation and catch block finishes the example:

    DONE();
  }
  catch ( Error & e )
  {
    std::cout << e.to_string() << std::endl;
  }
  return 0;
}

The Plain Program

#include <iostream>
#include <hlib.hh>
using namespace std;
using namespace HLIB;
class TLogCoeffFn : public TPermCoeffFn< real_t > {
private:
    const double  _h;
public:
    TLogCoeffFn ( const double         h,
                  const TPermutation * row_perm,
                  const TPermutation * col_perm )
            : TPermCoeffFn( row_perm, col_perm )
            , _h(h)
    {}
  virtual void eval  ( const std::vector< idx_t > &  rowidxs,
                       const std::vector< idx_t > &  colidxs,
                       real_t *                      matrix ) const
  {
        const size_t  n = rowidxs.size();
        const size_t  m = colidxs.size();
        for ( size_t  j = 0; j < m; ++j )
        {
            const int  idx1 = colidxs[ j ];
            for ( size_t  i = 0; i < n; ++i )
            {
                const int  idx0 = rowidxs[ i ];
                double     value;
                if ( idx0 == idx1 ) 
                    value = -1.5*_h*_h + _h*_h*std::log(_h);
                else
                {
                    const double dist = _h * ( std::abs( idx0 - idx1 ) - 1 );
                    const double t1   = dist+1.0*_h;
                    const double t2   = dist+2.0*_h;
                    value = ( - 1.5*_h*_h + 0.5*t2*t2*std::log(t2)
                              - t1*t1*std::log(t1) );
                    if ( std::abs(dist) > 1e-8 )
                        value += 0.5*dist*dist*std::log(dist);
                }
                matrix[ j*n + i ] = -value;
            }
        }
    }
    using TPermCoeffFn< real_t >::eval;
    virtual matform_t  matrix_format  () const { return symmetric; }
    virtual bool       is_complex     () const { return false; }
};
real_t rhs ( const idx_t  i,
           const idx_t  n )
{
    const real_t  a     = real_t(i)             / real_t(n);
    const real_t  b     = (real_t(i)+real_t(1)) / real_t(n);
    real_t        value = -1.5 * (b - a);
    if ( std::abs( b )       > 1e-8 ) value += 0.5*b*b*std::log(b);
    if ( std::abs( a )       > 1e-8 ) value -= 0.5*a*a*std::log(a);
    if ( std::abs( 1.0 - b ) > 1e-8 ) value -= 0.5*(1.0-b)*(1.0-b)*
                                               std::log(1.0-b);
    if ( std::abs( 1.0 - a ) > 1e-8 ) value += 0.5*(1.0-a)*(1.0-a)*
                                               std::log(1.0-a); 
    return value;
}
int main ( int argc, char ** argv ) 
{
  try
  {
    INIT();
    CFG::set_verbosity( 3 );
    const size_t              n = 1024;
    const double              h = 1.0 / double(n);
    std::vector< double * >   vertices( n );
    for ( size_t i = 0; i < n; i++ )
    {
        vertices[i]    = new double;
        vertices[i][0] = h * double(i) + ( h / 2.0 ); // center of [i/h,(i+1)/h]
    }
    auto                      coord = make_unique< TCoordinate >( vertices, 1 );
    TAutoBSPPartStrat         part_strat;
    TBSPCTBuilder             ct_builder( & part_strat, 20 );
    auto                      ct = ct_builder.build( coord.get() );
    TStdGeomAdmCond           adm_cond( 2.0 );
    TBCBuilder                bct_builder;
    auto                      bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );
    TLogCoeffFn               coefffn( h, ct->perm_i2e(), ct->perm_i2e() );
    TACAPlus< real_t >        aca( & coefffn );
    TDenseMBuilder< real_t >  h_builder( & coefffn, & aca );
    TTruncAcc                 acc( 1e-4, 0.0 );
    auto                      A = h_builder.build( bct.get(), acc );
    auto                      B( A->copy() );
    auto                      A_inv = factorise_inv( B.get(), acc );
    auto                      b = A->row_vector();
    for ( size_t  i = 0; i < n; i++ )
        b->set_entry( i, rhs( i, n ) );
    ct->perm_e2i()->permute( b.get() );
    TAutoSolver               solver( 1000 );
    TSolverInfo               solve_info;
    auto                      x = A->col_vector();
    solver.solve( A.get(), x.get(), b.get(), A_inv.get(), & solve_info );
    ct->perm_i2e()->permute( x.get() );
    auto                      sol = b->copy();
    sol->fill( 1.0 );
    x->axpy( 1.0, sol.get() );
    std::cout << "  |x-x~| = " << x->norm2() << std::endl;
    DONE();
  }
  catch ( Error & e )
  {
    std::cout << e.to_string() << std::endl;
  }
  return 0;
}