In this example, an integral equation is to be solved by representing the discretised operator with an H-matrix, whereby the H-matrix is constructed using a matrix coefficient function for entries of the equation system. Furthermore, for the iterative solver a preconditioner is computed using H-LU factorisation.

This example is also applicable for the more general case that a block-wise low-rank approximation of a dense matrix shall be constructed only based on matrix coefficients.

Problem Description

Given the integral equation

\[ \int_0^1 \log|x-y| \mathbf{u}(y) dy = \mathbf{f}(x), \quad x \in [0,1] \]

with \(\mathbf{u} : [0,1] \to \mathbf{R}\) being sought for a given right hand side \(\mathbf{f} : [0,1] \to \mathbf{R}\), the Galerkin discretisation with constant ansatz functions \(\phi_i, 0 \le i < n\)

\[ \phi_i(x) = \left\{ \begin{array}{ll} 1, & x \in \left[\frac{i}{n},\frac{i+1}{n} \right] \\ 0, & \mathrm{otherwise} \end{array} \right. \]

leads to a linear equation system \(A u = f\) where \(u\) contains the coefficients of the discretised \(\mathbf{u}\) and \(A\) is defined by

\begin{eqnarray*} a_{ij} & = & \int_0^1 \int_0^1 \phi_i(x) \log|x-y| \phi_j(y) dy dx \\ & = & \int_{\frac{i}{n}}^{\frac{i+1}{n}} \int_{\frac{j}{n}}^{\frac{j+1}{n}} \log|x-y| dy dx \nonumber . \end{eqnarray*}

The right hand side \(f\) is given by

\[ f_i = \int_0^1 \phi_i(x) f(x) dx. \]

Remarks: This example is identical to the introductory example in S. Börm, L. Grasedyck and W. Hackbusch: "Lecture note 21", MPI MIS Leipzig.

Coordinates and Cluster Trees

The code starts with the standard initialisation:

#include <iostream>
#include <hpro.hh>
 
using namespace Hpro;
using namespace std;
 
using  value_t = double;
using  real_t  = real_type_t< value_t >;
 
int main ( int argc, char ** argv ) 
{
  try
  {
    INIT();
    CFG::set_verbosity( 3 );

Together with the initialisation, the verbosity level of HLIBpro was increased to have additional output during execution. The corresponding function set_verbosity is available in the namespace Hpro::CFG.

Also the value type is defined as value_t together with the correspondig real valued base type of value_t . This simplifies switching to single precision or complex valued computations.

For clustering the unknowns, coordinate information is to be defined. For this 1d example, the indices are placed in the middle of equally sized intervals.

const size_t              n = 1024;
const double              h = 1.0 / double(n);
std::vector< double * >   vertices( n );
 
for ( size_t i = 0; i < n; i++ )
{
    vertices[i]    = new double;
    vertices[i][0] = h * double(i) + ( h / 2.0 ); // center of [i/h,(i+1)/h]
}
 
auto                      coord = make_unique< TCoordinate >( vertices, 1 );

Coordinates in HLIBpro are vectors of the corresponding dimension, e.g. one in for this problem. The set of all coordinates is stored in objects of type TCoordinate .

The usage of smart pointers, e.g. unique_ptr, is advised, and extensively used in HLIBpro, since it decreases the possibility of memory leaks by automatically deleting the objects upon destruction of the smart pointer variable, e.g. when leaving the local program block.

Having coordinates for each index, the cluster tree and block cluster tree can be constructed.

TAutoBSPPartStrat         part_strat;
TBSPCTBuilder             ct_builder( & part_strat, 20 );
auto                      ct = ct_builder.build( coord.get() );

For the cluster tree, the partitioning strategy for the indices is determined automatically by TAutoBSPPartStrat. Furthermore, a \(n_{\min}\) of size 20 is used during construction.

TStdGeomAdmCond           adm_cond( 2.0 );
TBCBuilder                bct_builder;
auto                      bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );

For the block cluster tree, the standard admissibility condition implemented in TStdGeomAdmCond is used with \(\eta = 2\).

Matrix Construction and Coefficient Function

Using the block cluster tree, finally the H-matrix can be constructed. For this, adaptive cross approximation (ACA, see Low Rank Approximation) is applied to compute low rank approximations of admissibly matrixblocks. In HLIBpro, an improved version of ACA is implemented in TACAPlus.

ACA only needs access to the matrix coefficients of the dense matrix. Those are provided by TLogCoeffFn, which will be discussed below. Since the application normally uses a different ordering compared to the internal structures of the H-matrix, one also needs a permutation for all indices, e.g., from internal numbering of H-matrices to the external numbering of the application. This is performed by TPermCoeffFn. Finally, the block wise accuracy of the H-matrix approximation has to be defined using TTruncAcc objects. In this case, an accuracy of \(\epsilon = 10^{-4}\) is used.

TLogCoeffFn                log_coefffn( h );
TPermCoeffFn< value_t >    coefffn( & log_coefffn, ct->perm_i2e(), ct->perm_i2e() );
TACAPlus< value_t >        aca( & coefffn );
TDenseMBuilder< value_t >  h_builder( & coefffn, & aca );
TTruncAcc                  acc( 1e-4, 0.0 );
auto                       A = h_builder.build( bct.get(), acc );

Remarks: Matrix construction is an example of a parallel algorithm in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈. On shared memory machines, e.g. standard PCs or compute servers, by default as many threads as there are processor cores in the system are used by 𝖧𝖫𝖨𝖡𝗉𝗋𝗈. The can be changed by using CFG::set_nthreads().

The coefficient function used during matrix construction is implemented in class TLogCoeffFn.

class TLogCoeffFn : public TCoeffFn< value_t > {

Hpro::TCoeffFn

Base class for returning coefficient for a given indexpair (i,j) in internal ordering.

Definition TCoeffFn.hh:34

It is derived from the base class for coefficient functions TCoeffFn, which provides the basic interface. TLogCoeffFn stores the step width for the grid as an internal variable for kernel evaluation, which is the only argument for the constructor:

private:
    const double  _h;
 
public:
    TLogCoeffFn ( const double  h )
            : _h(h)
    {}

The actual interface for the evaluation of the matrix coefficients is implemented by the function eval, which is defined in TCoeffFn and has to be overloaded:

virtual void eval  ( const std::vector< idx_t > &  rowidxs,
                     const std::vector< idx_t > &  colidxs,
                     value_t *                     matrix ) const
{
      const size_t  n = rowidxs.size();
      const size_t  m = colidxs.size();
 
      for ( size_t  j = 0; j < m; ++j )
      {
          const int  idx1 = colidxs[ j ];
          
          for ( size_t  i = 0; i < n; ++i )
          {
              const int  idx0 = rowidxs[ i ];
              value_t    value;
 
              if ( idx0 == idx1 ) 
                  value = -1.5*_h*_h + _h*_h*std::log(_h);
              else
              {
                  const double  dist = _h * ( std::abs( idx0 - idx1 ) - 1 );
                  const double  t1   = dist+1.0*_h;
                  const double  t2   = dist+2.0*_h;
          
                  value = ( - 1.5*_h*_h + 0.5*t2*t2*std::log(t2)
                            - t1*t1*std::log(t1) );
          
                  if ( std::abs(dist) > 1e-8 )
                      value += 0.5*dist*dist*std::log(dist);
              }
      
              matrix[ j*n + i ] = -value;
          }
      }
  }

This functions gets a set of indices for rows (rowidxs) and columns (colidxs) and a pointer to a memory block, where all the coefficients should be stored. The indices in both sets are already in the external numbering, hence the indices can directly be used for kernel evaluation. The mapping was performed by TPermCoeffFn. The memory layout of matrix is column wise, which holds for almost all data in HLIBpro. This is due to the memory layout of BLAS/LAPACK, originally implemented in Fortran, which uses column wise storage.

TCoeffFn also provides an evaluation function for standard index sets, e.g., TIndexSet. To avoid compiler warnings about `‘hidden functions’', this version is brought into local scope by

using TCoeffFn< value_t >::eval;

By default, the format of the final matrix defined by the coefficients will be unsymmetric, e.g. lower and upper half of the matrix will be built. Please note, that this will not effect the actual algebraic matrix which may still be symmetric or hermitian. Only the storage and subsequent algorithms are affected in terms of computational costs.

To change the matrix format, the function matrix_format has to be overloaded. In this case, as the matrix is symmetric, this looks as

virtual matform_t matrix_format () const { return symmetric; }

H-LU Factorisation

Having constructed the H-matrix, the corresponding equation system shall be solved. In most cases, a standard iterative solver will not be sufficient, e.g. either the convergence rate is to bad or there is no convergence at all. Therefore, a (good) preconditioner is needed, with the inverse of \(A\) being the best possible. Since iterative schemes only need matrix-vector multiplication, the corresponding functionality for the inverse is sufficient. This is provided by an (H-) LU factorisation of \(A\), or for the symmetric case, a LDL factorisation.

For this, either the factorisation classes may be used directly or, instead, the function factorise_inv be empoyed, which chooses uses a LU factorisation for unsymmetric matrices and a LDL factorisation otherwise. The return value is an object providing the functionality of a linear operator, e.g. evaluation of vectors. In this case, it corresponds to an operator for the evaluation of the inverse.

auto B = A->copy();

auto A_inv = factorise_inv( B.get(), acc );

The copy of A is neccessary, since the matrix is modified during the factorisation. The accuracy is chosen to be the same as during matrix construction.

Mixed Precision

Alternatively, the preconditioner may also be computed in a different precision compared to the H-matrix itself, e.g., in single vs. double precision. For this, the copy needs to be constructed using the convert functions, which expect the destination type as their first template argument:

auto B = convert< float >( *A );

or

auto B = convert< std::complex< float > >( *A );

in the case of complex valued problems.

The convert function may also be used to convert from real to complex valued matrices.

Iterative Solvers

Still missing is the right hand side of the equation system. In this example, it was chosen, such that \(\mathbf{u} \equiv \mathbf{1}\):

auto                    b = A->row_vector();
 
for ( size_t  i = 0; i < n; i++ )
    b->set_entry( i, rhs( i, n ) );

with rhs defined by

value_t  rhs ( const idx_t  i,
               const idx_t  n )
{
    const double  a     = double(i)             / double(n);
    const double  b     = (double(i)+double(1)) / double(n);
    value_t       value = -1.5 * (b - a);
  
    if ( std::abs( b )       > 1e-8 ) value += 0.5*b*b*std::log(b);
    if ( std::abs( a )       > 1e-8 ) value -= 0.5*a*a*std::log(a);
    if ( std::abs( 1.0 - b ) > 1e-8 ) value -= 0.5*(1.0-b)*(1.0-b)*std::log(1.0-b);
    if ( std::abs( 1.0 - a ) > 1e-8 ) value += 0.5*(1.0-a)*(1.0-a)*std::log(1.0-a); 
  
    return value;
}

The RHS was built using the original ordering of the unknowns. To be used with H-arithmetic, it has to be reordered according to the internal numbering of the H-matrix as defined by the clustering process. The object for the cluster tree stores both kind of permutations, i.e., from external to internal (H) numbering and from internal to external numbering. To reorder the RHS, the external to internal (e2i) permutation is used:

ct->perm_e2i()->permute( b.get() );

Remarks: H-matrices also store the mappings between internal and external numbering for the row and column index sets.

For the iterative solver TAutoSolver is used, which automatically chooses a suitable solver for the given combination of matrix and preconditioner. Furthermore, statistics about the solution process is stored in an object of type TSolverInfo, e.g. convergence rate, number of steps.

TAutoSolver             solver( 1000 );
TSolverInfo             solve_info;
auto                    x = A->col_vector();
 
solver.solve( A.get(), x.get(), b.get(), A_inv.get(), & solve_info );

To check the accuracy of the computed solution, we compare it with the known exact solution. However, the exact solution uses external ordering, while the computed solution is still based on the internal ordering of the H-matrix. To compare both, the ordering has to be equal:

ct->perm_i2e()->permute( x.get() );
 
auto                    sol = b->copy();
 
sol->fill( 1.0 );
x->axpy( 1.0, sol.get() );
 
std::cout << "  |x-x~| = " << x->norm2() << std::endl;

Remarks: Since the solution vector in this example is constant one, applying a permutation is redundant. However, in other examples, this would be important and was therefore also applied here.

The standard finalisation and catch block finishes the example:

    DONE();
  }
  catch ( Error & e )
  {
    std::cout << e.to_string() << std::endl;
  }
 
  return 0;
}

The Plain Program

#include <iostream>
#include <hpro.hh>
 
using namespace std;
using namespace Hpro;
 
using  value_t = double;
using  real_t  = real_type_t< value_t >;
  
class TLogCoeffFn : public TCoeffFn< value_t > {
private:
    const double  _h;
 
public:
    TLogCoeffFn ( const double  h )
            : _h(h)
    {}
 
  virtual void eval  ( const std::vector< idx_t > &  rowidxs,
                       const std::vector< idx_t > &  colidxs,
                       value_t *                     matrix ) const
  {
        const size_t  n = rowidxs.size();
        const size_t  m = colidxs.size();
 
        for ( size_t  j = 0; j < m; ++j )
        {
            const int  idx1 = colidxs[ j ];
            
            for ( size_t  i = 0; i < n; ++i )
            {
                const int  idx0 = rowidxs[ i ];
                value_t    double     value;
 
                if ( idx0 == idx1 ) 
                    value = -1.5*_h*_h + _h*_h*std::log(_h);
                else
                {
                    const double  dist = _h * ( std::abs( idx0 - idx1 ) - 1 );
                    const double  t1   = dist+1.0*_h;
                    const double  t2   = dist+2.0*_h;
            
                    value = ( - 1.5*_h*_h + 0.5*t2*t2*std::log(t2)
                              - t1*t1*std::log(t1) );
            
                    if ( std::abs(dist) > 1e-8 )
                        value += 0.5*dist*dist*std::log(dist);
                }
        
                matrix[ j*n + i ] = -value;
            }
        }
    }
    using TCoeffFn< value_t >::eval;
 
    virtual matform_t  matrix_format  () const { return symmetric; }
 
    virtual bool       is_complex     () const { return false; }
};
 
value_t rhs ( const idx_t  i,
              const idx_t  n )
{
    const double  a     = double(i)             / double(n);
    const double  b     = (double(i)+double(1)) / double(n);
    value_t       value = -1.5 * (b - a);
  
    if ( std::abs( b )       > 1e-8 ) value += 0.5*b*b*std::log(b);
    if ( std::abs( a )       > 1e-8 ) value -= 0.5*a*a*std::log(a);
    if ( std::abs( 1.0 - b ) > 1e-8 ) value -= 0.5*(1.0-b)*(1.0-b)*std::log(1.0-b);
    if ( std::abs( 1.0 - a ) > 1e-8 ) value += 0.5*(1.0-a)*(1.0-a)*std::log(1.0-a); 
  
    return value;
}
 
int main ( int argc, char ** argv ) 
{
  try
  {
    INIT();
    CFG::set_verbosity( 3 );
 
    const size_t               n = 1024;
    const double               h = 1.0 / double(n);
    std::vector< double * >    vertices( n );
 
    for ( size_t i = 0; i < n; i++ )
    {
        vertices[i]    = new double;
        vertices[i][0] = h * double(i) + ( h / 2.0 ); // center of [i/h,(i+1)/h]
    }
 
    auto                         coord = make_unique< TCoordinate >( vertices, 1 );
 
    TAutoBSPPartStrat            part_strat;
    TBSPCTBuilder                ct_builder( & part_strat, 20 );
    auto                         ct    = ct_builder.build( coord.get() );
 
    TStdGeomAdmCond              adm_cond( 2.0 );
    TBCBuilder                   bct_builder;
    auto                         bct   = bct_builder.build( ct.get(), ct.get(), & adm_cond );
 
    TLogCoeffFn                  log_coefffn( h );
    TPermCoeffFn< value_t >      coefffn( & log_coefffn, ct->perm_i2e(), ct->perm_i2e() );
    TACAPlus< value_t >          aca( & coefffn );
    TDenseMatBuilder< value_t >  h_builder( & coefffn, & aca );
    TTruncAcc                    acc( 1e-4, 0.0 );
    auto                         A     = h_builder.build( bct.get(), acc );
 
    auto                         B     = A->copy();
    auto                         A_inv = factorise_inv( B.get(), acc );
 
    auto                         b     = A->row_vector();
 
    for ( size_t  i = 0; i < n; i++ )
        b->set_entry( i, rhs( i, n ) );
 
    ct->perm_e2i()->permute( b.get() );
 
    TAutoSolver                  solver( 1000 );
    TSolverInfo                  solve_info;
    auto                         x     = A->col_vector();
 
    solver.solve( A.get(), x.get(), b.get(), A_inv.get(), & solve_info );
 
    ct->perm_i2e()->permute( x.get() );
 
    auto                         sol   = b->copy();
 
    sol->fill( 1.0 );
    x->axpy( 1.0, sol.get() );
    
    std::cout << "  |x-x~| = " << x->norm2() << std::endl;
 
    DONE();
  }
  catch ( Error & e )
  {
    std::cout << e.to_string() << std::endl;
  }
 
  return 0;
}

Table of Contents