Problem Description

The number of measurements that must be processed for statistical modeling in environmental applications is usually very large, and these measurements may be located irregularly across a given geographical region. This makes the computing process expensive and the data difficult to manage.

These data are frequently modeled as a realization from a stationary Gaussian spatial random field. Specifically, we let \(\mathbf{Z}=\{Z(\mathbf{s}_1),\ldots,Z(\mathbf{s}_n)\}^\top\), where \(Z(\mathbf{s})\) is a Gaussian random field indexed by a spatial location \(\mathbf{s} \in \Bbb{R}^d\), \(d\geq 1\). Then, we assume that \(\mathbf{Z}\) has mean zero and a stationary parametric covariance function

\[ C(\mathbf{h};{\boldsymbol \theta})=\hbox{cov}\{Z(\mathbf{s}),Z(\mathbf{s}+\mathbf{h})\}, \]

where \(\mathbf{h}\in\Bbb{R}^d\) is a spatial lag vector and \({\boldsymbol \theta}\in\Bbb{R}^q\) is the unknown parameter vector of interest. Statistical inferences about \({\boldsymbol \theta}\) are often based on the joint Gaussian log-likelihood function:

\[ \mathcal{L}(\mathbf{\theta})=-\frac{n}{2}\log(2\pi) - \frac{1}{2}\log \vert \mathbf{C}(\mathbf{\theta}) \vert-\frac{1}{2}\mathbf{Z}^\top \mathbf{C}(\mathbf{\theta})^{-1}\mathbf{Z}, \]

where the covariance matrix \(\mathbf{C}(\mathbf{\theta})\) has entries \(C(\mathbf{s}_i-\mathbf{s}_j;{\boldsymbol \theta})\), \(i,j=1,\ldots,n\). The maximum likelihood estimator of \({\boldsymbol \theta}\) is the value \(\widehat{\boldsymbol \theta}\) that maximizes \(\mathcal{L}\). When the sample size \(n\) is large, the evaluation of \(\mathcal{L}\) becomes challenging, due to the computation of the inverse and log-determinant of the \(n\)-by- \(n\) covariance matrix \(\mathbf{C}(\mathbf{\theta})\). Indeed, this requires \({\cal O}(n^2)\) memory and \({\cal O}(n^3)\) computational steps. Hence, scalable methods that can process larger sample sizes are needed which in this example are provided by \(\mathcal{H}\)-matrices and their arithmetic.

Matérn Covariance Function

Among the many covariance models available, the Matérn family has gained widespread interest in recent years. The Matérn form of spatial correlations was introduced into statistics as a flexible parametric class, with one parameter determining the smoothness of the underlying spatial random field.

The Matérn covariance depends only on the distance \(h:=\Vert \mathbf{s}-\mathbf{s}'\Vert \), where \(\mathbf{s}\) and \(\mathbf{s}'\) are any two spatial locations. The Matérn class of covariance functions is defined as

\[ C(h;{\boldsymbol\theta})=\frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}\left(\frac{h}{\ell}\right)^\nu K_\nu\left(\frac{h}{\ell}\right), \]

with \({\boldsymbol\theta}=(\sigma^2,\ell,\nu)^\top\); where \(\sigma^2\) is the variance; \(\nu>0\) controls the smoothness of the random field, with larger values of \(\nu\) corresponding to smoother fields; and \(\ell>0\) is a spatial range parameter that measures how quickly the correlation of the random field decays with distance, with larger \(\ell\) corresponding to a faster decay (keeping \(\nu\) fixed). Here \({\mathcal{K}}_\nu\) denotes the modified Bessel function of the second kind of order \(\nu\). When \(\nu=1/2\), the Matérn covariance function reduces to the exponential covariance model and describes a rough field. The value \(\nu=\infty\) corresponds to a Gaussian covariance model that describes a very smooth, infinitely differentiable field. Random fields with a Matérn covariance function are \(\lfloor \nu-1 \rfloor\) times mean square differentiable.

Hierarchical Approximation of Gaussian Log-Likelihood

We use the \(\mathcal{H}\)-matrix technique to approximate the Gaussian likelihood function. The \(\mathcal{H}\)-matrix approximation of the exact log-likelihood \(\mathcal{L}({\boldsymbol\theta})\) is defined by \(\tilde{\mathcal{L}}({\boldsymbol\theta})\):

\[ \tilde{\mathcal{L}}({\boldsymbol\theta})=-\frac{n}{2}\log(2\pi) - \sum_{i=1}^n\log \{\tilde{\mathbf{L}}_{ii}({\boldsymbol\theta})\}-\frac{1}{2}\mathbf{v}({\boldsymbol\theta})^\top \mathbf{v}({\boldsymbol\theta}), \]

where \(\tilde{\mathbf{L}}({\boldsymbol\theta})\) is the \(\mathcal{H}\)-matrix approximation of the Cholesky factor \(\mathbf{L}({\boldsymbol\theta})\) in \(\mathbf{C}({\boldsymbol\theta})=\mathbf{L}({\boldsymbol\theta})\mathbf{L}({\boldsymbol\theta})^\top\), and \(\mathbf{v}({\boldsymbol\theta}) \) is the solution of the linear system \(\tilde{\mathbf{L}}({\boldsymbol\theta})\mathbf{v}({\boldsymbol\theta})=\mathbf{Z}\).

Since the representation of the dense \(\boldsymbol C(\theta)\) requires quadratic storage complexity, approximation is already applied here, e.g., \(\tilde C(\theta)\) is computed instead as an \(\mathcal{H}\)-matrix approximation of \(\boldsymbol C(\theta)\). With this, we obtain \(\tilde{\mathbf{L}}({\boldsymbol\theta})\) as the Cholesky factor in the factorization

\[ \mathbf{\tilde C}({\boldsymbol\theta})=\mathbf{\tilde L}({\boldsymbol\theta})\mathbf{\tilde L}({\boldsymbol\theta})^\top \]

For the \(\mathcal{H}\)-matrix approximation \(\tilde{\mathcal{L}}({\boldsymbol\theta})\) we can either choose a constant rank \(k\), e.g., all low-rank subblocks of the matrix are of rank \(k\), or a fixed accuracy \(\varepsilon > 0\). In the latter case, the rank per subblock is adaptively chosen (see also Accuracy Control).

Of further interest is the value \(\hat \theta\) which maximizes \(\mathcal{L}(\theta)\), which is a three dimensional optimization problem for \((\sigma^2,\ell,\nu)\).

Implementation

The program starts with the usual inclusion of the corresponding HLIBpro header files and the initialisation of HLIBpro:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include "hlib.hh"
using namespace HLIB;
int
main ( int      argc,
       char **  argv )
{
    INIT();

In the next step, the coordinate and the rhs data is read from a file:

std::vector< T2Point >  vertices;
BLAS::Vector< double >  Z_data;
size_t                  N = 0;
read_data( "datafile.txt", vertices, Z_data );
N = vertices.size();
TCoordinate  coord( vertices );

The function read_data implements the most part. For the actual data file, a text format is assumed in which in the first line the number of coordinates is given. Afterwards, each line contains the coordinate index (counting starts from zero) followed by two spatial coordinates and the value of \(\boldsymbol Z\) is given.

As an example, the following data defines four values of \(\boldsymbol Z\) at the eight points of a unit cube:

The function then looks like:

void
read_data ( const std::string &       datafile,
            std::vector< T2Point > &  vertices,
            BLAS::Vector< double > &  Z_data )
{
    std::ifstream  in( datafile );
    if ( ! in ) // error
        exit( 1 );
    size_t  N = 0;
    in >> N;
    vertices.resize( N );
    Z_data = BLAS::Vector< double >( N );
    for ( idx_t  i = 0; i < idx_t(N); ++i )
    {
        int     index = i;
        double  x, y;
        double  v     = 0.0;
        in >> index >> x >> y >> v;
        vertices[ index ] = T2Point( x, y );
        Z_data( index )   = v;
    }// for
}

Construction of Covariance Matrix

Before the \(\mathcal{H}\)-matrix can be constructed, the block layout of the matrix has to be defined. For this, the coordinates are used to partition the index set. Together with an admissibility condition, the partitioning of the \(\mathcal{H}\)-matrix is defined:

TAutoBSPPartStrat   part_strat;
TBSPCTBuilder       ct_builder( & part_strat );
auto                ct = ct_builder.build( & coord );
TStdGeomAdmCond     adm_cond( 2.0 );
TBCBuilder          bct_builder;
auto                bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );

For the \(\mathcal{H}\)-matrix, the matrix coefficients have to be computed. HLIBpro provides the Matern coefficient function in the form of TMaternCovCoeffFn, which expects a template argument to define the type in which coordinates are provided. In this example, this corresponds to T2DPoint for two dimensional coordinates.

Furthermore, the parameters \(\sigma, \ell\) and \(\nu\) need to be defined. For now, we assume some fixed values for these (see below for maximization of covariance function):

const double  sigma  = 1.0;
const double  length = 9.0e-2;
const double  nu     = 0.5;

With this, the coefficient function for the Matern kernel can be defined:

TMaternCovCoeffFn< T2Point > matern_coefffn( sigma, length, nu, vertices );

TPermCoeffFn< double > coefffn( & matern_coefffn, ct->perm_i2e(), ct->perm_i2e() );

The permutation function converts the numbering from the numberung due to the coordinate indices to the internal numbering of the \(\mathcal{H}\)-matrix.

Equipped with matrix coefficients, we finally can construct the \(\mathcal{H}\)-matrix approximation to \(\boldsymbol C(\theta)\):

TACAPlus< double >          aca( & coefffn );
double                      eps = 1e-4;
auto                        acc = fixed_prec( eps );
TDenseMatBuilder< double >  h_builder( & coefffn, & aca );
auto                        C = h_builder.build( bct.get(), acc );

Computation of Log-Determinant

First, the cholesky factorization needs to be computed of C:

auto  C_fac = C->copy();
chol( C_fac.get(), acc );
auto  C_inv = std::make_unique< TLLInvMatrix >( C_fac.get(), symmetric );

At this point, C_inv is a linear operator enabling evaluation of \(C^{-1}\) while C_fac holds the entries of the Cholesky factor.

The latter are now of interest as we need to compute the log-determinant. Since \(\tilde L\) is of triangular shape and after switching multiplication with logarithm, a summation of the logarithm of its diagonal elements yields the result:

double  log_det_C = 0.0;
for ( idx_t  i = 0; i < idx_t(N); ++i )
    log_det_C += 2*std::log( C_fac->entry( i, i ) ); // two factors L!

Computation of Gaussian Log-Likelihood

Finally, we are able to compute the Gaussian log-likelihood \(\mathcal{L}(\theta)\). Only missing is the vector \(v(\theta)\) as the solution of the rhs \(Z\) and the covariance matrix \(C\).

First, the right hand side is set up with the previously read data:

auto  Z = std::make_unique< TScalarVector >( *ct->root(), Z_data );
ct->perm_e2i()->permute( Z.get() );

The last step is necessary for the vector to have the same ordering as the \(\mathcal{H}\)-matrix.

Solving the equation system is done iteratively using the Cholesky factorization as a preconditioner.

TStopCriterion  sstop( 250, 1e-16, 0.0 );
TCG             solver( sstop );
auto            sol = C->row_vector();
solver.solve( C.get(), sol.get(), Z.get(), C_inv.get() );

The stop criterion was chosen to have at most 250 iterations and to reduce the residual by a factor of \(10^{16}\). The CG solver is used since \(C\) is positive definite.

After iteration sol holds the solution.

The log-likelihood LL is then computed following the above formula as

auto          ZdotCZ = re( Z->dot( sol.get() ) );
const double  log2pi = std::log( 2.0 * Math::pi<double>() );
auto          LL     = -0.5 * ( N * log2pi + log_det_C + ZdotCZ );
std::cout << "   " << "LogLi = " << LL << std::endl;

Remarks: The formula for LL follows the definition of \(\mathcal{L}(\mathbf{\theta})\) with a correctly computed log-determinant, whereas in \(\tilde{\mathcal{L}}({\boldsymbol\theta})\) already the factor \(2\) from the two Cholesky factors is applied cancelling out \(0.5\).

With the finalization of HLIBpro

    DONE();
    return 0;
}

the program for computing the log-likelihood for a fixed parameter set \(\theta\) is complete.

Example Data

Example data for the program is provided by a soil moisture data set with 4000 data points and corresponding values of \(Z\).

Example Moisture Data

You may download the data file from hlibpro.com.

Vizualization

The data assigns a parameter to each spatial point. The program ParaView is able to visualize this point data. For this, we write the coordinates and the parameter in a CSV (comma separated) file:

void
write_csv ( const std::string &             csvfile,
            const std::vector< T2Point > &  vertices,
            const BLAS::Vector< double > &  Z_data )
{
    std::ofstream  out( csvfile );
    out << "x,y,z,v" << std::endl;
    for ( uint  i = 0; i < vertices.size(); ++i )
        out << vertices[i].x() << "," << vertices[i].y() << ",0," << Z_data( i ) << std::endl;
}

In ParaView, open the CSV file and press Apply. The default options for importing CSV data work with the format of the above export function.

After importing, the data is shown as a table. This table is now converted to point data by choosing Filter > Alphabetical > Table to Points. Choose x for the X column, y for the Y column and z for the Z column. You may also activate 2D Points as the data is two dimensional (z set to zero). After pressing Apply, the points should be visible in the Render View. If not, make sure the eye symbol next to TableToPoints1 is active.

Afterwards, you may use the v values for colorization of the points. Just select in the left pane at Coloring v instead of Solid Color (you may also change the colormap).

ParaView also lets you convert the point cloud into a grid. For this, use Filters > Alphabetical > Delaunay 2D and press Apply.

Modifications

LDL factorization

Instead of using the Cholesky factorization to compute the log-determinant and the preconditioner for solving \(C(\theta) v = Z\), one could use LDL factorization which is more stable in case of rounding errors leading to negative diagonal elements during Cholesky (the covariance matrix becomes ill-conditioned with a larger problem dimension).

By default, HLIBpro computes a block-wise LDL factorization, e.g., computing the inverse of diagonal block instead of performing dense LDL. Since the diagonal elements of \(D\) are needed, this is not possible. Therefore, the point-wise version needs to be performed, which is enabled by setting the corresponding option before factorization:

fac_options_t  fac_options;
fac_options.eval = point_wise;
ldl( C_fac.get(), acc, fac_options );
C_inv = std::make_unique< TLDLInvMatrix >( C_fac.get(), symmetric, point_wise );

The computation of the log-determinant also changes, since now only \(D\) has to be considered with \(L\) being unit diagonal matrices:

for ( idx_t i = 0; i < idx_t(N); ++i )

log_det_C += std::log( C_fac->entry( i, i ) );

Three Dimensional Data

As the implementation of the Matern kernel is based on a template argument for the coordinates, it is easy to modify the code to support three dimensional data: just replace T2Point with T3Point. Furthermore, you have to read the third coordinate in read_data:

void
read_data ( const string &            datafile,
            std::vector< T3Point > &  vertices,
            BLAS::Vector< double > &  Z_data )
{
    std::ifstream  in( datafile );
    if ( ! in ) // error
        exit( 1 );
    size_t  N = 0;
    in >> N;
    vertices.resize( N );
    Z_data = BLAS::Vector< double >( N );
    for ( idx_t  i = 0; i < idx_t(N); ++i )
    {
        int     index = i;
        double  x, y, z;
        double  v     = 0.0;
        in >> index >> x >> y >> z >> v;
        vertices[ index ] = T3Point( x, y, z );
        Z_data( index )   = v;
    }// for
}

The Plain Program

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include "hlib.hh"
using namespace HLIB;
void
read_data ( const std::string &       datafile,
            std::vector< T2Point > &  vertices,
            BLAS::Vector< double > &  Z_data )
{
    std::ifstream  in( datafile );
    if ( ! in ) // error
        exit( 1 );
    size_t  N = 0;
    in >> N;
    vertices.resize( N );
    Z_data = BLAS::Vector< double >( N );
    for ( idx_t  i = 0; i < idx_t(N); ++i )
    {
        int     index = i;
        double  x, y;
        double  v     = 0.0;
        in >> index >> x >> y >> v;
        vertices[ index ] = T2Point( x, y );
        Z_data( index )   = v;
    }// for
}
int
main ( int      argc,
       char **  argv )
{
    INIT();
    std::vector< T2Point >        vertices;
    BLAS::Vector< double >        Z_data;
    size_t                        N = 0;
    read_data( "datafile.txt", vertices, Z_data );
    N = vertices.size();
    TCoordinate  coord( vertices );
    TAutoBSPPartStrat             part_strat;
    TBSPCTBuilder                 ct_builder( & part_strat );
    auto                          ct = ct_builder.build( & coord );
    TStdGeomAdmCond               adm_cond( 2.0 );
    TBCBuilder                    bct_builder;
    auto                          bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );
    const double                  sigma  = 1.0;
    const double                  length = 9.0e-2;
    const double                  nu     = 0.5;
    TMaternCovCoeffFn< T2Point >  matern_coefffn( sigma, length, nu, vertices );
    TPermCoeffFn< double >        coefffn( & matern_coefffn, ct->perm_i2e(), ct->perm_i2e() );
    TACAPlus< double >            aca( & coefffn );
    double                        eps = 1e-4;
    auto                          acc = fixed_prec( eps );
    TDenseMatBuilder< double >    h_builder( & coefffn, & aca );
    auto                          C     = h_builder.build( bct.get(), acc );
    auto                          C_fac = C->copy();
    chol( C_fac.get(), acc );
    auto                          C_inv = std::make_unique< TLLInvMatrix >( C_fac.get(), symmetric );
    double                        log_det_C = 0.0;
    for ( idx_t  i = 0; i < idx_t(N); ++i )
        log_det_C += 2*std::log( C_fac->entry( i, i ) ); // two factors L!
    auto                          Z = std::make_unique< TScalarVector >( *ct->root(), Z_data );
    ct->perm_e2i()->permute( Z.get() );
    TStopCriterion                sstop( 250, 1e-16, 0.0 );
    TCG                           solver( sstop );
    auto                          sol = C->row_vector();
    solver.solve( C.get(), sol.get(), Z.get(), C_inv.get() );
    auto                          ZdotCZ = re( Z->dot( sol.get() ) );
    const double                  log2pi = std::log( 2.0 * Math::pi<double>() );
    auto                          LL     = -0.5 * ( N * log2pi + log_det_C + ZdotCZ );
    std::cout << "   " << "LogLi = " << LL << std::endl;
    DONE();
    return 0;
}

Maximization of Log-Likelihood

Function Minimization

The GNU Scientific Library is used to implement the maximization of \(\mathcal{L}(\theta)\) by corresponding minimization functions provided by GSL and we wrap the process into a corresponding function:

#include <gsl/gsl_multimin.h>
enum {
    IDX_SIGMA  = 0,
    IDX_LENGTH = 1,
    IDX_NU     = 2
};
double
maximize_likelihood ( double &                sigma,
                      double &                length,
                      double &                nu,
                      LogLikeliHoodProblem &  problem )
{

The first three parameters provide the start value for the maximization on input and hold the maximal values on output of the function. The last parameter, which corresponds to the problem to maximize is discussed below.

First, the GSL data structures are intialized with the start value and step sizes for each parameter. Since the parameters are entries in an array, we use the above predefined indices to access those.

int        status   = 0;
const int  max_iter = 200;
const gsl_multimin_fminimizer_type *  T = gsl_multimin_fminimizer_nmsimplex2;
gsl_multimin_fminimizer *             s = NULL;
gsl_vector *                          ss;
gsl_vector *                          x;
gsl_multimin_function                 minex_func;
double                                size;
x = gsl_vector_alloc( 3 );        // start value
gsl_vector_set( x, 0, sigma );
gsl_vector_set( x, 1, length );
gsl_vector_set( x, 2, nu );
ss = gsl_vector_alloc( 3 );       // step sizes
gsl_vector_set( ss, IDX_SIGMA,  0.001 );
gsl_vector_set( ss, IDX_LENGTH, 0.004 );
gsl_vector_set( ss, IDX_NU,     0.002 );

Next, the function to maximize (minimize) is defined.

// Initialize method and iterate
minex_func.n      = 3;
minex_func.f      = & eval_logli;
minex_func.params = & problem;

Here, eval_logli implements the evaluation of the Gaussian log-likelihood for a given parameter \(\theta = (\sigma,\ell,\nu)\). The second parameter to this function is set to the problem object and cast back inside to be evaluated.

The return value of eval is negated, because GSL performs a function minimization instead of maximization.

double
eval_logli ( const gsl_vector *  param,
             void *              data )
{
    double sigma  = gsl_vector_get( param, IDX_SIGMA );
    double length = gsl_vector_get( param, IDX_LENGTH );
    double nu     = gsl_vector_get( param, IDX_NU );
    LogLikeliHoodProblem *  problem = static_cast< LogLikeliHoodProblem * >( data );
    return - problem->eval( sigma, length, nu );
}

The GSL part is continued with setting up the iterative s and starting the actual minimization loop:

s = gsl_multimin_fminimizer_alloc( T, 3 );
gsl_multimin_fminimizer_set(s, & minex_func, x, ss );
int     iter = 0;
double  LL = 0;
do
{
    iter++;

Next, a single minimization step is performed and the accuracy criterion tested (tolerance \(\le 10^{-3}\)):

status = gsl_multimin_fminimizer_iterate( s );
if ( status != 0 )
    break;
size   = gsl_multimin_fminimizer_size( s );    // return eps for stopping criteria
status = gsl_multimin_test_size( size, 1e-3 ); // This function tests the minimizer specific characteristic size 
if ( status == GSL_SUCCESS )
    std::cout << "converged to minimum" << std::endl;

The individual parameters are extracted together with the value of the log-likelihood for the current value of \(\theta\):

sigma  = gsl_vector_get( s->x, IDX_SIGMA );
length = gsl_vector_get( s->x, IDX_LENGTH );
nu     = gsl_vector_get( s->x, IDX_NU );
LL     = -s->fval; // convert back (see "eval_logli")

Stop the iteration if the accuracy tolerance was achieved or the maximal number of iterations reached.

    } while (( status == GSL_CONTINUE ) && ( iter < max_iter ));
    gsl_multimin_fminimizer_free( s );
    gsl_vector_free( ss );
    gsl_vector_free( x );
    return LL;
}

Log-Likelihood Problem

Above, the Gaussian log-likelihood problem was evaluated for a fixed parameter \(\theta\). Now, we need to evaluate this for many different values and hence, we restructure the code, which splits constant parts, e.g. coordinate managment and partitioning, and variable parts, e.g., \(\mathcal{H}\)-matrix approximation and factorization, into different functions.

struct LogLikeliHoodProblem
{
    std::vector< T2Point >                vertices;
    std::unique_ptr< TCoordinate >        coord;
    std::unique_ptr< TClusterTree >       ct;
    std::unique_ptr< TBlockClusterTree >  bct;
    std::unique_ptr< TVector >            Z;
    double                                eps;
    LogLikeliHoodProblem ( const std::string &  coord_filename,
                           const double         accuracy )
            : eps( accuracy )
    {
        init( coord_filename );
    }
    void
    init ( const std::string &  datafile );
    double
    eval ( const double  sigma,
           const double  length,
           const double  nu );
};

The init function will hold coordinate set up, matrix partitioning and the right hand side:

void
LogLikeliHoodProblem::init ( const std::string &  datafile )
{
    BLAS::Vector< double >  Z_data;
    read_data( datafile, vertices, Z_data );
    coord = std::make_unique< TCoordinate >( vertices );
    TAutoBSPPartStrat  part_strat;
    TBSPCTBuilder      ct_builder( & part_strat );
    ct = ct_builder.build( coord.get() );
    TStdGeomAdmCond    adm_cond( 2.0 );
    TBCBuilder         bct_builder;
    bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );
    Z   = std::make_unique< TScalarVector >( *ct->root(), std::move( Z_data ) );
    ct->perm_e2i()->permute( Z.get() );
}

Remarks: Since Z_data is a local object and the default construction for a vector is only using a reference to the vector data, we have to use std::move to make a real copy of the vector data. Otherwise, the reference points to freed data.

The eval function will then construct an \(\mathcal{H}\)-matrix, factorize it, compute the log-determinant and solve the equation \(C(\theta) v(\theta) = Z\) for each call during the minimization procedure.

double
LogLikeliHoodProblem::eval ( const double  sigma,
                             const double  length,
                             const double  nu )
{
    TMaternCovCoeffFn< T2Point >  matern_coefffn( length, nu, sigma, vertices );
    TPermCoeffFn< double >        coefffn( & matern_coefffn, ct->perm_i2e(), ct->perm_i2e() );
    TACAPlus< double >            aca( & coefffn );
    auto                          acc = fixed_prec( eps );
    TDenseMatBuilder< double >    h_builder( & coefffn, & aca );
    auto                          C     = h_builder.build( bct.get(), acc );
    auto                          C_fac = C->copy();
    chol( C_fac.get(), acc );
    auto                          C_inv = std::make_unique< TLLInvMatrix >( C_fac.get(), symmetric );
    const size_t                  N         = vertices.size();
    double                        log_det_C = 0.0;
    for ( idx_t  i = 0; i < idx_t(N); ++i )
        log_det_C += 2*std::log( C_fac->entry( i, i ) ); // two factors L!
    TStopCriterion                sstop( 250, 1e-16, 0.0 );
    TCG                           solver( sstop );
    auto                          sol = C->row_vector();
    solver.solve( C.get(), sol.get(), Z.get(), C_inv.get() );
    auto                          ZdotCZ = re( Z->dot( sol.get() ) );
    const double                  log2pi = std::log( 2.0 * Math::pi<double>() );
    auto                          LL     = -0.5 * ( N * log2pi + log_det_C + ZdotCZ );
    return LL;
}

Numerical Experiments

Running the above code (full program see below) with the example data set results in the following (truncated) output:

reading 4000 datapoints
  logliklihood at  σ = 1.001, ℓ = 1.291, ν = 0.323   is -3282.63
  logliklihood at  σ = 1.001, ℓ = 1.291, ν = 0.323   is -3282.63
  logliklihood at  σ = 1.00333, ℓ = 1.29067, ν = 0.321667   is -3279.02
  logliklihood at  σ = 1.00367, ℓ = 1.29133, ν = 0.318333   is -3273.26
  logliklihood at  σ = 1.00533, ℓ = 1.29367, ν = 0.315667   is -3268.93
  logliklihood at  σ = 1.01033, ℓ = 1.29367, ν = 0.309667   is -3261.02
  logliklihood at  σ = 1.01267, ℓ = 1.29733, ν = 0.300333   is -3255.19
  logliklihood at  σ = 1.01522, ℓ = 1.29844, ν = 0.298778   is -3255.03
  logliklihood at  σ = 1.01522, ℓ = 1.29844, ν = 0.298778   is -3255.03
  logliklihood at  σ = 1.01522, ℓ = 1.29844, ν = 0.298778   is -3255.03
...
  logliklihood at  σ = 0.864261, ℓ = 1.34078, ν = 0.236776   is -3235.27
  logliklihood at  σ = 0.860846, ℓ = 1.34216, ν = 0.235051   is -3235.26
  logliklihood at  σ = 0.863625, ℓ = 1.34124, ν = 0.236365   is -3235.26
  logliklihood at  σ = 0.861547, ℓ = 1.34204, ν = 0.235688   is -3235.26
  logliklihood at  σ = 0.860878, ℓ = 1.34233, ν = 0.235164   is -3235.26
  logliklihood at  σ = 0.860878, ℓ = 1.34233, ν = 0.235164   is -3235.26
  logliklihood at  σ = 0.862748, ℓ = 1.34161, ν = 0.236062   is -3235.26
converged to minimum
  logliklihood at  σ = 0.862456, ℓ = 1.34179, ν = 0.236033   is -3235.26

The Plain Code

The remaining code follows the above program and consists of initialisation and finalisation of HLIBpro as well as actually calling the log-likelihood maximization procedure:

#include <iostream>
#include <fstream>
#include <string>
#include <gsl/gsl_multimin.h>
#include "hlib.hh"
using namespace HLIB;
enum {
    IDX_SIGMA  = 0,
    IDX_LENGTH = 1,
    IDX_NU     = 2
};
void
read_data ( const std::string &       datafile,
            std::vector< T2Point > &  vertices,
            BLAS::Vector< double > &  Z_data )
{
    std::ifstream  in( datafile );
    if ( ! in ) // error
        exit( 1 );
    size_t  N = 0;
    in >> N;
    std::cout << "reading " << N << " datapoints" << std::endl;
    vertices.resize( N );
    Z_data = BLAS::Vector< double >( N );
    for ( idx_t  i = 0; i < idx_t(N); ++i )
    {
        int     index = i;
        double  x, y;
        double  v     = 0.0;
        in >> index >> x >> y >> v;
        vertices[ index ] = T2Point( x, y );
        Z_data( index )   = v;
    }// for
}
struct LogLikeliHoodProblem
{
    std::vector< T2Point >                vertices;
    std::unique_ptr< TCoordinate >        coord;
    std::unique_ptr< TClusterTree >       ct;
    std::unique_ptr< TBlockClusterTree >  bct;
    std::unique_ptr< TVector >            Z;
    double                                eps;
    LogLikeliHoodProblem ( const std::string &  coord_filename,
                           const double         accuracy )
            : eps( accuracy )
    {
        init( coord_filename );
    }
    void
    init ( const std::string &  datafile )
    {
        BLAS::Vector< double >  Z_data;
        read_data( datafile, vertices, Z_data );
        coord = std::make_unique< TCoordinate >( vertices );
        TAutoBSPPartStrat  part_strat;
        TBSPCTBuilder      ct_builder( & part_strat );
        ct = ct_builder.build( coord.get() );
        TStdGeomAdmCond    adm_cond( 2.0 );
        TBCBuilder         bct_builder;
        bct = bct_builder.build( ct.get(), ct.get(), & adm_cond );
        Z   = std::make_unique< TScalarVector >( *ct->root(), std::move( Z_data ) );
        ct->perm_e2i()->permute( Z.get() );
    }
    double
    eval ( const double  sigma,
           const double  length,
           const double  nu )
    {
        TMaternCovCoeffFn< T2Point >  matern_coefffn( sigma, length, nu, vertices );
        TPermCoeffFn< double >        coefffn( & matern_coefffn, ct->perm_i2e(), ct->perm_i2e() );
        TACAPlus< double >            aca( & coefffn );
        auto                          acc = fixed_prec( eps );
        TDenseMatBuilder< double >    h_builder( & coefffn, & aca );
        auto                          C     = h_builder.build( bct.get(), acc );
        auto                          C_fac = C->copy();
        chol( C_fac.get(), acc );
        auto                          C_inv = std::make_unique< TLLInvMatrix >( C_fac.get(), symmetric );
        const size_t                  N         = vertices.size();
        double                        log_det_C = 0.0;
        for ( idx_t  i = 0; i < idx_t(N); ++i )
            log_det_C += 2*std::log( C_fac->entry( i, i ) ); // two factors L!
        TStopCriterion                sstop( 250, 1e-16, 0.0 );
        TCG                           solver( sstop );
        auto                          sol = C->row_vector();
        solver.solve( C.get(), sol.get(), Z.get(), C_inv.get() );
        auto                          ZdotCZ = re( Z->dot( sol.get() ) );
        const double                  log2pi = std::log( 2.0 * Math::pi<double>() );
        auto                          LL     = -0.5 * ( N * log2pi + log_det_C + ZdotCZ );
        return LL;
    }
};
double
eval_logli ( const gsl_vector *  param,
             void *              data )
{
    double sigma  = gsl_vector_get( param, IDX_SIGMA );
    double length = gsl_vector_get( param, IDX_LENGTH );
    double nu     = gsl_vector_get( param, IDX_NU );
    LogLikeliHoodProblem *  problem = static_cast< LogLikeliHoodProblem * >( data );
    return - problem->eval( sigma, length, nu );
}
double
maximize_likelihood ( double &                sigma,
                      double &                length,
                      double &                nu,
                      LogLikeliHoodProblem &  problem )
{
    int        status   = 0;
    const int  max_iter = 200;
    const gsl_multimin_fminimizer_type *  T = gsl_multimin_fminimizer_nmsimplex2;
    gsl_multimin_fminimizer *             s = NULL;
    gsl_vector *                          ss;
    gsl_vector *                          x;
    gsl_multimin_function                 minex_func;
    double                                size;
    x = gsl_vector_alloc( 3 );        // start value
    gsl_vector_set( x, IDX_SIGMA,  sigma );
    gsl_vector_set( x, IDX_LENGTH, length );
    gsl_vector_set( x, IDX_NU,     nu );
    ss = gsl_vector_alloc( 3 );       // step sizes
    gsl_vector_set( ss, IDX_SIGMA,  0.001 );
    gsl_vector_set( ss, IDX_LENGTH, 0.004 );
    gsl_vector_set( ss, IDX_NU,     0.002 );
    // Initialize method and iterate
    minex_func.n      = 3;
    minex_func.f      = & eval_logli;
    minex_func.params = & problem;
    s = gsl_multimin_fminimizer_alloc( T, 3 );
    gsl_multimin_fminimizer_set(s, & minex_func, x, ss );
    int     iter = 0;
    double  LL = 0;
    do
    {
        iter++;
        status = gsl_multimin_fminimizer_iterate( s );
        if ( status != 0 )
            break;
        size   = gsl_multimin_fminimizer_size( s );    // return eps for stopping criteria
        status = gsl_multimin_test_size( size, 1e-3 ); // This function tests the minimizer specific characteristic size 
        if ( status == GSL_SUCCESS )
            std::cout << "converged to minimum" << std::endl;
        sigma  = gsl_vector_get( s->x, IDX_SIGMA );
        length = gsl_vector_get( s->x, IDX_LENGTH );
        nu     = gsl_vector_get( s->x, IDX_NU );
        LL     = -s->fval; // convert back
        std::cout << "  logliklihood at"
                  << "  σ = " << sigma
                  << ", ℓ = " << length
                  << ", ν = " << nu
                  << "   is " << LL
                  << std::endl;
    } while (( status == GSL_CONTINUE ) && ( iter < max_iter ));
    gsl_multimin_fminimizer_free( s );
    gsl_vector_free( ss );
    gsl_vector_free( x );
    return LL;
}
int
main ( int      argc,
       char **  argv )
{
    INIT();
    double  sigma  = 1.0;
    double  length = 1.29;
    double  nu     = 0.325;
    LogLikeliHoodProblem  problem( "datafile.txt", 1e-6 );
    auto  LL = maximize_likelihood( sigma, length, nu, problem );
    DONE();
}

Table of Contents