Chi-Squared distribution in C++

You need it but how do you get it ...

This is the first post in a series on the usage of the Chi-Squared (\(\chi^2\)) distribution in C++.

If you need to use the Chi-Squared distribution and the associated Cumulative Distribution Function (CDF), there are only two open-source alternatives I know about:

  1. boost
  2. gsl (GNU Scientific Library)

You could also consider implementing your own Chi-Squared probability density / cumulative distribution function. While it may be an interesting learning experience, for production purposes … just NO. I am not even going to bother with convincing you that you should not go down that path.

Another possibility could be to generate a lookup table as proposed and implemented by MoseleyBioinformaticsLab. Then you could integrate the lookup table as a C++ header file in your own project. This idea is sound, but you are limited in how many degrees of freedom (DOF) you can use in your chi2plookup table. As mentioned in the provided link, at DOF = 6 the resulting file is ~40Mb.

The above idea of a lookup table is what inspired this post. What if you need to use the \(\chi^2\) distribution to perform the online Normalized Innovation Squared (NIS) test for checking a Kalman Filter’s consistency in a robotics application? The lookup table approach is impossible to use.

So what should one do …

Honestly, you don’t have a choice. You have to use boost’s math module or gsl. Before you scream your lungs out that you don’t want to install a large dependency, hear me out and let’s get a few things clear:

  • there is a difference between build dependencies and runtime dependencies …
  • the things you are probably most worried about are runtime dependencies, which are in the form of (most likely) shared libraries …
  • all the includes and static libraries you need for compiling your super awesome binary don’t need to be on the final (target) system …


Let us look at boost first

To use the Chi-Squared distribution from boost you need to use the boost/math/distributions/chi_squared.hpp header file. It is part of Boost’s math module. If you are on Ubuntu or Debian, you can install libboost-math-dev. On Arch you have no choice but to install the full boost.

Code sample

So, let’s say we have this simple piece of code.

#include <boost/math/distributions/chi_squared.hpp>
#include <iostream>

int main() {

  const int kMaxNumDof = 100;
  const double kAlpha = 0.05;
  const double kHalfAlpha = kAlpha / 2.0;

  for (int n = 1; n <= kMaxNumDof; n++) {
    boost::math::chi_squared chiDist(n);
    // Upper bound: the value exceeded with probability kHalfAlpha
    double upperQ =
        (boost::math::quantile(boost::math::complement(chiDist, kHalfAlpha)));
    // Lower bound: the value with cumulative probability kHalfAlpha
    double lowerQ = (boost::math::quantile(chiDist, kHalfAlpha));

    std::cout << "[Chi^2_" << n << "(" << kHalfAlpha << "), Chi^2_" << n << "("
              << 1 - kHalfAlpha << ")]"
              << " = [" << lowerQ << ", " << upperQ << "]\n";
  }

  std::cout << std::flush;

  return 0;
}

You can find this code along with the accompanying CMakeLists.txt file needed to build it on Gitlab. My friends recommended that I remind everyone of the importance of not using std::endl in your code unless you absolutely need to [1], [2].

The code evaluates 100 \(\chi^2\) distributions, one for each degree of freedom \(N \in [1, 100]\). To do something with each of them, it computes the two-sided 95% confidence region of the distribution with N degrees of freedom.

When run, the output is similar to this

[Chi^2_1(0.025), Chi^2_1(0.975)] = [0.000982069, 5.02389]
[Chi^2_2(0.025), Chi^2_2(0.975)] = [0.0506356, 7.37776]
[Chi^2_3(0.025), Chi^2_3(0.975)] = [0.215795, 9.3484]
[Chi^2_4(0.025), Chi^2_4(0.975)] = [0.484419, 11.1433]
....
[Chi^2_96(0.025), Chi^2_96(0.975)] = [70.7828, 125]
[Chi^2_97(0.025), Chi^2_97(0.975)] = [71.6415, 126.141]
[Chi^2_98(0.025), Chi^2_98(0.975)] = [72.5009, 127.282]
[Chi^2_99(0.025), Chi^2_99(0.975)] = [73.3611, 128.422]
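One row of this output can be checked by hand. For exactly 2 degrees of freedom the \(\chi^2\) CDF has the closed form \(F(x) = 1 - e^{-x/2}\), so the quantile is \(-2 \ln(1 - p)\). The sketch below uses only the standard library; the function names are hypothetical, and the closed form holds for 2 degrees of freedom only, so it is a sanity check rather than a replacement for the library.

```cpp
#include <cmath>

// Quantile of the chi-squared distribution with exactly 2 degrees of freedom:
// the x for which P(X <= x) = p. Valid for 0 <= p < 1.
inline double chi2QuantileTwoDof(double p) {
  return -2.0 * std::log1p(-p);  // log1p(-p) == ln(1 - p)
}

// Upper-tail counterpart: the x for which P(X > x) = q. Valid for 0 < q <= 1.
// This plays the role of quantile(complement(dist, q)) in the boost sample.
inline double chi2UpperQuantileTwoDof(double q) {
  return -2.0 * std::log(q);
}
```

Calling chi2QuantileTwoDof(0.025) and chi2UpperQuantileTwoDof(0.025) gives approximately 0.0506356 and 7.37776, which agrees with the n = 2 row printed by the program.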



Checking dependencies

When I do these kinds of experiments, I love to use docker.

Using the ubuntu:focal-20210416 docker image, we can first do a sanity check and ensure nothing boost-like is in there.

docker run --rm -it ubuntu:focal-20210416
root@b4480851c173:/# find / -name *boost*
/proc/sys/vm/watermark_boost_factor
/sys/devices/pci0000:00/0000:00:02.0/drm/card0/gt_boost_freq_mhz
/sys/devices/system/cpu/intel_pstate/hwp_dynamic_boost
root@b4480851c173:/#

To compile our code on such an image, we need to install a few extra packages besides libboost-math-dev. The Dockerfile below adds the necessary packages.

FROM ubuntu:focal-20210416

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Paris

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    build-essential cmake\
    libboost-math-dev \
    && apt-get -y autoremove \
    && apt-get -y clean

CMD ["/bin/bash"]

This file doesn’t follow Dockerfile best practices; see On Dockerfiles for an example of a recommended file.

Let’s build an image from it and call it builder-chi-squared-boost

docker build -f Dockerfile -t builder-chi-squared-boost .

We can now check what files have been added by the libboost-math-dev install step

docker run --rm -it builder-chi-squared-boost:latest
root@b4480851c173:/# find / -name *boost*
./usr/share/doc/libboost-math-dev
...
./usr/share/doc/libboost-math1.71-dev
./usr/share/lintian/overrides/libboost-math1.71.0
./usr/share/lintian/overrides/libboost1.71-dev
./usr/lib/x86_64-linux-gnu/libboost_math_c99l.so
./usr/lib/x86_64-linux-gnu/libboost_math_c99f.so.1.71.0
./usr/lib/x86_64-linux-gnu/libboost_math_tr1l.a
./usr/lib/x86_64-linux-gnu/libboost_math_c99.so
./usr/lib/x86_64-linux-gnu/libboost_math_c99f.a
...
./usr/include/boost/
...
./usr/include/c++/9/bits/boost_concept_check.h

I have trimmed the output for clarity. In short, the package installs the include folder with all the headers, all the static (.a) and dynamic (.so) libraries, and the package’s .cmake files.

We can now build the sample program inside the docker container by following these steps,

docker run --rm -it -v <ABSOLUTE_PATH_TO_MY_CODE>:/opt/code builder-chi-squared-boost:latest
root@e9e5b63aeead:/# cd /opt/code
root@e9e5b63aeead:/opt/code# cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
root@e9e5b63aeead:/opt/code# cmake --build build -j
root@e9e5b63aeead:/opt/code# ls -lh build/bin/
total 160K
-rwxr-xr-x 1 root root 159K May  6 17:13 chi_squared_with_boost

Now, let’s see what happens if we “copy” the binary (we are actually mounting it in this example) and then run it in the original ubuntu docker image

docker run --rm -it -v <ABSOLUTE_PATH_TO_MY_CODE>/build/bin:/opt/bin ubuntu:focal-20210416
root@dbaf358595ef:/# cd /opt/bin/
root@dbaf358595ef:/opt/bin# ./chi_squared_with_boost
[Chi^2_1(0.025), Chi^2_1(0.975)] = [0.000982069, 5.02389]
[Chi^2_2(0.025), Chi^2_2(0.975)] = [0.0506356, 7.37776]
[Chi^2_3(0.025), Chi^2_3(0.975)] = [0.215795, 9.3484]
[Chi^2_4(0.025), Chi^2_4(0.975)] = [0.484419, 11.1433]
[Chi^2_5(0.025), Chi^2_5(0.975)] = [0.831212, 12.8325]
...
[Chi^2_95(0.025), Chi^2_95(0.975)] = [69.9249, 123.858]
[Chi^2_96(0.025), Chi^2_96(0.975)] = [70.7828, 125]
[Chi^2_97(0.025), Chi^2_97(0.975)] = [71.6415, 126.141]
[Chi^2_98(0.025), Chi^2_98(0.975)] = [72.5009, 127.282]
[Chi^2_99(0.025), Chi^2_99(0.975)] = [73.3611, 128.422]
[Chi^2_100(0.025), Chi^2_100(0.975)] = [74.2219, 129.561]


Lo and behold, it runs.

But wait, how is that possible? We can check what the runtime dependencies of our program are.

readelf -d chi_squared_with_boost | grep 'NEEDED'
0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Boost conclusion

As the output above shows, there is no runtime dependency on any boost dynamic library. Whatever was needed was compiled straight into the binary, because the distribution code lives in templated headers. This should have been obvious from the CMakeLists.txt file, as there is no target_link_libraries call. For this use case we got lucky. I haven’t investigated what happens if you use more of boost-math’s functionality, or any other module. But this post was specifically targeting the issue of using the Chi-Squared (\(\chi^2\)) distribution in C++.

Hence, if you need to use this distribution in your C++ code, just go with boost-math: all you need is a templated C++ header file. This is a tried and tested library.

You can find the code, the CMakeLists.txt and the Dockerfile(s) used to generate this example, together with an automated CI pipeline in this Gitlab repository. An example of one successful CI pipeline can be found at this pipeline.

Let us also look at gsl

Since we got lucky with boost, let us look, for the sake of argument, at using gsl.

The code to achieve the exact same thing as in the previous example is given below

#include <gsl/gsl_cdf.h>
#include <iostream>

int main() {

  const int kMaxNumDof = 100;
  const double kAlpha = 0.05;
  const double kHalfAlpha = kAlpha / 2.0;

  for (int n = 1; n <= kMaxNumDof; n++) {

    // Pinv inverts the lower-tail CDF, Qinv inverts the upper-tail CDF
    double lowerQ = gsl_cdf_chisq_Pinv(kHalfAlpha, n);
    double upperQ = gsl_cdf_chisq_Qinv(kHalfAlpha, n);

    // Show the two sided 100(1-alpha)% confidence region of the n (degrees of
    // freedom) chi-squared distribution
    std::cout << "[Chi^2_" << n << "(" << kHalfAlpha << "), Chi^2_" << n << "("
              << 1 - kHalfAlpha << ")]"
              << " = [" << lowerQ << ", " << upperQ << "]\n";
  }

  std::cout << std::flush;

  return 0;
}

The example differs by using the gsl header, gsl/gsl_cdf.h, and the corresponding function calls for computing the \(\chi^2\) quantiles. To get the gsl library, if you are on Ubuntu or Debian, you can install libgsl-dev, while on Arch the package is gsl.

The gsl package provides dynamic libraries, hence the CMakeLists.txt file needs to link against them, as shown below. You can find both the code and the CMakeLists.txt file on Gitlab.

cmake_minimum_required(VERSION 3.16)

project(ChiSquaredWithGsl
          LANGUAGES CXX
          VERSION 0.0.1
          DESCRIPTION "Exploring the usage of the chi-squared distribution in C++"
       )

find_package(GSL REQUIRED)

set(TARGET_NAME "chi_squared_with_gsl")

add_executable(${TARGET_NAME} main.cpp)

target_link_libraries(${TARGET_NAME} PRIVATE
  GSL::gsl
  )

set_target_properties(${TARGET_NAME}
  PROPERTIES
  CXX_STANDARD 17
  CXX_STANDARD_REQUIRED ON
  CXX_EXTENSIONS OFF
  RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin
)

The code does the exact same thing as in the previous example and will produce the same output.


Checking dependencies

Again, using docker, we first install the packages needed for compiling the code sample. The Dockerfile below adds the necessary packages.

FROM ubuntu:focal-20210416

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Paris

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    build-essential cmake\
    libgsl-dev \
    && apt-get -y autoremove \
    && apt-get -y clean

CMD ["/bin/bash"]

Then we build an image from it and call it builder-chi-squared-gsl

docker build -f Dockerfile -t builder-chi-squared-gsl .

We check what files have been added by the libgsl-dev install step

docker run --rm -it builder-chi-squared-gsl:latest
root@ccf1ab7b6825:/# find / -name *gsl*
/usr/lib/x86_64-linux-gnu/libgsl.so
/usr/lib/x86_64-linux-gnu/libgsl.so.23.1.0
/usr/lib/x86_64-linux-gnu/libgslcblas.so
/usr/lib/x86_64-linux-gnu/pkgconfig/gsl.pc
/usr/lib/x86_64-linux-gnu/libgslcblas.so.0
/usr/lib/x86_64-linux-gnu/libgsl.so.23
/usr/lib/x86_64-linux-gnu/libgslcblas.so.0.0.0
/usr/lib/x86_64-linux-gnu/libgsl.a
/usr/lib/x86_64-linux-gnu/libgslcblas.a
/usr/bin/gsl-config
/usr/include/gsl
/usr/include/gsl/gsl_randist.h
...

I have trimmed the output for clarity. But we can see that the include folder is there, along with all the static (.a) and dynamic (.so) libraries.

We can build the sample program inside the docker container by following these steps,

docker run --rm -it -v <ABSOLUTE_PATH_TO_MY_CODE>:/opt/code builder-chi-squared-gsl:latest
root@5e90b01e99f4:/# cd /opt/code
root@5e90b01e99f4:/opt/code# cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
root@5e90b01e99f4:/opt/code# cmake --build build -j
root@5e90b01e99f4:/opt/code# ls -lh build/bin/
total 160K
-rwxr-xr-x 1 root root 18K May  6 17:13 chi_squared_with_gsl

If we run it, it will produce the familiar output, listing the 95% confidence region for distributions from 1 to 100 degrees of freedom.

If we “copy” the binary and then attempt to run it in the original ubuntu docker image, it fails

docker run --rm -it -v <ABSOLUTE_PATH_TO_MY_CODE>/build/bin:/opt/bin ubuntu:focal-20210416
root@dbaf358595ef:/# cd /opt/bin/
root@dbaf358595ef:/opt/bin# ./chi_squared_with_gsl
./chi_squared_with_gsl: error while loading shared libraries: libgsl.so.23: cannot open shared object file: No such file or directory

The error message speaks for itself. There is a dependency on the gsl dynamic library against which we linked in the CMakeLists.txt file. This library is a runtime requirement.

readelf -d build/bin/chi_squared_with_gsl | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libgsl.so.23]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Unlike the boost case, if we want to use this program on an embedded device or a different architecture, we will need to cross-compile gsl itself and then deploy both our custom binary and the gsl dynamic library.

There is a hacky way to get rid of the runtime dependency. It is hacky because cmake does not have a decent way to achieve this, as reported here.

As you may have noticed, when gsl was installed in the docker image, both the static version (.a) and the dynamic version (.so) of the libraries were provided. We can force cmake to statically link gsl into our binary.

To achieve this the wrong way, we force cmake to only look for static libraries by setting the line below in our CMakeLists.txt file, prior to calling the find_package function.

SET(CMAKE_FIND_LIBRARY_SUFFIXES .a)

find_package(GSL REQUIRED)

Don’t use the above command blindly in your CMakeLists.txt files: as written, it applies to all subsequent find_package commands, so it will break linkage with any additional shared libraries.

If we add the above cmake command and rebuild our binary, we can see that there is no longer a dependency on libgsl.so, as gsl is baked into our application.

readelf -d build/bin/chi_squared_with_gsl  | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Summary

If you need to use the Chi-Squared (\(\chi^2\)) distribution in C++, use boost’s math module. You won’t need to install anything extra on your embedded device or production machine.

You can achieve something similar with gsl but you will need to cross-compile and statically link against it. Not difficult, but why not use boost directly …

On Dockerfiles

The Dockerfiles shown in this article don’t follow the entire list of recommended best practices.

One that follows the recommendations and has been linted with hadolint is this

FROM ubuntu:focal-20210416

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Paris

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    build-essential=12.8ubuntu1 \
    cmake=3.16.3-1ubuntu1\
    libboost-math-dev=1.71.0.0ubuntu2\
    && apt-get -y autoremove \
    && apt-get -y clean \
    && rm -rf /var/lib/apt/lists/*

CMD ["/bin/bash"]

It differs from the development one shown in the body of the article in two respects: it removes the apt package lists and it pins the version of each package.

If specifying package versions is essential for you or your employer, by all means do the second step too. However, if you are prototyping or are in need of changing base images frequently, think twice about it, as you may find yourself constantly changing version numbers in your Dockerfiles.

I consider that fixing package versions doesn’t enhance security; it creates the illusion of security. If Docker or apt-get had a mechanism for verifying checksums, I would reconsider.



