This is the first post in a series on the usage of the Chi-Squared (\(\chi^2\)) distribution in C++.
If you need to use the Chi-Squared distribution and the associated Cumulative Distribution Function (CDF), there are only two open-source alternatives I know about: boost’s math module and gsl.
You could also consider implementing your own Chi-Squared probability density / cumulative distribution function. While it may be an interesting learning experience, for production purposes … just NO. I am not even going to bother with convincing you that you should not go down that path.
Another possibility is to generate a lookup table, as proposed and implemented by MoseleyBioinformaticsLab. You could then integrate the lookup table as a C++ header file in your own project. This idea is sound, but you are limited in how many degrees of freedom (DOF) you can use in your chi2plookup table. As mentioned in the provided link, at DOF = 6 the resulting file is ~40Mb.
The lookup table idea is what inspired this post. What if you need to use the \(\chi^2\) distribution to perform the online Normalized Innovation Squared (NIS) test for checking a Kalman Filter’s consistency in a robotics application? There, the lookup table approach is impossible to use.
So what should one do …
Honestly, you don’t have a choice. You have to use boost’s math module or gsl. Before you scream your lungs out that you don’t want to install a large dependency, hear me out and let’s get a few things clear:
- there is a difference between build dependencies and runtime dependencies …
- the things you are probably most worried about are runtime dependencies, which are (most likely) in the form of shared libraries …
- all the includes and static libraries you need for compiling your super awesome binary don’t need to be on the final (target) system …
Let us look at boost first
To use the Chi-Squared distribution from boost you need the boost/math/distributions/chi_squared.hpp header file. It is part of Boost’s math module. If you are on Ubuntu or Debian, you can install libboost-math-dev. On Arch you have no choice but to install the full boost.
Code sample
So, let’s say we have this simple piece of code.
#include <boost/math/distributions/chi_squared.hpp>
#include <cstddef>
#include <iostream>

int main(int argc, char *argv[]) {
  const std::size_t kMaxNumDof = 100;
  const double kAlpha = 0.05;
  const double kHalfAlpha = kAlpha / 2.0;
  for (std::size_t n = 1; n <= kMaxNumDof; n++) {
    // Chi-squared distribution with n degrees of freedom
    boost::math::chi_squared chiDist(n);
    // Upper bound: the value x such that P(X > x) = kHalfAlpha
    double upperQ =
        (boost::math::quantile(boost::math::complement(chiDist, kHalfAlpha)));
    // Lower bound: the value x such that P(X <= x) = kHalfAlpha
    double lowerQ = (boost::math::quantile(chiDist, kHalfAlpha));
    std::cout << "[Chi^2_" << n << "(" << kHalfAlpha << "), Chi^2_" << n << "("
              << 1 - kHalfAlpha << ")]"
              << " = [" << lowerQ << ", " << upperQ << "]\n";
  }
  std::cout << std::flush;
  return 0;
}
You can find this code, along with the accompanying CMakeLists.txt file needed to build it, on Gitlab. My friends suggested I remind everyone of the importance of not using std::endl in your code unless you absolutely need to [1], [2].
The code evaluates 100 \(\chi^2\) distributions, one for each number of degrees of freedom \(N \in [1, 100]\). To do something useful with them, it computes the two-sided 95% confidence region of each N-degree-of-freedom distribution.
When run, the output is similar to this:
[Chi^2_1(0.025), Chi^2_1(0.975)] = [0.000982069, 5.02389]
[Chi^2_2(0.025), Chi^2_2(0.975)] = [0.0506356, 7.37776]
[Chi^2_3(0.025), Chi^2_3(0.975)] = [0.215795, 9.3484]
[Chi^2_4(0.025), Chi^2_4(0.975)] = [0.484419, 11.1433]
....
[Chi^2_96(0.025), Chi^2_96(0.975)] = [70.7828, 125]
[Chi^2_97(0.025), Chi^2_97(0.975)] = [71.6415, 126.141]
[Chi^2_98(0.025), Chi^2_98(0.975)] = [72.5009, 127.282]
[Chi^2_99(0.025), Chi^2_99(0.975)] = [73.3611, 128.422]
Checking dependencies
When I do these kinds of experiments, I love to use docker.
Using the ubuntu:focal-20210416 docker image, we can do a sanity check beforehand and ensure nothing boost-like is in there.
docker run --rm -it ubuntu:focal-20210416
root@b4480851c173:/# find / -name *boost*
/proc/sys/vm/watermark_boost_factor
/sys/devices/pci0000:00/0000:00:02.0/drm/card0/gt_boost_freq_mhz
/sys/devices/system/cpu/intel_pstate/hwp_dynamic_boost
root@b4480851c173:/#
To compile our code on such an image, we need to install a few extra packages besides libboost-math-dev. The Dockerfile below adds the necessary packages.
FROM ubuntu:focal-20210416
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Paris
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
build-essential cmake\
libboost-math-dev \
&& apt-get -y autoremove \
&& apt-get -y clean
CMD ["/bin/bash"]
This file doesn’t follow Dockerfile best practices; see On Dockerfiles for an example of a recommended file.
Let’s build an image from it and call it builder-chi-squared-boost:
docker build -f Dockerfile -t builder-chi-squared-boost .
We can now check which files have been added by the libboost-math-dev install step:
docker run --rm -it builder-chi-squared-boost:latest
root@b4480851c173:/# find / -name *boost*
./usr/share/doc/libboost-math-dev
...
./usr/share/doc/libboost-math1.71-dev
./usr/share/lintian/overrides/libboost-math1.71.0
./usr/share/lintian/overrides/libboost1.71-dev
./usr/lib/x86_64-linux-gnu/libboost_math_c99l.so
./usr/lib/x86_64-linux-gnu/libboost_math_c99f.so.1.71.0
./usr/lib/x86_64-linux-gnu/libboost_math_tr1l.a
./usr/lib/x86_64-linux-gnu/libboost_math_c99.so
./usr/lib/x86_64-linux-gnu/libboost_math_c99f.a
...
./usr/include/boost/
...
./usr/include/c++/9/bits/boost_concept_check.h
I have trimmed the output for clarity. In short, the package installs the includes folder with all the headers, all the static (.a) and dynamic (.so) libraries, and the package’s .cmake files.
We can now build the sample program inside the docker container by following these steps:
docker run --rm -it -v <ABSOLUTE_PATH_TO_MY_CODE>:/opt/code builder-chi-squared-boost:latest
root@e9e5b63aeead:/# cd /opt/code
root@e9e5b63aeead:/opt/code# cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
root@e9e5b63aeead:/opt/code# cmake --build build -j
root@e9e5b63aeead:/opt/code# ls -lh build/bin/
total 160K
-rwxr-xr-x 1 root root 159K May 6 17:13 chi_squared_with_boost
Now, let’s see what happens if we “copy” the binary (we are actually mounting it in this example) and then run it in the original ubuntu docker image:
docker run --rm -it -v <ABSOLUTE_PATH_TO_MY_CODE>/build/bin:/opt/bin ubuntu:focal-20210416
root@dbaf358595ef:/# cd /opt/bin/
root@dbaf358595ef:/opt/bin# ./chi_squared_with_boost
[Chi^2_1(0.025), Chi^2_1(0.975)] = [0.000982069, 5.02389]
[Chi^2_2(0.025), Chi^2_2(0.975)] = [0.0506356, 7.37776]
[Chi^2_3(0.025), Chi^2_3(0.975)] = [0.215795, 9.3484]
[Chi^2_4(0.025), Chi^2_4(0.975)] = [0.484419, 11.1433]
[Chi^2_5(0.025), Chi^2_5(0.975)] = [0.831212, 12.8325]
...
[Chi^2_95(0.025), Chi^2_95(0.975)] = [69.9249, 123.858]
[Chi^2_96(0.025), Chi^2_96(0.975)] = [70.7828, 125]
[Chi^2_97(0.025), Chi^2_97(0.975)] = [71.6415, 126.141]
[Chi^2_98(0.025), Chi^2_98(0.975)] = [72.5009, 127.282]
[Chi^2_99(0.025), Chi^2_99(0.975)] = [73.3611, 128.422]
[Chi^2_100(0.025), Chi^2_100(0.975)] = [74.2219, 129.561]
Lo and behold, it runs.
But wait, how is that possible? We can check what the runtime dependencies of our program are.
readelf -d chi_squared_with_boost | grep 'NEEDED'
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
Boost conclusion
As the output above shows, there is no runtime dependency on any boost dynamic library. Whatever was needed was compiled straight into our binary, because the distribution code lives in templated headers. This should have been obvious from the CMakeLists.txt file, as there is no target_link_libraries call. Clearly, for this use case we got lucky. I haven’t investigated what happens if you use more of boost-math’s functionality, or any other module. But this post specifically targets the issue of using the Chi-Squared (\(\chi^2\)) distribution in C++.
Hence, if you need to use this distribution in your C++ code, just go with boost-math; you only need its templated C++ header files. This is a tried and tested library.
You can find the code, the CMakeLists.txt and the Dockerfile(s) used to generate this example, together with an automated CI pipeline, in this Gitlab repository. An example of one successful CI pipeline can be found at this pipeline.
Let us also look at gsl
Since we got lucky with boost, let us look, for the sake of argument, at using gsl.
The code to achieve the exact same thing as in the previous example is given below:
#include <gsl/gsl_cdf.h>
#include <iostream>

int main(int argc, char *argv[]) {
  const int kMaxNumDof = 100;
  const double kAlpha = 0.05;
  const double kHalfAlpha = kAlpha / 2.0;
  for (std::size_t n = 1; n <= kMaxNumDof; n++) {
    double lowerQ = gsl_cdf_chisq_Pinv(kHalfAlpha, n);
    double upperQ = gsl_cdf_chisq_Qinv(kHalfAlpha, n);
    // Show the two sided (1-alpha)% confidence region of the n (degrees of
    // freedom) chi-squared distribution
    std::cout << "[Chi^2_" << n << "(" << kHalfAlpha << "), Chi^2_" << n << "("
              << 1 - kHalfAlpha << ")]"
              << " = [" << lowerQ << ", " << upperQ << "]\n";
  }
  std::cout << std::flush;
  return 0;
}
The example differs by using the gsl header, gsl/gsl_cdf.h, and the corresponding function calls for computing the \(\chi^2\) quantiles. To get the gsl library, if you are on Ubuntu or Debian, you can install libgsl-dev, while on Arch the package is gsl.
The gsl package provides dynamic libraries, hence the CMakeLists.txt file needs to link against them, as shown below. You can find both the code and the CMakeLists.txt file on Gitlab.
cmake_minimum_required(VERSION 3.16)
project(ChiSquaredWithGsl
  LANGUAGES CXX
  VERSION 0.0.1
  DESCRIPTION "Exploring the usage of the chi-squared distribution in C++"
)
find_package(GSL REQUIRED)
set(TARGET_NAME "chi_squared_with_gsl")
add_executable(${TARGET_NAME} main.cpp)
target_link_libraries(${TARGET_NAME} PRIVATE
  GSL::gsl
)
set_target_properties(${TARGET_NAME}
  PROPERTIES
    CXX_STANDARD 17
    CXX_STANDARD_REQUIRED ON
    CXX_EXTENSIONS OFF
    RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin
)
The code does the exact same thing as in the previous example and will produce the same output.
Checking dependencies
Again, using docker, we first install the packages needed for compiling the code sample. The Dockerfile below adds the necessary packages.
FROM ubuntu:focal-20210416
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Paris
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
build-essential cmake\
libgsl-dev \
&& apt-get -y autoremove \
&& apt-get -y clean
CMD ["/bin/bash"]
Then we build an image from it and call it builder-chi-squared-gsl
docker build -f Dockerfile -t builder-chi-squared-gsl .
We check which files have been added by the libgsl-dev install step:
docker run --rm -it builder-chi-squared-gsl:latest
root@ccf1ab7b6825:/# find / -name *gsl*
/usr/lib/x86_64-linux-gnu/libgsl.so
/usr/lib/x86_64-linux-gnu/libgsl.so.23.1.0
/usr/lib/x86_64-linux-gnu/libgslcblas.so
/usr/lib/x86_64-linux-gnu/pkgconfig/gsl.pc
/usr/lib/x86_64-linux-gnu/libgslcblas.so.0
/usr/lib/x86_64-linux-gnu/libgsl.so.23
/usr/lib/x86_64-linux-gnu/libgslcblas.so.0.0.0
/usr/lib/x86_64-linux-gnu/libgsl.a
/usr/lib/x86_64-linux-gnu/libgslcblas.a
/usr/bin/gsl-config
/usr/include/gsl
/usr/include/gsl/gsl_randist.h
...
I have trimmed the output for clarity. But we can see that the includes folder is there, along with all the static (.a) and dynamic (.so) libraries.
We can build the sample program inside the docker container by following these steps:
docker run --rm -it -v <ABSOLUTE_PATH_TO_MY_CODE>:/opt/code builder-chi-squared-gsl:latest
root@5e90b01e99f4:/# cd /opt/code
root@5e90b01e99f4:/opt/code# cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
root@5e90b01e99f4:/opt/code# cmake --build build -j
root@5e90b01e99f4:/opt/code# ls -lh build/bin/
total 160K
-rwxr-xr-x 1 root root 18K May 6 17:13 chi_squared_with_gsl
If we run it, it will produce the familiar output, listing the 95% confidence region for distributions from 1 to 100 degrees of freedom.
If we “copy” the binary and then attempt to run it in the original ubuntu docker image, it fails:
docker run --rm -it -v <ABSOLUTE_PATH_TO_MY_CODE>/build/bin:/opt/bin ubuntu:focal-20210416
root@dbaf358595ef:/# cd /opt/bin/
root@dbaf358595ef:/opt/bin# ./chi_squared_with_gsl
./chi_squared_with_gsl: error while loading shared libraries: libgsl.so.23: cannot open shared object file: No such file or directory
The error message should be obvious. There is a dependency on the gsl dynamic library, which we linked against in the CMakeLists.txt file. This library is a runtime requirement.
readelf -d build/bin/chi_squared_with_gsl | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libgsl.so.23]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
Unlike the boost case, if we want to use this program on an embedded device or a different architecture, we will need to cross-compile gsl itself and then deploy both our custom binary and the gsl dynamic library.
There is a hacky way to get rid of the runtime dependency. It is hacky because cmake does not have a decent way to achieve this, as reported here.
As you may have noticed, when gsl was installed in the docker image, both the static (.a) and the dynamic (.so) versions of the libraries were provided. We can force cmake to statically link gsl into our binary.
To achieve this the wrong way, we force cmake to only look for static libraries by adding the line below to our CMakeLists.txt file, prior to calling the find_package function.
SET(CMAKE_FIND_LIBRARY_SUFFIXES .a)
find_package(GSL REQUIRED)
Don’t use the above command blindly in your CMakeLists.txt files, as it will break linkage with additional shared libraries: as written, it applies to all subsequent find_package commands.
If we add the above cmake command and rebuild our binary, we can see that there is no longer a dependency on libgsl.so, as gsl is baked into our application.
readelf -d build/bin/chi_squared_with_gsl | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
Summary
If you need to use the Chi-Squared (\(\chi^2\)) distribution in C++, use boost’s math module. You won’t need to install anything extra on your embedded device or production machine.
You can achieve something similar with gsl, but you will need to cross-compile and statically link against it. Not difficult, but why not use boost directly …
On Dockerfiles
The Dockerfiles shown in this article don’t follow the entire list of recommended best practices.
One that follows the recommendations and has been linted with hadolint is this
FROM ubuntu:focal-20210416
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Paris
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
build-essential=12.8ubuntu1 \
cmake=3.16.3-1ubuntu1\
libboost-math-dev=1.71.0.0ubuntu2\
&& apt-get -y autoremove \
&& apt-get -y clean \
&& rm -rf /var/lib/apt/lists/*
CMD ["/bin/bash"]
It differs from the development one shown in the body of the article in two respects: it cleans the apt cache and it pins the version of each package.
If specifying package versions is essential for you or your employer, by all means do the second step too. However, if you are prototyping or are in need of changing base images frequently, think twice about it, as you may find yourself constantly changing version numbers in your Dockerfiles.
In my view, fixing package versions doesn’t enhance security; it creates the illusion of security. If Docker or apt-get had a mechanism for verifying checksums, I would reconsider.