A C++ implementation of Multivariate Adaptive Regression Splines (MARS). The algorithm is a semi-brute-force search over interactions and non-linearities. It provides regression performance competitive with neural networks, but with much faster model evaluation.
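For intuition about what the method builds, here is a plain-NumPy sketch of the mirrored hinge functions that form the MARS basis (an illustration of the technique, not this package's API):

```python
import numpy as np

def hinge(x, knot):
    """Mirrored hinge pair max(0, x - knot) and max(0, knot - x),
    the basic MARS building block."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

x = np.linspace(-2, 2, 5)
left, right = hinge(x, 0.0)
# A MARS model is a weighted sum of such hinges (and products of
# hinges, which capture interactions between variables).
```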
References:
- Write-up describing the method
- Commercial MARS package by Salford Systems
- R "earth" package documentation
- Stephen Milborrow's resource page
- Additionally, there is a scikit-learn module
We use OpenMP to achieve good speed-up per core. There is some memory overhead for each thread launched, which may constrain the total number of cores you can use. You can control the number of threads via the OMP_NUM_THREADS environment variable or the threads argument.
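For example, to cap the library at four threads you can set the environment variable before the extension module is loaded (the exact effect on this package is an assumption; OMP_NUM_THREADS is the standard OpenMP control):

```python
import os

# Limit OpenMP to 4 threads; this must be set before the
# extension module that uses OpenMP is imported.
os.environ["OMP_NUM_THREADS"] = "4"

# import mars  # subsequent fits would now use at most 4 threads
```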
The following timings were obtained on an AMD EPYC 9654 96-Core Processor with 192 logical CPUs. Note that multi-threaded performance is nearly ideal up to 30 cores or so.
These instructions have been verified to work on the following platforms:
- Ubuntu 18.04 and 20.04
- Raspbian 10
- macOS 10.13 (WIP)
Eigen - The code has been tested with version 3.3.4.

```shell
sudo apt install -y libeigen3-dev
```

On macOS:

```shell
brew install pkg-config eigen
```

GoogleTest - Not available pre-compiled on Ubuntu; build from source:

```shell
sudo apt install -y libgtest-dev cmake
cd /usr/src/gtest
sudo cmake CMakeLists.txt && sudo make
sudo cp lib/*.a /usr/lib
```

pybind11 - Install via pip:

```shell
pip3 install pybind11
```

Or with conda:

```shell
conda install -y pybind11
```

Use the Makefile:

```shell
cd mars
make
make test # optional - build and run the unit tests
```

Or install directly via pip:

```shell
cd mars
pip install .
```

Here we train a linear model with a categorical interaction.
```python
import numpy as np

X = np.random.randn(10000, 2)
X[:, 1] = np.random.binomial(1, .5, size=len(X))
y = 2*X[:, 0] + 3*X[:, 1] + X[:, 0]*X[:, 1] + np.random.randn(len(X))

# Convert to column-major float
X = np.array(X, order='F', dtype='f')
y = np.array(y, dtype='f')

# Fit the earth model
import mars
model = mars.fit(X, y, max_epochs=8, tail_span=0, linear_only=True)

# Expand the basis and solve for the coefficients
B = mars.expand(X, model)
beta = np.linalg.lstsq(B, y, rcond=None)[0]
y_hat = B @ beta

# Pretty-print the model
mars.pprint(model, beta)
```

Depending on the random seed, the result should look similar to this:

```
-0.003
+1.972 * X[0]
+3.001 * X[1]
+1.048 * X[0] * X[1]
```
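As a sanity check on those numbers, here is the final least-squares step in pure NumPy, fitting the true basis (intercept, two linear terms, one interaction) to the same synthetic data. No mars module is required; this only shows what the printed model should recover:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 2))
X[:, 1] = rng.binomial(1, .5, size=len(X))
y = 2*X[:, 0] + 3*X[:, 1] + X[:, 0]*X[:, 1] + rng.standard_normal(len(X))

# The basis the fitted model should discover:
# [1, X[0], X[1], X[0]*X[1]]
B = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0]*X[:, 1]])
beta = np.linalg.lstsq(B, y, rcond=None)[0]
# beta lands close to the true coefficients [0, 2, 3, 1],
# matching the pretty-printed model above up to noise.
```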
