
aleon1138/mars


MARS

A C++ implementation of Multivariate Adaptive Regression Splines (MARS). MARS performs a semi-brute-force search for interactions and non-linearities. It provides regression performance competitive with neural networks, but with much faster model evaluation.
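To illustrate what "semi-brute-force search" means, here is a minimal NumPy sketch of the first step of a MARS-style forward pass. It is a simplified illustration, not this library's algorithm: it tries every (variable, knot) pair, using the observed data values as candidate knots, fits the mirrored hinge pair max(0, x - t) and max(0, t - x) plus an intercept by least squares, and keeps the pair with the lowest residual sum of squares. The helper names `hinge_pair` and `best_knot` are hypothetical. In the standard MARS forward pass, later steps also multiply candidate hinges by already-selected basis terms, which is how interactions enter the model.

```python
import numpy as np

def hinge_pair(x, t):
    # Mirrored MARS hinge functions: max(0, x - t) and max(0, t - x)
    return np.maximum(0.0, x - t), np.maximum(0.0, t - x)

def best_knot(X, y):
    """One simplified forward step (hypothetical helper): search every
    variable and every observed value as a knot, and keep the hinge pair
    (plus intercept) with the lowest residual sum of squares."""
    n, p = X.shape
    best = (np.inf, None, None)  # (rss, variable index, knot)
    for j in range(p):
        for t in np.unique(X[:, j]):
            h_pos, h_neg = hinge_pair(X[:, j], t)
            B = np.column_stack([np.ones(n), h_pos, h_neg])
            beta = np.linalg.lstsq(B, y, rcond=None)[0]
            rss = float(np.sum((y - B @ beta) ** 2))
            if rss < best[0]:
                best = (rss, j, t)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(300, 2))
# Only column 0 matters, with a kink at x = 1; column 1 is pure noise
y = 2.0 * np.maximum(0.0, X[:, 0] - 1.0) + 0.1 * rng.standard_normal(300)

rss, var, knot = best_knot(X, y)
print(f"variable {var}, knot {knot:.3f}, rss {rss:.3f}")
```

The search cost grows with the number of variables times the number of candidate knots, which is why a fast, parallel implementation matters.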


Performance

We use OpenMP to parallelize the work across cores, with good per-core speed-up. Each thread launched adds some memory overhead, which may limit how many cores you can use in practice. You can control the number of threads via the OMP_NUM_THREADS environment variable or the threads argument.
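A minimal sketch of setting the thread count from Python. The OpenMP runtime reads OMP_NUM_THREADS when it is loaded, so the variable should be set before the compiled module is imported. The cap of 8 threads is an arbitrary example value, and passing `threads=` to `mars.fit` is an assumption about where the threads argument lives; both lines that touch `mars` are left commented out.

```python
import os

# Example cap of 8 threads (arbitrary); choose a value that fits your
# memory budget, since each OpenMP thread adds some memory overhead.
n_threads = min(8, os.cpu_count() or 1)

# OpenMP reads OMP_NUM_THREADS when its runtime is loaded, so set it
# before importing the compiled extension.
os.environ["OMP_NUM_THREADS"] = str(n_threads)

# import mars
# model = mars.fit(X, y, threads=n_threads)  # assumption: fit() accepts `threads`
```

Equivalently, the variable can be set in the shell when launching the Python process.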

The following timings were obtained on an AMD EPYC 9654 96-Core Processor with 192 logical CPUs. Multi-threaded scaling is nearly ideal up to roughly 30 cores.

[Figure: Performance timings of MARS]
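"Nearly ideal" can be read in terms of the usual definitions of parallel speed-up S and efficiency E, where T_1 is the single-thread runtime and T_p is the runtime with p threads:

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p} = \frac{T_1}{p \, T_p}
```

Ideal scaling corresponds to S(p) = p, or equivalently E(p) = 1, so nearly ideal scaling up to about 30 cores means E(p) stays close to 1 over that range.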

Supported Platforms

These instructions have been verified to work on the following platforms:

  • Ubuntu 18.04 and 20.04
  • Raspbian 10
  • macOS 10.13 (WIP)

Build Requirements

Eigen - The code has been tested with version 3.3.4.

sudo apt install -y libeigen3-dev

... on macOS:

brew install pkg-config eigen

GoogleTest - Not available pre-compiled on Ubuntu; build it from source:

sudo apt install -y libgtest-dev cmake
cd /usr/src/gtest
sudo cmake CMakeLists.txt && sudo make
sudo cp lib/*.a /usr/lib

pybind11 - Install via pip:

pip3 install pybind11

Or with conda:

conda install -y pybind11
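Before building, a small script can report which prerequisites are visible. This is a hypothetical helper, not part of the repository. It checks that the pybind11 package is importable, that cmake and pkg-config are on PATH, and that pkg-config knows the eigen3 module (both the Ubuntu libeigen3-dev package and the Homebrew eigen formula install an eigen3.pc file). GoogleTest is not checked.

```python
import importlib.util
import shutil
import subprocess

def eigen_found():
    # Eigen ships an eigen3.pc file; `pkg-config --exists` exits with 0 if found
    if shutil.which("pkg-config") is None:
        return False
    return subprocess.run(["pkg-config", "--exists", "eigen3"]).returncode == 0

def check_build_prereqs():
    """Report which build prerequisites can be found (hypothetical helper)."""
    return {
        "pybind11": importlib.util.find_spec("pybind11") is not None,
        "cmake": shutil.which("cmake") is not None,
        "pkg-config": shutil.which("pkg-config") is not None,
        "eigen3": eigen_found(),
    }

for name, found in check_build_prereqs().items():
    print(f"{name:12s} {'found' if found else 'missing'}")
```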

Build Instructions

Use the Makefile:

cd mars
make
make test # optional - build and run the unit tests

Or install directly via pip:

cd mars
pip install .

An Example

Here we train a linear model with a categorical interaction.

import numpy as np
X      = np.random.randn(10000, 2)
X[:,1] = np.random.binomial(1, .5, size=len(X))
y      = 2*X[:,0] + 3*X[:,1] + X[:,0]*X[:,1] + np.random.randn(len(X))

# convert to column-major float
X = np.array(X, order='F', dtype='f')
y = np.array(y, dtype='f')

# Fit the earth model
import mars
model = mars.fit(X, y, max_epochs=8, tail_span=0, linear_only=True)
B     = mars.expand(X, model) # expand the basis
beta  = np.linalg.lstsq(B, y, rcond=None)[0]
y_hat = B @ beta

# Pretty-print the model
mars.pprint(model, beta)

Depending on the random seed, the result should look similar to this:

  -0.003
  +1.972 * X[0]
  +3.001 * X[1]
  +1.048 * X[0] * X[1]
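For comparison, when the true basis is already known the same fit can be reproduced with plain NumPy. This sketch builds the four basis columns by hand (intercept, X[0], X[1], X[0]*X[1]) in place of `mars.expand`, and shows one way to get the column-major float32 layout used above. It uses NumPy's `default_rng` with a fixed seed rather than the global `np.random` functions. Of course, knowing the basis in advance is exactly what MARS is meant to avoid.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 2))
X[:, 1] = rng.binomial(1, 0.5, size=len(X))
y = 2 * X[:, 0] + 3 * X[:, 1] + X[:, 0] * X[:, 1] + rng.standard_normal(len(X))

# Column-major (Fortran-order) float32, the same layout as in the example
X = np.asfortranarray(X, dtype=np.float32)
y = y.astype(np.float32)

# Hand-built basis: intercept, X[0], X[1] and the interaction X[0]*X[1]
B = np.column_stack([np.ones(len(X), dtype=np.float32),
                     X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
beta = np.linalg.lstsq(B, y, rcond=None)[0]
print(np.round(beta, 3))
```

The printed coefficients should be close to the true values 0, 2, 3 and 1, in line with the MARS output above.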
