A C++ implementation of Multivariate Adaptive Regression Splines (MARS). The algorithm is a semi-brute-force search over interactions and non-linearities. It provides regression performance competitive with neural networks, but with much faster model evaluation.
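For intuition about what the method builds, here is a plain-NumPy sketch of the mirrored hinge functions that form the MARS basis (an illustration of the technique, not this package's API):

```python
import numpy as np

def hinge(x, knot):
    """Mirrored hinge pair max(0, x - knot) and max(0, knot - x),
    the basic MARS building block."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

x = np.linspace(-2, 2, 5)
left, right = hinge(x, 0.0)
# A MARS model is a weighted sum of such hinges (and products of
# hinges, which capture interactions between variables).
```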
References:
- Write-up describing the method
- Commercial MARS package by Salford Systems
- R "earth" package documentation
- Stephen Milborrow's resource page
- Additionally, there is a scikit-learn module
We use OpenMP to achieve good speed-up per core. There is some memory overhead for each thread launched, which may constrain the total number of cores you can use. You can control the number of threads via the OMP_NUM_THREADS environment variable or the threads argument.
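For example, to cap the library at four threads you can set the environment variable before the extension module is loaded (the exact effect on this package is an assumption; OMP_NUM_THREADS is the standard OpenMP control):

```python
import os

# Limit OpenMP to 4 threads; this must be set before the
# extension module that uses OpenMP is imported.
os.environ["OMP_NUM_THREADS"] = "4"

# import mars  # subsequent fits would now use at most 4 threads
```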
The following timings were obtained on an AMD EPYC 9654 96-Core Processor with 192 logical CPUs. Note that multi-threaded performance is nearly ideal up to 30 cores or so.
These instructions have been verified to work on the following platforms:
- Ubuntu 18.04 and 20.04
- Raspbian 10
- macOS 10.13 (WIP)
Eigen - The code has been tested with version 3.3.4.

```shell
sudo apt install -y libeigen3-dev
```

On macOS:

```shell
brew install pkg-config eigen
```

GoogleTest - Not available pre-compiled on Ubuntu; build from source:

```shell
sudo apt install -y libgtest-dev cmake
cd /usr/src/gtest
sudo cmake CMakeLists.txt && sudo make
sudo cp lib/*.a /usr/lib
```

pybind11 - Install via pip:

```shell
pip3 install pybind11
```

Or with conda:

```shell
conda install -y pybind11
```

Use the Makefile:

```shell
cd mars
make
make test # optional - build and run the unit tests
```

Or install directly via pip:

```shell
cd mars
pip install .
```

Here we train a linear model with a categorical interaction.
```python
import numpy as np

X = np.random.randn(10000, 2)
X[:, 1] = np.random.binomial(1, .5, size=len(X))
y = 2*X[:, 0] + 3*X[:, 1] + X[:, 0]*X[:, 1] + np.random.randn(len(X))

# Convert to column-major float
X = np.array(X, order='F', dtype='f')
y = np.array(y, dtype='f')

# Fit the earth model
import mars
model = mars.fit(X, y, max_epochs=8, tail_span=0, linear_only=True)

# Expand the basis and solve for the coefficients
B = mars.expand(X, model)
beta = np.linalg.lstsq(B, y, rcond=None)[0]
y_hat = B @ beta

# Pretty-print the model
mars.pprint(model, beta)
```

Depending on the random seed, the result should look similar to this:

```
-0.003
+1.972 * X[0]
+3.001 * X[1]
+1.048 * X[0] * X[1]
```
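As a sanity check on those numbers, here is the final least-squares step in pure NumPy, fitting the true basis (intercept, two linear terms, one interaction) to the same synthetic data. No mars module is required; this only shows what the printed model should recover:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 2))
X[:, 1] = rng.binomial(1, .5, size=len(X))
y = 2*X[:, 0] + 3*X[:, 1] + X[:, 0]*X[:, 1] + rng.standard_normal(len(X))

# The basis the fitted model should discover:
# [1, X[0], X[1], X[0]*X[1]]
B = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0]*X[:, 1]])
beta = np.linalg.lstsq(B, y, rcond=None)[0]
# beta lands close to the true coefficients [0, 2, 3, 1],
# matching the pretty-printed model above up to noise.
```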
