ALTS: Adaptive Algorithm for the LTS Linear Estimator

Technical Overview:

This is a collection of C++ procedures for computing an approximation to the LTS linear estimator. The program is given a set of points in d-dimensional space (which may be generated by the program from various random distributions) each consisting of (d-1) independent variables and one dependent variable, a coverage value h, and pair of approximation parameters, called the residual error and the quantile error. The LTS estimator is the linear model that minimizes the sum of the h closest squared residuals. This program computes an approximation to this estimator through the use of a technique called geometric branch-and-bound. The nature of the approximation and the algorithm are described in paper "A Practical Approximation Algorithm for the LTS Estimator" (see below).

Software:

The current version of the software, Version 1.1, is provided in a bundle (see below), which contains all the copyright notice, source code, documentation, and a driving program for testing purposes. A Makefile has been provided, which runs on Unix using the g++ compiler.

This program is distributed under conditions of the GNU General Public License.

Download Latest release: (Version 1.1: 01/20/2016) alts_1.1.tar.gz

All Versions:

Version 1.1: (01/20/2016) alts_1.1.tar.gz
Version 1.0: (06/17/2015) alts_1.0.tar.gz

Paper:

A Practical Approximation Algorithm for the LTS Estimator, D. M. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Y. Wu, Computational Statistics & Data Analysis, 99 (2016), 148-170. (doi: 10.1016/j.csda.2016.01.016)

Abstract:

The linear least trimmed squares (LTS) estimator is a statistical technique for fitting a linear model to a set of points. It was proposed by Rousseeuw as a robust alternative to the classical least squares estimator. Given a set of n points in real d-dimensional space, the objective is to minimize the sum of the smallest 50% squared residuals (or more generally any given fraction). There exist practical heuristics for computing the linear LTS estimator, but they provide no guarantees on the accuracy of the final result. Two results are presented. First, a measure of the numerical condition of a set of points is introduced. Based on this measure, a probabilistic analysis of the accuracy of the best LTS fit resulting from a set of random elemental fits is presented. This analysis shows th at as the condition of the point set improves, the accuracy of the resulting fit also increases. Second, a new approximation algorithm for LTS, called Adaptive-LTS, is described. Given bounds on the minimum and maximum slope coefficients, this algorithm returns an approximation to the optimal LTS fit whose slope coefficients lie within the given bounds. Empirical evidence of this algorithm's efficiency and effectiveness is provided for a variety of data sets.