In this model, the target value is expected to be a linear combination of the input variables.
y(x, w) = w_0 + w_1 x_1 + ... + w_D x_D
The coefficient vector w = (w_0, ..., w_D) is estimated by least squares.
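Concretely, least squares chooses the coefficients that minimize the residual sum of squares between the observed targets and the predictions of the linear model:

\hat{w} = \arg\min_w \|X w - y\|^2

where X is the matrix whose rows are the input vectors and y is the vector of observed targets.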
Linear regression is performed via instances of the LinearRegression class:
>>> from scikits.learn import glm
>>> clf = glm.LinearRegression()
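A minimal usage sketch, assuming the estimator follows the usual fit/predict convention and stores the fitted weights in a coef_ attribute (these method and attribute names are assumptions here, not quoted from the library reference):

>>> clf.fit([[0, 0], [1, 0], [0, 1], [1, 1]], [0, 1, 2, 3])  # data generated by y = x_1 + 2 x_2
>>> clf.coef_              # fitted weights, approximately [1., 2.]
>>> clf.predict([[2, 3]])  # approximately [8.]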
Coefficient estimates for multiple linear regression models rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the matrix X^T X becomes nearly singular. As a result, the least-squares estimate:
\hat{\beta} = (X^T X)^{-1} X^T y
becomes highly sensitive to random errors in the observed response y, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.
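As a small standalone illustration (a NumPy sketch, not part of the glm module), two nearly identical columns make X^T X badly conditioned, so the inverse above amplifies tiny perturbations of y into large swings in the estimate:

>>> import numpy as np
>>> x1 = np.array([1.0, 2.0, 3.0, 4.0])
>>> x2 = x1 + 1e-4 * np.array([1.0, -1.0, 1.0, -1.0])  # almost an exact copy of x1
>>> X = np.column_stack([x1, x2])
>>> np.linalg.cond(np.dot(X.T, X))  # enormous condition number: X^T X is nearly singular
>>> y = np.array([1.0, 2.0, 3.0, 4.0])
>>> beta = np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))
>>> y_perturbed = y + 1e-3 * np.array([1.0, -1.0, 1.0, -1.0])
>>> beta_perturbed = np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y_perturbed))
>>> # beta and beta_perturbed differ by several units, far more than the 1e-3 change in y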
Ridge regression addresses the problem by estimating the regression coefficients as:
\hat{\beta} = (X^T X + \alpha I)^{-1} X^T y
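Here alpha >= 0 is the regularization parameter: the larger its value, the stronger the shrinkage of the coefficients. Continuing the NumPy sketch above, the ridge estimate is the same linear solve with alpha added to the diagonal of X^T X, which stabilizes the solution:

>>> alpha = 0.1
>>> A = np.dot(X.T, X) + alpha * np.eye(X.shape[1])
>>> beta_ridge = np.linalg.solve(A, np.dot(X.T, y))
>>> beta_ridge_perturbed = np.linalg.solve(A, np.dot(X.T, y_perturbed))
>>> # the two ridge estimates stay close to each other, unlike the least-squares ones above

In the library this corresponds to a ridge estimator in the glm module (e.g. glm.Ridge(alpha=0.1), used like LinearRegression above); the class name and signature are assumed here rather than quoted from the reference.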
The lasso is a shrinkage method like ridge, with subtle but important differences: it penalizes the absolute values of the coefficients (an L1 penalty rather than ridge's L2 penalty), which tends to drive some coefficients exactly to zero and thus yields sparse models.
TODO