help-gsl

## Re: Question on regularized regression.

From: Patrick Alken
Subject: Re: Question on regularized regression.
Date: Fri, 20 Nov 2020 10:07:57 -0700

Hello,

Unfortunately, as you note, support for underdetermined least-squares
problems (more columns than rows) is a bit lacking in GSL. GSL's SVD is
currently implemented only for M >= N, where A is M-by-N. There may,
however, be other ways to solve your problem. If I understand correctly,
you are solving

min_x || b - Ax ||^2 + \lambda^2 || L x ||^2

You mention ridge regression, which I think usually implies L = I, but
for now we can keep a general L matrix. The above least-squares problem
is equivalent to:

min_x || [ b ; 0 ] - [ A ; \lambda L ] x ||^2
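To make the equivalence concrete, here is a minimal plain-C sketch (not GSL API; the function names are hypothetical and the matrices are assumed to be row-major double arrays, A being M-by-N and L being p-by-N). It evaluates the regularized objective directly, builds the stacked matrix and right-hand side, and computes the ordinary squared residual of the stacked system, which should agree for any x.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of GSL. Row-major storage:
 * A is m x n, L is p x n. */

/* Regularized objective ||b - A x||^2 + lambda^2 ||L x||^2, evaluated directly. */
double tikhonov_objective(size_t m, size_t n, size_t p,
                          const double *A, const double *b,
                          const double *L, double lambda,
                          const double *x)
{
    double f = 0.0;
    for (size_t i = 0; i < m; i++) {          /* data-fit term */
        double r = b[i];
        for (size_t j = 0; j < n; j++)
            r -= A[i * n + j] * x[j];
        f += r * r;
    }
    for (size_t i = 0; i < p; i++) {          /* penalty term */
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += L[i * n + j] * x[j];
        f += lambda * lambda * s * s;
    }
    return f;
}

/* Build the stacked system C = [A; lambda*L] ((m+p) x n) and d = [b; 0]. */
void build_augmented(size_t m, size_t n, size_t p,
                     const double *A, const double *b,
                     const double *L, double lambda,
                     double *C, double *d)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = A[i * n + j];
        d[i] = b[i];
    }
    for (size_t i = 0; i < p; i++) {
        for (size_t j = 0; j < n; j++)
            C[(m + i) * n + j] = lambda * L[i * n + j];
        d[m + i] = 0.0;
    }
}

/* Ordinary squared residual ||d - C x||^2 of a (rows x n) system. */
double residual_sq(size_t rows, size_t n, const double *C,
                   const double *d, const double *x)
{
    double f = 0.0;
    for (size_t i = 0; i < rows; i++) {
        double r = d[i];
        for (size_t j = 0; j < n; j++)
            r -= C[i * n + j] * x[j];
        f += r * r;
    }
    return f;
}
```

For a 1-by-2 A with L = I, the stacked matrix is 3-by-2, so it has at least as many rows as columns even though A itself does not.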

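For your problem, does the augmented matrix [ A ; \lambda L ] (A stacked
on top of \lambda L) have at least as many rows as columns? If L is
p-by-N, it has M + p rows, so for ridge regression (L = I, p = N) this
always holds. If so, you can use the SVD or QR routines on this matrix to
get the solution. This will be a little tedious, though, if you want to
compute an L-curve to find the optimal lambda, since the augmented matrix,
and therefore its factorization, changes with every lambda.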
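A dependency-free sketch of the stacked approach, assuming row-major double arrays and an augmented matrix with full column rank. The names are hypothetical, and the solver is a swapped-in technique: modified Gram-Schmidt without normalization (a square-root-free QR, C = Q R with Q^T Q diagonal and R unit upper triangular), not GSL's Householder-based QR, so its numerical behavior is not identical. For an L-curve, tikhonov_augmented would be called once per lambda, redoing the whole factorization each time.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Least-squares solve of min ||d - C x||^2 for a (rows x n) row-major C,
 * rows >= n, full column rank. Square-root-free modified Gram-Schmidt.
 * C and d are overwritten. Returns 0 on success, -1 on failure. */
int lsq_mgs(size_t rows, size_t n, double *C, double *d, double *x)
{
    if (rows < n) return -1;                  /* underdetermined: refuse */
    double *R = calloc(n * n, sizeof *R);     /* unit upper triangular */
    double *c = malloc(n * sizeof *c);        /* projected right-hand side */
    if (!R || !c) { free(R); free(c); return -1; }
    for (size_t j = 0; j < n; j++) {
        double dj = 0.0;                      /* q_j . q_j */
        for (size_t i = 0; i < rows; i++)
            dj += C[i * n + j] * C[i * n + j];
        if (dj == 0.0) { free(R); free(c); return -1; } /* rank deficient */
        R[j * n + j] = 1.0;
        for (size_t k = j + 1; k < n; k++) {  /* orthogonalize later columns */
            double dot = 0.0;
            for (size_t i = 0; i < rows; i++)
                dot += C[i * n + j] * C[i * n + k];
            double r = dot / dj;
            R[j * n + k] = r;
            for (size_t i = 0; i < rows; i++)
                C[i * n + k] -= r * C[i * n + j];
        }
        double dot = 0.0;                     /* same sweep applied to d */
        for (size_t i = 0; i < rows; i++)
            dot += C[i * n + j] * d[i];
        c[j] = dot / dj;
        for (size_t i = 0; i < rows; i++)
            d[i] -= c[j] * C[i * n + j];
    }
    for (size_t jj = n; jj-- > 0; ) {         /* back substitution R x = c */
        double s = c[jj];
        for (size_t k = jj + 1; k < n; k++)
            s -= R[jj * n + k] * x[k];
        x[jj] = s;
    }
    free(R); free(c);
    return 0;
}

/* Solve min ||b - A x||^2 + lambda^2 ||L x||^2 by stacking [A; lambda*L]
 * and [b; 0]. A is m x n, L is p x n (row-major). Needs m + p >= n. */
int tikhonov_augmented(size_t m, size_t n, size_t p,
                       const double *A, const double *b,
                       const double *L, double lambda, double *x)
{
    size_t rows = m + p;
    double *C = malloc(rows * n * sizeof *C);
    double *d = malloc(rows * sizeof *d);
    if (!C || !d) { free(C); free(d); return -1; }
    for (size_t i = 0; i < m; i++) {
        memcpy(&C[i * n], &A[i * n], n * sizeof *C);
        d[i] = b[i];
    }
    for (size_t i = 0; i < p; i++) {
        for (size_t j = 0; j < n; j++)
            C[(m + i) * n + j] = lambda * L[i * n + j];
        d[m + i] = 0.0;
    }
    int status = lsq_mgs(rows, n, C, d, x);
    free(C); free(d);
    return status;
}
```

For A = [1 2], b = 3, L = I and lambda = 1, this returns x = (0.5, 1.0); calling lsq_mgs on A alone fails because it has fewer rows than columns.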

If your augmented matrix still has fewer rows than columns, then you could
use the normal equations approach,

x = (A^T A + \lambda^2 L^T L)^{-1} A^T b

This is not ideal, since forming A^T A squares the condition number of
the problem, but it works well in practice if the regularization leads
to a reasonably well-conditioned normal equations matrix.
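A minimal sketch of the normal-equations route in plain C, with the hypothetical name tikhonov_normal and row-major arrays assumed. It forms G = A^T A + lambda^2 L^T L explicitly and solves G x = A^T b with a square-root-free Cholesky (LDL^T) factorization; in real code GSL's own Cholesky routines would be the natural choice. Only an N-by-N system is formed, so M < N is fine as long as G is positive definite (for example L = I with lambda > 0).

```c
#include <stddef.h>
#include <stdlib.h>

/* Solve (A^T A + lambda^2 L^T L) x = A^T b via LDL^T.
 * A is m x n, L is p x n, row-major. Returns 0 on success,
 * -1 on allocation failure or a non-positive pivot (G not SPD). */
int tikhonov_normal(size_t m, size_t n, size_t p,
                    const double *A, const double *b,
                    const double *L, double lambda, double *x)
{
    double *G = malloc(n * n * sizeof *G);   /* G = A^T A + lambda^2 L^T L */
    double *D = malloc(n * sizeof *D);       /* diagonal factor of LDL^T */
    if (!G || !D) { free(G); free(D); return -1; }
    double lam2 = lambda * lambda;
    for (size_t j = 0; j < n; j++) {
        for (size_t k = 0; k < n; k++) {
            double s = 0.0;
            for (size_t i = 0; i < m; i++) s += A[i * n + j] * A[i * n + k];
            for (size_t i = 0; i < p; i++) s += lam2 * L[i * n + j] * L[i * n + k];
            G[j * n + k] = s;
        }
        double t = 0.0;
        for (size_t i = 0; i < m; i++) t += A[i * n + j] * b[i];
        x[j] = t;                            /* x holds A^T b for now */
    }
    /* In-place LDL^T: unit lower factor stored below the diagonal of G. */
    for (size_t j = 0; j < n; j++) {
        double dj = G[j * n + j];
        for (size_t k = 0; k < j; k++)
            dj -= G[j * n + k] * G[j * n + k] * D[k];
        if (!(dj > 0.0)) { free(G); free(D); return -1; }
        D[j] = dj;
        for (size_t i = j + 1; i < n; i++) {
            double s = G[i * n + j];
            for (size_t k = 0; k < j; k++)
                s -= G[i * n + k] * G[j * n + k] * D[k];
            G[i * n + j] = s / dj;
        }
    }
    /* Forward solve with the unit lower factor, scale by D^{-1},
     * then back solve with its transpose. */
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < i; k++) x[i] -= G[i * n + k] * x[k];
    for (size_t i = 0; i < n; i++) x[i] /= D[i];
    for (size_t i = n; i-- > 0; )
        for (size_t k = i + 1; k < n; k++) x[i] -= G[k * n + i] * x[k];
    free(G); free(D);
    return 0;
}
```

With A = [1 2], b = 3, L = I and lambda = 1 this gives x = (0.5, 1.0). With lambda = 0 the 2-by-2 matrix A^T A is singular and the sketch returns -1, which is exactly the case the regularization term is there to fix.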

If neither of these solutions works for you, then you might try the
LAPACK library, which has an SVD routine that works for M < N.

Best,
Patrick

On 11/20/20 9:35 AM, Raymond Salvador wrote:
> Dear colleagues of gsl,
>
> I've been trying to use the regularized regression functions provided by
> the GSL library with an X matrix with nrows < ncols (i.e. with more
> variables than observations, a usual scenario in ridge regression), but
> prior to model fitting an SVD is required (gsl_multifit_linear_svd()), and
> this function does not allow matrices with nrows < ncols. Should I use
> gsl_multifit_linear_tsvd() instead as a preliminary step before running
> gsl_multifit_linear_solve()? Thanks a lot for your attention,
>