Re: Constrained non linear regression using ML


From: Corrado
Subject: Re: Constrained non linear regression using ML
Date: Fri, 19 Mar 2010 09:42:30 +0000
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Dear Fredrik, dear Octavers,

Fredrik Lingvall wrote:
> The likelihood function, L(theta), would in this case be:
>
> p(y|theta,I) = L(theta) = 1/(2*pi)^(n/2) * 1/sqrt(det(Ce)) * exp(-0.5*
> (y-f(theta*x))'*inv(Ce)*(y-f(theta*x)));
>
> where Ce is the covariance matrix for e [e ~ N(0,Ce)]. ML is more
> general than LS, and in the Gaussian case they are the same only when Ce
> is diagonal with the same variance for all elements [Ce =
> sigma_e^2*eye(n,n)].

That is equivalent to the assumption of the e's being IID (independent and identically distributed), isn't it?
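
Just to fix the notation for myself, here is a minimal Octave sketch of that log-likelihood; f, theta, x, y and Ce are placeholders for my model, and I write x*theta for the usual design-matrix convention:

## Gaussian log-likelihood from Fredrik's formula (placeholder names).
function ll = gauss_loglik (theta, x, y, Ce, f)
  n = numel (y);
  r = y - f (x * theta);                       # residual vector, n x 1
  ## log L = -n/2*log(2*pi) - 1/2*log(det(Ce)) - 1/2 * r'*inv(Ce)*r
  ll = -0.5 * (n * log (2*pi) + log (det (Ce)) + r' * (Ce \ r));
endfunction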

So let's say:

g(y) = theta*x + transformed_e (1)

or

y = f(theta*x) + e (2)

where g is the link function and f is the inverse link function. If I guess the right link function, so that transformed_e is beta distributed and e is Gaussian distributed (IID), then I should get the same result using the two alternative approaches:

1) I use a constrained nonlinear least-squares algorithm on equation (2).
2) I use ML estimation on (1) by writing the likelihood:

likelihood(theta) = p(y|theta,I) = product_i p(y_i | theta, x_i) = product_i 1/B(p,q) * (g(y_i)-theta*x_i)^(p-1) * (1-g(y_i)+theta*x_i)^(q-1)

and maximizing the log-likelihood(theta) using a convex constrained optimisation routine, where p,q are theoretically "connected" with my sample:

E(g(y)-theta*x) = p/(p+q)
Var(g(y)-theta*x) = p*q/((p+q)^2 * (p+q+1))

Is that right?
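
In Octave terms, the sketch I have in mind would be something like the following, where g, x, y and theta0 are placeholders of mine, I treat p,q as fixed at their method-of-moments values (an approximation, of course), and I use positivity of theta merely as an example of the constraint:

## Negative beta log-likelihood of u = g(y) - x*theta (placeholder names).
function nll = beta_negloglik (theta, x, y, g, p, q)
  u = g (y) - x * theta;               # should lie in (0,1)
  u = min (max (u, eps), 1 - eps);     # guard the logs at the boundary
  nll = -sum ((p-1) * log (u) + (q-1) * log (1 - u) - betaln (p, q));
endfunction

## Method-of-moments values for p and q from a pilot fit theta0:
u = g (y) - x * theta0;
m = mean (u);  v = var (u);
c = m * (1 - m) / v - 1;
p = m * c;  q = (1 - m) * c;

## Constrained maximization (example constraint: theta >= 0):
theta_ml = sqp (theta0, @(t) beta_negloglik (t, x, y, g, p, q), [], [], 0);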

But then, if I compute the residuals on y after fitting the model, they should be Gaussian distributed? Is that right?
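
If so, a quick way to eyeball that in Octave, assuming theta_hat is the fitted parameter vector and f, x, y are as above:

## Quick look at the residuals on y after the fit (placeholder names).
r = y - f (x * theta_hat);
printf ("mean %g, std %g, skewness %g, kurtosis %g\n",
        mean (r), std (r), skewness (r), kurtosis (r));
hist (r, 30);                # should look roughly bell-shaped if e is Gaussian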

The really big problem I have here is that:

1) y is in [0,1]
2) f(theta*x) is (theoretically) in [0,1]
3) the pdf of e is dependent on E(y)

In particular:

1) when E(y) = 0, the distribution of e can only spread upwards, to at most 1
2) when E(y) = 1, the distribution of e can only spread downwards, to at least -1

So when I actually use a Gaussian pdf for e in equation (2), or a beta for transformed_e in equation (1), it is really an approximation by desperation :D, which yields interesting results, but only just better than having no model at all.

>> If e is distributed differently (for example: beta, in the continuous
>> case, or binomial, in the discrete case), then I am better off
>> using Maximum Likelihood.
> If you have such knowledge then yes.

> I would recommend that you try a maximum a posteriori (MAP) approach
> instead (of ML), since you have some important prior knowledge about your
> parameters, theta - they are positive.
Well .... yes and no.

What I really know is that when I use a certain link function f, the theta should be positive for the regression to make any physical sense. But if the link function f changes, then this assumption is no longer valid, and other assumptions may kick in.

If I understand correctly, the approach you are proposing is Bayesian, assuming a positive exponential prior distribution for the theta.

I do not know the Bayesian approach; thanks a lot for the books. Unfortunately, we do not have the Gregory book in our library, but I will download the other one. I am stuck at home with a bad back and the connection is not great, so as soon as I am back .... :D

At the same time, I would like to initially use ML for comparison with some previous work, before moving on to the Bayesian approach.

Is the distribution of the error (that is, e in equation (2)) really important when you use the Bayesian approach?

I wonder if the Bayesian approach would help me overcome the problem of the pdf of e being connected with E(y). What do you think?
> For example, a positive
> exponential distribution for your parameters seems like a good choice.
> Try to maximize,
>
> theta_est = arg max   lambda_theta*exp(-lambda_theta * theta) * L(theta)
>               theta
>
> for some large but finite lambda_theta (use a large value when you don't
> know much about the scale of your parameters). The exponential
> distribution has the maximum entropy property when it is known that theta
> is positive (as the Gaussian has for parameters that can be both positive
> and negative), which makes it a "safe" assumption.
> To be really careful, if you don't have any clue about the scale of your
> parameters then you should integrate out (marginalize) lambda_theta,
> which gives you more robust estimates. Again I recommend these books:
>
> http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521841504 but
> also Larry Bretthorst's book (available for download here:
> http://bayes.wustl.edu/glb/bib.html) and papers.
>
> /Fredrik
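
For my own notes, that MAP objective in Octave form, reusing my placeholder beta_negloglik from above (lambda_theta is just a value I would have to choose):

## Sketch of the MAP estimate with an exponential prior on each theta_j,
## reusing the placeholder negative beta log-likelihood from above.
lambda_theta = 10;                    # assumed prior rate, to be chosen
## log prior = sum_j [ log(lambda_theta) - lambda_theta*theta_j ]
neg_logpost = @(t) beta_negloglik (t, x, y, g, p, q) ...
                   - sum (log (lambda_theta) - lambda_theta * t);
theta_map = sqp (theta0, neg_logpost, [], [], 0);   # keeps theta >= 0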





--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: address@hidden


