Re: Constrained non linear regression using ML


From: Corrado
Subject: Re: Constrained non linear regression using ML
Date: Fri, 19 Mar 2010 09:42:30 +0000
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Dear Fredrik, dear Octavers,

Fredrik Lingvall wrote:
> The likelihood function, L(theta), would in this case be:
>
> p(y|theta,I) = L(theta) = 1/(2*pi)^(n/2) * 1/sqrt(det(Ce)) * exp(-0.5*
> (y-f(theta*x))'*inv(Ce)*(y-f(theta*x)));
>
> where Ce is the covariance matrix for e [e ~ N(0,Ce)]. ML is more
> general than LS, and in the Gaussian case they are the same only when Ce
> is diagonal with the same variance for all elements [Ce =
> sigma_e^2*eye(n,n)].

That is equivalent to the assumption of the e's being IID (independent and identically distributed), isn't it?
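
Just to fix the notation for myself, here is a minimal Octave sketch of that log-likelihood; f, theta, x, y and Ce are placeholders for my model, and I write x*theta for the usual design-matrix convention:

## Gaussian log-likelihood from Fredrik's formula (placeholder names).
function ll = gauss_loglik (theta, x, y, Ce, f)
  n = numel (y);
  r = y - f (x * theta);                       # residual vector, n x 1
  ## log L = -n/2*log(2*pi) - 1/2*log(det(Ce)) - 1/2 * r'*inv(Ce)*r
  ll = -0.5 * (n * log (2*pi) + log (det (Ce)) + r' * (Ce \ r));
endfunction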

So let's say:

g(y) = theta*x + transformed_e (1)

or

y = f(theta*x) + e (2)

where g is the link function and f is the inverse link function. If I guess the right link function, so that transformed_e is beta distributed and e is Gaussian distributed (IID), then I should get the same result using the two alternative approaches:

1) I use a constrained nonlinear least-squares algorithm on equation (2).
2) I use ML estimation on (1) by writing the likelihood:

likelihood(theta) = p(y|theta,I) = product_i p(y_i | theta, x_i) = product_i 1/B(p,q) * (g(y_i)-theta*x_i)^(p-1) * (1-g(y_i)+theta*x_i)^(q-1)

and maximizing the log-likelihood(theta) using a convex constrained optimisation routine, where p,q are theoretically "connected" with my sample:

E(g(y)-theta*x) = p/(p+q)
Var(g(y)-theta*x) = p*q/((p+q)^2 * (p+q+1))

Is that right?
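
In Octave terms, the sketch I have in mind would be something like the following, where g, x, y and theta0 are placeholders of mine, I treat p,q as fixed at their method-of-moments values (an approximation, of course), and I use positivity of theta merely as an example of the constraint:

## Negative beta log-likelihood of u = g(y) - x*theta (placeholder names).
function nll = beta_negloglik (theta, x, y, g, p, q)
  u = g (y) - x * theta;               # should lie in (0,1)
  u = min (max (u, eps), 1 - eps);     # guard the logs at the boundary
  nll = -sum ((p-1) * log (u) + (q-1) * log (1 - u) - betaln (p, q));
endfunction

## Method-of-moments values for p and q from a pilot fit theta0:
u = g (y) - x * theta0;
m = mean (u);  v = var (u);
c = m * (1 - m) / v - 1;
p = m * c;  q = (1 - m) * c;

## Constrained maximization (example constraint: theta >= 0):
theta_ml = sqp (theta0, @(t) beta_negloglik (t, x, y, g, p, q), [], [], 0);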

But then, if I compute the residuals on y after fitting the model, they should be Gaussian distributed? Is that right?
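
If so, a quick way to eyeball that in Octave, assuming theta_hat is the fitted parameter vector and f, x, y are as above:

## Quick look at the residuals on y after the fit (placeholder names).
r = y - f (x * theta_hat);
printf ("mean %g, std %g, skewness %g, kurtosis %g\n",
        mean (r), std (r), skewness (r), kurtosis (r));
hist (r, 30);                # should look roughly bell-shaped if e is Gaussian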

The really big problem I have here is that:

1) y is in [0,1]
2) f(theta*x) is (theoretically) in [0,1]
3) the pdf of e is dependent on E(y)

In particular:

1) when E(y) = 0, the distribution of e can only spread upwards, to at most 1
2) when E(y) = 1, the distribution of e can only spread downwards, to at least -1

So when I actually use a Gaussian pdf for e in equation (2), or a beta for transformed_e in equation (1), it is really an approximation by desperation :D, which yields interesting results, but only just better than having no model at all.

>> If e is distributed differently (for example: beta, in the continuous
>> case, or binomial, in the discrete case), then I am better off
>> using Maximum Likelihood.
> If you have such knowledge then yes.

> I would recommend that you try a maximum a posteriori (MAP) approach
> instead (of ML), since you have some important prior knowledge about your
> parameters, theta - they are positive.
Well .... yes and no.

What I really know is that when I use a certain link function f, the theta should be positive for the regression to make any physical sense. But if the link function f changes, then this assumption is no longer valid, and other assumptions may kick in.

If I understand correctly, the approach you are proposing is Bayesian, assuming a positive exponential prior distribution for the theta.

I do not know the Bayesian approach; thanks a lot for the books. Unfortunately, we do not have the Gregory book in our library, but I will download the other one. I am stuck at home with a bad back and the connection is not great, so as soon as I am back .... :D

At the same time, I would like to initially use ML for comparison with some previous work, before moving on to the Bayesian approach.

Is the distribution of the error (that is, e in equation (2)) really important when you use the Bayesian approach?

I wonder if the Bayesian approach would help me overcome the problem of the pdf of e being connected with E(y). What do you think?
> For example, a positive
> exponential distribution for your parameters seems like a good choice.
> Try to maximize,
>
> theta_est = arg max   lambda_theta*exp(-lambda_theta * theta) * L(theta)
>               theta
>
> for some large but finite lambda_theta (use a large value when you don't
> know much about the scale of your parameters). The exponential
> distribution has the maximum entropy property when it is known that theta
> is positive (as the Gaussian has for parameters that can be both positive
> and negative), which makes it a "safe" assumption.
> To be really careful, if you don't have any clue about the scale of your
> parameters then you should integrate out (marginalize) lambda_theta,
> which gives you more robust estimates. Again I recommend these books:
>
> http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521841504 but
> also Larry Bretthorst's book (available for download here:
> http://bayes.wustl.edu/glb/bib.html) and papers.
>
> /Fredrik
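
For my own notes, that MAP objective in Octave form, reusing my placeholder beta_negloglik from above (lambda_theta is just a value I would have to choose):

## Sketch of the MAP estimate with an exponential prior on each theta_j,
## reusing the placeholder negative beta log-likelihood from above.
lambda_theta = 10;                    # assumed prior rate, to be chosen
## log prior = sum_j [ log(lambda_theta) - lambda_theta*theta_j ]
neg_logpost = @(t) beta_negloglik (t, x, y, g, p, q) ...
                   - sum (log (lambda_theta) - lambda_theta * t);
theta_map = sqp (theta0, neg_logpost, [], [], 0);   # keeps theta >= 0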





--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: address@hidden


