help-octave

Re: Constrained non linear regression using ML


From: Fredrik Lingvall
Subject: Re: Constrained non linear regression using ML
Date: Fri, 19 Mar 2010 13:30:25 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100318 Lightning/1.0b2pre Thunderbird/3.0.3

On 03/19/10 10:42, Corrado wrote:
> Dear Fredrik, dear Octavers,
>
> Fredrik Lingvall wrote:
>> The likelihood function, L(theta), would in this case be:
>>
>> p(y|theta,I) = L(theta) = 1/(2 pi)^(n/2) * 1/sqrt(det(Ce)) * exp(-0.5*
>> (y-f(theta*x))'*inv(Ce)*(y-f(theta*x)));
>>
>> where Ce is the covariance matrix for e [e ~ N(0,Ce)]. ML is more
>> general than LS and in the Gaussian case they are the same only when Ce
>> is diagonal with the same variance for all elements [Ce =
>> sigma_e^2*eye(n,n)].
>>   
> That is equivalent to the assumption of the e's being IID (independent
> and identically distributed), isn't it?
>

Yes.
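
For concreteness, a minimal Octave sketch of that likelihood (the names
X, f and Ce are only illustrative here: X an n-by-p regressor matrix,
theta a p-by-1 vector, f the inverse link) would be the negative
log-likelihood

  function nll = gauss_nll (theta, y, X, f, Ce)
    r   = y - f (X * theta);          # residuals y - f(theta*x)
    n   = numel (y);
    nll = 0.5 * (r' * (Ce \ r)) ...   # Mahalanobis / weighted-LS term
          + 0.5 * log (det (Ce)) + 0.5 * n * log (2*pi);
  endfunction

With Ce = sigma_e^2*eye(n) the quadratic form reduces to the ordinary
sum of squared residuals (scaled by 1/sigma_e^2), which is why ML and
LS coincide in the IID case.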

> So let's say:
>
> g(y) = theta*x + transformed_e (1)
>
> or
>
> y = f(theta*x) + e (2)
>
> where g is the link function and f is the inverse link function, and
> where, if I guess the right link function so that transformed_e is
> beta distributed and e is Gaussian distributed (IID), I should
> get the same result using the two alternative solutions:
>
> 1) I use a constrained non-linear least squares algorithm on equation (2).
> 2) I use ML estimation on (1) by writing:
>
> likelihood(theta) = p(y|theta,I) = product_i p(y_i | theta, x_i) =
> product_i 1/B(p,q) * (g(y_i)-theta*x_i)^(p-1) * (1-g(y_i)+theta*x_i)^(q-1)
>
> and maximizing the loglikelihood(theta) by using a convex constrained
> optimisation routine,  where p,q are theoretically "connected" with my
> sample:
>
> E(g(y)-theta*x) = p /(p+q)
> Var(g(y)-theta*x) = pq/((p+q)^2(p+q+1))
>
> Is that right?
>
> But if I then compute the residuals on y after fitting the model,
> they should be Gaussian distributed? Is that right?

What I think you are saying is that you don't know the variance of the
errors, so you are trying to estimate it from the data as well? This is
actually not a bad idea, since different models will fit the data
differently, so the model misfit (the error), and hence the variance,
will be different for different models.
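
As a concrete version of your alternative 2) quoted above, a minimal
Octave sketch (the names gy = g(y), X, p, q and the boundary guard are
my own illustration, not something fixed by your setup) would be to
minimize the negative beta log-likelihood of u = g(y) - X*theta subject
to theta >= 0, e.g. with sqp:

  function nll = beta_nll (theta, gy, X, p, q)
    u   = gy - X * theta;                # transformed residuals, should lie in (0,1)
    u   = min (max (u, eps), 1 - eps);   # guard the logs at the boundary
    nll = numel (u) * betaln (p, q) ...
          - sum ((p - 1) * log (u) + (q - 1) * log (1 - u));
  endfunction

  theta0   = 0.1 * ones (columns (X), 1);
  lb       = zeros (size (theta0));      # positivity constraint theta >= 0
  ub       = Inf (size (theta0));
  theta_ml = sqp (theta0, @(t) beta_nll (t, gy, X, p, q), [], [], lb, ub);

You can then check whether the residuals y - f(X*theta_ml) look
approximately Gaussian, as you ask above.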

>
> The really great problem I have here, is that:
>
> 1) y is in [0,1]
> 2) f(theta*x) is (theoretically) in [0,1]

This is prior information that can improve your estimate of theta
significantly. If you know 1), 2), f and x then what can you say about
the bounds on theta for example?
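
For example (a hypothetical one-regressor illustration): with a single
regressor x in [0,1] and the identity for f, f(theta*x) = theta*x stays
in [0,1] only if theta itself lies in [0,1]; with a logistic inverse
link the output is in (0,1) for any theta, and the useful bounds then
have to come from your physical knowledge instead. Either way it is
information the estimator should be allowed to use.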

> 3) the pdf of e is dependent on E(y)

Note that y is your data and it is not distributed at all - it is the
numbers that your data-recording machine gave you. The error is just
something you "add" because you don't have perfect knowledge of the
physical process that you are studying. The better your knowledge is
(better theory/better model), the smaller the error becomes.

>
> In particular:
>
> 1) when E(y) = 0, the distribution of e can only extend upwards, at most to +1
> 2) when E(y) = 1, the distribution of e can only extend downwards, at most to -1
>
> So when I actually use a Gaussian pdf for e in equation (2) or a beta
> pdf for transformed_e in equation (1), it is actually an approximation
> by desperation :D, which yields interesting results, but is only just
> better than not having a model.

No, I would not say that it is "an approximation by desperation". The
Gaussian assumption is a very conservative and safe one: it is the
maximum-entropy distribution for a given mean and variance, so it
commits you to the least beyond those two moments.

>>
>> I would recommend that you try a maximum a posteriori (MAP) approach
>> instead of ML, since you have some important prior knowledge about your
>> parameters theta - they are positive. 
> Well .... yes and no.
>
> What I really know is that when I use a certain link function f, then
> the theta should be positive in order for the regression to make any
> physical sense. But if the link function f changes, then this
> assumption is not valid any more, and other assumptions may kick in.

OK.

>
> If I understand correctly, the approach you are proposing is Bayesian,
> assuming an exponential (positive) prior distribution for the theta.
>
> I do not know the Bayesian approach; thanks a lot for the books.
> Unfortunately, we do not have the Gregory book in our library, but I
> will download the other one. I am stuck at home with a bad back and the
> connection is not great, so as soon as I am back .... :D
>
> At the same time, I would like to initially use ML for comparison with
> some previous work, before moving to a Bayesian approach.

The computational difference may not be as big as you think. In ML you
just try to maximize L(theta), and in MAP (Bayesian) you maximize
p(theta|I)*L(theta), so you end up with an optimization problem in both
cases. When you do ML you don't use all the information you have (bounds
on the parameters, etc.). Using such information often improves your
estimates significantly.
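
In Octave the change is literally one extra term in the objective.
Reusing the beta_nll sketch from above and assuming, purely as an
illustration, an exponential prior p(theta_j) = lambda*exp(-lambda*theta_j)
on each (positive) theta_j:

  lambda       = 1;                      # prior rate, illustrative value
  neg_log_post = @(t) beta_nll (t, gy, X, p, q) ...
                 + lambda * sum (t);     # -log prior, constants dropped
  theta_map    = sqp (theta0, neg_log_post, [], [], lb, ub);

The lower bound lb = 0 enforces the positivity, and lambda*sum(t) is
the -log of the exponential prior (up to a constant).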

>
> Is the distribution of the error (that is, e in equation (2)) really
> important when you use a Bayesian approach?
>
> I wonder if the Bayesian approach would help me overcome the problem
> of the pdf of e being connected with E(y). What do you think?
>

See my comment above.

/Fredrik


