Dear Julien,
Thank you very much for the reply!
In the meantime, I had already tried this approach (i.e. entering the spatial location as an input) and noticed that the training failed. I thought it was because of computer memory, but it makes more sense that STK cannot handle it, because I have a high-spec computer.
However, I found a way around it. I carried out a PCA on the data, trained the emulator on the PCA scores, and when using this emulator in my integrated model, I simply reconstruct the water elevation predictions from the PCA scores. This works: it keeps the emulation to a minimum and is fast.
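For readers who want to try the same idea, here is a minimal sketch of the PCA step in Python with NumPy (not STK code). The data are synthetic stand-ins, the 99% variance threshold is an arbitrary assumption, and all variable names are hypothetical; the emulator training itself is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_inputs, n_locations = 1095, 3, 69

# Synthetic stand-in data (assumption): rows are data points,
# columns are the 3 inputs / the 69 water elevation locations.
X = rng.uniform(size=(n_samples, n_inputs))
mixing = rng.normal(size=(n_inputs, n_locations))
Y = np.sin(X) @ mixing + 0.01 * rng.normal(size=(n_samples, n_locations))

# PCA of the targets via SVD of the centred 1095 x 69 matrix.
y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)

# Keep the smallest number of components explaining >= 99% of the variance
# (the 99% threshold is an arbitrary choice for this sketch).
explained = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(explained, 0.99) + 1)
components = Vt[:k]                      # shape (k, 69)
scores = (Y - y_mean) @ components.T     # shape (1095, k)

# One emulator per column of `scores` would be trained on X here.

def reconstruct(pred_scores):
    """Map (predicted) PCA scores back to the 69 water elevations."""
    return pred_scores @ components + y_mean

# Sanity check: reconstructing from the exact scores.
Y_hat = reconstruct(scores)
```

With k much smaller than 69, only k emulators need to be trained instead of 69, and the correlation between locations is carried by the retained components.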
Thanks again for the kind support and for the fantastic toolbox!
Best wishes,
Attila
From: Julien Bect [mailto:address@hidden]
Sent: Monday, December 15, 2014 8:28 AM
To: Lazar A.
Cc: address@hidden; help-octave Octave
Subject: Re: [Kriging-help] machine learning - multiple target variables
On 08/12/2014 18:22, Lazar A. wrote:
I have 69 highly correlated water elevation points along a river system, and I would like to create one GP ‘emulator’ that describes the water elevation at all of those points. (I.e. I would like to avoid creating 69 separate GP emulators that work independently and disregard the spatial correlation.)
I have successfully used the example_kb03 example file to machine-learn the behaviour of one water elevation location, but now I have to come up with a method to describe the 69 locations.
So, my question is whether the STK toolbox offers any functions that would help to describe multiple time series. All 69 water elevation points have the exact same three inputs. Therefore, the input is a 3x1095 matrix, whereas the target would be a 69x1095 matrix (1095 data points).
Hello Lazar,
Thank you so much for your interest in STK.
*** I also forward your question to the help-octave mailing list, where other people might have a different opinion. ***
Your problem, as I understand it, can be seen as a space-time modeling problem.
Assuming, e.g., that each of your elevation points is described by its (x,y) position on a map, you can consider your input dataset as a set of
n = 1095 x 69 = 75555
evaluation points on a factor space of dimension
d = 3 + 2 = 5
(the three inputs that you already had, plus two additional inputs for the space coordinates).
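As an illustration of this reformulation, the stacked dataset could be built as in the following Python/NumPy sketch (not STK code). The data and the (x,y) coordinates are random placeholders, the names are hypothetical, and the matrices are stored with one data point per row, i.e. transposed relative to the 3x1095 and 69x1095 layout in the question.

```python
import numpy as np

n_samples, n_inputs, n_locations = 1095, 3, 69
rng = np.random.default_rng(0)

# Placeholder data (assumption): one row per data point.
X = rng.uniform(size=(n_samples, n_inputs))        # 1095 x 3 inputs
coords = rng.uniform(size=(n_locations, 2))        # 69 x 2 hypothetical (x, y) positions
Y = rng.normal(size=(n_samples, n_locations))      # 1095 x 69 water elevations

# Repeat each input row once per location, and cycle through the
# location coordinates for every data point.
X_rep = np.repeat(X, n_locations, axis=0)          # (75555, 3)
coords_rep = np.tile(coords, (n_samples, 1))       # (75555, 2)

# Stacked design: n = 1095 x 69 = 75555 points in dimension d = 3 + 2 = 5.
X_stacked = np.hstack([X_rep, coords_rep])         # (75555, 5)

# Row-major flattening matches the ordering above:
# entry i * 69 + j is data point i at location j.
y_stacked = Y.reshape(-1)                          # (75555,)
```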
STK could *in principle* help you with that, provided that you can come up with an appropriate space-time covariance function. (There is a very rich literature about that, but an anisotropic Matérn covariance function, as already provided in STK, could be used for a start.)
Unfortunately, the current state of implementation of STK cannot handle such large datasets (n = 75555).
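To give a rough sense of scale, the following Python sketch estimates the memory needed for a single dense n x n covariance matrix in double precision. That exact GP inference works with such a dense matrix is the usual textbook setting; how STK stores its matrices internally is an assumption here.

```python
# Size of the stacked dataset from the discussion above.
n = 1095 * 69

# One dense n x n covariance matrix with 8-byte doubles
# (assumption: no sparsity or low-rank structure is exploited).
bytes_per_double = 8
cov_bytes = n * n * bytes_per_double
cov_gib = cov_bytes / 2**30

print(f"n = {n}, dense covariance matrix: {cov_gib:.1f} GiB")
```

This counts only one copy of the matrix; factorizing it (e.g. by Cholesky decomposition, at a cost growing as n^3) adds to the burden.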
If you want to help us improve STK to deal with such large datasets (there are several possible approaches proposed in the literature), you're welcome to join the project!
@++
Julien