Hello Manu,
I was actually thinking of working on something along the same lines. Googling around, I see a few existing efforts in this direction:
While I have not used these personally, looking at the examples, it appears one needs to be quite familiar with MPI / parallel programming to make effective use of these packages. It would be nice if all of this could be made more transparent to the end user.
Many years ago, I actually worked with a team at MIT Lincoln Laboratory, designing this sort of thing for Matlab:
The idea was to hide as much of the parallelism as possible. This was done by using a special "dmatrix" class to represent your matrix, which was, in reality, spread across several servers. I don't know if this is still an active project there (can any of my old Lincoln friends comment? :-) ).
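To make the idea a bit more concrete, here is a toy sketch in Python. It is not the Lincoln code and not an existing Octave or Matlab API; the names DMatrix, owner and matvec are made up for illustration, and the "servers" are just lists held in one process. It only shows how a class can split rows into blocks behind the scenes while the user indexes and multiplies as if the matrix were local.

```python
class DMatrix:
    """Toy row-partitioned matrix (hypothetical). Each 'node' is a local list of rows."""

    def __init__(self, rows, n_nodes):
        if n_nodes < 1:
            raise ValueError("n_nodes must be at least 1")
        n = len(rows)
        # Block distribution: node k owns global rows [bounds[k], bounds[k + 1]).
        self.bounds = [k * n // n_nodes for k in range(n_nodes + 1)]
        self.blocks = [rows[self.bounds[k]:self.bounds[k + 1]]
                       for k in range(n_nodes)]

    def owner(self, i):
        """Return the index of the node that holds global row i."""
        for k in range(len(self.blocks)):
            if self.bounds[k] <= i < self.bounds[k + 1]:
                return k
        raise IndexError(i)

    def __getitem__(self, idx):
        # The user writes A[i, j]; the class works out which block to ask.
        i, j = idx
        k = self.owner(i)
        return self.blocks[k][i - self.bounds[k]][j]

    def matvec(self, x):
        """Multiply by vector x: each node handles its own rows, results are joined in order."""
        out = []
        # In a real system each iteration would run on a different server.
        for block in self.blocks:
            out.extend(sum(a * b for a, b in zip(row, x)) for row in block)
        return out


A = DMatrix([[1, 2], [3, 4], [5, 6]], n_nodes=2)
print(A.owner(2))         # 1
print(A[2, 1])            # 6
print(A.matvec([1, 1]))   # [3, 7, 11]
```

The point of the design is that the calling code never mentions nodes at all; only the class knows how the rows are laid out.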
Of course, technology has moved on since then, and there is probably a lot of new technology we can leverage to make something similar that is cleaner/faster/smarter (i.e. more awesome). For example, the metadata recording where the data was stored lived in a static file back in those days. I reckon we can just query a distributed file system to get this same information these days, along with all the benefits of replication / performance / resilience.
All this being said, perhaps someone more familiar with the source code could comment on how easy / impossible such a thing would be given the current state of Octave? In terms of your direct question of how you can build a compiler for Octave, I believe a good place to start is to look at the lex/yacc grammar files:
octave/libinterp/parse-tree/lex.ll
octave/libinterp/parse-tree/oct-parse.yy
Regards,
Bipin