users-prolog
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Reading a file for lexing, one way to do it...


From: Daniel Diaz
Subject: Re: Reading a file for lexing, one way to do it...
Date: Tue, 23 Jul 2013 09:06:11 +0400

Hello,

Le 15 juil. 2013 à 23:47, Sean Charles <address@hidden> a écrit :

OK, well I have produced this tiny little program which so far does what I wanted as a first step, but I am a little confused by my somewhat miraculous arrival at this point and I want to make sure that I understand what is going on.

I ran it with tracing enabled on a small file containing just "ABC\n" and it was reassuring to see it do exactly what I thought it would do in the order that I expected it to BUT even so I would appreciate a little reassurance that I am on the right lines with it.

Here is the code:

lexit(Filename, Tokens) :-
    open(Filename, read, In),
    lexread(In, Tokens),
    close(In).

lexread(In, _) :- at_end_of_stream(In), !.

lexread(In, [ chr(C, Line, Col) | Tokens ]) :-
    stream_line_column(In, Line, Col),
    get_char(In, C),
    lexread(In, Tokens).

Running it:

lexit('small.txt',T).

T = [chr('A',1,1),chr('B',1,2),chr('C',1,3),chr('\n',1,4)|_]

Remark: your list is ended by a variable (reported as an underscore by the top-level). This is totally OK (and often used to further add elements to the end of the list). If it is not the wanted behavior (you want a proper list ended by the empty list []), you have to replace the first lexread/2 clause by:

lexread(In, []) :- at_end_of_stream(In), !.


Somehow I managed to figure out that I could put the "chr()" term inside the head as I read somewhere on stack overflow recently that you could do that to save a step or something. See, I am already running on vague, that's my lack of Prolog experience showing already!

The "confusing" bit is this:

    lexread(In, [ chr(C, Line, Col) | Tokens ]) :-

I can see that "Tokens" remains uninstantiated until the end-of-file condition triggers, at which point the complete call stack is picked up but I am unsure of the reasoning as to why the list comes out in the correct order, I think. I am seeing in my head a whole bund go .() "conses" all waiting to go ff one after the other.

Then this line:

    stream_line_column(In, Line, Col),

instantiates Line and Col thus the term cur(C,Line,Col) is now fully instantiated and then when the tail call to lexread() is made, a new temporary variable is created for Tokens because it is still uninstantiated. This continues until EOF at which point the stack frame is unwound and the list is constructed but why does it appear to be "right" i.e. the tokens read left to right in the same order as the characters in the file. I think I know but I am still al title shaky at this point!



Yes, this is the way to write it in Prolog. It is totally equivalent to:

% equivalent version
lexread(In, ListTok) :-
    stream_line_column(In, Line, Col),
    get_char(In, C),
    lexread(In, Tokens),
    ListTok = [ chr(C, Line, Col) | Tokens ].

which could be more in the spirit of C since the list seems created only when both the head and the tail are known. However it is not the case: the Prolog = is really an equation (over syntactic trees) and not an assignment as in C. So, ListTok = [ chr(C, Line, Col) | Tokens ] is the same as replacing all occurrences of ListTok by the right hand side [ chr(C, Line, Col) | Tokens ] (this is true as long as cuts are not used). From the implementation point of view, when [ chr(C, Line, Col) | Tokens ] is encountered in the head, the calling argument is linked (with a pointer) with a cons cell whose both the head and the tail are fresh variables which will be in turn linked (with pointers) to the created args. NB: a free variables is nothing else than an self-reference (a pointer to itself).


I have used Haskell for a few years now and on a memory consumption perspective, I have a hunch this method is very very bad as it would be creating huge swathes of stack frames especially for a very very large file but I am still learning. I have no doubt that there is a cleaner way using DCG-s but for now this is where I am thinking on.

Don't worry, your code benefits from the so called Last Call Optimization (LCO) (aka tail call optimization): the stack frame is recovered before calling the last predicate of the body. This is important for recursive clauses like the second clause of lexread/2. With LCO, the stack is recovered before the recursive call, thus the whole predicate runs in constant stack space (I mean the control stack, also called "local stack" in the WAM ; the heap obviously grows to store the resulting list).

Note that, in the above "equivalent version" the recursion is no longer terminal, and last (or tail) call optimization (LCO) no longer applies (a new frame is allocated in the local stack at each recursive call). Yet an another reason to prefer your initial version.

Daniel



Thanks,
Sean.




--
Ce message a été vérifié par MailScanner pour des virus ou des polluriels et rien de suspect n'a été trouvé.
_______________________________________________
Users-prolog mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/users-prolog


--
Ce message a été vérifié par MailScanner pour des virus ou des polluriels et rien de suspect n'a été trouvé.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]