help-gnu-emacs

Re: avoid interpretation of \n, \t, ... in string


From: Pascal J. Bourguignon
Subject: Re: avoid interpretation of \n, \t, ... in string
Date: Wed, 28 Jan 2009 15:02:07 +0100
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/22.2 (gnu/linux)

Peter Tury <tury.peter@gmail.com> writes:

> Hi,
>
> Pascal J. Bourguignon wrote:
>
>> Switch to Common Lisp.  There's no reader macro in emacs lisp, so you
>> cannot do much about it.  In Common Lisp, you can trivially implement
>
> I think this will be a longer journey sometime in the future. CL is on
> my "todo" list for some time ;-)
>
>> Ok, another way to do it would be to store your paths in a file, and
>> to read it:
>>
>> (defun read-paths (file)
>>   (with-temp-buffer
>>     (insert-file-contents file)
>>     (delete "" (split-string (buffer-substring-no-properties
>>                               (point-min) (point-max))
>>                    "[\n\r]+"))))
>
> Great, thanks!
> I've checked it and found that in fact `buffer-substring-no-
> properties' does the trick here. So my original question can be
> reformulated now:
>
> ---> is there a way to get string (text) representation in a form as
> `buffer-substring-no-properties' do it, i.e. duplicating single `\'-s
> automatically (without(!) interpreting "pseudo-escape-sequences" (\n,
> \t, ...) in the original text)?

buffer-substring-no-properties doesn't do anything special to
back-slashes: it returns the buffer text as it is.  There is
absolutely no duplicating of any character.

Try to understand that there is only one character in the string "\\".

(length "\\") --> 1

  (insert (format "%s %S" "\\" "\\")) 

inserts:

  \ "\\"


The double backslash comes from the string quoting.
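
To check this yourself, here is a small sketch (the function name
my-backslash-demo and the sample path are only made up for the
illustration).  It compares a string literal with the same text taken
from a buffer, and counts characters:

```elisp
(defun my-backslash-demo ()
  "Hypothetical helper: compare a literal with text read from a buffer."
  (let* ((literal "C:\\new\\table")   ; denotes the 12 characters C:\new\table
         (from-buffer
          (with-temp-buffer
            (insert literal)          ; the buffer now holds C:\new\table
            (buffer-substring-no-properties (point-min) (point-max)))))
    (list (length literal)              ; 12, each back-slash counted once
          (string= literal from-buffer) ; t, nothing was duplicated
          (length "\n")                 ; 1, a newline character
          (length "\\n"))))             ; 2, a back-slash and the letter n

(my-backslash-demo) ; --> (12 t 1 2)
```

The string coming out of the buffer is the same as the one that went
in; the doubled back-slashes only exist in the source text of the
literal.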

Here are some characters:  abc'\"def

Now the problem is to quote these characters to be able to put them in
a program, as a string literal, so they aren't interpreted as code.
We do that by surrounding the characters with double-quotes:

                          "abc'\"def"

Oops!  That is broken because one of these characters is a
double-quote, so we'd interpret that as the string containing the
characters:
                           abc'\
followed by the symbol named:   def
and a stray double-quote           "

The problem here is that we need a way to escape the meaning of the
double-quote, so that it no longer closes the string literal.
The idea is to use an 'escape' character, back-slash.

                          "abc'\\"def"

Oops!  Still a problem here.  Since there is also a back-slash in the
string, it needs to be escaped too, otherwise it would be taken as
escaping the following character...

                          "abc'\\\"def"

Ok, so now we can tell that this is a string literal because of the
opening double-quote:     "
that contains the normal characters:
                           abc'                            *
then an escaped character prefixed by:
                               \
which is a back-slash character itself:
                                \                          *
then an escaped character prefixed by:
                                  \
which is a double-quote character itself:
                                   "                       *
followed by the normal characters:  def                    *
and closed by a double-quote:          "

So finally, this string literal only contains the characters:
                           abc'\"def
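
You can verify the count with a small sketch (my-literal-chars is a
name invented for this example).  It returns the length of the string
and the character codes it contains:

```elisp
(defun my-literal-chars ()
  "Hypothetical helper: length and character codes of the example literal."
  (let ((s "abc'\\\"def"))
    (list (length s)       ; 9 characters, not 13
          (append s nil)))) ; the string converted to a list of character codes

(my-literal-chars)
;; --> (9 (97 98 99 39 92 34 100 101 102))
;;        a  b  c  '  \  "  d   e   f
```

There is exactly one 92 (back-slash) and one 34 (double-quote) in the
list; the two escaping back-slashes of the literal are gone once the
reader has done its job.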


This algorithm for reading string literals is implemented by the emacs
lisp reader.  And of course, when you want to print (format) a string,
you can either output the characters contained in the string ((format
"%s" ...), princ), or output characters that will be read back as a
string literal, with double-quotes and escaping back-slashes ((format
"%S" ...), prin1, print).

(let ((string "abc with escape: \\ and with substring: \"abc\"."))
   (terpri)
   (princ "with princ: ") (princ string)
   (terpri)
   (princ "with prin1: ") (prin1 string)
   (terpri)
   (princ "with print: ") (print string)
   (terpri))

inserts:

with princ: abc with escape: \ and with substring: "abc".
with prin1: "abc with escape: \\ and with substring: \"abc\"."
with print: 
"abc with escape: \\ and with substring: \"abc\"."

returns: t  
   
The double-quotes and back-slashes are added by prin1 and print just
to allow reading back data that has been printed.
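
A minimal sketch of that round trip, assuming the default printer
settings (the function name my-print-read-round-trip is invented for
the example).  It prints a string with prin1-to-string, reads the
result back with read-from-string, and compares:

```elisp
(defun my-print-read-round-trip (string)
  "Hypothetical helper: print STRING readably, read it back, and compare."
  (let* ((printed (prin1-to-string string))          ; quotes and escapes added
         (reread (car (read-from-string printed))))  ; quotes and escapes removed
    (list (length string)
          (length printed)
          (string= string reread))))

(my-print-read-round-trip "abc'\\\"def") ; --> (9 13 t)
```

The printed form is 4 characters longer (two double-quotes and two
escaping back-slashes), but the string read back from it is equal to
the original one.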







The Common Lisp reader algorithm is more sophisticated: it allows for
hooks called reader macros, which let you implement your own string
reading algorithm.  For example, you could change the escaping
character, or not have any, and this would let you write strings
containing unescaped back-slashes.

We would have to change the function read1 in lread.c to add this
feature.  Unfortunately we cannot just redefine such a function in
emacs lisp, because all the code written in C is already linked to the
old function written in C, and wouldn't use our implementation in
emacs lisp.  We would have to modify the C sources (and have the patch
accepted by RMS).


-- 
__Pascal Bourguignon__

