bug-gnu-emacs
bug#15426: 24.3.50; Multibyte filenames and directory-files in unibyte b


From: Eli Zaretskii
Subject: bug#15426: 24.3.50; Multibyte filenames and directory-files in unibyte buffer
Date: Sat, 21 Sep 2013 09:48:50 +0300

> From: Andreas Politz <politza@hochschule-trier.de>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,  15426@debbugs.gnu.org
> Date: Fri, 20 Sep 2013 22:56:22 +0200
> 
> (let ((d "/tmp/\303\204")) ;; utf-8 for german umlaut "A 

This makes d a unibyte string:

  (setq d "/tmp/\303\204")
    => "/tmp/\303\204"

  (multibyte-string-p d)
    => nil

Why would one do such a thing in the first place?  Are any of the file
names involved in your real-life use case unibyte strings that include
bytes above 127?  If there are, I suggest finding out how they came
into existence -- that might be the source of your trouble.

Handling of unibyte strings in Emacs is optimized for certain use
cases, certainly not those that manipulate file names on the Lisp
level.  I suggest staying away from unibyte strings as non-ASCII file
names unless you really must use them (which normally is only
necessary if you need to encode and decode file names by hand, like
when you get them from some program, and the encoding of process
output is different from the encoding of file names on your system).
Otherwise, Lisp code should only ever manipulate non-ASCII file names
as multibyte strings.
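
For the "by hand" case, a minimal sketch of turning raw bytes into a
multibyte file name could look like this.  The helper name
`my-decode-file-name' is hypothetical, and utf-8 is only an assumption
about how the bytes were encoded; substitute whatever encoding the
producing program really uses.

```elisp
;; Hypothetical helper, not part of Emacs: decode raw file-name bytes.
(defun my-decode-file-name (bytes coding)
  "Decode the unibyte string BYTES using coding system CODING.
Return the decoded, multibyte string."
  (decode-coding-string bytes coding))

(let* ((raw "/tmp/\303\204")                      ; unibyte, two bytes above 127
       (name (my-decode-file-name raw 'utf-8)))   ; assumes the bytes are UTF-8
  (list (multibyte-string-p raw)     ; nil
        (multibyte-string-p name)    ; t
        (length raw)                 ; 7, counted in bytes
        (length name)))              ; 6, counted in characters
```

After decoding, the two bytes have become a single character, and the
result is the kind of string Lisp code should pass around.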

>   (when (file-exists-p d)
>     (delete-directory d t))
>   (make-directory d)
>   (append
>    (list (car (directory-files d t)) 
>          (file-exists-p (car (directory-files d t))))
>    ;; switch to a multibyte buffer
>    (with-temp-buffer
>      (list (car (directory-files d t))
>          (file-exists-p (car (directory-files d t)))))))
> --------------------8<-------------------------------------
> 
> If I save this somewhere (/tmp/foo.el), do
> 
> $ LC_ALL=C emacs -Q /tmp/foo.el
> 
> and evaluate it with C-x C-e, the minibuffer displays
> 
> => ("/tmp/\301\203\300\204/." nil "/tmp/\303\204/." t)

"The minibuffer displays" is the key point here: to display anything
in the minibuffer or echo area, Emacs first _inserts_ the textual
representation of that thing into a buffer, and then triggers
redisplay.  Insertion of unibyte strings into a multibyte buffer, or
insertion of multibyte strings into the minibuffer when the current
buffer is unibyte, causes all kinds of transformations on the inserted
string, whose purpose is to intuit what the user expects to see.  What
you see is the result of those transformations.  And yes, that result
can be baffling at times; that's why I suggest staying away from
unibyte strings as much as you can, certainly as long as those strings
are file names with non-ASCII characters.
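
As a rough illustration of one such conversion (a sketch only, it does
not reproduce the exact path the echo area takes), the following
inserts the unibyte string into a multibyte buffer and reads it back.
The helper `my-insert-roundtrip' is hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-insert-roundtrip (string)
  "Insert STRING into a fresh multibyte buffer and return the buffer text."
  (with-temp-buffer
    (set-buffer-multibyte t)          ; make sure the buffer is multibyte
    (insert string)
    (buffer-string)))

(let* ((d "/tmp/\303\204")
       (inserted (my-insert-roundtrip d)))
  (list (multibyte-string-p d)          ; nil: what went in was unibyte
        (multibyte-string-p inserted)   ; t: what came out is multibyte
        (length inserted)))             ; 7: each byte became one character
```

The text that comes back is no longer the same kind of object as the
string that went in, even though nothing in the code asked for a
conversion.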

Again, I suggest figuring out whether, and how, you got unibyte
strings as file names in your original use case.

> I hope that clarifies it.

Sorry, it does not.




