bug-gnu-emacs

bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS


From: Mattias Engdegård
Subject: bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS
Date: Fri, 21 Aug 2020 11:39:30 +0200

On 20 Aug 2020, at 21:13, Eli Zaretskii <eliz@gnu.org> wrote:

> I don't think I understand.  mode_line_noprop_buf gets the bytes, and
> then we call make_string on it, so the result is the same as the one
> you'd like to avoid.  Or am I missing something?
> 
> By "settling on multibyte representation", do you mean that we should
> convert raw bytes to their multibyte form?  Or do you mean something
> else?

No, I think we are talking about the same thing. Basically, it's about how the
bytes end up in mode_line_noprop_buf in the first place: currently, the
information about whether those bytes should be interpreted as unibyte or
multibyte is lost as soon as data from the component strings (the buffer name
for %b, the file name for %f, etc.) is added to the buffer. make_string then
tries to restore that information by looking at the bytes, and its guess is not
always accurate.

One way of fixing this is to make sure that the input strings (buffer name,
file name, frame-title-format, etc.) are always in multibyte form. Another
would be to convert to multibyte as those strings are used, presumably in
decode_mode_spec. You know this code a lot better than I do, but the former may
be slightly more workable (and efficient).
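
To make the second option concrete, here is a rough sketch in Python rather
than in Emacs C. It is only a model: Python's str stands in for a multibyte
string and bytes for a unibyte one, the names to_multibyte and build_title are
hypothetical and are not Emacs functions, and the "surrogateescape" error
handler is used as an analogue of keeping each raw byte as its own distinct
character. The point is just that converting each piece before it is appended
keeps the two kinds of input distinguishable in the result.

```python
def to_multibyte(piece):
    """Return piece as a str (the 'multibyte' stand-in in this model).

    A bytes (unibyte) piece keeps every non-ASCII byte as one escaped
    character instead of being reinterpreted as UTF-8.
    """
    if isinstance(piece, bytes):
        return piece.decode("ascii", errors="surrogateescape")
    return piece


def build_title(pieces):
    """Concatenate pieces after normalising each one (hypothetical helper)."""
    return "".join(to_multibyte(p) for p in pieces)


# The same two bytes, once as a unibyte piece and once as the character "å".
title_from_unibyte = build_title([b"\xc3\xa5", " - Emacs"])
title_from_multibyte = build_title(["å", " - Emacs"])

# The results differ: two escaped raw bytes versus one character.
print(len(title_from_unibyte), len(title_from_multibyte))
```

Because every piece is already in one representation when it is joined, nothing
has to be guessed afterwards.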

> Again, what would you like to have instead?  Would calling
> str_as_multibyte do what you want?

No, I don't think so -- once the unibyte/multibyte bit is lost, it can only be 
restored imperfectly if all we have is the sequence of bytes. In mathematical 
terms, the function that maps an arbitrary string object to its bytes has no 
inverse. (Consider the unibyte string "\xc3\xa5" -- should the bytes {c3, a5} 
be recreated as that unibyte string, or as the multibyte string "å"?)
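
The same argument as a small Python model, for illustration only. LispString,
to_bytes and guess_from_bytes are made-up names, and the guessing rule
("non-ASCII and valid UTF-8 means multibyte") is a simplified stand-in for what
a byte-inspecting constructor might do, not a transcription of make_string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LispString:
    """Toy model of a string object: its bytes plus the unibyte/multibyte bit."""
    data: bytes
    multibyte: bool


def to_bytes(s):
    """Map a string object to its bytes; the multibyte bit is dropped."""
    return s.data


def guess_from_bytes(data):
    """Rebuild a string object from bytes alone, using a heuristic (assumed)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return LispString(data, multibyte=False)
    # Valid UTF-8: call it multibyte only if it contains non-ASCII bytes.
    return LispString(data, multibyte=not data.isascii())


unibyte = LispString(b"\xc3\xa5", multibyte=False)
multibyte = LispString("å".encode("utf-8"), multibyte=True)

# Two different string objects share the same bytes, so to_bytes has no inverse.
same_bytes = to_bytes(unibyte) == to_bytes(multibyte)
# Whatever the heuristic picks, one of the two originals is restored wrongly.
round_trip = guess_from_bytes(to_bytes(unibyte))
print(same_bytes, round_trip == unibyte, round_trip == multibyte)
```

In this model the unibyte original comes back as a multibyte string; a
heuristic that guessed the other way would instead get the multibyte original
wrong.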

Again we are talking about trivialities here, but perhaps the same syndrome 
will arise in other contexts where it matters more. If we wrote Emacs from 
scratch we likely wouldn't have unibyte strings at all: they are only there for 
compatibility and various niche uses and performance hacks. I don't think it's 
unreasonable to start normalising strings to multibyte where it matters.

Thanks for your patience!





