emacs-devel

From: Eli Zaretskii
Subject: Re: master 49e243c0c85: Avoid resizing mutation in subst-char-in-string, take two
Date: Tue, 14 May 2024 09:06:54 +0300

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Mon, 13 May 2024 21:20:24 +0200
> Cc: emacs-devel@gnu.org
> 
> On 13 May 2024, at 19:53, Eli Zaretskii <eliz@gnu.org> wrote:
> > 
> >> +  (if (and (not inplace)
> >> +           (if (multibyte-string-p string)
> >> +               (> (max fromchar tochar) 127)
> >> +             (> tochar 255)))
> > 
> > Is the above condition correct?  My reading of it is that if INPLACE
> > is non-nil, we use aset (which will resize the string) even if TOCHAR
> > needs more bytes than FROMCHAR.  That seems to contradict the goal of
> > the change as advertised by the log message: "avoid resizing
> > mutation".
> 
> I agree that it does look a bit odd, but it's intentional. First of all, the 
> aim is to insulate non-mutating calls to the function from issues arising 
> from mutation in the implementation. If we don't have to mutate and it's 
> faster and/or safer not to, then we shouldn't.
> 
> Second, the function is documented to change the string in-place if INPLACE 
> is non-nil, so in that case we have no choice but to mutate, or we might 
> silently break reasonable code.

So I guess the log message doesn't describe this intent clearly
enough.
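
The two code paths described above can be illustrated with a minimal
sketch.  It is not the code from commit 49e243c0c85: the name
`my-subst-char-in-string` is hypothetical, the mutating branch is a
plain copy-then-`aset` loop, and the non-mutating branch is just one
plausible way of building a fresh string.

```elisp
;; Hypothetical sketch, not the code from commit 49e243c0c85.
(defun my-subst-char-in-string (fromchar tochar string &optional inplace)
  "Replace FROMCHAR with TOCHAR in STRING each time it occurs.
If INPLACE is non-nil, modify STRING itself (possibly resizing it);
otherwise return a new string and leave STRING untouched."
  (if (and (not inplace)
           (if (multibyte-string-p string)
               (> (max fromchar tochar) 127)
             (> tochar 255)))
      ;; Non-mutating path: the replacement might change the byte
      ;; length, so build a fresh string instead of calling `aset'.
      (apply #'string
             (mapcar (lambda (c) (if (eq c fromchar) tochar c)) string))
    ;; Mutating path: `aset' on STRING itself (INPLACE) or on a copy.
    ;; With INPLACE non-nil this may resize STRING, as documented.
    (let ((newstr (if inplace string (copy-sequence string)))
          (i (length string)))
      (while (> i 0)
        (setq i (1- i))
        (when (eq (aref newstr i) fromchar)
          (aset newstr i tochar)))
      newstr)))
```

With INPLACE nil, the `aset` branch is only reached when it cannot
resize the copy: both characters are ASCII in a multibyte string, or
TOCHAR fits in a single byte in a unibyte string.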

> > Why, in the case of a multibyte STRING, does the code look at the
> > codepoints of FROMCHAR and TOCHAR and not at the number of bytes they
> > take in the internal Emacs representation of the characters?
> 
> It's a conservative approximation that is much simpler than computing the 
> size of the internal representation. (It's also the condition proposed in 
> bug#70784.)

Which part of bug#70784 suggested that?  (It's a very long discussion,
and the suggestion at the beginning talks only about the unibyte
case.)

More to the point, the number of bytes a character takes in the
multibyte representation is fully determined by its codepoint, so I
don't really understand why you say it's "much simpler".  We could
have a primitive, say, char-bytes, to compute that even faster, if we
want this to be as efficient as possible.  This will allow a large
subset of calls (those where INPLACE is nil) to be much faster than
they are now, without resizing the string.  IOW, we will be able to
"avoid resizing mutation" in many more cases.
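
Because the byte count depends only on the codepoint, the char-bytes
primitive suggested above could be approximated in Lisp roughly as
follows.  `my-char-bytes` and `my-subst-may-resize-p` are hypothetical
names, not existing Emacs functions; the thresholds are assumed to
match the internal multibyte encoding (1 to 5 bytes per character,
with raw 8-bit bytes taking 2).

```elisp
;; Hypothetical helpers illustrating a byte-count-based condition.
(defun my-char-bytes (char)
  "Return the number of bytes CHAR takes in a multibyte string.
Hypothetical stand-in for a `char-bytes' primitive."
  (cond ((<= char #x7F) 1)        ; ASCII
        ((<= char #x7FF) 2)
        ((<= char #xFFFF) 3)
        ((<= char #x1FFFFF) 4)
        ((<= char #x3FFF7F) 5)
        (t 2)))                   ; raw 8-bit bytes #x3FFF80..#x3FFFFF

(defun my-subst-may-resize-p (string fromchar tochar)
  "Return non-nil if `aset' of TOCHAR over FROMCHAR could resize STRING."
  (if (multibyte-string-p string)
      ;; Resizing happens only if the two encodings differ in length.
      (/= (my-char-bytes fromchar) (my-char-bytes tochar))
    ;; A unibyte string is converted to multibyte by chars above 255.
    (> tochar 255)))
```

With this test, replacing one 2-byte character by another (say #xE9
by #xFC in a multibyte string) no longer forces the non-mutating path,
whereas the codepoint condition (> (max fromchar tochar) 127) does.
The same count is already available as (string-bytes (string char)),
at the cost of allocating a temporary string.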


