bug-gnu-emacs

bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos


From: handa
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 28 Mar 2021 23:29:41 +0900

In article <83pmzkog6x.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > How about something like this method:
> > 1. Encode the buffer text one line at a time until we get a byte
> > sequence longer than BYTE.
> > 2. Delete the result of encoding the last line above.
> > 3. Provided that the above last line has chars C1 C2 ... Cn,
> > encode characters C1...Cn, C1...Cn-1, C1...Cn-2 until we get a byte
> > sequence shorter than BYTE.
> > 
> > The first step may be optimized by encoding multiple lines instead of
> > a single line.
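
The three quoted steps can be sketched roughly as follows. This is an illustration in Python, not Emacs code: the function name `filepos_to_charpos_linewise`, the 0-based offsets, and the use of Python's incremental encoders (including their `getstate`/`setstate` methods) are all assumptions made for the example.

```python
import codecs


def filepos_to_charpos_linewise(text, byte_offset, encoding="utf-8"):
    """Return the 0-based index of the character whose encoded bytes
    cover BYTE_OFFSET, or len(text) if the offset is at or past the end.

    Hypothetical helper: encodes line by line, then shrinks the last line.
    """
    encoder = codecs.getincrementalencoder(encoding)()
    bytes_so_far = 0  # encoded length of all lines fully consumed so far
    chars_so_far = 0  # number of characters in those lines

    for line in text.splitlines(keepends=True):
        state = encoder.getstate()  # encoder state at the start of this line
        line_len = len(encoder.encode(line))

        # Step 1: stop at the first line that pushes us past the offset.
        if bytes_so_far + line_len > byte_offset:
            # Steps 2 and 3: discard this line's encoding and retry with
            # shorter and shorter prefixes C1...Cn-1, C1...Cn-2, ...
            for n in range(len(line) - 1, -1, -1):
                encoder.setstate(state)
                prefix_len = len(encoder.encode(line[:n]))
                if bytes_so_far + prefix_len <= byte_offset:
                    return chars_so_far + n

        bytes_so_far += line_len
        chars_so_far += len(line)

    return len(text)
```

For example, with the text "ab\né日\nz" encoded as UTF-8, byte offset 6 falls inside the three bytes of 日, and the sketch returns character index 4. The inner loop re-encodes a prefix of one line per iteration, so a very long line makes it quadratic in that line's length.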

> Even if we do optimize, this would be very slow, I think.

Whether it is too slow or not depends on what filepos-to-bufferpos is
used for.  Do you know why filepos-to-bufferpos (and
bufferpos-to-filepos) were introduced?

> And what if the buffer has no newlines?

In that case, just do step 2.  Or, we can use the bisection
technique.
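
A minimal sketch of the bisection idea, written in Python rather than Emacs Lisp and only as an illustration: the names `encoded_prefix_len` and `filepos_to_charpos_bisect` are hypothetical, offsets are assumed to be 0-based, and the sketch relies on the encoded length of a prefix never decreasing as the prefix grows.

```python
import codecs


def encoded_prefix_len(text, n, encoding):
    """Number of bytes produced by encoding the first N characters of TEXT.

    A fresh incremental encoder is used and not finalized, so no trailing
    reset sequence is requested for stateful encodings.
    """
    encoder = codecs.getincrementalencoder(encoding)()
    return len(encoder.encode(text[:n]))


def filepos_to_charpos_bisect(text, byte_offset, encoding="utf-8"):
    """Return the largest character count N such that the first N characters
    encode to at most BYTE_OFFSET bytes.  N is then the 0-based index of the
    character covering that byte (or len(text) at end of text).
    """
    lo, hi = 0, len(text)
    # Invariant: the first LO characters encode to <= byte_offset bytes.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if encoded_prefix_len(text, mid, encoding) <= byte_offset:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

This needs no newlines in the text, at the cost of re-encoding a prefix on each of the roughly log2(N) probes. With a stateful encoding such as ISO-2022-JP, a byte offset that lands inside an escape sequence is attributed to the character that follows the sequence, since the shorter prefix is the last one that still fits.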

> In any case, the problem is not with encoding, the problem is with
> decoding.  Encoding doesn't have this problem because we always encode
> more than enough (we use the value of BYTE as the count of
> _characters_ to encode, so for ISO-2022 encoding it is usually much
> more than needed).  By contrast, when decoding, we decode exactly
> BYTE+1 bytes, which then hits the problem if that offset is inside a
> shift sequence.

Then, that implementation should be changed.

Any coding system can have :post-read-conversion and
:pre-write-conversion functions, so it is not guaranteed that the
encoded byte length is greater than the number of characters.

---
K. Handa
handa@gnu.org




