bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
From: handa
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 28 Mar 2021 23:29:41 +0900
In article <83pmzkog6x.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > How about something like this method:
> > 1. Encode the buffer text line by line until we get a byte sequence
> > longer than BYTE.
> > 2. Delete the result of encoding the last line above.
> > 3. Provided that the above last line has chars C1 C2 ... Cn,
> > encode characters C1...Cn, C1...Cn-1, C1...Cn-2 until we get a shorter
> > byte sequence than BYTE.
> >
> > The first step may be optimized by encoding multiple lines instead of
> > a single line.
> Even if we do optimize, this would be very slow, I think.
Whether it is too slow or not depends on what filepos-to-bufferpos is
used for. Do you know why filepos-to-bufferpos (and
bufferpos-to-filepos) was introduced?
> And what if the buffer has no newlines?
In that case, just do step 2. Or we can use the bisection
technique.
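As a rough illustration of the bisection idea (in Python rather than Emacs
Lisp, and not a proposed patch), the search could look like the sketch
below. The function name and the 0-based character count are assumptions
made for the example. It also assumes that the encoded length never
shrinks as the prefix grows, which holds for UTF-8 but would need checking
for a coding system with conversion functions.

```python
def filepos_to_charpos(text: str, byte: int, encoding: str = "utf-8") -> int:
    """Return the largest n such that text[:n] encodes to at most `byte` bytes.

    Hypothetical helper: bisects on the prefix length instead of decoding
    a fixed number of bytes, so it never cuts a multibyte sequence.
    """
    lo, hi = 0, len(text)
    while lo < hi:
        # Round up so the loop always makes progress when lo + 1 == hi.
        mid = (lo + hi + 1) // 2
        if len(text[:mid].encode(encoding)) <= byte:
            lo = mid          # prefix still fits: the answer is at least mid
        else:
            hi = mid - 1      # prefix too long: the answer is below mid
    return lo


# "a" is 1 byte, U+00E9 is 2 bytes, U+65E5 is 3 bytes, "b" is 1 byte in UTF-8.
sample = "a\u00e9\u65e5b"
positions = [filepos_to_charpos(sample, b) for b in range(8)]
```

Each probe re-encodes a prefix, so the cost is about log2(N) encodings of
up to N characters, compared with a linear scan over lines.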
> In any case, the problem is not with encoding, the problem is with
> decoding. Encoding doesn't have this problem because we always encode
> more than enough (we use the value of BYTE as the count of
> _characters_ to encode, so for ISO-2022 encoding it is usually much
> more than needed). By contrast, when decoding, we decode exactly
> BYTE+1 bytes, which then hits the problem if that offset is inside a
> shift sequence.
Then, that implementation should be changed.
Any coding system can have :post-read-conversion and
:pre-write-conversion functions, so it is not guaranteed that the encoded
byte length is greater than the number of characters.
---
K. Handa
handa@gnu.org