bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
From: Eli Zaretskii
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 21 Mar 2021 17:27:45 +0200
> Date: Thu, 04 Mar 2021 21:21:24 +0000
> From: Gregory Heytings <gregory@heytings.org>
>
> (Disclaimer: I have no knowledge whatsoever about the ISO-2022-JP
> encoding, and although this looks like a bug, I'm not sure this is
> actually a bug; I report this at the suggestion of Eli in bug#46859.)
>
> I downloaded the file [1], and converted it to the ISO-2022-JP encoding
> with iconv -t iso-2022-jp one.txt > iso-2022-jp.txt. The resulting file
> is attached to this bug report. It ends with two CRLFs, at byte offsets
> 2993 and 2995. However, after emacs -Q iso-2022-jp.txt, with M-:
> (goto-char (filepos-to-bufferpos POS 'exact)) we get:
>
> POS = 2991, 2992: last but one visible character (HIRAGANA LETTER RU)
> POS = 2993, 2994: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2995, 2996: first CRLF
> POS = 2997: second CRLF
> POS = 2998: point-max
> POS = 2999: first CRLF
> POS = 3000, 3001: second CRLF
> POS >= 3002: point-max
>
> I would have expected:
>
> POS = 2989, 2990: last but one visible character (HIRAGANA LETTER RU)
> POS = 2991, 2992: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2993, 2994: first CRLF
> POS = 2995, 2996: second CRLF
> POS >= 2997: point-max
>
> The opposite operation M-: (bufferpos-to-filepos (- (point) POS) 'exact)
> apparently also has bugs; its return values are not coherent with the
> above ones:
>
> POS = 0: 3003
> POS = 1: 3001
> POS = 2: 2999
> POS = 3 (IDEOGRAPHIC FULL STOP): 2997
> POS = 4 (HIRAGANA LETTER RU): 2995
>
> I would have expected:
>
> POS = 0: 2997
> POS = 1: 2995
> POS = 2: 2993
> POS = 3 (IDEOGRAPHIC FULL STOP): 2991
> POS = 4 (HIRAGANA LETTER RU): 2989
>
> [1]
> https://darza.com/ecbackend/vendor/symfony/mime/Tests/Fixtures/samples/charsets/iso-2022-jp/one.txt
There's something strange going on here with encoding of the buffer
using iso-2022-jp-dos: near the end of the encoded bytestream, between
the encoded HIRAGANA LETTER KO (こ) and HIRAGANA LETTER TO (と), we
get 6 extra bytes: "ESC ( B ESC $ B". AFAIU, this sequence means
"switch to ASCII and then switch back to Japanese". So together these 6
bytes are a no-op as far as their effect on the text is concerned, but
they disrupt the logic of filepos-to-bufferpos because they introduce
extra bytes that aren't there in the original file.
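To illustrate why such a pair of escape sequences changes byte offsets without changing the text, here is a minimal sketch. The function name my-iso2022-noop-demo is hypothetical, and the byte strings are built by hand for a two-character sample; they are not taken from the attached file.

```elisp
;; Hypothetical helper; the byte strings are hand-built for illustration
;; and are not taken from the attached file.
(defun my-iso2022-noop-demo ()
  "Return (DECODED-CLEAN DECODED-NOISY EXTRA-BYTES) for a two-character sample."
  (let* ((to-jis "\e$B")     ; ESC $ B: designate JIS X 0208
         (to-ascii "\e(B")   ; ESC ( B: designate ASCII
         ;; In JIS X 0208, こ is the byte pair "$3" and と is "$H".
         (clean (concat to-jis "$3" "$H" to-ascii))
         ;; Same text, with "ESC ( B ESC $ B" between the two characters.
         (noisy (concat to-jis "$3" to-ascii to-jis "$H" to-ascii)))
    (list (decode-coding-string clean 'iso-2022-jp)
          (decode-coding-string noisy 'iso-2022-jp)
          (- (length noisy) (length clean)))))
```

Evaluating (my-iso2022-noop-demo) should return the same decoded string twice, followed by the number of extra bytes in the second encoding, which is 6 by construction.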
Kenichi, why are these 6 bytes inserted by encode-coding-region, but
not when we encode the same text as part of saving the buffer to its
file? And why does it happen near the end of the text, between those
2 particular letters?
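One way to check whether re-encoding reproduces the file byte for byte is to compare the length of the re-encoded text with the file's size on disk. The sketch below assumes the file can be decoded with the coding system passed in; the function name my-reencoded-vs-file-size is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-reencoded-vs-file-size (file coding)
  "Return (REENCODED-BYTES FILE-BYTES) for FILE decoded and re-encoded with CODING."
  (with-temp-buffer
    ;; Decode FILE with CODING rather than letting Emacs guess.
    (let ((coding-system-for-read coding))
      (insert-file-contents file))
    (list
     ;; With DESTINATION t, `encode-coding-region' returns the encoded
     ;; text as a unibyte string and leaves the buffer unchanged.
     (length (encode-coding-region (point-min) (point-max) coding t))
     (file-attribute-size (file-attributes file)))))
```

For example, (my-reencoded-vs-file-size "iso-2022-jp.txt" 'iso-2022-jp-dos) would return the two byte counts for the attached file; if they differ, the re-encoded stream is not identical to the file, and byte offsets computed from it cannot be trusted.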