Re: docs for insert-file-contents use 'bytes'
From: Ted Zlatanov
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Mon, 29 Sep 2008 16:04:13 -0500
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.60 (gnu/linux)
On Mon, 29 Sep 2008 23:12:58 +0300 Eli Zaretskii <address@hidden> wrote:
>> From: Ted Zlatanov <address@hidden>
>> Date: Mon, 29 Sep 2008 14:58:17 -0500
>>
>> The docs for insert-file-contents say the range is in bytes, but that
>> function does decoding of the contents. Can it, therefore, read from an
>> undesirable position (e.g. the middle of a UTF-8 sequence)?
EZ> The range _is_ in bytes (you will see in fileio.c that Emacs uses
EZ> `lseek' to get to the required file positions). Yes, reading a part
EZ> of a multibyte sequence is a possibility.
>> How does Emacs handle that?
EZ> Like with any other random bytes, I think: it will produce eight-bit-*
EZ> characters in the buffer. IOW, you get garbled text.
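To make the failure mode concrete, here is a small sketch in Python rather than Emacs Lisp. It slices a UTF-8 byte string at arbitrary offsets, the way an lseek-based read would, and decodes the slice. Python's `surrogateescape` error handler keeps each stray byte as a lone surrogate (U+DC80..U+DCFF). That is only loosely analogous to the eight-bit characters Emacs produces, and the helper name `read_range` is hypothetical.

```python
def read_range(data: bytes, beg: int, end: int) -> str:
    """Decode data[beg:end] as UTF-8 without any character-boundary check.

    Hypothetical helper: mimics reading a byte range by offset.  Bytes that
    do not form a complete UTF-8 sequence are kept as lone surrogates via
    the 'surrogateescape' handler instead of raising an error.
    """
    return data[beg:end].decode("utf-8", errors="surrogateescape")


text = "naïve café"            # 'ï' and 'é' are two bytes each in UTF-8
data = text.encode("utf-8")    # 12 bytes in total

whole = read_range(data, 0, 12)   # 'naïve café'
tail = read_range(data, 3, 12)    # '\udcafve café': byte 3 is the 2nd byte of 'ï'
head = read_range(data, 0, 3)     # 'na\udcc3': stops after the 1st byte of 'ï'
```

Both partial reads leave an orphaned byte at the cut, which is the "garbled text" described above.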
This is not a safe operation mode with multibyte sequences; is there a
way to DTRT? I'm specifically thinking about a paged buffer mode where
you only see a small portion of the file (for editing large files, as we
discussed in another newsgroup a while ago).
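One possible approach for a paged mode, sketched here under the assumption that the file is valid UTF-8: before reading a page, snap the requested byte offsets back to the start of the character they fall in. UTF-8 continuation bytes always match the bit pattern `10xxxxxx`, and a character has at most three of them, so this needs at most three steps back. A caller could adjust the BEG and END arguments of insert-file-contents this way, but that is an illustration only. The helpers `align_to_char_start` and `read_page` are hypothetical and are not part of Emacs.

```python
def align_to_char_start(data: bytes, pos: int) -> int:
    """Move pos back to the first byte of the UTF-8 character containing it.

    Assumes data is valid UTF-8.  Continuation bytes look like 0b10xxxxxx,
    and one character has at most 3 of them, so we step back at most 3 times.
    """
    pos = max(0, min(pos, len(data)))
    steps = 0
    while 0 < pos < len(data) and (data[pos] & 0xC0) == 0x80 and steps < 3:
        pos -= 1
        steps += 1
    return pos


def read_page(data: bytes, beg: int, end: int) -> str:
    """Decode a byte range after snapping both ends to character boundaries.

    Hypothetical paging helper: a character cut by END is left for the next
    page, and a character cut by BEG is included whole.
    """
    beg = align_to_char_start(data, beg)
    end = align_to_char_start(data, end)
    return data[beg:end].decode("utf-8")  # strict: valid UTF-8 assumed


data = "naïve café".encode("utf-8")
page = read_page(data, 3, 11)   # 'ïve caf': both cuts snapped back to a boundary
```

Snapping both ends backwards means consecutive pages split at the same boundary, so no character is lost or duplicated between pages.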
>> Either way the docs need to state the operation mode clearly.
EZ> Assuming I don't miss anything, and the above is indeed correct, what
EZ> would you like the doc string to say, exactly?
Maybe add:
"Warning: this is not safe with variable-length multibyte encodings such
as UTF-8, because it works by byte offset without encoding awareness, so
you may get garbled data. See ??? instead."
I don't know if this is the right wording, but this is an essential
operation, so the doc string should warn about this case, which is
common nowadays.
Ted