emacs-devel

Re: docs for insert-file-contents use 'bytes'


From: Ted Zlatanov
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Mon, 29 Sep 2008 16:04:13 -0500
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.60 (gnu/linux)

On Mon, 29 Sep 2008 23:12:58 +0300 Eli Zaretskii <address@hidden> wrote: 

>> From: Ted Zlatanov <address@hidden>
>> Date: Mon, 29 Sep 2008 14:58:17 -0500
>> 
>> The docs for insert-file-contents say the range is in bytes, but that
>> function does decoding of the contents.  Can it, therefore, read from an
>> undesirable position (e.g. the middle of a UTF-8 sequence)?

EZ> The range _is_ in bytes (you will see in fileio.c that Emacs uses
EZ> `lseek' to get to the required file positions).  Yes, reading a part
EZ> of a multibyte sequence is a possibility.

>> How does Emacs handle that?

EZ> Like with any other random bytes, I think: it will produce eight-bit-*
EZ> characters in the buffer.  IOW, you get garbled text.

This is not a safe operation mode with multibyte sequences; is there a
way to DTRT?  I'm specifically thinking about a paged buffer mode where
you only see a small portion of the file (for editing large files, as we
discussed in another newsgroup a while ago).
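
To make the question concrete, here is a minimal sketch of one possible approach for UTF-8 only, written in Python rather than Emacs Lisp, and not a description of what Emacs does. It assumes the file is valid UTF-8 and relies on the fact that UTF-8 continuation bytes always match the bit pattern 10xxxxxx, so a byte offset can be moved back to the start of the character it falls in. The names `is_continuation`, `align_to_char_start` and `read_page` are hypothetical.

```python
def is_continuation(byte: int) -> bool:
    """Return True for a UTF-8 continuation byte (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def align_to_char_start(f, offset: int) -> int:
    """Move OFFSET backward until it points at the first byte of a character.

    F must be a file opened in binary mode.  An offset at or past the
    end of the file is returned unchanged.
    """
    while offset > 0:
        f.seek(offset)
        b = f.read(1)
        if not b or not is_continuation(b[0]):
            break
        offset -= 1
    return offset


def read_page(path: str, start: int, end: int) -> str:
    """Read roughly bytes START..END of PATH without splitting a character.

    Both offsets are moved back to character boundaries, so consecutive
    pages that share a boundary offset neither overlap nor lose text.
    Assumes the file is valid UTF-8 (an assumption of this sketch).
    """
    with open(path, "rb") as f:
        start = align_to_char_start(f, start)
        end = align_to_char_start(f, end)
        f.seek(start)
        return f.read(end - start).decode("utf-8")
```

This only works because UTF-8 is self-synchronizing; a paged mode would need a different strategy for encodings where a character boundary cannot be recognized from the bytes around an arbitrary offset.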

>> Either way the docs need to state the operation mode clearly.

EZ> Assuming I don't miss anything, and the above is indeed correct, what
EZ> would you like the doc string to say, exactly?

Maybe add:

"Warning: this is not safe with variable-length multibyte encodings such
as UTF-8, because it works by byte offset without encoding awareness, so
you may get garbled data.  See ??? instead."

I don't know if this is the right wording, but it's a pretty essential
operation so it should give some warning about this common (nowadays)
case.

Ted




