Re: docs for insert-file-contents use 'bytes'
From: Ted Zlatanov
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Tue, 30 Sep 2008 08:48:28 -0500
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.60 (gnu/linux)
On Tue, 30 Sep 2008 10:19:26 +0300 Eli Zaretskii <address@hidden> wrote:
>> From: Ted Zlatanov <address@hidden>
>> Date: Mon, 29 Sep 2008 16:04:13 -0500
>>
>> This is not a safe operation mode with multibyte sequences; is there a
>> way to DTRT? I'm specifically thinking about a paged buffer mode where
>> you only see a small portion of the file (for editing large files, as we
>> discussed in another newsgroup a while ago).
EZ> How about this idea: read a bit more than you want, then find safe
EZ> place to end this page-full?
How do I find the next safe position in the byte flow?
>> I don't know if this is the right wording, but it's a pretty essential
>> operation so it should give some warning about this common (nowadays)
>> case.
EZ> Is it really a common case that insert-file-contents is used to read a
EZ> portion of a file? Where is this used?
I want to use it to implement a paged view of large files. We discussed
this in emacs-help and you suggested using insert-file-contents IIRC.
Anyhow, the point is that the docs don't mention this issue; let's fix
that first. I mention one possible way to implement this below.
On Tue, 30 Sep 2008 15:06:17 +0900 Miles Bader <address@hidden> wrote:
MB> Ted Zlatanov <address@hidden> writes:
EZ> Like with any other random bytes, I think: it will produce eight-bit-*
EZ> characters in the buffer. IOW, you get garbled text.
>>
>> This is not a safe operation mode with multibyte sequences; is there a
>> way to DTRT? I'm specifically thinking about a paged buffer mode where
>> you only see a small portion of the file (for editing large files, as we
>> discussed in another newsgroup a while ago).
MB> Why is it "not safe"?
Because the text will be corrupted if you seek in the middle of a
multibyte sequence, and there's no way to know in advance if a position
is safe without at least some scanning.
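To make the failure concrete, here is a minimal sketch in Python rather than Emacs Lisp. It assumes the file is UTF-8; the sample string and the helper name decode_from are invented for illustration. Python marks an undecodable byte with U+FFFD, whereas (per EZ above) Emacs would put eight-bit characters in the buffer, but the cause is the same: the read starts inside a multibyte sequence.

```python
# Illustration only: decoding UTF-8 from an arbitrary byte offset.
# Assumption: the data is UTF-8; 'decode_from' is a hypothetical helper.
text = "naïve café"
data = text.encode("utf-8")  # 'ï' occupies bytes 2-3, 'é' bytes 10-11


def decode_from(data: bytes, offset: int) -> str:
    """Decode everything from a raw byte offset onward.

    Bytes that cannot be decoded are replaced with U+FFFD so the
    damage is visible instead of raising an exception.
    """
    return data[offset:].decode("utf-8", errors="replace")


# Offset 2 is the first byte of 'ï': the text decodes cleanly.
clean = decode_from(data, 2)

# Offset 3 is the second byte of 'ï': the result starts with garbage.
garbled = decode_from(data, 3)
```

Nothing about the number 3 says it is a bad offset; only looking at the bytes around it reveals that.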
MB> How would you do things differently?
I don't know, I'm just saying the docs don't mention the possibility of
corrupted text. Can we fix that, if possible? The docs just need to
warn, not solve the issue.
MB> In conjunction with _file_ contents, a byte offset seems certainly the
MB> most natural thing. An "encoded character offset", for instance, would
MB> be far less efficient, much more complex to implement (and thus
MB> buggier), and harder to use in general.
Agreed. Still, encoding schemes like UTF-8 are so popular today that
the docs should at least warn about careless seeking to a byte offset.
There could be an insert-file-decoded-contents that seeks to a byte
position and gets the next character at or after that position. That's
not too hard to implement and it's fast.
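A rough sketch of that idea, in Python rather than Emacs Lisp and under a strong assumption: the file is valid UTF-8, where every continuation byte has the bit pattern 10xxxxxx, so a character start can be found by skipping at most three bytes. The names is_continuation and seek_to_char are hypothetical and are not part of any Emacs API; other multibyte encodings do not necessarily allow this kind of local resynchronization.

```python
import io


def is_continuation(byte: int) -> bool:
    """True if the byte is a UTF-8 continuation byte (10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def seek_to_char(f, offset: int) -> int:
    """Position binary file f at the first character start at or after offset.

    Assumes valid UTF-8. Returns the adjusted byte position, which
    equals the file size if offset falls inside the last character.
    """
    f.seek(offset)
    pos = offset
    while True:
        b = f.read(1)
        # Stop at end of file or at the first non-continuation byte.
        if not b or not is_continuation(b[0]):
            break
        pos += 1
    f.seek(pos)
    return pos


# Usage: an in-memory stand-in for a large file.
f = io.BytesIO("naïve café".encode("utf-8"))
start = seek_to_char(f, 3)      # offset 3 is inside 'ï'
rest = f.read().decode("utf-8")  # decodes cleanly from the adjusted position
```

The end of a page needs the same care: a fixed-size read can stop inside a character, so the reader has to trim back to the last complete character or, as EZ suggests, read a little extra and cut at a safe place.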
Ted