Re: decode-coding-string gone awry?


From: Stefan Monnier
Subject: Re: decode-coding-string gone awry?
Date: Fri, 18 Feb 2005 07:56:33 -0500
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

>> I think it should not be considered valid to decode a multibyte string,
>> whether or not the string happens to contain only ASCII (or
>> ASCII+eight-bit-*).

> But we allow decode-coding-region in a multibyte buffer,
> so it's strange not to allow something like this:
>   (decode-coding-string (buffer-substring FROM TO) CODING)

Maybe it's strange, but it would catch some bugs without restricting what
the user can do (since she can always pass the multibyte string through
encode-coding-string or one of the string-*-unibyte functions first).
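
A minimal sketch of that workaround, assuming a multibyte string that
holds only ASCII and eight-bit characters.  The function name and the
sample bytes are made up for illustration; string-to-unibyte is just one
of the possible conversions.

```elisp
(defun my-bytes-then-decode (string coding-system)
  "Convert multibyte STRING back to bytes, then decode with CODING-SYSTEM.
Hypothetical helper for illustration; not part of Emacs."
  ;; `string-to-unibyte' signals an error if STRING contains anything
  ;; other than ASCII or eight-bit characters, so a string that is
  ;; already decoded text is rejected instead of silently mangled.
  (decode-coding-string (string-to-unibyte string) coding-system))

;; The raw bytes C3 A9 (UTF-8 for U+00E9) held as eight-bit characters
;; in a multibyte string, as `buffer-substring' might return them.
(my-bytes-then-decode (string-to-multibyte "\303\251") 'utf-8)
```

The explicit conversion states the caller's intent: "these characters
are really bytes".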

>>> It's not trivial work to change the current code (in coding.c) to signal
>>> an error safely while doing a code conversion.

>> If by "safely" you mean "which will not break currently working code",
>> I agree.  If by "safely" you mean "which will not break properly written
>> code", I disagree.

> By "safely" I mean signaling an error only at a safe place,
> i.e., a place where we can do a global exit.  For
> instance, we can't signal an error in decode_coding_iso2022
> because it may be modifying buffer contents directly.

Oh, sorry, I misunderstood.  In my code, I signal the error at the very
beginning (in code_convert_string1), which I believe is safe.
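
A Lisp-level sketch of the same idea, not the actual C change: the
wrapper name below is hypothetical, and it only illustrates checking the
argument before any conversion has started.

```elisp
(defun my-strict-decode-coding-string (string coding-system)
  "Decode STRING with CODING-SYSTEM, rejecting multibyte input up front.
Hypothetical wrapper for illustration; not part of Emacs."
  ;; The check runs before any conversion work, so nothing has been
  ;; partially modified when the error is signaled.
  (when (multibyte-string-p string)
    (error "Cannot decode a multibyte string"))
  (decode-coding-string string coding-system))
```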

> By the way, what do you mean by "properly written code"?

I mean code which is written carefully with a good understanding of the
notion of encoding and decoding of coding-systems.  This basically boils
down to clearly distinguishing byte-sequences (aka not yet decoded strings),
typically stored in unibyte strings and buffers, and char-sequences (aka
already decoded strings), typically stored in multibyte strings and buffers.
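
As a small illustration of that distinction (the function name and the
sample bytes are made up for the example), the same text can be followed
through a decode/encode round trip:

```elisp
(defun my-round-trip-demo ()
  "Return facts about a byte-sequence and its decoded char-sequence.
Hypothetical demo function for illustration; not part of Emacs."
  (let* ((bytes "\303\251")                           ; not yet decoded: 2 bytes
         (chars (decode-coding-string bytes 'utf-8))  ; decoded: 1 character
         (back  (encode-coding-string chars 'utf-8))) ; bytes again
    (list (multibyte-string-p bytes) (length bytes)
          (multibyte-string-p chars) (length chars)
          (multibyte-string-p back)  (equal back bytes))))
```

Decoding goes from a unibyte string to a multibyte one, and encoding
goes back; neither step takes an already decoded string as input.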

Admittedly, in buffers the situation is less clear cut than in strings since
the (en|de)coding operations on buffers don't always operate on the whole
buffer at a time (contrary to string (en|de)coding), so we need to allow
decoding byte-sequences in multibyte buffers.
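
A sketch of the buffer case, assuming raw bytes arrive in the middle of
an otherwise already decoded multibyte buffer.  The function name and
the "header: " text are invented for the example.

```elisp
(defun my-decode-raw-bytes-in-buffer (bytes coding-system)
  "Insert unibyte BYTES into a multibyte buffer, decode them in place.
Return the resulting buffer text.  Hypothetical helper; not part of Emacs."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert "header: ")                    ; already decoded text
    (let ((start (point)))
      ;; The bytes live in the buffer as eight-bit characters.
      (insert (string-to-multibyte bytes))
      ;; Decode only the part of the buffer that still holds bytes.
      (decode-coding-region start (point) coding-system))
    (buffer-string)))
```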


        Stefan



