bug#38587: base64-decode-region breaks encoding
From: Juri Linkov
Subject: bug#38587: base64-decode-region breaks encoding
Date: Mon, 16 Dec 2019 00:40:55 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (x86_64-pc-linux-gnu)
>> Maybe an additional CODING arg for base64-decode-region?
>
> BASE64 is defined on a sequence of bytes. It doesn't make sense to
> apply it to characters.
But isn't UTF-8 a multibyte encoding represented by a sequence of bytes
(e.g. when saved to a file)?
Then why couldn't base64-encode-region use the buffer's coding system
to convert the region to a sequence of bytes?
Also, why does base64-encode-region accept the region's characters
only from the charsets ‘eight-bit-control’ and ‘eight-bit-graphic’,
but not other UTF-8 characters?
> The input of base64-encode-region needs to be encoded into bytes and the
> output of base64-decode-region needs to be decoded into characters. If
> you do that, you get a full reversible operation.
I guess base64-encode-region already encodes the region into bytes,
but only partially: it signals an error on some characters,
and I don't understand why it can't encode all of them.
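
To make the quoted point concrete, here is a minimal Ruby sketch of the reversible round trip: characters are encoded to bytes before Base64 encoding, and the decoded bytes are interpreted as characters afterwards. The helper name `base64_roundtrip` and its `coding` parameter are hypothetical, chosen only for illustration; this says nothing about how Emacs implements base64-encode-region.

```ruby
require 'base64'

# Hypothetical helper: round-trip TEXT through Base64, making the
# characters <-> bytes steps explicit on both sides.
def base64_roundtrip(text, coding = Encoding::UTF_8)
  bytes   = text.encode(coding).b            # characters -> bytes (ASCII-8BIT)
  encoded = Base64.strict_encode64(bytes)    # bytes -> Base64 text
  decoded = Base64.strict_decode64(encoded)  # Base64 text -> bytes (ASCII-8BIT)
  decoded.force_encoding(coding)             # bytes -> characters
end

snowman = base64_roundtrip("☃")
puts snowman
```

Base64 itself only ever sees bytes here; the coding system is applied before encoding and after decoding, which is what makes the operation reversible.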
>> Or it would be enough to use the coding system of the
>> output buffer?
>
> The coding system of the output buffer has nothing to do with the coding
> of the data produced by base64-decode-region, just like
> process-coding-system is independent from the coding system of the
> process buffer.
It's understandable that the coding system of the output buffer
is not necessarily the same as expected from the output of
base64-decode-region.
But is it still possible to tell base64-decode-region
about the expected output coding system? Maybe using
a prefix arg: C-u M-x base64-decode-region could ask
for a coding system, defaulting to the buffer's.
For example, in Ruby
    require 'base64'
    Base64.decode64(Base64.encode64("☃"))
    => "\xE2\x98\x83"
indeed outputs raw bytes (a string tagged ASCII-8BIT), not UTF-8 text.
But it's possible to force the encoding with:
    Base64.decode64(Base64.encode64("☃")).force_encoding('UTF-8')
    => "☃"
Is there an equivalent of force_encoding('UTF-8') in Emacs?
I tried calling this on the output of base64-decode-region:
    (decode-coding-region (point-min) (point-max) 'binary)
but it doesn't work, and neither does this:
    (encode-coding-region (point-min) (point-max) 'utf-8)
This doesn't work on the string output either:
    (decode-coding-string (base64-decode-string (base64-encode-string "ä"))
                          'utf-8)
Maybe I'm doing something wrong?
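
For comparison, the Ruby side distinguishes between relabelling decoded bytes and transcoding them, which may be the same distinction at play in the attempts above. The sketch below only demonstrates Ruby's behaviour; whether it maps exactly onto decode-coding-region and encode-coding-region is an assumption, and the variable names are illustrative.

```ruby
# Three raw bytes, as Base64.decode64 would return them (tagged ASCII-8BIT).
raw = "\xE2\x98\x83".b

# Relabelling: the bytes stay the same, only their interpretation changes.
relabelled = raw.dup.force_encoding(Encoding::UTF_8)

# Transcoding: Ruby tries to convert each byte as a binary "character"
# into UTF-8, which is undefined for bytes above 0x7F.
transcode_error =
  begin
    raw.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e.class
  end

puts relabelled
puts transcode_error
```

Only the relabelling step recovers the original character; the transcoding attempt raises instead of producing text.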