bug#38587: base64-decode-region breaks encoding
From: Eli Zaretskii
Subject: bug#38587: base64-decode-region breaks encoding
Date: Mon, 16 Dec 2019 17:58:29 +0200

> From: Juri Linkov <juri@linkov.net>
> Date: Mon, 16 Dec 2019 00:40:55 +0200
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, 38587@debbugs.gnu.org
>
> > BASE64 is defined on a sequence of bytes. It doesn't make sense to
> > apply it to characters.
>
> But isn't UTF-8 a multibyte encoding represented by a sequence of bytes
> (e.g. when saved to a file)?

When saved to a file, yes.

> Then why base64-encode-region couldn't use the buffer's coding
> to convert the region to a sequence of bytes?

Because it isn't guaranteed that the buffer's encoding is indeed the
right one for this job.

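
To illustrate why the choice matters, here is a minimal sketch: the same character produces different BASE64 text depending on which coding system turned it into bytes. The two coding systems below are picked only as examples.

```elisp
;; Same character, two coding systems, two different byte sequences.
(base64-encode-string (encode-coding-string "ä" 'utf-8))       ; ⇒ "w6Q="
(base64-encode-string (encode-coding-string "ä" 'iso-latin-1)) ; ⇒ "5A=="
```

A receiver who decodes the result has to know which of the two was used, and the buffer's own coding system need not match what the receiver expects.
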
> Also why base64-encode-region accepts region's characters
> only from the charsets ‘eight-bit-control’ and ‘eight-bit-graphic’,
> but not other UTF-8 characters?

Because it wants raw bytes, and only the eight-bit charsets fit that
condition. The eight-bit charset is the charset of raw bytes in a
multibyte buffer or string.

(base64-encode-region can also work on unibyte buffers and strings, in
which case the "charset" of such "text" has no meaning.)

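
The distinction can be sketched with strings as follows. The character "λ" is only an example of a non-ASCII character that is not a raw byte, and the exact error message is not shown because it may differ between Emacs versions.

```elisp
;; A unibyte string of two raw bytes (the UTF-8 encoding of "ä").
(base64-encode-string "\303\244")                       ; ⇒ "w6Q="

;; The same raw bytes held in a multibyte string belong to the
;; eight-bit charset, so they are accepted as well.
(base64-encode-string (string-to-multibyte "\303\244")) ; ⇒ "w6Q="

;; A multibyte string holding a real character is rejected:
;; the condition-case form returns the symbol of the signaled error
;; instead of a BASE64 string.
(condition-case err
    (base64-encode-string "λ")
  (error (car err)))
```
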
> > The input of base64-encode-region needs to be encoded into bytes and the
> > output of base64-decode-region needs to be decoded into characters. If
> > you do that, you get a full reversible operation.
>
> I guess base64-encode-region already encodes the region into bytes,
> but only partially - it signals an error on some characters,
> I don't understand why it can't encode all of them.

Once again, because it wants to process only raw bytes.

> But is it still possible to tell base64-decode-region
> about the expected output coding system? Maybe using
> a prefix arg: C-u M-x base64-decode-region could ask
> for a coding, defaulting to the buffer's coding.

If we want to make such a change, then "C-x RET c" is a better prefix
command, as it is consistent with other commands that accept
coding-system overrides.

> Is there an equivalent of force_encoding('UTF-8') in Emacs?
"C-x RET c utf-8 RET M-x SOME-COMMAND RET"
> Also this doesn't work on the string output:
>
> (decode-coding-string (base64-decode-string (base64-encode-string "ä"))
> 'utf-8)

It will work if you encode "ä" first:

  (decode-coding-string (base64-decode-string
                         (base64-encode-string
                          (encode-coding-string "ä" 'utf-8)))
                        'utf-8)
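
The same round trip can be wrapped in a pair of helpers. This is only a sketch: `my-base64-encode-text` and `my-base64-decode-text` are hypothetical names, not functions that exist in Emacs, and defaulting to utf-8 is an assumption made for the example.

```elisp
;; Hypothetical helpers, not part of Emacs.
(defun my-base64-encode-text (text &optional coding-system)
  "Encode TEXT to bytes with CODING-SYSTEM (default utf-8), then to BASE64."
  (base64-encode-string
   (encode-coding-string text (or coding-system 'utf-8))
   t))                                  ; t means: no line breaks in the output

(defun my-base64-decode-text (base64 &optional coding-system)
  "Decode BASE64 to bytes, then to characters with CODING-SYSTEM (default utf-8)."
  (decode-coding-string (base64-decode-string base64)
                        (or coding-system 'utf-8)))

(my-base64-decode-text (my-base64-encode-text "ä"))     ; ⇒ "ä"
```

Both sides must pass the same coding system for the operation to be reversible.
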