From: Andrew Janke
Subject: [Octave-bug-tracker] [bug #55452] fopen() does not support encoding argument
Date: Sat, 9 Mar 2019 13:40:56 -0500 (EST)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36

Follow-up Comment #17, bug #55452 (project octave):

> I was thinking that Shift-JIS was a multi-byte encoding (like UTF-16 for the
> BMP). But after a little bit of reading, it looks like it is a variable byte
> encoding like UTF-8 and encodes most of ASCII with the very same byte. So it
> might actually work without too much of a hassle.

Almost. Shift-JIS encodes 7-bit ASCII characters as the same single bytes. But
you also have to worry about the other direction: individual bytes of 2-byte
Shift-JIS sequences aliasing to ASCII characters. And there it could be a
problem: bytes 0x40–0x7E may appear as the second byte of a 2-byte Shift-JIS
sequence, and that range includes characters that appear in `printf()`
conversion specifications, including all the ASCII letters.
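
To make the aliasing concrete, here is a rough Python sketch (not Octave
code) that uses Python's standard `shift_jis` codec. The helper name
`ascii_aliasing_trail_bytes` is made up for illustration. It reports
every non-lead byte of a multibyte character that falls in the 7-bit ASCII
range:

```python
def ascii_aliasing_trail_bytes(text, encoding="shift_jis"):
    """Return (character, aliased ASCII character) pairs.

    A pair is reported when a character encodes to more than one byte and
    one of its non-lead bytes is below 0x80, i.e. looks like 7-bit ASCII
    to a parser that works byte by byte.
    """
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            for b in encoded[1:]:
                if b < 0x80:
                    hits.append((ch, chr(b)))
    return hits


# Katakana "ア" is the byte pair 0x83 0x41 in Shift-JIS; 0x41 is ASCII 'A'.
print(ascii_aliasing_trail_bytes("ア"))
# In UTF-8 every continuation byte is >= 0x80, so nothing is reported.
print(ascii_aliasing_trail_bytes("ア", encoding="utf-8"))
```

The contrast with UTF-8 is the point: there, a byte below 0x80 is always a
real ASCII character, which is not something a byte-wise parser can assume
for Shift-JIS.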

It's a little hard for me to puzzle out whether a valid Shift-JIS-encoded
printf format string could actually be misinterpreted because of this. No
Shift-JIS multibyte sequence contains a byte that aliases to '%' (0x25), and
valid printf conversion specifications do not allow non-ASCII characters
between the '%' and its following conversion character, which suggests a
byte-wise parser would only ever see genuine ASCII bytes inside a conversion
specification.

Regardless, I don't think we should be focusing on particular encoding cases
like this in finding a code approach: we should look for a general solution
that handles _any_ encoding without special-case logic.

(And when I say “ASCII” here I mean strict 7-bit ASCII.)

