From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #55195] "help" for core functions contains odd symbols for non-ASCII characters
Date: Mon, 10 Dec 2018 12:48:55 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0
URL: <https://savannah.gnu.org/bugs/?55195>
Summary: "help" for core functions contains odd symbols for
non-ASCII characters
Project: GNU Octave
Submitted by: mmuetzel
Submitted on: Mon 10 Dec 2018 05:48:53 PM UTC
Category: Interpreter
Severity: 4 - Important
Priority: 5 - Normal
Item Group: Regression
Status: None
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: dev
Operating System: Any
_______________________________________________________
Details:
TL;DR:
Character encoding strikes again. Does the lexer keep track of whether .m
files are from core?
When Octave is configured to use an mfile_encoding other than UTF-8, the
help text of function files that are encoded in UTF-8 is displayed with odd
characters. On Windows, this happens with Octave's default settings. Other
systems aren't affected by default; they are only affected if the user
configures a different encoding.
E.g.: "help sym" displays a lot of scrambled characters. That is because that
file is encoded in UTF-8 but we assume it to be encoded in the configured
mfile_encoding. Converting it from SYSTEM (CP1252 in my case) to UTF-8 creates
these odd characters.
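To illustrate the double conversion (a minimal Python sketch, not Octave
code): interpreting UTF-8 bytes as CP1252 and re-encoding them to UTF-8
produces exactly this kind of mojibake.

```python
# Help text containing a non-ASCII character, stored as UTF-8 on disk.
utf8_bytes = "Mützel".encode("utf-8")   # b'M\xc3\xbctzel'

# Octave misinterprets these bytes as the configured mfile_encoding
# (CP1252 here) and converts the result to UTF-8:
garbled = utf8_bytes.decode("cp1252").encode("utf-8")

# The two UTF-8 bytes of "ü" became two separate CP1252 characters.
print(garbled.decode("utf-8"))  # prints "MÃ¼tzel"
```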
This is a regression. (Before, we didn't worry about encoding but had problems
handling string vectors from user functions or interacting with the file
system.)
That conversion is done in input.cc in function "file_reader::get_input".
Can we differentiate, at that point, between .m files from the core or from
packages (which are probably always UTF-8) on the one hand, and
user-created .m files (which could have any encoding) on the other? Does
the lexer keep track of this?
What about texinfo settings such as "@documentencoding UTF-8"? Should we parse
for them and do the conversion only conditionally?
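Conditionally parsing for that directive could look roughly like this
(a hypothetical helper, not part of Octave's sources):

```python
import re

def declared_encoding(mfile_text):
    """Return the encoding named by an @documentencoding Texinfo
    directive in the help comment, or None if none is declared.
    If it returns "UTF-8", the codepage conversion could be skipped."""
    m = re.search(r'@documentencoding\s+(\S+)', mfile_text)
    return m.group(1) if m else None

help_text = "## -*- texinfo -*-\n## @documentencoding UTF-8\n"
print(declared_encoding(help_text))  # prints "UTF-8"
```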
Should we skip the help text in the conversion completely? In that case, we
might have to move the conversion elsewhere (to the lexer?).
Alternatively, we could revert the conversion in "help.m" (only if we discover
an "@documentencoding" command?) for functions from the core or from
packages.
But text in strings in functions from core Octave or from packages is
probably encoded in UTF-8 as well (independent of the current
mfile_encoding). So we shouldn't convert functions from the core or from
packages at all, and should do the codepage conversion only on user
functions.
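One way to make that distinction would be by file location. A sketch of
the idea in Python, with hypothetical directories (the real trees would
come from Octave's configuration, e.g. OCTAVE_HOME and the pkg prefix):

```python
import os

# Hypothetical locations of core and package function files.
CORE_FCN_DIR = "/usr/share/octave/6.0.0/m"
PKG_FCN_DIR = os.path.expanduser("~/octave")

def is_core_or_package_file(path):
    """True if PATH lies under the core or package function trees,
    where we would assume UTF-8 and skip the codepage conversion;
    user files elsewhere would still go through mfile_encoding."""
    path = os.path.abspath(path)
    return any(path.startswith(d + os.sep)
               for d in (CORE_FCN_DIR, PKG_FCN_DIR))

print(is_core_or_package_file("/usr/share/octave/6.0.0/m/help.m"))
```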
This might also affect how we should open function files from core Octave (or
from packages) in the embedded editor.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?55195>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/