bug-texinfo
Re: Non-ASCII characters in @include search path


From: Patrice Dumas
Subject: Re: Non-ASCII characters in @include search path
Date: Mon, 21 Feb 2022 23:00:38 +0100

On Mon, Feb 21, 2022 at 08:46:56PM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 10:32:00PM +0100, Patrice Dumas wrote:
> > On Sun, Feb 20, 2022 at 05:27:51PM +0000, Gavin Smith wrote:
> > > If the error message became something like
> > > 
> > > "nœud « �sseul� » non référencé"
> > > 
> > > then encoding this to UTF-8 would break the parts which already were in
> > > UTF-8.
> > 
> > I just committed input decoding (command line, environment, translated
> > messages) and output messages encoding.  I left file names as is, but
> > prepared a customization variable for them.
> > 
> > Now the error message is:
> > 
> > testé.texi:8: warning: nœud « ésseulé » non référencé
> 
> One way of fixing this would be to store the filename separately along with
> the rest of the error message, and prepend the filename when it is output.
> I can try to implement this.

This does not seem easy, but it is probably doable.  It removes the
need to encode before calling the file-related functions for which Perl
wants bytes, but it requires finding every occurrence in the code where
a file name could be concatenated with strings coming from other command
line data, from customization files and variables, or from the Texinfo
document.
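
To make the idea concrete, here is a minimal sketch of keeping the file
name apart from the message text until output.  The record layout and
the format_error_line name are hypothetical, not what texi2any does: the
file name is assumed to stay as bytes, and only the message text is
encoded.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical error record: the file name stays as the bytes received
# from the command line or the file system, the message text is a
# character string.
sub format_error_line {
  my ($error, $output_encoding) = @_;
  # Only the message text is encoded; the file name bytes are untouched.
  my $message_bytes = encode($output_encoding, $error->{'text'});
  return $error->{'file_name'} . ':' . $error->{'line_nr'} . ': '
         . $message_bytes . "\n";
}

my $error = {
  'file_name' => "test\xC3\xA9.texi",   # UTF-8 bytes of "testé.texi"
  'line_nr'   => 8,
  'text'      => "warning: n\x{153}ud \x{AB} \x{E9}sseul\x{E9} \x{BB} "
                 . "non r\x{E9}f\x{E9}renc\x{E9}",
};

binmode(STDERR);   # raw bytes, no additional encoding layer
print STDERR format_error_line($error, 'UTF-8');
```

Since both parts are bytes when they are joined, nothing gets encoded
twice.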

There are probably also other file names that would need to be
encoded as bytes, or for which we should make sure that they already
are bytes, for example @image file names.

I think that your commit
e11835b62d8f3d43c608013d21683c72e9a54cc3 "@include file name encoding"
would still need to be modified to encode the file name to a specific
encoding instead of simply using utf8::encode, as the file name
encoding may not be UTF-8.  Using the locale encoding as the default
seems better to me, with a possibility to modify the value on the
command line; FILE_NAMES_ENCODING_NAME could be used for that.
To be checked, but it seems to me that the XS parser should also use
this information: the include file name string (and maybe other file
names) should be converted from UTF-8 to that encoding if that encoding
is different from UTF-8.
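
As a rough sketch of what I mean, assuming I18N::Langinfo is available
to get the locale codeset.  The helper names file_name_encoding and
encode_file_name are made up for the example, and the way the customized
value reaches the code is an assumption too.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: choose the encoding used for file name bytes.
# An explicit value (for example from FILE_NAMES_ENCODING_NAME) wins,
# otherwise fall back to the locale codeset, then to UTF-8.
sub file_name_encoding {
  my $customized = shift;
  return $customized
    if (defined($customized) and find_encoding($customized));
  my $codeset = langinfo(CODESET);
  return $codeset if ($codeset and find_encoding($codeset));
  return 'UTF-8';
}

# Encode a character string file name to bytes in the chosen encoding,
# instead of unconditionally calling utf8::encode.
sub encode_file_name {
  my ($file_name, $encoding) = @_;
  return encode($encoding, $file_name);
}

my $name = "test\x{E9}.texi";   # "testé.texi" as a character string
my $latin1_bytes = encode_file_name($name, file_name_encoding('ISO-8859-1'));
my $utf8_bytes   = encode_file_name($name, file_name_encoding('UTF-8'));
```

With ISO-8859-1 the é becomes the single byte 0xE9; with UTF-8 it
becomes the two bytes 0xC3 0xA9, which is all utf8::encode can produce.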

Also, we need to do something specific when the encoding used for
file name bytes is not the same as MESSAGE_OUTPUT_ENCODING_NAME:
either convert with Encode::from_to or maybe just warn.
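
Both options could look roughly like the following.  This is only a
sketch, the file_name_bytes_for_message helper is hypothetical, and the
two encoding names are assumed to be passed in by the caller.

```perl
use strict;
use warnings;
use Encode qw(from_to find_encoding);

# Hypothetical helper: return file name bytes suitable for a message
# output in $message_encoding, converting only if the encodings differ.
sub file_name_bytes_for_message {
  my ($bytes, $file_encoding, $message_encoding) = @_;
  my $from = find_encoding($file_encoding);
  my $to = find_encoding($message_encoding);
  if (!$from or !$to) {
    warn "unknown encoding, file name left as is\n";
    return $bytes;
  }
  # Compare canonical names, so that aliases such as latin1 and
  # ISO-8859-1 are not treated as different encodings.
  return $bytes if ($from->name() eq $to->name());
  my $converted = $bytes;   # from_to() modifies its first argument
  if (!defined(from_to($converted, $from->name(), $to->name()))) {
    warn "could not convert file name from $file_encoding"
         . " to $message_encoding\n";
    return $bytes;
  }
  return $converted;
}

my $for_message
  = file_name_bytes_for_message("test\xE9.texi", 'ISO-8859-1', 'UTF-8');
```

The converted bytes would only be used in the message; the original
bytes are still the ones to pass to the file-related functions.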

-- 
Pat


