Re: Non-ASCII characters in @include search path

bug-texinfo

From:	Gavin Smith
Subject:	Re: Non-ASCII characters in @include search path
Date:	Thu, 24 Feb 2022 19:40:21 +0000

On Thu, Feb 24, 2022 at 02:33:11PM +0100, Patrice Dumas wrote:
> It fixes the NonXS parser (I changed where it is done, so that it happens
> before locate_include_file, but kept your code), but not the XS
> parser.  In the XS parser, the @include file name is converted to utf-8
> upon reading.  If the file name is encoded in another encoding on the
> filesystem it won't be found (I tested, it is indeed the case).
> 
> To do something similar to the NonXS parser, one would need, maybe
> in Texinfo/XS/parsetexi/end_line.c in end_line_misc_line around line
> 1428, to convert text to the @documentencoding (unless that is utf-8
> or ascii) before calling fullpath = locate_include_file (text).

Done in 46732a3290.  I haven't tested this code very much, just by
running the test suite.
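
As an illustration only (this is not the code from that commit), the
conversion described above could be sketched in Python as follows.  The
function names, the fallback to UTF-8 when the name cannot be encoded, and
the Python `locate_include_file` stand-in are assumptions made for the
example.

```python
import codecs
import os


def encode_include_name(name, document_encoding=None):
    """Return the byte string to look up on disk for an @include argument.

    The name is held as UTF-8 text after reading; it is re-encoded to the
    document encoding unless that encoding is UTF-8 or ASCII.
    """
    encoding = "utf-8"
    if document_encoding:
        try:
            canonical = codecs.lookup(document_encoding).name
        except LookupError:
            canonical = "utf-8"  # unknown encoding: assumed fallback
        if canonical not in ("utf-8", "ascii"):
            encoding = canonical
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        # Name not representable in the document encoding: assumed fallback.
        return name.encode("utf-8")


def locate_include_file(name_bytes, include_dirs):
    """Hypothetical stand-in: return the first existing match, else None."""
    for directory in include_dirs:
        candidate = os.path.join(os.fsencode(directory), name_bytes)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With an ISO-8859-1 document, `encode_include_name("café.texi", "ISO-8859-1")`
gives the Latin-1 bytes, so a file stored under that name on the filesystem
can be found, while a UTF-8 or ASCII document leaves the UTF-8 bytes alone.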

> > In any case the cases we are dealing with are very rare here, but I just
> > don't see that the situation is very common where somebody works in
> > a non-UTF-8 locale, has all their filenames in this encoding, and
> > recodes any files they download from the Internet or extracted from a tar
> > file into that encoding.  I've no insight into what use case we would be
> > supporting by using the locale encoding to interpret any filenames.
> 
> It could also be the reverse, somebody works in a UTF-8 locale
> with a manual in an 8-bit encoding and recodes the file names to
> utf-8.

Good point.

> > It seems much more likely to me that somebody would be using a
> > non-UTF-8 locale for whatever reason, and would download Texinfo
> > files with UTF-8 names without recoding the names, and still
> > expect to be able to build them.  (Even if they can't type the
> > names in, it may get built with Makefile rules.)
> 
> To me both are possible.  Speaking for GNU/Linux, some years ago when
> there were still 8-bit locales, it would have been reasonable to
> assume that people would process differently encoded manuals and recode
> file names without changing the manual itself (either 8-bit encoded
> manuals in a utf8 locale or utf8 manuals in an 8-bit locale).  Today this
> is less likely to happen while your scenario is more likely to happen as
> all the manuals should be converted to utf-8, all the locales should be
> utf8 and more file names should be in utf8, even in 8-bit locales.
> 
> > Some filtering with a customization variable may be necessary for
> > unusual operating systems and/or filesystems.
> 
> Yes, I'll add that afterwards if you don't.  I think that it will need to be
> obeyed by the XS parser too, in the same way as the @include file names
> should be converted to the documentencoding from utf-8.

The customization variable could be the name of an encoding to convert
filenames to, or it could be an on/off variable to use the encoding
from the locale.  I guess that the latter would be sufficient.
I'm happy if you implement this, although I doubt it is urgent.
It should be off by default on all systems except MS-Windows.

I think it would be fairly simple to implement in the XS parser, if
it is done in the Perl code - it would just need to get the name
of the filename encoding.
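
A rough sketch of the on/off variant, in Python rather than Perl and purely
for illustration: the function and parameter names are hypothetical, and
falling back to the document encoding (or UTF-8) when the variable is off
is an assumption based on the discussion above.

```python
import locale
import sys


def default_use_locale_encoding(platform=sys.platform):
    """Assumed default: on only for MS-Windows, off everywhere else."""
    return platform == "win32"


def input_file_name_encoding(document_encoding,
                             use_locale_encoding=None,
                             locale_encoding=None):
    """Return the name of the encoding to convert file names to.

    use_locale_encoding plays the role of the on/off customization
    variable; None means "use the platform default".
    """
    if use_locale_encoding is None:
        use_locale_encoding = default_use_locale_encoding()
    if use_locale_encoding:
        # locale_encoding can be passed explicitly, e.g. for testing.
        return locale_encoding or locale.getpreferredencoding(False)
    return document_encoding or "utf-8"
```

The parser would then only need the returned encoding name to convert
@include file names before looking them up.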
