bug-texinfo
Re: Texinfo 7.0.93 pretest available


From: Bruno Haible
Subject: Re: Texinfo 7.0.93 pretest available
Date: Mon, 09 Oct 2023 19:18:25 +0200

Eli Zaretskii wrote:
> > From: Bruno Haible <bruno@clisp.org>
> > Cc: bug-texinfo@gnu.org
> > Date: Mon, 09 Oct 2023 18:15:05 +0200
> > 
> > Eli Zaretskii wrote:
> > > unless the locale's codeset is UTF-8, any character that is not
> > > printable _in_the_current_locale_ will return -1 from wcwidth.  I'm
> > > guessing that no one has ever tried to run the test suite in a
> > > non-UTF-8 locale before?
> > 
> > I just tried it now: On Linux (Ubuntu 22.04), in a de_DE.UTF-8 locale,

Oops, typo: What I tested was the de_DE.ISO-8859-1 locale:
$ export LC_ALL=de_DE.ISO-8859-1

> > texinfo 7.0.93 builds fine and all tests pass.

And likewise on FreeBSD 13.2 with
$ export LC_ALL=de_DE.ISO8859-1

> > This character is U+0237 LATIN SMALL LETTER DOTLESS J. It *should* be
> > recognized as having a width of 1 in all implementations of wcwidth.
> 
> But if U+0237 cannot be represented in the locale's codeset, its width
> can not be 1, because it cannot be printed.  This is my interpretation
> of the standard's language (emphasis mine):
> 
>   DESCRIPTION
> 
>       The wcwidth() function shall determine the number of column
>       positions required for the wide character wc. The application
>       shall ensure that the value of wc is a character representable
>       as a wchar_t, and is a wide-character code corresponding to a
>       valid character in the current locale.
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   RETURN VALUE
> 
>       The wcwidth() function shall either return 0 (if wc is a null
>       wide-character code), or return the number of column positions
>       to be occupied by the wide-character code wc, or return -1 (if
>       wc does not correspond to a printable wide-character code).
>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Since U+0237 is not printable in my locale (it isn't supported by the
> system codepage), the value -1 is correct.  Am I missing something?

True. But why don't we see the same test failure on glibc and on FreeBSD
systems, then, in a locale with ISO-8859-1 encoding?
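
One way to compare platforms is to ask each libc directly. The following is a minimal sketch, not code from Texinfo or its test suite: width_in_locale() is a hypothetical helper, and the -2 return value is a convention invented here so that "locale not installed" is not confused with the -1 that wcwidth() itself may return. It assumes wchar_t holds Unicode code points, as on glibc.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask the libc headers to declare wcwidth() */
#endif
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return wcwidth(wc) as seen under the locale
   named by locale_name, then restore the previous locale.
   Returns -2 (a convention of this sketch) if the locale cannot be
   selected or the current locale name cannot be saved. */
int width_in_locale(const char *locale_name, wchar_t wc)
{
    char saved[256];
    const char *current;
    int width;

    assert(locale_name != NULL);

    /* Copy the current locale name: later setlocale() calls may
       overwrite the string that this call returns. */
    current = setlocale(LC_ALL, NULL);
    if (current == NULL || strlen(current) >= sizeof saved)
        return -2;
    strcpy(saved, current);

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -2;              /* locale not available on this system */

    width = wcwidth(wc);        /* 0, a column count, or -1 */

    setlocale(LC_ALL, saved);   /* restore the caller's locale */
    return width;
}
```

Calling width_in_locale("de_DE.ISO-8859-1", 0x0237) on Linux and width_in_locale("de_DE.ISO8859-1", 0x0237) on FreeBSD would show what each implementation reports for U+0237; the result is deliberately not stated here, since that is the open question.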

> > This "simpler approximation" would not return a good result when wc
> > is a control character (such as CR, LF, TAB, or such). It is important
> > that the caller of wcwidth() or wcswidth() is able to recognize that
> > the string as a whole does not have a definite width.
> 
> It is still better than returning -1, don't you agree?

No, I don't agree. Returning -1 tells the caller "watch out, you cannot
assume anything about the printed outline of this string".
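
To illustrate the point, here is a small sketch of a caller that honors the -1. Both function names are hypothetical and nothing here is taken from Texinfo; strict_width() follows the same contract as wcswidth() (any character without a definite width makes the whole result -1), and padding_needed() is one possible caller policy.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask the libc headers to declare wcwidth() */
#endif
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: total number of columns occupied by s, or -1
   as soon as one character has no definite width (for example TAB,
   CR or LF), mirroring the contract of wcswidth(). */
int strict_width(const wchar_t *s)
{
    int total = 0;

    assert(s != NULL);
    for (; *s != L'\0'; s++) {
        int w = wcwidth(*s);
        if (w < 0)
            return -1;      /* propagate "width unknown" to the caller */
        total += w;
    }
    return total;
}

/* Hypothetical caller policy: number of padding columns needed to
   fill a field of the given size, or -1 if the layout is unknown and
   the caller has to pick a fallback explicitly. */
int padding_needed(const wchar_t *s, int field)
{
    int w = strict_width(s);

    if (w < 0)
        return -1;
    return w < field ? field - w : 0;
}
```

With an approximation that never returns -1, padding_needed() would silently compute padding for a string containing a TAB or LF, although the printed result does not have a fixed number of columns.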

> But for some reason you completely ignored my more general comment
> about what Texinfo needs from wcwidth.

That's because I am not familiar with the Texinfo code. I don't know
whether and where Texinfo calls wcwidth(), and I don't know with which
expectations it does so.

Bruno





