Re: Texinfo 7.0.93 pretest available
From: Eli Zaretskii
Subject: Re: Texinfo 7.0.93 pretest available
Date: Mon, 09 Oct 2023 19:37:55 +0300
> From: Bruno Haible <bruno@clisp.org>
> Cc: bug-texinfo@gnu.org
> Date: Mon, 09 Oct 2023 18:15:05 +0200
>
> Eli Zaretskii wrote:
> > unless the locale's codeset is UTF-8, any character that is not
> > printable _in_the_current_locale_ will return -1 from wcwidth. I'm
> > guessing that no one has ever tried to run the test suite in a
> > non-UTF-8 locale before?
>
> I just tried it now: On Linux (Ubuntu 22.04), in a de_DE.UTF-8 locale,
> texinfo 7.0.93 builds fine and all tests pass.
de_DE.UTF-8 is a UTF-8 locale. I asked about non-UTF-8 locales. An
example would be de_DE.ISO8859-1. Or what am I missing?
> > Yes, quite a few characters return -1 from wcwidth, in particular the
> > ȷ character above (which explains the above difference).
>
> This character is U+0237 LATIN SMALL LETTER DOTLESS J. It *should* be
> recognized as having a width of 1 in all implementations of wcwidth.
But if U+0237 cannot be represented in the locale's codeset, its width
cannot be 1, because it cannot be printed. This is my interpretation
of the standard's language (emphasis mine):
  DESCRIPTION
    The wcwidth() function shall determine the number of column
    positions required for the wide character wc. The application
    shall ensure that the value of wc is a character representable
    as a wchar_t, and is a wide-character code corresponding to a
    valid character in the current locale.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  RETURN VALUE
    The wcwidth() function shall either return 0 (if wc is a null
    wide-character code), or return the number of column positions
    to be occupied by the wide-character code wc, or return -1 (if
    wc does not correspond to a printable wide-character code).
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since U+0237 is not printable in my locale (it isn't supported by the
system codepage), the value -1 is correct. Am I missing something?
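To make the locale dependence concrete, here is a minimal sketch built
on the iswprint-based fallback logic quoted in this thread. The names
fallback_wcwidth and width_in_locale are hypothetical, chosen only for
illustration; they are not Gnulib or Texinfo functions.

```c
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

/* Fallback logic quoted in this thread: the result depends on whether
   the current locale considers WC printable.  */
static int
fallback_wcwidth (wchar_t wc)
{
  return wc == 0 ? 0 : iswprint ((wint_t) wc) ? 1 : -1;
}

/* Hypothetical helper: switch LC_CTYPE to LOCALE_NAME and store the
   fallback width of WC in *WIDTH.  Returns 0 on success, or -1 if the
   locale is not available on this system.  */
static int
width_in_locale (const char *locale_name, wchar_t wc, int *width)
{
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;
  *width = fallback_wcwidth (wc);
  return 0;
}
```

Calling width_in_locale ("de_DE.ISO8859-1", (wchar_t) 0x0237, &w), on a
system where a locale is installed under that name, should show whether
iswprint accepts U+0237 there; I would expect -1 for the reasons given
above, but the result depends on the C library.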
> There's no reason for it to have a width of -1, since it's not a control
> character.
> There's no reason for it to have a width of 0, since it's not a combining
> mark or a non-spacing character.
> There's no reason for it to have a width of 2, since it's not a CJK character
> and not in a Unicode range with many CJK characters.
I think you assume that all the Unicode letter characters are always
printable in every locale. That's not what I understand, and iswprint
agrees with me, because I get -1 for U+0237 due to this code:
> > return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
> > I don't think the above logic in Gnulib's wcwidth (which basically
> > replicates the logic in any reasonable wcwidth implementation, so is
> > not specific to Gnulib) fits what Texinfo needs. Texinfo needs to be
> > able to produce output independently of the locale. What matters to
> > Texinfo is the encoding of the output document, not the locale's
> > codeset. So I think we should call uc_width when the output document
> > encoding is UTF-8 (which is the default, including in the above test),
> > regardless of the locale's codeset. Or we could use a simpler
> > approximation:
> >
> > return wc == 0 ? 0 : iswcntrl (wc) ? 0 : 1;
>
> This "simpler approximation" would not return a good result when wc
> is a control character (such as CR, LF, TAB, or such). It is important
> that the caller of wcwidth() or wcswidth() is able to recognize that
> the string as a whole does not have a definite width.
It is still better than returning -1, don't you agree?
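The trade-off can be sketched as follows. This is only an illustration:
approx_wcwidth is the approximation from my earlier message, and
sum_widths is a hypothetical wcswidth-style loop that gives up as soon
as a character has no definite width.

```c
#include <wchar.h>
#include <wctype.h>

/* The approximation proposed in the thread: control characters count
   as 0 columns, everything else as 1.  */
static int
approx_wcwidth (wchar_t wc)
{
  return wc == 0 ? 0 : iswcntrl ((wint_t) wc) ? 0 : 1;
}

/* Hypothetical wcswidth-style sum: returns -1 as soon as WIDTH_FN
   reports a character without a definite width.  */
static int
sum_widths (const wchar_t *s, int (*width_fn) (wchar_t))
{
  int total = 0;
  for (; *s != L'\0'; s++)
    {
      int w = width_fn (*s);
      if (w < 0)
        return -1;
      total += w;
    }
  return total;
}
```

With approx_wcwidth, a string such as L"a\tb" sums to 2 and the caller
is never told that it contains a TAB, which is the objection quoted
above. In exchange, a letter that the locale cannot represent no longer
turns the whole string's width into -1.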
But for some reason you completely ignored my more general comment
about what Texinfo needs from wcwidth.
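As a rough sketch of that more general point, the width could be chosen
from the output document encoding and consult the locale only as a last
resort. The function doc_char_width below is hypothetical, and its
UTF-8 branch is deliberately crude: a real implementation would call
something like uc_width there to handle wide and non-spacing characters
properly.

```c
#include <string.h>
#include <wctype.h>

/* Hypothetical helper, not part of Texinfo or Gnulib: approximate
   column width of code point CP for a document whose output encoding
   is OUTPUT_ENCODING.  */
static int
doc_char_width (unsigned long cp, const char *output_encoding)
{
  if (cp == 0)
    return 0;
  /* C0 and C1 control characters have no definite width.  */
  if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
    return -1;
  if (strcmp (output_encoding, "UTF-8") == 0)
    {
      /* Combining Diacritical Marks block: non-spacing.  */
      if (cp >= 0x0300 && cp <= 0x036F)
        return 0;
      /* Crude stand-in for a Unicode width table; the locale is
         not consulted at all on this path.  */
      return 1;
    }
  /* Other output encodings: fall back to the locale's opinion.  */
  return iswprint ((wint_t) cp) ? 1 : -1;
}
```

Under these assumptions U+0237 gets width 1 whenever the output
encoding is UTF-8, whatever the locale's codeset happens to be.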