Re: Texinfo 7.0.93 pretest available
From: Bruno Haible
Subject: Re: Texinfo 7.0.93 pretest available
Date: Mon, 09 Oct 2023 18:15:05 +0200
Eli Zaretskii wrote:
> unless the locale's codeset is UTF-8, any character that is not
> printable _in_the_current_locale_ will return -1 from wcwidth. I'm
> guessing that no one has ever tried to run the test suite in a
> non-UTF-8 locale before?
I just tried it now: On Linux (Ubuntu 22.04), in a de_DE.UTF-8 locale,
texinfo 7.0.93 builds fine and all tests pass.
> Yes, quite a few characters return -1 from wcwidth, in particular the
> ȷ character above (which explains the above difference).
This character is U+0237 LATIN SMALL LETTER DOTLESS J. It *should* be
recognized as having a width of 1 in all implementations of wcwidth.
There's no reason for it to have a width of -1, since it's not a control
character.
There's no reason for it to have a width of 0, since it's not a combining
mark or a non-spacing character.
There's no reason for it to have a width of 2, since it's not a CJK character
and not in a Unicode range with many CJK characters.
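The case analysis above can be sketched as a small classifier. This is only an illustration, not the code of glibc or Gnulib: the name sketch_width is hypothetical, and the ranges are deliberately coarse (one combining block, two wide blocks), whereas real implementations use full Unicode tables.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: a coarse width
   classifier over Unicode code points.  Real wcwidth/uc_width
   implementations consult much larger tables.  */
static int
sketch_width (uint32_t uc)
{
  if (uc == 0)
    return 0;
  /* C0 and C1 control characters (and DEL): no definite width.  */
  if (uc < 0x20 || (uc >= 0x7F && uc < 0xA0))
    return -1;
  /* Combining Diacritical Marks block only (assumption: a stand-in
     for the full set of non-spacing characters).  */
  if (uc >= 0x0300 && uc <= 0x036F)
    return 0;
  /* CJK Unified Ideographs and Hangul Syllables only (assumption:
     a stand-in for the full set of wide ranges).  */
  if ((uc >= 0x4E00 && uc <= 0x9FFF) || (uc >= 0xAC00 && uc <= 0xD7A3))
    return 2;
  /* Everything else, including U+0237, is an ordinary spacing
     character.  */
  return 1;
}
```

With this sketch, sketch_width (0x0237) falls through all three special cases and yields 1, which is the result argued for above.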
> /* Otherwise, fall back to the system's wcwidth function. */
> #if HAVE_WCWIDTH
> return wcwidth (wc);
> #else
> return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
> #endif
> }
> }
>
>
> I don't think the above logic in Gnulib's wcwidth (which basically
> replicates the logic in any reasonable wcwidth implementation, so is
> not specific to Gnulib) fits what Texinfo needs. Texinfo needs to be
> able to produce output independently of the locale. What matters to
> Texinfo is the encoding of the output document, not the locale's
> codeset. So I think we should call uc_width when the output document
> encoding is UTF-8 (which is the default, including in the above test),
> regardless of the locale's codeset. Or we could use a simpler
> approximation:
>
> return wc == 0 ? 0 : iswcntrl (wc) ? 0 : 1;
This "simpler approximation" would not return a good result when wc
is a control character (such as CR, LF, or TAB): it would report a
width of 0 instead of -1. It is important that the caller of wcwidth()
or wcswidth() is able to recognize that the string as a whole does not
have a definite width.
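To illustrate, here is a minimal sketch that sums per-character widths the way wcswidth() does, giving up with -1 as soon as one character has no definite width. The names approx_width, strict_width and sum_width are hypothetical; strict_width is only a toy stand-in for wcwidth() that treats every non-control character as width 1.

```c
#include <wchar.h>
#include <wctype.h>

/* The approximation quoted above: control characters count as 0.  */
static int
approx_width (wchar_t wc)
{
  return wc == 0 ? 0 : iswcntrl ((wint_t) wc) ? 0 : 1;
}

/* Hypothetical variant that reports control characters as -1,
   as wcwidth does.  Not a real width table.  */
static int
strict_width (wchar_t wc)
{
  return wc == 0 ? 0 : iswcntrl ((wint_t) wc) ? -1 : 1;
}

/* Sum the widths of the characters of S.  Return -1 if any
   character has no definite width, like wcswidth.  */
static int
sum_width (const wchar_t *s, int (*width) (wchar_t))
{
  int total = 0;
  for (; *s != L'\0'; s++)
    {
      int w = width (*s);
      if (w < 0)
        return -1;
      total += w;
    }
  return total;
}
```

For the string L"a\tb", sum_width with approx_width returns 2, silently ignoring the TAB, while sum_width with strict_width returns -1, so the caller can tell that the width depends on context (here, on the column at which the TAB occurs).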
Bruno