bug-texinfo

Re: Texinfo 7.0.93 pretest available


From: Eli Zaretskii
Subject: Re: Texinfo 7.0.93 pretest available
Date: Mon, 09 Oct 2023 17:06:39 +0300

> From: Gavin Smith <gavinsmith0123@gmail.com>
> Date: Sun, 8 Oct 2023 20:21:44 +0100
> Cc: bug-texinfo@gnu.org
> 
> Just comparing the first line in the hunk:
> 
> -(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ
> +(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ (ȷ)
> 
> the line you are getting is longer than the reference results.  
> 
> I wonder if for some of the non-ASCII characters wcwidth is returning 0 or
> -1 leading the line to be longer.

Yes, quite a few characters return -1 from wcwidth, in particular the
ȷ character above (which explains the above difference).

> It's also possible that other codepoints have inconsistent wcwidth results,
> especially for combining accents.
> 
> Do you know if it is the gnulib implementation of wcwidth that is being
> used or a MinGW one?

AFAIK, MinGW doesn't have wcwidth, so we are using the one from
Gnulib.  But what Gnulib does in this case is not what Texinfo
expects, I think:

int
wcwidth (wchar_t wc)
#undef wcwidth
{
  /* In UTF-8 locales, use a Unicode aware width function.  */
  if (is_locale_utf8_cached ())
    {
      /* We assume that in a UTF-8 locale, a wide character is the same as a
         Unicode character.  */
      return uc_width (wc, "UTF-8");
    }
  else
    {
      /* Otherwise, fall back to the system's wcwidth function.  */
#if HAVE_WCWIDTH
      return wcwidth (wc);
#else
      return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
#endif
    }
}

IOW, unless the locale's codeset is UTF-8, any character that is not
printable _in_the_current_locale_ will return -1 from wcwidth.  I'm
guessing that no one has ever tried to run the test suite in a
non-UTF-8 locale before?

I don't think the above logic in Gnulib's wcwidth (which basically
replicates the logic in any reasonable wcwidth implementation, so is
not specific to Gnulib) fits what Texinfo needs.  Texinfo needs to be
able to produce output independently of the locale.  What matters to
Texinfo is the encoding of the output document, not the locale's
codeset.  So I think we should call uc_width when the output document
encoding is UTF-8 (which is the default, including in the above test),
regardless of the locale's codeset.  Or we could use a simpler
approximation:

      return wc == 0 ? 0 : iswcntrl (wc) ? 0 : 1;
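For illustration, here is a minimal sketch of that approximation as a standalone function.  The name approx_wcwidth is made up for this example and is not part of Gnulib or Texinfo.  Note that iswcntrl itself still depends on the current locale, and that this sketch reports 1 for combining accents and for double-width characters, so it is only a rough stand-in for uc_width.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not an existing Gnulib or Texinfo function.
   Approximate the column width of WC without ever returning -1:
   the null character and control characters take 0 columns,
   everything else is assumed to take exactly 1 column.  */
int
approx_wcwidth (wchar_t wc)
{
  return wc == 0 ? 0 : iswcntrl (wc) ? 0 : 1;
}
```

With this, a character that is not printable in the current locale counts as one column instead of -1, which is what the comparison with the reference results above needs.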

CC'ing Bruno who I think knows much more about this.


