Re: Fix wrong character count in argp
From: Bruno Haible
Subject: Re: Fix wrong character count in argp
Date: Sun, 12 Feb 2012 22:59:06 +0100
User-agent: KMail/4.7.4 (Linux/3.1.0-1.2-desktop; KDE/4.7.4; x86_64; ; )
Hi Vladimir,
Thank you for the proposed patch.
> As already reported several years ago
I cannot find it in my archives. Maybe that discussion already contained
some useful thoughts or arguments? Can you please point me to it?
> argp counts bytes even when
> actually what matters is the display length. This patch improves the
> situation by counting only leading and standalone UTF-8 bytes. It
> doesn't handle the double-width characters like Chinese sinograms
A program that needs to consider display length, for example for
line wrapping, should
1) work with any locale encoding: don't assume that the locale encoding
   is UTF-8;
2) handle Chinese (double-width) ideographs correctly, just as it handles
   Russian (single-width) letters.
The easiest way to satisfy these two requirements is to base the code on
either
* the function mbswidth (gnulib module mbswidth), possibly together with
  mbiter or mbuiter, or
* the gnulib module unilbrk/ulc-width-linebreaks, which contains a complete
  line-breaking algorithm.
Can you rewrite your patch to this effect?
Also, such tricky issues should be checked in the test suite. Can you
please also provide a test program, some input data, and the expected
output for this data? We can then turn it into a gnulib test.
Bruno