bug-texinfo

Re: ignoring control characters in character width


From: Patrice Dumas
Subject: Re: ignoring control characters in character width
Date: Tue, 5 Sep 2023 21:16:47 +0200

On Tue, Sep 05, 2023 at 10:06:13PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 5 Sep 2023 20:19:40 +0200
> > From: Patrice Dumas <pertusus@free.fr>
> > Cc: bug-texinfo@gnu.org
> > 
> > On Tue, Sep 05, 2023 at 09:09:18PM +0300, Eli Zaretskii wrote:
> > > > Date: Tue, 5 Sep 2023 20:01:53 +0200
> > > > From: Patrice Dumas <pertusus@free.fr>
> > > > 
> > > > Currently, when counting the width of a line of characters, we count
> > > > control characters that are also spaces as having a width of 1.  I think
> > > > that this is not good: control characters either should not have a
> > > > width (end of line, form feed, carriage return) or have a width that
> > > > is not well defined (vertical and horizontal tab).  I suggest
> > > > considering all control characters as having a width of 0.  This will
> > > > be consistent with libunistring u8_strwidth, which I intend to use in C
> > > > code equivalent to the Perl code.
> > > 
> > > Please define "control characters" for this purpose.  Some of them are
> > > definitely not zero-width, for example, TAB.
> > 
> > Characters whose Unicode code points, in decimal, are in the range 0 to 31,
> > and also 127 (Delete).  This includes the horizontal tab.  It
> > corresponds to the [:cntrl:] character class.
> 
> Then I guess I still don't understand: how is TAB a zero-width
> character?

It is not, but for the purpose of counting character widths, treating it
as zero-width is as valid as any other choice.

> > > Also, depending on how control characters are displayed, their width
> > > could be even 4, for example if they are displayed as \nnn octal
> > > escapes.
> > 
> > It is in a context where they are displayed as encoded bytes.
> 
> So what is the context of this discussion, if it is not display of
> bytes?  I really don't understand, could you elaborate?
> 
> Control characters can also be displayed as ^C, for example, in which
> case they take 2 columns.

I think I see the source of the misunderstanding.  This is not about
displaying the characters, which texi2any does not really do.  It is
about situations where texi2any needs to count the width of characters:
for instance, to decide where to break lines when formatting Info by
comparing against the line width, to format multitable cells, or to
determine the length of the * underlining for a heading string, as in

Some heading
************

I hope that is clearer.  We also need to make this choice without
knowing precisely how the characters will be displayed.  For Info, the
display is generally done by Info readers, but plain text output could
just as well be shown in a pager or a text editor.
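
To make the proposal concrete, here is a minimal sketch in Python of the
kind of width counting discussed above.  It is not the texi2any code and
not libunistring's u8_strwidth; the function names char_width,
string_width and underline_heading are made up for illustration, and the
handling of combining and East Asian wide characters is a simplifying
assumption.  The only rule taken from this thread is that code points
0 to 31 and 127 count as width 0.

```python
import unicodedata


def char_width(ch):
    """Return an approximate column width for one character (hypothetical helper)."""
    cp = ord(ch)
    # Rule from the thread: code points 0-31 and 127 (Delete) count as width 0.
    if cp < 32 or cp == 127:
        return 0
    # Assumption: characters with a nonzero canonical combining class
    # take no column of their own.
    if unicodedata.combining(ch):
        return 0
    # Assumption: East Asian wide and fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(text):
    """Sum the per-character widths of a string."""
    return sum(char_width(ch) for ch in text)


def underline_heading(heading, underline_char="*"):
    """Return the heading followed by an underline of matching width."""
    return heading + "\n" + underline_char * string_width(heading)
```

With this rule, a tab embedded in a heading does not lengthen the
underline, and underline_heading("Some heading") gives the two lines
shown in the example above.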

-- 
Pat
