Re: [Chicken-hackers] [PATCH] Restore row and column counting for ports
From: John Cowan
Subject: Re: [Chicken-hackers] [PATCH] Restore row and column counting for ports (fixes #978)
Date: Sat, 16 Feb 2013 16:21:02 -0500
User-agent: Mutt/1.5.20 (2009-06-14)
Peter Bex scripsit:
> If it turns out to be true that this is a big bottleneck, it makes
> sense to leave the bookkeeping up to applications. Perhaps it could be
> convenient to add a custom "counted port" that can wrap other port
> types.
That's what I would recommend, rather than (a) carefully tracking in
all locations, with resultant overhead, or (b) tracking in only some
locations, resulting in bogus row and column values.
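
To make the wrapper idea concrete, here is a minimal sketch. It is written in Python rather than Chicken, and the class and method names (CountedReader, read_char) are hypothetical, not any existing port API. It assumes the wrapped stream has a read(n) method that returns text, and it counts one column per character, ignoring display width.

```python
import io


class CountedReader:
    """Hypothetical wrapper: tracks row and column for any text stream.

    Assumption: the wrapped stream exposes read(n) returning a str,
    with "" signalling end of input.
    """

    def __init__(self, stream):
        self._stream = stream
        self.row = 0      # number of newlines consumed so far
        self.column = 0   # characters consumed since the last newline

    def read_char(self):
        ch = self._stream.read(1)
        if ch == "":
            return ch     # end of input: counters stay unchanged
        if ch == "\n":
            self.row += 1
            self.column = 0
        else:
            self.column += 1
        return ch


reader = CountedReader(io.StringIO("ab\ncd"))
for _ in range(4):        # consume "a", "b", "\n", "c"
    reader.read_char()
print(reader.row, reader.column)
```

All the bookkeeping lives in the wrapper, so ports that are never wrapped pay nothing for it.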
> Our current implementation of column counting does not take into
> account multibyte UTF-8 characters. I'm not sure if we need to take
> care of this. An input file might not be in UTF-8 encoding, in which
> case "fixing" UTF-8 counting will cause the count to be incorrect on
> those files. We could just assume all ports are UTF-8 since that's the
> "native" Chicken character set. The only way to *truly* fix this is
> to make ports aware of their encoding, but that's just... annoying :)
What's more, column width becomes a tricky matter when you have to handle
the full repertoire of Unicode characters, for they can have widths of 0
(combining characters, non-printing characters), 1 (most characters),
or 2 (CJK ideographs, U+3000 IDEOGRAPHIC SPACE, and U+FF01 to U+FF5E,
the FULLWIDTH variants of the ASCII repertoire). That's not baggage
that should be carted around everywhere, but it would be fine in a
counted-port wrapper.
See UAX #11 <http://www.unicode.org/reports/tr11>, which is a detached
part of the Unicode Standard, for details.
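
As a rough illustration of the 0/1/2 width classes, the following Python sketch uses the standard unicodedata module. The function names are hypothetical, and the classification is a simplification rather than a full UAX #11 implementation: it assumes that combining marks, format characters and control characters have width 0, and that the East Asian Width classes W and F have width 2. It does not handle ambiguous-width characters or tab expansion.

```python
import unicodedata


def char_width(ch):
    """Approximate display width of one character (0, 1, or 2 columns)."""
    # Assumption: combining marks, format characters, and control
    # characters occupy no column of their own.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf", "Cc"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_width(text):
    """Sum of the approximate widths of all characters in text."""
    return sum(char_width(ch) for ch in text)


print(char_width("a"))          # ordinary ASCII letter
print(char_width("\u0301"))     # COMBINING ACUTE ACCENT
print(char_width("\u3000"))     # IDEOGRAPHIC SPACE
print(char_width("\uff01"))     # FULLWIDTH EXCLAMATION MARK
print(display_width("e\u0301")) # "e" followed by a combining accent
```

A counted-port wrapper could advance its column by char_width(ch) instead of 1, which keeps this cost out of the ordinary port code.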
--
So they play that [tune] on John Cowan
their fascist banjos, eh? address@hidden
--Great-Souled Sam http://www.ccil.org/~cowan