
Re: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc


From: Zack Weinberg
Subject: Re: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc
Date: Sun, 16 Jul 2006 13:49:14 -0700

On 7/14/06, Nathaniel Smith <address@hidden> wrote:
> +// ??? Ensure use of UTF8 encoding internally, validate encoding here.

^^ Hmm?

I have gotten lost in the conversions and the wrappers, and cannot
tell what encoding (if any) can be relied upon at this point in the
code.  The exclusion of bytes 00-1f and 7f, but none in the 80-ff
range, makes me think it's supposed to be utf8: it's clearly not a
fixed-width 16- or 32-bit encoding; if it were any single-byte
ISO 8859-n encoding, we should also exclude the C1 controls 80-9f;
and any other variable-width encoding that I know of requires rather
more smarts to find bad characters in.
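
For concreteness, the bytewise exclusion described above amounts to
something like the following.  This is only a sketch: the function
name is made up for illustration and is not what paths.cc actually
calls it.

```cpp
#include <string>

// Hypothetical helper (not monotone's actual code): returns true if s
// contains an ASCII control byte, i.e. anything in 00-1f, or 7f.
// Bytes 80-ff are deliberately let through, which is only safe if
// they are known to be parts of multi-byte utf8 sequences.
bool has_control_byte(std::string const & s)
{
  for (std::string::const_iterator i = s.begin(); i != s.end(); ++i)
    {
      unsigned char c = static_cast<unsigned char>(*i);
      if (c <= 0x1f || c == 0x7f)
        return true;
    }
  return false;
}
```

Note that a check like this says nothing about whether the high
bytes form valid sequences in any encoding.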

But if it _is_ guaranteed to be utf8 at this point, there are a number
of invalid byte sequences that we ought to be weeding out, unless we
have a guarantee from elsewhere that we're not going to get them:
notably ED A0 xx .. ED BF xx (encoded UTF-16 surrogates) and
overlength encodings like E0 9F 80.  I have code (from libcpp) that I
can adapt to do this.
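
To make the kind of check concrete, here is a minimal sketch of a
strict validator.  It is not the libcpp code, and the name
is_valid_utf8 is hypothetical; it just follows the well-formed byte
ranges from RFC 3629, rejecting surrogates, overlength forms,
truncated sequences, and anything above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from monotone or libcpp: returns true
// if s is well-formed utf8 in the RFC 3629 sense.
bool is_valid_utf8(std::string const & s)
{
  std::size_t i = 0;
  std::size_t const n = s.size();
  while (i < n)
    {
      unsigned char c = static_cast<unsigned char>(s[i]);
      std::size_t len;          // total bytes in this sequence
      unsigned char lo = 0x80;  // allowed range for the second byte
      unsigned char hi = 0xBF;

      if (c <= 0x7F) { ++i; continue; }            // plain ASCII
      else if (c >= 0xC2 && c <= 0xDF) len = 2;    // C0, C1 are overlength
      else if (c == 0xE0) { len = 3; lo = 0xA0; }  // E0 80..9F: overlength
      else if (c == 0xED) { len = 3; hi = 0x9F; }  // ED A0..BF: surrogates
      else if (c >= 0xE1 && c <= 0xEF) len = 3;
      else if (c == 0xF0) { len = 4; lo = 0x90; }  // F0 80..8F: overlength
      else if (c == 0xF4) { len = 4; hi = 0x8F; }  // beyond U+10FFFF
      else if (c >= 0xF1 && c <= 0xF3) len = 4;
      else return false;  // stray continuation byte, C0, C1, or F5..FF

      if (n - i < len)
        return false;     // sequence truncated at end of string

      unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
      if (c1 < lo || c1 > hi)
        return false;

      // Any remaining bytes must be ordinary continuation bytes.
      for (std::size_t k = 2; k < len; ++k)
        {
          unsigned char ck = static_cast<unsigned char>(s[i + k]);
          if (ck < 0x80 || ck > 0xBF)
            return false;
        }
      i += len;
    }
  return true;
}
```

With this, is_valid_utf8("\xED\xA0\x80") and
is_valid_utf8("\xE0\x9F\x80") both return false, while
is_valid_utf8("\xE0\xA0\x80") (U+0800, the smallest legitimate
three-byte form) returns true.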

zw



