Re: [Scm-discuss] building scm


From: Aubrey Jaffer
Subject: Re: [Scm-discuss] building scm
Date: Sun, 24 Oct 2010 12:07:39 -0400 (EDT)

 | Date: Sun, 24 Oct 2010 12:04:49 +0200
 | From: Philipp Klaus Krause <address@hidden>
 | 
 | On 23.10.2010 22:57, Aubrey Jaffer wrote:
 | > 
 | > Punycode can introduce a hyphen ("-") and can output a string
 | > with a leading digit (see the Hebrew and Korean examples in
 | > <http://tools.ietf.org/html/rfc3492>), both of which are not
 | > legal in C identifiers.  Are these the only pitfalls?
 | 
 | AFAIK, yes. However, I consider only the hyphen to be a real
 | pitfall: to ease the integration of Scheme and C code I would
 | suggest using a prefix for all hobbit-generated identifiers to
 | reduce the risk of name clashes. The problem of leading digits
 | would then go away automatically.

<http://tools.ietf.org/html/rfc5891> has more restrictions (appended).
Fixing these would require more and more incompatibilities with
Punycode.  Punycode's mixed-radix encoding of code points is
undecipherable without a calculator; to a human reader it might as
well be a hash.  For this application, Punycode's only benefit to
humans is that the ASCII characters are extracted and placed ahead
of the hash.  This can easily be implemented without the complexities
of Punycode.
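
To make the idea concrete, here is a minimal sketch in Python (not
hobbit's actual code).  The function name `mangle`, the "h_" prefix,
and the choice of a truncated SHA-1 digest are all assumptions made
for illustration.  It keeps the characters that are legal in a C
identifier, puts them ahead of a short hash of the full name, and
adds the prefix so that a leading digit is harmless.

```python
import hashlib
import string

# Characters that may appear in a C identifier (after the first position).
C_SAFE = set(string.ascii_letters + string.digits + "_")


def mangle(name: str, prefix: str = "h_") -> str:
    """Map a Scheme identifier to a C-safe one (hypothetical scheme).

    The C-safe ASCII characters are kept in order and placed ahead of
    a short hash of the whole name; the prefix guards against leading
    digits and clashes with hand-written C names.
    """
    kept = "".join(ch for ch in name if ch in C_SAFE)
    if kept == name:
        # Nothing was dropped, so no hash is needed.
        return prefix + name
    # Assumption: 8 hex digits of SHA-1 are enough to tell names apart
    # in practice; this is not collision-free and not reversible.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}{kept}_{digest}"
```

With these assumptions `mangle("3d")` gives `h_3d`, and a name such as
`list-tail` becomes `h_listtail_` followed by eight hex digits.  A
real implementation would also have to decide how to treat a plain
name that happens to look like a hashed one.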

                               -=-=-=-

4.2.3.1. Hyphen Restrictions


   The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
   the third and fourth character positions and MUST NOT start or end
   with a "-" (hyphen).
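
   (Not part of the RFC text.)  A rough sketch of this check in
   Python, assuming the label is already available as a Unicode
   string; the function name `violates_hyphen_rules` is made up for
   illustration.

```python
def violates_hyphen_rules(label: str) -> bool:
    """Return True if the label breaks the hyphen restrictions above."""
    return (
        label.startswith("-")
        or label.endswith("-")
        # Third and fourth character positions are indices 2 and 3.
        or label[2:4] == "--"
    )
```

   A Scheme name like `list-tail` passes this test, which shows that
   the restriction is about domain labels rather than about what a C
   compiler will accept.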

4.2.3.2. Leading Combining Marks


   The Unicode string MUST NOT begin with a combining mark or combining
   character (see The Unicode Standard, Section 2.11 [Unicode] for an
   exact definition).
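
   (Not part of the RFC text.)  A rough sketch of this check in
   Python, assuming that "combining mark" can be approximated by the
   Unicode general categories Mn, Mc, and Me as reported by the
   standard `unicodedata` module; the function name is made up for
   illustration.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first code point has general category M*."""
    if not label:
        return False
    return unicodedata.category(label[0]).startswith("M")
```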

4.2.3.3. Contextual Rules


   The Unicode string MUST NOT contain any characters whose validity is
   context-dependent, unless the validity is positively confirmed by a
   contextual rule.  To check this, each code point identified as
   CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a
   non-null rule.  If such a code point is missing a rule, the label is
   invalid.  If the rule exists but the result of applying the rule is
   negative or inconclusive, the proposed label is invalid.

4.2.3.4. Labels Containing Characters Written Right to Left


   If the proposed label contains any characters from scripts that are
   written from right to left, it MUST meet the Bidi criteria [RFC5893].
