Re: uc_tolower (uc_toupper (x))
From: Mike Gran
Subject: Re: uc_tolower (uc_toupper (x))
Date: Thu, 10 Mar 2011 16:54:41 -0800 (PST)
> From:Mark H Weaver <address@hidden>
> To:address@hidden
> Cc:
> Sent:Thursday, March 10, 2011 3:39 PM
> Subject:uc_tolower (uc_toupper (x))
>
> I've noticed that srfi-13.c very frequently does:
>
> uc_tolower (uc_toupper (x))
>
> Is there a good reason to do this instead of:
>
> uc_tolower (x)
Unicode defines a case folding algorithm as well as
a data table for case-insensitive sorting. Setting
things to lowercase is a decent approximation of
case folding, but doing the upper->lower operation picks
up a few more of the corner cases, like U+03C2 GREEK
SMALL LETTER FINAL SIGMA and U+03C3 GREEK SMALL LETTER SIGMA,
which are the same letter with different representations,
or U+00B5 MICRO SIGN and U+03BC GREEK SMALL LETTER MU,
which are supposed to have the same sort ordering.
Lowercasing alone leaves each pair distinct, but both
members of a pair uppercase to the same capital (U+03A3
or U+039C), so the final lowercasing collapses them.
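Now that we've pulled in all of libunistring, it might
be a good idea to see whether it has a complete implementation
of Unicode case folding, because upper->lower is also not
completely correct: a code-point-by-code-point mapping cannot
produce multi-character foldings such as U+00DF LATIN SMALL
LETTER SHARP S, which full case folding maps to "ss".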
-Mike