help-gnu-emacs

Re: Comparing non-English strings for sorting


From: Thien-Thi Nguyen
Subject: Re: Comparing non-English strings for sorting
Date: Fri, 13 Feb 2009 00:57:15 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

() "address@hidden" <address@hidden>
() Tue, 10 Feb 2009 02:47:41 -0800 (PST)

                       (vconcat (downcase str1))
                       (vconcat (downcase str2)))))

If all the strings you wish to compare are composed entirely of
the characters in `order', this (unconditional case smashing) is
sufficient.  Otherwise, comparing a downcased character in that
set with a "downcased" character outside that set (where the
result is equal to the input) can be problematic.

Consider the ASCII character set (ascii(7)), specifically, the
six indices between ?Z and ?a (here, we use ?_, decimal 95).

 (downcase ?_) => 95  ;; no change
 (downcase ?a) => 97  ;; no change
 (downcase ?A) => 97  ;; smashed (numerically "upward", hee hee)
           ?A  => 65  ;; originally

Using unconditional case smashing in a hypothetical analog of
`my-case-insensitive-nonenglish-string-comparator', we'd see:

 (string-ci-lessp "_" "a") => t
 (string-ci-lessp "_" "A") => t
 (string-lessp "_" "a")    => t
 (string-lessp "_" "A")    => nil

Perhaps the reason behind the difference between the 2nd and 4th
results being "one is case-insensitive and the other isn't" does
indeed satisfy you.  It doesn't, me.  What is the case of the
underscore and why should my (in)sensitivity to it matter at all?
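The same asymmetry shows up outside Emacs. Here is a minimal C sketch of it, assuming ASCII and the "C" locale; `sign', `cmp_raw' and `cmp_smashed' are hypothetical names made up for illustration, not part of Emacs or Guile.

```c
#include <ctype.h>

/* Sign of N: -1, 0 or 1.  */
static int
sign (int n)
{
  return (n > 0) - (n < 0);
}

/* Compare two characters by their codes, case-sensitively.  */
int
cmp_raw (int x, int y)
{
  return sign (x - y);
}

/* Compare after unconditional case smashing.  Non-letters such as
   '_' pass through tolower unchanged, but every uppercase letter is
   moved above them, so '_' ends up below all letters.  */
int
cmp_smashed (int x, int y)
{
  return sign (tolower (x) - tolower (y));
}
```

With these, `cmp_raw ('_', 'A')' is positive while `cmp_smashed ('_', 'A')' is negative: smashing case changed where the underscore sorts, although the underscore has no case.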

Appended is what i think is a more rational algorithm (expressed
in C, not Emacs Lisp, because it is part of an upcoming Guile
release (which is implemented (like Emacs) in C)).  It allows for
the (properly phrased ;-) mu answer.

thi

______________________________________
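#include <ctype.h>              /* islower, isupper */

int
scm_i_ccmp_ci (int x, int y)
{
  int d, lx, ly, ux = 0, uy = 0;

  /* 1-based position in the alphabet, or 0 if not a letter of that case.  */
#define ISLOWER(c)  (islower (c) ? (1 + (c) - 'a') : 0)
#define ISUPPER(c)  (isupper (c) ? (1 + (c) - 'A') : 0)
  /* Nonzero if C is a letter; sets lC or uC as a side effect.  */
#define ALPHA(c)    ((l ## c = ISLOWER (c)) || (u ## c = ISUPPER (c)))
  /* GOOD is not defined in this excerpt; assumed here to mean "positive".  */
#define GOOD(n)     (0 < (n))

  d = (!ALPHA (x) || !ALPHA (y))
    /* At least one non-letter: subtract directly.  */
    ? (x - y)
    /* Both letters: subtract in one domain or another.  */
    : (lx
       ? (lx - (ly
                ? ly
                : uy))
       : (ux - (uy
                ? uy
                : ly)));
  return !d
    ? 0
    : (GOOD (d)
       ?  1
       : -1);

#undef GOOD
#undef ALPHA
#undef ISUPPER
#undef ISLOWER
}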
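For checking individual cases, the rule above can be restated without the token-pasting macros. This is only a sketch under the same assumptions (ASCII, "C" locale, a positive difference mapping to 1); `letter_rank' and `ccmp_ci_sketch' are hypothetical names, not Guile API.

```c
#include <ctype.h>

/* 1-based alphabet position of C, ignoring case; 0 if C is not a
   letter.  Assumes ASCII and the "C" locale.  */
static int
letter_rank (int c)
{
  if (islower (c))
    return 1 + c - 'a';
  if (isupper (c))
    return 1 + c - 'A';
  return 0;
}

/* Hypothetical restatement of the comparison: returns -1, 0 or 1.
   Case is ignored only when both characters are letters; otherwise
   the raw character codes are compared.  */
int
ccmp_ci_sketch (int x, int y)
{
  int rx = letter_rank (x), ry = letter_rank (y);
  int d = (rx && ry) ? (rx - ry) : (x - y);

  return (d > 0) - (d < 0);
}
```

Under this rule 'a' and 'A' compare equal, while '_' stays where ASCII puts it: after 'A' and before 'a', whether or not the caller asked for case-insensitivity.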



