emacs-devel

Re: regex and case-fold-search problem


From: Stefan Monnier
Subject: Re: regex and case-fold-search problem
Date: Mon, 26 Aug 2002 12:11:42 -0400

> Andreas Schwab <address@hidden> writes:
> > |> Yeah, but character ranges make perfect sense in many local contexts.
> > |> E.g., [0-9], or [<0>-<9>] where <0> and <9> are `wide' digits from some
> > |> character set.
> > 
> > What does [A-Z] mean in EBCDIC?  [0-9] is a special case, because ISO C
> > requires that 0,1,2,3,4,5,6,7,8,9 are consecutive in the execution
> > character set.  But in many locales the collating sequence <A> - <Z>
> > contains more than just the upper case letters from the English alphabet.
> 
> The question is not `does [A-Z] make sense?', but rather: `_if_ [A-Z]
> makes sense, does [a-z] make sense too?'
> 
> That is, we aren't the ones writing [A-Z], it's lisp authors or users
> entering regexps or something.  If they want to enter a less-than-useful
> character range, that's their prerogative; however, emacs should avoid
> making what they enter _less_ meaningful because of the case-fold-search
> setting.
> 
> My point was that perhaps in practice, the ranges that would get screwed
> up by case-fold-search are even less sensible than normal, meaning it's
> likely most people wouldn't (or shouldn't) use them, and we really don't
> need to worry about the issue.  [ASCII is probably a special case, since
> it's so well known that people actually do tend to specify weird ranges]
> 
> [but it looks like maybe it will get fixed properly anyway...]

I agree that we shouldn't spend too much time on it.
The patch I installed does the following:
- Fix a few problems such as ``if the case-table mapped ?* to ?o then
  "\\(fo\\)*" used to only match "foo"''.  Luckily such case-tables
  are not very common, so nobody noticed the problem.
- case-fold-search now works correctly for ranges in ASCII.
- case-fold-search still doesn't work correctly for non-ASCII ranges,
  but such a range now matches at least as much as it does when
  case-fold-search is nil: i.e. the range might include some chars which
  the user didn't expect, but it at least includes the chars which the
  user expected.  Previously the range could include some unexpected
  chars and could also fail to include some expected ones.  The current
  code matches at least as many strings as the previous one.
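
To make the "matches at least as much" guarantee concrete, here is a
rough Python model of it.  This is only a sketch, not the regex.c code:
Python's str.lower/str.upper stand in for the case table, and
endpoint_folded_range is a hypothetical naive strategy (fold only the
two endpoints), included purely to show how a folded range can lose
chars the user expected.

```python
def plain_range(lo, hi):
    """Chars matched by [lo-hi] when no case folding is applied."""
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}


def case_variants(ch):
    """Single-char case variants of ch.

    Assumption: str.lower/str.upper stand in for an Emacs case table.
    """
    return {v for v in (ch, ch.lower(), ch.upper()) if len(v) == 1}


def folded_range(lo, hi):
    """Superset approach: every char of the plain range plus its variants."""
    result = set()
    for ch in plain_range(lo, hi):
        result |= case_variants(ch)
    return result


def endpoint_folded_range(lo, hi):
    """Hypothetical naive approach: fold only the two endpoints."""
    lo2, hi2 = lo.lower(), hi.lower()
    return plain_range(lo2, hi2) if lo2 <= hi2 else set()


# The weird ASCII range [Z-a] keeps all of its chars and gains only the
# case variants of its letters.
extra = sorted(folded_range("Z", "a") - plain_range("Z", "a"))
print(extra)  # ['A', 'z']

# Folding only the endpoints turns [Z-a] into the empty range [z-a].
print(endpoint_folded_range("Z", "a"))  # set()
```

In this model folded_range can only ever add chars to plain_range, so
"_" stays in [Z-a] whether or not folding is on; that is the property
the third item above describes.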

I think that's good enough for now,


        Stefan




