Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"
From: Paul Eggert
Subject: Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"
Date: Sun, 3 Feb 2002 11:42:27 -0800 (PST)
> From: Dave Love <address@hidden>
> Date: 03 Feb 2002 18:20:26 +0000
> User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1.80
>
> >>>>> Paul Eggert writes:
>
> > Because utf-8 should be the normal case. In the normal case, the
> > encoding name should be delimited, to prevent incorrect matches
> > when one encoding name is a suffix of another.
>
> I'm surprised false matches are any more likely with utf-8 than 8859
> &c. Is prefixing it with \< good enough?
It's just a precaution. I don't know of any places where it's
actually needed. The point is that 8859 is a special case: it is part
of some locale names in nonstandard ways. UTF-8 is not a special
case, so we shouldn't use the special-case variant to search for
UTF-8. In the normal case, we use \<, so that should be good
enough for UTF-8.
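A minimal sketch of why delimiting with \< (and \>) is a cheap precaution. The two patterns and the helper below are hypothetical, not the regexps Emacs actually uses. The locale names "xx_XX.myutf8" and "xx_XX.utf-80" are made up purely to show a false match, where one name contains "utf8" or "utf-8" without being that encoding:

```elisp
(require 'cl-lib)

;; Hypothetical patterns, for illustration only.
(defconst my-utf8-loose-regexp "utf-?8"
  "Undelimited pattern: matches \"utf8\" or \"utf-8\" anywhere.")

(defconst my-utf8-delimited-regexp "\\<utf-?8\\>"
  "Delimited pattern: matches \"utf8\" or \"utf-8\" only as a whole word.")

(defun my-locale-matches-p (regexp locale)
  "Return non-nil if REGEXP matches LOCALE, ignoring case.
Uses the standard syntax table, in which letters and digits are
word constituents and \".\", \"_\" and \"-\" are not."
  (with-syntax-table (standard-syntax-table)
    (let ((case-fold-search t))
      (string-match-p regexp locale))))

;; Both patterns accept the usual spellings:
;;   (my-locale-matches-p my-utf8-delimited-regexp "en_US.UTF-8")
;;   (my-locale-matches-p my-utf8-delimited-regexp "ja_JP.utf8")
;; Only the loose pattern accepts a name that merely contains "utf8":
;;   (my-locale-matches-p my-utf8-loose-regexp "xx_XX.myutf8")
```

In the normal case the word-boundary version costs nothing: real UTF-8 locale names still match. A longer name that happens to embed the string is simply rejected instead of being misread as UTF-8.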
> > I'm not sure I follow your point, but I'll try to answer. The code in
> > question is using a heuristic to guess the coding system from the
> > locale name.
>
> It's actually guessing a complete language environment.
Yes, though the part of the code that we're talking about is guessing
just the coding system.
> On checking again, I'm not at all sure the current code DTRT. For
> instance (given that I've defined a Windows-1251 coding system and
> language environment):
>
> (set-locale-environment "cs_CZ.windows-1250")
> => nil
> current-language-environment
> => "Czech"
> (symbol-value (car coding-category-list))
> => iso-8859-2
>
> I think what should happen in this case is that the codeset part of
> the locale should override the language part.
Yes, that should be an improvement. In other words, if the part after
the "." corresponds to a known coding system, that should override
locale-preferred-coding-systems.
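A rough sketch of the proposed rule: if the codeset part names a coding system Emacs knows, use it. Otherwise fall back to a language-based preference. This is not Emacs's actual implementation. It assumes locale names of the XPG form language[_territory][.codeset][@modifier]. The alist `my-language-coding-alist` and the `my-...` helpers are hypothetical stand-ins for the real language/locale tables. Only `coding-system-p` is the real Emacs predicate:

```elisp
(require 'cl-lib)

(defvar my-language-coding-alist
  '(("cs" . iso-8859-2)
    ("ja" . japanese-iso-8bit))
  "Hypothetical map from a locale's language part to a fallback coding system.")

(defun my-locale-codeset (locale)
  "Return the codeset part of LOCALE (text after \".\", before \"@\"), or nil."
  (when (string-match "\\.\\([^@]+\\)" locale)
    (match-string 1 locale)))

(defun my-locale-language (locale)
  "Return the language part of LOCALE (text before any \"_\", \".\" or \"@\")."
  (when (string-match "\\`[^_.@]+" locale)
    (match-string 0 locale)))

(defun my-guess-locale-coding-system (locale)
  "Guess a coding system for LOCALE, letting a known codeset win."
  (let* ((codeset (my-locale-codeset locale))
         (explicit (and codeset (intern (downcase codeset)))))
    (if (and explicit (coding-system-p explicit))
        ;; The codeset names a coding system Emacs knows, so it
        ;; overrides whatever the language part would suggest.
        explicit
      ;; Otherwise fall back to the language-based preference.
      (cdr (assoc (my-locale-language locale) my-language-coding-alist)))))

;; (my-guess-locale-coding-system "cs_CZ.windows-1250") returns windows-1250
;; rather than iso-8859-2, provided windows-1250 is a defined coding system.
```

The language part then only decides the coding system when the codeset is missing or unrecognized, as in plain "cs_CZ".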
> The language environment stuff could probably do with a bit of
> re-thinking to fit better with locale processing and customization.
Yes, that code is a bit dated now.