Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"

emacs-devel

From:	Paul Eggert
Subject:	Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"
Date:	Sun, 3 Feb 2002 11:42:27 -0800 (PST)

> From: Dave Love <address@hidden>
> Date: 03 Feb 2002 18:20:26 +0000
> User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1.80
> 
> >>>>> Paul Eggert writes:
> 
>  > Because utf-8 should be the normal case.  In the normal case, the
>  > encoding name should be delimited, to prevent incorrect matches
>  > when one encoding name is a suffix of another.
> 
> I'm surprised false matches are any more likely with utf-8 than 8859
> &c.  Is prefixing it with \< good enough?

It's just a precaution.  I don't know of any places where it's
actually needed.  The point is that 8859 is a special case: it appears
in some locale names in nonstandard ways.  UTF-8 is not a special
case, so we shouldn't use the special-case pattern to search for
UTF-8.  In the normal case, we use \<, so that should be good
enough for UTF-8.
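
As an illustration of the precaution (a minimal sketch, not the code in mule-cmds.el), the Emacs Lisp below compares an undelimited UTF-8 pattern with one delimited by \< and \>.  The codeset "xutf8" is invented purely to stand for some longer encoding name that happens to end in "utf8", and my-locale-matches-p, my-utf8-loose and my-utf8-delimited are hypothetical names.

```elisp
;; Hypothetical helper: return t if REGEXP matches LOCALE, ignoring case.
;; The standard syntax table is used so that the word-boundary
;; operators \< and \> behave the same whatever the current buffer is.
(defun my-locale-matches-p (regexp locale)
  "Return t if REGEXP matches LOCALE, ignoring case, else nil."
  (let ((case-fold-search t))
    (with-syntax-table (standard-syntax-table)
      (if (string-match regexp locale) t nil))))

;; Undelimited: "utf" may turn out to be the tail of a longer name.
(defconst my-utf8-loose "utf-?8")

;; Delimited: \< requires "utf" to start a word, \> requires "8" to end one.
(defconst my-utf8-delimited "\\<utf-?8\\>")

;; (my-locale-matches-p my-utf8-loose     "en_US.xutf8") => t
;; (my-locale-matches-p my-utf8-delimited "en_US.xutf8") => nil
;; (my-locale-matches-p my-utf8-delimited "en_US.UTF-8") => t
;; (my-locale-matches-p my-utf8-delimited "de_DE.utf8")  => t
```

Only the delimited pattern rejects the made-up "xutf8", while both accept the usual spellings.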

>  > I'm not sure I follow your point, but I'll try to answer.  The code in
>  > question is using a heuristic to guess the coding system from the
>  > locale name.  
> 
> It's actually guessing a complete language environment.

Yes, though the part of the code that we're talking about is guessing
just the coding system.
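
The heuristic can be pictured as an ordered table of (REGEXP . CODING-SYSTEM) pairs, tried in turn until one matches.  The sketch below assumes that shape; the table my-locale-codeset-alist, its three entries and my-guess-coding-system are hypothetical and do not reproduce the actual tables in mule-cmds.el.  It puts 8859 special-case patterns beside a normal, delimited UTF-8 pattern, and uses assoc-default with string-match as the test.

```elisp
;; Hypothetical table: each regexp is matched against the locale name,
;; and the first matching entry supplies the coding system.
(defconst my-locale-codeset-alist
  '((".*8859[-_]?1\\>" . iso-latin-1)  ; special case: 8859 spelled several ways
    (".*8859[-_]?2\\>" . iso-latin-2)
    ("\\<utf-?8\\>"    . utf-8))       ; normal case: delimited by \< and \>
  "Illustrative (REGEXP . CODING-SYSTEM) pairs for locale names.")

(defun my-guess-coding-system (locale)
  "Return the coding system of the first entry whose regexp matches LOCALE.
Return nil if no entry matches."
  (let ((case-fold-search t))
    (with-syntax-table (standard-syntax-table)
      ;; assoc-default calls (string-match REGEXP LOCALE) for each entry.
      (assoc-default locale my-locale-codeset-alist #'string-match))))

;; (my-guess-coding-system "de_DE.ISO8859-1")   => iso-latin-1
;; (my-guess-coding-system "en_US.UTF-8")       => utf-8
;; (my-guess-coding-system "de_DE.ISO-8859-15") => nil
```

The last case returns nil because \> rejects the "5" that follows "8859-1", so even the special-case patterns stay delimited at the end.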

> On checking again, I'm not at all sure the current code DTRT.  For
> instance (given that I've defined a Windows-1251 coding system and
> language environment):
> 
> (set-locale-environment "cs_CZ.windows-1250")
>   => nil
> current-language-environment
>   => "Czech"
> (symbol-value (car coding-category-list))
>   => iso-8859-2
> 
> I think what should happen in this case is that the codeset part of
> the locale should override the language part.

Yes, that should be an improvement.  In other words, if the part after
the "." corresponds to a known coding system, that should override
locale-preferred-coding-systems.
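
A minimal sketch of that rule, assuming the codeset is the text after the first "." up to any "@modifier", and that it names a coding system once lower-cased.  The function my-locale-coding-system and its FALLBACK argument, which stands in for the result of the locale-preferred-coding-systems lookup, are hypothetical.  In a recent Emacs windows-1250 is predefined, so the example from the thread resolves without defining it by hand.

```elisp
(defun my-locale-coding-system (locale fallback)
  "Return the coding system named by LOCALE's codeset, else FALLBACK.
The codeset is the text after the first \".\" and before any \"@\"."
  (if (string-match "\\.\\([^@]+\\)" locale)
      (let ((cs (intern (downcase (match-string 1 locale)))))
        ;; coding-system-p also returns t for nil, so rule nil out.
        (if (and cs (coding-system-p cs)) cs fallback))
    fallback))

;; (my-locale-coding-system "cs_CZ.windows-1250" 'iso-8859-2) => windows-1250
;; (my-locale-coding-system "cs_CZ" 'iso-8859-2)              => iso-8859-2
```

Returning FALLBACK for an unknown codeset, rather than signalling an error, keeps the language-based guess as the safety net.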

> The language environment stuff could probably do with a bit of
> re-thinking to fit better with locale processing and customization.

Yes, that code is a bit dated now.
