Re: Unicode range and enumeration support.

bug-bash

From:	Eli Schwartz
Subject:	Re: Unicode range and enumeration support.
Date:	Sun, 22 Dec 2019 01:38:13 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.0

On 12/20/19 7:35 PM, L A Walsh wrote:
> On 2019/12/18 11:46, Greg Wooledge wrote:
>> To put it another way: you can write code that determines whether
>> an input character $c matches a glob or regex like [Z-a].  (Maybe.)
>>
>> But, you CANNOT write code to generate all of the characters from Z to a
>>   
> This generates characters from decimal 8300 - 8400 (because that range
> includes raised and lowered digits which have the number and value
> properties equivalent to 0-9).
> 
> ----
> 
> No? 8300, 8400 arbitrary code points that contain raised and lowered
> numbers that have the number property (as does 0..9):
> 
> perl -we' use strict; use v5.16;
> my $c;
> for ($c=8300;$c<8400;++$c) {
> my $o=chr $c;
> printf "%s", $o if $o=~/\pN/;   #match unicode property "is_num"
> };printf "\n"'
> ⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉
> 
> Q.E.D.
> 
> 
> Is that sufficient proof?

It's sufficient proof that you're wrong, yes.

Given the discussion was about collation, not simply enumerating
codepoints in order of their codepoint values, it would be helpful to
actually, you know, collate them.

Given your sample text range:

$ printf %s\\n ⁰ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ | sort
⁰
₀
₁
₂
₃
⁴
₄
⁵
₅
⁶
₆
⁷
₇
⁸
₈
⁹
₉

This is plainly not in byte order (which, for UTF-8, is the same thing as
codepoint order).
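
For comparison, here is a minimal sketch of the opposite case. It assumes a
POSIX sh or bash and UTF-8 input; the variable names are made up for
illustration. It takes the collated list above and sorts it again with
LC_ALL=C, which makes sort compare raw bytes instead of using the locale's
collation rules:

```shell
# The 17 characters in the order the en_US.UTF-8 sort above produced them.
collated='⁰ ₀ ₁ ₂ ₃ ⁴ ₄ ⁵ ₅ ⁶ ₆ ⁷ ₇ ⁸ ₈ ⁹ ₉'

# LC_ALL=C forces bytewise comparison. Every superscript here encodes as
# E2 81 xx and every subscript as E2 82 xx, so the superscripts sort first.
# $collated is deliberately unquoted so the shell splits it on spaces.
bytewise=$(printf '%s\n' $collated | LC_ALL=C sort | tr -d '\n')

printf '%s\n' "$bytewise"
```

That gives back the same sequence the perl loop printed, i.e. enumeration by
codepoint, which is a different ordering from the collated one.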

Now you need to ask yourself the question: which locale do you want to
sort according to? I used en_US.UTF-8. Please don't say "C.UTF-8",
because that's not actually a thing. And the plain C locale won't work
for obvious reasons...

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User
