bug-bash

Re: Unicode range and enumeration support.


From: Eli Schwartz
Subject: Re: Unicode range and enumeration support.
Date: Sun, 22 Dec 2019 01:38:13 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.0

On 12/20/19 7:35 PM, L A Walsh wrote:
> On 2019/12/18 11:46, Greg Wooledge wrote:
>> To put it another way: you can write code that determines whether
>> an input character $c matches a glob or regex like [Z-a].  (Maybe.)
>>
>> But, you CANNOT write code to generate all of the characters from Z to a
>>   
> This generates characters from decimal 8300 - 8400 (because that range
> includes raised and lowered digits, which have the number and value
> properties equivalent to 0-9).
> 
> ----
> 
> No? 8300 and 8400 are arbitrary code points bounding a range that
> contains raised and lowered numbers that have the number property
> (as does 0..9):
> 
> perl -CS -we' use strict; use v5.16;
> my $c;
> for ($c=8300;$c<8400;++$c) {
> my $o=chr $c;
> printf "%s", $o if $o=~/\pN/;   #match unicode property "is_num"
> };printf "\n"'
> ⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉
> 
> Q.E.D.
> 
> 
> Is that sufficient proof?

It's sufficient proof that you're wrong, yes.

Given the discussion was about collation, not simply enumerating
codepoints in order of their codepoint values, it would be helpful to
actually, you know, collate them.

Given your sample text range:

$ printf %s\\n ⁰ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ | sort
⁰
₀
₁
₂
₃
⁴
₄
⁵
₅
⁶
₆
⁷
₇
⁸
₈
⁹
₉

This is plainly not in byte order.

Now you need to ask yourself the question: which locale do you want to
sort according to? I used en_US.UTF-8. Please don't say "C.UTF-8",
because that's not actually a thing. And the plain C locale won't work
for obvious reasons...
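
To make the difference concrete, here is a minimal sketch that sorts the same characters twice: once by raw bytes (LC_ALL=C) and once under a collating locale. It assumes en_US.UTF-8 is installed; the variable names are only for illustration, and the collated result depends on your libc's locale tables, so no expected output is given for it.

```shell
#!/bin/sh
# Input: the characters in the collated order shown above.
chars='⁰ ₀ ₁ ₂ ₃ ⁴ ₄ ⁵ ₅ ⁶ ₆ ⁷ ₇ ⁸ ₈ ⁹ ₉'

# Byte order: LC_ALL=C compares raw bytes, which for UTF-8 text is the
# same as codepoint order. This undoes the collated ordering.
byte_order=$(printf '%s\n' $chars | LC_ALL=C sort | tr -d '\n')

# Collation order: depends on the named locale and its tables.
# Assumption: en_US.UTF-8 exists on this system. If it does not, sort
# typically falls back to byte order and both lines come out identical.
locale_order=$(printf '%s\n' $chars | LC_ALL=en_US.UTF-8 sort | tr -d '\n')

printf 'C:      %s\n' "$byte_order"    # C:      ⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉
printf 'en_US:  %s\n' "$locale_order"
```

If the two lines differ, enumerating by codepoint value and enumerating in collation order are two different operations, which is the point of the Z-to-a discussion.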

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User
