Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug


From: Aharon Robbins
Subject: Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
Date: Fri, 19 Feb 2016 16:00:04 +0200
User-agent: Heirloom mailx 12.5 6/20/10

Hi.

> > Generally, it sounds like the right thing to do is:
> >
> >    - in a UTF-8 locale, *always* deal with *characters* (Unicode
> >    codepoints), not bytes
> >    - specifically, when encountering \xhh, compare it to the *Unicode
> >    codepoint* of the character at hand
> >
> >
> > Always dealing with characters makes sense to me, especially given that
> > you can *mix* Unicode characters and \x*hh* escapes in a single bracket
> > expression.
> >
> > Thus, given that \xff is the max. codepoint value that can currently be
> > expressed, which doesn't allow matching the full range of Unicode
> > characters, I suggest the following:
> >
> >    - At
> >      https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html#Bracket-Expressions:
> >       - document this limitation
> >       - recommend the workaround of using actual characters rather than
> >       codepoint escapes as the range endpoints.

This is what I've done. The changes will eventually propagate to the
repo.
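
For illustration only, here is a small sketch of the distinction discussed
above: a two-digit \xhh escape tops out at 0xFF, while many characters have
higher codepoints and are encoded in UTF-8 as several bytes. Python stands in
for the regex engine here; this is not gawk code and makes no claim about
gawk's internals. The helper names utf8_bytes and in_range_by_codepoint are
hypothetical, chosen just for this example.

```python
# Illustration only: Python is used as a neutral stand-in, not gawk itself.

MAX_HEX_ESCAPE = 0xFF  # largest value a two-digit \xhh escape can express


def utf8_bytes(ch):
    """Return the list of UTF-8 byte values that encode one character."""
    return list(ch.encode("utf-8"))


def in_range_by_codepoint(ch, lo, hi):
    """Character-wise range test: compare Unicode codepoints, not bytes."""
    return ord(lo) <= ord(ch) <= ord(hi)


if __name__ == "__main__":
    for ch in ["é", "ÿ", "Ā", "€"]:
        cp = ord(ch)
        encoded = " ".join(f"{b:02X}" for b in utf8_bytes(ch))
        reachable = cp <= MAX_HEX_ESCAPE
        print(f"{ch}: U+{cp:04X}, UTF-8 bytes: {encoded}, "
              f"codepoint expressible as \\xhh: {reachable}")

    # A range whose endpoints are actual characters can extend past U+00FF,
    # which a range ending at \xff cannot.
    print(in_range_by_codepoint("Ā", "à", "ſ"))        # endpoints as characters
    print(in_range_by_codepoint("Ā", "\x80", "\xff"))  # endpoints capped at 0xFF
```

The last two lines mirror the documented workaround: writing the range
endpoints as literal characters sidesteps the 0xFF ceiling of the escape.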

I will talk to other GNU maintainers about how we want to deal with
this issue; I don't want to invent something on my own and have it
be different from other GNU utilities.

Thanks,

Arnold


