Re: [bug-gawk] Problem with substr() after match() with non-ASCII charac

From:	Eli Zaretskii
Subject:	Re: [bug-gawk] Problem with substr() after match() with non-ASCII characters
Date:	Wed, 16 Sep 2015 10:10:00 +0300

> From: Janis Papanagnou <address@hidden>
> Date: Tue, 15 Sep 2015 23:35:58 +0200
> 
> > The problem is that you're feeding gawk invalid multibyte data for
> > the locale you're in. When gawk tries to figure out where, in terms of
> > characters, the match starts, it gets confused because of this invalid
> > data.
> 
> Obviously.
> 
> My view is that (a) I expect *consistency* in the functions, and (b) I should
> be able to process any data (from unknown locales). I can achieve (b) by
> the two means I posted, so *functionally* I'm fine now. I think that (a)
> should be addressed (i.e. a consistent implementation that does not
> "confuse" awk, and let awk's set of functions work with the same "metric").

You cannot have locale-independent processing as long as Gawk relies
on locale-dependent functions such as mbrtowc, mbrlen, and strcoll.
If we want to be locale-independent, we need to have
locale-indifferent versions of those functions (and others like
them).  And even then, some users will _want_ locale dependency,
e.g. when sorting text or displaying date/time values.
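
To make the locale dependence concrete, here is a minimal sketch, not
gawk's actual code. It uses Python's codecs as a stand-in for the C
library's multibyte decoding (mbrtowc and friends); the helper name
char_offset and the sample string are assumptions chosen for
illustration. It shows that turning a byte offset into a character index
gives different answers, or fails outright, depending on which encoding
the bytes are assumed to be in.

```python
def char_offset(data: bytes, byte_offset: int, encoding: str) -> int:
    """Translate a 0-based byte offset into a 1-based character index.

    Hypothetical helper: `encoding` plays the role of the current locale.
    Raises UnicodeDecodeError if the bytes before the offset are not
    valid in that encoding.
    """
    return len(data[:byte_offset].decode(encoding)) + 1


# The same text in two encodings.
latin1_bytes = "Zürich".encode("latin-1")  # ü is the single byte 0xFC
utf8_bytes = "Zürich".encode("utf-8")      # ü is the two bytes 0xC3 0xBC

# Byte offset at which "rich" starts in the UTF-8 data.
start = utf8_bytes.index(b"rich")

# Bytes match the assumed encoding: "rich" is the 3rd character.
print(char_offset(utf8_bytes, start, "utf-8"))

# Same bytes read as Latin-1: every byte counts as one character,
# so the index shifts by one.
print(char_offset(utf8_bytes, start, "latin-1"))

# Latin-1 data read as UTF-8: 0xFC is not a valid UTF-8 start byte,
# so no character index can be computed at all.
try:
    char_offset(latin1_bytes, latin1_bytes.index(b"rich"), "utf-8")
except UnicodeDecodeError as exc:
    print("invalid multibyte data:", exc.reason)
```

The last case corresponds to the situation in this thread: the input is
not valid in the encoding the program assumes, so there is no single
correct character position to report.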

So you are asking for something that (a) is a lot of work, and (b) is
practically an unreachable goal, if you insist on 100% locale
independence.
