
Re: [PATCH] IBM z/OS + EBCDIC support

From: Daniel Richard G.
Subject: Re: [PATCH] IBM z/OS + EBCDIC support
Date: Tue, 22 Sep 2015 16:37:02 -0400

Hi Paul,

On Tue, 2015 Sep 22 12:32-0700, Paul Eggert wrote:
> Thanks for looking into this.  I have some questions about the c-ctype
> changes.  It appears that the proposed patch defers to the system
> functions (which use the current locale), but that's not the intent of
> c-ctype: it's supposed to correspond to a stripped down POSIX "C"
> locale regardless of the current locale settings.  Is there something
> special in z/OS that requires using the system functions?  (E.g., does
> the "C" locale behave differently depending on some *other* setting
> regarding character set?)

Mainly, it was the attempt to answer the question "so what specific
variant of EBCDIC are we going to target here?" that led me to use
the system functions. EBCDIC-1047 is favored in z/OS, but EBCDIC-037
is also popular, and then there are the Russian/Japanese/etc. code
pages that some far-flung users might want. However, unlike "normal"
8-bit encodings like ISO 8859-#, KOI8-R et al., which all agree in the
7-bit range, the EBCDIC variants share no such common base: even ASCII
characters like "[" and "]" are not consistently encoded between
them. We don't have the
option of saying, "Okay, screw all that, we'll just limit ourselves
to this common subset," unless said subset excludes things like
punctuation marks.
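
To make the bracket problem concrete, here is a small sketch (not part
of the patch; the struct and function names are made up for
illustration). The byte values are what I understand the IBM-037 and
IBM-1047 code page charts to say, so please verify them against your
own system before relying on them.

```c
#include <stddef.h>

/* Hypothetical illustration only.  Byte values are assumed from the
   published IBM-037 and IBM-1047 code page charts.  */
struct variant_char
{
  char ch;               /* the character in question */
  unsigned char cp037;   /* its byte value in EBCDIC code page 037 */
  unsigned char cp1047;  /* its byte value in EBCDIC code page 1047 */
};

static const struct variant_char variant_chars[] =
{
  { '[', 0xBA, 0xAD },   /* brackets differ between the two */
  { ']', 0xBB, 0xBD },
  { 'A', 0xC1, 0xC1 },   /* letters agree */
  { '0', 0xF0, 0xF0 },   /* so do digits */
};

/* Return 1 if CH has the same byte value in both code pages, 0 if the
   two disagree, and -1 if CH is not in the small table above.  */
static int
encodings_agree (char ch)
{
  size_t n = sizeof variant_chars / sizeof variant_chars[0];
  for (size_t i = 0; i < n; i++)
    if (variant_chars[i].ch == ch)
      return variant_chars[i].cp037 == variant_chars[i].cp1047;
  return -1;
}
```

A hard-coded c_isalpha() or c_isdigit() could survive a switch between
these two code pages; a hard-coded c_ispunct() could not.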

My view is that it's not worth the hassle. Yes, c-ctype is not supposed
to be locale-dependent. But overcoming that on z/OS is going to take a
lot more work, and a lot more code to maintain, and it's not likely
that users of these systems will see a corresponding benefit. I think
it would be better to have the locale-dependent behavior for now (it's
better than nothing), and if a clear need arises in the future for
locale-independent behavior on z/OS (possibly by selecting an EBCDIC
variant at compile time), we can cross that bridge then.

> With the above in mind, it's not clear what c_isascii should do.
> Should it return 1 for bytes in the range 0..127, or for bytes that
> correspond to ASCII bytes if one assumes the standard translation
> from EBCDIC code page 037 to ASCII?  (Is there a standard?)  If the
> former, the current code is OK; if the latter, does the system
> isascii always return the same results regardless of locale and do
> these results make sense?

The latter behavior is the right one, IMO. If the former, there wouldn't
even be a point to having an isascii() function at all; you would just
do a range check.
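
For reference, the two readings of c_isascii under discussion could be
sketched roughly as follows. Both function names are hypothetical, and
the translation table is a caller-supplied parameter precisely because
no particular EBCDIC variant is being assumed here.

```c
/* Reading 1 ("the former"): a plain range check on the byte value.  */
static int
isascii_by_range (int c)
{
  return 0 <= c && c <= 127;
}

/* Reading 2 ("the latter"): the byte counts as ASCII if the character
   it encodes lands in 0..127 under a chosen translation.  TO_ASCII is
   a hypothetical 256-entry table mapping host bytes to ASCII/Latin-1
   code points; which table to use is exactly the open question.  */
static int
isascii_by_table (const unsigned char to_ascii[256], int c)
{
  return 0 <= c && c <= 255 && to_ascii[c] <= 127;
}
```

With an identity table (i.e. an ASCII host) the two agree for every
byte. With an EBCDIC table they do not: a byte above 127 that encodes
"[" passes the second test but fails the first.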

Yes, there's a standard... a whole smorgasbord to choose from ^_^

The system isascii() function is locale-dependent. With the encoding of
even "[" and "]" varying by code page, I don't see a way around that,
unless you deliberately support one EBCDIC variant at the expense of
all others.


> Anyway, in looking through the code I see that it's hard to test a port 
> to EBCDIC because it uses ifdef rather than if, and I do see some 
> promotion bugs that you noted but we can fix these with inline functions 
> rather than macros (cleaner and safer nowadays), and there are a few 
> other style glitches (e.g., boolean values, overuse of >=) so I 
> installed the attached patch.  This patch assumes EBCDIC control 
> characters are either less than ' ' or are all 1 bits, which I think is 
> right.  The patch also tightens up the tests a bit.

Yes, all control characters appear to be in [\x00-\x3F], but not
everything in that range is a control character. (I remember 0x04 was
not.) I tried making c_iscntrl() a simple range check at first, but that
did not agree with the system iscntrl().
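
A quick way to run that comparison is something like the sketch below.
The function names are hypothetical; the candidate predicate is only my
paraphrase of the "less than ' ' or all 1 bits" rule, not the code from
the patch, and the result depends on the host character set and the
current locale.

```c
#include <ctype.h>

/* Candidate range test: "less than ' ' or all 1 bits".  This is a
   paraphrase for illustration, not the installed patch.  */
static int
cntrl_by_range (int c)
{
  return (0 <= c && c < ' ') || c == 0xFF;
}

/* Count the bytes 0..255 on which CANDIDATE disagrees with the system
   iscntrl() in the current locale.  If FIRST_MISMATCH is non-null and
   there is at least one disagreement, store the lowest such byte.  */
static int
count_cntrl_mismatches (int (*candidate) (int), int *first_mismatch)
{
  int n = 0;
  for (int c = 0; c <= 255; c++)
    if (!candidate (c) != !iscntrl (c))
      {
        if (n == 0 && first_mismatch)
          *first_mismatch = c;
        n++;
      }
  return n;
}
```

Calling count_cntrl_mismatches (cntrl_by_range, &first) and printing
the result on the target system shows directly whether a simple range
check is good enough there.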

> This patch doesn't address the isascii problem, nor the "something 
> special in z/OS" problem, so quite possibly further patches will be 
> needed to this module.
> Email had 1 attachment:
> + 0001-c-ctype-port-better-to-EBCDIC.patch
>   21k (text/x-patch)

I'll be happy to test your [revised] patch this evening.


Daniel Richard G. || address@hidden
My ASCII-art .sig got a bad case of Times New Roman.
