Mike Fulton wrote:
> One of the things I would like to do on z/OS is be able to exploit our file
> tagging
> capabilities in the file system for iconv.
>
> For most cases, C programs can use 'auto-conversion' to convert files from
> various EBCDIC SBCS code pages to ISO8859-1, but this only works if the
> file is tagged with a CCSID, otherwise the file is treated as binary.
You are saying that many files on that system have an out-of-band indication
of the charset (like the xattrs on Linux or the resource fork on macOS
<https://en.wikipedia.org/wiki/Resource_fork>)?
Right. We have chtag as a command, as well as corresponding library functions
for setting and querying the CCSID of a stream.
In this example, I have 2 files - one ASCII and one EBCDIC:
FULTONM@ZOSCAN2B bash /tmp/tagged> ls -T
t ISO8859-1 T=on FileA.txt
t IBM-1047 T=on FileB.txt
FULTONM@ZOSCAN2B bash /tmp/tagged> cat FileA.txt FileB.txt
This is File A
This is File B
The system has an environment variable you can set: _BPXAUTOCVT=ON
and it will do 'autoconversion' for you. There are a variety of environment variables
that I describe briefly in my blog:
What happens when a user does
$ cat file1 file2 > file3
and file1 and file2 have different encodings specified? Does 'cat' do
the conversion in its source code, or is the open() / fopen() call
triggering the conversion?
Yes - the underlying C open/write code in cat is aware of the environment variables.
Not all C code is. One of the reasons we are porting the various low level tools is
to improve this experience across the board for z/OS users so that it 'just works'.
And has 'cat' been modified to add a charset indicator on file3
upon close() / fclose()?
Yes.
> I created a first 'proof of concept' patch for just IBM-1047 that works
> fine, but only for 1047:
> https://github.com/ZOSOpenTools/libiconvport/blob/main/tarball-patches/iconv.c.patch
> It would need to be fleshed out to properly support the other CCSIDs.
> I expect someone on z/OS has already done the mapping of iconv 'to' pages to
> integral CCSIDs, but if not, I could provide that.
>
> Is a z/OS specific enhancement something that would be considered for
> libiconv?
Yes, that could be considered.
The patch you showed looks reasonable.
For upstreaming, there are three important guidelines:
- Do assign the copyright to the FSF as soon as it is of legally
relevant size:
https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html
- Use the same coding style as the surrounding package.
- Test the changes before you submit them.
If there is a lot of code for a specific platform to be integrated, I
_might_ request that it be separated out into a .h file.
It's not very much code, although depending on how I do the fix 'right' for
the encoding mapping, that part might belong in a separate file. That's
your call.
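As a rough illustration of the encoding mapping discussed here, a lookup from
iconv encoding names to integral CCSIDs could look something like the sketch
below. The struct, table, and function names are hypothetical and are not taken
from the patch. Only a handful of well-known CCSIDs are listed, and returning 0
for an unknown name is simply a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: an iconv encoding name and its CCSID. */
struct ccsid_entry
{
  const char *name;
  unsigned short ccsid;
};

/* A few well-known pairs only; a real table would cover every
   encoding that the platform can tag. */
static const struct ccsid_entry ccsid_table[] =
{
  { "ISO8859-1", 819 },
  { "IBM-1047", 1047 },
  { "IBM-037", 37 },
  { "UTF-8", 1208 }
};

/* Return the CCSID for NAME, or 0 if NAME is NULL or not in the table.
   Matching is exact and case-sensitive in this sketch. */
unsigned short
encoding_to_ccsid (const char *name)
{
  size_t i;

  if (name == NULL)
    return 0;
  for (i = 0; i < sizeof (ccsid_table) / sizeof (ccsid_table[0]); i++)
    if (strcmp (name, ccsid_table[i].name) == 0)
      return ccsid_table[i].ccsid;
  return 0;
}
```

With this table, encoding_to_ccsid ("IBM-1047") yields 1047 and an unlisted
name yields 0. A fuller version would probably want case-insensitive matching
and alias handling, since iconv accepts several spellings for one encoding.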
Also, I might request adding a unit test, since I don't want to write
a unit test for your code if, two years from now, someone reports a bug.
Will do. Is there a particular doc I should read that describes the process for
a unit test or should I just read the test harness code?
Bruno