Re: sed on binary files
Gary V. Vaughan
Thu, 2 Oct 2008 12:09:35 +0800
On 2 Oct 2008, at 10:51, Eric Blake wrote:
> Is there any portable way to process files that contain NUL bytes?
None that I'm aware of. Many GNU utilities are reasonably well
behaved with respect to '\0', and m4 is unusual to some extent in that
we don't handle them well ourselves.
> I'm working on making m4 1.6 transparently handle NUL,
Excellent! I made an attempt to do that myself on the 2.0 branch some
years ago, but it didn't go well so I never committed...
> and want to
> post-process the output to normalize error messages while still checking
> that NUL bytes appeared where expected on stderr. But on Solaris, the
> native sed strips NUL bytes before processing the line (NUL bytes cannot
> appear in text files, and POSIX does not define behavior on non-text
> files, so this is not a bug, just a difference from GNU sed). As a
> result, the m4 testsuite either fails (if I only postprocess the
> stderr and not the expected error) or can have false positives (if both
> stderr and expected error are normalized, then regressions involving extra
> or missing NUL are not detected). I don't want to require perl for
> this one test; m4 seems fundamental enough to keep the testsuite
> restricted to the GNU coding standards set of tools.
I'd be inclined to do that in C. A few lines should be sufficient to
write a minimal filter that writes '\' '0' or '^' '@' to output
whenever a NUL byte arrives?
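Something along these lines is roughly what I have in mind. It is only a
sketch: the name escape_nul is made up for illustration and is not part of
m4, and it assumes the '\' '0' spelling rather than '^' '@'. Note that a
literal backslash followed by '0' in the input would be indistinguishable
from an escaped NUL, so the expected output would have to avoid that
sequence.

```c
#include <stdio.h>

/* Copy IN to OUT, replacing each NUL byte with the two characters
   '\' and '0'.  All other bytes pass through unchanged.
   Return 0 on success, -1 on a read or write error.
   (escape_nul is a hypothetical name, not an existing m4 function.)  */
int
escape_nul (FILE *in, FILE *out)
{
  int c;

  while ((c = getc (in)) != EOF)
    {
      if (c == '\0')
        {
          /* Emit a printable stand-in that sed and diff can cope with.  */
          if (fputs ("\\0", out) == EOF)
            return -1;
        }
      else if (putc (c, out) == EOF)
        return -1;
    }

  /* getc returns EOF for both end of input and errors; tell them apart.  */
  return (ferror (in) || fflush (out) == EOF) ? -1 : 0;
}
```

A main() that does nothing but call escape_nul (stdin, stdout) and turn a
nonzero result into a failing exit status would make this a filter that can
sit in front of sed in the testsuite pipeline.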
> The Solaris man
> pages mention that /usr/xpg4/bin/tr can handle NUL bytes, but not
> /usr/bin/tr; maybe I could search for an adequate tr, and change all NULs
> to some other byte that does not otherwise appear in my expected output
> (with the added benefit that diff might not give up early with the
> complaint that the files are binary), but I don't know if that is
It's probably a safe bet that whatever vendor tool you rely on to
postprocess will do the wrong thing on one machine or another :(
> Any suggestions? Is this worth documenting in the autoconf manual?
Certainly, especially since many of the GNU tools *do* endeavour to
handle '\0' input gracefully.
Email me: address@hidden (\(\
Read my blog: http://blog.azazil.net ( o.O)
And my other blog: http://www.machaxor.net (uu )o
...and my book: http://sources.redhat.com/autobook ("("_)