bug-grep

From: Jim Meyering
Subject: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sun, 9 Nov 2014 10:19:57 -0800

2014-11-08 20:19 GMT-08:00 Jim Meyering <address@hidden>:
> On Sat, Nov 8, 2014 at 7:52 PM, Paul Eggert <address@hidden> wrote:
>>   hex_printf_()
>>   {
>>     hex_printf_format=$(printf '%s\n' "$1" | sed '
>>       s/^/_/
>>       s/$/_/
>>       s/\([^\\]\(\\\\\)*\\x\)\([0-9aAbBcCdDeEfF][^0-9aAbBcCdDeEfF]\)/\10\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([0-3]\)/\10\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([4-7]\)/\11\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([89aAbB]\)/\12\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([cCdDeEfF]\)/\13\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([89aAbBcCdDeEfF]\)/\1,1\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([0-7]\)/\1,2\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([89abcdef]\)/\1,3\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([0-7]\)/\1,4\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([89aAbBcCdDeEfF]\)/\1,5\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([0-7]\)/\1,6\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([89aAbBcCdDeEfF]\)/\1,7\3/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[08]/\1\3\40/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[19]/\1\3\41/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[2aA]/\1\3\42/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[3bB]/\1\3\43/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[4cC]/\1\3\44/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[5dD]/\1\3\45/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[6eE]/\1\3\46/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[7fF]/\1\3\47/g
>>       s/^_//
>>       s/_$//
>>     ')
>>     shift
>>     printf "$hex_printf_format" "$@"
>>   }
>
> How elegantly twisted ;-)
> I like it.
>
> Do you have time to write the complete patch?
> I'd like to make a pre-release snapshot tomorrow.
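The quoted hex_printf_ works around the fact that POSIX printf formats specify octal \ddd escapes but leave \xHH unspecified. It rewrites each \xHH into the equivalent three-digit octal escape, one octal digit per group of sed passes, by regrouping the byte's bits as 2+3+3 instead of 4+4. The sketch below computes the same three digits with shell arithmetic, which can help when checking what the sed passes should emit. hex_pair_to_octal_ is a hypothetical name, not part of the proposed patch. The sketch assumes a POSIX shell with $((...)) arithmetic, which an old Bourne /bin/sh may lack.

```sh
# hex_pair_to_octal_ is a hypothetical helper, not part of the proposed
# patch.  Given two hex digits, it prints the equivalent \ooo escape,
# splitting the byte's 8 bits into groups of 2+3+3 as the sed passes do.
# Assumes a POSIX shell with $((...)) arithmetic.
hex_pair_to_octal_()
{
  hex_v=$((0x$1))                 # e.g. 28 -> 40
  hex_d1=$((hex_v >> 6))          # top 2 bits of the 1st hex digit
  hex_d2=$(( (hex_v >> 3) & 7 ))  # low 2 bits of 1st digit + top bit of 2nd
  hex_d3=$((hex_v & 7))           # low 3 bits of the 2nd hex digit
  printf '\\%d%d%d\n' "$hex_d1" "$hex_d2" "$hex_d3"
}

hex_pair_to_octal_ 28    # \x28 -> \050
hex_pair_to_octal_ 0A    # \x0A -> \012 (newline)
```

Feeding the result back to printf, as in printf "$(hex_pair_to_octal_ 28)", prints "(", the character 0x28.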

I tried it, and found that this new function makes the multibyte-white-space
test fail with GNU sed. Here's a simplified example showing where
it goes wrong. This shows that only the first \x285 is transformed
into \x2,05:

  $ printf '%s\n' '_\x285\x285\n_' \
     | sed 's/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g'
  _\x2,05\x285\n_

The intent was that it transform both, of course.
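The trouble arises when the regexp consumes all three digits
after \x.  For adjacent escapes, the non-backslash that the
leading [^\\] must match is the last digit of the previous
escape, and the previous match has already consumed it, so
nothing is left to anchor the 2nd and subsequent iterations
of the /g substitution.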
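One possible way around the consumed-anchor problem, offered only as a sketch, is to rerun a substitution until it stops matching, using sed's ":" label and "t" branch. When sed rescans the line, the digit swallowed by the previous match is available again as the [^\\] anchor. comma0_all_ is a hypothetical name, and it wraps only the ",0" rule from the quoted function; it has not been tried with Solaris 5.10's /bin/sed.

```sh
# Hypothetical helper: applies only the ",0" rule from the quoted
# function, but repeats it with a sed label/t loop.  Every pass that
# matches rewrites at least one escape, and a rewritten escape no longer
# matches (a "," follows the first digit), so the loop terminates.
comma0_all_()
{
  printf '%s\n' "$1" | sed '
:again
s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g
t again
'
}

comma0_all_ '_\x285\x285\n_'     # -> _\x2,05\x2,05\n_
comma0_all_ '_\x285\x285\x285_'  # three adjacent escapes, three passes
```

The loop only terminates for rules whose replacement no longer matches the pattern, such as the ",N" and final octal-digit rules; wrapped around a first-pass rule like the [0-3] one that inserts a 0 right after \x, it would keep inserting digits forever. The number of passes grows with the longest run of adjacent escapes.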

There is also a portability problem in that Solaris 5.10's /bin/sed
seems unable to handle some of that code. For example,
using that same example with its /bin/sed, neither \x285
string is transformed.




