improve substr
From: Eric Blake
Subject: improve substr
Date: Wed, 24 Dec 2008 23:23:22 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
POSIX is not very specific about negative arguments to substr. It is only
explicit that a positive second argument larger than the first argument's
length is okay (the empty string must silently result). Furthermore, BSD m4
segfaults on substr(abc,-2), which gives a bit of weight to the argument that
negative arguments aren't really standardized, so we might as well make
behavior nice.
Up till now, we've been silently returning the empty string if any negative
arguments occur, which matches Solaris m4, but is not very useful. So, I think
it's high time that we adopt perl's semantics for negative arguments
(from 'perldoc -f substr'):
my $s = "The black cat climbed the green tree";
my $color = substr $s, 4, 5; # black
my $middle = substr $s, 4, -11; # black cat climbed the
my $end = substr $s, 14; # climbed the green tree
my $tail = substr $s, -4; # tree
my $z = substr $s, -4, 2; # tr
While it is true that this can be done with existing m4, it seems inefficient
to have to use this (lightly tested) code (the use of incr/decr strips leading
0 while preserving the sign, so that the argument -08 is parsed as decimal -8
as in the original substr, and not as an octal error as in eval):
define(`substr', `ifelse(`$#', `0', ``$0'',
`_substr1(`$1', incr(decr(`$2')),
ifelse(`$3', `', `len(`$1')', `incr(decr(`$3'))'), len(`$1'))')')
define(`_substr1', `_substr2(`$1',
eval($2 < 0 ? ($2 + $4 < 0 ? 0 : $2 + $4) : $2),
`$3', `$4')')
define(`_substr2', `builtin(`substr', `$1', `$2',
eval($3 < 0 ? ($3 + $4 - $2 < 0 ? 0 : $3 + $4 - $2) : $3))')
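For anyone who wants to check the arithmetic in these macros without running m4, here is a rough Python model of the same clamping rules. The name `m4_substr` is made up for illustration. It mirrors what the macros compute (assuming the start adjustment adds the string length), not any actual m4 implementation, and it does not model the incr/decr decimal parsing.

```python
def m4_substr(s, start, length=None):
    """Hypothetical model of the wrapper macros, not real m4 code."""
    n = len(s)
    if length is None:
        # An omitted third argument defaults to len($1).
        length = n
    # _substr1: a negative start counts back from the end, clamped at 0.
    if start < 0:
        start = max(start + n, 0)
    # _substr2: a negative length leaves that many characters off the end,
    # measured from the adjusted start and clamped at 0.
    if length < 0:
        length = max(length + n - start, 0)
    # builtin substr: a start past the end yields the empty string.
    return s[start:start + length]


s = "The black cat climbed the green tree"
print(m4_substr(s, 4, 5))    # black
print(m4_substr(s, 4, -11))  # black cat climbed the
print(m4_substr(s, -4, 2))   # tr
```

These reproduce the perldoc results quoted earlier. A start that is too negative is clamped to 0 here, as in the macros, which is not necessarily what perl does.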
Also, perl's use of an optional fourth argument to be spliced into the original
string is cool. Perl only allows a fourth argument when substr is used on an
lvalue, which doesn't translate very well to m4, but m4 could treat it roughly
like the following (untested):
define(`substr', `ifelse(`$#', `4',
`$0(`$1', `0', `$2')$4`'$0(`$1', eval($2 + $3))',
`builtin(`$0', $@)')')
but with support for negative arguments, expecting decimal arguments, and
issuing a warning like perl if the entire substring selected lies outside the
original string.
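To make the intended result of the four-argument form concrete, here is a small Python sketch of the same head + replacement + tail construction. The name `splice` is hypothetical, and the sketch assumes a non-negative start and length, just like the untested macro.

```python
def splice(s, start, length, replacement):
    """Hypothetical model of four-argument substr; assumes start, length >= 0."""
    head = s[:start]           # corresponds to $0(`$1', `0', `$2')
    tail = s[start + length:]  # corresponds to $0(`$1', eval($2 + $3))
    return head + replacement + tail


print(splice("abcdef", 2, 3, "XY"))  # abXYf
```

Note that the replacement does not need to be the same length as the text it replaces.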
If there are no newlines, this could also be achieved on the master branch with
an extended regular expression, although that is probably slower:
define(`substr', `ifelse(`$#', `4',
`patsubst(`$1', `^(.{$2}).{$3}', `\1$4', `extended')',
`builtin(`$0', $@)')')
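The newline caveat can be seen with the same pattern in Python. This is only an analogy: Python's `re` module stands in for m4's regex engine, so details may differ, and `regex_splice` is a made-up name.

```python
import re


def regex_splice(s, start, length, replacement):
    """Hypothetical analogue of the patsubst variant, using Python's re."""
    # Builds a pattern such as ^(.{2}).{3} for start=2, length=3.
    pattern = "^(.{%d}).{%d}" % (start, length)
    # A callable replacement avoids treating backslashes in
    # `replacement` as group references.
    return re.sub(pattern, lambda m: m.group(1) + replacement, s, count=1)


print(regex_splice("abcdef", 2, 3, "XY"))          # abXYf
print(repr(regex_splice("a\nbcdef", 2, 3, "XY")))  # 'a\nbcdef' (unchanged)
```

In the second call `.` refuses to match the newline, so the pattern never matches and the string comes back untouched.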
Again, implementing this natively will be more efficient. What do you think of
adding these two enhancements to substr?
--
Eric Blake