findutils-4.1.20: a comment on xargs.c arg_max
From: Nelson H. F. Beebe
Subject: findutils-4.1.20: a comment on xargs.c arg_max
Date: Thu, 9 Dec 2004 08:00:44 -0700 (MST)
The code for findutils-4.1.20/xargs/xargs.c contains this fragment:
  /* Sanity check for systems with huge ARG_MAX defines (e.g., Suns which
     have it at 1 meg). Things will work fine with a large ARG_MAX but it
     will probably hurt the system more than it needs to; an array of this
     size is allocated. */
  if (arg_max > 20 * 1024)
    arg_max = 20 * 1024;
In earlier releases of GNU findutils, I had made a local modification
to comment out that statement.
We no longer live in a PDP-11 world, and modern systems often have
many gigabytes of main memory. It seems utterly draconian to limit
arg_max to 20KB, and I'm very sceptical that larger values will "hurt
the system". The only computer systems where memory space is likely
to be sharply limited today are embedded systems, programmed by a
small number of people under carefully controlled resource limits.
Computers are supposed to work for people, not the other way around.
When the environment size is restricted, users suffer from nonsense
like this (from an SGI IRIX 6.5 system):
% find /usr/include/ -type f | xargs grep frobnitz
xargs: environment is too large for exec
% which xargs
/usr/local/bin/xargs
% /usr/local/bin/xargs --version
/usr/local/bin/xargs: environment is too large for exec
# How big is the environment?
% env | wc -c
5768
# What is the POSIX minimum?
% getconf _POSIX_ARG_MAX
4096
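The two measurements above can also be taken from inside a C program, which is closer to what xargs itself has to do. In this minimal sketch, env_bytes, current_env_bytes, runtime_arg_max and MIN_ARG_MAX are hypothetical names. env_bytes counts each NAME=value string plus its terminating null byte, which should equal what env | wc -c reports (one newline per entry in place of the null byte); pointer and alignment overhead is left out. sysconf(_SC_ARG_MAX) is the run-time counterpart of the compile-time ARG_MAX constant, and on some systems it may report a different value.

```c
#include <limits.h>
#include <string.h>
#include <unistd.h>

extern char **environ;   /* the process environment, as exec passes it on */

#ifdef _POSIX_ARG_MAX
#define MIN_ARG_MAX ((long)_POSIX_ARG_MAX)
#else
#define MIN_ARG_MAX 4096L   /* the value POSIX specifies for _POSIX_ARG_MAX */
#endif

/* Bytes of string data in an environment: each "NAME=value" entry plus
   its terminating null byte.  Pointer and alignment overhead is not counted. */
static long env_bytes(char *const *envp)
{
    long total = 0;

    for (; *envp != NULL; envp++)
        total += (long)strlen(*envp) + 1;
    return total;
}

/* Size of this process's own environment (the C analogue of `env | wc -c`). */
static long current_env_bytes(void)
{
    return env_bytes(environ);
}

/* Run-time ARG_MAX, falling back to the POSIX minimum when sysconf()
   cannot report a value (it returns -1 in that case). */
static long runtime_arg_max(void)
{
    long v = sysconf(_SC_ARG_MAX);

    return v > 0 ? v : MIN_ARG_MAX;
}
```

Comparing current_env_bytes() with runtime_arg_max() - 2048 shows directly whether any room is left for arguments.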
Most GNU packages follow an important design principle of "no
arbitrary limits" on the size of objects. findutils should too.
Please consider removing the 20KB limit, and making the code more
robust against large environment areas, as shown below.
As an experiment, I've just rebuilt findutils-4.1.20 on that system,
and changed the xargs code like this:
% diff xargs.c.~1~ xargs.c
298a299,303
> (void)fprintf(stderr,"DEBUG: xargs: ARG_MAX = %18ld\n", (long)ARG_MAX);
> (void)fprintf(stderr,"DEBUG: xargs: LONG_MAX = %18ld\n", (long)LONG_MAX);
> (void)fprintf(stderr,"DEBUG: xargs: orig_arg_max = %18ld\n", (long)orig_arg_max);
> (void)fprintf(stderr,"DEBUG: xargs: env_size(environ) = %18ld\n", (long)ARG_MAX);
>
305a311,312
> (void)fprintf(stderr,"DEBUG: xargs: capped arg_max = %18ld\n", (long)arg_max);
>
307a315,321
>
> (void)fprintf(stderr,"DEBUG: xargs: reduced arg_max = %18ld\n", (long)arg_max);
>
> if (arg_max < 1024 * 1024)
>   arg_max = 1024 * 1024;
> (void)fprintf(stderr,"DEBUG: xargs: expanded arg_max = %18ld\n", (long)arg_max);
>
Here is what happens when I run it:
% find /usr/include -type f | ./xargs cat | wc -l
DEBUG: xargs: ARG_MAX = 5120
DEBUG: xargs: LONG_MAX = 2147483647
DEBUG: xargs: orig_arg_max = 3072
DEBUG: xargs: env_size(environ) = 5120
DEBUG: xargs: capped arg_max = 3072
DEBUG: xargs: reduced arg_max = -2724
DEBUG: xargs: expanded arg_max = 1048576
680880
The origin of the "environment is too large for exec" diagnostic and
immediate exit is now clear: the reduced arg_max is negative.
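The DEBUG trace can be checked by hand. One caveat: the env_size(environ) line of the patch passes (long)ARG_MAX to fprintf, so it simply repeats 5120. The real environment size follows from the neighbouring lines: 3072 - (-2724) = 5796 bytes, in line with the 5768 bytes that env | wc -c reported earlier (the two were measured separately, so a small difference is unsurprising). The sketch below is a hypothetical model of the steps the trace prints, assuming, as the log suggests, that orig_arg_max is ARG_MAX - 2048 and that the reduced value is the capped value minus the environment size. model_arg_max, POSIX_HEADROOM and XARGS_CAP are illustrative names, not taken from xargs.c.

```c
/* Illustrative model (not the xargs.c source) of the values traced by the
   DEBUG lines: orig, capped, reduced and, with the experimental patch,
   expanded arg_max.  Pass floor_bytes = 0 for the unpatched behaviour. */
#define POSIX_HEADROOM 2048L        /* ARG_MAX - 2048, per the POSIX xargs rule */
#define XARGS_CAP      (20L * 1024) /* the 20KB "sanity check" */

static long model_arg_max(long arg_max_const, long env_bytes, long floor_bytes)
{
    long arg_max = arg_max_const - POSIX_HEADROOM;  /* orig_arg_max */

    if (arg_max > XARGS_CAP)
        arg_max = XARGS_CAP;                        /* capped arg_max */
    arg_max -= env_bytes;                           /* reduced arg_max */
    if (floor_bytes > 0 && arg_max < floor_bytes)
        arg_max = floor_bytes;                      /* expanded arg_max */
    return arg_max;
}
```

With the IRIX values, model_arg_max(5120, 5796, 0) gives -2724, the negative value that triggers the diagnostic; with floor_bytes = 1024 * 1024 it gives 1048576, the expanded arg_max of the patched run. With ARG_MAX at 1048576 (the 1 meg Sun case in the source comment) and an empty environment it returns 20480, so the cap discards about 98% of the available space.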
Guaranteeing a minimum of 1MB worked around the problem: xargs ran
correctly, producing the same line count as SGI's own version:
% /bin/find /usr/include -type f | /bin/xargs cat | wc -l
680880
I have a large collection of architectures to test code on, including
all of the major Unix flavors on all of the major CPU types, and will
be happy to assist in any testing that such changes might entail.
Thanks to the simh and Hercules simulator projects, I also now have
several historical Unix releases on simulated historical architectures
(PDP-11, Interdata-32, VAX, and soon, IBM S/360). On the VAX at
least, I have gcc-2.95, so it should be possible to build most modern
packages on it.
For reference, here are some snippets from POSIX (IEEE Std
1003.1-2001) volumes 1--4 about ARG_MAX:
8694 {ARG_MAX}
8695 Maximum length of argument to the exec functions including environment data.
8696 Minimum Acceptable Value: {_POSIX_ARG_MAX}
8918 {_POSIX_ARG_MAX}
8919 Maximum length of argument to the exec functions including environment data.
8920 Value: 4096
9565 The number of bytes available for the new process' combined argument and environment lists is
9566 {ARG_MAX}. It is implementation-defined whether null terminators, pointers, and/or any
9567 alignment bytes are included in this total.
9862 [E2BIG] The limit {ARG_MAX} applies not just to the size of the argument list, but to
9863 the sum of that and the size of the environment list.
28305 The number of bytes available for the child process' combined argument and environment lists
28306 is {ARG_MAX}. The implementation shall specify in the system documentation (see the Base
28307 Definitions volume of IEEE Std 1003.1-2001, Chapter 2, Conformance) whether any list
28308 overhead, such as length words, null terminators, pointers, or alignment bytes, is included in
28309 this total.
39989 The standard developers considered requiring that setenv( ) indicate an error when a call to it
39990 would result in exceeding {ARG_MAX}. The requirement was rejected since the condition might
39991 be temporary, with the application eventually reducing the environment size. The ultimate
39992 success or failure depends on the size at the time of a call to exec, which returns an indication of
39993 this error condition.
40395 The generated command line length shall be the sum of the size in bytes of the utility name and
40396 each argument treated as strings, including a null byte terminator for each of these strings. The
40397 xargs utility shall limit the command line length such that when the command line is invoked,
40398 the combined argument and environment lists (see the exec family of functions in the System
40399 Interfaces volume of IEEE Std 1003.1-2001) shall not exceed {ARG_MAX}-2048 bytes. Within
40400 this constraint, if neither the -n nor the -s option is specified, the default command line length
40401 shall be at least {LINE_MAX}.
40526 On implementations with a large value for {ARG_MAX}, xargs may produce command lines
40527 longer than {LINE_MAX}. For invocation of utilities, this is not a problem. If xargs is being used
40528 to create a text file, users should explicitly set the maximum command line length with the -s
40529 option.
40579 The requirement that xargs never produces command lines such that invocation of utility is
40580 within 2048 bytes of hitting the POSIX exec {ARG_MAX} limitations is intended to guarantee
40581 that the invoked utility has room to modify its environment variables and command line
40582 arguments and still be able to invoke another utility. Note that the minimum {ARG_MAX}
40583 allowed by the System Interfaces volume of IEEE Std 1003.1-2001 is 4096 bytes and the
40584 minimum value allowed by this volume of IEEE Std 1003.1-2001 is 2048 bytes; therefore, the
40585 2048 bytes difference seems reasonable. Note, however, that xargs may never be able to invoke a
40586 utility if the environment passed in to xargs comes close to using {ARG_MAX} bytes.
829 There are no explicit limits in IEEE Std 1003.1-2001 on the sizes of names, words (see the
830 definition of word in the Base Definitions volume of IEEE Std 1003.1-2001), lines, or other
831 objects. However, other implicit limits do apply: shell script lines produced by many of the
832 standard utilities cannot exceed {LINE_MAX} and the sum of exported variables comes under
833 the {ARG_MAX} limit. Historical shells dynamically allocate memory for names and words and
834 parse incoming lines a character at a time. Lines cannot have an arbitrary {LINE_MAX} limit
835 because of historical practice, such as makefiles, where make removes the <newline>s associated
836 with the commands for a target and presents the shell with one very long line. The text on
837 INPUT FILES in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 1.11, Utility
838 Description Defaults does allow a shell to run out of memory, but it cannot have arbitrary
839 programming limits.
9170 {ARG_MAX}
9171 This is defined by the System Interfaces volume of IEEE Std 1003.1-2001. Unfortunately, it is
9172 very difficult for a conforming application to deal with this value, as it does not know how
9173 much of its argument space is being consumed by the environment variables of the user.
9228 There are different limits associated with command lines and input to utilities, depending on the
9229 method of invocation. In the case of a C program exec-ing a utility, {ARG_MAX} is the
9230 underlying limit. In the case of the shell reading a script and exec-ing a utility, {LINE_MAX}
9231 limits the length of lines the shell is required to process, and {ARG_MAX} will still be a limit. If a
9232 user is entering a command on a terminal to the shell, requesting that it invoke the utility,
9233 {MAX_INPUT} may restrict the length of the line that can be given to the shell to a value below
9234 {LINE_MAX}.
11574 {ARG_MAX}
11575 The current minimum is likely to need to be increased for profiles, particularly as larger
11576 amounts of information are passed through the environment. Many implementations are
11577 believed to support larger values.
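The xargs rule quoted at 40395-40401 is, in effect, a byte budget: the utility name and every argument cost their length plus one null byte, and together with the environment they must stay within {ARG_MAX}-2048. The C fragment below is a minimal greedy sketch of that budget, not GNU xargs code; str_bytes and count_fitting are hypothetical names. Because 9565-9567 and 28305-28309 leave it implementation-defined whether pointer and alignment overhead counts against the limit, the sketch counts string bytes only.

```c
#include <stddef.h>
#include <string.h>

/* Bytes one string adds to a command line under the POSIX xargs rule:
   its length plus a terminating null byte. */
static long str_bytes(const char *s)
{
    return (long)strlen(s) + 1;
}

/* Greedy sketch: how many leading entries of args[] can follow `utility`
   on one command line without the argument bytes plus env_bytes exceeding
   arg_max - 2048.  Returns -1 if even the bare utility name does not fit. */
static long count_fitting(const char *utility, const char *const *args,
                          size_t nargs, long arg_max, long env_bytes)
{
    long budget = arg_max - 2048 - env_bytes;
    long used = str_bytes(utility);
    size_t n = 0;

    if (used > budget)
        return -1;                 /* no room even for the bare utility name */
    while (n < nargs && used + str_bytes(args[n]) <= budget) {
        used += str_bytes(args[n]);
        n++;
    }
    return (long)n;
}
```

Plugging in the IRIX figures from the message (ARG_MAX of 5120 and the 5768 environment bytes reported by env | wc -c) gives a negative budget, so count_fitting returns -1 for any utility; this is the situation 40585-40586 warns about. A fuller version would call it repeatedly, starting each command line where the previous one stopped, and would also honour the -n and -s options and the {LINE_MAX} default from 40399-40401, which this sketch ignores.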
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: address@hidden -
- 155 S 1400 E RM 233 address@hidden address@hidden -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe -
-------------------------------------------------------------------------------