Re: [PATCH 0/4] faster gnulib-tool
From: Bruno Haible
Subject: Re: [PATCH 0/4] faster gnulib-tool
Date: Fri, 2 Jan 2009 02:17:30 +0100
User-agent: KMail/1.9.9
Hello Ralf,
Thank you for your speedups to gnulib-tool. At first I was, of course,
excited about the 2x speedup. But having looked at the maintainability
of the code that you propose, I am no longer comfortable with all of it.
My four objections are:
1) You observe that forking programs in a shell script is slow, and
therefore propose to use more shell built-ins. The problem with it
is that I chose to implement gnulib-tool in shell (for the control
structure) and sed (for the text processing). The shell also has
some inferior commands for string processing, but sed is vastly
superior for this purpose. I want to stick with 'sed' for the
text processing; otherwise we write some parts of the code with
shell built-ins and, when we notice that a little more text processing
is required, have to rewrite that code to use 'sed'. So the use
of shell built-ins for text processing turns out to be a
"premature optimization" (in the sense of Knuth) and hampers
maintainability.
If you want to achieve good speedups for scripts that use 'sed':
can you work towards making 'sed' a bash built-in? This is challenging,
but if you are after performance, that would be promising.
2) Your patches change the generation of code so that it goes through
intermediate shell variables. The problem with this is that the
transformation from string to standard output is not simple:
     echo $string
outputs the string plus a newline, and 'echo -n' is not portable.
3) The sed expression sed_cache_module in part 1 of your patch is not
maintainable. The sed_extract_prog was already complex, but what you
made of it is beyond what is acceptable in code that should be
maintained 5 or 10 years from now.
4) There is too much 'eval' in the code. As you have seen in an earlier
patch today, every use of 'eval' can bring a security problem. The
only uses of 'eval' that are always safe are variable assignments
     eval "$var=\$value"
when you can guarantee that 'var' is a simple identifier.
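
The guarantee mentioned in 4) can be enforced by the script itself. The
following is only a minimal sketch, assuming a POSIX shell; the function
names is_identifier and set_var are hypothetical and are not taken from
gnulib-tool or from the patches under discussion.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of gnulib-tool.

# is_identifier NAME
# Succeeds only if NAME is non-empty, does not start with a digit, and
# consists solely of letters, digits and underscores.
is_identifier () {
  case $1 in
    '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# set_var NAME VALUE
# Assigns VALUE to the variable called NAME. The eval sees only
# "NAME=$2"; VALUE itself is never parsed as shell code.
set_var () {
  is_identifier "$1" || return 1
  eval "$1=\$2"
}
```

With these definitions, set_var cache_foo "$text" stores the text verbatim,
whereas set_var 'x; rm -rf .' "$text" is rejected before eval is reached.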
And, last but not least, more comments would have been better.
So, globally, when you try to cache multiline strings read from files
in variables with computed identifiers, you are going far beyond what
shell as a language is suitable for.
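
To illustrate how much care such caching takes, here is a rough sketch,
assuming a POSIX shell with printf available. The names cache_file,
emit_cached and the cache_ variable prefix are hypothetical, not the ones
used in the patches. Two pitfalls are visible: command substitution drops
trailing newlines, and the value has to be written back without an added
newline.

```sh
#!/bin/sh
# Hypothetical sketch; these names are not from gnulib-tool.

# cache_file KEY FILE
# Stores the text of FILE, including trailing newlines, in the variable
# cache_KEY. KEY is assumed to be a simple identifier chosen by the
# script itself, never user input.
cache_file () {
  # Command substitution strips trailing newlines, so append a sentinel
  # character and remove it again afterwards.
  cache_tmp=$(cat "$2" && echo x) || return 1
  cache_tmp=${cache_tmp%x}
  eval "cache_$1=\$cache_tmp"
}

# emit_cached KEY
# Writes the cached text to standard output without adding a newline.
emit_cached () {
  eval "printf '%s' \"\$cache_$1\""
}
```

Even this small example needs a sentinel character, two levels of quoting
and an unchecked assumption about KEY, which is the kind of complexity the
paragraph above warns about.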
Unfortunately, I don't see a better choice as an implementation language
of gnulib-tool:
- Python is good for text processing but makes incompatible changes
to the language definition every couple of years.
- Perl is excluded because of the misdesigned syntax, and it also
has incompatible changes e.g. between perl 5.6 and 5.8.
- Java is not good because, although it is standardized and fast and
GNU has a free implementation of it, its text processing is not
expressive enough (too verbose).
- m4 is maybe powerful but too few people know how to program it.
Bruno