Re: [gnulib-tool-py] from func_import to GNULibImport class


From: Bruno Haible
Subject: Re: [gnulib-tool-py] from func_import to GNULibImport class
Date: Fri, 04 May 2012 03:30:11 +0200
User-agent: KMail/4.7.4 (Linux/3.1.10-1.9-desktop; KDE/4.7.4; x86_64; ; )

Hi Dmitriy,

> methods for --import, --add-import,
> --remove-import, --update modes (I think it is a good idea to combine it
> all into one class because the most part of actions are the same for all
> these options).

Absolutely. Yes, these four options act very similarly.

But since this is a big piece of code, and its completion depends on
many other parts (reading of module descriptions, transitive closure
of module lists, conditional dependencies, ...), I would think that now
is a good occasion to create unit tests for some parts that you've
already written and that should work:
  - The --help option,
  - The --version option,
  - The fact that the options --list, --find, ..., --create-testdir, ...
    are each valid but specifying any two of them at the same time will
    yield an error message.
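
A minimal sketch of what such tests could look like with Python's
'unittest' module. The build_parser() function below is only a
hypothetical stand-in built on 'argparse'; the real option handling
in gnulib-tool-py may look quite different, and only three of the
mode options are shown.

```python
import argparse
import unittest


def build_parser():
    """Hypothetical stand-in for the real command-line parser."""
    parser = argparse.ArgumentParser(prog="gnulib-tool.py")
    parser.add_argument("--version", action="version", version="0.0-sketch")
    # The mode options are valid one at a time, but not together.
    modes = parser.add_mutually_exclusive_group()
    modes.add_argument("--list", action="store_true")
    modes.add_argument("--find", action="store_true")
    modes.add_argument("--create-testdir", action="store_true")
    return parser


class OptionTests(unittest.TestCase):
    def exit_code(self, argv):
        # argparse reports --help, --version and usage errors via SystemExit.
        with self.assertRaises(SystemExit) as ctx:
            build_parser().parse_args(argv)
        return ctx.exception.code

    def test_help_and_version_exit_cleanly(self):
        self.assertEqual(self.exit_code(["--help"]), 0)
        self.assertEqual(self.exit_code(["--version"]), 0)

    def test_each_mode_alone_is_accepted(self):
        for mode in ("--list", "--find", "--create-testdir"):
            build_parser().parse_args([mode])

    def test_two_modes_are_rejected(self):
        # argparse exits with status 2 on a usage error.
        self.assertEqual(self.exit_code(["--list", "--find"]), 2)
```

Running 'python -m unittest' on a file containing this class executes
the three tests.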

If you don't do this now, the danger is that there are some loose ends
(pieces of unfinished or untested code in the already existing code) and
when you come back to this code in two or three months, you won't remember
what these loose ends are.

There are, of course, ways to mitigate this danger, such as putting
a complete(!) set of FIXMEs and TODOs and TOTESTs in various places.
But in my experience it's just more efficient to handle most of the
FIXMEs and TODOs and TOTESTs sooner rather than pushing them out - except
for those TODOs that are likely to be difficult and unlikely to be
important.

> However it is difficult to understand what code from lines
> 4141-4247 (as it appears in the latest version of gnulib-tool) does (here
> script defines my_sed_traces variable). Could someone help me with parsing?

These are 'sed' scripts. The reference manual is [1], especially the
chapter "sed Programs" [2].

The my_sed_traces variable could also be stored in a file (then we
would use 'sed -n -f filename' instead of 'sed -n -e "$my_sed_traces"').

Each of the statements denotes a processing done on each input line.
For example, the first line
  s,#.*$,,
means: If the line contains a '#' character, then delete it and the rest
of the line.

The second line
  s,^dnl .*$,,
means: If the line starts with the string "dnl ", then delete the entire
contents of the line.
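
In Python, these two statements could be sketched roughly as follows.
The function name strip_comments is made up for illustration, and the
sketch assumes the line is passed in without its trailing newline.

```python
import re


def strip_comments(line):
    """Rough equivalent of 's,#.*$,,' followed by 's,^dnl .*$,,'."""
    # Delete a '#' character and the rest of the line.
    line = re.sub(r"#.*$", "", line)
    # If the line starts with "dnl ", delete its entire contents.
    line = re.sub(r"^dnl .*$", "", line)
    return line
```

Note that, as in the 'sed' script, a "dnl" that is not at the start of
the line, or that is not followed by a space, is left alone.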

The fourth statement

      /gl_LOCAL_DIR(/ {
        s,^.*gl_LOCAL_DIR([[ ]*\([^]"$`\\)]*\).*$,cached_local_gnulib_dir="\1",p
      }

means: If the line contains the string 'gl_LOCAL_DIR(' then extract
the part within parentheses that does not contain ] or " or $ or ` or \
or ) characters, put it in double-quotes, prefix it with
cached_local_gnulib_dir= and print it. The 'eval' later evaluates it,
thus assigning to the shell variable 'cached_local_gnulib_dir'.

In Python, you don't want to have such a thing as 'eval'. We used 'sed'
and 'eval' only because that's the fastest way to handle a file line-by-line
with complicated regular expression matching.

You can drop the test for " or $ or ` or \ here, because these characters
were rejected only to avoid shell syntax errors during 'eval'.

So, what this boils down to, is:
  - Search for the string 'gl_LOCAL_DIR('.
  - After this string, drop opening brackets and whitespace (in any order).
  - Then collect characters until you see a closing bracket or a closing
    parenthesis.
  - Then drop any immediately following closing brackets.
  - Then you should see a closing parenthesis.
  - If this parsing succeeds, take the collected characters and assign them
    to the variable 'cached_local_gnulib_dir'. If not, just do nothing and
    keep the line unchanged.

You are free to implement this in Python through hand-crafted processing
code, or through regular expressions, whichever you find better.
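
With regular expressions, the steps above could be sketched like this.
The names _LOCAL_DIR_RE and parse_local_dir are hypothetical, and the
sketch returns the value instead of assigning a variable. One small
difference: the 'sed' pattern starts with a greedy '^.*' and therefore
picks the last 'gl_LOCAL_DIR(' on a line, while re.search picks the
first.

```python
import re

# gl_LOCAL_DIR(  then brackets/spaces, then the value, then ]* and ).
_LOCAL_DIR_RE = re.compile(r"gl_LOCAL_DIR\([\[ ]*([^\])]*)\]*\)")


def parse_local_dir(line):
    """Return the argument of gl_LOCAL_DIR(...), or None if absent."""
    match = _LOCAL_DIR_RE.search(line)
    if match is None:
        return None  # parsing failed: do nothing
    return match.group(1)
```

For example, parse_local_dir("gl_LOCAL_DIR([gl/override])") returns
"gl/override", and a line without 'gl_LOCAL_DIR(' yields None.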

The processing unit in 'sed' normally is a line. But here we have a loop
that concatenates adjacent lines in some cases:

      /gl_MODULES(/ {
        ta
        :a
          s/)/)/
          tb
          N
          ba
        :b
        s,^.*gl_MODULES([[ ]*\([^]"$`\\)]*\).*$,cached_specified_modules="\1",p
      }

(Yes, this is a loop: The 'ba' statement means "goto :a". And the 'tb'
statement means: If the previous s/)/)/ statement succeeded in finding a
closing parenthesis and replacing it with itself, then "goto :b". The 'N'
command does the line concatenation.)
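
A possible Python counterpart of this loop, as a sketch only. The names
_MODULES_RE and parse_modules are hypothetical; the input is assumed to
be a list of lines without trailing newlines, with comments already
removed. Unlike the shell code, which stores the raw string, the sketch
splits the result into a list of module names.

```python
import re

# gl_MODULES(  then brackets/spaces, then the value, then ]* and ).
_MODULES_RE = re.compile(r"gl_MODULES\([\[ ]*([^\])]*)\]*\)")


def parse_modules(lines):
    """Return the module names listed in gl_MODULES(...), or None."""
    iterator = iter(lines)
    for line in iterator:
        if "gl_MODULES(" not in line:
            continue
        buffer = line
        # Counterpart of ':a ... N ... ba': append following lines
        # until the buffer contains a closing parenthesis.
        while ")" not in buffer:
            following = next(iterator, None)
            if following is None:
                return None  # end of input before the closing parenthesis
            buffer += "\n" + following
        match = _MODULES_RE.search(buffer)
        if match is None:
            return None
        # The value may span several lines; split it on whitespace.
        return match.group(1).split()
    return None
```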

When you test your rewrite, take a sample gnulib-cache.m4 file such as [3].

The second my_sed_traces is similar, but parses a gnulib-comp.m4 file such
as [4]. Its first 3 statements eliminate shell comments and m4 comments.
Then it looks for a line that contains 'AC_DEFUN([gl_FILE_LIST], ['
(more precisely, instead of 'gl' use the value of the variable
cached_macro_prefix), processes the following lines up to the closing
parenthesis, and accumulates all these lines in the variable
'cached_files'.
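
The same idea as a Python sketch. The name parse_file_list is
hypothetical, and the sketch assumes that the list ends at a line
beginning with '])', which is an assumption about the layout of
gnulib-comp.m4 that should be checked against a real file. The input is
assumed to be a list of lines without trailing newlines, with comments
already removed.

```python
def parse_file_list(lines, macro_prefix="gl"):
    """Return the file names inside AC_DEFUN([<prefix>_FILE_LIST], [...])."""
    start = "AC_DEFUN([%s_FILE_LIST], [" % macro_prefix
    iterator = iter(lines)
    for line in iterator:
        if start not in line:
            continue
        files = []
        # Accumulate the following lines up to the closing line.
        for entry in iterator:
            if entry.startswith("])"):
                return files
            entry = entry.strip()
            if entry:
                files.append(entry)
        return None  # end of input before the closing line
    return None
```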

I can only give a rough explanation here. For details, it's good to consult
the 'sed' manual, but since 'sed' programming is so tricky, it's even better
to try it on real examples. Put the script into a file and do
  $ sed -n -f filename < inputfile

Bruno

[1] http://www.gnu.org/software/sed/manual/html_node/index.html
[2] http://www.gnu.org/software/sed/manual/html_node/sed-Programs.html
[3] http://git.savannah.gnu.org/gitweb/?p=libidn.git;a=blob_plain;f=gl/m4/gnulib-cache.m4;hb=HEAD
[4] http://git.savannah.gnu.org/gitweb/?p=libidn.git;a=blob_plain;f=gl/m4/gnulib-comp.m4;hb=HEAD



