bug-gnulib

Re: gnulib-tool: .XXXignore files generation


From: Bruno Haible
Subject: Re: gnulib-tool: .XXXignore files generation
Date: Thu, 28 Jun 2018 01:24:15 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-128-generic; KDE/5.18.0; x86_64; ; )

Hi Dmitry,

> it was 
> designed so that it is generic and can be applied both to git and CVS with 
> small changes, which is a nice property

Yes. Although CVS has been dead for several years already, it's good to keep it
like this, so that it may be easier to add support for hg, svn, or other future
version control systems.

> ). Could someone describe how it works? 
> As far as I understand, we must iterate over added_files and removed_files so 
> that on each iteration we yield the following information:

In the current gnulib-tool, this is lines 5744..5842.

We start by collecting the list of added and removed files. For simplicity of
the next steps, we combine them into a single list, with a marker 'A' for an
added or 'R' for a removed file.
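
In Python, the combined list could be built roughly as follows. This is only a
sketch: the function name combine_changes and the (marker, path) tuple layout
are assumptions made for illustration, not something taken from gnulib-tool.

```python
def combine_changes(added_files, removed_files):
    """Merge both lists into one list of (marker, path) pairs.

    'A' marks an added file, 'R' a removed file (hypothetical layout).
    """
    changes = [("A", path) for path in added_files]
    changes += [("R", path) for path in removed_files]
    return changes
```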

Then, because there is a .cvsignore or .gitignore per directory, we have
to split the list of files according to directories. For example, if we
have
    m4/gnulib-comp.m4   (added)
    m4/foo.m4           (added)
    lib/sub/bar.c       (removed)
    lib/sub/baz.c       (removed)
we prepare for 2 operations:
  - one operation on m4/.gitignore
  - one operation on lib/sub/.gitignore

Why not 2 individual operations on m4/.gitignore?
  1. because we want to keep a backup file, and creating a backup file
     for the first modification but not for the second one is more complex
     logic,
  2. for speed: When we add 100 entries to a file that already has 100 entries,
     the computation time should be O(100+100), not O(100*100).

So, how do we split the list into one list per directory? In Python you surely
have adequate list operations. In bash, which lacks such list operations,
I used a line-by-line logic:

    After line                        we do
    ---------------------------       ---------------------------------
    m4/gnulib-comp.m4   (added)
                                      current dir is m4, file list is
                                        "gnulib-comp.m4 (added)"
    m4/foo.m4           (added)
                                      current dir is m4, file list is
                                        "gnulib-comp.m4 (added), foo.m4 (added)"
    lib/sub/bar.c       (removed)
                                      invoke foo_done_dir for m4
                                      current dir is lib/sub, file list is
                                        "bar.c (removed)"
    lib/sub/baz.c       (removed)
                                      current dir is lib/sub, file list is
                                        "bar.c (removed), baz.c (removed)"
    EOF
                                      invoke foo_done_dir for lib/sub
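
In Python, the same split can probably be done with a dictionary instead of
tracking a "current dir", so the input does not even need to be ordered by
directory. A minimal sketch, assuming the changes are given as (marker, path)
pairs with marker 'A' or 'R'; the name split_by_directory is hypothetical:

```python
import posixpath


def split_by_directory(changes):
    """Group (marker, path) pairs by directory.

    Returns a dict mapping each directory to a pair of lists
    (added_names, removed_names) that hold base names only.
    """
    per_dir = {}
    for marker, path in changes:
        directory, name = posixpath.split(path)
        added, removed = per_dir.setdefault(directory, ([], []))
        (added if marker == "A" else removed).append(name)
    return per_dir
```

Each dictionary entry then corresponds to one foo_done_dir invocation, that
is, to one operation on that directory's ignore file.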

Inside foo_done_dir, handle .cvsignore and .gitignore separately.

For the list of added ignores (in the particular directory), we
eliminate those that are already listed (since there's no point
in introducing duplicates in the .cvsignore or .gitignore file);
this is the "join -v 1" command.
The list of removed ignores (in the particular directory) is also needed.
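
A Python counterpart of the "join -v 1" step could be a set difference. This is
a sketch under the assumption that both the candidate entries and the existing
lines are already in their final form (no trailing newline, with or without
the leading "/"); the name missing_entries is made up for this example:

```python
def missing_entries(candidates, existing_lines):
    """Return, sorted, the candidates that are not yet in existing_lines."""
    return sorted(set(candidates) - set(existing_lines))
```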

We modify the .cvsignore or .gitignore file only if one of these two lists
(or both) is non-empty.

Before doing the modification, we create the backup file, as usual.

Then, we handle the added ignores. We don't know in which order the developer
would like to have the file sorted, therefore we DON'T sort .cvsignore or
.gitignore; we only sort the part that we add, and we add the new lines at the
end. The developer can reorder them afterwards if desired. This is better
than destroying an existing structure of that file.
For .cvsignore, it means we would add lines
  foo.m4
  gnulib-comp.m4
whereas in .gitignore we need to add lines
  /foo.m4
  /gnulib-comp.m4
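
The append step might look like this in Python. It is a sketch only: the
function name append_ignores and the anchor parameter ("" for .cvsignore, "/"
for .gitignore) are assumptions chosen for illustration.

```python
def append_ignores(existing_lines, added_names, anchor=""):
    """Return existing_lines followed by the sorted new entries.

    The existing lines keep their original order; only the appended
    block is sorted. anchor is "" for .cvsignore, "/" for .gitignore.
    """
    new_entries = sorted(anchor + name for name in added_names)
    return list(existing_lines) + new_entries
```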

The removals are done by creating a sed script. For the example above, it
would contain 2 lines, namely
for .cvsignore:
  /^bar\.c$/d
  /^baz\.c$/d
which eliminates all "bar.c" and "baz.c" lines;
for .gitignore:
  /^\/bar\.c$/d
  /^\/baz\.c$/d
which eliminates all "/bar.c" and "/baz.c" lines.
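
If a Python port wanted to keep generating such a sed script, the backslashing
could be sketched as below. The function name sed_delete_script is
hypothetical, and the set of escaped characters (the basic regular expression
metacharacters plus the "/" delimiter) is my assumption about what needs
quoting, not a copy of what gnulib-tool does.

```python
import re


def sed_delete_script(entries):
    """Build one sed 'd' command per entry, matching the whole line."""

    def escape(text):
        # Backslash-escape characters that are special in a basic regular
        # expression, plus the '/' used as the sed address delimiter.
        return re.sub(r"([\\.\[\]*^$/])", r"\\\1", text)

    return ["/^" + escape(entry) + "$/d" for entry in entries]
```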

In Python, maybe you don't need to go through a sed script but can do this
all in one run. If you do so and your filter function eliminates lines with
given contents, you also don't need the regular expressions and the implied
backslashing; just use a hashed set
  { "bar.c", "baz.c" }
or
  { "/bar.c", "/baz.c" }
respectively, and process the .cvsignore or .gitignore file line by line with
this filter.
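
Such a filter could be as short as the following sketch. The name
remove_ignores is hypothetical, and the lines are assumed to come from
readlines(), that is, with their trailing newline still attached.

```python
def remove_ignores(lines, removed_entries):
    """Drop every line whose content equals one of removed_entries."""
    unwanted = set(removed_entries)
    return [line for line in lines if line.rstrip("\n") not in unwanted]
```

Because the comparison is on whole lines, an entry like "bar.c" does not
accidentally remove a line such as "foobar.c".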

Bruno



