Re: [lmi] VCS caching


From: Greg Chicares
Subject: Re: [lmi] VCS caching
Date: Fri, 13 Apr 2018 16:12:41 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 2018-04-13 14:14, Vadim Zeitlin wrote:
> On Fri, 13 Apr 2018 11:32:43 +0000 Greg Chicares <address@hidden> wrote:
[...]
> GC> It seemed weird to iterate through `git submodule status` when git provides
> GC> powerful commands that would appear to do that for us, e.g.:
> GC> 
> GC>   git clone --depth 1 --shallow-submodules --recurse-submodules \
> GC>     file:///cache_for_lmi/vcs/wxWidgets
> GC> 
> GC> But that command clones submodules from github,
> 
>  Yes, I don't think there is any way to tell git-clone to get submodules
> from anywhere else but .gitmodules file in the repository being cloned
> itself, which is why I had to split the process in 2: clone the super
> repository first and then submodules. Of course, this is what "git clone
> --recurse-submodules" does internally anyhow, so it's not such a big deal.

'git clone --recurse-submodules' works like this:

  for(auto& i : submodules)
    git submodule update i;

But to use a cached bare repository instead of a remote internet server,
we need

  for(auto& i : submodules)
    {
    i.replace_github_with_local_cache(); // we need to do this ourselves
    git submodule update i;
    }

and you had to write that loop explicitly because git provides no option
to override the remote server. Maybe they'll add that someday: it's been
discussed numerous times (on stackoverflow, e.g.), and it's a pretty
common, reasonable use case.
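
A minimal shell sketch of that loop, assuming the cache holds one bare
repository per submodule, named after the last component of its upstream
URL (the '/cache_for_lmi/vcs/<name>.git' layout is inferred from the
examples in this thread). The helper names 'cached_url' and
'update_submodules_from_cache' are hypothetical, not part of
'install_wx.sh':

```shell
#!/bin/sh
# Hypothetical helper: map an upstream URL to a bare repository in a
# local cache directory.
#   $1 = upstream URL, $2 = cache directory
cached_url() {
    name=${1##*/}        # last path component, e.g. "libpng.git" or "zlib"
    name=${name%.git}    # drop any ".git" suffix so it is not doubled
    printf '%s/%s.git\n' "$2" "$name"
}

# Hypothetical driver: override every submodule URL in the cloned-to
# repository's config, then update all submodules in one command.
# Must run inside the superproject, before the submodules are initialized.
#   $1 = cache directory
update_submodules_from_cache() {
    cache_dir=$1
    git config -f .gitmodules --get-regexp '^submodule\..*\.url$' |
    while read -r key url
    do
        # Writes to .git/config, which overrides the checked-in .gitmodules.
        git config "$key" "$(cached_url "$url" "$cache_dir")"
    done
    git submodule update --init
}
```

Calling 'update_submodules_from_cache /cache_for_lmi/vcs' in a fresh clone
would then stand in for the explicit per-submodule commands, as long as
the assumed naming convention actually matches the cache.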

> GC> even though it doesn't even
> GC> mention github...because '.gitmodules' does. I tried using 'git config' to
> GC> reset my local mirror, e.g.:
> GC>   git config submodule.src/png.url /cache_for_lmi/vcs/libpng.git
> GC> but I discovered that '.gitmodules' governs--as I suppose it must, because
> GC> different wx SHA1s may use different submodule SHA1s--so that's why it makes
> GC> sense that '.gitmodules' file is in the repository rather than in $GIT_DIR.
> 
>  The latter makes sense precisely because you wouldn't be able to clone a
> repository with the submodules otherwise. The former is not really correct,
> however: .git/config submodules option has precedence over .gitmodules and
> this is why "git config submodule.$subpath.url" command in install_wx.sh
> actually works, i.e. the submodule will be cloned from the given URL after
> it's given.

What I meant is that the upstream cloned-from repository's .gitmodules takes
precedence over 'git config' changes to that repository. If we use 'git config'
to modify the downstream cloned-to repository, modifications there do override
the upstream .gitmodules file.

> Note, however, that the submodule URL can only be configured
> like this before it is initialized using "git submodule init" or "git
> submodule update --init", so it can't work _after_ "git clone
> --recurse-submodules", which initializes all the submodules.

Yes: a chicken-and-egg problem. That's why it would be nice to have a
git-clone option to override the remote server:
  git clone --recurse-submodules --override_module_url="/my/cached/vcs"
which would have the effect of inserting
  s!https://github.com/wxWidgets!/my/cached/vcs!
into the recursion, before git-submodule-update is invoked with an
implicit parameter of the upstream '.gitmodules'.

> GC> Even if I split the command above into two steps--clone wx, then handle the
> GC> submodules with a single 'git submodule update --recursive' command, it
> GC> didn't work properly.
> 
>  I'm not sure what exactly didn't work, but this looks very similar to what
> install_wx.sh does, so it really ought to work if done in the right order.

I must have failed to override the URLs correctly, because you're right:

/tmp/vcs[0]$cd /tmp; rm -rf /tmp/vcs/wxWidgets; cd /tmp/vcs
/tmp/vcs[0]$git clone --shared /cache_for_lmi/vcs/wxWidgets.git wxWidgets
/tmp/vcs[0]$cd wxWidgets
/tmp/vcs/wxWidgets[0]$git config submodule.3rdparty/catch.url /cache_for_lmi/vcs/Catch.git
/tmp/vcs/wxWidgets[0]$git config submodule.src/expat.url /cache_for_lmi/vcs/libexpat.git
/tmp/vcs/wxWidgets[0]$git config submodule.src/jpeg.url /cache_for_lmi/vcs/libjpeg-turbo.git
/tmp/vcs/wxWidgets[0]$git config submodule.src/png.url /cache_for_lmi/vcs/libpng.git
/tmp/vcs/wxWidgets[0]$git config submodule.src/tiff.url /cache_for_lmi/vcs/libtiff.git
/tmp/vcs/wxWidgets[0]$git config submodule.src/zlib.url /cache_for_lmi/vcs/zlib.git
/tmp/vcs/wxWidgets[0]$git submodule update --recursive --init

and now 'git submodule status' shows exactly the same output I see in the
complete build that I did the other day.

But I think we should move 'git submodule update' out of the loop. Today
we can already use its '--jobs' parameter to update submodules in parallel.
Then maybe someday git will learn an option like '--override_module_url'
above, and we can eliminate the loop entirely. Specifically, I'd change:

-   git submodule update --init "$subpath"
 done
+
+git submodule update --recursive --jobs $(nproc) --init

('nproc' is in GNU coreutils, so everyone should have it.)
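
Should some environment turn out to lack 'nproc', a fallback could be
sketched as below. The name 'job_count' is hypothetical, and relying on
'getconf _NPROCESSORS_ONLN' is an assumption: it is widely available but
not guaranteed by POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print the number of parallel jobs to use.
# Tries 'nproc' first, then 'getconf', and finally settles for 1.
job_count() {
    n=$(nproc 2>/dev/null) ||
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=1
    printf '%s\n' "$n"
}
```

It would be used as:
  git submodule update --recursive --jobs "$(job_count)" --init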

> GC> Because this is the only way to learn git and I was already engaged in it, I
> GC> decided to explore further. Running 'install_wx.sh' a couple days ago, I
> GC> measure this disk usage:
> GC>   $du -sh /opt/lmi/vcs/wxWidgets
> GC>   767M    /opt/lmi/vcs/wxWidgets
> GC> and I remember that getting everything from github took several minutes. With
> GC> that as the baseline, how might I most effectively use a local bare repository?
> 
>  This is a question I didn't even ask because for me the only important
> overhead to avoid is that of doing a remote clone. Cloning a local
> repository is fast enough to not matter.

Oh. I thought you were concerned about the extra hundreds of MiB on disk,
too, so I was trying to minimize both time and disk usage.

> GC> I realize you're suggesting we use a local non-bare repository and
> GC> build there, but first I want to explore the idea of using a cached
> GC> bare repository as a drop-in replacement for github--i.e.,
> GC>   wx_git_url="/cache_for_lmi/vcs/wxWidgets.git" install_wx.sh
> 
>  FWIW this is exactly how I use the script myself, except that I use a
> repository on another machine on the LAN because cloning from LAN is still
> fast enough, even if it's slower than using local file system, of course.

I'm glad you said that, so that I don't accidentally break your use case,
which differs slightly from mine, which probably differs from Kim's.

I want to maintain a perfectly clean repository on the base system that
hosts my chroots, and never do any git operation there except clone,
fetch, fsck, and maybe prune and gc--so I know it's always good. Then I
could mount that inside each chroot, though in practice I copy it in.

Kim might do the same thing, except with exactly one bare repository
in "C:\cache_for_lmi\vcs".

We also want lmi's msw installation process (in file 'INSTALL') to work
if no cache yet exists.

> GC> How about '--shared'? IOW:
> GC>   git clone --shared "$wx_git_url" ${wx_dir##*/}
> GC> I don't think '--shared' is risky at all for our use case: this is a throwaway
> GC> clone, used only for building a particular version of wx, one time only. The
> GC> disk savings are the same as '--reference' above (which makes sense because
> GC> one implies the other):
> GC>   410M    /cache_for_lmi/vcs
> GC>   2.1M    /tmp/vcs/wxWidgets/.git
> GC>   215M    /tmp/vcs/wxWidgets
> GC> and it takes only 2.670 s. This seems like a win.
> 
>  Yes, I think we lose nothing by adding --shared option to install_wx.sh as
> it will be just ignored when the source repository is not local.

Okay, I'll add '--shared'. I hadn't done that because I didn't know whether
it would actually be ignored in the non-local case: I couldn't find that in
the manual, and neither could this person:

https://superuser.com/questions/778323/
| the documentation starts with "When the repository to clone is on the local
| machine...". clone -s works just as well via SSH, but that leaves me wondering
| what happens if the repository to clone is not on the local machine.

[the question was never answered there]

> GC> Use '--separate-git-dir'?
> GC>   $git init --separate-git-dir /cache_for_lmi/vcs/wxWidgets.git /tmp/vcs/wxWidgets
> 
>  Hmm, I don't think you want to do this: I've never used this option of
> git-init before, but from its description, it doesn't create a new
> repository at all, so why would you use it here?

Precisely because it doesn't create a new repository--which we don't need.

/tmp/vcs[0]$cd /tmp; rm -rf /tmp/vcs/wxWidgets; cd /tmp/vcs
/tmp/vcs[0]$git init --separate-git-dir /cache_for_lmi/vcs/wxWidgets.git /tmp/vcs/wxWidgets
Reinitialized existing Git repository in /cache_for_lmi/vcs/wxWidgets.git/
/tmp/vcs[0]$cd wxWidgets 
/tmp/vcs/wxWidgets[0]$ls -Ao 
total 4
-rw-r--r-- 1 greg 41 Apr 13 15:28 .git
/tmp/vcs/wxWidgets[0]$cat .git 
gitdir: /cache_for_lmi/vcs/wxWidgets.git

Then this command creates a working copy:

/tmp/vcs/wxWidgets[0]$git checkout .

and this one gives us the SHA1 we want:

/tmp/vcs/wxWidgets[0]$git checkout e38866d3a6
HEAD is now at e38866d3a6 Merge branch 'lzma'

>  If you just want a new working tree, you should use git-worktree instead.
> This should be the most space- and time-efficient way to do it,

I'm guessing that git-worktree is just a layer of porcelain to automate
what I did above. What could be more efficient than a 41-byte '.git' file?

> but it
> won't work with remote repositories, so install_wx.sh would have to be made
> more complicated to use git-worktree for local repositories or the current
> code for the remote ones and IMHO it's not worth it, when the current
> version works reasonably well in both cases.

So git-worktree, and the worktree simulation above, would break your
use case. Similarly, it wouldn't work for Kim if for some reason she
placed her cache directory on another drive. Therefore, that's not a
feasible solution.

How do we make this work robustly in 'install_msw.sh', which says only
  ./install_wx.sh
today? I don't know where your LAN repository is, so I don't know how
to detect it--should we inspect some environment variable, or should we
assume that you'll mount it under '/cache_for_lmi', which is hardcoded
in 'install_msw.sh' anyway?
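
One possible shape for that choice, sketched under assumptions: an
explicitly set 'wx_git_url' (e.g. a LAN repository) wins, then a bare
repository under the cache directory if one exists, and the upstream
server otherwise, so that installation still works with no cache. The
function name 'choose_wx_git_url' and the fallback URL are assumptions
for illustration only:

```shell
#!/bin/sh
# Hypothetical helper: decide where install_wx.sh should clone from.
#   $1 = cache directory, e.g. /cache_for_lmi/vcs
choose_wx_git_url() {
    if [ -n "${wx_git_url:-}" ]; then
        # An explicit override in the environment takes precedence.
        printf '%s\n' "$wx_git_url"
    elif [ -d "$1/wxWidgets.git" ]; then
        # A local bare cache exists: use it instead of the network.
        printf '%s\n' "$1/wxWidgets.git"
    else
        # No cache yet: fall back to the (assumed) upstream URL.
        printf '%s\n' "https://github.com/wxWidgets/wxWidgets.git"
    fi
}
```

'install_msw.sh' might then say something like:
  wx_git_url=$(choose_wx_git_url /cache_for_lmi/vcs) ./install_wx.sh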


