Re: [lmi] VCS caching


From: Vadim Zeitlin
Subject: Re: [lmi] VCS caching
Date: Fri, 13 Apr 2018 19:56:55 +0200

On Fri, 13 Apr 2018 16:12:41 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2018-04-13 14:14, Vadim Zeitlin wrote:
GC> > On Fri, 13 Apr 2018 11:32:43 +0000 Greg Chicares <address@hidden> wrote:
GC> [...]
GC> > GC> It seemed weird to iterate through `git submodule status` when git provides
GC> > GC> powerful commands that would appear to do that for us, e.g.:
GC> > GC> 
GC> > GC>   git clone --depth 1 --shallow-submodules --recurse-submodules \
GC> > GC>     file:///cache_for_lmi/vcs/wxWidgets
GC> > GC> 
GC> > GC> But that command clones submodules from github,
GC> > 
GC> >  Yes, I don't think there is any way to tell git-clone to get submodules
GC> > from anywhere else but .gitmodules file in the repository being cloned
GC> > itself, which is why I had to split the process in 2: clone the super
GC> > repository first and then submodules. Of course, this is what "git clone
GC> > --recurse-submodules" does internally anyhow, so it's not such a big deal.
GC> 
GC> 'git clone --recurse-submodules' works like this:
GC> 
GC>   for(auto& i : submodules)
GC>     git submodule update i;
GC> 
GC> But to use a cached bare repository instead of a remote internet server,
GC> we need
GC> 
GC>   for(auto& i : submodules)
GC>     {
GC>     i.replace_github_with_local_cache(); // we need to do this ourselves
GC>     git submodule update i;
GC>     }
GC> 
GC> and you had to write that loop explicitly because git provides no option
GC> to override the remote server. Maybe they'll add that someday: it's been
GC> discussed numerous times on stackoverflow e.g., and it's a pretty common,
GC> reasonable use case.

 I don't think it's possible to add it: I thought about writing my own
Git subcommand for doing this and I just couldn't find any good interface
for it. The trouble is that you need to specify the URLs for all submodules
in some way and I just don't see any way to do it generically.
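
 For the special case discussed in this thread, where every submodule URL
shares one prefix, the rewriting step could be sketched in POSIX shell as
below. This is only an illustration and not the code from install_wx.sh:
override_url and redirect_submodules are hypothetical names, and the sketch
assumes that "git submodule init" leaves an already configured
submodule.<name>.url alone, so that a later "git submodule update --init"
picks up the rewritten URL. Nested submodules are not handled.

```shell
#!/bin/sh
# Hypothetical helper: replace the prefix $2 of URL $1 with $3.
# URLs that do not start with $2 are printed unchanged.
override_url() {
    url=$1 from=$2 to=$3
    case "$url" in
        "$from"*) printf '%s%s\n' "$to" "${url#"$from"}" ;;
        *)        printf '%s\n' "$url" ;;
    esac
}

# Hypothetical helper: read every submodule URL from .gitmodules and
# store a rewritten copy in the local repository configuration.
# Must be run from the top of the super repository's working tree.
redirect_submodules() {
    git config --file .gitmodules --get-regexp '^submodule\..*\.url$' |
    while read -r key url; do
        name=${key#submodule.}
        name=${name%.url}
        git config "submodule.$name.url" "$(override_url "$url" "$1" "$2")"
    done
}
```

 It would be called as, e.g.,

        redirect_submodules https://github.com/wxWidgets /my/cached/vcs

before running "git submodule update --init".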

GC> What I meant is that the upstream cloned-from repository's .gitmodules takes
GC> precedence over 'git config' changes to that repository.

 Ah, yes, of course, sorry for misunderstanding you. Just for the record,
the contents of the .git directory (be it the config file itself or
info/exclude or info/attributes or whatever) only ever affect that
repository itself, not its clones.

GC> Yes: a chicken-and-egg problem. That's why it would be nice to have a
GC> git-clone option to override the remote server:
GC>   git clone --recurse-submodules --override_module_url="/my/cached/vcs"
GC> which would have the effect of inserting
GC>   s!https://github.com/wxWidgets!/my/cached/vcs!
GC> into the recursion, before git-submodule-update is invoked with an
GC> implicit parameter of the upstream .gitmodules .

 Yes, I guess this is the only possibility, similarly to how
git-filter-branch can take an arbitrarily complex shell command to do
whatever it does. However, I don't consider git-filter-branch a particularly
simple command to use, and the best I can say in favour of its syntax is
that it's not used often and the syntax barrier is, perhaps, appropriate
for a command doing dangerous things like git-filter-branch does. I don't
think it would be such a great idea for git-clone...

GC> But I think we should move 'git submodule update' out of the loop. Today
GC> we can already use its '--jobs' parameter to update submodules in parallel.
GC> Then maybe someday git will learn an option like '--override_module_url'
GC> above, and we can eliminate the loop entirely. Specifically, I'd change:
GC> 
GC> -   git submodule update --init "$subpath"
GC>  done
GC> +
GC> +git submodule update --recursive --jobs $(nproc) --init
GC> 
GC> ('nproc' is in GNU coreutils, so everyone should have it.)

 Yes, using --jobs is a good idea, thanks. It's a relatively new option and
I still haven't got used to using it, partly because I did

        $ git config --global submodule.fetchJobs $(nproc)

on a couple of my machines and forgot everything about it afterwards.
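
 In case nproc is ever missing on some machine, the job count could be
computed defensively. This is just a sketch: job_count is a hypothetical
helper, and the getconf fallback assumes a system that knows
_NPROCESSORS_ONLN, which is common but not guaranteed.

```shell
#!/bin/sh
# Hypothetical helper: print the number of parallel jobs to use.
# Tries nproc first, then getconf, and falls back to 1.
job_count() {
    n=$(nproc 2>/dev/null) ||
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
        n=1
    # Guard against empty or non-numeric output.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    printf '%s\n' "$n"
}
```

 It could then be used as

        git submodule update --init --recursive --jobs "$(job_count)"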


GC> > GC> Because this is the only way to learn git and I was already engaged in it, I
GC> > GC> decided to explore further. Running 'install_wx.sh' a couple days ago, I
GC> > GC> measure this disk usage:
GC> > GC>   $du -sh /opt/lmi/vcs/wxWidgets
GC> > GC>   767M    /opt/lmi/vcs/wxWidgets
GC> > GC> and I remember that getting everything from github took several minutes. With
GC> > GC> that as the baseline, how might I most effectively use a local bare repository?
GC> > 
GC> >  This is a question I didn't even ask because for me the only important
GC> > overhead to avoid is that of doing a remote clone. Cloning a local
GC> > repository is fast enough to not matter.
GC> 
GC> Oh. I thought you were concerned about the extra hundreds of MiB on disk,
GC> too, so I was trying to minimize both time and disk usage.

 It's a nice bonus, but I didn't really think about it.

GC> Okay, I'll add '--shared'. I hadn't done that because I didn't know whether
GC> it would actually be ignored in the non-local case: I couldn't find that in
GC> the manual, and neither could this person:
GC> 
GC> https://superuser.com/questions/778323/
GC> | the documentation starts with "When the repository to clone is on the local
GC> | machine...". clone -s works just as well via SSH, but that leaves me wondering
GC> | what happens if the repository to clone is not on the local machine.
GC> 
GC> [the question was never answered there]

 OK, I wrote that it would be ignored just because it seemed like the only
logical thing it could do (other than giving an error), but now I've tested
this and I can confirm that the --shared option is indeed ignored for
non-local repositories.

GC> >  If you just want a new working tree, you should use git-worktree instead.
GC> > This should be the most space- and time-efficient way to do it,
GC> 
GC> I'm guessing that git-worktree is just a layer of porcelain to automate
GC> what I did above.

 It's not impossible, but this porcelain does add value, e.g. it also gives
you commands such as "git worktree list" and "git worktree prune".

GC> What could be more efficient than a 41-byte '.git' file?

 Well, you also needed 2 "git checkout" commands in addition to this file.
With git-worktree you can use just a single command which does everything.

GC> So git-worktree, and the worktree simulation above, would break your
GC> use case.

 Yes.

GC> Similarly, it wouldn't work for Kim if for some reason she placed her
GC> cache directory on another drive.

 I'm not sure why it is important for the directories to be on the same
drive?

GC> How do we make this work robustly in 'install_msw.sh', which says only
GC>   ./install_wx.sh
GC> today? I don't know where your LAN repository is, so I don't know how
GC> to detect it

 This isn't especially advanced, but I think something like this should do:

        case "$wx_git_url" in
                file://*)
                        is_local=1
                        ;;

                *:*)
                        is_local=0
                        ;;

                *)
                        is_local=1
                        ;;
        esac

This doesn't work correctly for local paths with colons in them, but I
think it's far from unreasonable to just forbid them.
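
 The same test could be wrapped in a function so that it can be tried on a
few URLs before being used in a script. This is a sketch only: is_local_url
is a hypothetical name, and it prints 1 or 0 instead of setting is_local.

```shell
#!/bin/sh
# Hypothetical helper: print 1 if $1 looks like a local repository
# location, 0 otherwise. Local paths containing colons are
# (deliberately) reported as remote.
is_local_url() {
    case "$1" in
        file://*) echo 1 ;;   # explicit file URL
        *:*)      echo 0 ;;   # scheme://... or scp-like host:path
        *)        echo 1 ;;   # plain path
    esac
}
```

 With this, "is_local_url file:///cache_for_lmi/vcs/wxWidgets" and
"is_local_url /opt/lmi/vcs/wxWidgets" both print 1, while an https URL or an
scp-like "user@host:path" prints 0.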

 Of course, all this still assumes that it's actually worth doing all this
instead of just cloning using --shared and personally I don't think so...

VZ

