
Re: [lmi] VCS caching


From: Greg Chicares
Subject: Re: [lmi] VCS caching
Date: Fri, 13 Apr 2018 11:32:43 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 2018-04-11 13:01, Vadim Zeitlin wrote:
> On Wed, 11 Apr 2018 12:46:59 +0000 Greg Chicares 
> <address@hidden> wrote:
[...]
> GC> Of course, if you already know the best way to achieve the
> GC> goal I'm trying to express, that spoiler would be welcome.
> 
> I'm afraid I don't even see the problem

I needed to learn more about git and explore various approaches in order to
understand why there actually isn't any problem.

First, although I already could read 'install_wx.sh', I wouldn't have known
how to write it--I didn't understand why you used git in one particular way
when there are other ways. And I could see numerous ways to speed it up by
caching a long-lived local bare repository, but didn't know how to choose
among them. For example...

It seemed weird to iterate through `git submodule status` when git provides
powerful commands that would appear to do that for us, e.g.:

  git clone --depth 1 --shallow-submodules --recurse-submodules \
    file:///cache_for_lmi/vcs/wxWidgets

But that command clones submodules from github, even though the command itself
never mentions github...because '.gitmodules' does. I tried using 'git config'
to point a submodule at my local mirror instead, e.g.:
  git config submodule.src/png.url /cache_for_lmi/vcs/libpng.git
but I discovered that '.gitmodules' governs--as I suppose it must, because
different wx SHA1s may use different submodule SHA1s--so that's why it makes
sense that the '.gitmodules' file is in the repository rather than in $GIT_DIR.
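A minimal offline sketch of where a superproject records its submodules, using a throwaway repository. The submodule path src/png comes from the command above; the URL and the pinned SHA1 are made-up placeholders. The URL sits in the tracked '.gitmodules' file, and the pinned submodule commit sits in the superproject's tree as a "gitlink" (mode 160000), so both travel with each superproject commit.

```shell
#!/bin/sh
# Sketch only: the URL and SHA1 below are hypothetical placeholders, and
# nothing is fetched, so this runs without network access.
set -eu
export GIT_AUTHOR_NAME=example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/super"
cd "$tmp/super"

# Path and URL of the submodule go into the tracked file '.gitmodules'...
git config -f .gitmodules submodule.src/png.path src/png
git config -f .gitmodules submodule.src/png.url https://github.com/example/libpng.git
git add .gitmodules

# ...while the pinned submodule commit is a gitlink in the superproject's tree.
# This SHA1 is made up; a gitlink's target need not exist locally.
pinned=0123456789abcdef0123456789abcdef01234567
git update-index --add --cacheinfo 160000,"$pinned",src/png
git commit -q -m 'Add submodule src/png'

# Both are part of the commit, so a different superproject commit can carry
# a different URL and a different pinned submodule SHA1.
git show HEAD:.gitmodules
git ls-tree HEAD src/png
```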

Even if I split the command above into two steps--clone wx, then handle the
submodules with a single 'git submodule update --recursive' command--it
didn't work properly. This git modification seemed promising:
    https://github.com/git/git/commit/9671a76c174d9bd2b4f56243526fda51f9ff8e46
but I could only ever get the HEAD revision of submodules. These options...
    uploadpack.allowReachableSHA1InWant
    uploadpack.allowTipSHA1InWant
...might help get shallow clones, but submodule depth isn't really an issue
because the submodules are relatively small.
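For reference, here is a sketch of how the first of those options can be used to fetch one particular commit, shallowly, from a local bare cache. It assumes nothing about the real wx cache: a three-commit throwaway repository stands in for it, and the commit fetched is deliberately not a branch tip.

```shell
#!/bin/sh
# Sketch: shallow fetch of a specific, non-tip SHA1 from a bare "cache".
# All repository paths are hypothetical throwaway stand-ins.
set -eu
export GIT_AUTHOR_NAME=example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A small upstream with three commits, mirrored as a bare cache.
git init -q "$tmp/upstream"
for n in 1 2 3; do
    echo "$n" > "$tmp/upstream/file"
    git -C "$tmp/upstream" add file
    git -C "$tmp/upstream" commit -q -m "commit $n"
done
git clone -q --bare "$tmp/upstream" "$tmp/cache.git"

# The wanted commit is reachable but not a branch tip, so let upload-pack
# in the cache accept requests for such commits.
wanted=$(git -C "$tmp/cache.git" rev-parse HEAD~1)
git -C "$tmp/cache.git" config uploadpack.allowReachableSHA1InWant true

# Fetch exactly that SHA1, one commit deep, then check it out (detached).
git init -q "$tmp/work"
git -C "$tmp/work" fetch -q --depth 1 "file://$tmp/cache.git" "$wanted"
git -C "$tmp/work" checkout -q FETCH_HEAD
git -C "$tmp/work" log --oneline
```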

Eventually I realized that in 'install_wx.sh' you had already done the least
unappealing thing that actually works. Researching the online agonies of
others like me who have tried to get these powerful but half-baked commands
to "just work" has given me insight into how git evolves.

Because this is the only way to learn git and I was already engaged in it, I
decided to explore further. Running 'install_wx.sh' a couple of days ago, I
measured this disk usage:
  $du -sh /opt/lmi/vcs/wxWidgets
  767M    /opt/lmi/vcs/wxWidgets
and I remember that getting everything from github took several minutes. With
that as the baseline, how might I most effectively use a local bare repository?

Use github as primary and local cache as '--reference'? Disk usage is lower
(at first, I had no idea why, but that mystery is resolved far below):
  410M    /cache_for_lmi/vcs
  2.1M    /tmp/vcs/wxWidgets/.git
  215M    /tmp/vcs/wxWidgets
and of course it's much faster. However, even if everything we need is already
in the local bare repository, this still requires access to github. But I want
everything to work even if my internet connection goes down (as long as the
cached bare repository is sufficiently up to date).
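A sketch of the '--reference' arrangement, with a local throwaway repository standing in for github so that it runs offline (all paths are hypothetical). It shows why the primary URL is still needed: the cache is only recorded as an alternate object store, while 'origin' keeps pointing at the primary source.

```shell
#!/bin/sh
# Sketch: primary source plus a local cache used via --reference.
# "upstream" is a hypothetical stand-in for github.
set -eu
export GIT_AUTHOR_NAME=example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/upstream"
echo hello > "$tmp/upstream/README"
git -C "$tmp/upstream" add README
git -C "$tmp/upstream" commit -q -m initial
git clone -q --bare "$tmp/upstream" "$tmp/cache.git"

# Objects already present in the cache are borrowed rather than copied:
# the clone lists the cache's object directory as an alternate.
git clone -q --reference "$tmp/cache.git" "file://$tmp/upstream" "$tmp/work"
cat "$tmp/work/.git/objects/info/alternates"

# 'origin' is still the primary source, so later fetches must reach it.
git -C "$tmp/work" remote get-url origin
```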

I realize you're suggesting we use a local non-bare repository and build there,
but first I want to explore the idea of using a cached bare repository as a
drop-in replacement for github--i.e.,
  wx_git_url="/cache_for_lmi/vcs/wxWidgets.git" install_wx.sh
for all the following experiments. To save time, I locally added 'exit' in
'install_wx.sh' right above
  [ "$wx_skip_clean" = 1 ] || git clean -dfx
so it does all of the cloning (including submodules) but none of the building.

How about '--shared'? IOW:
  git clone --shared "$wx_git_url" ${wx_dir##*/}
I don't think '--shared' is risky at all for our use case: this is a throwaway
clone, used only for building a particular version of wx, one time only. The
disk savings are the same as '--reference' above (which makes sense because
one implies the other):
  410M    /cache_for_lmi/vcs
  2.1M    /tmp/vcs/wxWidgets/.git
  215M    /tmp/vcs/wxWidgets
and it takes only 2.670 s. This seems like a win.
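The same kind of throwaway setup shows what '--shared' does when the cache itself is the source (paths are hypothetical): the clone's own object store stays empty because every object is read through the cache, and 'origin' names the cache, so no network access is involved.

```shell
#!/bin/sh
# Sketch: the bare cache used as a drop-in source with --shared.
set -eu
export GIT_AUTHOR_NAME=example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/upstream"
echo hello > "$tmp/upstream/README"
git -C "$tmp/upstream" add README
git -C "$tmp/upstream" commit -q -m initial
git clone -q --bare "$tmp/upstream" "$tmp/cache.git"

# Clone straight from the cache; objects are reached via alternates only.
git clone -q --shared "$tmp/cache.git" "$tmp/work"

# The clone stores no objects of its own (loose count and in-pack are 0)...
git -C "$tmp/work" count-objects -v
# ...and its 'origin' is the cache itself.
git -C "$tmp/work" remote get-url origin
```

Because the clone depends on the cache's objects, this fits a throwaway clone like the one described above; it would be unsafe only if the cache were pruned while the clone still needed them.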

Use local cache, with a shallow clone? Curiously enough, we can't clone a
specific SHA1, but we can tag our cached bare repository--e.g., to get
  e38866d3a603 Merge branch 'lzma'
we do this in the cache directory:
  git tag -a lmi-20180412 e38866d3a603
and then
  git clone --branch lmi-20180412 --depth 1 file:///cache_for_lmi/vcs/wxWidgets
That's no improvement: it takes 6.454 s, more than twice as long as '--shared'
above, and it appears to use more space, too:
  410M    /cache_for_lmi/vcs
  21M     /tmp/vcs/wxWidgets/.git
  215M    /tmp/vcs/wxWidgets
(AIUI, 'du' given multiple directories counts a hard-linked file only in the
first directory where it sees it, so it actually does matter that the .git
directory is ten times as big; and it seems natural that '--depth 1' is
actually less economical, because it means we can't just hard-link the packed
objects in our cache.)
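A sketch of the tag-then-shallow-clone approach, again on a throwaway stand-in for the cache. The tag name lmi-example is hypothetical, patterned on lmi-20180412, and '-m' is added only so that 'git tag -a' doesn't open an editor. It uses a file:// URL as above; with a plain local path, git ignores '--depth' and warns to use file:// instead.

```shell
#!/bin/sh
# Sketch: tag a non-tip commit in the bare cache, then shallow-clone that tag.
set -eu
export GIT_AUTHOR_NAME=example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=example GIT_COMMITTER_EMAIL=example@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/upstream"
for n in 1 2; do
    echo "$n" > "$tmp/upstream/file"
    git -C "$tmp/upstream" add file
    git -C "$tmp/upstream" commit -q -m "commit $n"
done
git clone -q --bare "$tmp/upstream" "$tmp/cache.git"

# Tag the older commit in the cache (hypothetical tag name)...
old=$(git -C "$tmp/cache.git" rev-parse HEAD~1)
git -C "$tmp/cache.git" tag -a -m 'pinned wx revision' lmi-example "$old"

# ...so that 'clone --branch' can name it; HEAD ends up detached there.
git clone -q --branch lmi-example --depth 1 "file://$tmp/cache.git" "$tmp/work"
git -C "$tmp/work" log --oneline
```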

Use '--separate-git-dir'?
  $git init --separate-git-dir /cache_for_lmi/vcs/wxWidgets.git /tmp/vcs/wxWidgets
  Reinitialized existing Git repository in /cache_for_lmi/vcs/wxWidgets.git/
    [that's harmless--the message is just informational]
  $cd wxWidgets
  $git checkout .
  $git checkout e38866d3a6
So far so good, but now we have no submodules. Undeterred by online naysayers
who say I'll never be able to get them (and should therefore clone instead), I
run just the 'git submodule update' commands in 'install_wx.sh', and--I get no
submodules. But that's because they're all "((null))" here:

$git submodule status
 011f6e6458d888246f94643e293f002073cff489 3rdparty/catch ((null))
 6b2e0e680289cdf92839b2a3f8b0735c84dc9326 src/expat ((null))

and presumably "((null))" is the result of running 'git describe' for the SHA1s
shown (though it'd be nicer to have some error indication in the first column).
Then, doing the silliest thing that could possibly work in 'install_wx.sh'
[reformatted for clarity]:

-git submodule status                   | grep '^-' | cut -d' ' -f2 | while read -r subpath
+git submodule status | sed -e's/^ /-/' | grep '^-' | cut -d' ' -f2 | while read -r subpath

...I do get the submodules. Total time including submodules is 1.564 s, the
best so far, and 'du' reports:
  410M    /cache_for_lmi/vcs
  4.0K    /tmp/vcs/wxWidgets/.git
  107M    /tmp/vcs/wxWidgets
which is also the best so far. Removing my early 'exit' statement, I let it
all build, and...success. The resulting libraries are almost the same size:

/tmp/vcs[0]$ls -o /tmp/opt/lmi/local/lib/*wx*.*
-rw-r--r-- 1 greg 13171364 Apr 13 00:25 /tmp/opt/lmi/local/lib/libwx_mswu-3.1-i686-w64-mingw32.dll.a
-rwxr-xr-x 1 greg 19408987 Apr 13 00:25 /tmp/opt/lmi/local/lib/wxmsw312u_gcc_gcc-7.3-win32-e38866d...627348.dll

as the ones I built the other day using 'install_msw.sh':

/tmp/vcs[0]$ls -o /opt/lmi/local/lib/{libwx_,wxmsw}*
-rw-r--r-- 1 greg 13190684 Apr 11 09:10 /opt/lmi/local/lib/libwx_mswu-3.1-i686-w64-mingw32.dll.a
-rwxr-xr-x 1 greg 19458964 Apr 11 09:10 /opt/lmi/local/lib/wxmsw312u_gcc_gcc-7.3-win32-e38866d...627348.dll

so they'll presumably work when I test them.
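Why the added 'sed' step in the submodule loop is enough: per the 'git submodule status' documentation, the first column is '-' for an uninitialized submodule, '+' when the checked-out commit differs from the one recorded in the superproject, 'U' for merge conflicts, and a space otherwise. This sketch feeds the two status lines quoted above through both versions of the filter.

```shell
#!/bin/sh
# The two 'git submodule status' lines quoted in the message.
set -eu
status=' 011f6e6458d888246f94643e293f002073cff489 3rdparty/catch ((null))
 6b2e0e680289cdf92839b2a3f8b0735c84dc9326 src/expat ((null))'

# Original filter: only '-' lines pass 'grep', so nothing is selected here.
before=$(printf '%s\n' "$status" | grep '^-' | cut -d' ' -f2)

# Modified filter: the leading space becomes '-', so both lines pass; each
# line then reads "-SHA1 path ...", so field 2 of 'cut' is the path.
after=$(printf '%s\n' "$status" | sed -e's/^ /-/' | grep '^-' | cut -d' ' -f2)

printf 'before: [%s]\n' "$before"
printf 'after:\n%s\n' "$after"
```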

After building, 'du' says:

  412M    /cache_for_lmi/vcs
  4.0K    /tmp/vcs/wxWidgets/.git
  357M    /tmp/vcs/wxWidgets

412M + 4.0K + 357M comes to about 769M, almost identical to the 767M reported
above for the run of 'install_msw.sh' a couple of days ago. (I've run 'fetch'
on my cached bare repository since creating it; that could easily account for
the small difference.)

And it seems that, after much learning, I've come to the right answer
(and now I understand why it's right):

> if the idea of using /srv/cache_for_lmi/vcs/wxWidgets
> (and xxx.git alongside it for the submodules) and building in some other
> directory (/opt/lmi/local/wx-scratch or wherever, it really doesn't matter)
> is acceptable, then this is certainly what we should do.

Yup!

>  I don't know if I should make a patch for this as the changes seem so
> trivial, but please let me know if I should.

No--if I had said 'yes', I'd have learned nothing: it's trivial, but only
in retrospect.

But please tell me if there's a better technique than '--separate-git-dir'.


