bug-guix
bug#65720: Guile-Git-managed checkouts grow way too much


From: Csepp
Subject: bug#65720: Guile-Git-managed checkouts grow way too much
Date: Mon, 11 Sep 2023 09:06:58 +0200

Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès <ludo@gnu.org> wrote:
>
>>>> It would also be pretty bad for closure size:
>>>>
>>>> --8<---------------cut here---------------start------------->8---
>>>> $ guix size guile-git | tail -1
>>>> total: 106.6 MiB
>>>> $ guix size guile-git git-minimal | tail -1
>>>> total: 169.8 MiB
>>>> --8<---------------cut here---------------end--------------->8---
>>>>
>>>> It’s also not clear concretely how we’d add that dependency.  Try
>>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>>>> But then, what about applications like Cuirass and hpcguix-web?
>>>
>>> I think we can rely on something like,
>>>
>>>     guix shell -C git-minimal -- git gc
>>
>> We’re talking about the implementation of a cache (meant to speed up
>> operations), that would actually fill said cache plus do a whole bunch
>> of expensive operations?  Nah.  :-)
>
> I do not think so.  If I understand correctly, we need to run “git gc”
> at some point, therefore git-minimal needs to be around.  The question
> is how and when.
>
> Well, maybe I am missing what the bug is about.  For me, it is about
> running ‘git gc’ for cleaning the Git checkout cache, no?
>
>
> Solution #1.  Add git-minimal as inputs.  It increases the closure and
> the extra load (on average) is about the ratio between the rate of “guix
> pull” and the rate of the git-minimal changes.
>
> Assume that people run “guix pull” once per week and that, say, “git
> gc” is run after 50 pulls.  (Both of these numbers are totally
> arbitrary and based on my personal estimate.)
>
> Data Service [1] tells:
>
>         2023-07-07 15:45:22 2023-09-08 21:22:08
>         2023-05-11 16:10:48 2023-07-07 14:21:45
>         2023-05-01 16:40:08 2023-05-11 14:36:16
>         2023-04-25 13:34:54 2023-05-01 15:19:55
>         2023-04-25 13:34:54 2023-09-08 21:22:08        
>         2023-03-06 17:22:28 2023-04-25 12:27:33
>         2023-01-17 23:49:19 2023-03-06 16:48:43
>         2022-11-08 13:06:42 2023-01-17 15:11:47
>         2022-10-08 05:14:46 2022-11-08 09:56:31
>         2022-09-06 15:00:08 2022-10-08 04:15:43
>         2022-08-13 22:02:31 2022-09-06 12:58:52
>
>
> It means that a user will download git-minimal ~10 times for nothing.
>
>
> Solution #2.  The one I am proposing. :-)  Download git-minimal only
> when Guix needs it for running “git gc”.  Yeah, there is probably a
> small overhead with some operations.  But I bet this overhead is much
> smaller than the one of solution #1.
>
> Well, it depends on the number of times people are updating the cache vs
> the rate of change of git-minimal.
>
> For sure, if one updates the cache 100 times per week, having
> git-minimal as inputs is far better.  But I do not think that is the
> regular usage on average. :-)
>
> That’s why I am proposing to have an option for turning off this “git
> gc” operation.
>
> Well, we have lived for years without running ‘git gc’, so running it
> once per year on average is probably enough to keep the cache size
> reasonable.  And git-minimal is changing every month.
>
>
> Maybe, there is some solution #3. ;-)
>
> Cheers,
> simon
>
>
> 1: 
> https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history
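
For reference, the "~10 downloads" estimate quoted above can be checked
with a rough back-of-the-envelope sketch.  The timestamps are copied
from the quoted Data Service listing; the function names are made up for
illustration, and counting distinct first-seen timestamps as "versions"
is my assumption (the row that repeats 2023-04-25 is counted once).  The
defaults of one pull per week and one gc per 50 pulls are the arbitrary
numbers from the quoted message.

```python
from datetime import datetime

# (first seen, last seen) pairs copied from the quoted Data Service listing.
OUTPUT_HISTORY = [
    ("2023-07-07 15:45:22", "2023-09-08 21:22:08"),
    ("2023-05-11 16:10:48", "2023-07-07 14:21:45"),
    ("2023-05-01 16:40:08", "2023-05-11 14:36:16"),
    ("2023-04-25 13:34:54", "2023-05-01 15:19:55"),
    ("2023-04-25 13:34:54", "2023-09-08 21:22:08"),
    ("2023-03-06 17:22:28", "2023-04-25 12:27:33"),
    ("2023-01-17 23:49:19", "2023-03-06 16:48:43"),
    ("2022-11-08 13:06:42", "2023-01-17 15:11:47"),
    ("2022-10-08 05:14:46", "2022-11-08 09:56:31"),
    ("2022-09-06 15:00:08", "2022-10-08 04:15:43"),
    ("2022-08-13 22:02:31", "2022-09-06 12:58:52"),
]

FMT = "%Y-%m-%d %H:%M:%S"
SECONDS_PER_WEEK = 7 * 24 * 3600


def versions_per_week(history):
    """Distinct first-seen timestamps divided by the covered time span."""
    starts = {datetime.strptime(first, FMT) for first, _ in history}
    ends = [datetime.strptime(last, FMT) for _, last in history]
    span_weeks = (max(ends) - min(starts)).total_seconds() / SECONDS_PER_WEEK
    return len(starts) / span_weeks


def unused_downloads(history, pulls_per_week=1.0, pulls_per_gc=50):
    """Expected git-minimal versions appearing between two 'git gc' runs.

    Assumes pulls are frequent enough that every version gets fetched.
    """
    weeks_between_gc = pulls_per_gc / pulls_per_week
    return versions_per_week(history) * weeks_between_gc


if __name__ == "__main__":
    print(round(unused_downloads(OUTPUT_HISTORY), 1))
```

With these inputs the result comes out at roughly 9, which is in line
with the "~10" figure.  Changing pulls_per_week or pulls_per_gc shows
how sensitive the comparison between the two solutions is to those
guesses.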

Please don't create another situation like with guix system roll-back,
where a crucial sysadmin operation doesn't work without network access.
Or at least make it configurable, so things that are likely to be needed
for future operations are pre-fetched.
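
To make the "configurable" part concrete, here is a minimal sketch of
the decision logic in Python.  It is not Guix code: GcPolicy, plan_gc
and every field name are hypothetical, and the default of 50 pulls is
only the arbitrary number from the quoted message.  The command itself
is the one suggested earlier in the thread.  The point is just that gc
is skipped, rather than failing, when git-minimal is not already
available and network access is not allowed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Command suggested earlier in the thread for running the collection.
GC_COMMAND = ["guix", "shell", "-C", "git-minimal", "--", "git", "gc"]


@dataclass
class GcPolicy:
    """Hypothetical user-facing knobs; none of these exist in Guix."""
    enabled: bool = True         # lets users turn the 'git gc' step off
    pulls_per_gc: int = 50       # arbitrary interval from the thread
    allow_network: bool = False  # may git-minimal be fetched on demand?


def plan_gc(policy: GcPolicy, pulls_since_gc: int,
            git_prefetched: bool) -> Optional[List[str]]:
    """Return the command to run, or None when gc should be skipped."""
    if not policy.enabled:
        return None
    if pulls_since_gc < policy.pulls_per_gc:
        return None
    # Never require the network unless the user opted in.
    if git_prefetched or policy.allow_network:
        return list(GC_COMMAND)
    return None
```

For example, plan_gc(GcPolicy(), 60, git_prefetched=False) returns None,
so an offline machine simply keeps its larger cache instead of hitting
an error.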




