bug-guix

bug#46942: ci.guix.gnu.org is slow from my system


From: raid5atemyhomework
Subject: bug#46942: ci.guix.gnu.org is slow from my system
Date: Mon, 15 Mar 2021 10:14:28 +0000

> Hi Maxime,
>
> > On Mon, 2021-03-15 at 00:13 +0000, raid5atemyhomework via Bug reports for 
> > GNU Guix wrote:
> >
> > > Hello all,
> > > [...]
> > > I recently had to rebuild an OS (because I was dumb; the Guix language
> > > for shepherd services can easily lead you deadlocking shepherd itself)
> > > and had supreme difficulty reinstalling, [...]
> >
> > Reinstalling after a messed up configuration file shouldn't be necessary.
> > At least when using GRUB as bootloader, guix keeps some old (& presumably
> > not broken) system generations around, that can be selected when booting
> > from the bootloader. (I don't recall exactly how the menu is named,
> > maybe ‘Old system generations of $HOSTNAMES?)
>
> Unfortunately I had a long-standing latent bug in my configuration file that 
> triggered on a (persistent on-disk) edge case which would cause the shepherd 
> process to enter an infinite loop (because the shepherd configuration 
> language is Turing-complete enough to allow infinite loops in the first 
> place). All the remaining generations (since I didn't like keeping more than 
> a dozen, and had recently been excessively tweaking the configuration file) 
> had this bug, so I had no way of reverting to an even older generation that 
> predated the bug.


And regardless, this kind of problem shouldn't occur in the first place.

* Instead of running the `start` code in process 1 itself (which is special 
enough that no amount of `kill -s SIGKILL 1` will work, even if you manage to 
log into a console), `shepherd` should really run it in a separate process, 
monitor whether it is taking too long, and possibly allow the operator to break 
out of it.  Principle of least power and all that...
  * If you want details: shepherd service A is a requirement of shepherd 
service B, but the daemon launched by A needs to reach a particular point in 
its initialization before B can start talking to it.  B itself will fail to 
start if A has not reached that point.  The extra code I added to the `start` 
of shepherd service A was meant to wait for that point before A was considered 
"started".  It turned out to be buggy: if the point was not reached within 1 
second, it would inadvertently enter an incorrect loop.  Ironically, the loop 
was supposed to give up after 60 seconds, but I got increment/decrement 
crossed, so it would keep looping until the counter reached -60 seconds, which 
was impossible.  The result was an infinite loop that prevented process 1 from 
advancing.  The point was being delayed because the process launched by A had a 
lot of (important) on-disk data to process at startup.  Since that data is 
persistent and needed > 1 second to process, the buggy code was entered on 
every boot.
* If this were a new computer, it would be just as screwed during installation 
anyway, so you should consider this a fortuitous discovery of a latent bug.
  * New users trying out Guix System who happen to be hit by this bug might 
very well decide that Guix is not stable enough for them to commit to using.
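
To illustrate the two points above, here is a minimal sketch in Python.  It is 
not the actual Guile code from the configuration file and not how `shepherd` 
works; every name in it (`wait_until_ready`, `run_start_with_timeout`, the 
parameters) is hypothetical.  The first function waits against a deadline 
computed once, so there is no counter to increment or decrement the wrong way.  
The second runs a start command in a child process and kills it on timeout, so 
the supervising process itself never gets stuck.

```python
import subprocess
import sys
import time


def wait_until_ready(is_ready, timeout=60.0, poll_interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns True if the service became ready, False on timeout.  The
    deadline is fixed up front from a monotonic clock, so the loop always
    terminates even if readiness is never reached.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))


def run_start_with_timeout(argv, timeout):
    """Run `argv` as a child process, killing it after `timeout` seconds.

    Returns the child's exit code, or None if it had to be killed.
    A runaway start procedure then only costs the child, not the caller.
    """
    try:
        return subprocess.run(argv, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return None


if __name__ == "__main__":
    # A start command that loops forever is abandoned after 2 seconds.
    looping = [sys.executable, "-c", "while True: pass"]
    print(run_start_with_timeout(looping, timeout=2.0))
```

With something like this, a slow daemon makes service A fail to start after 
the timeout instead of hanging everything behind it.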

Thanks
raid5atemyhomework





