Re: [lmi] Problems on corporate server--live stream

From:	Greg Chicares
Subject:	Re: [lmi] Problems on corporate server--live stream
Date:	Tue, 13 Oct 2020 19:16:48 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0

On 2020-10-13 17:13, Vadim Zeitlin wrote:
> On Tue, 13 Oct 2020 16:08:18 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:
> 
> GC> Vadim--Feel free to jump in if you feel like it, or to ignore this if
> GC> you don't.
> 
>  I'm not sure if I'm going to be of much help, to be honest, as I don't
> know all these shell scripts, but let me try asking some questions just in
> case they allow you to see things from a different angle -- which could, or
> not, be useful.

Thanks, it's always useful to get an outside viewpoint when things
just don't seem to make sense.

> GC> 20201013T003109Z ./lmi_setup_24.sh: Configured users.
> GC> 20201013T003109Z ./lmi_setup_25.sh: Configured {zsh,vim,git} for user 'root'.
> GC> 20201013T003110Z ./lmi_setup_25.sh: Configured {zsh,vim,git} for user '[REDACTED]'.
> GC> 20201013T003111Z ./lmi_setup_25.sh: Configured {zsh,vim,git} for user '[REDACTED2]'.
> GC> E: You are required to change your password immediately (password aged)
> GC> 20201013T003111Z ./lmi_setup_29.sh: Created lmi directories.
> GC> 20201013T003112Z ./lmi_setup_30.sh: Copied optional files.
> GC> 20201013T003137Z ./lmi_setup_40.sh: Configured 'wine' for user '[REDACTED]'.
> 
>  Why is there no line "Configured 'wine' for user 'root'" here? The loop
> seems to be exactly the same as the one used for lmi_setup_25.sh above,
> which outputs 3 lines, but this one somehow only outputs 2. Do you
> understand why?

Yes, the loop deliberately excludes 'root' here, because:

  https://wiki.winehq.org/FAQ#Should_I_run_Wine_as_root.3F
| 6.2 Should I run Wine as root?
| NEVER run Wine as root!
  ^^^^^^^^^^^^^^^^^^^^^^^ <-- bright red, boldface, italic

> GC> 20201013T003156Z ./lmi_setup_40.sh: Configured 'wine' for user '[REDACTED2]'.
> GC> E: You are required to change your password immediately (password aged)
> 
>  Do you know where do these errors come from? They're apparently being
> given from lmi_setup_{29,40}.sh (unless the output is not in order due to
> the use of "| tee /dev/tty"?), but I don't see anything that could explain
> them in these scripts and schroot commands used to run them don't seem to
> differ from the commands used for the other scripts, which don't result in
> errors.

Yes, I know where they come from. The lmi_setup_{25,40}.sh scripts are
run for all normal users in `getent group lmi`, i.e., me, Kim, and
'nemo', but not root. They print a message to /dev/tty just prior to
completion, so they mean:
  [doing something for Greg]: Configured [something] as user 'whoever'
  [doing something for Kim ]: Configured [something] as user 'whoever'
  [doing something for nemo]: E: You are required to change your password immediately (password aged)
It just so happens that schroot's "E:" messages also appear on /dev/tty.
If /dev/tty shows a message that I wrote, then a script completed;
if it shows a message that schroot emitted, then the script didn't
even start because the "schroot --user=whoever script=whatever"
command failed. It always fails, now, for 'nemo'.
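
The distinction above can be illustrated with a minimal sketch. It is not the actual lmi setup code: 'run_as' is a hypothetical stand-in for the real "schroot --user=..." invocation, and the member names are illustrative. It only shows why a line beginning "E:" means the script never started, while a "Configured" line means it ran to completion.

```shell
# run_as USER: hypothetical stand-in for "schroot --user=USER ... script".
# It mimics schroot refusing to start a session for an expired account.
run_as() {
    if [ "$1" = nemo ]; then
        echo "E: You are required to change your password immediately (password aged)"
        return 1
    fi
    # Only reached if the session could be started: the script's own message.
    echo "Configured [something] for user '$1'"
}

# for_each_member LIST: LIST is the comma-separated member field of
# "getent group lmi" (the fourth colon-separated field), e.g. "greg,kim,nemo".
for_each_member() {
    for user in $(printf '%s' "$1" | tr ',' ' '); do
        run_as "$user" || echo "  -> script never started for '$user'"
    done
}
```

On a real system the list could be obtained with `getent group lmi | cut -d: -f4`, and calling `for_each_member "greg,kim,nemo"` prints two "Configured" lines followed by the "E:" line for 'nemo'.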

Thanks to this discussion, I now see that the change I was going to
commit (which would change the logic in order to skip 'nemo') is
undesirable. The solution is to remove 'nemo' from the 'lmi' group.

I'd never done that before. The command is
  # usermod -G "" nemo
where the '""' is mandatory. Before:
  # id -nG nemo
  nemo lmi
After:
  # id -nG nemo
  nemo
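
Note that `usermod -G ""` clears every supplementary group, which is harmless here because 'lmi' was the only one; `gpasswd -d nemo lmi` removes a single group instead. The before-and-after check can also be scripted. The helper below is a hypothetical sketch (the name 'has_group' is not part of any lmi script) that tests whether a group name appears in the space-separated list printed by `id -nG`.

```shell
# has_group LIST NAME: succeed if NAME is one of the space-separated
# words in LIST (the format printed by "id -nG USER").
# Assumes group names contain no spaces.
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}
```

Usage, run as any user:

  has_group "$(id -nG nemo)" lmi && echo "nemo is still in lmi"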

I'm not going to try removing the user 'nemo', because I suspect
there's a heated argument between LDAP and /etc/passwd, and I
don't want to pour gasoline on that particular fire today.

> GC> Anyway, it ended there with return code 1. Instead, it was supposed to
> GC> continue and run 'lmi_setup_43.sh' next. Examining the logs:
> GC> 
> GC> 20201013T011805Z ./lmi_setup_42.sh: Installed lmi for '[REDACTED]'.
> GC> E: 10mount: umount: /run/schroot/mount/lmi_bullseye_3-9fc3415f-d9ef-4c67-be6c-0cb6ad47e2cb: target is busy.
> GC> E: 10mount:         (In some cases useful info about processes that use
> GC> E: 10mount:          the device is found by lsof(8) or fuser(1))
> GC> E: 10mount: rmdir: failed to remove '/var/run/schroot/mount/lmi_bullseye_3-9fc3415f-d9ef-4c67-be6c-0cb6ad47e2cb': Device or resource busy
> GC> E: lmi_bullseye_3-9fc3415f-d9ef-4c67-be6c-0cb6ad47e2cb: Chroot setup failed: stage=setup-stop
> 
>  This one seems to be coming from schroot command in lmi_setup_01.sh, but I

In 'lmi_setup_01r.sh' with an 'r', actually, signifying "redhat".

> again have trouble seeing the difference between it and the previous
> command that should have used the same --user option value in the loop
> calling lmi_setup_40.sh just above

Thanks, I had misunderstood that. The invocation order is:
  schroot --chroot=${CHRTNAME} --user="${NORMAL_USER}" --directory=/tmp ./lmi_setup_42.sh
  schroot --chroot=${CHRTNAME} --user="${NORMAL_USER}" --directory=/tmp ./lmi_setup_43.sh
  schroot --chroot=${CHRTNAME} --user=nemo             --directory=/tmp ./lmi_setup_44.sh
and, having learned that 'nemo' is poisoned, I thought that the scripts
for me (I'm $NORMAL_USER) had succeeded, and then one for 'nemo' failed.
But what actually happened is that script #42 ran successfully for me,
and then script #43 failed for me, so script #44 was never even reached
for 'nemo'.
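
The sequence can be sketched as follows. How the real driver script handles errors is not shown in this message, so the sketch simply assumes it stops at the first failing command and models that with an `&&` chain; the three step functions are hypothetical stand-ins for the schroot invocations.

```shell
# Stand-ins for the three schroot invocations; step 43 fails on purpose.
step42() { echo "42 ok"; }
step43() { echo "43 failed" >&2; return 1; }
step44() { echo "44 ok"; }

# Each step runs only if the previous one succeeded, so a failure in
# step 43 means step 44 is never attempted.
run_steps() {
    step42 && step43 && step44
}
```

Running `run_steps` prints "42 ok" and the failure message from step 43, returns a nonzero status, and never prints "44 ok", which matches the log: the last successful message came from script #42.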

Thus, 'nemo' isn't the only problem. There's another distinct problem
with a Heraclitean flavor: I cannot set foot in the same chroot twice.
That problem may perhaps be manifested in two separate ways, as either
'wine' failures or 'schroot' failures.

> GC> https://wiki.debian.org/Schroot
> GC> | to retrive the PID you can take a piece of the name of the directory,
> GC> | says "d2c072e7" and look for in the /proc filesystem:
> GC> |
> GC> | ~$ grep -r d2c072e7 /proc/*/mountinfo
> GC> 
> GC> I tried that, with '0cb6ad', but it was no help--way too many PIDs are shown.
> 
>  This doesn't seem normal, is it? I.e. why should there be many processes
> using this chroot? In any case, what are they?

Having eradicated it, I can no longer answer. It seemed quite
extraordinary to me.
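
One plausible reason for the flood of PIDs: /proc/PID/mountinfo lists every mount visible in that process's mount namespace, so the grep matches every process sharing the namespace, whether or not it keeps the mount busy. A narrower check, sketched below, looks for processes whose working directory or root directory lies under the mount point. The function name and its optional second argument (an alternative /proc root, present only so the logic can be tried on a fake tree) are assumptions for illustration, and it must run as root to read other users' links.

```shell
# procs_inside MOUNTPOINT [PROCROOT]: print PIDs whose cwd or root
# symlink points at MOUNTPOINT or somewhere below it.
procs_inside() {
    root=${2:-/proc}
    for d in "$root"/[0-9]*; do
        for link in cwd root; do
            # Unreadable or missing links are skipped.
            target=$(readlink "$d/$link" 2>/dev/null) || continue
            case $target in
                "$1"|"$1"/*) echo "${d##*/}"; break ;;
            esac
        done
    done
}
```

Usage, as root, with the session directory from the error message:

  procs_inside /run/schroot/mount/lmi_bullseye_3-9fc3415f-d9ef-4c67-be6c-0cb6ad47e2cb

This does not cover open files or memory mappings inside the mount, which 'lsof' and 'fuser' would normally report.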

> GC> These commands:
> GC>   lsof  /var/run/schroot/mount/lmi_bullseye_3-<Tab>
> GC>   fuser /var/run/schroot/mount/lmi_bullseye_3-<Tab>
> GC> return nothing,
> 
>  This is really strange. If you take one of the many PIDs found above and
> use "lsof $PID" on it, don't you see this directory in the output?
> 
>  And, just to be clear, did you run these commands as root? Because you
> definitely need to.

If it happens again, I'll try to investigate more carefully.
I believe I ran 'lsof' and 'fuser' first as my normal user,
and (because that yielded nothing) then as root, but I
didn't save that part of the terminal session. Wait...yes,
I did use root, because the applicable 'lsof' and 'fuser'
commands are in root's shell history.

> GC> So I used 'umount -l', the lazy-but-forcible option, and then
> GC>   $mount | grep cb6ad
> GC> showed nothing.
> 
>  To be honest I've never used "umount --lazy", but its man page seems to
> say that it should really be followed by a reboot, so maybe you should do
> this to avoid some [even more] mysterious problems in the future?

I've never rebooted this server, and I'm not sure whether I'm
actually allowed to, or whether I'd be able to access it afterwards.
I would have used '--force', but some online post suggested '-l',
so that's what I tried first.

> GC> $schroot --chroot=chroot:lmi_bullseye_3 --user=`whoami` --directory=tmp ./lmi_setup_44.sh 2>1 |less -S
> GC> 
> GC> 0022
> GC> 0002
> GC> LMI_TRIPLET = "x86_64-w64-mingw32"
> GC>   Production system built--ready to start GUI test in another session.
> GC> wine: a wine server seems to be running, but I cannot connect to it.
> GC>    You probably need to kill that process (it might be pid 29580).
[...]
>  Searching for "wine server" finds many more results, not all of them
> relevant (although drowning our sorrows in a bottle starts seeming more and
> more appealing, the further I read this message...), but the gist of the
> ones which are seems to be that wine server is always supposed to be
> stopped by wine itself when it exits. So I wonder if you still had some
> wine processes running at this time?

It seems that I must have, though I can't see how that's possible.
I do know that, although I may be unable to enter a chroot, I've
never been trapped inside of one and unable to exit. And, as long
as I make sure 'wine' is installed only in a chroot, exiting the
chroot ought to ensure that any rogue wineserver is killed.

That reasoning may not have applied earlier today, when to my
surprise I found 'wine' installed on the host, but I've removed
that now.

>  Also, have you tried looking at the PID 29580? Is it stuck in the kernel
> on some non-interruptible IO call or something?

There was no PID 29580. I can't remember the exact commands I
tried, because I always have to look them up, but I know I
tried multiple ways, including
  ps -some-forgotten-flag 29580
  ls /proc/29580/*
and they all returned nothing.  
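
For next time, a check that needs no flags to remember might look like the sketch below. The function name is hypothetical. It tests for the /proc entry first, because `kill -0` run as a normal user also fails for a live process owned by someone else.

```shell
# pid_exists PID: succeed if a process with that PID currently exists.
pid_exists() {
    # /proc/PID exists for every live process on Linux; kill -0 sends
    # no signal and serves as a fallback where /proc is unavailable.
    [ -d "/proc/$1" ] || kill -0 "$1" 2>/dev/null
}
```

Usage:

  pid_exists 29580 || echo "no such process; the PID in the wine message is stale"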

> GC> Repeating the above command yet again, it seems close to succeeding.
> GC> I see no errors obviously attributable to the OS or to 'wine'.
> GC> Somehow, this file exists:
> GC>   /opt/lmi/bin/data/configurable_settings.xml
> GC> instead of the intended one:
> GC>   /opt/lmi/data/configurable_settings.xml
> GC> but at least that appears to be a problem under my control.
> 
>  I have absolutely no idea how could the file end up in a wrong directory,
> but the idea of rebooting really doesn't seem so bad by now. Something
> seems to be seriously wrong here.

First, I think I'll just try recreating the whole universe,
now that 'nemo' has been vaporized.
