bug-hurd
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/6] libfshelp_rlock


From: Samuel Thibault
Subject: Re: [PATCH 1/6] libfshelp_rlock
Date: Tue, 20 Dec 2016 12:34:58 +0100
User-agent: Mutt/1.5.21+34 (58baf7c9f32f) (2010-12-30)

Svante Signell, on Tue 20 Dec 2016 09:50:54 +0100, wrote:
> On Tue, 2016-12-20 at 02:03 +0100, Samuel Thibault wrote:
> > Svante Signell, on Thu 15 Dec 2016 12:31:55 +0100, wrote:
> > > On Thu, 2015-03-05 at 02:34 +0100, Samuel Thibault wrote:
> > > > 
> > No, I'm unsure what we want: do we trust the FS server as to provide
> > the proper l_pid, just like we trust it to return proper file owners?
> > I guess we might not have the choice actually, but I'm perhaps wrong.
> > People, any thoughts on this?  Since the locker process is not involved
> > at all when another process calls GETLK, I don't think there is a way
> > to make the FS server have something to *prove* to the process calling
> > GETLK that a given pid is the locker of a given range?  A way I could
> > see would be to implement a trusted lock-auth server that FS servers
> > talk with, the FS server would get a record-lock-identity token from it,
> > and use an rendez-vous handshake to prove to the process calling GETLK
> > that it has that token.
> 
> Earlier we talked about a proc_{user,server}_identify RPC, solving the pid
> problem here, as well as the locks not being held with forks. And also the pid
> identification for SCM_CREDS. Have you changed your mind?

I haven't changed my mind: one still needs this RPC to make the SETLK
calling process prove of its pid.  But if we don't want to trust the FS
server to provide the correct information, we need another RPC to make
the FS server prove it to the GETLK calling process.

> > Anyway, really that's for later, the implementation for now won't allow
> > the server to return anything meaningful in l_pid, so this is code
> > that will actually never get run for now, so rather than try to write
> > something which might actually reveal to be wrong, let's for now just
> > return a plain ENOSYS for GETLK.
> 
> Why? file_record_lock reports that a lock is taken by another process, but
> cannot give the PID, and reports -1 instead.

Which is not a value that POSIX defines, so we need to be very careful
here, e.g. by checking on http://codesearch.debian.net/ what Debian
application make use of this field, whether they check whether it's -1
as you propose, etc.  Unless this is done, I've a bit wary of returning
-1.

> What potential negative effects do yo see here?

We have already had bad issues with various packages due to corner
behavior of what we implemented. Avoiding this beforehand helps tracking
issues. That's e.g. the reason why I introduced the gsync usage only
step by step, to spot the issues step by step.

What potential positive use do we have?  Again, a codesearch
investigation for GETLK would be useful to answer this.  But again,
that'd yet delay having basic record locking at last (at at last have a
working tdb for smb etc., notably, which doesn't use GETLK).

> Additionally, when there is no process already locking F_GETLK correctly 
> reports
> now when a lock can be taken.

Sure, but as a side effect they get to see an l_pid field. If they
do crazy things about it, who knows what can happen. Just to give an
example, if a program running as root calls kill() on it, and it happesn
to be -1 because we invented that value, it'll kill all the processes in
system... Not really a side effect we want.

> Again, I don't see the need to extend the file_record_lock RPC by another
> argument.

Are you talking about only the per-open case, or also about the
per-process case? In the per-open case it's useless indeed, but in the
per-process case, we don't want to let the FS server believe the SETLK
calling process when it claims its pid. So we need to have a port to let
the calling process prove its pid through the proc_user/server_identify
RPCs.

> > > > - we'll then be able to implement GETLK, and in case a per-open lock was
> > > >   taken, we'll put -1 in l_pid, like BSD does.
> > > 
> > > Please explain!
> > 
> > Please be polite.  Asking it this way just makes me want to stop my mail
> > here...
> 
> Where is the impoliteness here? Is it the exclamation mark?

Yes.

Exclamation mark means ordering, not requesting.

> Sorry, can you give a naming of that token in the RPC definition.

Again, just like io_reauthenticate

> We have now:
> routine file_record_lock (
>        file: file_t;
>        RPT
>        cmd: int;
>        inout flock64: flock_t);

Just append

rendezvous2: mach_port_send_t

> And I assume that calls in the implementations in libdiskfs/etc will use the
> argument NUL?

For now (no pid recording) they'll just check that it's NUL, and if not
return EOPNOTSUPP.

(note that by NUL, I mean MACH_PORT_NULL)

Svante Signell, on Tue 20 Dec 2016 11:36:21 +0100, wrote:
> On Tue, 2016-12-20 at 02:03 +0100, Samuel Thibault wrote:
> > Svante Signell, on Mon 19 Dec 2016 18:25:58 +0100, wrote:
> > > It seems like the open mode for /dev/null does not contain O_READ/O_WRITE
> > > flags
> > 
> > ? It does, see cred->po->openmodes.
> Seems to be a difference between libdiskfs/libnetfs and libtrivfs:

> The variable open_modes I mentioned in that mail is taken from
> libdiskfs/libnetfs:

Ok, now I understand: you were not talking about the open mode for the
per-open of the application, but the about open mode for the underlying
realnode.

> That's what I reported in that mail. Why don't you believe me?

Because here it's not about believing, but understanding.  One can't
answer a question that one doesn't understand.  Here you were talking
about some code (libnetfs/libdiskfs) which /dev/null doesn't even
link against (it links against libtrivfs). The missing point was the
underlying realnode.

So the actual issue is the openmode of the realnode. That happens to
come from trivfs_open, called from trivfs_S_fsys_getroot, which just
io_duplicate (cntl->underlying, &new_realnode);, and 'underlying' comes
from trivfs_startup's call to fsys_startup, where the flags come from
the trivfs_startup parameter, 0 in the case of the null translator. 

I don't know all the potential consequences of adding O_READ|O_WRITE
there, it'd have to be investigated, because once this is done, all the
calls proxied to realnode would need to be proofread. In the file_lock
case, trivfs_S_file_lock would have to check cred->po->openmodes for
itself before calling file_lock on the realnode, for instance. Again, we
could leave that out for a further patch, to go only step by step.

> > > making flock tests as well as fcntl tests fail, when proxied through
> > > libtrivfs/file-lock.c (trivfs_S_file_lock).
> > 
> > Well, do we really want to support locks on /dev/null?  Neal's
> > implementation proxies the support to the realnode when there is one
> > (null doesn't have one),

null does have one actually, as explained above.

> BTW: This /dev/null test came about after your test case
> in Bug#759008: libtdb1.

Which is proposed code that upstream didn't take, so not code seen in
the wild :)

> > > Any advice given here is appreciated. Maybe the special devices, should be
> > > opened with modes similar to regular files?
> > 
> > We don't care about null.  We may want to care about e.g. locking
> > /dev/com0, but I don't think applications actually hope that this is
> > portable at all, so I don't think we want to care, and just get the
> > implementation in for regular files at last.
> 
> OK, let's forget about locking device files then.

For a first step that should be plenty already, yes.

Samuel



reply via email to

[Prev in Thread] Current Thread [Next in Thread]