bug-parallel

From: Ido Tal
Subject: Re: GNU Parallel Bug Reports [parallel] parallel-20110205 hangs in "make check" step
Date: Sat, 12 Feb 2011 23:18:26 -0800

Hi Ole,

> In my opinion 'sem -j1 --id myname myprg' should guarantee that only
> one process is running using the filesystem of $HOME. It is to
> guarantee that myprg can do something that would f*ck up if multiple
> myprgs were running at the same time. If we change the behaviour so we
> only guarantee that there will be a single myprg PER MACHINE, then
> that would not do what I would expect as a user.
Yes, I see what you mean. I think you are right, and I take my previous comment back: please *do not* implement a per-machine lock (at least not as the default option).

> First of all I would prefer to have this bug fixed instead of making
> some workaround. For that I need help in making something that will
> create a lock that will work across multiple machines over NFS.
I found this at http://en.wikipedia.org/wiki/File_locking
You may already know this, but just in case:

Whether flock locks work on network filesystems, such as NFS, is implementation-dependent. On BSD systems flock calls are successful no-ops. On Linux prior to 2.6.12 flock calls on NFS files would only act locally. Kernel 2.6.12 and above implement flock calls on NFS files using POSIX byte range locks. These locks will be visible to other NFS clients that implement fcntl()/POSIX locks.
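
To make that concrete, here is a minimal sketch of the fcntl-style locking the article describes. It is Python rather than your Perl, and the lock-file path is made up; it is only meant to show the kind of POSIX byte-range lock that other NFS clients should be able to see:

    # Minimal sketch: an exclusive POSIX byte-range lock via fcntl.
    # On Linux >= 2.6.12 these locks are propagated over NFS, so other
    # clients that use fcntl()/POSIX locks will see them.
    import fcntl
    import os

    lockfile = os.path.expanduser("~/.sem-myname.lock")  # made-up path
    fd = os.open(lockfile, os.O_RDWR | os.O_CREAT, 0o600)

    fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until the lock is free
    try:
        pass  # the mutually excluded work would run here
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)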

> It would be helpful to me if you can provide a setup and an example
> that will fail every time (or at least most times).
I've glanced at your code, and it seems you're using flock for the semaphore files, right? If so, I guess I can't help much: our supercomputers run up-to-date Linux, so according to the Wikipedia article the flock calls should act as POSIX locks over NFS, and there should be no bug. I'll check anyway, but give me a bit of time...
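
In the meantime, here is the kind of test I have in mind (a sketch only; the shared path and the sleep time are made up). Run it at the same time on two different nodes that share the mount: if fcntl locks really do propagate over NFS, the two "holding" intervals should never overlap, and if they do overlap we have a reproducible failure:

    # Sketch of a two-node lock test. Start simultaneously on two hosts
    # that share the NFS mount; compare the printed timestamps afterwards.
    import fcntl, os, socket, time

    path = "/shared/home/locktest"   # made-up path on the shared filesystem
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    fcntl.lockf(fd, fcntl.LOCK_EX)   # should block while the other host holds it
    print(socket.gethostname(), "holding lock:", time.ctime())
    time.sleep(30)                   # hold long enough for the runs to overlap
    print(socket.gethostname(), "releasing:   ", time.ctime())
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)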

Ido

On Fri, Feb 11, 2011 at 1:15 AM, Ole Tange <address@hidden> wrote:
On Fri, Feb 11, 2011 at 1:19 AM, Ido Tal <address@hidden> wrote:

> * A user submits many jobs to a queue.
> * The queuing system distributes the jobs between the different computer
> nodes that make up the supercomputer.
> * Each node has its own CPU, but the file system is shared between the
> nodes.
>
> Now, if the jobs sent were making use of parallel with the sem option, then
> it seems the bug would occur. In this case, I think it would make sense for
> each host to have its own file.

Yes. In this situation we may see the bug.

First of all I would prefer to have this bug fixed instead of making
some workaround. For that I need help in making something that will
create a lock that will work across multiple machines over NFS.

In my opinion 'sem -j1 --id myname myprg' should guarantee that only
one process is running using the filesystem of $HOME. It is to
guarantee that myprg can do something that would f*ck up if multiple
myprgs were running at the same time. If we change the behaviour so we
only guarantee that there will be a single myprg PER MACHINE, then
that would not do what I would expect as a user.

That is why I am reluctant to just implement a PER MACHINE lock.

It would be helpful to me if you can provide a setup and an example
that will fail every time (or at least most times).


/Ole

