Re: [Gluster-devel] Too many open files

gluster-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gluster-devel] Too many open files

From:	Brent A Nelson
Subject:	Re: [Gluster-devel] Too many open files
Date:	Mon, 9 Apr 2007 14:50:14 -0400 (EDT)

It didn't survive the weekend. Same symptom as before (65536 items under/proc/pid/fd for the first glusterfsd, not the second, almost alldirectories). I have restarted glusterfsd and glusterfs on both nodes,just in case I goofed somewhere, but so far it looks like the bug (or asimilar one) still exists.


Thanks,

Brent

On Fri, 6 Apr 2007, Brent A Nelson wrote:

I think you're right; the makefile glitch must have thrown off the rest ofthe compile. A fresh attempt seems stable, and something which waspreviously able to quickly trigger the directory fd bug now runs perfectly.


Looks good!

Thanks,

Brent

On Fri, 6 Apr 2007, Anand Avati wrote:

Brent,
 can you please send me your spec files? because I am able to 'ls'
without any problems and there is no fd leak observed. I have loaded
just cluster/afr, and previously had loaded all performance xlators on
both server and clietn side together and in both the cases things
worked perfectly fine.

I'm guessing the encrytpion makefile issue caused a bad build? (things
were changed in libglusterfs). the makefile is committed now though
(along with the -l fix). please do a make uninstall/clean/install
since quit a chunk of changes have gone in the last few days.

avati

On Fri, Apr 06, 2007 at 03:33:30PM -0400, Brent A Nelson wrote:

glusterfsd dies on both nodes almost immediately (I can ls succesfully
once before it dies, but cd in and they're dead).  The glusterfs processes
are still running, but I of course have "Transport endpoint is not
connected."

Also, glusterfsd and glusterfs no longer seem to know where to log by
default and refuse to start unless I give the -l option on each.

Thanks,

Brent

On Fri, 6 Apr 2007, Anand Avati wrote:

Brent,
the fix has been committed. can you please check if it works for you?

regards,
avati

On Thu, Apr 05, 2007 at 02:09:26AM -0400, Brent A Nelson wrote:

That's correct.  I had commented out unify when narrowing down the mtime

bug (which turned out to be writebehind) and then decided I had noreason

to put it back in for this two-brick filesystem.  It was mounted without
unify when this issue occurred.

Thanks,

Brent

On Wed, 4 Apr 2007, Anand Avati wrote:

Can you confirm that you were NOT using unify int he setup??

regards,
avati


On Thu, Apr 05, 2007 at 01:09:16AM -0400, Brent A Nelson wrote:

Awesome!

Thanks,

Brent

On Wed, 4 Apr 2007, Anand Avati wrote:

Brent,
thank you so much for your efforts of sending the output!

from the log it is clear the leak fd's are all for directories.Indeed

there was an issue with releasedir() call reaching all the nodes. The
fix should be committed today to tla.

Thanks!!

avati



On Wed, Apr 04, 2007 at 09:18:48PM -0400, Brent A Nelson wrote:

I avoided restarting, as this issue would take a while to reproduce.

jupiter01 and jupiter02 are mirrors of each other.  All performance

translators are in use, except for writebehind (due to the mtimebug).


jupiter01:
ls -l /proc/26466/fd |wc
65536  655408 7358168
See attached for ls -l output.

jupiter02:
ls -l /proc/3651/fd |wc
ls -l /proc/3651/fd
total 11
lrwx------ 1 root root 64 2007-04-04 20:43 0 -> /dev/null
lrwx------ 1 root root 64 2007-04-04 20:43 1 -> /dev/null
lrwx------ 1 root root 64 2007-04-04 20:43 10 -> socket:[2565251]
lrwx------ 1 root root 64 2007-04-04 20:43 2 -> /dev/null
l-wx------ 1 root root 64 2007-04-04 20:43 3 ->
/var/log/glusterfs/glusterfsd.log
lrwx------ 1 root root 64 2007-04-04 20:43 4 -> socket:[2255275]
lrwx------ 1 root root 64 2007-04-04 20:43 5 -> socket:[2249710]
lr-x------ 1 root root 64 2007-04-04 20:43 6 -> eventpoll:[2249711]
lrwx------ 1 root root 64 2007-04-04 20:43 7 -> socket:[2255306]
lr-x------ 1 root root 64 2007-04-04 20:43 8 ->
/etc/glusterfs/glusterfs-client.vol
lr-x------ 1 root root 64 2007-04-04 20:43 9 ->
/etc/glusterfs/glusterfs-client.vol

Note that it looks like all those extra directories listed on
jupiter01
were locally rsynched from jupiter01's Lustre filesystems onto the
glusterfs client on jupiter01.  A very large rsync from a different
machine to jupiter02 didn't go nuts.

Thanks,

Brent

On Wed, 4 Apr 2007, Anand Avati wrote:

Brent,
I hope the system is still in the same state to dig some info out.

To verify that it is a file descriptor leak, can you please runthistest. On the server, run ps ax and get the PID of glusterfsd. thendoan ls -l on /proc/<pid>/fd/ and please mail the output of that.That

should give a precise idea of what is happening.
If the system has been reset out of the state, please give us the

spec file you are using and the commands you ran (of some majorjobs

like heavy rsync) so that we will try to reproduce the error in our
setup.

regards,
avati


On Wed, Apr 04, 2007 at 01:12:33PM -0400, Brent A Nelson wrote:

I put a 2-node GlusterFS mirror into use internally yesterday, as
GlusterFS was looking pretty solid, and I rsynced a whole bunch of
stuff

to it. Today, however, an ls on any of the three clients givesme:


ls: /backup: Too many open files

It looks like glusterfsd hit a limit.  Is this a bug
(glusterfs/glusterfsd

forgetting to close files; essentially, a file descriptor leak),or

do I
just need to increase the limit somewhere?

Thanks,

Brent


_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel


--
Shaw's Principle:
   Build a system that even a fool can use,
   and only a fool will want to use it.




--
Shaw's Principle:
    Build a system that even a fool can use,
    and only a fool will want to use it.


--
Shaw's Principle:
     Build a system that even a fool can use,
     and only a fool will want to use it.


--
Shaw's Principle:
      Build a system that even a fool can use,
      and only a fool will want to use it.


--
Shaw's Principle:
       Build a system that even a fool can use,
       and only a fool will want to use it.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Gluster-devel] Too many open files, (continued)
- Re: [Gluster-devel] Too many open files, Anand Avati, 2007/04/04
  - Message not available
    - Re: [Gluster-devel] Too many open files, Anand Avati, 2007/04/05
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/05
    - Re: [Gluster-devel] Too many open files, Anand Avati, 2007/04/05
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/05
    - Re: [Gluster-devel] Too many open files, Anand Avati, 2007/04/06
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/06
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/06
    - Re: [Gluster-devel] Too many open files, Anand Avati, 2007/04/06
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/06
    - Re: [Gluster-devel] Too many open files, Brent A Nelson <=
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/10
    - Re: [Gluster-devel] Too many open files, Krishna Srinivas, 2007/04/11
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/11
    - Re: [Gluster-devel] Too many open files, Anand Avati, 2007/04/12
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/12
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/13
    - Re: [Gluster-devel] Too many open files, Brent A Nelson, 2007/04/13
    - [Gluster-devel] write-behind mtime glitch, Brent A Nelson, 2007/04/17
    - Re: [Gluster-devel] write-behind mtime glitch, Krishna Srinivas, 2007/04/17
    - [Gluster-devel] Re: write-behind mtime glitch, Anand Avati, 2007/04/18

Prev by Date: Re: [Gluster-devel] Questions
Next by Date: Re: [Gluster-devel] Too many open files
Previous by thread: Re: [Gluster-devel] Too many open files
Next by thread: Re: [Gluster-devel] Too many open files
Index(es):
- Date
- Thread