From: Brent A Nelson
Subject: [Gluster-devel] Re: more bugs (was Re: io-threads...)
Date: Mon, 30 Apr 2007 18:55:04 -0400 (EDT)
Could you describe patch-134 a little? I was curious whether it could be related to the stat-prefetch or the NFS-reexport issues.
Well, my current setup appears to be stable! The setup includes storage/posix, features/posix-locks, and protocol/server on the servers. The clients use protocol/client, cluster/afr, cluster/unify with alu, performance/write-behind, and performance/read-ahead. Note that I haven't actually tested locking (it's loaded, but probably hasn't been used in any of my testing).
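For reference, the client-side stack described above might look something like the volume spec below. This is only a sketch from memory of the 1.3-era volfile syntax: hostnames and volume names are placeholders, replication/scheduler options and the cluster/unify layer (which also needs a namespace volume) are omitted, and the server-side spec is not shown.

```
# Hypothetical client volfile sketch -- placeholders, not a tested config.
volume client1
  type protocol/client
  option transport-type tcp/client
  option remote-host server1          # placeholder hostname
  option remote-subvolume brick       # placeholder server-side volume
end-volume

volume client2
  type protocol/client
  option transport-type tcp/client
  option remote-host server2          # placeholder hostname
  option remote-subvolume brick
end-volume

volume afr0
  type cluster/afr
  subvolumes client1 client2
end-volume

# cluster/unify with the alu scheduler would sit above afr0; omitted here.

volume wb
  type performance/write-behind
  option aggregate-size 0             # per the thread, 0 avoids the mtime glitch
  subvolumes afr0
end-volume

volume ra
  type performance/read-ahead
  subvolumes wb
end-volume
```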
I'd like to have stat-prefetch and io-threads, too, but they aren't critical. If NFS reexport were stable, though, I would be able to go ahead and easily migrate a lot of things we use internally over to GlusterFS, as an eat-your-own-dogfood prelude to migrating everything of consequence over to it...
Thanks,

Brent

On Sun, 29 Apr 2007, Anand Avati wrote:
Brent, if you are using the latest TLA, then it is expected if you have aggregate-size > 0 and the file size is a multiple of 4096 (not necessarily an even multiple). Having aggregate-size = 0 and no io-threads should not produce the mtime glitch at all.

avati

On Sat, Apr 28, 2007 at 06:18:27PM -0400, Brent A Nelson wrote:

Adding to the list, there is still an mtime bug when using write-behind even without io-threads. It occurs on (big clue!) files with sizes evenly divisible by 4096 in my AFR/unified setup:

-rw-r--r-- 1 root root 4096 2007-04-28 16:01 /scratch/usr/src/linux-headers-2.6.15-28/include/net/genetlink.h
-rwxr-xr-x 1 root root 12288 2007-04-28 15:54 /scratch/usr/bin/last
-rwxr-xr-x 1 root root 12288 2007-04-28 15:54 /scratch/usr/bin/aseqnet
-rw-r--r-- 1 root root 528384 2007-04-28 15:54 /scratch/usr/share/command-not-found/programs.d/all-universe.db
-rw-r--r-- 1 root root 12288 2007-04-28 15:54 /scratch/usr/share/command-not-found/programs.d/all-multiverse.db
-rw-r--r-- 1 root root 12288 2007-04-28 15:54 /scratch/usr/share/command-not-found/programs.d/i386-restricted.db
-rw-r--r-- 1 root root 135168 2007-04-28 15:54 /scratch/usr/share/command-not-found/programs.d/i386-multiverse.db
-rw-r--r-- 1 root root 12288 2007-04-28 15:54 /scratch/usr/share/command-not-found/programs.d/all-restricted.db
-rw-r--r-- 1 root root 135168 2007-04-28 15:54 /scratch/usr/share/command-not-found/programs.d/all-main.db
-rw-r--r-- 1 root root 4198400 2007-04-28 15:54 /scratch/usr/share/command-not-found/programs.d/i386-universe.db
-rw-r--r-- 1 root root 1052672 2007-04-28 15:54 /scratch/usr/share/command-not-found/programs.d/i386-main.db
-rw-r--r-- 1 root root 65536 2007-04-28 15:53 /scratch/usr/share/samba/valid.dat
-rw-r--r-- 1 root root 61440 2007-04-28 15:57 /scratch/usr/lib/python2.5/distutils/command/wininst-7.1.exe
-rw-r--r-- 1 root root 61440 2007-04-28 15:57 /scratch/usr/lib/python2.5/distutils/command/wininst-6.exe
-rw-r--r-- 1 root root 61440 2007-04-28 15:55 /scratch/usr/lib/python2.4/distutils/command/wininst-7.1.exe
-rw-r--r-- 1 root root 61440 2007-04-28 15:55 /scratch/usr/lib/python2.4/distutils/command/wininst-6.exe
-rw-r--r-- 1 root root 4096 2007-04-28 15:57 /scratch/usr/lib/gettext/msgfmt.net.exe
-rw-r--r-- 1 root root 8192 2007-04-28 15:57 /scratch/usr/lib/GNU.Gettext.dll
-rw-r--r-- 1 root root 143360 2007-04-28 15:56 /scratch/usr/lib/libgc.so.1.0.2

These files have wrong mtimes on both nodes of the AFR, not just one or the other. It resulted from a simple "cp -a" of my /usr directory.

Thanks,

Brent

On Fri, 27 Apr 2007, Brent A Nelson wrote:

A couple more bugs observed today:

1) stat-prefetch still causes glusterfs to die on occasion. I can reproduce this with a bunch of clients doing a du of a complex directory structure; out of 8 clients du'ing simultaneously, one or two will die before the du finishes (the glusterfs process dies). This is probably the same thing I've reported before about stat-prefetch, but I was hoping io-threads might have been responsible (it wasn't).

2) NFS reexport is somehow triggering a really rapid memory leak/consumption in glusterfsd, causing it to quickly die. On the NFS client, I did a du of an Ubuntu Edgy mirror, which worked fine. Then I did multiple cp -a's of a simple 30MB directory, which causes rapid memory consumption in the glusterfsd on node1 of an AFR. It soon dies (before the sixth copy completes), along with the NFS-exported glusterfs client (also running on node1). This occurs in a simple mirror with storage/posix and protocol/server on the server and protocol/client, cluster/afr, performance/read-ahead, and performance/write-behind on the client.

Thanks,

Brent

On Fri, 27 Apr 2007, Brent A Nelson wrote:

Hmm, it looks like io-threads is responsible for more than just mtime glitches when used with write-behind.
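Incidentally, the files at risk are easy to enumerate: any non-empty regular file whose size is an exact multiple of 4096. A small shell sketch (GNU find assumed; the path argument is a placeholder):

```shell
# Sketch: list regular files whose size is an exact multiple of 4096
# bytes -- the files the write-behind mtime bug affects.
# (Paths containing spaces would need -print0 handling; fine for a sketch.)
page_multiples() {
    find "$1" -type f -size +0c -printf '%s %p\n' |
    awk '$1 % 4096 == 0 { print $2 }'
}
```

Running `page_multiples /scratch/usr` should reproduce a listing like the one above.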
I just found that the problems I had with NFS re-export go away when I get rid of io-threads (plus, now that I can enable write-behind, the NFS write performance is far better, by at least a factor of 5)! It looks like I'll be switching off io-threads for now and turning on all the other performance enhancements.

Thanks,

Brent

On Fri, 27 Apr 2007, Brent A Nelson wrote:

On Thu, 26 Apr 2007, Anand Avati wrote:

> Brent, I understand what is happening. It is because I/O threads let the mtime overtake the write call. I assume you have loaded io-threads on the server side (or below write-behind on the client side).

Yes, I have io-threads loaded on the server. This occurs when I load write-behind on the client.

> I could provide you a temporary 'ugly' fix just for you if the issue is critical (until the proper framework comes in 1.4)

It would be worthwhile if the temporary fix is acceptable for the 1.3 release (otherwise, you'll need a warning included with the release, so that people enabling io-threads and write-behind know what to expect), but don't waste your time if it's just for me. Push on to 1.4 and the real fix; I'll just leave write-behind disabled for now.

Many thanks,

Brent

--
ultimate_answer_t
deep_thought (void)
{
  sleep (years2secs (7500000));
  return 42;
}
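Avati's explanation ("I/O threads let the mtime overtake the write call") can be illustrated with a toy sketch. This is not GlusterFS code, just the shape of the race: if the final write is handed to a worker thread, an mtime sampled "after" the copy can still predate the write that actually lands last.

```python
import os
import tempfile
import threading
import time

def delayed_write(path, data, delay):
    """Stand-in for an io-threads worker: the queued write completes late."""
    time.sleep(delay)
    with open(path, "ab") as f:
        f.write(data)

fd, path = tempfile.mkstemp()
os.write(fd, b"x")
os.close(fd)

# The "last" write is queued to a worker thread instead of done in-line.
t = threading.Thread(target=delayed_write, args=(path, b"y", 0.2))
t.start()
sampled = os.stat(path).st_mtime   # mtime sampled before the write lands
t.join()                           # worker's write arrives afterwards
final = os.stat(path).st_mtime     # the mtime the file actually ends up with
```

Anything (a cp -a, an AFR peer) that recorded `sampled` as the file's mtime now disagrees with `final`, which matches the glitch described above.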