From: nicolas prochazka
Subject: Re: [Gluster-devel] about afr
Date: Mon, 2 Feb 2009 21:01:48 +0100
Hi again,
Here is my last test and last log before I stop:
I made a change: I added option read-subvolume brick_10.98.98.2 to the client conf on 10.98.98.48
and option read-subvolume brick_10.98.98.1 to the client conf on 10.98.98.44.
Run 10.98.98.1 and 10.98.98.2 as servers.
Run 10.98.98.44 and 10.98.98.48 as clients.
1 - stop 10.98.98.2
10.98.98.48 keeps running and goes to read from 10.98.98.1
10.98.98.44 keeps running, reading from 10.98.98.1
2 - restart 10.98.98.2, wait 5 minutes
3 - stop 10.98.98.1
the processes on 10.98.98.44 and 10.98.98.48 hang
I think the clients cannot go back to reading from 10.98.98.2. Is this normal? 10.98.98.2 became ready again after the crash.
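
For reference, the read-subvolume change described above would sit in the afr volume of each client volfile. The fragment below is only a sketch for client 10.98.98.48: the volume names brick_10.98.98.1 and brick_10.98.98.2 follow the names used in this thread, but the afr volume name, the transport-type value, and the remote-subvolume name are assumptions, since the actual volfiles were not posted.

```text
# Sketch of the client volfile on 10.98.98.48 (assumed layout)
volume brick_10.98.98.1
  type protocol/client
  # assumption: transport-type value
  option transport-type tcp
  option remote-host 10.98.98.1
  # assumption: name of the volume exported by the server
  option remote-subvolume brick
end-volume

volume brick_10.98.98.2
  type protocol/client
  option transport-type tcp
  option remote-host 10.98.98.2
  option remote-subvolume brick
end-volume

# assumption: the afr volume is simply named "afr"
volume afr
  type cluster/afr
  option read-subvolume brick_10.98.98.2
  subvolumes brick_10.98.98.1 brick_10.98.98.2
end-volume
```

On 10.98.98.44 the same fragment would use option read-subvolume brick_10.98.98.1 instead.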
Regards,
Nico

On Mon, Feb 2, 2009 at 2:25 PM, nicolas prochazka <address@hidden> wrote:
Hello,
I am still trying to debug my strange blocking problem.
I ran the client with logging, but the log is very large (100 MB), so I cannot send it to you. Just the key information:
Servers: 10.98.98.1 and 10.98.98.2
Clients: 10.98.98.44 and 10.98.98.48
Test: all tests are performed with a big file (> 10G). Sometimes the test hangs the process, and sometimes the big file becomes corrupted (some data seems to be missing).
run the whole system: ok
stop 10.98.98.2: client seems ok
run 10.98.98.2: sometimes it blocks
stop 10.98.98.1: client 10.98.98.44 blocks; the last log is:
2009-02-02 13:53:59 D [io-cache.c:798:ioc_need_prune] io-cache: locked table(0x614320)
2009-02-02 13:53:59 D [io-cache.c:802:ioc_need_prune] io-cache: unlocked table(0x614320)
2009-02-02 13:53:59 D [client-protocol.c:1701:client_readv] brick_10.98.98.2: (2148533016): failed to get remote fd, returning EBADFD
and if I restart 10.98.98.1, the client runs again (ls works) and the log shows:
2009-02-02 14:03:18 D [fuse-bridge.c:1945:fuse_statfs] glusterfs-fuse: 40423: STATFS
2009-02-02 14:03:18 D [fuse-bridge.c:1945:fuse_statfs] glusterfs-fuse: 40424: STATFS
2009-02-02 14:03:33 D [fuse-bridge.c:1945:fuse_statfs] glusterfs-fuse: 40425: STATFS
Client 10.98.98.48 does not block.

On Fri, Jan 30, 2009 at 10:14 AM, nicolas prochazka <address@hidden> wrote:
Hello,
First, thanks a lot for all your work.
Second,
your tests are ok for me, but when I replace echo or tail by opening the file with certain types of program,
such as qemu, there are a lot of problems. The process hangs. I also tried with --disable-direct-io-mode; then the process does not hang, but the file seems to be corrupted.
It's a very strange problem.
Regards,
Nicolas Prochazka.

2009/1/30 Raghavendra G <address@hidden>
nicolas,
I have two servers, n1 and n2, which are replicated with afr on the client side. I am using the same configuration you settled on, the one with which you are facing the problem. n1 is the first child of afr.
on n1:
ifconfig eth0 down (eth0 is the interface I am using for communicating with server on n1)
on glusterfs mount:
1. ls (hangs for transport-timeout seconds but completes successfully after timeout)
2. I also had a file opened with tail -f /mnt/glusterfs/file before bringing down eth0 on n1.
3. echo "content" >> /mnt/glusterfs/file, appends to file and I was able to observe the content through tail -f.
on n1:
bring up eth0
on glusterfs mount:
1. ls (completes successfully without any problem).
2. echo "content-2" >> /mnt/glusterfs/file (also appends content-2 to the file, and it is shown in the output of tail -f)
From the above tests, it seems the bug is not reproducible in our setup. Is this similar to the procedure you followed to reproduce the bug? I am using glusterfs--mainline--3.0--patch-883.
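
The manual steps above can be scripted so that the same descriptors stay open across the outage, which is closer to the case of a file opened before the first server goes down. This is a minimal sketch, not part of the original test: the function name probe_open_fds, the choice of descriptors 3 and 4, and the 60-second default are all assumptions, and it relies on timeout from GNU coreutils. A process stuck in an uninterruptible wait may not be killable, in which case the helper itself can block.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): checks that descriptors opened
# BEFORE a server outage still work afterwards.
# Expects the caller to have opened:
#   fd 3 for appending  (exec 3>>FILE)
#   fd 4 for reading    (exec 4<FILE)
probe_open_fds() {
    local limit=${1:-60}    # seconds to wait before declaring a hang (assumed default)
    local rc=0

    # The child shell inherits fds 3 and 4. It appends one line through fd 3
    # and reads whatever is still unread on fd 4.
    timeout "$limit" sh -c 'echo content >&3 && cat <&4 >/dev/null' || rc=$?

    case $rc in
        0)   echo ok ;;
        124) echo hang ;;          # timeout(1) exits 124 when the limit is hit
        *)   echo "error($rc)" ;;
    esac
}
```

A run against the mount would open the descriptors once, for example with exec 3>>/mnt/glusterfs/file 4</mnt/glusterfs/file, and then call probe_open_fds after each step (eth0 down, eth0 up). A result of hang after the first child of afr goes down would match the behaviour Nicolas reports for already-open files.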
regards,
--
On Fri, Jan 30, 2009 at 12:05 AM, Anand Avati <address@hidden> wrote:
Raghu/ Krishna,
can you guys look into this? It seems like a serious flaw..
avati
On Thu, Jan 29, 2009 at 7:13 PM, nicolas prochazka
> To be more precise:
> now I can do 'ls /glustermountpoint' after the timeout in all cases, that's
> good,
> but for files which were opened before the crash of the first server, it does not
> work; the process seems to be blocked.
>
> Regards,
> Nicolas.
Raghavendra G