
Re: [Gluster-devel] Choice of Translator question


From: Kevan Benson
Subject: Re: [Gluster-devel] Choice of Translator question
Date: Wed, 26 Dec 2007 14:00:22 -0800
User-agent: Thunderbird 2.0.0.9 (X11/20071031)

Gareth Bult wrote:
> Hi,
>
> Thanks for that, but I'm afraid I'd already read it ... :(
>
> The fundamental problem I have is with the method apparently employed
> by self-heal.
>
> Here's what I'm thinking;
>
> Take a 5G database sitting on an AFR with three copies. Normal
> operation - three consistent replicas, no problem.
>
> Issue # 1; glusterfsd crashes (or is crashed) on one node. That
> replica is immediately out of date as a result of continuous writes
> to the DB.
>
> Question # 1; When glusterfsd is restarted on the crashed node, how
> does the system know that node is out of date and should not be used
> for striped reads?

The trusted.afr.version extended attribute tracks which version of the file each member holds. On a read, all participating AFR members respond with this information, and any older/obsolete copies are replaced with a newer copy from one of the up-to-date AFR members (this is self-heal).
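If you want to check this yourself, you can inspect the attribute directly on each server's backend export directory (not through the mount). A rough sketch, assuming the attr tools are installed and with /data/export/db/mydb.dat standing in for the file's path on the brick:

  # run as root on each AFR member, against the backend path the brick exports
  getfattr -n trusted.afr.version -e hex /data/export/db/mydb.dat

A member that missed writes should report an older version than the others.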

> My assumption; Because striped reads are per file and as a result,
> striping will not be applied to the database, hence there will be no
> read advantage obtained by putting the database on the filesystem ..
> ??

I think they are planning per-block striped reads (with the block size perhaps definable) at a later date.

> Question # 2; Apart from closing the database and hence closing the
> file, how do we tell the crashed node that it needs to re-mirror the
> file?

Read the file from a client (head -c1 FILE >/dev/null will force it).
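If you'd rather sweep a whole directory than touch files one by one, something like this from a client mount should have the same effect (just a sketch, with /mnt/glusterfs standing in for your mount point):

  find /mnt/glusterfs -type f -exec head -c1 '{}' \; >/dev/null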

> Question # 3; Mirroring a 5G file will take "some time" and happens
> when you re-open the file. While mirroring, the file is effectively
> locked.
>
> Net effect;
>
> a. To recover from a crash the DB needs a restart
> b. On restart, the DB is down for the time taken to copy 5G between
>    machines (over a minute)
>
> From an operational point of view, this doesn't fly .. am I missing
> something?

You could use the stripe translator over AFR, so that chunks of the DB file are AFR'd individually, allowing per-chunk self-heal. I'm not familiar enough with database file writing practices in general (not to mention your particular database's), or with the stripe translator, to tell whether any of the following will cause you problems, but they are worth looking into:

1) Will the overhead the stripe translator introduces with a very large file
   and relatively small chunks cause performance problems? (5G in 1MB stripes
   = 5000 parts...)
2) How will GlusterFS handle a write to a stripe that is currently
   self-healing? Block?
3) Does the way the DB writes the DB file cause massive updates throughout
   the file, or does it generally just append and update the indices, or
   something completely different? This could have an effect on how well
   something like this works.

Essentially, using this layout, you are keeping track of which stripes have changed and only have to sync those particular stripes on self-heal. The longer the downtime, the longer self-heal will take, but you can mitigate that with an rsync of the stripes between the active and failed GlusterFS nodes BEFORE starting glusterfsd on the failed node (make sure to get the extended attributes too).
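To make the layout concrete, the client spec would stack something like the sketch below. This is untested and only illustrative: server1-brick1 and friends stand in for protocol/client volumes defined earlier in the spec, and the option syntax may differ between releases, so check the docs for the version you're running.

  volume afr1
    type cluster/afr
    subvolumes server1-brick1 server2-brick1 server3-brick1
  end-volume

  volume afr2
    type cluster/afr
    subvolumes server1-brick2 server2-brick2 server3-brick2
  end-volume

  # stripe across the replicated sets so each chunk heals independently
  volume stripe0
    type cluster/stripe
    option block-size *:1MB
    subvolumes afr1 afr2
  end-volume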

> Also, it appears that I need to restart glusterfsd when I change the
> configuration files (i.e. to re-read them) which effectively crashes
> the node .. is there a way to re-read a config without crashing the
> node? (on the assumption that as above, crashing a node is
> effectively "very" expensive...?)

The above setup, if feasible, would mitigate the restart cost to the point where only a few megs might need to be synced on a glusterfsd restart.
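For the pre-sync step mentioned above, something along these lines would do it (sketch only: the paths and the goodnode name are placeholders, and -X needs an rsync built with xattr support -- otherwise carry the trusted.afr.* attributes over separately with getfattr/setfattr):

  # on the failed node, before starting glusterfsd again
  rsync -avX goodnode:/data/export/db/ /data/export/db/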

--

-Kevan Benson
-A-1 Networks



