Re: [Gluster-devel] Choice of Translator question


From: Gareth Bult
Subject: Re: [Gluster-devel] Choice of Translator question
Date: Thu, 27 Dec 2007 23:00:48 +0000 (GMT)

This could be the problem.

When I do this on a 1G file, I have 1 file in each stripe partition of size ~ 
1G.

I don't get (n) files where n=1G/chunk size ... (!)

If I did, I could see how it would work .. but I don't ..

Are you saying I "definitely should" see files broken down into multiple sub 
files, or were you assuming this is how it worked?
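A minimal sketch of round-robin striping may help frame the question. It assumes a fixed chunk size, round-robin placement across the stripe subvolumes, and one backing file per subvolume that keeps each chunk at its original offset, instead of one file per chunk. Those are assumptions, not confirmed GlusterFS behaviour. Under that layout, each backing file's apparent size is close to the full file size even though it holds only 1/N of the data, which would fit the ~1G per partition seen above. The function names and figures are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def stripe_layout(file_size: int, chunk_size: int, n_subvols: int) -> Tuple[int, Dict[int, List[int]]]:
    """Assign chunks round-robin to subvolumes; return (chunk count, chunk offsets per subvolume)."""
    n_chunks = -(-file_size // chunk_size)  # ceiling division: n = file size / chunk size, rounded up
    layout: Dict[int, List[int]] = defaultdict(list)
    for i in range(n_chunks):
        # Assumption: chunk i keeps its logical offset i * chunk_size in the backing file.
        layout[i % n_subvols].append(i * chunk_size)
    return n_chunks, dict(layout)

def apparent_size(offsets: List[int], chunk_size: int, file_size: int) -> int:
    """Size `ls -l` would report for the backing file: end of the last chunk it holds."""
    last = max(offsets)
    return last + min(chunk_size, file_size - last)

def allocated_bytes(offsets: List[int], chunk_size: int, file_size: int) -> int:
    """Bytes of real data in the backing file; the file's final chunk may be short."""
    return sum(min(chunk_size, file_size - off) for off in offsets)

GiB, KiB = 1 << 30, 1 << 10
# Hypothetical setup: 1G file, 128KB chunks, 2 stripe subvolumes.
n, layout = stripe_layout(1 * GiB, 128 * KiB, 2)
print(n)  # 8192
for sv, offs in sorted(layout.items()):
    print(sv, len(offs), apparent_size(offs, 128 * KiB, GiB), allocated_bytes(offs, 128 * KiB, GiB))
```

If the backing files really are laid out like this, the gaps between chunks would be holes. Comparing `ls -l` (apparent size) with `du` (allocated blocks) on one of them would show ~1G versus ~512M.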

Gareth.


----- Original Message -----
From: "Kevan Benson" <address@hidden>
To: "Gareth Bult" <address@hidden>
Cc: "gluster-devel" <address@hidden>
Sent: Thursday, December 27, 2007 8:16:53 PM (GMT) Europe/London
Subject: Re: [Gluster-devel] Choice of Translator question

Gareth Bult wrote:
>> Agreed, which is why I just showed the single file self-heal
>> method, since in your case targeted self heal (maybe before a full
>> filesystem self heal) might be more useful.
> 
> Sorry, I was mixing moans .. on the one hand there's no log hence no
> automatic detection of out of date files (which means you need a
> manual scan), and secondly, doing a full self-heal on a large
> file-system "can" be prohibitively "expensive" ...
> 
> I'm vaguely wondering if it would be possible to have a "log"
> translator that wrote changes to a namespace volume for quick
> recovery following a node restart. (as an option of course)

An interesting thought.  Possibly something that keeps a filename and 
timestamp so other AFR members could connect and request changed file 
AFR versions since X timestamp.
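The idea can be sketched as a small in-memory structure that keeps the newest timestamp per filename and answers "what changed since X?". This illustrates the concept only. The class and method names are invented, and a real translator would also need to persist the log (for example on the namespace volume Gareth mentions) and survive crashes. This sketch ignores both.

```python
from typing import Dict, List

class ChangeLog:
    """Hypothetical 'log' translator state: newest modification timestamp per path."""

    def __init__(self) -> None:
        self.entries: Dict[str, float] = {}

    def record(self, path: str, timestamp: float) -> None:
        # Only the newest timestamp matters when deciding what to heal.
        if timestamp > self.entries.get(path, float("-inf")):
            self.entries[path] = timestamp

    def changed_since(self, since: float) -> List[str]:
        # Paths a rejoining AFR member would need to self-heal, oldest change first.
        changed = [p for p, t in self.entries.items() if t > since]
        return sorted(changed, key=lambda p: self.entries[p])

log = ChangeLog()
log.record("/data/a.img", 100.0)
log.record("/data/b.img", 200.0)
log.record("/data/a.img", 300.0)  # a.img modified again while a node was down
print(log.changed_since(150.0))   # ['/data/b.img', '/data/a.img']
```

A member that went down at time 150 would heal just those two paths instead of scanning the whole filesystem.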

Automatic self-heal is supposed to be on the way, so I suspect they are 
already doing (or planning) something like this.

>> I don't see how the AFR could even be aware the chunks belong to
>> the same file, so how it would know to replicate all the chunks of
>> a file is a bit of a mystery to me.  I will admit I haven't done
>> much with the stripe translator though, so my understanding of its
>> operation may be wrong.
> 
> Mmm, trouble is there's nothing definitive in the documentation
> either way .. I'm wondering whether it's a known critical omission
> which is why it's not been documented (!) At the moment stripe is
> pretty useless without self-heal (i.e. AFR). AFR is pretty useless
> without stripe for anyone with large files. (which I'm guessing is
> why stripe was implemented after all the "stripe is bad"
> documentation) If the two don't play well and a self-heal on a
> large file means a 1TB network data transfer - this would strike me
> as a show stopper.
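The cost difference can be put into numbers. A whole-file self-heal resends the entire file, while a heal that works per chunk resends only the chunks that changed. The sketch below does that arithmetic using illustrative figures rather than measurements: a 1 TiB file, 1 MiB chunks and 10 modified chunks.

```python
TiB, MiB = 1 << 40, 1 << 20

def heal_bytes(file_size: int, chunk_size: int, dirty_chunks: set, chunk_level: bool) -> int:
    """Bytes sent over the network to heal one file."""
    if not chunk_level:
        return file_size  # whole-file heal: every byte is resent
    # Chunk-level heal: only dirty chunks are resent; the file's final chunk may be short.
    return sum(min(chunk_size, file_size - i * chunk_size) for i in dirty_chunks)

dirty = set(range(10))  # hypothetical: 10 chunks written while one replica was down
print(heal_bytes(1 * TiB, 1 * MiB, dirty, chunk_level=False) // MiB)  # 1048576 (i.e. 1 TiB)
print(heal_bytes(1 * TiB, 1 * MiB, dirty, chunk_level=True) // MiB)   # 10
```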

I think the original docs said it was implemented because it was easy, 
but there wasn't a whole lot to be gained by using it.  Since then, I've 
seen people post numbers that seemed to indicate it gave a somewhat 
sizable boost, but the extra complexity it introduced never made it
attractive to me.

The possibility it could be used to greatly speed up self-heal on large 
files seems like a real good reason to use it though, so hopefully we 
can find a way to make it work.

>> Understood.  I'll have to actually try this when I have some time,
>> instead of just doing some armchair theorizing.
> 
> Sure .. I think my tests were "proper" .. although I might try them
> on TLA just to make sure.
> 
> Just thinking logically for a second, for AFR to do chunk level
> self-heal, there must be a chunk level signature store somewhere. ...
> where would this be ?

Well, to AFR each chunk should just look like another file, it shouldn't 
care that it's part of a whole.

I assume the stripe translator uses another extended attribute to tell 
what file it's part of.  Perhaps the AFR translator is stripe aware and 
that's causing the problem?
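Gareth's "chunk level signature store" can be illustrated with per-chunk checksums. If each replica's data is hashed chunk by chunk, comparing the digests identifies exactly which chunks are stale. This is only a sketch of the concept. SHA-1, the chunk size and the function names are assumptions, and nothing in this thread says AFR keeps such signatures.

```python
import hashlib
from typing import List

def chunk_digests(data: bytes, chunk_size: int) -> List[str]:
    """SHA-1 digest of each fixed-size chunk (the last chunk may be short)."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def stale_chunks(good: bytes, stale: bytes, chunk_size: int) -> List[int]:
    """Indices of chunks whose digest differs from, or is missing in, the stale copy."""
    want = chunk_digests(good, chunk_size)
    have = chunk_digests(stale, chunk_size)
    return [i for i, d in enumerate(want) if i >= len(have) or have[i] != d]

good = b"A" * 8 + b"B" * 8 + b"C" * 8
print(stale_chunks(good, b"A" * 8 + b"X" * 8 + b"C" * 8, 8))  # [1]
print(stale_chunks(good, b"A" * 8, 8))                         # [1, 2]
```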

>> Was this on AFR over stripe or stripe over AFR?
> 
> Logic told me it must be AFR over stripe, but I tried it both ways
> round ..

Let's get rid of the over/under terminology (which I always seem to think 
of in reverse from other people), and use a representation that's more 
absolute:

client -> XLATOR(stripe) -> XLATOR(AFR) -> diskVol(1..N)

Throw in your network connections wherever you want, but this should be 
testable on a single box with two different directories exported as volumes.

The client writes to the stripe translator, which splits up the large 
file, which is then sent to the AFR translator so each chunk is stored 
redundantly in each disk volume supplied.
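This write path can be modelled in a few lines. In this hypothetical sketch a "disk" is just a dict keyed by (path, chunk index). Stripe deals chunks round-robin across its subvolumes, and each subvolume is an AFR group that writes every chunk to all of its disks. The model treats each chunk as its own object, as in the reasoning here. Whether stripe really stores chunks that way is exactly what Gareth's test calls into question. With a single AFR group, as in the diagram, every disk ends up with a copy of every chunk.

```python
from typing import Dict, List, Tuple

Disk = Dict[Tuple[str, int], bytes]  # hypothetical store: (path, chunk index) -> chunk bytes

def afr_write(disks: List[Disk], key: Tuple[str, int], chunk: bytes) -> None:
    # AFR: mirror the same chunk onto every disk in the group.
    for disk in disks:
        disk[key] = chunk

def stripe_write(path: str, data: bytes, chunk_size: int, afr_groups: List[List[Disk]]) -> None:
    # Stripe: split the file and deal chunks round-robin across its subvolumes.
    for idx, off in enumerate(range(0, len(data), chunk_size)):
        group = afr_groups[idx % len(afr_groups)]
        afr_write(group, (path, idx), data[off:off + chunk_size])

# client -> stripe -> AFR -> diskVol(1..2), all on one box
disk1: Disk = {}
disk2: Disk = {}
stripe_write("/big.img", b"abcdefgh", 3, [[disk1, disk2]])
print(sorted(disk1.items()))  # [(('/big.img', 0), b'abc'), (('/big.img', 1), b'def'), (('/big.img', 2), b'gh')]
print(disk1 == disk2)         # True
```

In this model AFR only ever sees (path, index) pairs. Healing a stale disk therefore means copying just the pairs that differ. With AFR placed above stripe, the whole file becomes AFR's unit of replication instead.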

If the AFR and stripe are reversed, it will have to pull all stripe 
chunks to do a self heal (unless AFR is stripe aware), which isn't what 
we are aiming for.

Is that similar to what you tested?

-- 

-Kevan Benson
-A-1 Networks




