Re: [Mldonkey-users] [pango 20030105a] new chunks order


From: Goswin Brederlow
Subject: Re: [Mldonkey-users] [pango 20030105a] new chunks order
Date: 07 Jan 2003 12:08:18 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Military Intelligence)

Pierre Etchemaite <address@hidden> writes:

> On Sun, 05 Jan 2003 20:16:53 +0100, Sven Hartge <address@hidden>
> wrote:
> 
> >           "new chunks order": experimental. Try to optimize file
> >           completion
> >                 time using cost-benefit algorithm. The estimations could
> >                 be improved a lot, but the basic algorithm is there.
> >                 To allow for experimenting, the new algorithm is only
> >                 used when random_order_download is enabled.
> > 
> > in the ChangeLog, but this explanation is a bit short.
> > 
> > What are the benefits of this new algorithm?
> 
> I'm not sure yet :)
> 
> There are many discussions about improving file propagation, how not
> downloading in totally random order is bad (including trying to get first
> and last chunks as a priority, or trying too hard to get rare chunks), how
> completing chunks quickly is good, etc.

Getting the first and last chunk might not be good for propagation,
but it is vital for previewing. With mplayer under Linux it's vital to
get the beginning of an AVI. With Windows players it is usually vital
to get the index (which is at the end of the AVI) too. Considering all
the files I killed after a few MB of download because they were fakes,
I think getting the first/last chunk first saves a lot.

> I modified DonkeyOneFile.compute_size so I could know the amount of missing
> bytes in each chunk, and better estimate how many clients I should use for
> completing a partial chunk.

I always wanted to include that information in the bar diagram in the
GUI somehow.
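
Just to make that concrete, a rough OCaml sketch (this is not the real
DonkeyOneFile.compute_size; the chunk size constant and the
(start, stop) range representation are only assumptions for the example,
and the ranges are assumed not to overlap):

let chunk_size = 9728000L          (* assumed edonkey chunk size, in bytes *)

(* [downloaded] is a list of (start, stop) byte ranges already on disk,
   stop exclusive. Returns the number of missing bytes in each chunk of
   a file of [file_size] bytes. *)
let missing_per_chunk ~file_size ~downloaded =
  let nchunks =
    Int64.to_int
      (Int64.div (Int64.add file_size (Int64.sub chunk_size 1L)) chunk_size)
  in
  let missing = Array.make nchunks 0L in
  for i = 0 to nchunks - 1 do
    let cstart = Int64.mul (Int64.of_int i) chunk_size in
    let cstop = min file_size (Int64.add cstart chunk_size) in
    (* bytes of this chunk covered by the downloaded ranges *)
    let covered =
      List.fold_left (fun acc (s, e) ->
          let s = max s cstart and e = min e cstop in
          if Int64.compare e s > 0 then Int64.add acc (Int64.sub e s) else acc)
        0L downloaded
    in
    missing.(i) <- Int64.sub (Int64.sub cstop cstart) covered
  done;
  missing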

> The idea (which is certainly wrong, I'm now almost sure) was that since the
> whole file is completed when all chunks are, the "work per source" should be
> as uniform as possible over all chunks.

If you're selfish, the work should be distributed so that all chunks
finish at the same time: calculate an ETA for each chunk and request
the chunk that will take longest next. But don't be selfish.
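
For illustration, that selfish ordering could look something like this
(the record fields and the per-source bandwidth estimate are made up
for the example):

type chunk_state = {
  missing_bytes : int64;        (* bytes still needed in this chunk *)
  sources : int;                (* clients known to have this chunk *)
}

(* crude ETA: missing bytes divided by the bandwidth we can hope to get
   from this chunk's sources (bytes/s per source is an assumption) *)
let eta ~bytes_per_source_per_sec c =
  if c.sources = 0 then infinity
  else
    Int64.to_float c.missing_bytes
    /. (float_of_int c.sources *. bytes_per_source_per_sec)

(* index of the incomplete chunk with the largest ETA *)
let next_chunk ~bytes_per_source_per_sec chunks =
  let best = ref None in
  Array.iteri (fun i c ->
      if Int64.compare c.missing_bytes 0L > 0 then
        let t = eta ~bytes_per_source_per_sec c in
        match !best with
        | Some (_, tbest) when tbest >= t -> ()
        | _ -> best := Some (i, t))
    chunks;
  Option.map fst !best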
 
> This is certainly wrong, because we want to complete some chunks as soon as
> possible. And since a chunk is often much smaller than the whole file, 
> only the scheduling of the last remaining chunks will determine when the
> whole file completes (= completing other chunks a bit sooner or a bit later
> doesn't matter).
> 
> Instead, I think we should try to use all the available sources for the
> smallest number of chunks at a time, within some reasonable limit (no need
> to use all clients for the same chunk). Work per source should be as uniform
> as possible over those chunks only.
> 
> I still don't know how hard we should try to get rare chunks when there's an
> opportunity...

For rare files, getting rare chunks is vital. For common files it
probably doesn't matter whether you have 20 or 30 sources for a
chunk. The 1000 sources you are not connected to could completely
reverse the rareness.

I think rareness should really be calculated over the chunks of all
known sources, not just those of connected clients.
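
As an example, counting availability over all known sources instead of
only the connected ones is just a different filter (the source record
below is invented; a real client would use the chunk bitmaps it
already keeps per source):

type source = {
  connected : bool;
  has_chunk : bool array;       (* availability bitmap for this file *)
}

let availability ~only_connected ~nchunks sources =
  let count = Array.make nchunks 0 in
  List.iter (fun s ->
      if s.connected || not only_connected then
        Array.iteri (fun i have ->
            if have then count.(i) <- count.(i) + 1)
          s.has_chunk)
    sources;
  count

(* rarest incomplete chunk, judged over all known sources *)
let rarest_chunk ~nchunks ~complete sources =
  let avail = availability ~only_connected:false ~nchunks sources in
  let best = ref None in
  Array.iteri (fun i n ->
      if not complete.(i) then
        match !best with
        | Some (_, nbest) when nbest <= n -> ()
        | _ -> best := Some (i, n))
    avail;
  Option.map fst !best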


Overall I think two things should be considered:

1. Complete chunks as fast as possible. Once they are shared, you free
up slots on other clients because people will connect to you instead.
You can also compute the MD4 and correct errors, and fragmentation of
the file isn't too bad.

2. Request chunks in a way that leaves the maximum number of sources
for the remaining chunks. For example, suppose you have a rare chunk
with one source and 100 sources for the other chunks. If you first
complete all the other chunks, you will have only one source left for
the last chunk and 100 competitors. If you grab that rare chunk first
instead, while it is still possible, you still have 100 sources left
for the other chunks.


To combine the two, I would suggest keeping a count of the overall
rareness of chunks (over connected and unconnected clients) and trying
to get rare chunks first. Get as many sources as possible for the
rarest chunk, down to a minimum sub-chunk size of say 256K, i.e. as
long as there is a gap of >256K in the file that only one source is
downloading, try to split it into two parts with two sources. That's
how lopster does it.

The 256K limit should be set low enough to allow many sources per
chunk, but high enough that you don't end up asking for very small
blocks all the time and spend more time asking than downloading.
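
A rough sketch of that splitting rule (types and bookkeeping are
assumptions; here a gap is only split while both halves stay at or
above the 256K minimum):

let min_sub_chunk = 262144L        (* 256K *)

type gap = {
  g_start : int64;
  g_stop : int64;                  (* exclusive *)
  g_sources : int;                 (* sources currently downloading it *)
}

(* one pass: split every gap that only one source is working on and
   that is large enough that both halves stay >= min_sub_chunk; call
   again whenever another source becomes available *)
let rec split_gaps gaps =
  let len g = Int64.sub g.g_stop g.g_start in
  match gaps with
  | [] -> []
  | g :: rest
    when g.g_sources <= 1
         && Int64.compare (len g) (Int64.mul 2L min_sub_chunk) >= 0 ->
      let mid = Int64.add g.g_start (Int64.div (len g) 2L) in
      { g with g_stop = mid } :: { g with g_start = mid } :: split_gaps rest
  | g :: rest -> g :: split_gaps rest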


Reconnecting to clients should also consider rareness of chunks.
Clients with rare chunks should be tried more often, clients with
common chunks less often.
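
For example, the retry delay for a client could be scaled by the rarest
chunk it offers (the base delay and the scaling are arbitrary, just to
show the idea):

let base_retry_delay = 1200.       (* seconds between connection attempts *)

(* [avail.(i)] is the number of known sources for chunk i,
   [client_has.(i)] says whether this client advertises chunk i.
   The rarer the rarest chunk a client offers, the sooner we retry. *)
let retry_delay ~avail ~client_has =
  let rarest = ref max_int in
  Array.iteri (fun i have ->
      if have && avail.(i) < !rarest then rarest := avail.(i))
    client_has;
  if !rarest = max_int then 2. *. base_retry_delay   (* nothing we need *)
  else
    base_retry_delay *. float_of_int !rarest /. (1. +. float_of_int !rarest)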

MfG
        Goswin



