
[Savannah-hackers-public] Re: Savannah Mirror statistics


From: Darrin Khan
Subject: [Savannah-hackers-public] Re: Savannah Mirror statistics
Date: Sat, 18 Apr 2009 22:03:09 +1000
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)

Hey All,

Alex Fernandez wrote:
>> This will make us miss some downloads.  I don't know if SF handles
>> this case, but when a user saves the mirror link somewhere, mails it to
>> friends, puts it in a script, etc., then people will directly hit the
>> mirror without passing through the SF mirror selection. So the most
>> accurate source of information is the mirror's log.
> 
> That is why the redirect needs to be done _after_ the click. The
> directory in /releases should point to a file on /releases, and this
> final link is what should be redirected to the mirror. If a user
> copies the link they aren't even aware that they will be redirected
> somewhere.

A note on the URL-copying point too: whatever solution is used should
ensure that the URL does not carry more than one GET argument (i.e. no
& in the URL). The issue is that if you paste the URL into a shell and
pass it to wget unquoted, the shell splits the command at each &: wget
runs in the background with a truncated URL, so you get the wrong
download, and the remaining fragments run as separate jobs that may fail
with "command not found". So a redirected URL like

http://savannah.nongnu.org/files/?group=xtogen

should be OK, whereas

http://savannah.nongnu.org/files/?group=cygbuild&file=cygbuild-20071001.1731.tar.gz

would cause issues. Maybe a format like

http://savannah.nongnu.org/dl/<group>/<filename>

where dl would obviously be a script that handles the smarts for the
download/click.
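To see what an unquoted paste does, Python's shlex module can split a command line roughly the way a POSIX shell would (an approximation, not a real shell; punctuation_chars together with whitespace_split needs Python 3.8+). The /dl/cygbuild/... URL below is a hypothetical instance of the proposed format.

```python
import shlex
from typing import List


def shell_words(command: str) -> List[str]:
    """Split a command line approximately as a POSIX shell would.

    punctuation_chars=True turns '&', ';', '|' etc. into separate tokens,
    and whitespace_split=True keeps URLs (with ':' and '/') in one piece.
    """
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


ok_url = "http://savannah.nongnu.org/files/?group=xtogen"
bad_url = ("http://savannah.nongnu.org/files/"
           "?group=cygbuild&file=cygbuild-20071001.1731.tar.gz")
# Hypothetical URL in the proposed /dl/<group>/<filename> format.
path_url = "http://savannah.nongnu.org/dl/cygbuild/cygbuild-20071001.1731.tar.gz"

for url in (ok_url, bad_url, path_url):
    print(shell_words("wget " + url))
```

The first and third commands come out as two words ("wget" plus the URL); the second is cut at the &, which is where the shell would background wget with a truncated URL.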

> 
> I believe that is how Google search pages work right now (it wasn't
> always this way): the search page links back to Google, and this
> Google link redirects to the actual link.

Yes, I guess they are checking whether the link was clicked, not
necessarily whether the page/file was there or not.
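The redirect-after-click idea can be sketched as a tiny WSGI application using only the standard library. Everything here is hypothetical: the /dl/<group>/<filename> layout is the format proposed above, MIRROR_BASE is a made-up mirror, and CLICKS is a plain list standing in for real click storage. The point is the order: record the click, then answer with a 302 so users only ever see and copy the stable /dl/ URL.

```python
from urllib.parse import quote

# Hypothetical mirror; a real script would pick a mirror per request.
MIRROR_BASE = "http://mirror.example.org/releases"

# Hypothetical stand-in for wherever clicks would really be stored.
CLICKS = []


def dl_app(environ, start_response):
    """Handle /dl/<group>/<filename>: record the click, then 302 to a mirror."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if (len(parts) != 3 or parts[0] != "dl" or not parts[1] or not parts[2]
            or {parts[1], parts[2]} & {".", ".."}):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]

    _, group, filename = parts
    CLICKS.append((group, filename))  # count the click before redirecting
    location = f"{MIRROR_BASE}/{quote(group)}/{quote(filename)}"
    start_response("302 Found", [("Location", location),
                                 ("Content-Type", "text/plain")])
    return [b""]
```

To try it locally, wsgiref.simple_server.make_server("localhost", 8000, dl_app).serve_forever() would serve it, and wget on a /dl/ URL would then follow the redirect.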

> 
> There is a related problem with mirror updates -- it can take up to a
> day for new files to appear in mirrors, so the redirect can point to a
> missing file. I have seen this happen on Sourceforge too, but the
> window is probably much shorter. I don't know how to solve this one
> easily. We could always check periodically, and only redirect to the
> mirror if the file is actually there; otherwise send to
> releases-noredirect. This method is more robust for users, but it
> makes the setup much more fragile.

An option could be to have the redirect script do a HEAD request at most
once every 24 hours (only when the file is requested, not as a regular
check) to see whether the file is available on the mirror. Record the
time of the check along with its status, and don't check again for the
next 24 hours. This would allow fast redirects for every download after
the first one in each 24-hour window. This may work well for popular
files, but it will add latency for less popular ones.
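A minimal sketch of this check-on-request cache, assuming one cache entry per mirror URL held in memory (a real redirect script running as separate CGI processes would need shared storage instead). MirrorCheckCache and head_ok are hypothetical names; the check function and clock are injectable so the logic can be exercised without network access.

```python
import time
import urllib.request

CHECK_INTERVAL = 24 * 60 * 60  # re-check a given URL at most once every 24 hours


def head_ok(url, timeout=5.0):
    """Return True if a HEAD request for url gets a 2xx response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


class MirrorCheckCache:
    """Remember the last HEAD result per URL for `interval` seconds.

    A check only happens when a file is requested and the cached entry is
    missing or stale; there is no background polling.
    """

    def __init__(self, check=head_ok, clock=time.time, interval=CHECK_INTERVAL):
        self._check = check
        self._clock = clock
        self._interval = interval
        self._entries = {}  # url -> (time_checked, available)

    def available(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._interval:
            return entry[1]  # fresh result: no network round trip
        result = self._check(url)
        self._entries[url] = (now, result)  # record time and status
        return result
```

Only the first request in each window pays for the HEAD round trip, which is the latency trade-off described above for less popular files.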

Something like the following order of steps:

1) Link to download file is clicked
2) Select a mirror
3) Does the file exist on the mirror? (HEAD request)
   - No: record time and status, use the master or another known good
     mirror
   - Yes: record time and status, redirect to the mirror
4) Record the click as a download
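The four steps above can be sketched as one function. The mirror and master URLs are hypothetical, file_exists stands in for the HEAD request (ideally cached as described above), the fallback goes straight to the master rather than trying another known good mirror, and the clicks and status_log lists are placeholders for real storage.

```python
import random
import time

# Hypothetical URLs; a real setup would load these from configuration.
MASTER = "http://master.example.org/releases"
MIRRORS = ["http://mirror1.example.org/releases",
           "http://mirror2.example.org/releases"]


def choose_download_url(group, filename, file_exists, clicks, status_log,
                        mirrors=MIRRORS, master=MASTER, pick=random.choice):
    """Return the URL to redirect a clicked download to."""
    # 1) The download link was clicked: this function handles that request.
    # 2) Select a mirror.
    mirror = pick(mirrors)
    url = f"{mirror}/{group}/{filename}"
    # 3) Does the file exist on the mirror? Record time and status either way.
    exists = file_exists(url)
    status_log.append((time.time(), url, exists))
    if not exists:
        url = f"{master}/{group}/{filename}"  # fall back to the master copy
    # 4) Record the click as a download, whichever server ends up serving it.
    clicks.append((group, filename))
    return url
```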


(Back to getting copies of the logs from everyone)
Are all the mirrors using the same log format?
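If the mirrors did all use Apache's Common (or Combined) Log Format, which is only an assumption here, counting downloads per file from their logs could be as simple as the sketch below. It counts only successful GETs (status 200), so HEAD checks, 404s and ranged 206 requests are ignored; lines in any other format are skipped rather than guessed at.

```python
import re
from collections import Counter

# Assumes Common or Combined Log Format; other formats need other patterns.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_downloads(lines):
    """Count completed GET requests (HTTP 200) per path, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # different format or corrupt line
        if match["method"] == "GET" and match["status"] == "200":
            counts[match["path"]] += 1
    return counts
```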

Just some thoughts,
Darrin
