Re: [Savannah-hackers-public] gnu ftp mirror download statistics
From: Assaf Gordon
Subject: Re: [Savannah-hackers-public] gnu ftp mirror download statistics
Date: Sat, 4 Mar 2017 15:43:35 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0
Hello,
On 03/03/2017 07:20 AM, Antonio Diaz Diaz wrote:
> Assaf Gordon wrote:
>> Download statistics for gnu's ftp-mirror:
>> https://download.savannah.gnu.org/ftpmirror-stats/
>
> Does this work if the user goes to http://ftpmirror.gnu.org/ , is
> redirected to a mirror, and then navigates the mirror to find the file
> he wants to download? Or does it only work if the file is accessed through a
> full link like http://ftpmirror.gnu.org/ed/ed-1.14.2.tar.lz ?
> [...]
> But the funniest thing is that it can apparently invent files that do not
> exist (ddrescue is released in lz format only):
Actually, it counts exactly one thing:
HTTP 'GET' requests to 'ftpmirror.gnu.org'.
The first subdirectory in the URL is taken as the 'package name'.
It does not track any other mirror usage, nor direct access to
ftp.gnu.org. It also does not verify the file actually exists.
Nor does it separate bots from real people (based on user-agent
or otherwise).
Every request is counted. Over time, I assume incorrect requests will be
drowned out as noise compared to real requests. They can still serve as
useful flags, though, indicating that some runaway script somewhere on the
internet has a wrong URL in it.
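For illustration, here is a minimal Python sketch of the counting logic described above. It is not the actual statistics script: it assumes the access log uses the common "combined" log format, and the names count_requests and REQUEST_RE are hypothetical. Like the real script, it counts every GET request, takes the first path component as the package name, and does no existence check or bot filtering.

```python
import re
from collections import Counter

# Assumption: access-log lines in the "combined" log format, e.g.
# 1.2.3.4 - - [04/Mar/2017:15:43:35 -0500] "GET /ed/ed-1.14.2.tar.lz HTTP/1.1" 302 0 "-" "Wget/1.18"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[0-9.]+"')


def count_requests(log_lines):
    """Count GET requests keyed by (package, rest-of-path).

    Hypothetical sketch: no check that the file exists and no bot
    filtering, so every GET request is counted.
    """
    counts = Counter()
    for line in log_lines:
        m = REQUEST_RE.search(line)
        if m is None or m.group("method") != "GET":
            continue  # only GET requests are counted
        path = m.group("path").split("?", 1)[0]  # ignore any query string
        parts = [p for p in path.split("/") if p]
        if not parts:
            continue  # a request for "/" has no package name
        package, rest = parts[0], "/".join(parts[1:])
        counts[(package, rest)] += 1
    return counts
```

With this keying, a request for /gnu/ed/ed-1.14.2.tar.lz is counted under the package "gnu" with the file "ed/ed-1.14.2.tar.lz", which is the same kind of duplicate entry quoted next.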
> I have noticed that files can be counted in more than one place.
> ed-1.14.2.tar.lz 1
> [...]
> gnu
> ed/ed-1.14.2.tar.lz 6
Again, this script counts requests, so these are two different request URLs.
The confusion/duplication stems from the fact that for 'ftp.gnu.org',
the "/gnu/" subdirectory is required, e.g.:
http://ftp.gnu.org/gnu/ed/ed-1.14.2.tar.lz #works
http://ftp.gnu.org/ed/ed-1.14.2.tar.lz #does not exist
But mirrors do not always have the '/gnu/' subdirectory
(e.g. "http://gnumirror.nkn.in" is one of the mirrors, and it has no "/gnu/"
subdirectory).
The correct mirroring URL should not have "/gnu/":
http://ftpmirror.gnu.org/ed/ed-1.14.2.tar.lz
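A small Python sketch of why "/gnu/" does not belong in the ftpmirror path. It assumes (my assumption, not a description of the actual redirector) that the redirector simply appends the requested path to each mirror's base URL; mirror_url is a hypothetical helper and http://mirror.example.org/gnu/ is a made-up mirror whose base URL already includes the "/gnu/" directory.

```python
def mirror_url(mirror_base: str, path: str) -> str:
    """Append a ftpmirror.gnu.org request path to a mirror's base URL.

    Hypothetical helper: assumes the redirector does a plain join, so any
    "/gnu/" in the request path is copied verbatim to the mirror URL.
    """
    return mirror_base.rstrip("/") + "/" + path.lstrip("/")
```

Under that assumption, mirror_url("http://gnumirror.nkn.in", "/ed/ed-1.14.2.tar.lz") gives http://gnumirror.nkn.in/ed/ed-1.14.2.tar.lz, which exists, while the same call with "/gnu/ed/ed-1.14.2.tar.lz" gives http://gnumirror.nkn.in/gnu/ed/ed-1.14.2.tar.lz, a directory that mirror does not have. On a mirror whose base URL already ends in "/gnu/", the extra prefix would be doubled instead.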
However, I assume that a long time ago someone realized
this was too confusing and added an Apache rewrite rule
to also support (and strip) a "/gnu/" subdirectory
for ftpmirror.gnu.org.
We kept this 'rewrite' rule when we moved to the new VMs
recently (in vcs0:/etc/nginx/sites-available/ftpmirror-common.inc, for
future reference).
So everything 'just works' with and without '/gnu/'.
But this is an indication that somewhere, someone has constructed
their own mirror URL, and it's incorrect.
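The effect of the rewrite rule can also be sketched in Python. This is a hypothetical equivalent, not the actual nginx rule in ftpmirror-common.inc: strip_gnu_prefix and merged_counts are made-up names. It removes one leading "/gnu" component, and only a whole component, so a package whose name merely starts with "gnu" (such as gnun) is left alone.

```python
from collections import Counter


def strip_gnu_prefix(path: str) -> str:
    """Remove one leading "/gnu" path component, like the rewrite rule.

    Hypothetical sketch: only an exact "/gnu" component is stripped, so
    e.g. "/gnun/..." is returned unchanged.
    """
    if path == "/gnu" or path.startswith("/gnu/"):
        return path[len("/gnu"):] or "/"
    return path


def merged_counts(paths):
    """Count request paths after stripping the "/gnu/" prefix (hypothetical)."""
    return Counter(strip_gnu_prefix(p) for p in paths)
```

Normalizing like this before counting would merge the two entries for ed-1.14.2.tar.lz, at the cost of hiding the signal that someone out there is using an incorrectly constructed URL.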
regards,
- assaf