[bug#33600] CDN performance


From: Chris Marusich
Subject: [bug#33600] CDN performance
Date: Thu, 13 Dec 2018 00:05:06 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Ludovic Courtès <address@hidden> writes:

> Regarding the GNU sub-domain, as I replied to Meiyo, I’m in favor of it,
> all we need is someone to champion setting it up.

I could help with this.  Whom should I contact?

>> Regarding CDNs, I definitely think it's worth a try!  Even Debian is
>> using CloudFront (cloudfront.debian.net).  In fact, email correspondence
>> suggests that as of 2013, Amazon may even have been paying for it!
>>
>> https://lists.debian.org/debian-cloud/2013/05/msg00071.html
>
> (Note that debian.net is not Debian, and “there’s no cloud, only other
> people’s computers” as the FSFE puts it.)

I do try to avoid the term "cloud" whenever possible.  It's hard to
avoid when it's in the product name, though!  A wise man once said, "A
cloud in the mind is an obstacle to clear thinking."  ;-)

You may be right about debian.net.  I don't know who owns that domain.
It's confusing, since debian.org is definitely owned by the Debian
project, and the following page says they're using Amazon CloudFront:

https://deb.debian.org/

Maybe Debian still uses Amazon CloudFront, or maybe they no longer do.
In any case, I found the following email thread, which documents a
thoughtful discussion of whether Debian should use a CDN.  It covers
many of the same concerns we're discussing here.

https://lists.debian.org/debian-project/2013/10/msg00029.html

A summary, in the middle of the long thread, is here:

https://lists.debian.org/debian-project/2013/10/msg00074.html

Later, part of the thread broke off and continued here:

https://lists.debian.org/debian-project/2014/02/msg00001.html

That's as far as I've read.

Judging by that email thread, one of the reasons Debian considered
using a CDN was that the cost, in terms of people power, of maintaining
their own "proto-CDN" infrastructure had grown too great.  I believe
it!  I think it would be ill-advised for the Guix project to expend
effort and capital on building and maintaining its own CDN.  It would
be wiser to focus on developing a decentralized substitute solution
(GNUnet, IPFS, etc.).

That said, I still think that today Guix should provide a third-party
CDN option.  For many Guix users, a CDN would improve performance and
availability of substitutes.  Contracting with a third party to provide
the CDN service would require much less effort and capital than building
and maintaining a CDN from scratch.  This would also enable the project
to focus more on building a decentralized substitute solution.  And once
that decentralized solution is ready, it will be easy to just "turn off"
the CDN.

I also understand Hartmut's concerns.  The risks he points out are
valid.  Because of those risks, even if we make a third-party CDN option
available, some people will choose not to use it.  For that reason, we
should not require Guix users to use a third-party CDN, just as we do
not require them to use substitutes from our build farm.

However, not everyone shares the same threat model.  For example,
although some people choose not to trust substitutes from our build
farm, still others do.  The choice is based on one's own individual
situation.  Similarly, if we make a third-party CDN option available and
explain the risks of using it, Guix users will be able to make an
educated decision for themselves about whether or not to use it.
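
For example, since substitute servers are just URLs that each user
configures, opting out of a CDN-backed mirror would amount to leaving
it out of one's substitute URLs.  A sketch of what that might look
like; the hostname below is made up for illustration, not a real
server:

    # Fetch substitutes only from the build farm directly, bypassing
    # a hypothetical CDN-backed mirror:
    guix build hello --substitute-urls="https://substitutes.example.org"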

>> Here, it took 0.459667 - 0.254210 = 0.205457 seconds (about 205 ms) to
>> establish the TCP connection after the DNS lookup.  The average
>> throughput was 1924285 bytes per second (about 40 megabits per second,
>> where 1 megabit = 10^6 bits).  It seems my connection to berlin is
>> already pretty good!
>
> Indeed.  The bandwidth problem on berlin is when you’re the first to
> download a nar and it’s not been cached by nginx yet.  In that case, you
> get very low bandwidth (like 10 times less than when the item is cached
> by nginx.)  I’ve looked into it, went as far as strace’ing nginx, but
> couldn’t find the reason for this.
>
> Do you have any idea?

I made a typo here.  The value "1924285" should have been "4945831",
which is what measure_get printed.  The intended result (about 40
Mbps) is still correct: 4945831 bytes per second × 8 bits per byte ≈
39.6 megabits per second.

Actually, I thought 40 megabits per second was pretty great for a
single-threaded file transfer that originated in Europe (I think?) and
terminated in Seattle (via my residential Comcast downlink).  I
requested that particular file many times before that final test run, so
it was probably already cached by nginx.
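
For reference, the variable names above ("speed_download" and friends)
match curl's --write-out format, so a command along these lines should
reproduce the measurement.  The nar URL is only a placeholder:

    # Print DNS lookup time, TCP connect time, and average throughput
    # for a single download (curl reports speed in bytes per second).
    curl --silent --output /dev/null \
         --write-out 'time_namelookup: %{time_namelookup}\ntime_connect: %{time_connect}\nspeed_download: %{speed_download} B/s\n' \
         "https://substitutes.example.org/nar/gzip/...-some-package"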

However, I believe you when you say that it's slow the first time you
download the substitute from berlin.  What path does the data take from
its origin through berlin?  If berlin needs to download the initial file
from another server, perhaps the connection between berlin and that
server is the bottleneck?  Maybe we should discuss that in a different
email thread, though.
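
To make the caching behavior concrete, here is a minimal sketch of an
nginx proxy cache sitting in front of an upstream publisher.  Every
name, path, and size below is an assumption for illustration, not
berlin's actual configuration:

    # Hypothetical nar cache: a cache miss is fetched from the
    # upstream server (the slow first download); later requests for
    # the same nar are served from the local cache.
    proxy_cache_path /var/cache/nginx/nars levels=1:2
                     keys_zone=nars:10m max_size=100g inactive=30d;

    server {
        listen 443 ssl;
        server_name substitutes.example.org;

        location /nar/ {
            proxy_pass http://localhost:8080;   # e.g. a local 'guix publish'
            proxy_cache nars;
            proxy_cache_valid 200 30d;
        }
    }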

> I’ve tried this from home (in France, with FTTH):
>
> [...]
>
> speed_download: 20803402.000 B/s

Wow, that's 166 megabits per second!  I'm jealous.  :-)

> Wall-clock time is less than a tenth; woow.

I expect others will see similar performance improvements, provided
they are close to one of Amazon CloudFront's edge locations and their
local ISP downlink is fast enough to benefit.  Again, the edge
locations are listed here:

https://aws.amazon.com/cloudfront/features/

Here's a direct link to their current map:

https://d1.awsstatic.com/global-infrastructure/maps/CloudFront%20Network%20Map%2010.12.18.59e838df2f373247d2efaeb548076e084fd8993e.png

-- 
Chris
