Re: [Bug-wget] Help: Why wget wall clock time much higher than download

From: Tim Rühsen
Subject: Re: [Bug-wget] Help: Why wget wall clock time much higher than download time?
Date: Fri, 21 Jun 2019 10:29:23 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.1


On 6/21/19 1:36 AM, David Bodin wrote:
> Tim,
> Genuine thanks for your response--and especially for your contribution of
> wget2. I ran into an issue setting it up (on my aws ami) and can't find any
> resources online that address the issue.
> I followed the instructions you provided on how to build it
> <https://gitlab.com/gnuwget/wget2/blob/master/README.md>, but after
> building it and trying "wget [url]", I first ran into "Failed to
> connect: Wget has been built without TLS support". I then found the
> solution <https://github.com/rockdaboot/wget2/issues/201> and fixed it with
> "sudo yum -y install gnutls-devel", confirming this by running
> "./configure" and checking for "SSL/TLS support:    yes". I then rebuilt it
> and tried "wget [url]" again, but ran into:
> TLS False Start requested but Wget built with insufficient GnuTLS version
> WARNING: OCSP is not available in this version of GnuTLS.
> ERROR: The certificate is not trusted.
> ERROR: The certificate doesn't have a known issuer.
> Failed to connect: Certificate error
> But when I try to install/update "gnutls", I'm informed:
> Package gnutls-2.12.23-21.18.amzn1.x86_64 already installed and latest
> version
> so I'm not sure how to proceed, as it shows the most up-to-date package.
> Thanks in advance for any help you can provide.

Here is the answer I gave on GitHub, repeated for readers who only follow
the mailing list:
Looks pretty good to me - you are just a tiny step away. Your
wget2 has TLS support, else you wouldn't see those messages.

TLS False Start is not needed for functionality. It just gives you a
small speed gain during the TLS handshake by saving 1 RTT (roughly the
ping time to the server).

OCSP gives you more security. It checks that the server certificate
hasn't been revoked. Not doing this check doesn't impair functionality.

The errors most likely mean that the CA certificate(s) needed to verify
the server certificate are not installed. Here on Debian you have to
install a package named 'ca-certificates'. Not sure what it's named on
CentOS 7. You need this to stay secure and avoid man-in-the-middle
(MITM) attacks.
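
A minimal sketch for checking whether a CA bundle is present on the
machine. The helper name find_ca_bundle is hypothetical, and the two
paths in the usage comment are assumptions: they are the usual bundle
locations on Debian-style and RHEL-style systems and may differ on your
AMI. --ca-certificate is the option documented for pointing wget or wget2
at a specific bundle file.

```shell
# find_ca_bundle: hypothetical helper; prints the first argument that is a
# readable regular file and returns non-zero if none of them is.
find_ca_bundle() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (paths are assumed defaults, adjust for your system):
#   bundle=$(find_ca_bundle /etc/ssl/certs/ca-certificates.crt \
#                           /etc/pki/tls/certs/ca-bundle.crt) \
#       && wget2 --ca-certificate="$bundle" [url]
```

If the helper finds nothing, installing the distribution's CA certificate
package is the proper fix.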

As a last resort, you can switch off the certificate checks with
--no-check-certificate. Please try to avoid it!

> I'm going to use wget2, but wanted to briefly follow up on my original
> question with wget to hopefully learn a little more.
> 1.) Thanks for the note on not using "--random-wait" in the future,
> but it made no difference when I ran my command with or without this flag.
> Even with the flag, the download completed in 35s, but with 248 files, if
> the wait was only .5s (for a best case scenario), it should have taken
> around 62s.

Thanks, I'll check --random-wait. Maybe it's not functional any more for
some reason. I guess nobody uses it nowadays.
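
As a rough check of the numbers: the wget manual describes --random-wait
as varying the pause between 0.5 and 1.5 times the value given with
--wait. The command in this thread sets no --wait, so the value being
scaled is 0, which may explain why the flag made no difference. The sketch
below only does that arithmetic; random_wait_bounds is a hypothetical
helper, and it assumes one pause per file.

```shell
# random_wait_bounds FILES WAIT
# Prints the minimum and maximum total pause in seconds, assuming one pause
# per file and a per-pause range of 0.5*WAIT to 1.5*WAIT.
random_wait_bounds() {
    awk -v files="$1" -v wait="$2" \
        'BEGIN { printf "%.1f %.1f\n", files * 0.5 * wait, files * 1.5 * wait }'
}

random_wait_bounds 248 1   # with --wait=1: prints "124.0 372.0"
random_wait_bounds 248 0   # without --wait: prints "0.0 0.0"
```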

> 2.) If I ran my wget command with "*--no-clobber*", it would correctly
> download all files the first time, and the second time I ran the same
> command, it would acknowledge it has already downloaded all the files and
> finish almost immediately. I tried to parallelize the downloads by running
> multiple instances of the program (wget --no-clobber [url] & wget
> --no-clobber [url]), but it didn't download multiple files at the same time.
> I expected the first program to start the download of a file, and the
> second program to see it and to skip to the next file that needed to be
> downloaded, and for the programs to move in parallel downloading all the
> files. Do you know why this behavior happened instead of what I expected?

Wget instances do not communicate with each other. Running more than one
wget (or wget2) instance on the same file / directory will likely end in
havoc. Just don't do it.
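
If the URLs are known in advance, one way to keep separate instances from
touching the same files is to give each one its own URL list and its own
output directory. This is only a sketch under that assumption: split_urls
and the file names are hypothetical, and it does not help with recursive
or --page-requisites downloads, where the URL set is discovered while
running. For those, wget2's own --max-threads option is the safer route.

```shell
# split_urls LIST N PREFIX
# Distributes the lines of LIST round-robin into PREFIX.0 .. PREFIX.(N-1),
# so that no URL appears in more than one output file.
split_urls() {
    awk -v n="$2" -v prefix="$3" \
        '{ print > (prefix "." ((NR - 1) % n)) }' "$1"
}

# Usage (file and directory names are placeholders):
#   split_urls urls.txt 2 urls.part
#   wget -i urls.part.0 -P out.0 &
#   wget -i urls.part.1 -P out.1 &
#   wait
```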

Regards, Tim

> On Thu, Jun 20, 2019 at 12:55 AM Tim Rühsen <address@hidden> wrote:
>> On 6/17/19 10:32 PM, David Bodin wrote:
>>> wget --page-requisites --span-hosts --convert-links --adjust-extension
>>> --execute robots=off --user-agent Mozilla --random-wait
>>> https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/
>>> This command above provides the following stats:
>>> Total wall clock time: 35s
>>> Downloaded: 248 files, 39M in 4.2s (9.36 MB/s)
>>> This website takes about 5 seconds to download and display all files on a
>>> hard refresh in the browser.
>>> Why is the wall clock time *significantly longer* than the download time
>>> and is there a way to make it faster?
>> First of all, --random-wait waits 0.5 to 1.5 seconds after each
>> downloaded page. Don't use it - there have been times when web servers
>> blocked fast clients, but that shouldn't be the case today.
>> Wget uses just one connection for downloading, no compression by
>> default, no HTTP/2.
>> You can try Wget2 which uses as many parallel connections as you like,
>> uses compression by default and HTTP/2 if possible. Depending on the
>> HTTP server, Wget2 is often 10x faster than Wget just with its default
>> settings.
>> You can find the latest Wget2 tarball at
>> https://gnuwget.gitlab.io/wget2/wget2-latest.tar.gz.
>> Build instructions are at
>> https://gitlab.com/gnuwget/wget2/blob/master/README.md
>> Regards, Tim
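
A worked version of the figures quoted above: subtracting the reported
download time from the wall clock time and dividing by the file count
gives the average time per file that was not spent transferring data.
Attributing that remainder to per-request overhead on a single connection
(lookups, handshakes, request round trips) is an interpretation, not
something wget reports. per_file_overhead is a hypothetical helper.

```shell
# per_file_overhead WALL_SECONDS DOWNLOAD_SECONDS FILES
# Prints the average non-transfer time per file, in seconds.
per_file_overhead() {
    awk -v wall="$1" -v dl="$2" -v files="$3" \
        'BEGIN { printf "%.3f\n", (wall - dl) / files }'
}

per_file_overhead 35 4.2 248   # prints "0.124"
```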
