Re: [Bug-wget] Miscellaneous thoughts & concerns


From: Tim Rühsen
Subject: Re: [Bug-wget] Miscellaneous thoughts & concerns
Date: Sat, 7 Apr 2018 00:01:25 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

Hi Jeffrey,


Thanks for your feedback!


On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
> fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
> how fast it's downloading a ton of files at once, it seems like it must've
> been only a small gain.
>
>
> I've come across a few annoyances however.
>
> 1. There doesn't seem to be any way to control the size of the download
> queue, which I dislike because I want to download a lot of large files at
> once and I wish it'd just focus on a few at a time, rather than over a
> dozen.
Do you mean the number of parallel downloads? That is controlled with
--max-threads=n.
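
As a rough sketch of how that option might be used: the helper below only builds and prints a command line, it does not run wget2. The thread count of 3 and the URL are placeholders chosen for illustration, and build_cmd is a hypothetical helper name.

```shell
#!/bin/bash
# Build (but do not run) a wget2 command line that caps parallel downloads.
# The thread count and URL are placeholders, not values from this thread.
build_cmd() {
    threads=$1
    url=$2
    printf 'wget2 --max-threads=%s --mirror %s\n' "$threads" "$url"
}

# Print a command limited to three parallel downloads.
build_cmd 3 https://example.com/
```

A lower value should make wget2 work on fewer files at a time.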

> 3. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
> pipe) error to be thrown', seems to be related to how certificate
> verification is handled upon resume, but I was worried at first that the
> WSL problems were rearing their ugly head again.
The WSL issue is likely affecting the TLS layer as well. TLS resume is
considered 'insecure', so we have it disabled by default. TLS False Start
is still enabled by default.


> 3. --no-check-certificate causes significantly more errors about how the
> certificate issuer isn't trusted to be thrown (even though it's not
> supposed to be doing anything related to certificates).
The output may be a bit too verbose; these should be warnings, not errors.

> 4. --force-progress doesn't seem to do anything despite being recognized as
> a valid parameter, using it in conjunction with -nv is no longer beneficial.
You likely want to use --progress=bar. --force-progress enables the
progress bar even when output is redirected (e.g. to a log file).
@Darshit, we should adjust the behavior to be the same as in Wget1.x.
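
To illustrate the difference between the two options, here is a small sketch that prints (does not execute) one command for interactive use and one for a redirected run. The log file name, the target, and the progress_cmd helper are all made up for the example.

```shell
#!/bin/bash
# Print a wget2 command line with a progress bar.
# With a log file argument, add --force-progress so the bar is still
# drawn although the output is redirected. Names here are hypothetical.
progress_cmd() {
    logfile=$1
    if [ -n "$logfile" ]; then
        printf 'wget2 --progress=bar --force-progress --mirror example.com > %s 2>&1\n' "$logfile"
    else
        printf 'wget2 --progress=bar --mirror example.com\n'
    fi
}

progress_cmd            # interactive terminal
progress_cmd wget2.log  # output redirected to a log file
```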

> 5. The documentation is unclear as to how to disable things that are
> enabled by default. Am I to assume that --robots=off is equivalent to -e
> robots=off?

-e robots=off should still work. We also allow --robots=off or --no-robots.
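
For reference, the three spellings side by side. This sketch only prints the command lines; example.com is a placeholder target and robots_variants is a hypothetical helper.

```shell
#!/bin/bash
# The three spellings mentioned above for ignoring robots.txt.
robots_variants() {
    printf '%s\n' '-e robots=off' '--robots=off' '--no-robots'
}

# Print one example command per spelling (nothing is executed).
robots_variants | while IFS= read -r opt; do
    echo "wget2 $opt --mirror example.com"
done
```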

> 6. The documentation doesn't document being able to use 'M' for chunk-size,
> e.g. --chunk-size=2M

The wget2 documentation has to be brushed up - one of the blockers for
the first release.

>
> 7. The documentation's instructions regarding --progress is all wrong.
I'll take a look in the next few days.

>
> 8. The http/https proxy options return as unknown options despite being in
> the documentation.
Yeah, the docs... see above. Also, proxy support is currently limited.


> Lastly I'd like someone to look at the command I've come up with and offer
> me critiques (and perhaps help me address some of the remarks above if
> possible).

No need for --continue.
Think about using TLS Session Resumption.
--domains is not needed in your example.

Did you build with HTTP/2 and compression support?

Regards, Tim
> #!/bin/bash
>
> wget2 \
>       `#WSL compatibility` \
>       --restrict-file-names=windows --no-tcp-fastopen \
>       \
>       `#No certificate checking` \
>       --no-check-certificate \
>       \
>       `#Scrape the whole site` \
>       --continue --mirror --adjust-extension \
>       \
>       `#Local viewing` \
>       --convert-links --backup-converted \
>       \
>       `#Efficient resuming` \
>       --tls-resume --tls-session-file=./tls.session \
>       \
>       `#Chunk-based downloading` \
>       --chunk-size=2M \
>       \
>       `#Swiper no swiping` \
>       --robots=off --random-wait \
>       \
>       `#Target` \
>       --domains=example.com example.com
>
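
A possible revision of the quoted command with the two removals suggested above applied (--continue and --domains dropped). This is a sketch, not a tested invocation: it only prints the command so it can be reviewed before running. The session file path is written as ./tls.session, which is an assumption about what was intended, since bash drops an unquoted backslash. --tls-resume is left in place; whether to keep it is the open question raised above. revised_cmd is a hypothetical helper name.

```shell
#!/bin/bash
# Jeffrey's command with --continue and --domains removed.
# Printed rather than executed so it can be reviewed first.
revised_cmd() {
    # printf reuses the '%s ' format for every argument, joining them
    # with spaces on one line.
    printf '%s ' \
        wget2 \
        --restrict-file-names=windows --no-tcp-fastopen \
        --no-check-certificate \
        --mirror --adjust-extension \
        --convert-links --backup-converted \
        --tls-resume --tls-session-file=./tls.session \
        --chunk-size=2M \
        --robots=off --random-wait \
        example.com
    printf '\n'
}

revised_cmd
```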




