wget2 | HTTP Response 0 flooding. (#609)
From: marcel dope (@marceldope)
Subject: wget2 | HTTP Response 0 flooding. (#609)
Date: Fri, 29 Jul 2022 17:16:58 +0000
marcel dope created an issue: https://gitlab.com/gnuwget/wget2/-/issues/609
Trivia:
wget1 has a bug where it truncates filenames that aren't longer than the 255
char limit imposed by the filesystem. Downloading a file with a 240-char name
truncated the name to 236 chars, and downloading the same file in mirror mode
(recursive plus create directories mirroring the whole path) truncated it to
207 chars. The latter is because wget1 erroneously counts the path toward the
filename char limit.
It's of utmost importance to me that the mirror I create matches the remote
copy in size, metadata and filenames.
I tried the wget2 command shared below. It doesn't seem to have this bug and
it's an order of magnitude better in every aspect, so thank you for this.
BTW I did extensive testing of wget2's behavior and found these differences
from the documentation / expected behavior:
- `-R "index.html*"` - this option is ignored, indexes are downloaded anyway
- from the manpage of `--force-progress`: `This option will also force the
progress bar to be printed to stderr when used alongside the --output-file
option.` - this doesn't work, no progress bar
- `--progress=bar` - if specified, nothing is saved to the `-o` output file or
to a file that stdout is redirected to
- `--stats-all` - not implemented
**The problem:**
I use this command to mirror a website containing 10k small files: `wget2 -rNl
inf -np --no-if-modified-since --retry-connrefused --waitretry=3600
--retry-on-http-error=*,\!404 --https-enforce=hard -R "index.html*"
--fsync-policy=on --random-wait --max-threads=1 -t inf --backups=99 -w 1 URL`.
It works perfectly for a while and then I'm flooded with
```
[0] Checking 'URL' ...
HTTP response 0 [URL]
[0] Downloading 'URL' ...
HTTP response 0 [URL]
```
or
```
[0] Downloading 'URL' ...
HTTP response 0 [URL]
```
The `--stats-site` output is
```
Status ms Size URL
0 0 0 FILE
```
or
```
Status ms Size URL
0 1 0 FILE
```
for each try.
The tries seem to happen very fast (it could very well be 1000 tries per
second), judging by the output and how quickly the output file grows.
Aborting the wget2 process and restarting it results in HTTP response 200
again (and eventually HTTP response 0 again). This suggests to me that wget2
may be flooding the server/cloudflare/my openwrt router with hundreds of
requests each second, sabotaging the mirroring process when it could simply
wait a second/minute/hour and get a 200 response. I wish the
`--waitretry=3600` and `-w 1` I specified would apply here. It's likely that
raising the wait time between tries whenever I get response 0 would fix this
and let the mirror succeed. Unfortunately, currently I have to re-parse and
check (size, timestamp) all the 10k files again to continue mirroring. This
greatly increases the load on the server.
Questions:
1. Why do I get HTTP Response 0 in the first place? What does it mean?
2. Is this a wget2 bug, openwrt misfeature or a real cloudflare/server response?
Note that I don't think I ever got an HTTP Response 0 when testing mirroring
the same website with wget1.
Suggestions:
Make `--retry-connrefused --retry-on-http-error --random-wait` all work for
every request made by wget2, not just the downloads, and make the wait time
for all of these rise from the `-w` number up to the `--waitretry` number. I
suggest `--waitretry` shouldn't increase the value by just 1 second each try,
but double it instead, e.g. 1st try = 1s, 2nd try = 2s, 3rd try = 4s.
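The doubling schedule suggested above could look roughly like the following. This is only a sketch of the proposal, not how wget2 behaves or is implemented. The function name `backoff_schedule` and its parameters are hypothetical: `base_wait` stands in for the `-w` value and `max_wait` for the `--waitretry` value.

```python
def backoff_schedule(base_wait: float, max_wait: float, tries: int) -> list[float]:
    """Return the wait (in seconds) before each try.

    Hypothetical helper: the wait starts at base_wait (the -w value),
    doubles after every try, and is capped at max_wait (the --waitretry value).
    """
    waits = []
    wait = base_wait
    for _ in range(tries):
        waits.append(min(wait, max_wait))
        wait *= 2  # double instead of adding 1 second per try
    return waits


if __name__ == "__main__":
    # With -w 1 and --waitretry=3600 the first tries wait 1s, 2s, 4s, ...
    print(backoff_schedule(1, 3600, 3))  # [1, 2, 4]
```

With `-w 1` and `--waitretry=3600` the cap would be reached on the 13th try (4096s capped to 3600s), so a flood of immediate retries turns into a handful of requests spread over a couple of hours.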
wget2 could also postpone retries, optionally or by default, i.e. if you get a
timeout, wait the retry wait time and try another file in the meantime, only
getting back to the previous one once x time passes or other downloads finish.
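To illustrate the postponed-retry idea, here is a small simulation, assuming a single worker and a simulated clock rather than real sleeping. Everything in it is hypothetical (the function name, the `fetch(url, now)` callback, the `step` cost per request); it is not based on wget2's internals.

```python
import heapq
from typing import Callable, Iterable


def fetch_with_postponed_retries(
    urls: Iterable[str],
    fetch: Callable[[str, float], bool],
    retry_wait: float = 60.0,
    max_tries: int = 3,
    step: float = 1.0,
) -> list[tuple[float, str, bool]]:
    """Simulate one worker that postpones failed URLs instead of retrying at once.

    fetch(url, now) is a hypothetical callback returning True on success.
    A failed URL is re-queued for now + retry_wait, so other URLs are tried
    in the meantime. Returns a log of (time, url, succeeded) tuples.
    """
    # Heap entries: (time the entry becomes due, sequence number, url, tries so far)
    queue = [(0.0, i, url, 0) for i, url in enumerate(urls)]
    heapq.heapify(queue)
    seq = len(queue)
    now = 0.0
    log = []
    while queue:
        ready, _, url, tries = heapq.heappop(queue)
        now = max(now, ready)  # idle only when nothing else is due yet
        ok = fetch(url, now)
        log.append((now, url, ok))
        if not ok and tries + 1 < max_tries:
            heapq.heappush(queue, (now + retry_wait, seq, url, tries + 1))
            seq += 1
        now += step  # simulated cost of one request
    return log
```

In this sketch a URL that fails is simply pushed back with a later due time, so the worker moves on to the next file and only idles when every remaining entry is still waiting.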