[bug #57413] No data detection, recursive mode won't work if received H
From: zahlabut
Subject: [bug #57413] No data detection, recursive mode won't work if received HTTP responses are always coming in gzip compression
Date: Sun, 15 Dec 2019 05:11:45 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36
URL:
<https://savannah.gnu.org/bugs/?57413>
Summary: No data detection, recursive mode won't work if
received HTTP responses are always coming in gzip compression
Project: GNU Wget
Submitted by: zahlabut
Submitted on: Sun 15 Dec 2019 10:11:43 AM UTC
Category: None
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: Arkady Shtempler
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: 1.14
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
I'm trying to use Wget to recursively download all content from a web page,
but only index.html is downloaded, and it is not even decompressed
(it’s saved as “gzip” data).
In my case, the site I’m working with simply ignores the
“Accept-Encoding” HTTP request header and always responds with “gzip”
content, so there is no way to get anything other than “gzip”.
This is not standard web site behavior, but it is what I have to deal with.
I’ve tried sending “Accept-Encoding”: identity, and also nonexistent
compression values such as
“Accept-Encoding”: zahlabut, and the result was always the same:
“gzip”, regardless of the value sent.
#################################################################
[ashtempl@ashtempl stam]$ wget -r --no-parent --header "Accept-Encoding:identity" \
    https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/ \
    -o wget.log
[ashtempl@ashtempl stam]$ tree
.
├── storage.bhs1.cloud.ovh.net
│   └── v1
│       └── AUTH_dcaab5e32b234d56b626f72581e3644c
│           └── zuul_opendev_logs_e9a
│               └── 692822
│                   └── 8
│                       └── check
│                           └── neutron-tempest-plugin-bgpvpn-bagpipe
│                               └── e9ade00
│                                   └── index.html
└── wget.log
9 directories, 2 files
[ashtempl@ashtempl stam]$ file storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/index.html
storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/index.html: gzip compressed data, last modified: Sun Dec 8 12:31:40 2019, max compression
#################################################################
I guess that Wget fails to parse the received HTML because it is saved locally as a
compressed file, and that is why “recursive mode” fails (it stops
being executed).
I think that Wget should have the ability (data detection) to try to detect
the type of the received content (just like the Linux “file” command does).
Then, once the received data is detected as “gzip”, Wget would
decompress it and read its content in order to
continue in “recursive” mode (extracting all URLs, etc.).
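
The detection step proposed above can be sketched outside of Wget. The
following is only a minimal Python illustration of the idea, not Wget code:
the names maybe_gunzip, LinkCollector and extract_links are hypothetical, and
it assumes that checking the two gzip magic bytes (0x1f 0x8b) is an acceptable
stand-in for what the “file” command does.

```python
import gzip
import zlib
from html.parser import HTMLParser
from typing import List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Return the decompressed bytes if data looks like gzip, else data unchanged."""
    if data[:2] != GZIP_MAGIC:
        return data
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # The magic bytes matched but the stream is not valid gzip: keep the raw bytes.
        return data


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(raw: bytes) -> List[str]:
    """Detect gzip by content (not by headers), decompress, then list the links."""
    html = maybe_gunzip(raw).decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

With a check like this, the saved index.html could be inspected for links
whether or not the server compressed it, which is what the recursive step
needs in order to continue.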
### wget.log file content ###
--2019-12-15 11:51:48--  https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/
Resolving storage.bhs1.cloud.ovh.net (storage.bhs1.cloud.ovh.net)... 142.44.140.9
Connecting to storage.bhs1.cloud.ovh.net (storage.bhs1.cloud.ovh.net)|142.44.140.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 701 [text/html]
Saving to: ‘storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/index.html’
0K 100% 60.7M=0s
2019-12-15 11:51:49 (60.7 MB/s) - ‘storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/index.html’ saved [701/701]
FINISHED --2019-12-15 11:51:49--
Total wall clock time: 0.9s
Downloaded: 1 files, 701 in 0s (60.7 MB/s)
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?57413>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/