bug-wget

[bug #57413] No data detection, recursive mode won't work if received H


From: zahlabut
Subject: [bug #57413] No data detection, recursive mode won't work if received HTTP responses are always coming in gzip compression
Date: Sun, 15 Dec 2019 05:11:45 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36

URL:
  <https://savannah.gnu.org/bugs/?57413>

                 Summary: No data detection, recursive mode won't work if
received HTTP responses are always coming in gzip compression
                 Project: GNU Wget
            Submitted by: zahlabut
            Submitted on: Sun 15 Dec 2019 10:11:43 AM UTC
                Category: None
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: Arkady Shtempler
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.14
        Operating System: GNU/Linux
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None

    _______________________________________________________

Details:

I'm trying to use Wget to recursively download all content from a web page,
but only index.html is downloaded, and it is not even decompressed
(it is saved as gzip data).
In my case, the site I’m working with simply ignores the
“Accept-Encoding” HTTP request header and always responds with “gzip”
content, so there is no way to get anything other than “gzip”.
This is not standard web site behavior, but it is what I have to deal with.
I’ve tried sending “Accept-Encoding: identity”, as well as nonexistent
compression values such as “Accept-Encoding: zahlabut”, and the result was
always the same: “gzip”, regardless of the value sent.

#################################################################
[ashtempl@ashtempl stam]$ wget -r --no-parent --header "Accept-Encoding:identity" https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/ -o wget.log
[ashtempl@ashtempl stam]$ 
[ashtempl@ashtempl stam]$ 
[ashtempl@ashtempl stam]$ 
[ashtempl@ashtempl stam]$ 
[ashtempl@ashtempl stam]$ tree
.
├── storage.bhs1.cloud.ovh.net
│   └── v1
│       └── AUTH_dcaab5e32b234d56b626f72581e3644c
│           └── zuul_opendev_logs_e9a
│               └── 692822
│                   └── 8
│                       └── check
│                           └── neutron-tempest-plugin-bgpvpn-bagpipe
│                               └── e9ade00
│                                   └── index.html
└── wget.log

9 directories, 2 files
[ashtempl@ashtempl stam]$ file storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/index.html
storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/index.html: gzip compressed data, last modified: Sun Dec  8 12:31:40 2019, max compression
#################################################################
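
For reference, the fields that “file” reports above (gzip data, modification
time, “max compression”) all sit in the fixed 10-byte gzip header defined by
RFC 1952. The following is only a minimal Python sketch of reading that
header; it is not how Wget or “file” are implemented, and the names
parse_gzip_header and GzipHeader are made up for this illustration.

```python
import struct
from typing import NamedTuple, Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


class GzipHeader(NamedTuple):
    """Hypothetical container for the fixed gzip header fields."""
    method: int            # compression method, 8 means DEFLATE
    mtime: int             # modification time as a Unix timestamp, 0 if unset
    max_compression: bool  # True when the XFL byte is 2


def parse_gzip_header(data: bytes) -> Optional[GzipHeader]:
    """Return the fixed header fields, or None if data is not gzip."""
    if len(data) < 10 or data[:2] != GZIP_MAGIC:
        return None
    # Bytes 2..9: CM, FLG, MTIME (little-endian uint32), XFL, OS.
    method, _flags, mtime, xfl, _os = struct.unpack("<BBIBB", data[2:10])
    return GzipHeader(method, mtime, xfl == 2)
```

Passing the first bytes of the saved index.html to parse_gzip_header would
return a header tuple rather than None if the file really is gzip data.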

I guess that Wget fails to read the received HTML because it is saved locally
as a compressed file, and that is why recursive mode fails (it stops
executing).
I think Wget should have the ability (data detection) to detect the type of
the received content (just like the Linux “file” command does).
Once the received data is detected as “gzip”, Wget would decompress it and
read its content in order to continue in recursive mode (extracting all
URLs, etc.).
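
To make the proposal concrete, here is a rough Python sketch of the idea:
check the body for the gzip magic bytes, decompress it if they are present,
and only then extract links for recursion. This is an illustration under my
own assumptions, not Wget code (Wget is written in C); decode_body,
extract_links and LinkCollector are hypothetical names.

```python
import gzip
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def decode_body(body: bytes) -> bytes:
    """Decompress the body if it starts with the gzip magic bytes."""
    if body[:2] == b"\x1f\x8b":
        return gzip.decompress(body)
    return body


def extract_links(body: bytes, base_url: str) -> List[str]:
    """Return absolute URLs found in a possibly gzip-compressed HTML body."""
    html = decode_body(body).decode("utf-8", errors="replace")
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

With this kind of check, the same list of URLs comes back whether the server
sent the page as plain HTML or as gzip data, so recursion could continue in
both cases.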


### wget.log file content ###
--2019-12-15 11:51:48--  https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/
Resolving storage.bhs1.cloud.ovh.net (storage.bhs1.cloud.ovh.net)... 142.44.140.9
Connecting to storage.bhs1.cloud.ovh.net (storage.bhs1.cloud.ovh.net)|142.44.140.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 701 [text/html]
Saving to: ‘storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/index.html’

     0K                                                       100% 60.7M=0s

2019-12-15 11:51:49 (60.7 MB/s) - ‘storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e9a/692822/8/check/neutron-tempest-plugin-bgpvpn-bagpipe/e9ade00/index.html’ saved [701/701]

FINISHED --2019-12-15 11:51:49--
Total wall clock time: 0.9s
Downloaded: 1 files, 701 in 0s (60.7 MB/s)




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57413>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



