Re: [Wget-dev] wget2 | Default Page handling broken by commit 89ff57ee93
From: Tim Rühsen
Subject: Re: [Wget-dev] wget2 | Default Page handling broken by commit 89ff57ee937e6c70d3927c00a9dda75f33947238 (#415)
Date: Thu, 13 Dec 2018 10:14:15 +0000
** PROBLEM **
During --recursive downloads, we don't generate unique file names (e.g. ending
with .1, .2 etc.), at least not when a directory structure is created locally.
As a result, file naming can get out of sync between the server (GET request)
and the client (local file name). The following three GET requests return
three different contents, yet we only have a single file name for them:
1. GET /foo
2. GET /foo/
3. GET /foo/index.html
The first one is saved as a file `foo`. For the second one we have to create a
directory `foo` to save `index.html` in, so we have to rename the existing file
`foo` to something else (to what?). The third GET request would then be saved
into `foo/index.html`, which already exists. Which one should be renamed, and
how? The naming should be unambiguous so that two recursive downloads
always generate the same file structure. In other words: the order of the three
downloads should have no influence on the file naming.
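The collision can be illustrated with a small sketch. It assumes a naive
mapping that appends the default page name `index.html` to any path ending in
a slash; `naive_local_path` is a hypothetical helper for illustration, not a
wget2 function.

```python
def naive_local_path(url_path: str, default_page: str = "index.html") -> str:
    """Map a URL path to a local relative path (illustrative assumption only)."""
    if url_path.endswith("/"):
        # A directory-style URL is stored under the default page name.
        url_path += default_page
    return url_path.lstrip("/")


requests = ["/foo", "/foo/", "/foo/index.html"]
local = {url: naive_local_path(url) for url in requests}

# "/foo/" and "/foo/index.html" share one local file, and "/foo" needs
# `foo` to be a file while the other two need it to be a directory.
for url, path in local.items():
    print(f"GET {url:<16} -> {path}")
```

Whichever request finishes last decides what ends up on disk, which is the
order dependence described above.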
** (one possible) SOLUTION **
(GET /foo/index.html) tries to create the directory `foo`. If `foo` already
exists as a file: move the file away, create dir `foo`, move the file to
`foo/.directory_noslash` and save/overwrite the response content as
`foo/index.html`.
(GET /foo/) tries to create the directory `foo`. If `foo` already exists as
a file: move the file away, create dir `foo`, move the file to
`foo/.directory_noslash` and save/overwrite the response content as
`foo/.directory_slash`.
(GET /foo) tries to save `foo` as a file. If `foo` already exists as a file:
overwrite it. If `foo` already exists as a directory: save/overwrite the
response content as `foo/.directory_noslash`.
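The three rules above could be sketched roughly as follows. This is only an
illustration in Python, not wget2 code: `save_response`, `ensure_dir` and the
temporary `.parked` suffix are hypothetical names, and `root` is assumed to be
an existing download directory.

```python
from pathlib import Path

# Marker names from the proposal above.
NOSLASH = ".directory_noslash"
SLASH = ".directory_slash"


def ensure_dir(directory: Path) -> None:
    """Make sure `directory` exists, converting a blocking file if needed."""
    if directory.is_dir():
        return
    if directory.is_file():
        # Move the file away, create the directory, move the file back in.
        parked = directory.with_name(directory.name + ".parked")  # hypothetical
        directory.rename(parked)
        directory.mkdir()
        parked.rename(directory / NOSLASH)
        return
    ensure_dir(directory.parent)
    directory.mkdir()


def save_response(root: Path, url_path: str, body: bytes) -> Path:
    """Store the body of `GET url_path` below the existing directory `root`."""
    rel = url_path.lstrip("/")
    if url_path.endswith("/"):
        # GET /foo/ -> foo/.directory_slash
        target = root / rel / SLASH
    else:
        target = root / rel
        if target.is_dir():
            # GET /foo while directory foo exists -> foo/.directory_noslash
            target = target / NOSLASH
    ensure_dir(target.parent)
    target.write_bytes(body)  # overwrites an existing file
    return target
```

With this sketch, saving `/foo`, `/foo/` and `/foo/index.html` in any order
leaves the same three files in `foo/`; saving `/foo` alone leaves a plain
file `foo`.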
This is not 100% compatible with Wget 1.x, but it allows precise local linking with `-k`.
And I remember some long-standing issues that would be solved with such an
approach as well.
The names `.directory_slash` and `.directory_noslash` can be made configurable,
just in case there are websites using these names.
--
Reply to this email directly or view it on GitLab:
https://gitlab.com/gnuwget/wget2/issues/415#note_124641970