On 25-12-2020 18:42, Tim Rühsen wrote:
Hello Franz,
I tried with wget 1.20.3, and both of these commands work:
#1 Do not download the smc/artworks/ directory:
wget -d -4 --mirror -nH -np --retr-symlinks=no --passive-ftp \
    --no-verbose --cut-dirs=1 ftp://mirror.netcologne.de/savannah/smc/ \
    --reject-regex=".*(/artworks/.*)"
#2 Do not download .bz2 and .rpm files:
wget -d -4 --mirror -nH -np --retr-symlinks=no --passive-ftp \
    --no-verbose --cut-dirs=1 ftp://mirror.netcologne.de/savannah/smc/ \
    --reject-regex=".*(\.bz2|\.rpm)$"
(--regex-type=posix is the default)
(the order of URL and options doesn't matter)
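
The two patterns above can be checked offline before starting a mirror run. The following is only a rough sketch in Python: it assumes the pattern is searched for anywhere in the URL, it uses Python's re module rather than the POSIX engine wget uses, and the file names are made up for illustration.

```python
import re

# Patterns copied from the two wget commands above.
REJECT_ARTWORKS = r".*(/artworks/.*)"
REJECT_ARCHIVES = r".*(\.bz2|\.rpm)$"


def is_rejected(pattern: str, url: str) -> bool:
    """Return True if the pattern matches somewhere in the URL.

    Assumption: unanchored search semantics. Python's re is used here
    only as a stand-in for the POSIX regex engine.
    """
    return re.search(pattern, url) is not None


BASE = "ftp://mirror.netcologne.de/savannah/smc/"

# Hypothetical file names, not taken from the real mirror.
SAMPLES = [
    BASE + "artworks/logo.png",
    BASE + "docs/readme.txt",
    BASE + "src/smc-1.0.tar.bz2",
    BASE + "pkg/smc-1.0.rpm",
]

if __name__ == "__main__":
    for url in SAMPLES:
        print(
            url,
            "artworks-rule:", is_rejected(REJECT_ARTWORKS, url),
            "archive-rule:", is_rejected(REJECT_ARCHIVES, url),
        )
```

A check like this only shows what the pattern itself matches; it says nothing about whether a given wget build applies the pattern to every URL it visits.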
Regards, Tim
On 23.12.20 13:48, Frans de Boer wrote:
LS,
I found that wget 1.20 and later support some basic regular
expressions. I had good results with --accept-regex, but the reject
part is more troublesome. I can't use EREs, since only BREs are
supported, with the requirement that the whole URL be included.
I use wget to mirror some sites, but I do not want certain
subdirectories included in the download. Think of subdirectories
named rpm, debug, temp, etc.
Example:
wget -4 --mirror -nH -np --retr-symlinks=no --passive-ftp \
    --no-verbose --cut-dirs=1 --regex-type posix --reject-regex \
    "ftp\:\/\/mirror\.netcologne\.de\/savannah\/smc\/Screensaver\/" \
    -P ./debugdir/nongnu ftp://mirror.netcologne.de/savannah/smc/
I tried this example with and without some of the backslashes, but
none of the variants works. I also tried it with a single file, to no
avail. I understand that one can add multiple reject statements, but I
would rather use the ERE .*(dir1|dir2|dir3|...|dirx|(..ERE..)), since
specifying them by hand is rather cumbersome. I already have an ERE
string ready and would like to use that instead. Breaking this string
down again into multiple reject statements might also not work if I
can't even reject one file or subdirectory.
Is there a way to accomplish the above without having to resort to
loops and sed as the filtering tool?
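
A single alternation of the form .*/(dir1|dir2|...)/.* can be generated from a list of directory names instead of being typed by hand. This is a minimal sketch, not a wget feature: the helper names ere_escape and build_reject_regex are hypothetical, and the final match check uses Python's re module as a stand-in for the POSIX engine.

```python
import re

# Characters with a special meaning in a POSIX extended regular expression.
ERE_SPECIAL = ".[]()*+?{}|^$\\"


def ere_escape(text: str) -> str:
    """Backslash-escape ERE metacharacters in a literal directory name."""
    return "".join("\\" + ch if ch in ERE_SPECIAL else ch for ch in text)


def build_reject_regex(dirs) -> str:
    """Join directory names into one pattern: .*/(dir1|dir2|...)/.*"""
    alternation = "|".join(ere_escape(d) for d in dirs)
    return ".*/(" + alternation + ")/.*"


if __name__ == "__main__":
    # Directory names taken from the message above.
    pattern = build_reject_regex(["rpm", "debug", "temp"])
    print(pattern)

    # Hypothetical path below the mirror root, for a quick check only.
    url = "ftp://mirror.netcologne.de/savannah/smc/debug/file.txt"
    print(re.search(pattern, url) is not None)
```

The generated string could then be passed as the value of one --reject-regex option, assuming the option behaves as described in this thread.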
Regards, Frans
Hello Tim,
Alas, using wget version 1.20.3 under openSUSE 15.2, the line
excluding the artworks directory is not working. The whole artworks
subdirectory is downloaded. To be sure, I also copied your line exactly
to see if that makes a difference. By the way, I also tried this under
openSUSE Tumbleweed. The -d option does not report anything about the
regex used.
The strange thing is that when I use a similar approach for python, I am
able to use the following argument to the reject statement:
".*/(amd64|binaries|Debug|debug|deleted|OLD|old|Patches|patches|prev|previous|rpm|RPM|rpms|RPMS|temp|tmp|w32|win32|.*(rc|RC|a|b|p)[[:digit:]]{1}.*)/.*"
This is my universal string for all other projects too.
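
To see which paths this universal string would reject, it can be tried against a few sample URLs. The sketch below makes two assumptions: the POSIX class [[:digit:]] is rewritten as [0-9] because Python's re module does not support POSIX classes, and the example.org URLs are invented for illustration.

```python
import re

# The universal string from above, with [[:digit:]] rewritten as [0-9]
# (an adaptation for Python's re module, not the original POSIX syntax).
UNIVERSAL = (
    r".*/(amd64|binaries|Debug|debug|deleted|OLD|old|Patches|patches"
    r"|prev|previous|rpm|RPM|rpms|RPMS|temp|tmp|w32|win32"
    r"|.*(rc|RC|a|b|p)[0-9]{1}.*)/.*"
)


def rejects(url: str) -> bool:
    """Return True if the universal pattern matches somewhere in the URL."""
    return re.search(UNIVERSAL, url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, not taken from a real mirror.
    for url in (
        "https://example.org/pub/project/rpm/pkg.rpm",
        "https://example.org/pub/project/3.9.0rc1/file.tgz",
        "https://example.org/pub/project/3.9.0/file.tgz",
    ):
        print(url, rejects(url))
```

Because the last alternative contains .* on both sides, it can also span several path components, so it is worth testing it against real directory listings before relying on it.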
I should add that I also use an --accept-regex for python, but not
for nongnu.
So I wonder why it seems to work on your side and not on mine.
--- Frans