Re: wget: unable to resolve host address
From: Tim Rühsen
Subject: Re: wget: unable to resolve host address
Date: Fri, 18 Feb 2022 13:35:19 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.1
On 16.02.22 21:04, Seymour J Metz wrote:
> Given that RFCs 3490-3492 came out in 2003 and 5890-5895 came out in 2010, I
> would have expected IDNA support by now. Does anybody know for sure?
This issue has nothing to do with IDN support.
It is about the fact that the input file uses a charset that is not
compatible with UTF-8 or ASCII, namely UTF-16 [1].
UTF-16 uses 2 or 4 bytes per character, so it needs to be converted into
UTF-8 before wget can read it. Also, that file starts with a BOM (byte order
mark) [2], which needs to be processed.
This does the job:
iconv -f utf-16 -t utf-8 /tmp/url-list.txt > url-list-utf8.txt
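For systems without iconv (the original report used wget.exe on Windows), the same conversion can be sketched in Python. This is only an illustration, not something wget provides: the function name convert_utf16_to_utf8 is made up here, and the input is assumed to start with a BOM, as the file in this thread does.

```python
from pathlib import Path


def convert_utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 text file as UTF-8 (hypothetical helper)."""
    raw = Path(src).read_bytes()
    # Python's "utf-16" codec reads the leading BOM to pick the byte
    # order and drops it from the decoded text.
    text = raw.decode("utf-16")
    # Plain "utf-8" writes no BOM, so the URL list starts with the first URL.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling convert_utf16_to_utf8("/tmp/url-list.txt", "url-list-utf8.txt") would mirror the iconv command above; the resulting file can then be passed to wget with -i.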
A quick glance over at Wget2 :-)
Wget2 understands `--input-encoding=utf-16`, BUT it currently doesn't
handle the BOM. This is easy to implement as the code already exists to
deal with HTML files encoded as UTF-16 with or without BOM.
I created https://gitlab.com/gnuwget/wget2/-/issues/586 for this.
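To make "handling the BOM" concrete, here is a rough Python sketch of the idea. It is not Wget2's code (Wget2 is written in C), and the name split_utf16_bom and its default parameter are invented for this illustration.

```python
import codecs
from typing import Tuple


def split_utf16_bom(data: bytes, default: str = "utf-16-le") -> Tuple[str, bytes]:
    """Return (encoding, payload) with a leading UTF-16 BOM removed.

    Hypothetical helper: 'default' is the byte order assumed when no BOM
    is present. A UTF-32-LE BOM also begins with FF FE, so real code
    would have to check for UTF-32 first.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le", data[len(codecs.BOM_UTF16_LE):]
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be", data[len(codecs.BOM_UTF16_BE):]
    return default, data
```

The payload can then be decoded with the returned encoding, for example payload.decode(encoding), without a stray U+FEFF ending up in front of the first URL.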
Regards, Tim
[1] https://en.wikipedia.org/wiki/UTF-16
[2] https://en.wikipedia.org/wiki/Byte_order_mark
________________________________________
From: Bug-wget <bug-wget-bounces+smetz3=gmu.edu@gnu.org> on behalf of
pythonomorpha@gmail.com <pythonomorpha@gmail.com>
Sent: Tuesday, February 8, 2022 1:26 PM
To: bug-wget@gnu.org
Subject: wget: unable to resolve host address
Hello,
I am trying to download from a list of files (jpeg images). The website
utilizes Cyrillic in its URL. I get the following error message: wget:
unable to resolve host address 'xn--h-xubc'
I've checked the links manually and they do work.
I am enclosing a shortened version of the file list.
I've tried different commands to no avail:
wget.exe -i C:\dl_files\url-list.txt --secure-protocol=auto --remote-encoding=Windows-1251 -nc -c -P C:\dl_files\
I've used Windows-1251 as I did not see a list of encoding names in the
manual
https://www.gnu.org/software/wget/manual/wget.html#Wgetrc-Commands
wget.exe -i C:\dl_files\url-list.txt --secure-protocol=auto -nc -c -P C:\dl_files\
Apparently the problem is caused by Cyrillic characters. I have an inkling that
I am not using the correct options for the program.
I would appreciate it if you gave me a hint on how to solve the problem.
Regards,
Max