Re: [Bug-wget] Incorrect handling of Cyrillic characters in http request
From: Tim Rühsen
Subject: Re: [Bug-wget] Incorrect handling of Cyrillic characters in http request - any workaround?
Date: Tue, 31 Mar 2015 22:50:18 +0200
User-agent: KMail/4.14.2 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )
Hi Stephen,
On Tuesday, 31 March 2015, 18:11:58, Stephen Wells wrote:
> Dear all - I am currently trying to use wget to obtain mp3 files from the
> Google Translate TTS system. In principle this can be done using:
>
> wget -U Mozilla -O "${string}.mp3" "http://translate.google.com/translate_tts?tl=TL&q=${string}"
>
> where TL is a two-letter language code (en, fr, de, and so on).
>
> However, I am running into a serious error when I try to send Russian strings
> (tl=ru) in Cyrillic characters. I'm working in a UTF-8 environment (under
> Cygwin) and the file system displays the Cyrillic strings without problems.
> If I provide a command like this:
>
> http://translate.google.com/translate_tts?tl=ru&q=мазать
>
> wget incorrectly processes the Cyrillic characters _before_ sending the
> http request, so what it actually requests is:
>
> http://translate.google.com/translate_tts?tl=ru&q=%D0%BC%D0%B0%D0%B7%D0%B0%D1%82%D1%8C
This seems to be the correct behavior for a web client.
The URL in the GET request is transmitted UTF-8 encoded, and percent-escaping
is applied to every byte > 127 (leaving control characters aside here).
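To illustrate the point about UTF-8 plus percent-escaping, here is a minimal sketch in Python. The helper `percent_encode` is hypothetical and simplified (it escapes everything outside the RFC 3986 "unreserved" set, which is not exactly what wget does for a whole URL); it only shows that the escaped query in Stephen's request is a lossless encoding of "мазать" and that a server decoding the query gets the original Cyrillic text back.

```python
from urllib.parse import parse_qs, quote, urlsplit

# RFC 3986 "unreserved" characters, as byte values.
UNRESERVED = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                       b"abcdefghijklmnopqrstuvwxyz"
                       b"0123456789-._~")

def percent_encode(text: str) -> str:
    """Hypothetical, simplified client-side sketch: encode text as UTF-8,
    then percent-escape every byte outside the unreserved set."""
    return "".join(
        chr(b) if b in UNRESERVED else f"%{b:02X}"
        for b in text.encode("utf-8")
    )

word = "мазать"
encoded = percent_encode(word)
print(encoded)  # %D0%BC%D0%B0%D0%B7%D0%B0%D1%82%D1%8C
assert encoded == quote(word)  # the standard library produces the same result

# A server parsing the query string recovers the original Cyrillic word.
url = "http://translate.google.com/translate_tts?tl=ru&q=" + encoded
print(parse_qs(urlsplit(url).query))  # {'tl': ['ru'], 'q': ['мазать']}
```

Each Cyrillic letter becomes two UTF-8 bytes (for example, м is U+043C, encoded as D0 BC), which is why the request contains twelve escapes for six letters.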
> This of course produces a string of gibberish in the resulting mp3 file!
This is a different issue. If you are talking about the file name, well,
there is --restrict-file-names=nocontrol. Did you give it a try?
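Why nocontrol can matter for file names: according to the wget manual, the default `unix` mode of `--restrict-file-names` escapes `/` and the control ranges 0-31 and 128-159, and some bytes inside UTF-8 sequences fall in the 128-159 range. The sketch below is a hypothetical, simplified model of that escaping (the helper `escape_unix_mode` is made up for illustration and is not wget's code; the hex case it emits is an assumption), applied to a name built from the word in Stephen's example.

```python
def escape_unix_mode(name: str) -> bytes:
    """Hypothetical, simplified model of wget's default 'unix' file-name
    restriction (assumption based on the manual, not wget's actual code):
    percent-escape '/' and bytes 0-31 and 128-159 of the UTF-8 name."""
    out = bytearray()
    for byte in name.encode("utf-8"):
        if byte == 0x2F or byte <= 31 or 128 <= byte <= 159:
            out += f"%{byte:02X}".encode("ascii")
        else:
            out.append(byte)
    return bytes(out)

escaped = escape_unix_mode("мазать.mp3")
print(escaped)  # b'\xd0\xbc\xd0\xb0\xd0\xb7\xd0\xb0\xd1%82\xd1%8C.mp3'

# The 0xD1 lead bytes are now followed by '%' instead of a continuation
# byte, so the name is no longer valid UTF-8 and displays as gibberish.
try:
    escaped.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)
```

Only т (D1 82) and ь (D1 8C) are affected here; with nocontrol, the 128-159 bytes would be left alone and the UTF-8 name would stay intact.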
> Is there any way to make wget actually send the string it is given, instead
> of mangling it on the way out? This is really blocking me.
From what you write, I am unsure if you are talking about the resulting file
name or about HTTP URL encoding in a GET request.
Regards, Tim