bug-wget

Re: [Bug-wget] bad filenames (again)


From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filenames (again)
Date: Wed, 12 Aug 2015 14:38:15 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Tim,

> Just a few questions.
> 
> 1.
> Why don't you use 'opt.locale' to check if the local encoding is UTF-8 ?

I thought that was usable only if ENABLE_IRI was defined.
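If opt.locale is indeed tied to ENABLE_IRI, one build-independent way to ask "is the local encoding UTF-8?" on a POSIX system is nl_langinfo(CODESET). The sketch below assumes a POSIX libc. The helper names codeset_is_utf8 and locale_is_utf8 are hypothetical and are not part of wget.

```c
#include <langinfo.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: true if a codeset name denotes UTF-8.
   Accepts the common spellings "UTF-8" and "UTF8", case-insensitively. */
static bool
codeset_is_utf8 (const char *codeset)
{
  return codeset != NULL
         && (strcasecmp (codeset, "UTF-8") == 0
             || strcasecmp (codeset, "UTF8") == 0);
}

/* Hypothetical helper: true if the current LC_CTYPE locale uses UTF-8.
   Assumes the program has already called setlocale (LC_ALL, ""),
   otherwise the "C" locale's codeset is reported. */
static bool
locale_is_utf8 (void)
{
  return codeset_is_utf8 (nl_langinfo (CODESET));
}
```

Calling setlocale first matters: without it, every program runs in the "C" locale and the check reports false regardless of the user's environment.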

> 2. 
> I don't understand how you distinguish between illegal and legal UTF-8 
> sequences. I guess only legal sequences should be unescaped. 
> Or to make it easy: if the string is valid UTF-8, do not escape.
> If it is not valid UTF-8, escape it.
> You could:
> Add unistr/u8-check to bootstrap.conf (./bootstrap thereafter),
> add #include "unistr.h" and use
> if (u8_check (s, strlen(s)) == 0) to test for validity.
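For readers without gnulib at hand, the check Tim suggests can be sketched as a self-contained function with the same contract as gnulib's u8_check: it returns NULL when the bytes are well-formed UTF-8 and otherwise a pointer to the first byte of the first ill-formed sequence. The name utf8_check is hypothetical; in wget itself the gnulib function would be used as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for gnulib's u8_check: returns NULL if the
   n bytes at s are well-formed UTF-8, else a pointer to the start of
   the first ill-formed sequence. Rejects overlong forms, surrogates
   and code points above U+10FFFF. */
static const uint8_t *
utf8_check (const uint8_t *s, size_t n)
{
  const uint8_t *end = s + n;

  while (s < end)
    {
      uint8_t c = *s;
      size_t len;
      uint8_t lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */

      if (c < 0x80)
        {
          s++;                        /* plain ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        len = 2;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          len = 3;
          if (c == 0xE0)
            lo = 0xA0;                /* no overlong 3-byte forms */
          else if (c == 0xED)
            hi = 0x9F;                /* no UTF-16 surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          len = 4;
          if (c == 0xF0)
            lo = 0x90;                /* no overlong 4-byte forms */
          else if (c == 0xF4)
            hi = 0x8F;                /* nothing above U+10FFFF */
        }
      else
        return s;   /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

      if ((size_t) (end - s) < len)
        return s;   /* sequence truncated by the end of the buffer */
      if (s[1] < lo || s[1] > hi)
        return s;
      for (size_t i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
          return s; /* missing continuation byte */
      s += len;
    }
  return NULL;
}
```

With such a check, the policy Tim describes would be "escape only if utf8_check returns non-NULL".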

Yes, I expected you to say something like this.

My reason: I consider this escaping a very doubtful activity.
In my eyes the correct code is not: always escape except when UTF-8,
but rather: never escape except perhaps when someone asks for it.
So the precise check for UTF-8 is in my eyes just bloat.

Moreover: what to do if the name is not valid UTF-8?
The current escaping produces something that is not valid UTF-8.
So doing the current escaping is certainly a mistake, not better
than using the name as-is. Invent a new type of escaping?

So, for the time being, my previous patch avoided the old mistake,
without introducing new mistakes :-).

Andries


