Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale
From: Tim Ruehsen
Subject: Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale
Date: Thu, 20 Nov 2014 10:43:54 +0100
User-agent: KMail/4.14.2 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )

On Thursday 20 November 2014 00:12:08 Ángel González wrote:
> On 18/11/14 17:12, Tim Ruehsen wrote:
> > I amended three tests to fail when run with a Turkish locale.
> > I fixed these issues (using c_strcasecmp/c_strncasecmp) and also replaced
> > strcasecmp/strncasecmp by c_strcasecmp/c_strncasecmp at places where we
> > definitely want an ASCII comparison instead of a locale-dependent one.
> >
> > There are still some places left where we use strcasecmp/strncasecmp, e.g.
> > domain/host and filename comparisons.
> >
> > Please have a look...
> >
> > Tim
>
> I had pretty much coded the same thing when I realized that your patch
> was still unapplied.
>
> I am attaching it here fwiw. I generally changed them in a few more
> places, although I think some of your edits to init.c are incorrect,
> as well as those in progress.c: as they are user parameters, they
> _might_ be entered in the user locale (they would mysteriously fail
> when run under the C locale in cron, though. I'm not so sure it should
> be supported).

Please be more specific.
Imagine the user input --level=INF (or --level=inf), which is compared with "inf".
Turkish users will be used to entering the correct character in this case, namely
'I' or 'i' and not 'İ' or 'ı'. Otherwise most programs would simply break. In this
case an ASCII comparison (c_str...) is absolutely OK. A locale-aware comparison
would not work, since in a Turkish locale 'I' and 'i' are not a case pair (well,
the user could try it out, since they get an immediate response from Wget).
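
To illustrate what I mean by an ASCII comparison, here is a minimal sketch.
It is not the gnulib c_strcasecmp source; the names ascii_tolower and
ascii_strcasecmp are hypothetical and only chosen for this example. The point
is that only A-Z are folded and the current locale is never consulted.

```c
#include <assert.h>
#include <stddef.h>

/* Lowercase only the ASCII letters A-Z; every other byte is returned
   unchanged. Unlike tolower(), this never looks at the current locale. */
static int ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical stand-in for an ASCII-only, case-insensitive comparison.
   Returns 0 if the strings are equal when A-Z is folded to a-z, otherwise
   a negative or positive value like strcmp(). */
static int ascii_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *p1;
    const unsigned char *p2;
    int c1, c2;

    assert(s1 != NULL && s2 != NULL);   /* both arguments are required */

    p1 = (const unsigned char *) s1;
    p2 = (const unsigned char *) s2;

    do {
        c1 = ascii_tolower(*p1++);
        c2 = ascii_tolower(*p2++);
    } while (c1 != '\0' && c1 == c2);

    return c1 - c2;
}
```

With something like this, "INF", "Inf" and "inf" all match "inf" no matter what
LC_CTYPE is set to, while input that starts with the UTF-8 bytes of 'İ' does not.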
> Notwithstanding keeping parameters in user-locale case, I made the
> accepts list C-case.
> That's the most arguable one, but it doesn't seem sensible to change the
> code to support that.
I think this is not correct. The accepts and regexes are filename related.
Filenames are not limited to ASCII. What we have to do here is a normalization
to UTF-8 (using the user's locale). Filenames/paths found in HTML or CSS also
have to be converted to UTF-8 (using the page's encoding). These UTF-8 strings
have to be compared with an appropriate function. str(n)casecmp would not be
correct here; a byte-by-byte comparison like c_str(n)casecmp is better but not
perfect. libunistring has functions for that.
I would suggest that I push my patch.
We still have two weeks to inspect the changes... if in doubt, let's set up a
test case. Just give an example of what could go wrong and we can simply try
it out.
Tim