bug-coreutils
[PATCH] "date" malfunctions in the Turkish locale


From: Vefa Bicakci
Subject: [PATCH] "date" malfunctions in the Turkish locale
Date: Thu, 31 Jul 2008 20:44:34 +0300
User-agent: Mozilla-Thunderbird 2.0.0.16 (X11/20080724)

Hello,

As you can guess from the subject line, the date program
distributed with coreutils malfunctions in the Turkish locale.
To be specific, it only malfunctions when it tries to process
English day or month names containing the letter "i".

Turkish has four distinct "i" letters, and their capitalization
rules differ from those of the English "i". In Turkish, the
uppercase letter corresponding to "i" is not "I" but "İ"
(idotabove), and the lowercase letter corresponding to "I" is
not "i" but "ı" (idotless). When date encounters the word "Fri"
(short for "Friday"), it converts it to uppercase using the
Turkish capitalization rules, fails to recognize the result,
and concludes that it is an invalid day name. You can read more
about the Turkish "i"s and their capitalization rules in:

http://www.i18nguy.com/unicode/turkish-i18n.html

(The relevant part of the page has the title:
"Why Applications Fail With The Turkish Language".)

Here is a short command-line session that clearly
demonstrates the problem:

===
$ LANG=tr_TR.UTF-8 date -d "Fri"   ### Malfunction in action!
date: invalid date `Fri'

$ LANG=tr_TR.UTF-8 date -d "FRI"   ### "I" works - already uppercase
Cum Haz 27 00:00:00 EEST 2008

$ LANG=en_US.UTF-8 date -d "Fri"   ### English locale is okay
Fri Jun 27 00:00:00 EEST 2008

$ LANG=en_US.UTF-8 date -d "FRI"
Fri Jun 27 00:00:00 EEST 2008
===

The reason for this malfunction can be seen by looking at the
following lines of code in the "lookup_word" function in
"lib/getdate.c":

===
  /* Make it uppercase.  */
  for (p = word; *p; p++)
    {
      unsigned char ch = *p;
      *p = toupper (ch);
    }
===

As you can see, although "toupper()" is used here to process
English day and month names, it is called in the user's locale
rather than in an English one. Because the relationship between
"i" and "I" is different in Turkish, converting "Fri" to
uppercase according to the Turkish capitalization rules yields
something other than "FRI". (***) Later in the same function,
because the resulting string does not match "FRI", date
concludes that "Fri" is not a valid day name in the Turkish
locale.

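To make the failing step concrete, here is a minimal sketch of
the "uppercase, then strcmp" pattern described above. The helper
name "upcase_and_match" and the 32-byte buffer are assumptions
made for illustration; they are not taken from getdate.c.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not from getdate.c: upper-case a copy of
   INPUT with the locale-sensitive toupper() and compare the
   result against NAME.  Returns 1 on a match, 0 otherwise
   (including when INPUT does not fit in the buffer).  */
static int
upcase_and_match (const char *input, const char *name)
{
  char buf[32];
  size_t len = strlen (input);
  size_t i;

  if (len >= sizeof buf)
    return 0;

  /* toupper() consults the current LC_CTYPE locale, so the
     result of this loop depends on the locale in effect.  */
  for (i = 0; i < len; i++)
    buf[i] = (char) toupper ((unsigned char) input[i]);
  buf[len] = '\0';

  return strcmp (buf, name) == 0;
}
```

In the default "C" locale, upcase_and_match ("Fri", "FRI")
returns 1. According to the behavior reported above, the same
call should be expected to return 0 once the program has
switched to a Turkish locale with setlocale (LC_ALL, "").
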
The following patch fixes this problem by making sure that the
"toupper()" function is called in the "C" locale in the relevant
part of the coreutils code for date.

Regards,

M. Vefa Bıçakcı

===

(***): To be specific, in tr_TR.UTF-8, it ends up with "FRi".
This is because the "toupper()" and "tolower()" functions work
on single bytes and cannot return the multi-byte UTF-8 sequences
needed to represent letters such as "idotabove" and "idotless",
so the "i" is left unchanged. Please note that this problem
exists in the non-Unicode Turkish locale too: in
tr_TR.ISO-8859-9, "toupper()" and "tolower()" return the
corresponding 8-bit characters of the ISO-8859-9 encoding.
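
As a rough illustration of the encoding point, the sketch below
spells out the byte sequences involved. The constant and
function names are hypothetical; the byte values are the
standard UTF-8 and ISO-8859-9 encodings of U+0130 (idotabove)
and U+0131 (idotless).

```c
#include <string.h>

/* UTF-8 needs two bytes for each of these letters.  */
static const char I_DOT_ABOVE_UTF8[] = "\xC4\xB0";   /* U+0130 */
static const char I_DOTLESS_UTF8[] = "\xC4\xB1";     /* U+0131 */

/* ISO-8859-9 (Latin-5) encodes each of them in one byte.  */
static const char I_DOT_ABOVE_LATIN5[] = "\xDD";
static const char I_DOTLESS_LATIN5[] = "\xFD";

/* Hypothetical helper: nonzero when ENCODED occupies exactly one
   byte, i.e. when it could be produced by a single toupper() or
   tolower() call, which maps one unsigned char to another.  */
static int
fits_in_one_byte (const char *encoded)
{
  return strlen (encoded) == 1;
}
```

The UTF-8 forms do not fit in one byte, which is consistent with
"i" staying unchanged in tr_TR.UTF-8, while the ISO-8859-9 forms
do fit, so there "toupper()" can return the 8-bit idotabove.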

===

diff -ur coreutils-6.12.orig/lib/getdate.c coreutils-6.12/lib/getdate.c
--- coreutils-6.12.orig/lib/getdate.c   2008-06-26 21:06:06.000000000 +0300
+++ coreutils-6.12/lib/getdate.c        2008-06-26 21:06:55.000000000 +0300
@@ -181,6 +181,8 @@
 #include <stdlib.h>
 #include <string.h>

+#include <locale.h>
+
 #include "xalloc.h"


@@ -2686,12 +2688,14 @@
   bool period_found;
   bool abbrev;

+  setlocale(LC_ALL, "C");
   /* Make it uppercase.  */
   for (p = word; *p; p++)
     {
       unsigned char ch = *p;
       *p = toupper (ch);
     }
+  setlocale(LC_ALL, "");

   for (tp = meridian_table; tp->name; tp++)
     if (strcmp (word, tp->name) == 0)
diff -ur coreutils-6.12.orig/lib/getdate.y coreutils-6.12/lib/getdate.y
--- coreutils-6.12.orig/lib/getdate.y   2008-06-26 21:06:06.000000000 +0300
+++ coreutils-6.12/lib/getdate.y        2008-06-26 21:06:55.000000000 +0300
@@ -66,6 +66,8 @@
 #include <stdlib.h>
 #include <string.h>

+#include <locale.h>
+
 #include "xalloc.h"


@@ -915,12 +917,14 @@
   bool period_found;
   bool abbrev;

+  setlocale(LC_ALL, "C");
   /* Make it uppercase.  */
   for (p = word; *p; p++)
     {
       unsigned char ch = *p;
       *p = toupper (ch);
     }
+  setlocale(LC_ALL, "");

   for (tp = meridian_table; tp->name; tp++)
     if (strcmp (word, tp->name) == 0)
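
One caveat about the patch above: setlocale (LC_ALL, "") selects
the locale named by the environment, which is not necessarily
the locale that was active before the switch to "C", and
setlocale changes state for the whole process. As a possible
alternative, here is a minimal sketch that avoids touching the
locale at all by upper-casing only the ASCII letters. The names
"ascii_toupper" and "upcase_ascii" are hypothetical and not part
of coreutils, and the code assumes an ASCII-compatible execution
character set in which 'a'..'z' are contiguous.

```c
/* Hypothetical locale-independent replacement for toupper():
   maps ASCII 'a'..'z' to 'A'..'Z' and leaves every other byte
   untouched, whatever the current locale is.  */
static unsigned char
ascii_toupper (unsigned char ch)
{
  return (ch >= 'a' && ch <= 'z') ? (unsigned char) (ch - 'a' + 'A') : ch;
}

/* Upper-case WORD in place using ASCII rules only.  */
static void
upcase_ascii (char *word)
{
  char *p;

  for (p = word; *p; p++)
    *p = (char) ascii_toupper ((unsigned char) *p);
}
```

With this approach "Fri" becomes "FRI" in any locale, and bytes
outside the ASCII lowercase range (for example, the UTF-8 bytes
of "ı") pass through unchanged, so they simply fail to match the
English tables instead of being rewritten.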



