bug-gnu-utils

Re: [gawk] printf does not recognize .PREC if locale is en_US.UTF-8


From: Aharon Robbins
Subject: Re: [gawk] printf does not recognize .PREC if locale is en_US.UTF-8
Date: Fri, 01 Jan 2010 11:46:32 +0200

Greetings. Re: this:

> Date: Thu, 31 Dec 2009 13:32:39 +0200
> From: tczy <address@hidden>
> To: address@hidden
> Subject: [gawk] printf does not recognize .PREC if locale is en_US.UTF-8
>
> *** ISSUE AND HOW TO REPRODUCE (IT): ***
>
> echo nothing | awk '{printf "%.3s", "foobar"}'
>
> produces 'foobar' if LC_ALL is en_US.UTF-8. Other variations of the same
> program (with awk 'BEGIN{printf ...', etc.) produce the same. If LC_ALL
> is set to C, everything is fine.
>
> awk '{a=sprintf("%.3s", "foobar"); print a}'
>
> also has this issue.
>
> IRC reports 3.1.5 working well with UTF locale.
>
> *** SYSTEM INFO ***
>
> % gawk --version
> GNU Awk 3.1.7
>
> % uname -a
> Linux sidep.ath.cx 2.6.31-ARCH #1 SMP PREEMPT Tue Nov 10 19:01:40 CET 2009 x86_64 Intel(R) Core(TM)2 Duo CPU T5670 @ 1.80GHz GenuineIntel GNU/Linux
>
> Also GLibC 2.11.1.

It is indeed a bug.  Dealing with multibyte characters in general has been
a continuing source of pain.  Attached is a patch. It will wend its way
into the Savannah CVS shortly.

Happy New Year!

Arnold
---------------------------------------------------------------------------------
Fri Jan  1 11:41:50 2010  Arnold D. Robbins  <address@hidden>

        * builtin.c (format_tree): At pr_tail, remember to take the precision
        into account when determining how many characters to copy out.
        Thanks to tczy <address@hidden> for the bug report.

Index: builtin.c
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/builtin.c,v
retrieving revision 1.38
diff -u -r1.38 builtin.c
--- builtin.c   21 Nov 2009 21:16:50 -0000      1.38
+++ builtin.c   1 Jan 2010 09:40:49 -0000
@@ -1223,9 +1223,18 @@
                        if (fw == 0 && ! have_prec)
                                ;
                        else if (gawk_mb_cur_max > 1 && (cs1 == 's' || cs1 == 'c')) {
+                               int nchars_needed = 0;
+
                                assert(cp == arg->stptr || cp == cpbuf);
-                               copy_count = mbc_byte_count(arg->stptr,
-                                               cs1 == 's' ? arg->stlen : 1);
+
+                               if (cs1 == 'c')
+                                       nchars_needed = 1;
+                               else if (have_prec)
+                                       nchars_needed = prec;
+                               else
+                                       nchars_needed = arg->stlen;
+
+                               copy_count = mbc_byte_count(arg->stptr, nchars_needed);
                        }
                        bchunk(cp, copy_count);
                        while (fw > prec) {
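
For readers who want to see the idea outside of gawk, here is a minimal
standalone sketch of what the patch changes: first decide how many
characters are wanted (1 for %c, the precision for %.Ns, otherwise the
whole string), then convert that character count into a byte count.
The names prefix_byte_count and copy_count_for are hypothetical and
exist only for this illustration; this is not gawk's mbc_byte_count,
and the choice to treat an invalid byte as one character is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (NOT gawk's mbc_byte_count): return how many bytes
 * the first max_chars characters of s occupy in the current locale's
 * encoding, never counting more than len bytes.
 */
size_t
prefix_byte_count(const char *s, size_t len, size_t max_chars)
{
	mbstate_t state;
	size_t bytes = 0;

	assert(s != NULL);
	memset(&state, 0, sizeof(state));

	while (max_chars > 0 && bytes < len) {
		size_t n = mbrlen(s + bytes, len - bytes, &state);

		if (n == (size_t) -1 || n == (size_t) -2 || n == 0) {
			/*
			 * Invalid sequence, truncated sequence, or NUL byte.
			 * Assumption of this sketch: count it as a single
			 * one-byte character and reset the shift state.
			 */
			memset(&state, 0, sizeof(state));
			n = 1;
		}
		bytes += n;
		max_chars--;
	}
	return bytes;
}

/*
 * Mirror the decision made in the patch: choose the number of characters
 * needed, then turn it into a number of bytes to copy.
 */
size_t
copy_count_for(char conv, int have_prec, size_t prec,
	       const char *s, size_t len)
{
	size_t nchars_needed;

	if (conv == 'c')
		nchars_needed = 1;
	else if (have_prec)
		nchars_needed = prec;
	else
		nchars_needed = len;	/* byte length is an upper bound on characters */

	return prefix_byte_count(s, len, nchars_needed);
}
```

With these definitions, copy_count_for('s', 1, 3, "foobar", 6) yields 3,
which matches the output "foo" that the bug report expected from "%.3s".
In a UTF-8 locale (after a call to setlocale(LC_ALL, "")), a precision of
2 applied to a string that starts with one single-byte and one two-byte
character would be expected to give a byte count of 3.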



