bug-coreutils

Re: Alignment bug in ls with UTF-8 filenames under Mac OS X


From: Bruno Haible
Subject: Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Date: Thu, 18 Jan 2007 15:37:23 +0100
User-agent: KMail/1.9.1

Vincent Lefevre wrote:
> Hmm... I forgot that ls was an alias (the same one on all my accounts).
> So, back on Mac OS X:
> 
> prunille:~/blah> \ls -C --color=always | hexdump -C
> 00000000  1b 5b 30 30 6d 1b 5b 30  6d 45 cc 81 1b 5b 30 30  |.[00m.[0mE�..[00|
> 00000010  6d 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |m               |
> 00000020  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
> 00000030  1b 5b 30 6d 79 31 32 33  34 35 36 37 38 39 30 31  |.[0my12345678901|
> 00000040  32 33 34 35 36 37 38 39  30 31 32 33 34 35 36 37  |2345678901234567|
> 00000050  38 39 30 1b 5b 30 30 6d  0a 1b 5b 30 6d 78 31 32  |890.[00m..[0mx12|
> 00000060  33 34 35 36 37 38 39 30  31 32 33 34 35 36 37 38  |3456789012345678|
> 00000070  39 30 31 32 33 34 35 36  37 38 39 30 1b 5b 30 30  |901234567890.[00|
> 00000080  6d 20 20 1b 5b 30 6d 7a  31 32 33 34 35 36 37 38  |m  .[0mz12345678|
> 00000090  39 30 31 32 33 34 35 36  37 38 39 30 31 32 33 34  |9012345678901234|
> 000000a0  35 36 37 38 39 30 1b 5b  30 30 6d 0a 1b 5b 6d     |567890.[00m..[m|
> 000000af

Apart from the escape sequences, that is an E, a combining accent (the bytes
cc 81, i.e. U+0301 in UTF-8) and 31 spaces. So it's the same bug as in "ls -C -T0".

> > I see that the first call to wcwidth() gives: wcwidth(0x0301) = 1.
> > U+0301 is COMBINING ACUTE ACCENT. So here is the problem: MacOS'
> > wcwidth is buggy for combining characters like accents.
> 
> OK. Can't autoconf detect that and use another implementation?

Yes. We can do that in gnulib. I'll work on this issue in the next few weeks.
Please remind us (on the bug-gnulib mailing list) in 1 or 2 months.
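
For illustration, here is a rough sketch of the kind of run-time check a
configure test could perform. It is not the actual gnulib test; the function
names are made up for this example, and it assumes that the caller has
selected a UTF-8 locale with setlocale() and that wchar_t values are Unicode
code points on the platform.

```c
#define _XOPEN_SOURCE 700   /* for the wcwidth() declaration */
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: the width that wcwidth() reports for
   U+0301 COMBINING ACUTE ACCENT in the current locale.  */
int
combining_accent_width (void)
{
  return wcwidth ((wchar_t) 0x0301);
}

/* Hypothetical helper: nonzero if wcwidth() claims that the combining
   accent occupies one or more columns.  A combining character should have
   width 0; a result of -1 only means that the character is not printable
   in the current locale, so it is not counted as this bug.  */
int
wcwidth_counts_combining_accent (void)
{
  return combining_accent_width () > 0;
}
```

A test program would call setlocale (LC_ALL, "") first and exit with a
nonzero status when wcwidth_counts_combining_accent () returns nonzero. The
result depends on the platform and on the locale, so no particular output is
promised here.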

And, as we have seen, the other issue is that Apple Terminal has problems
estimating the width of tabs when there are non-ASCII characters. Since
you can start a telnet/ssh session from MacOS X to any platform (Linux,
Solaris, etc.), the fix needs to be platform independent. Here is such a fix:


2007-01-18  Bruno Haible  <address@hidden>

        Avoid problems with tabs after non-ASCII characters in some terminals.
        * src/ls.c (nonascii_in_this_line): New variable.
        (quote_name): Update nonascii_in_this_line.
        (print_many_per_line, print_horizontal): Set nonascii_in_this_line to
        false at the beginning of each line.
        (indent): Use spaces for indentation when nonascii_in_this_line.

diff -c -3 -r1.447 ls.c
*** src/ls.c    2 Jan 2007 06:29:12 -0000       1.447
--- src/ls.c    18 Jan 2007 14:38:14 -0000
***************
*** 851,856 ****
--- 851,859 ----
     for the separating white space.  */
  #define MIN_COLUMN_WIDTH      3
  
+ /* True if some non-ASCII character has been output on this line.  */
+ static bool nonascii_in_this_line;
+ 
  
  /* This zero-based index is used solely with the --dired option.
     When that option is in effect, this counter is incremented for each
***************
*** 3704,3710 ****
      }
  
    if (out != NULL)
!     fwrite (buf, 1, len, out);
    if (width != NULL)
      *width = displayed_width;
    return len;
--- 3702,3722 ----
      }
  
    if (out != NULL)
!     {
!       /* Update nonascii_in_this_line indicator.  */
!       char const *p = buf;
!       char const *plimit = buf + len;
! 
!       for (; p < plimit; p++)
!       if (!isascii (to_uchar (*p)))
!         {
!           nonascii_in_this_line = true;
!           break;
!         }
! 
!       /* Actually output the quoted representation.  */
!       fwrite (buf, 1, len, out);
!     }
    if (width != NULL)
      *width = displayed_width;
    return len;
***************
*** 3957,3962 ****
--- 3969,3975 ----
        size_t pos = 0;
  
        /* Print the next row.  */
+       nonascii_in_this_line = false;
        while (1)
        {
          size_t name_length = length_of_file_name_and_frills (files + filesno);
***************
*** 3984,3989 ****
--- 3997,4004 ----
    size_t name_length = length_of_file_name_and_frills (files);
    size_t max_name_length = line_fmt->col_arr[0];
  
+   nonascii_in_this_line = false;
+ 
    /* Print first entry.  */
    print_file_name_and_frills (files);
  
***************
*** 3996,4001 ****
--- 4011,4017 ----
        {
          putchar ('\n');
          pos = 0;
+         nonascii_in_this_line = false;
        }
        else
        {
***************
*** 4047,4060 ****
  }
  
  /* Assuming cursor is at position FROM, indent up to position TO.
!    Use a TAB character instead of two or more spaces whenever possible.  */
  
  static void
  indent (size_t from, size_t to)
  {
    while (from < to)
      {
!       if (tabsize != 0 && to / tabsize > (from + 1) / tabsize)
        {
          putchar ('\t');
          from += tabsize - from % tabsize;
--- 4063,4085 ----
  }
  
  /* Assuming cursor is at position FROM, indent up to position TO.
!    Use a TAB character instead of two or more spaces whenever possible.
!    Depends on the TABSIZE option and on the current value of
!    NONASCII_IN_THIS_LINE.  */
  
  static void
  indent (size_t from, size_t to)
  {
    while (from < to)
      {
!       /* Setting TABSIZE to 0 inhibits the use of tabs.  Also, since some
!        terminal emulators (like Apple Terminal from MacOS X 10.3) don't
!        handle tabs after non-ASCII combining accents on the same line
!        well, avoid tabs where there are non-ASCII characters so far on
!        the current line.  */
!       if (tabsize != 0
!         && !nonascii_in_this_line
!         && to / tabsize > (from + 1) / tabsize)
        {
          putchar ('\t');
          from += tabsize - from % tabsize;
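
To make the effect of the patch easier to see outside of ls.c, here is a
small standalone sketch of the same decision. It is not the code from the
patch: the function names are hypothetical, the result is written into a
buffer instead of stdout, tabsize and the flag are passed as parameters
rather than read from globals, and the "else" branch (one space per step) is
an assumption, since the hunk above does not show it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if BUF[0..LEN-1] contains a byte outside the
   ASCII range, which is the condition the patch records in
   nonascii_in_this_line.  */
bool
has_nonascii (char const *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) buf[i] > 0x7f)
      return true;
  return false;
}

/* Hypothetical variant of indent(): store the indentation that moves the
   cursor from column FROM to column TO into OUT (capacity CAP, including
   the terminating NUL) and return its length.  Tabs are used only if
   TABSIZE is nonzero and no non-ASCII byte has been seen on the line.  */
size_t
indent_to_buffer (size_t from, size_t to, size_t tabsize,
                  bool nonascii_in_this_line, char *out, size_t cap)
{
  size_t n = 0;

  assert (cap > 0);
  while (from < to && n + 1 < cap)
    {
      if (tabsize != 0
          && !nonascii_in_this_line
          && to / tabsize > (from + 1) / tabsize)
        {
          out[n++] = '\t';
          from += tabsize - from % tabsize;
        }
      else
        {
          /* Assumed fallback: advance by a single space.  */
          out[n++] = ' ';
          from++;
        }
    }
  out[n] = '\0';
  return n;
}
```

With tabsize 8, indenting from column 0 to column 16 yields two tabs on a
pure ASCII line and 16 spaces once the flag is set, for instance after a
name such as "E" followed by the bytes cc 81.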



