bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bu


From: Jim Meyering
Subject: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086
Date: Tue, 09 Oct 2012 14:32:07 +0200

Pádraig Brady wrote:
> On 10/09/2012 12:52 PM, Jim Meyering wrote:> Eric Blake wrote:
>>> On 07/26/2011 02:33 PM, marcel partap wrote:
>>>> Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extension
>>>> matching.
>>>> #regards/marcel.
>>>
>>> Your patch would make the new behavior unconditional.  But I like case
>>> sensitivity, and think that case insensitivity should be an opt-in
>>> process that I request, with coordination between dircolors to
>>> generate a new string for LS_COLORS to be honored by ls.  Furthermore,
>>> the patch is lacking in NEWS, documentation, and testsuite coverage.
>>>
>>> Additionally, you should be aware that strncasecmp() has undefined
>>> behavior in non-C multibyte locales.  It would probably be better to
>>> use c_strncasecmp(), so that you are guaranteed defined behavior
>>> regardless of the current locale.
>>>
>>> Would you care to tackle those additional issues?  And are you set up
>>> for copyright assignment, since the patch will probably be non-trivial
>>> by that point in time?
>>
>> I saw this while going back through old bugs, so started poking around.
>> How about this: if a suffix is not matched, convert it to lower case
>> and check again.  That way, anyone who cares to highlight .TaR or .TAR
>> differently from .tar can still easily do so, yet names ending with
>> uppercase .TAR, .JPG, .FLAC, etc. will now be highlighted by default.
>>
>> Does anyone think it's worth making this new fallback-to-case-insensitivity
>> an option?
>
> Not me anyway.
>
>> While looking at that, I realized that comparing each name in an ls'd
>> directory with each of nearly 100 suffixes should leave nontrivial room
>> for improvement.  Sure enough, once I'd replaced that linear search
>> with a hash-table lookup, I see a measurable improvement.
>
>> and taking best-of-10 times, I see a 0.34 -> 0.23 (~30%) improvement.
>
> Very nice.
>
>> Of course, I have deliberately used the case that shows the most improvement:
>>    - a directory with very many files
>>    - no suffix match, so the old code searches all suffixes
>>    - using tmpfs: minimal stat overhead
>
>> @@ -4343,21 +4412,11 @@ print_color_indicator (const struct fileinfo *f, 
>> bool symlink_target)
>>       }
>>
>>     /* Check the file's suffix only if still classified as C_FILE.  */
>> -  ext = NULL;
>> -  if (type == C_FILE)
>> -    {
>> -      /* Test if NAME has a recognized suffix.  */
>> -
>> -      len = strlen (name);
>> -      name += len;          /* Pointer to final \0.  */
>> -      for (ext = color_ext_list; ext != NULL; ext = ext->next)
>> -        {
>> -          if (ext->ext.len <= len
>> -              && STREQ_LEN (name - ext->ext.len, ext->ext.string,
>> -                            ext->ext.len))
>> -            break;
>> -        }
>> -    }
>> +  char *suffix;
>> +  struct sufco *ext;        /* Color extension */
>> +  ext = (type == C_FILE && (suffix = strrchr (name, '.'))
>> +         ? suffix_color_lookup (suffix)
>> +         : NULL);
>
> So do we now only support suffixes delimited by '.' ?
> Previously the delimiter was arbitrary or optional:
>
>   touch star; LS_COLORS="*tar=01;31" /bin/ls --color *tar

Good catch.  I realized that early on, but then forgot to mention it.
Yes, I would have to document that the "." is now required.
It seems like a reasonable restriction, but technically
it could be called a regression.

Also, with these changes, a multiple-"." suffix will no longer work.
I.e., before, if you wanted to give *.tar.xz files a color different
from plain *.xz files, you could.

Does anyone object to that?





reply via email to

[Prev in Thread] Current Thread [Next in Thread]