bug#49340: small sort takes hours for UTF-8 locale
From: Jon Klaas
Subject: bug#49340: small sort takes hours for UTF-8 locale
Date: Fri, 2 Jul 2021 14:32:21 -0500
Hello,
I encountered a file that takes hours to sort, although sorting it
should take negligible time. This seems to be due to the locale
LANG=en_US.UTF-8. I've worked around the problem by using LC_ALL=C, but
thought I would report this, as I didn't find a relevant bug report.
This was seen on CentOS 8 with package
coreutils-8.30-6.el8.x86_64
and with the current
coreutils-8.30-8.el8.x86_64
# Fast: takes under 1 second.
export LC_ALL=C
sort tst00776.out
# Slow: takes many hours.
export LC_ALL=en_US.UTF-8
sort tst00776.out
It looks like most of the time is spent here:
#0 0x00007f4a65425c4b in strcoll_l () from /lib64/libc.so.6
#1 0x00005600d195d365 in strcoll_loop ()
#2 0x00005600d195bebd in xmemcoll0 ()
#3 0x00005600d1951176 in compare ()
#4 0x00005600d1951224 in sequential_sort ()
#5 0x00005600d19511d5 in sequential_sort ()
#6 0x00005600d195374b in sortlines ()
#7 0x00005600d194d96b in main ()
It's possible the input (attached) has invalid UTF-8.
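One way to test that suspicion is to round-trip the file through iconv, which exits with a non-zero status when it meets a byte sequence that is not valid UTF-8. This is only a sketch: it assumes an iconv that accepts -f and -t (as glibc's does), and the function name is_valid_utf8 is made up for illustration.

```shell
# Hypothetical helper: succeeds (exit status 0) only if the file
# given as $1 decodes cleanly as UTF-8.
is_valid_utf8() {
    # Convert UTF-8 to UTF-8; iconv fails on the first invalid sequence.
    # Output and diagnostics are discarded; only the exit status matters.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

For example, `is_valid_utf8 tst00776.out && echo valid || echo invalid`. Dropping the `2>&1` redirection lets iconv report where the conversion stopped.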
I also tried an older RHEL 7 system and did NOT reproduce the problem with
coreutils.x86_64 8.22-23.el7
Thanks,
Jon Klaas
tst00776.out.gz
Description: GNU Zip compressed data