From: DE CARNE DE CARNAVALET, Xavier [COMP]
Subject: bug#60544: sort hangs on lengthy line with invalid UTF8 characters
Date: Wed, 4 Jan 2023 04:38:33 +0000

sort seems to do extra computation on long lines containing invalid UTF-8 characters, and could hang for days on just two lines. Here is the minimal example I could make to reproduce the bug:

$ perl -e 'print "\xcd\xe5\xe0"; print "\n"' > file1
$ perl -e 'print "\xcd\xe5\xe0"x1000; print "\n"' > file2

To verify:

$ ls -l file*
-rw-rw-r-- 1 u u 4 Jan 4 12:13 file1
-rw-rw-r-- 1 u u 3001 Jan 4 12:13 file2
$ xxd -p file1
cde5e00a
$ xxd -p file2
cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0
[...]
cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0
0a

Then:

$ export LC_ALL=en_US.UTF8
$ time sort --debug file1 file2
sort: using 'en_US.UTF8' sorting rules
[...]
real 0m1.951s
user 0m1.951s
sys 0m0.000s

It took nearly two seconds to sort two lines from two files. If I replace the \xe0 with \x61 in the first (small) file, the time drops to milliseconds:

$ perl -e 'print "\xcd\xe5\x61"; print "\n"' > file3
$ time sort --debug file3 file2
sort: using 'en_US.UTF8' sorting rules
[...]
real 0m0.007s
user 0m0.003s
sys 0m0.003s

The time increases as one of the files gets larger. For instance, with 2k repetitions:

$ perl -e 'print "\xcd\xe5\xe0"x2000; print "\n"' > file4
$ time sort --debug file1 file4
sort: using 'en_US.UTF8' sorting rules
[...]
real 0m7.696s
user 0m7.690s
sys 0m0.004s

I would expect sort to take milliseconds at most in all cases for two moderately long lines.

$ uname -a
Linux 5.13.0-51-generic #58~20.04.1-Ubuntu SMP Tue Jun 14 11:29:12 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ apt list installed coreutils
coreutils/focal,now 8.30-3ubuntu2 amd64 [installed]
$ sort --version
sort (GNU coreutils) 8.30

Xavier de Carné de Carnavalet

Attachments: file1, file2, file3, file4