From: Bernhard Voelker
Subject: bug#53145: Re: bug#53145: Acknowledgement ("cut" can't segment Chinese characters correctly?)
Date: Wed, 12 Jan 2022 13:25:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.1

On 1/12/22 12:19, zendas via GNU coreutils Bug Reports wrote:
> I have considered handling this problem directly with three bytes at a time
> instead, but I have two doubts. First, I can correctly use wc -m to recognize
> the characters in the same environment (but cut can't?). Second, my script's
> goal is to recognize Chinese, so it is more likely to be run on platforms
> that support a Chinese environment. In addition, the fixed three-byte
> approach cannot handle mixed full-width and half-width content. I would need
> a lot of checks and conversions, which would greatly increase the possibility
> of errors.
As Bob wrote, some downstream distributions, e.g. RHEL/Fedora and
SUSE/openSUSE, have had multi-byte support in cut(1) for many years.
E.g. here on my openSUSE system:
$ echo "你好啊" | LC_ALL=zh_CN.UTF-8 cut -c 1
你
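
For comparison, here is a minimal sketch of the fixed three-byte workaround
mentioned in the quoted text. It assumes UTF-8 input, and the helper name
first_three_bytes is made up for illustration. It only shows why byte
offsets work for pure CJK text and break as soon as half-width (ASCII)
characters are mixed in; it is not a recommended replacement for a
character-aware cut -c.

```shell
#!/bin/sh
# Hypothetical helper: return the first three bytes of its argument.
# LC_ALL=C makes cut operate on plain bytes, independent of the locale.
first_three_bytes() {
    printf '%s\n' "$1" | LC_ALL=C cut -b 1-3
}

# Pure CJK text: each of these characters is three bytes in UTF-8,
# so the first three bytes are exactly the first character.
first_three_bytes '你好啊'

# Mixed text: the slice is "a" plus only two of the three bytes of "你",
# so the result is not valid UTF-8. Dump it as hex instead of printing it.
first_three_bytes 'a你好' | od -An -tx1
```

On a system where cut -c is multi-byte aware, as in the transcript above,
none of this byte arithmetic is needed.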
Have a nice day,
Berny