bug-gnu-emacs

bug#36070: 27; feature request '(Describe Char Unidata List) to include


From: Eli Zaretskii
Subject: bug#36070: 27; feature request '(Describe Char Unidata List) to include 'kDefinition' value
Date: Mon, 03 Jun 2019 18:06:32 +0300

> From: Van L <van@scratch.space>
> Date: Mon, 3 Jun 2019 22:00:30 +1000
> 
> The details retrieved by 'M-x describe-char' on '入' show the following
> 
> --8<---------------cut here---------------start------------->8---
> Character code properties: customize what to show
>   name: CJK IDEOGRAPH-5165
>   general-category: Lo (Letter, Other)
>   decomposition: (20837) ('入')
> --8<---------------cut here---------------end--------------->8---

This comes from UnicodeData.txt, our source for the Unicode properties
of all the characters.  We parse it into uni-*.el files as part of the
build.
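
To illustrate the record layout only, here is a minimal sketch in
Python, not the generator Emacs actually runs at build time.  It
assumes the standard semicolon-separated UnicodeData.txt format; the
helper name parse_unicodedata_line is hypothetical, and the sample
record is the well-known entry for U+0041.

```python
def parse_unicodedata_line(line):
    """Split one UnicodeData.txt record into a few named fields.

    Hypothetical helper for illustration; field positions follow the
    documented UnicodeData.txt layout (0 = code point, 1 = name,
    2 = general category, 5 = decomposition mapping).
    """
    fields = line.rstrip("\n").split(";")
    return {
        "code": int(fields[0], 16),
        "name": fields[1],
        "general-category": fields[2],
        "decomposition": fields[5],
    }


record = parse_unicodedata_line(
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
)
print(record["name"], record["general-category"])
# prints: LATIN CAPITAL LETTER A Lu
```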

> Following the customize link to 'Describe Char Unidata List' 
> I find more information can be had from [1] .
> 
> The Readings table, in particular, is nice to have for the 'kDefinition'.
> 
> --8<---------------cut here---------------start------------->8---
> | Data type   | Value                    |
> |-------------+--------------------------|
> | kDefinition | enter, come in(to), join |
> |             |                          |
> --8<---------------cut here---------------end--------------->8---

This comes from Unihan_Readings.txt, a different file that is part of
the Unihan database.

We don't currently have a property in which to store this value, so we
first need to extend the set of properties.  Then we will need to parse
the above file and populate the new property.  Patches welcome.  Bonus
points for reviewing the other properties of the Unihan DB and adding
whatever is useful.  See UAX#38 (http://www.unicode.org/reports/tr38/)
for the description of the properties.
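
For anyone wanting to experiment with the parsing step, a rough sketch
in Python follows.  It is not a patch and not how Emacs would store
the data; it only assumes the tab-separated "U+XXXX<TAB>field<TAB>value"
layout that UAX#38 describes for the Unihan data files, with '#'
starting a comment line.  The function name parse_unihan_field and the
inline sample are hypothetical, and the kMandarin line is there only
to show that other fields are filtered out.

```python
import io


def parse_unihan_field(lines, field="kDefinition"):
    """Collect {code point: value} for a single Unihan field.

    Hypothetical helper: `lines` is any iterable of text lines in the
    tab-separated Unihan data-file layout.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # ignore anything that is not a 3-column record
        code, name, value = parts
        if name == field and code.startswith("U+"):
            table[int(code[2:], 16)] = value
    return table


# Inline sample in the Unihan data-file layout (illustrative only).
SAMPLE = (
    "# sample records\n"
    "U+5165\tkDefinition\tenter, come in(to), join\n"
    "U+5165\tkMandarin\tr\u00f9\n"
)

definitions = parse_unihan_field(io.StringIO(SAMPLE))
print(definitions[0x5165])
# prints: enter, come in(to), join
```

On the Emacs side, the resulting table would still have to be turned
into a new character property before describe-char could show it.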

Thanks.




