[PATCH] libunistring: update to Unicode 7.0.0


From: Daiki Ueno
Subject: [PATCH] libunistring: update to Unicode 7.0.0
Date: Thu, 15 Jan 2015 13:02:00 +0900
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

Hello,

I've merged all the remaining Unicode 7.0.0 patches into Gnulib master.
Since the commits are huge, I am neither posting them to the list nor
including links to cgit (the commit IDs are listed below).

The snapshot libunistring tarball is also available from:
http://alpha.gnu.org/gnu/libunistring/libunistring-0.9.5-alpha4.tar.xz

I bumped the minimum required version of the host libunistring to 0.9.5,
so if your program uses any of the uni*/* Gnulib modules, the updated
code will take precedence over an older host libunistring.  If you find
any problems, please let me know (preferably before the next libunistring
release, planned for the middle of next month).

Here are the corresponding commit logs:

commit 7585eb3f16ab1e83f1d46ed5bb243488d8c34228
Author: Daiki Ueno <address@hidden>
Date:   Thu Jan 15 12:44:00 2015 +0900

    libunistring: update to Unicode 7.0.0
    
    * lib/unictype/joininggroup_byname.gperf: Add Straight Waw and
    Manichaean names.
    * lib/unictype/joininggroup_name.h: Likewise.
    * lib/unictype.in.h (UC_JOINING_GROUP_STRAIGHT_WAW)
    (UC_JOINING_GROUP_MANICHAEAN_ALEPH): New enumeration values.
    * lib/gen-uni-tables.c (UC_JOINING_GROUP_STRAIGHT_WAW)
    (UC_JOINING_GROUP_MANICHAEAN_*): New enumeration values.
    (fill_arabicshaping, joining_group_as_c_identifier): Support those
    enum values.
    (is_property_alphabetic): Accept newly added characters to
    cuneiform numeric signs.
    (is_property_default_ignorable_code_point): Reject U+0605.
    (FIELDLEN): Increase from 120 to 160.
    * lib/uniwidth/width.c (nonspacing_table_data): Add U+0605,
    U+08FF, U+0C00, U+0C81, U+0D01, U+1AB0..U+1ABE, U+1BAC..U+1BAD,
    U+1CF8..U+1CF9, U+1DE7..U+1DF5, U+A9E5, U+AA7C, U+FE27..U+FE2D,
    U+102E0, U+10376..U+1037A, U+10AE5..U+10AE6, U+1107F, U+11173,
    U+1122F..U+11231, U+11234, U+11236..U+11237, U+112DF,
    U+112E3..U+112EA, U+11301, U+1133C, U+11340, U+11366..U+1136C,
    U+11370..U+11374, U+114B3..U+114B8, U+114BA, U+114BF..U+114C0,
    U+114C2..U+114C3, U+115B2..U+115B5, U+115BC..U+115C0,
    U+11633..U+1163A, U+1163D, U+1163F..U+11640, U+16AF0..U+16AF4,
    U+16B30..U+16B36, U+1BC9D..U+1BC9E, U+1BCA0..U+1BCA3, and
    U+1E8D0..U+1E8D6.
    (uc_width): Adjust nonspacing_table_ind boundary from 240 to 248.
    * tests/uniwidth/test-uc_width2.sh: Same updates as in
    lib/uniwidth/width.c.
    * all generated files under lib/uni* and tests/uni*: Regenerate.
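
To illustrate the nonspacing_table_ind boundary change above, here is a
minimal sketch of a two-level bitmap lookup.  It is not the code in
lib/uniwidth/width.c.  The 512-code-point block size, the toy_* names and
the toy data are assumptions made for illustration only.

```c
#include <stdint.h>

#define BLOCK_SHIFT 9   /* assumed: 512 code points per block */
#define BLOCK_BYTES 64  /* 512 bits per block bitmap = 64 bytes */
#define TOY_BLOCKS 4    /* toy boundary; the commit log uses 248 */

/* Toy bitmap for the single block U+0600..U+07FF.  Only the bit for
   U+0605 is set: byte 0, bit 5.  All other bytes are zero.  */
static const uint8_t toy_data[BLOCK_BYTES] = { 0x20 };

/* Per-block index into toy_data, or -1 if the block has no bitmap.  */
static const signed char toy_ind[TOY_BLOCKS] = { -1, -1, -1, 0 };

/* Return 1 if UC is marked nonspacing in the toy table, 0 otherwise.
   Code points at or beyond the boundary are never looked up.  */
static int
toy_is_nonspacing (uint32_t uc)
{
  if ((uc >> BLOCK_SHIFT) < TOY_BLOCKS)
    {
      int ind = toy_ind[uc >> BLOCK_SHIFT];
      if (ind >= 0)
        return (toy_data[BLOCK_BYTES * ind + ((uc >> 3) & 63)]
                >> (uc & 7)) & 1;
    }
  return 0;
}
```

Under this assumed layout, U+1E8D0 falls in block 244 (0x1E8D0 >> 9), so
a boundary of 240 would not reach it, which is consistent with raising
the boundary to 248.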

commit 0d1916cba5b1f783a284520f30371c7c7383cb26
Author: Daiki Ueno <address@hidden>
Date:   Thu Jan 15 12:16:53 2015 +0900

    libunistring: update to Unicode 6.3.0
    
    * lib/uniwbrk.in.h (WBP_DQ, WBP_SQ, WBP_HL): New enumeration values.
    * lib/uniwbrk/u-wordbreaks.h (FUNC): Support WB7a, WB7b, and WB7c.
    Update WB5, WB6, WB7, WB9, WB11, WB12, WB13a, and WB13b.
    * lib/uniwbrk/wbrktable.h (uniwbrk_table): Adjust table size.
    * lib/uniwbrk/wbrktable.c (uniwbrk_table): Support rule WB7a.
    Update WB5, WB9, WB10, WB13a, and WB13b.
    * tests/uniwbrk/test-uc-wordbreaks.c
    (wordbreakproperty_to_string): Support WBP_DQ, WBP_SQ, and WBP_HL.
    * lib/gen-uni-tables.c (UC_BIDI_LRI, UC_BIDI_RLI, UC_BIDI_FSI)
    (UC_BIDI_PDI): New enumeration values.
    (bidi_category_byname): Support those enum values.
    (is_WBP_MIDNUMLET): Exclude 0x0027 (SINGLE QUOTE), which is now
    assigned a dedicated property.
    (is_property_case_ignorable): Check 0x0027.
    (WBP_DQ, WBP_SQ, WBP_HL): New enumeration values.
    (get_wbp, debug_output_wbp, fill_org_wbp, debug_output_org_wbp)
    (output_wbp): Support those enum values.
    * lib/unictype.in.h (UC_BIDI_LRI, UC_BIDI_RLI, UC_BIDI_FSI)
    (UC_BIDI_PDI): New enumeration values.
    * lib/unictype/bidi_byname.gperf: Add those property names.
    * lib/uniwidth/width.c (nonspacing_table_data): Add U+061C,
    U+180E, U+1A1B, and U+2066..U+2069.
    * tests/uniwidth/test-uc_width2.sh: Same updates as in
    lib/uniwidth/width.c.
    * all generated files under lib/uni* and tests/uni*: Regenerate.
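
For readers unfamiliar with the new word break rules, here is a minimal
sketch of WB7a, WB7b and WB7c as defined in UAX #29.  It is not the code
in lib/uniwbrk/u-wordbreaks.h.  The toy_* names are hypothetical, and the
sketch checks only these three rules in isolation.

```c
/* Hypothetical stand-ins for WBP_HL, WBP_SQ and WBP_DQ.  */
enum toy_wbp { TOY_OTHER, TOY_HL, TOY_SQ, TOY_DQ };

/* Return 1 if WB7a, WB7b or WB7c forbids a break between PREV and CUR.
   BEFORE is the property preceding PREV; NEXT is the one following CUR.  */
static int
toy_wb7abc_no_break (enum toy_wbp before, enum toy_wbp prev,
                     enum toy_wbp cur, enum toy_wbp next)
{
  /* WB7a: Hebrew_Letter x Single_Quote */
  if (prev == TOY_HL && cur == TOY_SQ)
    return 1;
  /* WB7b: Hebrew_Letter x Double_Quote Hebrew_Letter */
  if (prev == TOY_HL && cur == TOY_DQ && next == TOY_HL)
    return 1;
  /* WB7c: Hebrew_Letter Double_Quote x Hebrew_Letter */
  if (before == TOY_HL && prev == TOY_DQ && cur == TOY_HL)
    return 1;
  return 0;
}
```

A double quote between two Hebrew letters is therefore kept inside the
word on both sides, while a double quote that merely follows a Hebrew
letter is not covered by these rules.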

commit 794132ffcb51368479556ad43981710a367240bd
Author: Daiki Ueno <address@hidden>
Date:   Thu Jan 15 12:14:14 2015 +0900

    libunistring: update to Unicode 6.2.0
    
    * lib/unilbrk/lbrktables.h (LBP_RI): New enumeration value.
    (unilbrk_table): Adjust table size.
    * lib/unilbrk/lbrktables.c (unilbrk_table): Add a row and column
    for LBP_RI.
    * lib/uniwbrk.in.h (WBP_RI): New enumeration value.
    * lib/uniwbrk/u-wordbreaks.h (FUNC): Support rule WB13c.
    Normalize table index skipping ignored properties.
    * lib/uniwbrk/wbrktable.c (uniwbrk_table): Support WBP_RI.  Remove
    WBP_EXTEND and WBP_FORMAT, which are now computed without using
    the table.
    * lib/uniwbrk/wbrktable.h: Adjust table size.
    * lib/unigbrk.in.h (GBP_RI): New enumeration value.
    * lib/unigbrk/uc-is-grapheme-break.c (UC_IS_GRAPHEME_BREAK):
    Support rule GB8a.
    (UC_GRAPHEME_BREAKS_FOR, gb_table): Support GBP_RI.
    * tests/unigbrk/test-uc-is-grapheme-break.c
    (graphemebreakproperty_to_string): Support GBP_RI.
    * lib/gen-uni-tables.c (LBP_RI): New enumeration value.
    (get_lbp, debug_output_lbp, fill_org_lbp, debug_output_org_lbp)
    (output_lbp): Support LBP_RI.  Adjust some characters changed from
    LBP_AL to LBP_ID.
    (output_lbp): Support LBP_RI.
    (WBP_RI): New enumeration value.
    (debug_output_wbp, fill_org_wbp, debug_output_org_wbp)
    (output_wbp): Support WBP_RI.
    (GBP_RI): New enumeration value.
    (output_gbp_test, fill_org_gbp): Support GBP_RI.
    * all generated files under lib/uni* and tests/uni*: Regenerate.
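
As a rough illustration of what rule GB8a in Unicode 6.2.0 expresses
(no break between two Regional Indicator symbols), here is a minimal
sketch.  It is not the code in lib/unigbrk/uc-is-grapheme-break.c, which
works from property tables.  The toy_* names are hypothetical, and the
range check stands in for a GBP_RI property lookup.

```c
#include <stdint.h>

/* Regional Indicator symbols occupy U+1F1E6..U+1F1FF.  */
static int
toy_is_regional_indicator (uint32_t uc)
{
  return uc >= 0x1F1E6 && uc <= 0x1F1FF;
}

/* Return 1 if GB8a forbids a grapheme cluster break between A and B,
   that is, if both are Regional Indicator symbols.  Other rules are
   not considered here.  */
static int
toy_gb8a_forbids_break (uint32_t a, uint32_t b)
{
  return toy_is_regional_indicator (a) && toy_is_regional_indicator (b);
}
```

For example, the pair U+1F1EF U+1F1F5 stays in one grapheme cluster
under this rule, whereas U+1F1EF followed by an ASCII letter is not
affected by it.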

commit 4b6bc42e050611b12758490ee85c997e54790784
Author: Daiki Ueno <address@hidden>
Date:   Thu Jan 15 12:08:17 2015 +0900

    libunistring: update to Unicode 6.1.0
    
    * lib/gen-uni-tables.c (output_joining_group): Switch to
    3-level table to accommodate joining groups defined with higher
    codepoint value.  Since there are only 88 groups defined in
    Unicode 7.0.0, use 7-bit packed format for level3 entries.
    (get_lbp): Update for Unicode 6.1.0.
    * lib/unictype/joininggroup_of.c (uc_joining_group): Adjust to use
    3-level table.
    * lib/unictype/joininggroup_byname.gperf: Add Rohingya Yeh
    joining group name.
    * lib/unictype/joininggroup_name.h: Likewise.
    * lib/unilbrk/lbrktables.h (LBP_HL): New enumeration value.
    (unilbrk_table): Adjust table size.
    * lib/unilbrk/lbrktables.c (unilbrk_table): Add a row and column
    for LBP_HL.
    * lib/uniwidth/width.c (nonspacing_table_data): Add U+0604,
    U+08E4..U+08FE, U+1BAB, U+1CF4, U+A674..U+A67B, U+A69F,
    U+AAEC..U+AAED, U+AAF6, U+11100..U+11102, U+11127..U+1112B,
    U+1112D..U+11134, U+11180..U+11181, U+111B6..U+111BE, U+116AB,
    U+116AD, U+116B0..U+116B5, U+116B7, U+16F8F..U+16F92.  Remove
    U+302E..U+302F.
    * tests/uniwidth/test-uc_width2.sh: Same updates as in
    lib/uniwidth/width.c.
    * all generated files under lib/uni* and tests/uni*: Regenerate.
    * modules/uni*/* (configure.ac): Bump minimum version to 0.9.5.
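
To illustrate the 7-bit packed format mentioned for the level3 entries
(88 groups fit in 7 bits, since 2^7 = 128), here is a minimal sketch of
storing and fetching 7-bit values in an array of 16-bit words.  The
packed7_* names are hypothetical, and the exact bit layout is an
assumption; the generated tables may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the 7-bit VALUE as entry I.  WORDS must be zeroed beforehand
   and must have one spare word at the end, because an entry can
   straddle two adjacent words.  */
static void
packed7_set (uint16_t *words, size_t i, unsigned int value)
{
  size_t bit = i * 7;  /* bit offset of entry i */
  uint32_t v = (uint32_t) (value & 0x7f) << (bit % 16);
  words[bit / 16] |= (uint16_t) (v & 0xffff);
  words[bit / 16 + 1] |= (uint16_t) (v >> 16);
}

/* Fetch entry I by combining two adjacent words into 32 bits, then
   shifting and masking out the 7 bits of interest.  */
static unsigned int
packed7_get (const uint16_t *words, size_t i)
{
  size_t bit = i * 7;
  uint32_t pair = (uint32_t) words[bit / 16]
                  | ((uint32_t) words[bit / 16 + 1] << 16);
  return (pair >> (bit % 16)) & 0x7f;
}
```

Compared with one byte per entry, this layout saves one bit in eight at
the cost of a slightly more involved fetch.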

commit 803c77dea215638cdc08356ea5560ce93a03b6ff
Author: Daiki Ueno <address@hidden>
Date:   Thu Jan 15 12:06:30 2015 +0900

    uniwbrk/u32-wordbreaks-tests: add conformance test
    
    * modules/uniwbrk/u32-wordbreaks-tests (Files): Add
    tests/uniwbrk/test-uc-wordbreaks.c,
    tests/uniwbrk/test-uc-wordbreaks.sh, and
    tests/uniwbrk/WordBreakTest.txt.
    (Makefile.am): Add uniwbrk/test-uc-wordbreaks.sh to $(TESTS), add
    test-uc-wordbreaks to $(check_PROGRAMS), and define
    test_uc_wordbreaks_SOURCES and test_uc_wordbreaks_LDADD.
    * tests/uniwbrk/test-uc-wordbreaks.sh: New file.
    * tests/uniwbrk/test-uc-wordbreaks.c: New file.

commit 626571a023a08ebd0a4b870b996a95b38ff95db6
Author: Daiki Ueno <address@hidden>
Date:   Thu Jan 15 12:03:09 2015 +0900

    uniwbrk: ignore Extended/Format characters at BOL
    
    * lib/uniwbrk/u-wordbreaks.h (FUNC): Ignore Extend and Format
    characters if the previous character property is one of
    WBP_NEWLINE, WBP_CR, and WBP_LF.

For future reference, here is the procedure I followed to check the
integrity of this update:

- compare line breaking properties with the Unicode data

  $ diff lbrkprop_org.txt lbrkprop.txt

- compare word breaking properties with the Unicode data

  $ diff wbrkprop_org.txt wbrkprop.txt

- after merging lib/uniwidth/width.c.part into lib/uniwidth/width.c, run
  the test-uc_width2.sh test

  $ gnulib-tool --create-testdir --dir=test-uniwidth uniwidth/width
  $ cd test-uniwidth && ./configure && make && make check

- bootstrap libunistring and run all tests

- bootstrap gettext and run all tests (to check for any change in
  line-wrapping behavior)

Regards,
--
Daiki Ueno


