From: Patrice Dumas
Subject: branch master updated: * tp/Texinfo/Convert/Unicode.pm (check_unicode_point_conversion) tp/Texinfo/Convert/Plaintext.pm (_convert) tp/Texinfo/Convert/LaTeX.pm (_convert): move the code checking actual conversion to UTF-8 of unicode codepoint string to Texinfo/Convert/Unicode.pm as check_unicode_point_conversion().
Date: Mon, 16 Aug 2021 16:06:29 -0400
This is an automated email from the git hooks/post-receive script.
pertusus pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new e69381e * tp/Texinfo/Convert/Unicode.pm
(check_unicode_point_conversion) tp/Texinfo/Convert/Plaintext.pm (_convert)
tp/Texinfo/Convert/LaTeX.pm (_convert): move the code checking actual
conversion to UTF-8 of unicode codepoint string to Texinfo/Convert/Unicode.pm
as check_unicode_point_conversion().
e69381e is described below
commit e69381e54c95bb00825b61aa329a7265e4527f9d
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Mon Aug 16 22:06:17 2021 +0200
* tp/Texinfo/Convert/Unicode.pm (check_unicode_point_conversion)
tp/Texinfo/Convert/Plaintext.pm (_convert)
tp/Texinfo/Convert/LaTeX.pm (_convert):
move the code checking actual conversion to UTF-8 of
unicode codepoint string to Texinfo/Convert/Unicode.pm
as check_unicode_point_conversion().
---
ChangeLog | 9 ++++++++
tp/Texinfo/Convert/LaTeX.pm | 46 +++++++------------------------------
tp/Texinfo/Convert/Plaintext.pm | 40 +++++++--------------------------
tp/Texinfo/Convert/Unicode.pm | 50 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 75 insertions(+), 70 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 0e564e2..1442328 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2021-08-16 Patrice Dumas <pertusus@free.fr>
+
+ * tp/Texinfo/Convert/Unicode.pm (check_unicode_point_conversion)
+ tp/Texinfo/Convert/Plaintext.pm (_convert)
+ tp/Texinfo/Convert/LaTeX.pm (_convert):
+ move the code checking actual conversion to UTF-8 of
+ unicode codepoint string to Texinfo/Convert/Unicode.pm
+ as check_unicode_point_conversion().
+
2021-08-15 Patrice Dumas <pertusus@free.fr>
* tp/Texinfo/ParserNonXS.pm (_end_line),
diff --git a/tp/Texinfo/Convert/LaTeX.pm b/tp/Texinfo/Convert/LaTeX.pm
index 46941de..cb96686 100644
--- a/tp/Texinfo/Convert/LaTeX.pm
+++ b/tp/Texinfo/Convert/LaTeX.pm
@@ -485,12 +485,6 @@ my %defaults = (
);
-my %contents_commands = (
- 'contents' => 1,
- 'shortcontents' => 1,
- 'summarycontents' => 1,
-);
-
sub converter_defaults($$)
{
return %defaults;
@@ -1602,41 +1596,17 @@ sub _convert($$)
# Syntactic checks on the value were already done in Parser.pm,
# but we have one more thing to test: since this is the one
# place where we might output actual UTF-8 binary bytes, we have
- # to check that chr(hex($arg)) is valid. Perl gives a warning
- # and will not output UTF-8 for Unicode non-characters such as
- # U+10FFFF. In this case, silently fall back to plain text, on
- # the theory that the user wants something.
+ # to check that it is possible. If not, silently fall back to
+ # plain text, on the theory that the user wants something.
my $res;
if ($self->{'to_utf8'}) {
- my $error = 0;
- # The warning about non-characters is only given when the code
- # point is attempted to be output, not just manipulated.
- # http://stackoverflow.com/questions/5127725/how-could-i-catch-an-unicode-non-character-warning
- #
- # Therefore, we have to try to output it within an eval.
- # Since opening /dev/null or a temporary file means
- # more system-dependent checks, use a string as our
- # filehandle.
- eval {
- use warnings FATAL => qw(all);
- my ($fh, $string);
- open($fh, ">", \$string) || die "open(U string eval) failed: $!";
- binmode($fh, ":utf8") || die "binmode(U string eval) failed: $!";
- print $fh chr(hex("$arg"));
- };
- if ($@) {
- warn "\@U chr(hex($arg)) eval failed: $@\n" if ($self->{'DEBUG'});
- $error = 1;
- } elsif (hex($arg) > 0x10FFFF) {
- # The check above appears not to work in older versions of perl,
- # so check the argument is not greater the maximum Unicode code
- # point.
- $error = 1;
- }
- if ($error) {
- $res = "U+$arg";
- } else {
+ my $possible_conversion
+ = Texinfo::Convert::Unicode::check_unicode_point_conversion($arg,
+ $self->{'DEBUG'});
+ if ($possible_conversion) {
$res = chr(hex($arg)); # ok to call chr
+ } else {
+ $res = "U+$arg";
}
} else {
$res = "U+$arg"; # not outputting UTF-8
diff --git a/tp/Texinfo/Convert/Plaintext.pm b/tp/Texinfo/Convert/Plaintext.pm
index bf080c6..a4701e9 100644
--- a/tp/Texinfo/Convert/Plaintext.pm
+++ b/tp/Texinfo/Convert/Plaintext.pm
@@ -2292,41 +2292,17 @@ sub _convert($$)
# Syntactic checks on the value were already done in Parser.pm,
# but we have one more thing to test: since this is the one
# place where we might output actual UTF-8 binary bytes, we have
- # to check that chr(hex($arg)) is valid. Perl gives a warning
- # and will not output UTF-8 for Unicode non-characters such as
- # U+10FFFF. In this case, silently fall back to plain text, on
- # the theory that the user wants something.
+ # to check that it is possible. If not, silently fall back to
+ # plain text, on the theory that the user wants something.
my $res;
if ($self->{'to_utf8'}) {
- my $error = 0;
- # The warning about non-characters is only given when the code
- # point is attempted to be output, not just manipulated.
- # http://stackoverflow.com/questions/5127725/how-could-i-catch-an-unicode-non-character-warning
- #
- # Therefore, we have to try to output it within an eval.
- # Since opening /dev/null or a temporary file means
- # more system-dependent checks, use a string as our
- # filehandle.
- eval {
- use warnings FATAL => qw(all);
- my ($fh, $string);
- open($fh, ">", \$string) || die "open(U string eval) failed: $!";
- binmode($fh, ":utf8") || die "binmode(U string eval) failed: $!";
- print $fh chr(hex("$arg"));
- };
- if ($@) {
- warn "\@U chr(hex($arg)) eval failed: $@\n" if ($self->{'DEBUG'});
- $error = 1;
- } elsif (hex($arg) > 0x10FFFF) {
- # The check above appears not to work in older versions of perl,
- # so check the argument is not greater the maximum Unicode code
- # point.
- $error = 1;
- }
- if ($error) {
- $res = "U+$arg";
- } else {
+ my $possible_conversion
+ = Texinfo::Convert::Unicode::check_unicode_point_conversion($arg,
+ $self->{'DEBUG'});
+ if ($possible_conversion) {
$res = chr(hex($arg)); # ok to call chr
+ } else {
+ $res = "U+$arg";
}
} else {
$res = "U+$arg"; # not outputting UTF-8
diff --git a/tp/Texinfo/Convert/Unicode.pm b/tp/Texinfo/Convert/Unicode.pm
index 4526c20..8d823b4 100644
--- a/tp/Texinfo/Convert/Unicode.pm
+++ b/tp/Texinfo/Convert/Unicode.pm
@@ -1468,6 +1468,45 @@ sub unicode_for_brace_no_arg_command($$) {
}
}
+# this function checks that it is possible to output
+# actual UTF-8 binary bytes, by checking that chr(hex($arg)) is valid.
+# Perl gives a warning and will not output UTF-8 for Unicode
+# non-characters such as U+10FFFF.
+#
+# return 1 if the conversion is possible and can be attempted, 0 otherwise.
+# the second argument triggers debugging output if the conversion failed.
+sub check_unicode_point_conversion($;$)
+{
+ my $arg = shift;
+ my $output_debug = shift;
+
+ # The warning about non-characters is only given when the code
+ # point is attempted to be output, not just manipulated.
+ # http://stackoverflow.com/questions/5127725/how-could-i-catch-an-unicode-non-character-warning
+ #
+ # Therefore, we have to try to output it within an eval.
+ # Since opening /dev/null or a temporary file means
+ # more system-dependent checks, use a string as our
+ # filehandle.
+ eval {
+ use warnings FATAL => qw(all);
+ my ($fh, $string);
+ open($fh, ">", \$string) || die "open(U string eval) failed: $!";
+ binmode($fh, ":utf8") || die "binmode(U string eval) failed: $!";
+ print $fh chr(hex("$arg"));
+ };
+ if ($@) {
+ warn "Unicode chr(hex($arg)) eval failed: $@\n" if ($output_debug);
+ return 0;
+ } elsif (hex($arg) > 0x10FFFF) {
+ # The check above appears not to work in older versions of perl,
+ # so check the argument is not greater than the maximum Unicode code
+ # point.
+ return 0;
+ }
+ return 1;
+}
+
# string length size taking into account that east asian characters
# may take 2 spaces.
sub string_width($)
@@ -1579,6 +1618,17 @@ I<$command_name> (like C<@bullet{}>, C<@aa{}> or C<@guilsinglleft{}>),
or undef if there is no available encoded character for encoding
I<$encoding>.
+=item $possible_conversion = check_unicode_point_conversion($arg, $output_debug)
+
+Check that it is possible to output actual UTF-8 binary bytes
+corresponding to the Unicode codepoint string I<$arg> (such as
+C<201D>). Perl gives a warning and will not output UTF-8 for
+Unicode non-characters such as U+10FFFF. If the optional
+I<$output_debug> argument is set, a debugging output warning
+is emitted if the test of the conversion failed.
+Returns 1 if the conversion is possible and can be attempted,
+0 otherwise.
+
=item $width = string_width($string)
Return the string width, taking into account the fact that some characters
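
A minimal, stand-alone sketch of how the refactored check is meant to be used, assuming nothing beyond core Perl. The body of check_unicode_point_conversion() is copied from the hunk above so that the example runs without Texinfo installed. render_codepoint() is a hypothetical helper, not part of the commit, that mirrors the caller-side pattern now shared by Plaintext.pm and LaTeX.pm. Only results that hold regardless of Perl version are relied on here: a normal code point such as 201D is accepted, and anything above 0x10FFFF is rejected by the explicit range check.

```perl
use strict;
use warnings;

# Copy of the function added to Texinfo/Convert/Unicode.pm in this commit,
# reproduced here only so the sketch is self-contained.
sub check_unicode_point_conversion {
  my ($arg, $output_debug) = @_;

  # Try to print the character to an in-memory filehandle with all
  # warnings fatal, so that a "cannot output this" warning becomes
  # an exception caught by the eval.
  eval {
    use warnings FATAL => qw(all);
    my ($fh, $string);
    open($fh, ">", \$string) || die "open(U string eval) failed: $!";
    binmode($fh, ":utf8") || die "binmode(U string eval) failed: $!";
    print $fh chr(hex("$arg"));
  };
  if ($@) {
    warn "Unicode chr(hex($arg)) eval failed: $@\n" if ($output_debug);
    return 0;
  } elsif (hex($arg) > 0x10FFFF) {
    # Explicit range check, for perls where the eval does not fail.
    return 0;
  }
  return 1;
}

# Hypothetical helper (an assumption for illustration): the caller-side
# pattern from _convert, with $to_utf8 standing in for $self->{'to_utf8'}.
sub render_codepoint {
  my ($arg, $to_utf8) = @_;
  if ($to_utf8 and check_unicode_point_conversion($arg)) {
    return chr(hex($arg));   # ok to call chr
  }
  return "U+$arg";           # fall back to plain text
}

# Report the decision without printing the character itself, which
# avoids "Wide character" warnings on an unconfigured STDOUT.
for my $arg (qw(201D 110000)) {
  printf "%s: %s\n", $arg,
    check_unicode_point_conversion($arg) ? "convertible" : "fallback";
}
```

Keeping the check in one function means both converters fall back to the "U+XXXX" text form under exactly the same conditions, which is the point of the refactoring.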