branch master updated: * tp/Texinfo/Convert/Unicode.pm (check_unicode

texinfo-commits

From:	Patrice Dumas
Subject:	branch master updated: * tp/Texinfo/Convert/Unicode.pm (check_unicode_point_conversion) tp/Texinfo/Convert/Plaintext.pm (_convert) tp/Texinfo/Convert/LaTeX.pm (_convert): move the code checking actual conversion to UTF-8 of unicode codepoint string to Texinfo/Convert/Unicode.pm as check_unicode_point_conversion().
Date:	Mon, 16 Aug 2021 16:06:29 -0400

This is an automated email from the git hooks/post-receive script.

pertusus pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new e69381e  * tp/Texinfo/Convert/Unicode.pm 
(check_unicode_point_conversion) tp/Texinfo/Convert/Plaintext.pm (_convert) 
tp/Texinfo/Convert/LaTeX.pm (_convert): move the code checking actual 
conversion to UTF-8 of unicode codepoint string to Texinfo/Convert/Unicode.pm 
as check_unicode_point_conversion().
e69381e is described below

commit e69381e54c95bb00825b61aa329a7265e4527f9d
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Mon Aug 16 22:06:17 2021 +0200

    * tp/Texinfo/Convert/Unicode.pm (check_unicode_point_conversion)
    tp/Texinfo/Convert/Plaintext.pm (_convert)
    tp/Texinfo/Convert/LaTeX.pm (_convert):
    move the code checking actual conversion to UTF-8 of
    unicode codepoint string to Texinfo/Convert/Unicode.pm
    as check_unicode_point_conversion().
---
 ChangeLog                       |  9 ++++++++
 tp/Texinfo/Convert/LaTeX.pm     | 46 +++++++------------------------------
 tp/Texinfo/Convert/Plaintext.pm | 40 +++++++--------------------------
 tp/Texinfo/Convert/Unicode.pm   | 50 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 75 insertions(+), 70 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 0e564e2..1442328 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2021-08-16  Patrice Dumas  <pertusus@free.fr>
+
+       * tp/Texinfo/Convert/Unicode.pm (check_unicode_point_conversion)
+       tp/Texinfo/Convert/Plaintext.pm (_convert)
+       tp/Texinfo/Convert/LaTeX.pm (_convert):
+       move the code checking actual conversion to UTF-8 of
+       unicode codepoint string to Texinfo/Convert/Unicode.pm
+       as check_unicode_point_conversion().
+
 2021-08-15  Patrice Dumas  <pertusus@free.fr>
 
        * tp/Texinfo/ParserNonXS.pm (_end_line),
diff --git a/tp/Texinfo/Convert/LaTeX.pm b/tp/Texinfo/Convert/LaTeX.pm
index 46941de..cb96686 100644
--- a/tp/Texinfo/Convert/LaTeX.pm
+++ b/tp/Texinfo/Convert/LaTeX.pm
@@ -485,12 +485,6 @@ my %defaults = (
 );
 
 
-my %contents_commands = (
- 'contents' => 1,
- 'shortcontents' => 1,
- 'summarycontents' => 1,
-);
-
 sub converter_defaults($$)
 {
   return %defaults;
@@ -1602,41 +1596,17 @@ sub _convert($$)
         # Syntactic checks on the value were already done in Parser.pm,
         # but we have one more thing to test: since this is the one
         # place where we might output actual UTF-8 binary bytes, we have
-        # to check that chr(hex($arg)) is valid.  Perl gives a warning
-        # and will not output UTF-8 for Unicode non-characters such as
-        # U+10FFFF.  In this case, silently fall back to plain text, on
-        # the theory that the user wants something.
+        # to check that it is possible.  If not, silently fall back to
+        # plain text, on the theory that the user wants something.
         my $res;
         if ($self->{'to_utf8'}) {
-          my $error = 0;
-          # The warning about non-characters is only given when the code
-          # point is attempted to be output, not just manipulated.
-          # http://stackoverflow.com/questions/5127725/how-could-i-catch-an-unicode-non-character-warning
-          #
-          # Therefore, we have to try to output it within an eval.
-          # Since opening /dev/null or a temporary file means
-          # more system-dependent checks, use a string as our
-          # filehandle.
-          eval {
-            use warnings FATAL => qw(all);
-            my ($fh, $string);
-            open($fh, ">", \$string) || die "open(U string eval) failed: $!";
-            binmode($fh, ":utf8") || die "binmode(U string eval) failed: $!";
-            print $fh chr(hex("$arg"));
-          };
-          if ($@) {
-            warn "\@U chr(hex($arg)) eval failed: $@\n" if ($self->{'DEBUG'});
-            $error = 1;
-          } elsif (hex($arg) > 0x10FFFF) {
-            # The check above appears not to work in older versions of perl,
-            # so check the argument is not greater the maximum Unicode code 
-            # point.
-            $error = 1;
-          }
-          if ($error) {
-            $res = "U+$arg";
-          } else {
+          my $possible_conversion
+            = Texinfo::Convert::Unicode::check_unicode_point_conversion($arg,
+                                                             $self->{'DEBUG'});
+          if ($possible_conversion) {
             $res = chr(hex($arg)); # ok to call chr
+          } else {
+            $res = "U+$arg";
           }
         } else {
           $res = "U+$arg";  # not outputting UTF-8
diff --git a/tp/Texinfo/Convert/Plaintext.pm b/tp/Texinfo/Convert/Plaintext.pm
index bf080c6..a4701e9 100644
--- a/tp/Texinfo/Convert/Plaintext.pm
+++ b/tp/Texinfo/Convert/Plaintext.pm
@@ -2292,41 +2292,17 @@ sub _convert($$)
         # Syntactic checks on the value were already done in Parser.pm,
         # but we have one more thing to test: since this is the one
         # place where we might output actual UTF-8 binary bytes, we have
-        # to check that chr(hex($arg)) is valid.  Perl gives a warning
-        # and will not output UTF-8 for Unicode non-characters such as
-        # U+10FFFF.  In this case, silently fall back to plain text, on
-        # the theory that the user wants something.
+        # to check that it is possible.  If not, silently fall back to
+        # plain text, on the theory that the user wants something.
         my $res;
         if ($self->{'to_utf8'}) {
-          my $error = 0;
-          # The warning about non-characters is only given when the code
-          # point is attempted to be output, not just manipulated.
-          # http://stackoverflow.com/questions/5127725/how-could-i-catch-an-unicode-non-character-warning
-          #
-          # Therefore, we have to try to output it within an eval.
-          # Since opening /dev/null or a temporary file means
-          # more system-dependent checks, use a string as our
-          # filehandle.
-          eval {
-            use warnings FATAL => qw(all);
-            my ($fh, $string);
-            open($fh, ">", \$string) || die "open(U string eval) failed: $!";
-            binmode($fh, ":utf8") || die "binmode(U string eval) failed: $!";
-            print $fh chr(hex("$arg"));
-          };
-          if ($@) {
-            warn "\@U chr(hex($arg)) eval failed: $@\n" if ($self->{'DEBUG'});
-            $error = 1;
-          } elsif (hex($arg) > 0x10FFFF) {
-            # The check above appears not to work in older versions of perl,
-            # so check the argument is not greater the maximum Unicode code 
-            # point.
-            $error = 1;
-          }
-          if ($error) {
-            $res = "U+$arg";
-          } else {
+          my $possible_conversion
+            = Texinfo::Convert::Unicode::check_unicode_point_conversion($arg,
+                                                             $self->{'DEBUG'});
+          if ($possible_conversion) {
             $res = chr(hex($arg)); # ok to call chr
+          } else {
+            $res = "U+$arg";
           }
         } else {
           $res = "U+$arg";  # not outputting UTF-8
diff --git a/tp/Texinfo/Convert/Unicode.pm b/tp/Texinfo/Convert/Unicode.pm
index 4526c20..8d823b4 100644
--- a/tp/Texinfo/Convert/Unicode.pm
+++ b/tp/Texinfo/Convert/Unicode.pm
@@ -1468,6 +1468,45 @@ sub unicode_for_brace_no_arg_command($$) {
   }  
 }
 
+# this function checks that it is possible to output
+# actual UTF-8 binary bytes, by checking that chr(hex($arg)) is valid.
+# Perl gives a warning and will not output UTF-8 for Unicode
+# non-characters such as U+10FFFF.
+#
+# return 1 if the conversion is possible and can be attempted, 0 otherwise.
+# the second argument triggers debugging output if the conversion failed.
+sub check_unicode_point_conversion($;$)
+{
+  my $arg = shift;
+  my $output_debug = shift;
+
+  # The warning about non-characters is only given when the code
+  # point is attempted to be output, not just manipulated.
+  # http://stackoverflow.com/questions/5127725/how-could-i-catch-an-unicode-non-character-warning
+  #
+  # Therefore, we have to try to output it within an eval.
+  # Since opening /dev/null or a temporary file means
+  # more system-dependent checks, use a string as our
+  # filehandle.
+  eval {
+    use warnings FATAL => qw(all);
+    my ($fh, $string);
+    open($fh, ">", \$string) || die "open(U string eval) failed: $!";
+    binmode($fh, ":utf8") || die "binmode(U string eval) failed: $!";
+    print $fh chr(hex("$arg"));
+  };
+  if ($@) {
+    warn "Unicode chr(hex($arg)) eval failed: $@\n" if ($output_debug);
+    return 0;
+  } elsif (hex($arg) > 0x10FFFF) {
+    # The check above appears not to work in older versions of perl,
+    # so check the argument is not greater than the maximum Unicode code
+    # point.
+    return 0;
+  }
+  return 1;
+}
+
 # string length size taking into account that east asian characters
 # may take 2 spaces.
 sub string_width($)
@@ -1579,6 +1618,17 @@ I<$command_name> (like C<@bullet{}>, C<@aa{}> or C<@guilsinglleft{}>),
 or undef if there is no available encoded character for encoding 
 I<$encoding>. 
 
+=item $possible_conversion = check_unicode_point_conversion($arg, $output_debug)
+
+Check that it is possible to output actual UTF-8 binary bytes
+corresponding to the Unicode codepoint string I<$arg> (such as
+C<201D>).  Perl gives a warning and will not output UTF-8 for
+Unicode non-characters such as U+10FFFF.  If the optional
+I<$output_debug> argument is set, a debugging output warning
+is emitted if the test of the conversion failed.
+Returns 1 if the conversion is possible and can be attempted,
+0 otherwise.
+
 =item $width = string_width($string)
 
 Return the string width, taking into account the fact that some characters
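
The caller pattern introduced by this commit can be tried outside Texinfo
with the standalone sketch below. It is not part of the commit: the check
is copied locally so that the script runs without the Texinfo modules
installed, and render_unicode_point() is a hypothetical helper that only
mirrors the if/else fallback used in _convert. Which code points are
rejected by the eval depends on the Perl version, as the comment in the
diff notes, so no output is claimed here for the borderline cases.

```perl
use strict;
use warnings;

# Local copy of the check, following the logic shown in the diff above.
sub check_unicode_point_conversion {
  my ($arg, $output_debug) = @_;

  # Try to print the character to an in-memory filehandle with all
  # warnings made fatal, so an unencodable code point raises an error.
  eval {
    use warnings FATAL => qw(all);
    my ($fh, $string);
    open($fh, ">", \$string) || die "open failed: $!";
    binmode($fh, ":utf8") || die "binmode failed: $!";
    print $fh chr(hex($arg));
  };
  if ($@) {
    warn "Unicode chr(hex($arg)) eval failed: $@\n" if ($output_debug);
    return 0;
  } elsif (hex($arg) > 0x10FFFF) {
    # Explicit upper bound, for Perl versions where the eval is not enough.
    return 0;
  }
  return 1;
}

# Hypothetical helper (not in Texinfo): same fallback as in _convert.
sub render_unicode_point {
  my ($arg, $to_utf8) = @_;
  return "U+$arg" unless $to_utf8;           # not outputting UTF-8
  return check_unicode_point_conversion($arg)
    ? chr(hex($arg))                         # ok to call chr
    : "U+$arg";                              # fall back to plain text
}

binmode(STDOUT, ":utf8");
foreach my $arg ('201D', 'D800', '10FFFF', '110000') {
  print "$arg => ", render_unicode_point($arg, 1), "\n";
}
```

Keeping the eval and the explicit upper-bound test in one function means
both converters share a single definition of "convertible", which is the
point of the refactoring.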
