branch master updated: One function in Texinfo::Common to handle file name encoding

texinfo-commits

From:	Patrice Dumas
Subject:	branch master updated: One function in Texinfo::Common to handle file name encoding
Date:	Thu, 24 Feb 2022 17:42:29 -0500
This is an automated email from the git hooks/post-receive script.

pertusus pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new 69aa96fccc One function in Texinfo::Common to handle file name encoding
69aa96fccc is described below

commit 69aa96fccccb2fe1fa6e8609f80e697b977264be
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Thu Feb 24 23:42:15 2022 +0100

    One function in Texinfo::Common to handle file name encoding
    
    * tp/Texinfo/Common.pm (encode_file_name),
    tp/Texinfo/Convert/Converter.pm (encoded_file_name),
    tp/Texinfo/Convert/DocBook.pm, tp/Texinfo/Convert/HTML.pm,
    tp/Texinfo/Convert/IXIN.pm, tp/Texinfo/Convert/Info.pm,
    tp/Texinfo/Convert/LaTeX.pm, tp/Texinfo/Convert/Utils.pm
    (expand_verbatiminclude), tp/Texinfo/ParserNonXS.pm: put the
    main function encode_file_name() doing file name encoding
    in Texinfo::Common and use encoded_file_name for Converters.
    Return the file name encoding if there is a need to decode
    the file name for error messages.
    
    * tp/Texinfo/ParserNonXS.pm (_save_line_directive): encode CPP
    line directive file name.
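
Illustrative note, not part of the commit: the round trip described above
(encode a decoded file name to bytes for file system access, then decode
it again with the returned encoding before printing an error message) can
be sketched as below. encode_file_name_sketch() and decode_for_message()
are hypothetical stand-ins written for this note. They drop the
configuration argument and pass an undefined encoding through unchanged,
which is an assumption of the sketch rather than the committed behaviour.

```perl
use strict;
use warnings;
use utf8;            # this sketch contains a non-ASCII string literal
use Encode ();

# Hypothetical stand-in for Texinfo::Common::encode_file_name().
# Assumption: no configuration argument, and an undefined encoding
# returns the name unchanged together with an undefined encoding.
sub encode_file_name_sketch {
  my ($file_name, $input_encoding) = @_;
  return ($file_name, undef) if (!defined($input_encoding));

  my $encoding = $input_encoding;
  if ($input_encoding eq 'utf-8' or $input_encoding eq 'utf-8-strict') {
    utf8::encode($file_name);      # in place: characters -> UTF-8 bytes
    $encoding = 'utf-8';
  } else {
    $file_name = Encode::encode($input_encoding, $file_name);
  }
  return ($file_name, $encoding);
}

# Hypothetical helper mirroring the decoding done before error messages:
# bytes go back to Perl characters only when an encoding is known.
sub decode_for_message {
  my ($bytes, $encoding) = @_;
  return defined($encoding) ? Encode::decode($encoding, $bytes) : $bytes;
}

my $text = "included_akçentêd.texi";
my ($bytes, $encoding) = encode_file_name_sketch($text, 'utf-8');
# $bytes is what would be handed to -e, open() or locate_include_file().
my $shown = decode_for_message($bytes, $encoding);
printf("%d characters, %d bytes, round trip %s\n",
       length($text), length($bytes), ($shown eq $text ? 'ok' : 'failed'));
```

Returning the encoding next to the bytes lets the caller decode with the
same encoding that was used for encoding, so the name shown in a message
matches the name written in the document.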
---
 ChangeLog                                          | 18 ++++++
 tp/TODO                                            | 11 ++++
 tp/Texinfo/Common.pm                               | 27 +++++++++
 tp/Texinfo/Convert/Converter.pm                    | 30 ++++------
 tp/Texinfo/Convert/DocBook.pm                      |  3 +-
 tp/Texinfo/Convert/HTML.pm                         |  4 +-
 tp/Texinfo/Convert/IXIN.pm                         |  3 +-
 tp/Texinfo/Convert/Info.pm                         |  3 +-
 tp/Texinfo/Convert/LaTeX.pm                        |  3 +-
 tp/Texinfo/Convert/Utils.pm                        | 38 ++++++++----
 tp/Texinfo/ParserNonXS.pm                          | 33 +++++------
 tp/t/input_files/cpp_lines.texi                    |  4 ++
 tp/t/results/include/cpp_lines.pl                  | 63 +++++++++++++++++++-
 tp/t/test_utils.pl                                 | 55 ++++++++++++-----
 tp/tests/formatting/list-of-tests                  |  5 +-
 "tp/tests/formatting/os\303\251.texi"              |  4 ++
 .../formatting/res_parser/cpp_lines/cpp_lines.1    |  0
 .../formatting/res_parser/cpp_lines/cpp_lines.2    |  3 +
 .../formatting/res_parser/cpp_lines/cpp_lines.html | 68 ++++++++++++++++++++++
 .../non_ascii_command_line/Chapteur.html           |  2 +
 .../os\303\251-texinfo.texi"                       |  4 ++
 .../non_ascii_command_line/os\303\251.2"           |  2 +
 tp/tests/test_scripts/formatting_cpp_lines.sh      | 19 ++++++
 23 files changed, 333 insertions(+), 69 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 3d46c554fc..68261ceb60 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,21 @@
+2022-02-24  Patrice Dumas  <pertusus@free.fr>
+
+       One function in Texinfo::Common to handle file name encoding
+
+       * tp/Texinfo/Common.pm (encode_file_name),
+       tp/Texinfo/Convert/Converter.pm (encoded_file_name),
+       tp/Texinfo/Convert/DocBook.pm, tp/Texinfo/Convert/HTML.pm,
+       tp/Texinfo/Convert/IXIN.pm, tp/Texinfo/Convert/Info.pm,
+       tp/Texinfo/Convert/LaTeX.pm, tp/Texinfo/Convert/Utils.pm
+       (expand_verbatiminclude), tp/Texinfo/ParserNonXS.pm: put the
+       main function encode_file_name() doing file name encoding
+       in Texinfo::Common and use encoded_file_name for Converters.
+       Return the file name encoding if there is a need to decode
+       the file name for error messages.
+
+       * tp/Texinfo/ParserNonXS.pm (_save_line_directive): encode CPP
+       line directive file name.
+
 2022-02-24  Gavin Smith  <gavinsmith0123@gmail.com>
 
        Include file name encoding for XS parser
diff --git a/tp/TODO b/tp/TODO
index 9143ff6a06..fbc3b5878a 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -19,6 +19,17 @@ Before next release
 
 for @example args, use *-user as class?
 
+
+byte encoding, check how used, check XS parser?
+l 3226 ParserNonXS.pm
+              unshift @{$self->{'input'}}, {
+                'name' => $file,
+
+bytes: (global_information)
+$self->{'info'}->{'input_file_name'}
+$self->{'info'}->{'input_directory'} 
+
+
 Bugs
 ====
 
diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm
index 746e1e4a60..b04ae79c49 100644
--- a/tp/Texinfo/Common.pm
+++ b/tp/Texinfo/Common.pm
@@ -1505,6 +1505,33 @@ sub parse_node_manual($)
 
 # misc functions also interesting for converters
 
+# Reverse the decoding of the file name from the input encoding.  When
+# dealing with file names, we want Perl strings representing sequences of
+# bytes, not Unicode codepoints.
+#     This is necessary even if the name of the included file is purely
+# ASCII, as the name of the directory it is located within may contain
+# non-ASCII characters.
+#   Otherwise, the -e operator and similar may not work correctly.
+# TODO document and add the possibility to use configuration_information
+sub encode_file_name($$;$)
+{
+  my $configuration_information = shift;
+  my $file_name = shift;
+  my $input_encoding = shift;
+
+  my $encoding;
+
+  if ($input_encoding and ($input_encoding eq 'utf-8'
+                           or $input_encoding eq 'utf-8-strict')) {
+    utf8::encode($file_name);
+    $encoding = 'utf-8';
+  } else {
+    $file_name = Encode::encode($input_encoding, $file_name);
+    $encoding = $input_encoding;
+  }
+  return ($file_name, $encoding);
+}
+
 sub locate_include_file($$)
 {
   my $configuration_information = shift;
diff --git a/tp/Texinfo/Convert/Converter.pm b/tp/Texinfo/Convert/Converter.pm
index b70c0d1e56..4ca8a64835 100644
--- a/tp/Texinfo/Convert/Converter.pm
+++ b/tp/Texinfo/Convert/Converter.pm
@@ -1009,36 +1009,26 @@ sub present_bug_message($$;$)
   warn "You found a bug: $message\n\n".$additional_information;
 }
 
-# Reverse the decoding of the file name from the input encoding.  When
-# dealing with file names, we want Perl strings representing sequences of
-# bytes, not Unicode codepoints.
-#     This is necessary even if the name of the included file is purely
-# ASCII, as the name of the directory it is located within may contain
-# non-ASCII characters.
-#   Otherwise, the -e operator and similar may not work correctly.
-sub encode_file_name($$)
+# Reverse the decoding of the file name from the input encoding.
+# TODO document
+sub encoded_file_name($$)
 {
   my $self = shift;
   my $file_name = shift;
 
-  # FIXME use the locale instead?
-  my $info = $self->{'parser_info'};
-  if ($info) {
-    my $encoding = $info->{'input_perl_encoding'};
-    if ($encoding and ($encoding eq 'utf-8' or $encoding eq 'utf-8-strict')) {
-      utf8::encode($file_name);
-    } else {
-      $file_name = Encode::encode($encoding, $file_name);
-    }
-  }
-  return $file_name;
+  my $document_encoding;
+  $document_encoding = $self->{'parser_info'}->{'input_perl_encoding'}
+    if ($self->{'parser_info'}
+      and defined($self->{'parser_info'}->{'input_perl_encoding'}));
+  return Texinfo::Common::encode_file_name($self, $file_name, $document_encoding);
 }
 
 sub txt_image_text($$$)
 {
   my ($self, $element, $basefile) = @_;
 
-  my $text_file_name = $self->encode_file_name($basefile.'.txt');
+  my ($text_file_name, $file_name_encoding)
+    = $self->encoded_file_name($basefile.'.txt');
 
   my $txt_file = Texinfo::Common::locate_include_file($self, $text_file_name);
   if (!defined($txt_file)) {
diff --git a/tp/Texinfo/Convert/DocBook.pm b/tp/Texinfo/Convert/DocBook.pm
index e997d6f542..c240c25a02 100644
--- a/tp/Texinfo/Convert/DocBook.pm
+++ b/tp/Texinfo/Convert/DocBook.pm
@@ -1118,7 +1118,8 @@ sub _convert($$;$)
           }
           my @files;
           foreach my $extension (@docbook_image_extensions) {
-            my $file_name = $self->encode_file_name("$basefile.$extension");
+            my ($file_name, $file_name_encoding)
+               = $self->encoded_file_name("$basefile.$extension");
             if ($self->Texinfo::Common::locate_include_file($file_name)) {
               push @files, ["$basefile.$extension", uc($extension)];
             }
diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 484ef4e09d..374b41c4d8 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -271,7 +271,8 @@ sub html_image_file_location_name($$$$)
       unshift @extensions, ("$extension", ".$extension");
     }
     foreach my $extension (@extensions) {
-      my $file_name = $self->encode_file_name($image_basefile.$extension);
+      my ($file_name, $file_name_encoding)
+        = $self->encoded_file_name($image_basefile.$extension);
       my $located_image_path
            = $self->Texinfo::Common::locate_include_file($file_name);
       if (defined($located_image_path) and $located_image_path ne '') {
@@ -296,6 +297,7 @@ sub html_image_file_location_name($$$$)
       }
     }
   }
+  # TODO set and return $image_path_encoding?
   return ($image_file, $image_basefile, $image_extension, $image_path);
 }
 
diff --git a/tp/Texinfo/Convert/IXIN.pm b/tp/Texinfo/Convert/IXIN.pm
index aca2ca24e3..2ec7066016 100644
--- a/tp/Texinfo/Convert/IXIN.pm
+++ b/tp/Texinfo/Convert/IXIN.pm
@@ -839,7 +839,8 @@ sub output_ixin($$)
       }
       foreach my $extension (@extension, @image_files_extensions) {
         my $file_name_text = "$basefile.$extension";
-        my $file_name = $self->encode_file_name($file_name_text);
+        my ($file_name, $file_name_encoding)
+          = $self->encoded_file_name($file_name_text);
         my $file = $self->Texinfo::Common::locate_include_file($file_name);
         if (defined($file)) {
           my $filehandle = do { local *FH };
diff --git a/tp/Texinfo/Convert/Info.pm b/tp/Texinfo/Convert/Info.pm
index 712d2d3d8f..7d0be98af3 100644
--- a/tp/Texinfo/Convert/Info.pm
+++ b/tp/Texinfo/Convert/Info.pm
@@ -510,7 +510,8 @@ sub format_image($$)
     }
     my $image_file;
     foreach my $extension (@extensions) {
-      my $file_name = $self->encode_file_name($basefile.$extension);
+      my ($file_name, $file_name_encoding)
+        = $self->encoded_file_name($basefile.$extension);
       if ($self->Texinfo::Common::locate_include_file($file_name)) {
         # use the basename and not the file found.  It is agreed that it is
         # better, since in any case the files are moved.
diff --git a/tp/Texinfo/Convert/LaTeX.pm b/tp/Texinfo/Convert/LaTeX.pm
index 666fc488bd..0e1bf16fb3 100644
--- a/tp/Texinfo/Convert/LaTeX.pm
+++ b/tp/Texinfo/Convert/LaTeX.pm
@@ -2308,7 +2308,8 @@ sub _convert($$)
 
         my $image_file;
         foreach my $extension (@LaTeX_image_extensions) {
-          my $file_name = $self->encode_file_name("$basefile.$extension");
+          my ($file_name, $file_name_encoding)
+             = $self->encoded_file_name("$basefile.$extension");
           my $located_file =
             $self->Texinfo::Common::locate_include_file($file_name);
           if (defined($located_file)) {
diff --git a/tp/Texinfo/Convert/Utils.pm b/tp/Texinfo/Convert/Utils.pm
index 317ae979c5..218a986f7e 100644
--- a/tp/Texinfo/Convert/Utils.pm
+++ b/tp/Texinfo/Convert/Utils.pm
@@ -196,28 +196,38 @@ sub expand_verbatiminclude($$$)
   my $configuration_information = shift;
   my $current = shift;
 
-  return unless ($current->{'extra'} and defined($current->{'extra'}->{'text_arg'}));
+  my $input_encoding;
+
+  return unless ($current->{'extra'}
+                 and defined($current->{'extra'}->{'text_arg'}));
   my $file_name_text = $current->{'extra'}->{'text_arg'};
-  # FIXME $file_name_text should be encoded to the file system
-  # encoding here to be passed to locate_include_file
+  $input_encoding = $current->{'extra'}->{'input_perl_encoding'}
+        if (defined($current->{'extra'}->{'input_perl_encoding'}));
+
+  my ($file_name, $file_name_encoding)
+    = Texinfo::Common::encode_file_name($configuration_information,
+                                                    $file_name_text,
+                                                    $input_encoding);
+
   my $file = Texinfo::Common::locate_include_file($configuration_information,
-                                                  $file_name_text);
+                                                  $file_name);
 
   my $verbatiminclude;
 
   if (defined($file)) {
     if (!open(VERBINCLUDE, $file)) {
       if ($registrar) {
-        # FIXME $file should be decoded to perl internal codepoints here
+        my $decoded_file = $file;
+        # need to decode to the internal perl codepoints for error message
+        $decoded_file = Encode::decode($file_name_encoding, $file)
+           if (defined($file_name_encoding));
         $registrar->line_error($configuration_information,
-                               sprintf(__("could not read %s: %s"), $file, $!),
-                               $current->{'line_nr'});
+                      sprintf(__("could not read %s: %s"), $decoded_file, $!),
+                      $current->{'line_nr'});
       }
     } else {
-      if (defined $current->{'extra'}->{'input_perl_encoding'}) {
-        binmode(VERBINCLUDE, ":encoding("
-                             . $current->{'extra'}->{'input_perl_encoding'}
-                             . ")");
+      if (defined($input_encoding)) {
+        binmode(VERBINCLUDE, ":encoding(" . $input_encoding . ")");
       }
       $verbatiminclude = { 'cmdname' => 'verbatim',
                            'parent' => $current->{'parent'},
@@ -229,10 +239,14 @@ sub expand_verbatiminclude($$$)
       }
       if (!close (VERBINCLUDE)) {
         if ($registrar) {
+          my $decoded_file = $file;
+          # need to decode to the internal perl codepoints for error message
+          $decoded_file = Encode::decode($file_name_encoding, $file)
+             if (defined($file_name_encoding));
           $registrar->document_warn(
                  $configuration_information, sprintf(__(
                       "error on closing \@verbatiminclude file %s: %s"),
-                             $file, $!));
+                          $decoded_file, $!));
         }
       }
     }
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 6845082616..ac48d2c1a0 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -1989,7 +1989,13 @@ sub _save_line_directive
   my $input = $self->{'input'}->[0];
   return if !$input;
   $input->{'line_nr'} = $line_nr if $line_nr;
-  $input->{'name'} = $file_name if $file_name;
+  # need to convert to bytes for file name
+  if (defined($file_name)) {
+    my ($encoded_file_name, $file_name_encoding)
+       = Texinfo::Common::encode_file_name($self, $file_name,
+                 $self->{'info'}->{'input_perl_encoding'});
+    $input->{'name'} = $encoded_file_name;
+  }
 }
 
 # returns next text fragment, be it pending from a macro expansion or 
@@ -3206,21 +3212,11 @@ sub _end_line($$$)
         } elsif ($superfluous_arg) {
           # An error message is issued below.
         } elsif ($command eq 'include') {
-          my $file_name = $text;
-          # When dealing with file names, we want Perl strings representing sequences
+          # We want Perl strings representing sequences
           # of bytes, not codepoints in the internal perl encoding.
-          #     This is necessary even if the name of the included file is purely
-          # ASCII, as the name of the directory it is located within may contain
-          # non-ASCII characters.
-          # Otherwise, the -e operator and similar may not work correctly.
-          if (defined $self->{'info'}->{'input_perl_encoding'}) {
-            my $encoding = $self->{'info'}->{'input_perl_encoding'};
-            if ($encoding and ($encoding eq 'utf-8' or $encoding eq 'utf-8-strict')) {
-              utf8::encode($file_name);
-            } else {
-              $file_name = Encode::encode($encoding, $file_name);
-            }
-          }
+          my ($file_name, $file_name_encoding)
+             = Texinfo::Common::encode_file_name($self, $text,
+                                   $self->{'info'}->{'input_perl_encoding'});
           my $file = Texinfo::Common::locate_include_file($self, $file_name);
           if (defined($file)) {
             my $filehandle = do { local *FH };
@@ -3233,13 +3229,16 @@ sub _end_line($$$)
                 'line_nr' => 0,
                 'pending' => [],
                 'fh' => $filehandle };
+              # TODO note that it is bytes.  No reason to have it used much
+              # Make sure to document that it is bytes.
+              # TODO add $file_name_encoding information?
               $current->{'extra'}->{'file'} = $file;
               # we set the type to replaced to tell converters not to
               # expand the @-command
               $current->{'type'} = 'replaced';
             } else {
-              # FIXME $text does not show the include directory.  However using $file
-              # would require to decode it to perl internal codepoints
+              # FIXME $text does not show the include directory.  Using $file
+              # would require to decode it to perl internal codepoints with $file_name_encoding
               $self->_command_error($current, $line_nr,
                               __("\@%s: could not open %s: %s"),
                               $command, $text, $!);
diff --git a/tp/t/input_files/cpp_lines.texi b/tp/t/input_files/cpp_lines.texi
index d3e56b6f4e..06dbde59f4 100644
--- a/tp/t/input_files/cpp_lines.texi
+++ b/tp/t/input_files/cpp_lines.texi
@@ -47,4 +47,8 @@ line before
 
 @email{after verb}
 
+# line 5 "accentêd"
+
+@documentlanguage làng
+
 @bye
diff --git a/tp/t/results/include/cpp_lines.pl b/tp/t/results/include/cpp_lines.pl
index bbf2e73de5..3b942f5488 100644
--- a/tp/t/results/include/cpp_lines.pl
+++ b/tp/t/results/include/cpp_lines.pl
@@ -675,6 +675,47 @@ $result_trees{'cpp_lines'} = {
         {
           'parent' => {},
           'text' => '
+',
+          'type' => 'empty_line'
+        },
+        {
+          'parent' => {},
+          'text' => '
+',
+          'type' => 'empty_line'
+        },
+        {
+          'args' => [
+            {
+              'contents' => [
+                {
+                  'parent' => {},
+                  'text' => "l\x{e0}ng"
+                }
+              ],
+              'extra' => {
+                'spaces_after_argument' => '
+'
+              },
+              'parent' => {},
+              'type' => 'line_arg'
+            }
+          ],
+          'cmdname' => 'documentlanguage',
+          'extra' => {
+            'spaces_before_argument' => ' ',
+            'text_arg' => 'lÃ ng'
+          },
+          'line_nr' => {
+            'file_name' => 'accentÃªd',
+            'line_nr' => 7,
+            'macro' => ''
+          },
+          'parent' => {}
+        },
+        {
+          'parent' => {},
+          'text' => '
 ',
           'type' => 'empty_line'
         }
@@ -817,6 +858,11 @@ $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[32]{'contents'}[0]{'parent
 $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[32]{'contents'}[1]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[32];
 $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[32]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
 $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[33]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[34]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35]{'args'}[0]{'contents'}[0]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35]{'args'}[0];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35]{'args'}[0]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[36]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
 $result_trees{'cpp_lines'}{'contents'}[1]{'extra'}{'node_content'}[0] = $result_trees{'cpp_lines'}{'contents'}[1]{'args'}[0]{'contents'}[0];
 $result_trees{'cpp_lines'}{'contents'}[1]{'extra'}{'nodes_manuals'}[0]{'node_content'}[0] = $result_trees{'cpp_lines'}{'contents'}[1]{'args'}[0]{'contents'}[0];
 $result_trees{'cpp_lines'}{'contents'}[1]{'parent'} = $result_trees{'cpp_lines'};
@@ -873,6 +919,9 @@ line before
 
 @email{after verb}
 
+
+@documentlanguage làng
+
 @bye
 ';
 
@@ -915,6 +964,8 @@ after inc.
 
 after verb
 
+
+
 ';
 
 $result_nodes{'cpp_lines'} = {
@@ -933,7 +984,17 @@ $result_menus{'cpp_lines'} = {
   'structure' => {}
 };
 
-$result_errors{'cpp_lines'} = [];
+$result_errors{'cpp_lines'} = [
+  {
+    'error_line' => "warning: l\x{e0}ng is not a valid language code
+",
+    'file_name' => 'accentÃªd',
+    'line_nr' => 7,
+    'macro' => '',
+    'text' => "l\x{e0}ng is not a valid language code",
+    'type' => 'warning'
+  }
+];
 
 
 $result_floats{'cpp_lines'} = {};
diff --git a/tp/t/test_utils.pl b/tp/t/test_utils.pl
index edf3f0ef15..d6d5e9ec77 100644
--- a/tp/t/test_utils.pl
+++ b/tp/t/test_utils.pl
@@ -27,6 +27,8 @@ require Texinfo::ModulePath;
 Texinfo::ModulePath::init(undef, undef, 'updirs' => 2);
 
 # For consistent test results, use the C locale
+# Note that this should prevent displaying some for non ascii characters
+# in error messages in particular
 $ENV{LC_ALL} = 'C';
 $ENV{LANGUAGE} = 'en';
 
@@ -34,17 +36,10 @@ $ENV{LANGUAGE} = 'en';
 
 use Test::More;
 
-use Texinfo::Parser;
-use Texinfo::Convert::Text;
-use Texinfo::Convert::Texinfo;
-use Texinfo::Structuring;
-use Texinfo::Convert::Plaintext;
-use Texinfo::Convert::Info;
-use Texinfo::Convert::HTML;
-use Texinfo::Convert::TexinfoXML;
-use Texinfo::Convert::DocBook;
-use Texinfo::Convert::LaTeX;
-use Texinfo::Config;
+# to determine the locale encoding to output the Texinfo to Texinfo
+# result when regenerating
+use I18N::Langinfo qw(langinfo CODESET);
+use Encode;
 use File::Basename;
 use File::Copy;
 use File::Compare; # standard since 5.004
@@ -57,6 +52,19 @@ use Storable qw(dclone); # standard in 5.007003
 #use Struct::Compare;
 use Getopt::Long qw(GetOptions);
 
+use Texinfo::Common;
+use Texinfo::Convert::Texinfo;
+use Texinfo::Config;
+use Texinfo::Parser;
+use Texinfo::Convert::Text;
+use Texinfo::Structuring;
+use Texinfo::Convert::Plaintext;
+use Texinfo::Convert::Info;
+use Texinfo::Convert::LaTeX;
+use Texinfo::Convert::HTML;
+use Texinfo::Convert::TexinfoXML;
+use Texinfo::Convert::DocBook;
+
 # FIXME Is it really useful?
 use vars qw(%result_texis %result_texts %result_trees %result_errors 
    %result_indices %result_sectioning %result_nodes %result_menus
@@ -105,6 +113,9 @@ foreach my $dir ('t', 't/results', $output_files_dir) {
   }
 }
 
+my $locale_encoding = langinfo(CODESET);
+$locale_encoding = undef if ($locale_encoding eq '');
+
 ok(1);
 
 our %formats = (
@@ -895,6 +906,8 @@ sub test($$)
       $result = $parser->parse_texi_piece($test_text);
     }
     if (defined($test_input_file_name)) {
+      # FIXME should we need to encode or do we assume that
+      # $test_input_file_name is already bytes?
       $parser->{'info'}->{'input_file_name'} = $test_input_file_name;
     }
   } else {
@@ -1144,8 +1157,16 @@ sub test($$)
     print OUT 'use utf8;'."\n\n";
 
     #print STDERR "Generate: ".Data::Dumper->Dump([$result], ['$res']);
+    # NOTE $test_name is in general used for directories and
+    # file names, and therefore should be be bytes.  Here it is used as a
+    # text string, if non ascii, it should be decoded to internal
+    # perl codepoints as OUT is encoded as utf8.  Alternatively it
+    # could be encoded to be used as file name, but it probably is not the
+    # best solution.
     my $out_result;
     {
+      # NOTE rare extra keys could be bytes.  They could be incorrectly
+      # encoded here.  Let's wait for actual cases before fixing.
       local $Data::Dumper::Sortkeys = \&filter_tree_keys;
       $out_result = Data::Dumper->Dump([$split_result], ['$result_trees{\''.$test_name.'\'}']);
     }
@@ -1172,6 +1193,8 @@ sub test($$)
     }
     {
       local $Data::Dumper::Sortkeys = 1;
+      # NOTE file names are bytes, therefore ther could be a need to
+      # decode them
       $out_result .= Data::Dumper->Dump([$errors], ['$result_errors{\''.$test_name.'\'}']) ."\n\n";
       $out_result .= Data::Dumper->Dump([$indices], ['$result_indices{\''.$test_name.'\'}']) ."\n\n"
          if ($indices);
@@ -1207,8 +1230,13 @@ sub test($$)
     print OUT $out_result;
     close (OUT);
     
-    print STDERR "--> $test_name\n".Texinfo::Convert::Texinfo::convert_to_texinfo($result)."\n"
-            if ($self->{'generate'});
+    if ($self->{'generate'}) {
+      my $texinfo_text = Texinfo::Convert::Texinfo::convert_to_texinfo($result);
+      if (defined($locale_encoding)) {
+        $texinfo_text = Encode::encode($locale_encoding, $texinfo_text);
+      }
+      print STDERR "--> $test_name\n". $texinfo_text ."\n";
+    }
   }
   if (!$self->{'generate'}) {
     %result_converted = ();
@@ -1377,6 +1405,7 @@ sub output_texi_file($)
   mkdir $dir or die 
      unless (-d $dir);
   my $file = "${dir}$test_name.texi";
+  # We have no idea about encodings, better use bytes everywhere
   open (OUTFILE, ">$file") or die ("Open $file: $!\n");
 
   my $first_line = "\\input texinfo \@c -*-texinfo-*-";
diff --git a/tp/tests/formatting/list-of-tests b/tp/tests/formatting/list-of-tests
index f3751f1e68..ae16d92253 100644
--- a/tp/tests/formatting/list-of-tests
+++ b/tp/tests/formatting/list-of-tests
@@ -10,6 +10,10 @@ simplest_test_css simplest.texi --css-include file.css
 # check that command line overrides document
 documentlanguage_cmdline documentlanguage.texi --document-language=fr
 
+# already tested in t/*.t, but here want to have a result with
+# accented characters in error messages
+cpp_lines ../../t/input_files/cpp_lines.texi
+
 # some command-line arguments when incorrect cause texi2any to die.
 # easily tested by calling directly ./texi2any.pl and checking visually:
 # ./texi2any.pl --footnote-style=bâd
@@ -18,5 +22,4 @@ documentlanguage_cmdline documentlanguage.texi --document-language=fr
 non_ascii_command_line osé.texi --html --split=Mekanïk --document-language=Destruktïw -c 'Kommandöh vâl' -D TÛT -D 'vùr ké' -U ôndef -c 'FORMAT_MENU mînù' --macro-expand=@OUT_DIR@osé-texinfo.texi --internal-links=@OUT_DIR@intérnal.txt --css-include çss.css --css-include cêss.css --css-ref=rëf --css-ref=öref
 
 # test for the copying of image with non ascii characters for epub
-# to be added when it does not fail anymore
 #non_ascii_test_epub osé.texi --init epub3.pm -c 'EPUB_CREATE_CONTAINER 0'
diff --git "a/tp/tests/formatting/os\303\251.texi" "b/tp/tests/formatting/os\303\251.texi"
index db36c2c7f1..10141774bc 100644
--- "a/tp/tests/formatting/os\303\251.texi"
+++ "b/tp/tests/formatting/os\303\251.texi"
@@ -21,3 +21,7 @@ value vùr @value{vùr}.
 @image{dîrectory/imàge,,,âlt,.êxt}
 
 @include not_existïng.téxi
+
+@verbatiminclude included_akçentêd.texi
+
+@verbatiminclude vi_not_existïng.téxi
diff --git a/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.1 b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.1
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.2 b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.2
new file mode 100644
index 0000000000..e493afa3e6
--- /dev/null
+++ b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.2
@@ -0,0 +1,3 @@
+g_f:74: @include: could not find file_with_cpp_lines.texi
+accentêd:7: warning: làng is not a valid language code
+cpp_lines.texi: warning: must specify a title with a title command or @top
diff --git a/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.html b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.html
new file mode 100644
index 0000000000..d07733d1ad
--- /dev/null
+++ b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.html
@@ -0,0 +1,68 @@
+<!DOCTYPE html>
+<html>
+<!-- Created by texinfo, http://www.gnu.org/software/texinfo/ -->
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<title>Untitled Document</title>
+
+<meta name="description" content="Untitled Document">
+<meta name="keywords" content="Untitled Document">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="texi2any">
+<meta name="viewport" content="width=device-width,initial-scale=1">
+
+<style type="text/css">
+<!--
+span.program-in-footer {font-size: smaller}
+-->
+</style>
+
+
+</head>
+
+<body lang="en">
+
+
+<p><a class="email" href="mailto:before top">before top</a>.
+</p>
+<a class="node" id="Top"></a>
+<p># 10 25 209
+# 1 2
+</p>
+<pre class="verbatim">
+  #line 5 &quot;f&quot;
+</pre>
+
+<p><a class="email" href="mailto:after lacro def">after lacro def</a>
+</p>
+<p># line 7 &quot;k&quot;
+</p>
+<p><a class="email" href="mailto:after macro call">after macro call</a>.
+</p>
+
+<p><a class="email" href="mailto:after macrotwo def">after macrotwo def</a>
+</p>
+<p>line before
+# line 666 &quot;x&quot;
+</p>
+<p><a class="email" href="mailto:after macrotwo call">after macrotwo call</a>. 
+</p>
+<p><a class="email" href="mailto:after inc">after inc</a>. 
+</p>
+<p><tt class="verb">
+#line 5 &quot;in verb&quot;
+</tt>
+</p>
+<p><a class="email" href="mailto:after verb">after verb</a>
+</p>
+
+
+<hr>
+<p>
+  <span class="program-in-footer">This document was generated on <em class="emph">a sunny day</em> using <a class="uref" href="http://www.gnu.org/software/texinfo/"><em class="emph">texi2any</em></a>.</span>
+</p>
+
+
+</body>
+</html>
diff --git a/tp/tests/formatting/res_parser/non_ascii_command_line/Chapteur.html b/tp/tests/formatting/res_parser/non_ascii_command_line/Chapteur.html
index 98c49e654a..71f800ef1a 100644
--- a/tp/tests/formatting/res_parser/non_ascii_command_line/Chapteur.html
+++ b/tp/tests/formatting/res_parser/non_ascii_command_line/Chapteur.html
@@ -68,6 +68,8 @@ ul.mark-néni {list-style-type: "vàça"}
 
 <img class="image" src="dîrectory/imàge.êxt" alt="âlt">
 
+
+
 </div>
 <hr>
 <p>
diff --git "a/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251-texinfo.texi" "b/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251-texinfo.texi"
index 4ea951c406..e4ad1dc5aa 100644
--- "a/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251-texinfo.texi"
+++ "b/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251-texinfo.texi"
@@ -19,3 +19,7 @@ In included téxt.
 @image{dîrectory/imàge,,,âlt,.êxt}
 
 @include not_existïng.téxi
+
+@verbatiminclude included_akçentêd.texi
+
+@verbatiminclude vi_not_existïng.téxi
diff --git "a/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251.2" "b/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251.2"
index 4dbb7790d5..054aa9681a 100644
--- "a/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251.2"
+++ "b/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251.2"
@@ -3,3 +3,5 @@ texi2any: warning: Destruktïw is not a valid language code
 texi2any: warning: unknown variable from command line: Kommandöh
 osé.texi:23: @include: could not find not_existïng.téxi
 osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using `dîrectory/imàge.êxt'
+osé.texi:25: @verbatiminclude: could not find included_akÃ§entÃªd.texi
+osé.texi:27: @verbatiminclude: could not find vi_not_existÃ¯ng.tÃ©xi
diff --git a/tp/tests/test_scripts/formatting_cpp_lines.sh b/tp/tests/test_scripts/formatting_cpp_lines.sh
new file mode 100755
index 0000000000..c20e239e41
--- /dev/null
+++ b/tp/tests/test_scripts/formatting_cpp_lines.sh
@@ -0,0 +1,19 @@
+#! /bin/sh
+# This file generated by maintain/regenerate_cmd_tests.sh
+
+if test z"$srcdir" = "z"; then
+  srcdir=.
+fi
+
+one_test_logs_dir=test_log
+
+
+dir=formatting
+name='cpp_lines'
+mkdir -p $dir
+
+"$srcdir"/run_parser_all.sh -dir $dir $name
+exit_status=$?
+cat $dir/$one_test_logs_dir/$name.log
+exit $exit_status
+