texinfo-commits

branch master updated: * tp/init/latex2html.pm (l2h_process, l2h_to_html


From: Patrice Dumas
Subject: branch master updated: * tp/init/latex2html.pm (l2h_process, l2h_to_html) (l2h_change_image_file_names, l2h_finish, l2h_from_cache): do not use a volume, consistently with the HTML converter. Do not force encoding variables to UTF-8, but instead encode separately the directory as in the HTML Converter and the files related to latex2html in UTF-8, such that the files output by latex2html only contains UTF-8. Also encode separately the file names in the latex2html command line and the remaining of the command lin [...]
Date: Sun, 06 Mar 2022 17:27:40 -0500

This is an automated email from the git hooks/post-receive script.

pertusus pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new 422e2ed3b3 * tp/init/latex2html.pm (l2h_process, l2h_to_html) 
(l2h_change_image_file_names, l2h_finish, l2h_from_cache): do not use a volume, 
consistently with the HTML converter. Do not force encoding variables to UTF-8, 
but instead encode separately the directory as in the HTML Converter and the 
files related to latex2html in UTF-8, such that the files output by latex2html 
only contains UTF-8.  Also encode separately the file names in the latex2html 
command line and the remaini [...]
422e2ed3b3 is described below

commit 422e2ed3b3678f616555b5edf2f8d78920bc9f9a
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Sun Mar 6 23:27:28 2022 +0100

    * tp/init/latex2html.pm (l2h_process, l2h_to_html)
    (l2h_change_image_file_names, l2h_finish, l2h_from_cache): do
    not use a volume, consistently with the HTML converter.
    Do not force encoding variables to UTF-8, but instead encode
    separately the directory as in the HTML Converter and the files
    related to latex2html in UTF-8, such that the files output by
    latex2html only contains UTF-8.  Also encode separately
    the file names in the latex2html command line and the remaining
    of the command line.  Based on Gavin report.
---
 ChangeLog             |  12 ++++
 tp/TODO               |  20 ++++---
 tp/init/latex2html.pm | 159 ++++++++++++++++++++++++++------------------------
 3 files changed, 107 insertions(+), 84 deletions(-)
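
The split described in the commit message (directory encoded as in the HTML converter, latex2html file names always encoded in UTF-8) can be sketched roughly as follows. This is only an illustration: `l2h_paths` and the sample names are hypothetical, and the directory encoding is passed in explicitly here, whereas the commit obtains the encoded directory from `$self->encoded_output_file_name($dir)`.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Spec;

# Hypothetical helper, not part of latex2html.pm.
# Returns the character path (for messages) and the byte path
# (for open, rename, -e and the latex2html command line).
sub l2h_paths {
  my ($dir, $file_name, $dir_encoding) = @_;
  # Assumption: the caller knows the encoding used for output directories.
  my $dir_string = encode($dir_encoding, $dir);
  # File names handed to latex2html are encoded in UTF-8 in all cases.
  my $file_string = encode('UTF-8', $file_name);
  my $path_name = File::Spec->catfile($dir, $file_name);
  my $path_string = File::Spec->catfile($dir_string, $file_string);
  return ($path_name, $path_string);
}

# Sample call with a Latin-1 directory and a non-ASCII file name.
my ($path_name, $path_string)
  = l2h_paths("r\x{e9}sultat", "encod\x{e9}_l2h.tex", 'ISO-8859-1');
```

Keeping both forms around mirrors the `*_name` / `*_string` variable pairs introduced in the diff: the decoded form is used in error messages, the byte form for file system access.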

diff --git a/ChangeLog b/ChangeLog
index 6dae578d24..81c3dbc1ca 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2022-03-06  Patrice Dumas  <pertusus@free.fr>
+
+       * tp/init/latex2html.pm (l2h_process, l2h_to_html)
+       (l2h_change_image_file_names, l2h_finish, l2h_from_cache): do
+       not use a volume, consistently with the HTML converter.
+       Do not force encoding variables to UTF-8, but instead encode
+       separately the directory as in the HTML Converter and the files
+       related to latex2html in UTF-8, such that the files output by
+       latex2html only contains UTF-8.  Also encode separately
+       the file names in the latex2html command line and the remaining
+       of the command line.  Based on Gavin report.
+
 2022-03-06  Gavin Smith  <gavinsmith0123@gmail.com>
 
        * tp/TODO: Move winning latex2html test cases.
diff --git a/tp/TODO b/tp/TODO
index 5ab6cb770f..c20badebab 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -416,18 +416,24 @@ résultat/encodé.txt file name encoded in latin1.
 char_latin1_latin1_in_refs_tree.txt content encoded in latin1
 
 utf8 encoded manual name and latex2html in latin1 locale
-./texi2any.pl --html --init init/latex2html.pm tests/tex_html/tex_encod*_utf8.texi
-
-
-Tests with incorrect results, though not bugs
-*********************************************
-
+./texi2any.pl --verbose -c 'DATA_INPUT_ENCODING_NAME=utf-8' --html --init init/latex2html.pm -c 'L2H_CLEAN 0' tests/tex_html/tex_encod*_utf8.texi
+DATA_INPUT_ENCODING_NAME=utf-8 is required in order to have the
+input file name correctly decoded as the document_name
 
 latin1 encoded manual name and latex2html in latin1 locale
 cp tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi
 ./texi2any.pl -c 'L2H_CLEAN 0' --html --init init/latex2html.pm tex_encodé_latin1.texi
 
- - code assumes UTF-8 filenames are being used so the file is not found
+
+Tests with incorrect results, though not bugs
+*********************************************
+
+utf8 encoded manual name and latex2html in latin1 locale
+./texi2any.pl --html --init init/latex2html.pm -c 'L2H_CLEAN 0' tests/tex_html/tex_encod*_utf8.texi
+No error, but the file names are like
+ tex_encodé_utf8_html/tex_encodÃ'$'\203''©_utf8_l2h.html
+That's in particular because the document_name is incorrect because it is decoded
+as if it was latin1.
 
 utf8 encoded manual name and tex4ht in latin1 locale
 ./texi2any.pl --html --init init/tex4ht.pm tests/tex_html/tex_encod*_utf8.texi
diff --git a/tp/init/latex2html.pm b/tp/init/latex2html.pm
index 50a104aba4..09c1a7253f 100644
--- a/tp/init/latex2html.pm
+++ b/tp/init/latex2html.pm
@@ -66,10 +66,6 @@ texinfo_set_from_init_file('L2H_FILE', undef);
 # latex2html are cleaned (they all have the prefix <document name>_l2h_).
 texinfo_set_from_init_file('L2H_CLEAN', 1);
 
-texinfo_set_from_init_file('DATA_INPUT_ENCODING_NAME', 'UTF-8');
-texinfo_set_from_init_file('LOCALE_OUTPUT_FILE_NAME_ENCODING', 'UTF-8');
-texinfo_set_from_init_file('LOCALE_OUTPUT_ENCODING_NAME', 'UTF-8');
-
 
 # latex2html conversions consist of 2 stages:
 # 1) l2h_process
@@ -84,8 +80,8 @@ texinfo_set_from_init_file('LOCALE_OUTPUT_ENCODING_NAME', 'UTF-8');
 
 # init l2h defaults for files and names
 
-my ($l2h_name, $l2h_latex_path_name, $l2h_cache_path_name, $l2h_html_path_name,
-    $l2h_prefix);
+my ($l2h_name, $l2h_latex_path_name, $l2h_latex_path_string, $l2h_cache_path_name,
+    $l2h_html_path_name, $l2h_html_path_string, $l2h_prefix, $l2h_prefix_string);
 
 # holds the status of latex2html operations. If 0 it means that there was
 # an error
@@ -93,9 +89,8 @@ my $status = 0;
 
 my $debug;
 my $verbose;
-my $docu_rdir;
-my $docu_volume;
-my $docu_directories;
+my $destination_directory;
+my $destination_directory_string;
 my $docu_name;
 
 my %commands_counters;
@@ -180,30 +175,43 @@ sub l2h_process($$)
   return if (defined($self->get_conf('OUTFILE'))
         and $Texinfo::Common::null_device_file{$self->get_conf('OUTFILE')});
 
-
   $docu_name = $self->get_info('document_name');
-  $docu_rdir = $self->get_info('destination_directory');
-  $docu_rdir = '' if (!defined($docu_rdir));
-  my $no_file;
-  ($docu_volume, $docu_directories, $no_file)
-      = File::Spec->splitpath($docu_rdir, 1);
-  $l2h_name =  "${docu_name}_l2h";
-  $l2h_latex_path_name = File::Spec->catpath($docu_volume, $docu_directories,
-                                             "${l2h_name}.tex");
-  $l2h_cache_path_name = File::Spec->catpath($docu_volume, $docu_directories,
-                                        "${docu_name}-l2h_cache.pm");
   # destination dir -- generated images are put there, should be the same
   # as dir of enclosing html document --
-  $l2h_html_path_name = File::Spec->catpath($docu_volume, $docu_directories,
-                                       "${l2h_name}.html");
+  $destination_directory = $self->get_info('destination_directory');
+  $destination_directory = '' if (!defined($destination_directory));
+  my $dir = $destination_directory;
+  $dir = File::Spec->curdir() if ($dir eq '');
+  my $dir_encoding;
+  ($destination_directory_string, $dir_encoding)
+    = $self->encoded_output_file_name($dir);
+
+  $l2h_name = "${docu_name}_l2h";
+  my $l2h_latex_file_name =  "${l2h_name}.tex";
+  $l2h_latex_path_name = File::Spec->catfile($destination_directory,
+                                             "${l2h_name}.tex");
+  # we use utf-8 encoding irrespective of what is used in texi2any
+  # because latex2html use the file name in the resulting file and
+  # it needs to be utf-8
+  my $encoded_l2h_latex_file_name = encode('UTF-8', $l2h_latex_file_name);
+  $l2h_latex_path_string = File::Spec->catfile($destination_directory_string,
+                                               $encoded_l2h_latex_file_name);
+  $l2h_cache_path_name = File::Spec->catfile($destination_directory,
+                                        "${docu_name}-l2h_cache.pm");
+  my $l2h_html_file_name = "${l2h_name}.html";
+  $l2h_html_path_name = File::Spec->catfile($destination_directory,
+                                            "${l2h_name}.html");
+  my $encoded_l2h_html_file_name = encode('UTF-8', $l2h_html_file_name);
+  $l2h_html_path_string = File::Spec->catfile($destination_directory_string,
+                                              $encoded_l2h_html_file_name);
+
   $l2h_prefix = "${l2h_name}_";
+  $l2h_prefix_string = encode('UTF-8', $l2h_prefix);
   $debug = $self->get_conf('DEBUG');
   $verbose = $self->get_conf('VERBOSE');
 
   unless ($self->get_conf('L2H_SKIP')) {
-    my ($encoded_l2h_latex_path_name, $l2h_latex_path_encoding)
-      = $self->encoded_output_file_name($l2h_latex_path_name);
-    unless (open(L2H_LATEX, ">$encoded_l2h_latex_path_name")) {
+    unless (open(L2H_LATEX, ">$l2h_latex_path_string")) {
       $self->document_error($self, sprintf(__(
               "l2h: could not open latex file %s for writing: %s"),
                                     $l2h_latex_path_name, $!));
@@ -213,7 +221,7 @@ sub l2h_process($$)
     # according to the .log file latex2html is expecting utf-8 if no information
     # is provided
     binmode(L2H_LATEX, ':utf8');
-    warn "# l2h: use $encoded_l2h_latex_path_name as latex file\n" if ($verbose);
+    warn "# l2h: use $l2h_latex_path_string as latex file\n" if ($verbose);
     print L2H_LATEX $l2h_latex_preamble;
   }
   # open the database that holds cached text
@@ -333,7 +341,7 @@ sub l2h_finish_to_latex($)
 # Use latex2html to generate corresponding html code and images
 #
 # to_html():
-#   Call latex2html on $l2h_latex_path_name
+#   Call latex2html on $l2h_latex_path_string
 #   Put images (prefixed with $l2h_name."_") and html file(s) in $l2h_html_dir
 #   Return 1, on success
 #          0, otherwise
@@ -341,7 +349,7 @@ sub l2h_finish_to_latex($)
 sub l2h_to_html($)
 {
   my $self = shift;
-  my ($call, $dotbug);
+  my $dotbug;
   # when there are no tex constructs to convert (happens in case everything
   # comes from the cache), there is no latex2html run
   if ($self->get_conf('L2H_SKIP') or ($latex_converted_count == 0)) {
@@ -369,38 +377,50 @@ sub l2h_to_html($)
     $self->document_error($self, __("l2h: command not set"));
     return 0;
   }
-  $call = $latex2html_command;
+  my $call_start = $latex2html_command;
   # use init file, if specified
   my $init_file = $self->get_conf('L2H_FILE');
   # FIXME not clear whether encoded_input_file_name or encoded_output_file_name
   # should be used here
   if (defined($init_file) and $init_file ne '') {
+    # FIXME likely incorrect, should use the same encoding as
+    # the encoding used to encode call
     my ($encoded_init_file, $init_path_encoding)
       = $self->encoded_input_file_name($init_file);
-    $call .= " -init_file " . $init_file
+    $call_start .= " -init_file " . $init_file
       if -f $encoded_init_file and -r $encoded_init_file;
   }
   # set output dir
-  $call .=  (($docu_rdir ne '') ? " -dir $docu_rdir" : " -no_subdir");
+  my $encoded_destination_dir_option = ' -no_subdir';
+  my $destination_dir_option = $encoded_destination_dir_option;
+  if ($destination_directory ne '') {
+    $encoded_destination_dir_option = " -dir ".$destination_directory_string;
+    $destination_dir_option = " -dir ".$destination_directory;
+  }
   # use l2h_tmp, if specified
-  $call .= " -tmp ".$self->get_conf('L2H_TMP')
+  $call_start .= " -tmp ".$self->get_conf('L2H_TMP')
     if (defined($self->get_conf('L2H_TMP'))
         and $self->get_conf('L2H_TMP') ne '');
   # use a given html version if specified
-  $call .= " -html_version ".$self->get_conf('L2H_HTML_VERSION')
+  $call_start .= " -html_version ".$self->get_conf('L2H_HTML_VERSION')
     if (defined($self->get_conf('L2H_HTML_VERSION'))
         and $self->get_conf('L2H_HTML_VERSION') ne '');
   # options we want to be sure of
-  $call .= " -address 0 -info 0 -split 0 -no_navigation -no_auto_link";
-  $call .= " -prefix $l2h_prefix $l2h_latex_path_name";
+  $call_start .= " -address 0 -info 0 -split 0 -no_navigation -no_auto_link";
 
+  # FIXME use utf-8 here?
   my $encoding = $self->get_conf('LOCALE_OUTPUT_ENCODING_NAME');
-  my $encoded_call;
+  my $encoded_call_start;
   if (defined($encoding)) {
-    $encoded_call = encode($encoding, $call);
+    $encoded_call_start = encode($encoding, $call_start);
   } else {
-    $encoded_call = $call;
+    $encoded_call_start = $call_start;
   }
+  # already encoded
+  my $encoded_call = $encoded_call_start . $encoded_destination_dir_option
+       ." -prefix $l2h_prefix_string $l2h_latex_path_string";
+  my $call = $call_start . $destination_dir_option
+       ." -prefix $l2h_prefix $l2h_latex_path_name";
   warn "# l2h: executing '$encoded_call'\n" if ($verbose);
   if (system($encoded_call)) {
     $self->document_error($self,
@@ -416,7 +436,7 @@ sub l2h_to_html($)
 ##########################
 # Third stage: Extract generated contents from latex2html run
 # Initialize with: init_from_html
-#   open $l2h_html_path_name for reading
+#   open $l2h_html_path_string for reading
 #   reads in contents into array indexed by numbers
 #   return 1,  on success -- 0, otherwise
 # Finish with: finish
@@ -456,34 +476,32 @@ sub l2h_change_image_file_names($$)
       }
       while (1) {
         my $image_file_name = "${docu_name}_${image_count}$ext";
-        my $image_file_path_name = File::Spec->catpath($docu_volume,
-                                  $docu_directories, $image_file_name);
-        my ($encoded_image_file_path_name, $image_path_encoding)
-          = $self->encoded_output_file_name($image_file_path_name);
-        unless (-e $encoded_image_file_path_name) {
+        my $encoded_image_file_name = encode('UTF-8', $image_file_name);
+        my $image_file_path = File::Spec->catfile($destination_directory_string,
+                                                  $encoded_image_file_name);
+        unless (-e $image_file_path) {
           last;
         }
         $image_count++;
       }
+      my $src_file = File::Spec->catfile($destination_directory, $src);
       my $encoded_src = Encode::encode('UTF-8', $src);
-      my ($encoded_dir, $encoded_dir_encoding)
-        = $self->encoded_output_file_name($docu_directories);
       my $encoded_file_src
-        = File::Spec->catpath($docu_volume, $encoded_dir, $encoded_src);
+        = File::Spec->catfile($destination_directory_string, $encoded_src);
 
       $dest = "${docu_name}_${image_count}$ext";
       my $file_dest
-        = File::Spec->catpath($docu_volume, $docu_directories, $dest);
-      my ($encoded_file_dest, $dest_file_encoding)
-        = $self->encoded_output_file_name($file_dest);
-
+        = File::Spec->catfile($destination_directory, $dest);
+      my $encoded_dest = Encode::encode('UTF-8', $dest);
+      my $encoded_file_dest = File::Spec->catfile($destination_directory_string,
+                                                  $encoded_dest);
       if ($debug) {
         copy($encoded_file_src, $encoded_file_dest);
       } else {
         if (!rename($encoded_file_src, $encoded_file_dest)) {
           $self->document_warn($self,
                  sprintf(__("l2h: rename %s as %s failed: %s"),
-                                 $encoded_file_src, $encoded_file_dest, $!));
+                                 $src_file, $file_dest, $!));
         }
       }
       $l2h_img{$src} = $dest;
@@ -503,9 +521,7 @@ sub l2h_init_from_html($)
     return 1;
   }
 
-  my ($encoded_l2h_html_path_name, $l2h_html_path_encoding)
-    = $self->encoded_output_file_name($l2h_html_path_name);
-  if (! open(L2H_HTML, "<$encoded_l2h_html_path_name")) {
+  if (! open(L2H_HTML, "<$l2h_html_path_string")) {
     $self->document_warn($self,
                 sprintf(__("l2h: could not open %s: %s"),
                                  $l2h_html_path_name, $!));
@@ -513,7 +529,7 @@ sub l2h_init_from_html($)
   }
   # the file is UTF-8
   binmode(L2H_HTML, ':utf8');
-  warn "# l2h: use $encoded_l2h_html_path_name as html file\n" if ($verbose);
+  warn "# l2h: use $l2h_html_path_string as html file\n" if ($verbose);
 
   my $html_converted_count = 0;   # number of html resulting texts
                                   # retrieved in the file
@@ -646,25 +662,13 @@ sub l2h_finish($)
     warn "# l2h: removing temporary files generated by l2h extension\n"
      if ($verbose);
     my $quoted_l2h_name = quotemeta($l2h_name);
-    my $dir = $docu_rdir;
-    $dir = File::Spec->curdir() if ($dir eq '');
-    my ($encoded_dir, $dir_encoding) = $self->encoded_output_file_name($dir);
-    my ($encoded_docu_directories, $docu_directories_encoding)
-      = $self->encoded_output_file_name($docu_directories);
-    my ($encoded_docu_volume, $docu_volume_encoding)
-      = $self->encoded_output_file_name($docu_volume);
-    if (opendir (DIR, $encoded_dir)) {
+    if (opendir (DIR, $destination_directory_string)) {
       foreach my $file (readdir(DIR)) {
-        # FIXME there is a mix of files created by texi2any and files
-        # created by latex2html.  The encoding of files created by
-        # texi2any and by latex2html could be different.  We could imagine
-        # that for latex2html it would be DATA_INPUT_ENCODING_NAME, without
-        # certainty.  We use the encoding used to encode our files
-        my $file_name = decode($dir_encoding, $file);
+        # we should have made sure that all the files are encoded in utf-8
+        my $file_name = decode('UTF-8', $file);
         if ($file_name =~ /^$quoted_l2h_name/) {
           # FIXME error condition not checked
-          unlink File::Spec->catpath($encoded_docu_volume,
-                                     $encoded_docu_directories, $file);
+          unlink File::Spec->catfile($destination_directory_string, $file);
         }
       }
     }
@@ -741,11 +745,12 @@ sub l2h_from_cache($$)
   if (defined($cached)) {
     while ($cached =~ m/SRC="(.*?)"/g) {
       my $cached_image_file_name = $1;
-      my $cached_image_path_name = File::Spec->catpath($docu_volume,
-                                 $docu_directories, $cached_image_file_name);
-      my ($encoded_cached_image_path_name, $cached_image_path_encoding)
-        = $self->encoded_output_file_name($cached_image_path_name);
-      unless (-e $encoded_cached_image_path_name) {
+      my $encoded_cached_image_file_name
+            = encode('UTF-8', $cached_image_file_name);
+      my $cached_image_path_string
+         = File::Spec->catfile($destination_directory_string,
+                               $encoded_cached_image_file_name);
+      unless (-e $cached_image_path_string) {
         return undef;
       }
     }


