[lmi-commits] [lmi] master b2ec7ac 6/7: Avoid parsing the same HTML more

lmi-commits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lmi-commits] [lmi] master b2ec7ac 6/7: Avoid parsing the same HTML more

From:	Greg Chicares
Subject:	[lmi-commits] [lmi] master b2ec7ac 6/7: Avoid parsing the same HTML more than once in standard pages
Date:	Tue, 18 Sep 2018 13:29:30 -0400 (EDT)

branch: master
commit b2ec7acde109641d06fd0c7ee774a32bb9219fc4
Author: Vadim Zeitlin <address@hidden>
Commit: Gregory W. Chicares <address@hidden>

    Avoid parsing the same HTML more than once in standard pages
    
    Add pdf_writer_wx::parse_html() which can be used to parse HTML into the
    internal representation and change paginate_html() and output_html()
    used for multi-page output to take this internal representation instead
    of the text itself.
    
    This avoids parsing the same HTML twice, which is relatively
    time-consuming: for the total illustration generation time of ~1000ms,
    parsing HTML took ~200ms before this change and ~120ms after it,
    resulting in 8% speed up.
    
    Moreover, this will make it possible to modify the HTML text before
    rendering it in the upcoming commit, which would have been difficult and
    awkward to do if we had to continue passing HTML text to output_html().
---
 ledger_pdf_generator_wx.cpp | 50 ++++++++++++++++++++++++++++++++++-----------
 pdf_writer_wx.cpp           | 50 ++++++++++++++++++++++++++++++---------------
 pdf_writer_wx.hpp           | 16 +++++++++++++--
 3 files changed, 86 insertions(+), 30 deletions(-)

diff --git a/ledger_pdf_generator_wx.cpp b/ledger_pdf_generator_wx.cpp
index 22203e2..53ee2aa 100644
--- a/ledger_pdf_generator_wx.cpp
+++ b/ledger_pdf_generator_wx.cpp
@@ -1229,14 +1229,23 @@ class standard_page : public numbered_page
     {
     }
 
-    void render
+    void pre_render
         (Ledger            const& ledger
         ,pdf_writer_wx          & writer
         ) override
     {
-        // Page HTML must have been already set by get_extra_pages_needed().
-        LMI_ASSERT(!page_html_.empty());
+        // Before calling the base class version, parse the HTML to initialize
+        // page_body_cell_.
+        parse_page_html(writer);
 
+        numbered_page::pre_render(ledger, writer);
+    }
+
+    void render
+        (Ledger const& ledger
+        ,pdf_writer_wx& writer
+        ) override
+    {
         int last_page_break = 0;
         for(auto const& page_break : page_break_positions_)
             {
@@ -1251,7 +1260,7 @@ class standard_page : public numbered_page
                 (writer.get_horz_margin()
                 ,writer.get_vert_margin()
                 ,writer.get_page_width()
-                ,page_html_
+                ,*page_body_cell_
                 ,last_page_break
                 ,page_break
                 );
@@ -1261,19 +1270,36 @@ class standard_page : public numbered_page
     }
 
   private:
+    // Parse HTML page contents once and store the result in page_body_cell_
+    // member variable.
+    //
+    // Throws if parsing fails.
+    void parse_page_html(pdf_writer_wx& writer)
+    {
+        // We should be called once and only once.
+        LMI_ASSERT(!page_body_cell_);
+
+        page_body_cell_ = writer.parse_html
+                    (interpolate_html_.expand_template(page_template_name_)
+                    );
+
+        if(!page_body_cell_)
+            {
+            throw std::runtime_error
+                ("failed to parse template '" + 
std::string{page_template_name_} + "'"
+                );
+            }
+    }
+
     int get_extra_pages_needed
         (Ledger            const& // ledger
         ,pdf_writer_wx          & writer
         ) override
     {
-        page_html_ = wxString::FromUTF8
-            (interpolate_html_.expand_template(page_template_name_).as_html()
-            );
-
         page_break_positions_ = writer.paginate_html
             (writer.get_page_width()
             ,get_footer_top() - writer.get_vert_margin()
-            ,page_html_
+            ,*page_body_cell_
             );
 
         // The cast is safe, we're never going to have more than INT_MAX
@@ -1282,9 +1308,9 @@ class standard_page : public numbered_page
         return static_cast<int>(page_break_positions_.size()) - 1;
     }
 
-    char const* const page_template_name_;
-    wxString          page_html_;
-    std::vector<int>  page_break_positions_;
+    char const* const                    page_template_name_;
+    std::unique_ptr<wxHtmlContainerCell> page_body_cell_;
+    std::vector<int>                     page_break_positions_;
 };
 
 // Helper classes used to show the numeric summary table. The approach used
diff --git a/pdf_writer_wx.cpp b/pdf_writer_wx.cpp
index 3589da7..33957c1 100644
--- a/pdf_writer_wx.cpp
+++ b/pdf_writer_wx.cpp
@@ -197,7 +197,7 @@ void pdf_writer_wx::output_image
 std::vector<int> pdf_writer_wx::paginate_html
     (int                          page_width
     ,int                          page_height
-    ,wxString const&              html_str
+    ,wxHtmlContainerCell&         cell
     )
 {
     wxHtmlDCRenderer renderer;
@@ -205,11 +205,7 @@ std::vector<int> pdf_writer_wx::paginate_html
     renderer.SetSize(page_width, page_height);
     DoSetFonts(renderer, html_font_sizes_);
 
-    // Note that we parse HTML twice: here and then again in output_html(),
-    // which is inefficient. Unfortunately currently there is no way to share
-    // neither the parser, nor the result of calling Parse() between
-    // wxHtmlDCRenderer and output_html().
-    renderer.SetHtmlText(html_str);
+    renderer.SetHtmlCell(cell);
 
     std::vector<int> page_breaks;
     for(int pos = 0;;)
@@ -237,7 +233,7 @@ int pdf_writer_wx::output_html
     (int                          x
     ,int                          y
     ,int                          width
-    ,wxString const&              html_str
+    ,wxHtmlContainerCell&         cell
     ,int                          from
     ,int                          to
     ,oenum_render_or_only_measure output_mode
@@ -249,12 +245,7 @@ int pdf_writer_wx::output_html
     // font which is changed by rendering the HTML contents.
     wxDCFontChanger preserve_font(pdf_dc_, wxFont());
 
-    std::unique_ptr<wxHtmlContainerCell> const cell
-        (static_cast<wxHtmlContainerCell*>(html_parser_.Parse(html_str))
-        );
-    LMI_ASSERT(cell);
-
-    cell->Layout(width);
+    cell.Layout(width);
     switch(output_mode)
         {
         case oe_render:
@@ -269,7 +260,7 @@ int pdf_writer_wx::output_html
 
             // "Scroll" the cell upwards by "from" by subtracting it from the
             // vertical position.
-            cell->Draw(pdf_dc_, x, y - from, y, y + to - from, rendering_info);
+            cell.Draw(pdf_dc_, x, y - from, y, y + to - from, rendering_info);
             }
             break;
         case oe_only_measure:
@@ -277,7 +268,7 @@ int pdf_writer_wx::output_html
             break;
         }
 
-    return cell->GetHeight();
+    return cell.GetHeight();
 }
 
 /// Convenient overload when rendering, or measuring, HTML text known to fit on
@@ -294,11 +285,25 @@ int pdf_writer_wx::output_html
     ,oenum_render_or_only_measure output_mode
     )
 {
+    auto const cell{parse_html(std::move(html))};
+    LMI_ASSERT(cell);
+
+    return output_html(x, y, width, *cell, output_mode);
+}
+
+int pdf_writer_wx::output_html
+    (int                          x
+    ,int                          y
+    ,int                          width
+    ,wxHtmlContainerCell&         cell
+    ,oenum_render_or_only_measure output_mode
+    )
+{
     int const height = output_html
         (x
         ,y
         ,width
-        ,wxString::FromUTF8(std::move(html).as_html())
+        ,cell
         ,0
         ,get_total_height()
         ,output_mode
@@ -322,6 +327,19 @@ int pdf_writer_wx::output_html
     return height;
 }
 
+std::unique_ptr<wxHtmlContainerCell> pdf_writer_wx::parse_html(html::text&& 
html)
+{
+    return std::unique_ptr<wxHtmlContainerCell>
+        (static_cast<wxHtmlContainerCell*>
+            (html_parser_.Parse
+                (wxString::FromUTF8
+                    (std::move(html).as_html()
+                    )
+                )
+            )
+        );
+}
+
 int pdf_writer_wx::get_horz_margin() const
 {
     return horz_margin;
diff --git a/pdf_writer_wx.hpp b/pdf_writer_wx.hpp
index 613d6b2..3ad97b8 100644
--- a/pdf_writer_wx.hpp
+++ b/pdf_writer_wx.hpp
@@ -64,14 +64,14 @@ class pdf_writer_wx
     std::vector<int> paginate_html
         (int                          page_width
         ,int                          page_height
-        ,wxString const&              html_str
+        ,wxHtmlContainerCell&         cell
         );
 
     int output_html
         (int                          x
         ,int                          y
         ,int                          width
-        ,wxString const&              html_str
+        ,wxHtmlContainerCell&         cell
         ,int                          from
         ,int                          to
         ,oenum_render_or_only_measure output_mode = oe_render
@@ -81,6 +81,14 @@ class pdf_writer_wx
         (int                          x
         ,int                          y
         ,int                          width
+        ,wxHtmlContainerCell&         cell
+        ,oenum_render_or_only_measure output_mode = oe_render
+        );
+
+    int output_html
+        (int                          x
+        ,int                          y
+        ,int                          width
         ,html::text&&                 html
         ,oenum_render_or_only_measure output_mode = oe_render
         );
@@ -98,6 +106,10 @@ class pdf_writer_wx
 
     wxDC& dc();
 
+    // Helper methods for working with HTML contents.
+
+    std::unique_ptr<wxHtmlContainerCell> parse_html(html::text&& html);
+
     // Page metrics: the page width and height are the size of the page region
     // reserved for the normal contents, excluding horizontal and vertical
     // margins. Total width and height include the margins.

[Prev in Thread]

Current Thread

[Next in Thread]

[lmi-commits] [lmi] master updated (be7b9ff -> 1c398a8), Greg Chicares, 2018/09/18
- [lmi-commits] [lmi] master f4cf784 3/7: Make standard_supplemental_report ctor helper static, Greg Chicares, 2018/09/18
- [lmi-commits] [lmi] master e3d4471 1/7: Upgrade wx, Greg Chicares, 2018/09/18
- [lmi-commits] [lmi] master d143cbe 5/7: Use C++17 if-with-initializer to properly localize a variable, Greg Chicares, 2018/09/18
- [lmi-commits] [lmi] master 71dc467 4/7: Pass the illustration and interpolator to the page ctor, Greg Chicares, 2018/09/18
- [lmi-commits] [lmi] master a3ee7ce 2/7: Instal wxPdfDocument using Git instead of snapshot download, Greg Chicares, 2018/09/18
- [lmi-commits] [lmi] master b2ec7ac 6/7: Avoid parsing the same HTML more than once in standard pages, Greg Chicares <=
- [lmi-commits] [lmi] master 1c398a8 7/7: Implement support for headers for normal PDF illustration pages, Greg Chicares, 2018/09/18

Prev by Date: [lmi-commits] [lmi] master a3ee7ce 2/7: Instal wxPdfDocument using Git instead of snapshot download
Next by Date: [lmi-commits] [lmi] master 1c398a8 7/7: Implement support for headers for normal PDF illustration pages
Previous by thread: [lmi-commits] [lmi] master a3ee7ce 2/7: Instal wxPdfDocument using Git instead of snapshot download
Next by thread: [lmi-commits] [lmi] master 1c398a8 7/7: Implement support for headers for normal PDF illustration pages
Index(es):
- Date
- Thread