lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Control flow in page_with_tabular_report::render()


From: Vadim Zeitlin
Subject: Re: [lmi] Control flow in page_with_tabular_report::render()
Date: Tue, 13 Feb 2018 14:48:41 +0100

On Mon, 12 Feb 2018 22:45:15 +0000 Greg Chicares <address@hidden> wrote:

GC> I prefer to use the subset of C that looks like, say, ansi C, fortran,
GC> or algol, for problems that can be easily solved in that subset, because
GC> that produces the most comprehensible solution. I'm already apprehensive
GC> when you speak of providing an iterator, because I don't see that as
GC> necessary.

 An iterator is anything that allows to write

        for(auto const& page: pages)

and I strongly prefer writing this to dealing with indices explicitly.
Whether we need an explicit iterator or can just use std::vector<> is
another question.

GC> Suppose we're printing the third page, and we know it has four groups
GC> of five rows each, with spaces between them: nineteen data rows that
GC> fill twenty-two lines. It took only a few minutes to write this:
GC> 
GC> #include <iostream>
GC> #include <string>
GC> #include <vector>
GC> 
GC> void write_line(std::string const& s) {std::cout << s << std::endl;}
GC> 
GC> void test()
GC> {
GC>   std::vector<std::string> data =
GC>     
{"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T"};
GC> 
GC>   // Information from pagination class:
GC>   int n_groups       =  4; // counted by 'h'
GC>   int rows_per_group =  5; // counted by 'i'
GC>   int n_data_rows    = 19; // counted by 'j'
GC>   int n_output_lines = 22; // counted by 'k'
GC> 
GC>   int j = 0;
GC>   int k = 0;
GC> 
GC>   for(int h = 0; h < n_groups; ++h)
GC>     {
GC>     for(int i = 0; i < rows_per_group; ++i)
GC>       {
GC>       if(j < n_data_rows) {write_line(data[j]); ++j; ++k; }
GC>       }
GC>     if  (j < n_data_rows) {write_line(" ");          ++k; }
GC>     }
GC> 
GC>   std::cout << std::endl;
GC>   std::cout << j << " should equal " << n_data_rows    << std::endl;
GC>   std::cout << k << " should equal " << n_output_lines << std::endl;
GC> }
GC> 
GC> It's just a handful of lines of code, all self-contained.

 Sorry, but it's far from obvious that this code is correct. You were
careful to add 2 extra checks comparing j with n_data_rows, but it would
have been easy to forget or misplace them. You also skipped rendering the
fixed part, which, as I wrote in the previous message, doesn't fit
naturally into this loop. And you also don't track the "y" position at all
here. Addressing the last 2 points will make the code slightly longer and
even less obviously correct.

 Compare this with something like

        class tabular_report : paginator
        {
                int get_fixed_page_part_height() override
                {
                        return render_or_measure_fixed_page_part(...);
                }

                void on_data_row(int y, int n) override
                {
                        table.output_row(y, values[n]);
                }

                void on_blank_row(int y) override
                {
                        // nothing to do here
                }
        };

which is painfully obviously correct by construction. Of course, there is
the base class code itself, but it's quite simple as well and is
unit-tested independently.


GC> No fancy design patterns needed. No iterators. No lambdas.

 This is not completely fair. The "template method" pattern is not fancy at
all and many people use it without ever even realizing they do it. And
lambdas was an alternative to using it, so it's either one or the other
(and, of course, what could be more like ANSI C than using callbacks...).


GC> >  However any such approach would have 2 fundamental (AFAICS) problems:
GC> > first, the code using it would be more complicated than currently because
GC> > we need to render the fixed part of the page in order to compute its 
height
GC> > and feed it to this algorithm so that it could have the number of rows per
GC> > page, yet we also need to do it during iteration. This just can't be not
GC> > cumbersome (a bit like this sentence).
GC> 
GC> int P = get_page_height();
GC> int H = get_header_height();
GC> int F = get_footer_height();
GC> int available_output_rows = P - H - F;
GC> 
GC> Where's the difficulty? I'm assuming that the header and footer sections
GC> are the same (except for the page number) for all pages across which a
GC> data-table is spread, so that H and F above are constant; and that the
GC> functions that write the header and footer can be called with a
GC> "measure_only" argument to calculate their sizes before rendering them.

 This is true, but you need to call them twice now, even when there is a
single page: first before starting the loop and then once it has started.
Or you need to write a counterintuitive do/while loop.

GC> >  Second, this completely separates the handling of the row and its 
vertical
GC> > position, as one of them is (implicitly, now) in the general part, while
GC> > the other one is still in PDF code. I find this problematic because they
GC> > must be in sync, and while it should be simple to notice if this is not 
the
GC> > case, it's still not automatic and I don't like that.
GC> 
GC> The invariant that they remain in sync could be asserted at every step.
GC> Once the code is debugged, how could such assertions ever possible fail?

 Very easily if anything changes. For example, if someone decides that
having the whole row of space between groups is too wasteful and you want
to use smaller gaps. It will be pretty simple for things to get out of sync
when changing this.

 Of course, the "once the code is debugged" argument still applies, but
then why do we need to change anything at all? The current code is already
sufficiently debugged and has asserts that will detect if something is
wrong in it. My answer to this is that I don't trust it to stay so in the
future, even after it's changed and this is why I prefer to rely on "the
invariant is preserved because it is automatically ensured" rather than
"because testing doesn't show any asserts".

GC> This idea intertwines the pieces that I'm endeavoring to separate, namely:
GC>   (1) calculate page-layout parameters like h,i,j,k above
GC>   (2) use the result of (1) to render the data

 No, I can't agree with this. It doesn't intertwine them at all, it
separates them completely by keeping all the parameters completely inside
the base class, together with the actual algorithm, and leaving actual
rendering to the derived class. This is an even stronger separation than
what you're trying to do.

GC> My thesis is that:
GC>   - they can be separated

 Yes, this is true for my proposed approach as well.

GC>   - each of the two separate pieces is small, simple, and cohesive

 Yes, so is this. Except I think your code is not as simple as mine, which
is literally trivial and doesn't involve any nested loops over indices in
the not-easily-unit-tested part.

GC>   - the interface between (1) and (2) is just a struct containing four ints
GC> and the clearest way to write that is in a small C subset of C++.

 We don't care about this struct at all. It's just an implementation
detail. C doesn't care about encapsulation, but it's wrong. In C++ we can
do better and hide this irrelevant detail. This makes things simpler today
and will make them much easier to change in the future if needed. Again,
just consider how would you need to change your structure if the height of
rows is allowed to vary? What if we wanted to avoid having a page with only
a single row on it for aesthetic reasons (and would prefer to move the last
group of the previous page to the last one)? This is trivial to do if your
pagination logic *and* data is encapsulated in some class and not nearly as
simple if you need to update both your parameters and the code using them.


GC> I'd say lambdas, function<>, and "design patterns" are unnecessary
GC> impediments to comprehension here:

 Sorry, but I completely disagree. There is nothing complicated about this
design pattern (I didn't use the word to be fancy, but just to make it
clear what I was talking about). It has been used since 50 years in C under
the name of "callback function pointer", the only difference is that it's
much more ergonomic to use in C++ than in C. Again, lambdas and function<>
(which are exclusive anyhow) are completely irrelevant if we use virtual
functions, they're another way of doing the same thing, so it's either one
or the other, but not both.

GC> whatever can be written well and clearly in the C subset of C++ is
GC> simplest and clearest when written that way, and introducing
GC> unnecessarily fancy techniques can only make it less clear.

 And I disagree even completier with this. "Brief for" loops are a C++
technique which is very fancy (it uses iterators internally!), but is much
more clear and less error-prone than the equivalent C loops using indices.
C++ smart pointers (they use templates!) are very fancy, yet incomparably
safer than simple malloc/free. And C++ strings (they use move semantics!)
are, of course, so much less powerful than C char[], you can have real
trouble writing even a simple buffer overflow with them, which was, and
continues to be, easily doable in standard C since 1970s.

 Sorry, but the rule that anything that can be written in C subset of C++
should be written in the former is just obviously wrong. In fact, I'd say
exactly the converse: C subset of C++ should never be used, unless there is
some good reason to go down to it (e.g. performance or compatibility).

 You may decide that using C-inspired approach in this particular case is
better, for some particular reason (which I don't see yet), but please
let's not elevate this in a general principle because it's clearly
untenable.

 Regards,
VZ


reply via email to

[Prev in Thread] Current Thread [Next in Thread]