Re: [lmi] Reference member = shared

lmi

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Reference member = shared_ptr

From:	Greg Chicares
Subject:	Re: [lmi] Reference member = shared_ptr
Date:	Fri, 5 Feb 2021 17:01:06 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 2/4/21 4:49 PM, Vadim Zeitlin wrote:
> On Tue, 2 Feb 2021 22:15:10 +0000 Greg Chicares <gchicares@sbcglobal.net> 
> wrote:
> 
> GC> On 2/2/21 3:28 PM, Vadim Zeitlin wrote:
> 
> [...discussing cache implementation...]
> 
> GC> That's an intriguing idea, and Vaclav had said:
> GC> 
> GC> https://lists.nongnu.org/archive/html/lmi/2012-06/msg00007.html
> GC> | I ended up using pointers in the end, because DBDictionary is uncopyable
> GC> | -- and [...] cannot be easily made copyable.
> GC> 
> GC> The present design, with shared_ptr, seems to be the most reasonable
> GC> way to obtain the current behavior--if class X HAS-A cached class F,
> GC> and instances X0a, X0b, ...X0z are created with instance F0, but
> GC> then the file underlying F is changed resulting in instance F1,
> GC> then subsequently-created instance X1a, X1b... use F1, but earlier
> GC> instances like X0k keep using F0.
> 
>  It all boils down to whether this is acceptable (or even desirable)
> behaviour or not. It seems to me that it could be rather surprising to keep
> using old data, not present anywhere on the disk any more, and it could
> also be surprising, or worse, to use inconsistent data in X0a and X1a.

Here's the real-world scenario:
 - you have in input file loaded--'.ill' or '.cns', e.g.
 - you have "run" that input, producing an MDI-child calculation summary
 - you open an editor for the input file's product--e.g., '.database'
 - you modify the product and save it
 - you rerun the same input, producing a new calculation summary
Then you can toggle back and forth between the MDI children to
see the effect of your product-database change--comparing X0a and X1a
using the terminology above.

You certainly do want X0a and X1a to use different data F0 and F1,
but to use those different data at different times. X0a is an outdated
artifact--it's just the historical result of running X when F0 was
current. You wouldn't want to regenerate it with old data F0, because
that wouldn't change it, so maybe it is okay if F0 is deleted after
X0a has used it.

> GC> If we make F a single mutable
> GC> object, and replace F0 with F1, then instance X0k would use F1,
> GC> which could have astonishing results. For example, maybe F was
> GC> changed in a way that would prevent X0k from being created.
> 
>  If this really can happen,

Certainly--just change this from "no" to "yes"
  {DB_Nonillustrated
  ,DB_Topic_Miscellanea
  ,"Nonillustrated",
  "Forbid all illustrations: 0=no, 1=yes",}
and now any illustration you had created cannot be recreated.

> then I think the only reasonable solution is to
> recreate all X objects whenever F changes. This is, of course, much more
> involved, but I just don't see how could it make sense for X0k to continue
> to be used when the current F state is incompatible with its existence.

Given that
  F0 might represent 'foo.database'
  X0k might be an instance of class AccountValue
  L0k is an instance of class Ledger: L0k = X0k.ledger_from_av();
when F0 is overwritten, X0k has probably already been destroyed,
while L0k continues to exist; but L HAS-NO F, so everything's okay.

IOW, we instantiate class AccountValue (which, through its base class,
HAS-A product_database), and then "run" it and destroy it, retaining
only the shared_ptr<Ledger> that it produced--which is what the
calculation summary depicts.

So this discussion shows that all the X01k etc. discussion above is kind
of irrelevant, though I had to go through it to discover the irrelevance.

Still, there's an evil scenario that AFAIK no one has been silly enough
to stumble into:
 - start running a large census by the "year-by-year" method, which
   runs all cell's calculations in parallel--with enough cells, that
   can take an arbitrarily long time
 - while that's running, foolishly change the relevant product database
Something ungood must happen. With a cache that invalidates old contents,
it's sure to crash. With the present caching implementation, I suppose
any AccountValue objects already instantiated continue to use the data
they were constructed with, while any new ones would get new data;
that's weird, but it's less bad than crashing.

In that scenario, lmi is effectively running as if in a multithreaded
fashion. The ideal solution might be to use MT techniques such as locks
to prevent changing the database when that's dangerous. Or we could just
continue using shared_ptr.

> GC> There are many tradeoffs, and the design we have isn't clearly
> GC> worse than any other. It may be unwise to spend time changing it.
> 
>  Just to be perfectly clear: I'm not speaking about trade-offs but about
> the fundamental goals here. If the current behaviour is what we need, then
> using shared_ptr<const T> is indeed a pretty standard implementation of it.

We don't exactly *need* it, but it does work for the "real-world scenario"
above, and I suspect it prevents a crash in the "evil scenario".

> There are different and IMHO slightly better approaches and personally I
> prefer to use wrapper objects with value-like semantics using reference
> counting and copy-on-write internally, which is advantageous from both the
> code clarity (in particular, such objects are by definition never null) and
> performance point of view (MT-safety of shared_ptr<> doesn't come for free)
> standpoints, but this is indeed just a trade-off between the quality and
> the simplicity of implementation that is probably not important enough to
> worry about.

Well...code clarity is important, and I prefer objects that can never be
null, so I don't want to dismiss this hastily. Are the "wrapper objects"
that you speak of necessarily written by us, as opposed to provided by a
library? I've often thought it would be desirable to have a never-null
shared_ref<T>, but I just tried it (branch odd/shared_ref) and it doesn't
look useful without the ability to overload operator.() .

>  However it is definitely important to know whether the current behaviour
> is indeed what you want. I just can't convince myself that it is, which is
> why I keep bringing it up. Of course, convincing myself shouldn't be your
> goal, it's enough if you're sure of this yourself, and if you are, then I
> won't mention it again.

It's not indispensable. But we have it now, and it fills a need, though
there are certainly other ways to meet that need.

> GC> please tell me whether you think new branch
> GC> odd/unique_vs_shared should be merged into master.
> 
>  One important comment on this branch is that its premise that "forward
> declarations no longer suffice" [when unique_ptr is used] is wrong.

I think that's what I needed to hear.

> It is
> perfectly possible to declare unique_ptr<> with incomplete classes, you
> just need to take care to avoid _using_ it before including the complete
> class declaration. And "using" here means both using it in your own code
> and in the compiler-generated special functions. So you could have kept the
> forward declarations unchanged and just would need to explicitly define
> BasicValues move ctor and assignment operator and dtor as "= default" in
> basicvalues.cpp file, where the full declarations of everything are
> available.

That sounds like it would be easy enough.

>  Whether this should be done or not is a different question, I'm
> sympathetic to the "these objects should probably be held directly" part,
> so perhaps not.

I want to keep headers small (by using forward declarations), and
I want the objects to be held directly--both at the same time.
These objects are const anyway, so perhaps I can hold them by
const reference...as long as their lifetimes don't need to be
automatically extended past the end of the ctor. But this would
work if the objects are shared_ptr's in a never-purged cache.

>  So, to summarize, I don't think odd/unique_vs_shared branch should be
> merged in its current state as it seems to be in some kind of undecided
> middle position between 2 solutions:
> 
> (a) Use unique_ptr<> with incomplete classes to reduce physical
>     dependencies.
> (b) Use objects directly to reduce run-time overhead and make the code
>     simpler.
> 
> I don't know whether (a) or (b) are better, but both seem better than the
> current change to me.

I want both: (a) + (b). Couldn't I get that with const& members
whose references never become invalid because they refer to
shared_ptr's that are never purged or reset?

> GC>   account_value.hpp:    std::shared_ptr<Ledger         > ledger_;
> GC> 
> GC> Here, class AccountValue does its work and then vanishes (its dtor
> GC> executes), leaving behind only the Ledger it produces, which can be
> GC> used downstream because the shared_ptr endows it with eternal life.
> GC> Isn't that an appropriate use?
> 
>  I'd say definitely not. The ownership of this Ledger is never shared with
> anybody by this class (if it happens, it's done outside of it),

Yes, I think so: see below.

> so it
> should use unique_ptr<> and return it from its ledger_from_av() function.
> Notice that the caller could easily put it into shared_ptr<> if it really
> needed to share its ownership -- but getting rid of shared_ptr<> once
> you've started using it isn't possible any more, so returning unique_ptr<>
> is almost always a better idea than returning a shared_ptr<>.

Interesting: I always wondered about that. Here's what I think happens:

// ledgervalues.hpp
class IllusVal ...
    std::shared_ptr<Ledger const> ledger_;

// ledgervalues.cpp
void IllusVal::run(Input const& input)
{ [...]
    ledger_ = av.ledger_from_av();

...so that's the place to try unique_ptr instead.

>  IOW, eternal life is not always a blessing, it could also be a cancer.
> Well defined lifetime is more desirable. At least from software management
> perspective.

Then we wouldn't need to reboot msw every week or so.

And, at least potentially, liblmi.so could be used by a long-running
server process (e.g., as a server providing calculations to a client
application), in which case it would be impolite to fill all available
memory with legions of unkillable objects.

> GC>   rate_table.hpp:    explicit table(std::shared_ptr<table_impl> const& 
> impl)
> GC> 
> GC> Perhaps this is a case where you'd still use shared_ptr today?
> 
>  No, I believe the only reason I used it there was that it had been done
> before the switch to C++11 in lmi. Today I would have used unique_ptr<>
> here and, more importantly, use rvalue reference for the second parameter
> of the database ctor. I could definitely change this code to avoid
> shared_ptr<> misuse, please let me know if you agree it's worth doing.

I agree that it's worth doing, though of course that's not urgent.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [lmi] Reference member = shared_ptr, Greg Chicares, 2021/02/02
- Re: [lmi] Reference member = shared_ptr, Vadim Zeitlin, 2021/02/02
  - Re: [lmi] Reference member = shared_ptr, Greg Chicares, 2021/02/02
    - Re: [lmi] Reference member = shared_ptr, Vadim Zeitlin, 2021/02/04
    - Re: [lmi] Reference member = shared_ptr, Greg Chicares <=
    - Re: [lmi] Reference member = shared_ptr, Greg Chicares, 2021/02/05
    - Re: [lmi] Reference member = shared_ptr, Vadim Zeitlin, 2021/02/06
    - Re: [lmi] Reference member = shared_ptr, Greg Chicares, 2021/02/06
    - Re: [lmi] Reference member = shared_ptr, Vadim Zeitlin, 2021/02/06

Prev by Date: Re: [lmi] Reference member = shared_ptr
Next by Date: Re: [lmi] Reference member = shared_ptr
Previous by thread: Re: [lmi] Reference member = shared_ptr
Next by thread: Re: [lmi] Reference member = shared_ptr
Index(es):
- Date
- Thread