lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] enums naming and scope (was: [lmi-commits] master dd5ca93 2/3:


From: Vadim Zeitlin
Subject: Re: [lmi] enums naming and scope (was: [lmi-commits] master dd5ca93 2/3: Avoid magic values)
Date: Sun, 20 May 2018 20:05:15 +0200

On Sun, 20 May 2018 16:36:31 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2018-05-19 19:59, Vadim Zeitlin wrote:
GC> > On Sat, 19 May 2018 17:46:07 +0000 Greg Chicares <address@hidden> wrote:
[...]
GC> > GC> One other thing troubles me about this: the extensibility argument for
GC> > GC> enums rather than bools.
[...]
GC> >  The recommended way is to "switch" over enum value instead of checking it
GC> > with "if". If the "default" case is not used in the switch, adding a new
GC> > enum element would give warnings for all places where the code handling it
GC> > would need to be added.
GC> 
GC> Yet they're cumbersome and verbose. Consider:
GC> 
GC>   private:
GC>     bool const is_hidden_;
GC>   public:
GC>     bool is_hidden() const {return is_hidden_;}
GC> 
GC> If we replace that bool member variable with an enumeration
GC>   enum oenum_visibility {oe_shown, oe_hidden};
GC> thus:
GC>     oenum_visibility const visibility_;
GC> then do we really want to replace the tiny one-line accessor
GC> above with the following (just because a third enumerator
GC> might someday be added, however remote the possibility)?
GC> 
GC>     bool is_hidden() const
GC>       {
GC>         switch(visibility_)
GC>           {
GC>           case oe_shown:  return false;
GC>           case oe_hidden: return true;
GC>           }
GC>       }

 No, we definitely don't want to replace the accessor body with this, it
wouldn't result in much, if any, gain. But we might want to replace the
is_hidden() accessor with get_visibility() one returning oenum_visibility
as this would force all the users of this class to do a switch over its
return value and behave accordingly -- which could be useful if we really
planned to introduce some other kind of visibility in the future, e.g.
oe_shown_in_landscape_only or something like this.

GC> That seems like a high price to pay in this case, where we
GC> don't seriously expect there ever to be a third enumerator,

 Yes, I agree. But remember that I wrote the above in the context of your
question about the extensibility. If we're absolutely certain that no new
enum values are ever going to be added, there is no problem to solve, so we
shouldn't apply the solution (i.e. start using "switch").

 FWIW my rule of thumb is to not care about extensibility for enums having
2 bool-like values and other special cases, such as 3-valued bool (which,
by definition, is never going to have more than 3 values...), or enums
containing days of the week or the names of the months. Although, I guess,
one can never be too careful: it might have seemed stupid to worry about
extensibility of the enum containing the planet names, but it turned out to
be less static than expected.

 But for all the others I do use "switch" just because I've seen too much
code that wasn't updated to take into account a new enum value and this
seems like such an easily avoidable error that it's a pity not to pay a
small premium for insurance against it.

GC> and our only reason for using an enumeration is to replace
GC> 
GC>   column_metadata("Header", "999.99", false);
GC> 
GC> with
GC> 
GC>   column_metadata("Header", "999.99", oe_hidden);
GC> 
GC> when we don't actually write it that way--what we're actually
GC> replacing is
GC> 
GC>   bool should_be_hidden;
GC>   column_metadata(header, widest_text, should_be_hidden);
GC> 
GC> with
GC> 
GC>   oe_visibility should_be_hidden; // different type
GC>   column_metadata(header, widest_text, should_be_hidden); // identical!
GC> 
GC> and I wonder whether this trip has been worth the price.

 I believe it was because of the first example above which has become much
more readable. It's true that the second one hasn't really changed.


GC> [...prefixes: 'enum_' for enumerations, 'e_' for enumerators...]
GC> 
GC> > GC> I like to distinguish these lexically:
GC> > 
GC> >  I'd be curious to know why. I.e. you don't use "class_" prefix for all
GC> > classes, nor "type_" prefix (or "_t" suffix) for all typedefs. Why should
GC> > enums be singled out for special treatment?
GC> 
GC> To allow different names for types and instance variables, e.g.,
GC> in this hypothetical example:
GC>   void resize(size_t size);
GC> or this actual one:
GC>   AccountValue::IsModalPmtDate(mcenum_mode mode) const;

 I see.

GC> What are the options?
GC> 
GC> 0. foo(mcenum_mode mode) above: prefixed type name
GC> 
GC> 1. foo(Mode mode): uppercase types, lower case variables

 I think it's the most convincing argument for using camel case convention
for the type names in C++. But lmi (mostly) doesn't use it, and I don't
think we're going to consider going back to it, so, unfortunately, this
doesn't work for us.

GC> 2. foo(mc::mode mode): namespace for 'mc' types--interesting idea

 Yes, I rather like this one too.

GC> 3.a. foo(mode m): use single-letter variable names

 This one can be acceptable, IMO, but only as a special case of a more
general option:

4. Use variable name explaining the variable purpose.

I admit I can't think of anything appropriate for IsModalPmtDate()
parameter, but this could be because I have no idea about what does it
actually do (BTW, it took me a heroic effort to understand that "Pmt"
probably stands for "Payment", which makes me regret that it's not spelt
completely to begin with). But maybe there is really no specific meaning to
this variable here, i.e. it's just some arbitrary "mode", in which case I
think it's fine to fall back to 3.a. -- just as we often use "i" for the
loop index or iterator when there is nothing more appropriate.

GC> 3.b. foo(mode mod): use curiously-mangled variable names
GC> or   foo(mode the_mode): curiously-decorated variable names

 Those are indeed bad.

GC> I'm not sure any of those options is obviously better than the
GC> rest. Taking a fresh look now, I'd rank them in order of
GC> increasing preference thus:
GC>   3.b << 3a << 1 < 0 < 2
GC> and would be inclined to replace (0) with (2) if it were easy.

 In isolation, I probably prefer (1), actually. But considering the
constraint of using snake case for all identifiers, it can't really be used
and then my order becomes

        3.b ≪ 0 ≪ 4 ≈ 2

The reason I dislike 0 so much is that it imposes some unreadable extra
prefix for all uses of the enum, even those when the problem above doesn't
arise (because variables do have specific meaning, allowing to use
non-generic names for them).

 BTW, just to be complete, there is nothing wrong from the C++ compiler
point of view with

5. foo(mode mode): just use the same name.

but if you think this is too confusing, I'm going to wholeheartedly agree.

GC> Then there's the problem of enumerators with similar names but
GC> different contextual meanings. E.g., "monthly" could describe
GC>   mce_monthly: a premium that's paid twelve times a year
GC>   mce_monthly_rate: an interest rate that's compounded monthly
GC>   mce_spread_monthly: a distinct variant of 'mce_monthly_rate'
GC> Similarly, several enumerations include a "none" enumerator.
GC> I named these things in the C++98 era, so they're distinguished
GC> lexically. Scoped enums could solve this problem more cleanly.

 Yes, absolutely.

GC> And, for lmi's more strongly typed 'mc' enumerations, we have
GC> 'mcenum_'-prefixed class templates whose underlying C enumeration
GC> is prefixed 'mce_'...so input uses 'mce_mode' for validation, but
GC> but calculations use the simpler 'mcenum_mode' values that have
GC> already been validated. That's two different naming conventions
GC> for enumerations...and what about their enumerators? Let's see:
GC> 
GC>   enum mcenum_mode
GC>     {mce_annual     =  1
GC>     ,mce_semiannual =  2
GC>     ,mce_quarterly  =  4
GC>     ,mce_monthly    = 12
GC>     };
GC>   typedef mc_enum<mcenum_mode> mce_mode;
GC> 
GC> Thus:
GC>   mcenum_- prefix: C enumeration mcenum_mode
GC>   mce_-    prefix: enumerators
GC>   mce_-    prefix: also used for template typedef'd above
GC> 
GC> I guess the only rule I've consistently applied is that an
GC> 'enum_' prefix denotes a C enumeration.

 Yes, this doesn't exactly help understanding this code but OTOH it's not
the main problem neither, this mcenum stuff is just complicated and, as I
might have already mentioned, is one of my least favourite parts of lmi
code base. I really hope we can replace it with reflection in C++20...

 For now the best approach to this problem I've seen is this library:

        http://aantron.github.io/better-enums/

but it's still not perfect, even though quite nice IMHO.


GC> >  Also, what about the C++11 strongly typed (a.k.a. scoped) enums? Should
GC> > they still use "enum_" prefix? If so, surely their enumerators shouldn't
GC> > have any prefix, as otherwise we'd end up with "enum_foo::e_bar" which is
GC> > just clearly over-specified.
GC> 
GC> The reason I haven't eagerly embraced this is that the names on
GC> the left are so much longer than the names on the right:
GC> 
GC> mcenum_mode::annual          versus   mce_annual
GC> mcenum_rate_period::annual   versus   mce_annual_rate
GC> 
GC> That would be less awful if we dropped the prefixes:
GC> 
GC> mode::annual          versus   mce_annual
GC> rate_period::annual   versus   mce_annual_rate

 Yes, I really like how much more readable the left column is. I wish I
could say it's just a question of habit (because then I could still hope to
acquire it), but after 10+ years of working with lmi code, I still stumble
on "mce_" prefixes when reading it. It doesn't really prevent me from
understanding the code any more (although I still remember the time spent
on deciphering all this MCE stuff back when I finally decided to do it for
the first time), but it's a continuing annoyance.

GC> but maybe that goes too far: the natural name for a variable of
GC> type 'rate_period' is just 'rate_period', so we're back to the
GC> {0,1,2,3a,3b} options above.

 Typically a rate period is associated with something (that it's a period
of). So for me its natural name would involve this something, e.g.
"payment_period" or "oil_check_period" or "dragon_hibernation_period" or
whatever.

GC> Yet, even supposing we replace 'oenum_visibility' with
GC> 'oe::visibility' in accordance with (2) above, we'd still have:
GC> 
GC> - ,should_hide_column(ledger, column++) ? oe_hidden : oe_shown
GC> + ,should_hide_column(ledger, column++) ? oe::visibility::hidden : 
oe::visibility::shown
GC> 
GC> which is rather more verbose.

 Yes, but I feel like it's not a bad verbosity as it's not redundant.
Except for "oe::" part.

GC> Against that, we must weigh the gain in type safety. This is already
GC> forbidden with unscoped enums:
GC> 
GC>   enum E {x, y};
GC>   void foo(E e);
GC>   foo(0);     // already forbidden
GC>   E e = x;
GC>   e += 5;     // already forbidden
GC>   int i = e;  // allowed, but what's the problem?

 This makes it possible to pass an enum to a function taking an int by
mistake. I won't claim that it's a particularly common error, but I think
it did happen to me in the past and it doesn't seem impossible that it will
happen again.

GC> and that's really important. Scoped enums add type safety where it's
GC> less important IMO:
GC> 
GC>   enum class E {x, y};
GC>   void foo(E e);
GC>   foo(0);     // was already forbidden
GC>   E e = E::x; // now must say 'E::x'

 Which is, BTW, nice, as it allows to reuse the same "x" in different
enums. And the difference between mode::annual and rate_period::annual is
much more clear than the difference between mce_annual and mce_annual_rate.

GC>   e += 5;     // was already forbidden
GC>   int i = e;  // no longer allowed, but how big is the benefit?
GC> 
GC> Looking at it in a different way, suppose compilers offered an option to
GC> warn on implicit conversion from enum. Given the function mentioned above:
GC> 
GC>   bool AccountValue::IsModalPmtDate(mcenum_mode mode) const
GC>   {
GC>       return 0 == Month % (12 / mode);
GC>   }
GC> 
GC> would I want to enable such a warning? Probably, at least from time to
GC> time (as I do with -Weffc++). Would I add a static_cast? Maybe; maybe not.
GC> But would I want to replace every
GC>   /mce_annual/mcenum_mode::annual/
GC>   /mce_semiannual/mcenum_mode::semiannual/
GC>   ...
GC> everywhere just so that I could have extra type safety that I don't really
GC> care about, at a cost in readability? I don't think so.

 I still think mode::annual is more readable than both alternatives above
by a much larger margin than the difference between them, which doesn't
seem huge to me: yes, the current version is shorter, but the scoped one
scans better ("::" stands out more than "_") and allows to use the same,
natural, enum element names for different enums.

GC> Did the committee give us a balky language change where a compiler warning
GC> would have been much better?

 In any C++98 library, including but not limited to wxWidgets, all enum
elements have a common prefix anyhow because the possibility of conflicting
with the user-defined identifiers is just too high otherwise. So IMO this
change was very welcome as using wxFoo::Bar is nicer than wxFOO_BAR. If you
have total control over all identifiers used in your code, it's a lesser
gain but, again, the possibility to reuse the same names for different enum
elements is useful. E.g. after a brief look at mc_enum_types.xpp, we could
have just "age" and "year" enum elements in survival_limit, from_point and
to_point enums, for example, instead of having mce_survive_to_age and
mce_to_age. Maybe it's not a huge gain, but at the very least it's one less
thing to think about when adding a new enum.

 Regards,
VZ


reply via email to

[Prev in Thread] Current Thread [Next in Thread]