[lmi] input sequence parser improvements


From: Vadim Zeitlin
Subject: [lmi] input sequence parser improvements
Date: Fri, 23 Jun 2006 15:16:16 +0200

 Hello,

 I'd like to ask a few things before finalizing the implementation of the
enhancement to the input sequence parser discussed previously.

GC> 0. Calendar year: Today, if comments are to be believed, we allow
GC>   // GRAMMAR duration-scalar: integer
GC>   // GRAMMAR duration-scalar: @ integer
GC>   // GRAMMAR duration-scalar: # integer
GC>   // GRAMMAR duration-scalar: duration-constant
GC>   // TODO ?? calendar year not yet implemented
GC> and it would be helpful to permit calendar years as duration-scalars. I'm
GC> not sure how to do this best.

 I'm not totally sure either, but I think it's better to avoid the
ambiguity if possible.

GC> And we already require '@' to signify ages, so something like 'y2006' would
GC> follow that precedent; perhaps 'year 2006' would be more readable.

 Actually I think that as we are already using '@' and '#', another special
symbol, such as '$' or maybe '=', would be more consistent. But, of course,
'y' or "year" could be used as well. I really can't make this choice myself,
though: I need to know which one it is going to be. Switching from one to
the other is trivial, so if you prefer not to decide this now we can use '='
for the time being and later change the grammar rule that recognizes it to
recognize 'y' instead.
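
 To show why switching markers later is cheap, here is a minimal
hand-written sketch of a recognizer for the four duration-scalar forms. It
is not the boost::spirit rule and not lmi code: the names scalar_kind,
duration_scalar and parse_duration_scalar() are made up for illustration,
and the kinds are deliberately named after their prefix only, since this
message does not spell out what '#' denotes. The calendar-year marker is a
parameter, so moving from '=' to 'y' is a one-character change.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// All names here are hypothetical; none is taken from lmi.
enum class scalar_kind {plain, at_prefixed, hash_prefixed, calendar_year};

struct duration_scalar
{
    scalar_kind kind;
    int         value;
};

// Accepts "integer", "@integer", "#integer" or "<year_marker>integer",
// with optional blanks around the prefix and the digits.
// Returns false, leaving 'out' untouched, if 's' is anything else.
bool parse_duration_scalar
    (std::string const& s
    ,duration_scalar&   out
    ,char               year_marker = '='
    )
{
    // The year marker must not collide with the existing prefixes
    // or with the digits themselves.
    assert('@' != year_marker && '#' != year_marker);
    assert(!std::isdigit(static_cast<unsigned char>(year_marker)));

    std::size_t i = 0;
    auto skip_blanks = [&]()
        {
        while(i < s.size() && std::isspace(static_cast<unsigned char>(s[i])))
            {++i;}
        };

    skip_blanks();
    scalar_kind kind = scalar_kind::plain;
    if(i < s.size())
        {
        if     ('@' == s[i])         {kind = scalar_kind::at_prefixed;   ++i;}
        else if('#' == s[i])         {kind = scalar_kind::hash_prefixed; ++i;}
        else if(year_marker == s[i]) {kind = scalar_kind::calendar_year; ++i;}
        }
    skip_blanks();

    // Read at most nine digits so that the value always fits in an int.
    std::size_t const first_digit = i;
    int value = 0;
    while(i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        {
        if(9 == i - first_digit) {return false;}
        value = 10 * value + (s[i] - '0');
        ++i;
        }
    if(first_digit == i) {return false;} // No digits at all.

    skip_blanks();
    if(s.size() != i) {return false;}    // Trailing garbage.

    out = duration_scalar{kind, value};
    return true;
}
```

 With the default marker, "=2006" is recognized as a calendar year and
"y2006" is rejected; passing 'y' as the third argument accepts "y2006"
instead, without touching the rest of the function.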
 

GC> 1. Percentages etc.: It would be nice to allow '0.05' to be entered as '5%',
GC> and alternatively as '500bp' where 'bp' is read as 'basis points': common
GC> financial jargon for hundredths of a percent.

 This can be done easily, no problem.

GC> Here an issue arises. Today, this "little language" is used for 'sequence'
GC> fields, as distinguished from 'scalar' fields; every field is one or the
GC> other by its nature. It would not be good to allow '5.73%' (which end users
GC> find much more expressive) only in the one and not in the other. 'Scalar'
GC> fields today use 'numeric_io*.?pp', which is a wrapper for snprintf() and
GC> the strto* family of functions. Now it makes little sense to reimplement
GC> strtod(), for instance, because that is truly difficult to get right; and
GC> the same can be said of snprintf(). So I suspect that we're led to an
GC> intermediate routine that handles both 'sequence' and 'scalar' fields; but
GC> here I'm imagining that the parser delegates to 'numeric_io*.?pp', yet it
GC> doesn't seem to, and I must admit that I spent several minutes looking at
GC> 'input_sequence.cpp' and can't guess how it handles numbers.

 I don't believe the current code delegates to the numeric_io* code, and the
new code, using boost::spirit, definitely doesn't: it uses the built-in
"real_p" parser to recognize real numbers (which probably uses strtod under
the hood, but we don't know that for sure).

 To avoid the inconsistency between accepting '%' and "bp" in one place but
not the other(s), I see 3 solutions:

1. Add a new (static) InputSequence::parse_number() method which would
   handle '%' and "bp" suffixes using the same grammar rule

2. Update numeric_io code to deal with these suffixes independently of
   InputSequence

3. Do (2) and write a custom spirit parser using these functions


 The advantage of (1) is consistency, but it doesn't seem natural to have
this method in InputSequence (nor does it look appealing to create a new
class just for this). (2) is better from this point of view, but then we'd
have different code doing the same thing in 2 places. Usually I'd be
strongly against this, but here I'm not so sure: the code is rather trivial
and it might not be worth bringing the whole boost::spirit machinery into
play just for this. Also, this code is unlikely to be modified often (or at
all), as there are not many units for numbers beyond absolute values,
percents and bps. So maybe (2) is enough here.
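
 To give an idea of how little code (2) needs, here is a minimal sketch of
a suffix-aware conversion built directly on strtod(). The function name
parse_scaled_number() is hypothetical and is not part of the numeric_io
code; the sketch also ignores locale issues and accepts whatever strtod()
accepts (including "inf" and "nan"), which real code would probably want to
restrict.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper, not part of lmi's numeric_io code.
// Converts "0.05", "5%" or "500bp" to a double; returns false and leaves
// 'result' untouched if 's' is not a number followed by an optional unit.
bool parse_scaled_number(std::string const& s, double& result)
{
    char const* const begin = s.c_str();
    char* end = nullptr;
    double const value = std::strtod(begin, &end);
    if(begin == end) {return false;} // No number found.

    // Allow blanks between the number and its unit, as in "5 %".
    while(' ' == *end || '\t' == *end) {++end;}

    double divisor = 1.0;
    if     (0 == std::strcmp(end, "%"))  {divisor =   100.0;}
    else if(0 == std::strcmp(end, "bp")) {divisor = 10000.0;}
    else if('\0' != *end)                {return false;} // Unknown suffix.

    // Divide rather than multiply by 0.01 or 0.0001: those factors are
    // not exactly representable, so multiplying would round twice.
    result = value / divisor;
    return true;
}

// Usage examples, written as checks.
void parse_scaled_number_examples()
{
    double x = 0.0;
    assert(parse_scaled_number("5%",    x) && 0.0499999 < x && x < 0.0500001);
    assert(parse_scaled_number("500bp", x) && 0.0499999 < x && x < 0.0500001);
    assert(parse_scaled_number("0.05",  x) && 0.0499999 < x && x < 0.0500001);
    assert(!parse_scaled_number("5 percent", x));
}
```

 A custom spirit parser as in (3) could then call the same function, so the
suffix handling would still live in one place only.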

 But if not, the best solution is, of course, (3). We get consistency,
avoid code duplication and can still avoid using spirit when we only need
to parse a number and not a whole input sequence. The only drawback is that
it's surely the most time-consuming one to implement, but it shouldn't be
that difficult, so I think that this is what we should do. Please let me
know what you think!


GC> 2. Geometric progressions: End users have asked for a way to express, say,
GC>    1000, increasing by five percent per year, compounded, for ten years
GC> producing the values
GC>    1000, 1050, 1102.5, 1157.625, 1215.50625, 1276.2815625
GC> where presumably we'd perform no rounding.

 The natural syntax for this could be

        1000 ^ 5% [40, 46)

or, if you prefer more verbose expressions,

        1000 increasing by 5% for [40, 46)

(where "increasing" could be abbreviated to "inc" and both "by" and "for"
could be made optional). Please let me know which one you prefer. Also, I
assume that the exponent can only be positive and so we don't need the
"decreasing" variant; please tell me if I'm wrong about this.
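
 Whichever spelling is chosen, the expansion itself is simple. Here is a
minimal sketch of what "1000 ^ 5% [40, 46)" could expand to, assuming the
first year of the interval gets the initial value and no rounding is
performed. The function name geometric_values() is hypothetical and the
interval is passed as two plain integers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: one value per year in the half-open interval
// [begin, end), starting at 'initial' and compounding by 'rate' each
// year. 'rate' is a fraction, so 5% is passed as 0.05.
std::vector<double> geometric_values
    (double initial
    ,double rate
    ,int    begin
    ,int    end
    )
{
    assert(begin <= end);

    std::vector<double> values;
    values.reserve(static_cast<std::vector<double>::size_type>(end - begin));

    double current = initial;
    for(int year = begin; year < end; ++year)
        {
        values.push_back(current);
        current *= 1.0 + rate; // No rounding at any step.
        }
    return values;
}
```

 Calling geometric_values(1000.0, 0.05, 40, 46) yields six values, from
1000 in the first year to approximately 1276.28 in the sixth. Nothing in
the sketch depends on the rate being positive, so a negative rate would
give the "decreasing" behaviour without any extra syntax.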

 Thanks in advance,
VZ




