I don't see this requirement on ordered generation documented. In some
cases, it may be inconvenient to do this, e.g. when a block's analysis
discovers after-the-fact that something interesting can be associated
with a past sample. Similarly, a user might want a block to associate
a tag with a sample that has not yet arrived, to notify a downstream
block that will need to process the event.
I don't think that ordered generation is required per se, but certain blocks sort and
others don't. For instance, the tag_work function in usrp_sink_impl.cc "does"
sort precisely because get_tags_in_range doesn't.
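To make that concrete, here's a minimal plain-Python sketch of the kind of sort tag_work has to do itself, given that get_tags_in_range makes no ordering guarantee. The Tag record here is an illustrative stand-in, not GNU Radio's actual tag_t (which carries PMT key/value/srcid fields):

```python
from collections import namedtuple

# Illustrative stand-in for GNU Radio's tag_t (offset plus key/value).
Tag = namedtuple("Tag", ["offset", "key", "value"])

def sort_tags_by_offset(tags):
    """Return tags ordered by absolute sample offset, as a consumer
    must do itself when it needs ordered tags."""
    return sorted(tags, key=lambda t: t.offset)

tags = [Tag(30, "rx_freq", 2.4e9), Tag(10, "rx_time", 0.0),
        Tag(20, "rx_rate", 1e6)]
print([t.offset for t in sort_tags_by_offset(tags)])  # [10, 20, 30]
```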
A simple solution for the infrastructure is to require that tags only be
generated from within work(), with offsets corresponding to samples
generated in that call to work(), and in non-decreasing offset order
(though this last requirement could be handled by add_item_tag()). The
developer must then handle the too-late/too-early tag associations
through some other mechanism, such as carrying the effective offset as
part of the tag value.
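A sketch of that "effective offset" workaround, in plain Python rather than the GNU Radio API: the tag is attached at a legal offset (one of the samples produced by the current call to work), while the offset it really refers to rides along in the tag's value. The field names are made up for illustration:

```python
from collections import namedtuple

# Illustrative tag record; not GNU Radio's tag_t.
Tag = namedtuple("Tag", ["offset", "key", "value"])

def make_deferred_tag(legal_offset, key, effective_offset, payload):
    # Downstream blocks that understand this convention read
    # value["effective_offset"] instead of trusting tag.offset for
    # the event's true position in the stream.
    return Tag(legal_offset, key,
               {"effective_offset": effective_offset, "data": payload})

# Event detected at sample 950, but work() is now producing sample 1000+.
tag = make_deferred_tag(1000, "burst_start", 950, "detected late")
print(tag.offset, tag.value["effective_offset"])  # 1000 950
```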
As far as I'm aware, adding tags from within work is the only safe way to add
tags to a stream. Also, it is required that offsets correspond to the valid
range spanning the buffer of input items passed to work. The scheduler prunes
others outside this range. It's also worth noting that although the history
mechanism allows viewing past samples (filters use this, for example),
attempting to add tags to samples in history will not work; those tags will be
pruned.
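A rough model of the pruning behavior described above (simplified; the real logic lives in the C++ runtime): only tags whose offsets fall within the window of items for the current call to work survive, so a tag pointing into history or beyond the window is dropped:

```python
def prune_tags(tags, nitems_before, noutput_items):
    """Keep only tags whose absolute offsets fall inside the window
    [nitems_before, nitems_before + noutput_items). Tags pointing
    into history or past the window are discarded, mirroring the
    scheduler behavior described in the text."""
    lo = nitems_before
    hi = nitems_before + noutput_items
    return [t for t in tags if lo <= t[0] < hi]

tags = [(990, "in_history"), (1005, "ok"), (1100, "too_early")]
print(prune_tags(tags, 1000, 50))  # [(1005, 'ok')]
```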
If tags need to be stored for future processing in subsequent calls to work,
it's up to the programmer to push them onto a stack/queue/whatever inside the
block. The scheduler won't handle this.
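A sketch of what that per-block buffering might look like; the class and method names are illustrative, not a real GNU Radio interface. Deferred tags wait in a queue until their target offsets fall inside the current output window, at which point the block would emit them (e.g. via add_item_tag):

```python
from collections import deque

class TagBuffer:
    """Holds tags the block can't legally emit yet. Assumes tags are
    deferred in non-decreasing offset order."""
    def __init__(self):
        self.pending = deque()

    def defer(self, offset, key, value):
        self.pending.append((offset, key, value))

    def pop_ready(self, window_start, window_end):
        """Return the deferred tags whose target offsets now fall
        inside the current output window."""
        ready = []
        while self.pending and window_start <= self.pending[0][0] < window_end:
            ready.append(self.pending.popleft())
        return ready

buf = TagBuffer()
buf.defer(120, "event", "late info")
print(buf.pop_ready(0, 100))    # [] -- target offset not reached yet
print(buf.pop_ready(100, 200))  # [(120, 'event', 'late info')]
```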
(4) The in-memory stream of tags can produce multiple settings of the
same key at the same offset. However, when stored to a file only the
last setting of the key is recorded.
I believe this last behavior is incorrect and that it's a mistake to use
a map instead of a multimap or simple list for the metadata record of
stream tags associated with a sample.
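The container choice matters in exactly the way a quick sketch shows: keying records by (offset, key), as a map does, silently collapses repeated settings of the same key at the same offset, while a plain list of records (the multimap-like alternative) preserves every event:

```python
# Two settings of the same key land on the same sample offset.
events = [
    (500, "rx_freq", 2.40e9),
    (500, "rx_freq", 2.45e9),  # second retune at the same offset
]

as_map = {}
for offset, key, value in events:
    as_map[(offset, key)] = value  # last writer wins; first event lost

as_list = list(events)             # faithful record of both events

print(len(as_map), len(as_list))   # 1 2
```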
One argument is that it's critical that a stream archive of a processing
session faithfully record the contents of the stream so that re-running
the application using playback reproduces that stream and thus the
original behavior (absent non-determinism due to asynchrony). This
faithful reproduction is what would allow a maintainer to diagnose an
operational failure caused by a block that fails at runtime when the
same tag is processed twice at the same offset. This holds even if
the same key is set to the same value at the same sample offset multiple
times, which some might otherwise want to argue is redundant.
A corollary argument is that an event like a tuner configuration
change usually can't be associated exactly with a sample; the best
estimate is likely the index of the first sample generated by the next
call to work. But depending on processing speed, an application might
change an attribute of a data source multiple times before work is
invoked. The effect of those intermediate changes may be visible in
the signal, and losing the fact that they occurred by discarding all
but the last change harms both reproducibility and interpretation of
the signal itself.
I agree this is a problem, but I don't see a workaround as the data plane
(work, streams, etc.) is asynchronous to the control logic. On the bright side,
I believe the USRP source block does associate tuner, sample rate, etc. changes
with an absolute sample in the stream, but this set of features doesn't
necessarily extend to other hardware data sources. As for other
asynchronous events generating stream tags, I think the user is stuck
with the inevitable latency unless the data source can produce metadata
tightly coupled in time with the samples and pass that information
along to GNU Radio.