
Re: [Discuss-gnuradio] comments on stream tags and metadata storage


From: Peter A. Bigot
Subject: Re: [Discuss-gnuradio] comments on stream tags and metadata storage
Date: Fri, 18 Jul 2014 06:15:16 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 07/17/2014 10:04 PM, Nowlan, Sean wrote:
>> I don't see this requirement on ordered generation documented.  In some
>> cases, it may be inconvenient to do this, e.g. when a block's analysis
>> discovers after-the-fact that something interesting can be associated
>> with a past sample.  Similarly, a user might want a block to associate
>> a tag with a sample that has not yet arrived, to notify a downstream
>> block that will need to process the event.
>
> I don't think that ordered generation is required per se, but certain
> blocks sort and others don't.  For instance, the tag_work function in
> usrp_sink_impl.cc "does" sort, precisely because get_tags_in_range
> doesn't.

My point is really that, because the infrastructure doesn't sort, only blocks that are aware of the problem have compensated for it; other blocks are dropping data. This could be solved in the infrastructure with a stable sort in get_tags_in_range() or add_item_tag(). (If the latter, then the infrastructure could also diagnose violations of the offset-must-be-in-valid-range expectation, which might be helpful.)
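To illustrate the compensation a consuming block has to do today, here is a minimal sketch of the sort inside general_work(); the comparator and variable names are mine, not lifted from usrp_sink_impl.cc, and it assumes <algorithm>, <vector>, and <gnuradio/tags.h> are available:

    // Collect the tags covering this call's input buffer, then
    // stable-sort them by offset, since get_tags_in_range makes no
    // ordering guarantee.  Stable, so equal offsets keep arrival order.
    std::vector<gr::tag_t> tags;
    const uint64_t start = nitems_read(0);  // absolute offset of item 0
    get_tags_in_range(tags, 0, start, start + ninput_items[0]);
    std::stable_sort(tags.begin(), tags.end(),
                     [](const gr::tag_t &a, const gr::tag_t &b) {
                       return a.offset < b.offset;
                     });
    // tags can now be processed in non-decreasing offset order.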


>> A simple solution for the infrastructure is to require that tags only be
>> generated from within work(), with offsets corresponding to samples
>> generated in that call to work(), and in non-decreasing offset order
>> (though this last requirement could be handled by add_item_tag()).  The
>> developer must then handle the too-late/too-early tag associations
>> through some other mechanism, such as carrying the effective offset as
>> part of the tag value.

> As far as I'm aware, adding tags from within work is the only safe way
> to add tags to a stream.  It is also required that offsets fall within
> the valid range spanning the buffer of input items passed to work; the
> scheduler prunes tags outside this range.  It's also worth noting that
> although the history mechanism allows viewing past samples (filters use
> this, for example), attempting to add tags to samples in history will
> not work; those tags will be pruned.
>
> If tags need to be stored for future processing in subsequent calls to
> work, it's up to the programmer to push them onto a stack/queue/whatever
> inside the block.  The scheduler won't handle this.

Thanks; that confirms and is consistent with my expectations.
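For the archives, that bookkeeping might look roughly like the following; d_pending_tags and everything else here is my own invention, not anything the scheduler provides, and it assumes the deque is kept sorted by offset as tags are stashed:

    // Block member: tags whose target offset lies beyond the samples
    // produced so far.
    std::deque<gr::tag_t> d_pending_tags;

    // In work(): attach any stashed tags that now fall inside the range
    // of output items this call produces; keep later ones for next time.
    const uint64_t first = nitems_written(0);
    const uint64_t past_end = first + noutput_items;
    while (!d_pending_tags.empty()
           && d_pending_tags.front().offset < past_end) {
      if (d_pending_tags.front().offset >= first)
        add_item_tag(0, d_pending_tags.front());  // now in the valid range
      d_pending_tags.pop_front();  // emitted, or stale and would be pruned
    }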


>> (4) The in-memory stream of tags can produce multiple settings of the
>> same key at the same offset.  However, when stored to a file only the
>> last setting of the key is recorded.
>>
>> I believe this last behavior is incorrect and that it's a mistake to use
>> a map instead of a multimap or simple list for the metadata record of
>> stream tags associated with a sample.
>>
>> One argument is that it's critical that a stream archive of a processing
>> session faithfully record the contents of the stream, so that re-running
>> the application using playback reproduces that stream and thus the
>> original behavior (absent non-determinism due to asynchrony).  This
>> faithful reproduction is what would allow a maintainer to diagnose an
>> operational failure caused by a block that fails at runtime when the
>> same tag is processed twice at the same offset.  This is true even if
>> the same key is set to the same value at the same sample offset multiple
>> times, which some might otherwise want to argue is redundant.
>>
>> A corollary argument is that the sample number at which an event like a
>> tuner configuration change occurs usually can't be exactly associated
>> with a sample; the best estimate is likely to be the index of the first
>> sample generated by the next call to work().  But depending on
>> processing speed an application might change an attribute of a data
>> source multiple times before work() was invoked.  The effect of those
>> intermediate changes may be visible in the signal, and losing the fact
>> that they occurred by discarding all but the last change affects both
>> reproducibility and interpretation of the signal itself.

> I agree this is a problem, but I don't see a workaround, as the data
> plane (work, streams, etc.) is asynchronous to the control logic.  On
> the bright side, I believe the USRP source block does associate tuner,
> sample rate, etc. changes with an absolute sample in the stream, but
> this set of features doesn't necessarily extend to other hardware data
> sources.  As for other asynchronous events generating stream tags, I
> think the user is stuck dealing with the inevitable latency unless the
> data source can produce metadata that is tightly coupled in time and
> pass that information along to GNU Radio.

Inaccuracy in identifying the associated sample is something we have to live with, yes. My argument is that GNU Radio's stream tag infrastructure (including storage as metadata) needs to accommodate this by not dropping tags based solely on offset and key (and value), because the "duplication" may actually carry information. So an offset-specific map from key alone is the wrong data structure for tag storage.
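To make the data-structure point concrete, the per-offset record would want to be something like the sketch below (the names are mine; this isn't the current file-metadata code):

    // One record per sample offset.  A multimap keeps every setting of
    // a key, in insertion order among equal keys; a std::map would keep
    // only one (insert drops duplicates, operator[] overwrites).
    typedef std::multimap<std::string, pmt::pmt_t> tag_record;

    tag_record rec;
    rec.insert(std::make_pair("rx_freq", pmt::from_double(101.3e6)));
    rec.insert(std::make_pair("rx_freq", pmt::from_double(101.3e6)));
    // rec.count("rx_freq") == 2: the "redundant" second setting survives.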

With fork-and-join flowgraphs the tag propagation policy might introduce duplicates. A candidate workaround is a unique identifier, added internally by gr::block::add_item_tag, which can be used to identify and drop redundant tag instances as they're propagated. That identifier must be unique across all blocks in the system, not just a block-specific ordinal, since the tag srcid is optional. It need not be preserved in archived metadata, though, since at that point we "know" the tags are complete and unique; new identifiers would be added when archived tags are replayed as a live stream.
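Entirely hypothetical, since add_item_tag does nothing like this today, but the mechanism I have in mind is just a process-wide counter plus a seen-set at join points:

    // Hypothetical id stamped on every tag at creation time.
    static std::atomic<uint64_t> s_next_tag_id(0);

    uint64_t new_tag_id() {
      return s_next_tag_id.fetch_add(1);  // unique across the process
    }

    // At a join point, forward each id at most once.
    bool first_sighting(std::unordered_set<uint64_t> &seen, uint64_t id) {
      return seen.insert(id).second;  // false => duplicate, drop the tag
    }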

As background: I'm digging into this because I plan to update gr-osmosdr's rtlsdr_source so I know the sample rate, frequency, gain, and collection time of the signal, and (roughly) where they changed. Mostly because I keep collecting files with captured and processed data for analysis, and have no idea what parameters I used to generate them. Preserving metadata with signal data in a single archive package is really important to me.
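Roughly, the plan for the source's work() is the sketch below; the member names are invented, and the key strings just follow the USRP source's rx_rate/rx_freq convention, so treat it as a sketch rather than committed code:

    // On a noticed parameter change, tag the first sample of the next
    // work() call, the best available estimate of where it took effect.
    if (d_freq_changed) {
      add_item_tag(0, nitems_written(0),
                   pmt::string_to_symbol("rx_freq"),
                   pmt::from_double(d_center_freq));
      d_freq_changed = false;
    }
    if (d_rate_changed) {
      add_item_tag(0, nitems_written(0),
                   pmt::string_to_symbol("rx_rate"),
                   pmt::from_double(d_sample_rate));
      d_rate_changed = false;
    }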

Peter


