
Re: [Discuss-gnuradio] Question about GSR internal architecture


From: Eric Blossom
Subject: Re: [Discuss-gnuradio] Question about GSR internal architecture
Date: Fri, 9 Jul 2004 11:44:11 -0700
User-agent: Mutt/1.4.1i

On Thu, Jul 08, 2004 at 09:57:01PM -0700, David Beberman wrote:
> Hi,

> My problem is that I'm probably trying to do something with GSR that
> it wasn't meant for.  (isn't that always the case?)

What you're trying to do looks completely consistent with our goals
for GNU Radio.  We want to be able to build systems that require tight
timing synchronization between the Tx and Rx paths.  This includes,
among others, TDMA and frequency hopping systems.  Although the code
hasn't been written, there's been quite a bit of discussion on this
topic, most of which has taken place either face to face or on the
phone, and hence isn't logged in the mail archive.  Now we get to talk
about it on the list!

> As far as I can tell, and please somebody correct me if I'm wrong, the
> GSR architecture consists of a set of modules hooked together with
> buffers.

Correct.

> Except when threads are not available, each module runs in
> its own thread.

Not exactly.  FWIW, in the 0.x series, the SMP code never worked well
enough to be used, so we were always running all the signal processing
in a single thread.

In the 2.x series, the SMP code, when it's written, will use one
thread per cpu for the signal processing, plus an additional thread
for control, gui, etc.  I can't see a compelling reason for more than
a single thread per cpu for the signal processing.

The idea in the 2.x world is to dynamically partition the workload
between the processors available, while taking advantage of
thread/processor affinity.  To give a simplified example, imagine a
signal processing graph with 8 blocks in it and a dual processor
system.  Topologically sort the graph and assign the first 4 blocks to
cpu 0, and the final 4 blocks to cpu 1.  The cpus can run pretty much
independently of each other with good memory and cache locality, with
the proviso that there's a buffer that's shared at the boundary of the
partition.  cpu 0 writes into the buffer, cpu 1 reads from the buffer.
Access to this buffer is of course serialized with a mutex, etc.  When
the producer/consumer rendezvous occurs, on the average (assuming that
each block requires a relatively constant amount of cpu, memory
bandwidth, etc for a particular throughput), either cpu 0 is going to
be write blocked or cpu 1 is going to be read blocked.  If cpu 0 is
write blocked (meaning that it's getting done with its work before cpu
1 is), we can change the partitioning by migrating the fifth block
from cpu 1 to cpu 0, thereby changing the relative work loads.
Assume we low-pass filter this repartitioning activity, so that we're
not migrating the same block back and forth.  It should settle down to
a partitioning that's getting everything it can out of all cpus
available while maintaining good locality and low coordination
overhead.  Sound reasonable?
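
To make the migration heuristic concrete, here's a toy C++ sketch.
All the names, constants and the smoothing filter are invented for
illustration; none of this is real scheduler code:

    struct block_stats {
      double cpu0_block_time;   // time cpu 0 spent write blocked
      double cpu1_block_time;   // time cpu 1 spent read blocked
    };

    // 'split' is the index of the first block assigned to cpu 1 in a
    // topologically sorted graph of nblocks blocks.  'smoothed'
    // carries the low-pass filtered imbalance between calls.
    int
    rebalance (int split, int nblocks, const block_stats &s,
               double &smoothed)
    {
      double imbalance = s.cpu0_block_time - s.cpu1_block_time;

      // single pole IIR low-pass, so we don't thrash the same block
      // back and forth between cpus
      const double alpha = 0.1;
      smoothed = (1 - alpha) * smoothed + alpha * imbalance;

      const double threshold = 0.050;   // 50 ms of smoothed imbalance

      if (smoothed > threshold && split < nblocks - 1)
        return split + 1;  // cpu 0 finishes early: give it another block
      if (smoothed < -threshold && split > 1)
        return split - 1;  // cpu 1 finishes early: take one block back
      return split;        // close enough to balanced, leave it alone
    }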


> The overall structure appears to be meant for broadcast reception or
> transmission.  In either case, "pipeline" delays from processing are
> really not that important.  I think this would be categorized as a
> near realtime system, perhaps.

As is, it should work with half-duplex "push to talk" type systems too.

> What I need to do is a little bit different.  I want to have a receive
> and transmit path, and have them tied together for control purposes.

We want that too.

> I also want to have more of a realtime behavior out of the system.
> Since this is running in software on a regular PC, I have to define
> realtime a bit differently than an embedded system.  What I want to
> have happen is that a relation exists between a received signal, and
> when a transmit signal is sent in response.  The relationship should
> be that the transmit signal is sent at a given amount of time after
> the received signal was originally received.

Understood.

> To do this, I would need to have some sort of estimate of when the
> received signal began, something like an interrupt giving me an
> energy detect point.  Then I need to record the current processor
> clock time.  On the transmit path, I want to hold up transmitting
> the signal until some increment of time has passed, given the
> recorded processor clock time.

Our thinking is similar, though we plan on having the hardware/driver
combination timestamp the incoming data and provide a method for
requesting that a particular tx burst begin at a particular point in
the future.  Most likely we'll leave the energy detection, etc to the
host code, but provide it good time stamp info so that when it figures
out where the rx packet is, it can queue up a tx packet for the right
output time.

On the USRP, we envision a running sample counter in the FPGA that's
conceptually next to the A/D's and D/A's.  Each USB packet we send
from the USRP to the host will contain a timestamp.  Likewise, on tx,
the packets outbound from the host to the USRP will be timestamped
with a "start time".

It's a little more complicated than this, but this is the basic idea.
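
In C++ terms, something like the following, though the field names and
sizes here are made up; the real USRP packet format isn't nailed down
yet:

    #include <stdint.h>

    struct rx_usb_packet {
      uint32_t timestamp;     // FPGA sample counter at first sample
      int16_t  samples[252];  // interleaved I/Q
    };

    struct tx_usb_packet {
      uint32_t start_time;    // sample count at which tx should begin
      int16_t  samples[252];
    };

    // Host side: given where the demod found the rx burst, compute
    // when the reply should hit the antenna.  All arithmetic is
    // modulo 2^32, so the sample counter wraps harmlessly.
    uint32_t
    tx_start_time (uint32_t rx_timestamp,
                   uint32_t burst_offset,  // samples into the rx packet
                   uint32_t turnaround)    // required delay, in samples
    {
      return rx_timestamp + burst_offset + turnaround;
    }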

You could make the same thing work with PCI based hardware too,
assuming that it's not too brain dead.

> Since this is a regular PC, I will make sure that total time elapsed
> of doing the receive path processing, and the transmit signal
> processing is less than the incremental time needed.  That way I can
> be sure that at some point, the transmit path will hold up, waiting
> for the correct transmit time to arrive.  As I understand it, the
> regular Linux kernel multitasks with a granularity in the range of 10
> milliseconds.  I am looking at using the Timesys kernel instead.  They
> are claiming a much lower granularity level.  I'm also planning to run
> a barebones system.  No networking, no gui, no nothing.

You may want to look at the 2.6 kernel if you haven't already.  They
claim to have much finer scheduling granularity, though I don't know
the details and haven't tested it yet.

If you're willing to design your system such that the rx and tx packets
can be pipelined, you don't need much from the kernel.  E.g., the
tx packet corresponding to a given rx packet goes out N slots later.

Vanu has made this idea work on their GSM software basestation.
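
The arithmetic is trivial once the hardware gives you timestamps.  A
toy sketch, with made-up slot parameters:

    #include <stdint.h>

    const int      N_SLOT_DELAY     = 4;    // pipeline depth, in slots
    const uint32_t SAMPLES_PER_SLOT = 6250; // e.g. 1 ms at 6.25 MS/s

    // The reply to the burst received in slot k goes out in slot
    // k + N, expressed in units of the hardware sample counter.
    uint32_t
    reply_time (uint32_t rx_slot_timestamp)
    {
      return rx_slot_timestamp + N_SLOT_DELAY * SAMPLES_PER_SLOT;
    }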

> For the work I'm trying to do, I can pretty much work with any latency
> that is needed to get through any processing paths needed.  I just
> need to be able to hit a given realtime deadline as a synchronization
> point.  I'm not even that concerned about jitter from
> scheduler/context switch overhead.  I can account for that in my
> signal processing design.

Does what I've described sound like it will accommodate your needs?

> I'm planning to write the code to implement what I'm describing and
> will be happy to redistribute it, if anybody else happens to ever need
> something like it.

Great!

> I'm wondering if someone could give me a couple pointers on where to
> start looking in the source code to figure out how to implement
> this.

What are you planning on using for your RF front end and A/D, D/A?

I think the next thing would be to come up with a suitable interface
for describing the tx and rx of blocks with associated timing info.
I've got some ideas, but coming from the application perspective, you
may have better insight into what you need than I do.

> As a secondary question, I've been trying to find in the source code
> examples how to handle asynchronous frame reception.  What I'm looking
> for is how to do synchronization on a frame header, followed by data
> reception.  A simple approach would be to put a synchronizer (I use
> the term loosely), as a source to a data receiver module.

The ATSC receiver solves this problem in a way that's specific to
ATSC.  It solves two problems.  The first is a bit synchronization
problem, the second is locating frame boundaries.

In the 0.9 release, the main code for the atsc rx is
src/gnu/atsc/atsc_rx.cc.  This will give you an overview of how the
modules are connected together.  A quick look at the atsc spec will
provide context: http://www.atsc.org/standards.html.  Check out A/53C
and A/54A.

The specific modules you might want to take a look at are
GrAtscBitTimingLoop3 (bit timing recovery) and GrAtscFieldSyncDemux
(stream of samples to stream of aligned packets).

You'll find them at
src/gnu/lib/dtv/GrAtscBitTimingLoop3.{h,cc}
src/gnu/lib/dtv/GrAtscFieldSyncDemux.{h,cc}

> The problem is that once the synchronizer is done, it will just be
> an extra context switch overhead without adding anything.

There's no context switch required, but your point is valid.
GrAtscFieldSyncDemux has a state machine that either searches for the
proper alignment or assumes that it's got alignment and does a few
low-overhead checks to be sure.
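
The general shape of that state machine, boiled down to a toy C++
example (this one hunts for a single MPEG-2 transport stream sync
byte; the real ATSC field sync is a PN sequence, and
GrAtscFieldSyncDemux is considerably more involved):

    enum sync_state { SEARCHING, LOCKED };

    struct frame_syncer {
      sync_state state;
      int        offset;    // where we last found the marker

      static const unsigned char SYNC = 0x47;  // MPEG-2 TS sync byte

      frame_syncer () : state (SEARCHING), offset (0) {}

      // 'in' holds ninput bytes.  Returns true when we know where
      // the frame boundary is.
      bool work (const unsigned char *in, int ninput)
      {
        if (state == LOCKED){
          if (offset < ninput && in[offset] == SYNC)
            return true;            // cheap check while locked
          state = SEARCHING;        // lost alignment, fall back
        }

        for (int i = 0; i < ninput; i++){  // expensive full search
          if (in[i] == SYNC){
            offset = i;
            state = LOCKED;
            return true;
          }
        }
        return false;
      }
    };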

> What I would really want to see happen is that the chain of sources
> to sinks could be redirected once a component in the chain has done
> its job.  This isn't strictly necessary, only an optimization issue.
> Just was wondering if this already exists and I'm misreading the
> code.


The 2.x code supports reconfiguring the pipeline on the fly.

> I'm not familiar with DSP CPUs and DSP architecture in general, so
> what I'm asking may be obvious.  If so, just let me know where to
> start looking and reading, and I'll do the rest.

There's nothing magic about DSPs.  Mostly you can think of them as
embedded processors that can multiply and add very quickly.  Many of
them, particularly the fixed point ones, have highly non-orthogonal
instruction sets where the underlying architectural warts keep poking
their heads into your code.  Often they have special support for
circular buffering or performing the FFT butterfly address
calculation. 
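
For what it's worth, here's what circular buffering looks like when
done by hand on a general purpose processor; a DSP's modulo address
generator does the wrap below in hardware, for free:

    const int N = 64;       // delay line length (power of 2)

    float delay_line[N];
    int   widx = 0;         // write index

    // Insert a sample.  The '& (N - 1)' is the wrap a DSP would do
    // with zero overhead.
    void
    push_sample (float x)
    {
      delay_line[widx] = x;
      widx = (widx + 1) & (N - 1);
    }

    // Read the sample written k samples ago, 0 <= k < N.
    float
    tap (int k)
    {
      return delay_line[(widx - 1 - k + N) & (N - 1)];
    }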

If you don't care about power consumption, floating point on a
contemporary superscalar processor (Pentium, Athlon, G4, etc) kicks
butt on DSPs, and your code isn't tied to some whacky architecture.

> David

Thanks for the great questions and ideas!

Also, I'll be making the first alpha release of the GNU Radio
2.x code base in the next couple of days.

If you want a sneak preview, it's in CVS under the module name
"gnuradio-core".  Any files with names like GrCamelCase.xyz are still
to be converted to the new architecture.

| GNU Radio's CVS repository can be checked out through anonymous CVS
| over SSH with the following instructions. When prompted for a password
| for anoncvs, simply press the Enter key.
|   
|   $ export CVS_RSH="ssh"
|   $ cvs -z3 -d:ext:address@hidden:/cvsroot/gnuradio co -P gnuradio-core
| 
| Be sure to use the -P option on your check out. This prunes empty
| directories from the build tree.
| 
| The SSHv2 public key fingerprints for the machine hosting the cvs trees are:
| 
|   RSA: 1024 80:5a:b0:0c:ec:93:66:29:49:7e:04:2b:fd:ba:2c:d5
|   DSA: 1024 4d:c8:dc:9a:99:96:ae:cc:ce:d3:2b:b0:a3:a4:95:a5 
| 
| Once you have the tree checked out, you can keep it up to date by
| using cvs update.
| 

You can browse the cvs archive from the savannah repository with the
help of viewCVS.  http://savannah.gnu.org/cgi-bin/viewcvs/gnuradio/

Eric



