Re: Problem with porting an OOT module to 3.8 (sampling speed)


From: Marcus Müller
Subject: Re: Problem with porting an OOT module to 3.8 (sampling speed)
Date: Sat, 3 Oct 2020 08:23:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0

Aha! That's exactly the "*very* specific" case I was referring to earlier.

This is going to be pretty specific to how you interact with libusb, but in essence: the 3.7 / 3.8 paradigm is still that a source blocks in work() until it has data. That's architecturally not pretty, but it is what it is.

I think somewhere in the later 3.7 releases, and definitely for 3.8, we introduced a delay before work() gets re-called in case it returns 0. Re-calling it immediately would leave a CPU core just spinning on that block although it actually has nothing to do, which means that the less data your block produces, the more persistently it grabs CPU.
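As a rough check of what such a wait does to throughput, here is a minimal C++ sketch. It assumes the 250 ms wait mentioned further down in this message and the 4096 items per work() call from the original report; `max_items_per_second` is a hypothetical helper name, not a GNU Radio function.

```cpp
#include <cassert>

// Hypothetical helper: upper bound on throughput if the scheduler waits
// `wait_s` seconds between work() calls and every call delivers at most
// `items_per_call` items. Both values are assumptions taken from this thread.
double max_items_per_second(double wait_s, int items_per_call)
{
    assert(wait_s > 0.0);
    return items_per_call / wait_s;
}
```

With a wait of 0.25 s and 4096 items per call this gives 16384 items per second, which is in line with the roughly 16 ksps reported for 3.8.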

You can remedy that:

* For the current scheduling, block in work() with a timeout instead of using the non-blocking libusb receive methods, i.e. use the synchronous libusb API. (Not blocking, and then relying on your work() being called in a spinlock manner so that no overflows happen, is not a sensible use of the async IO API in libusb.)
* Use libusb's async IO [0] with a callback: libusb_fill_bulk_transfer(), and set the callback function to a function that sends a `pmt::cons(pmt::intern("done"), pmt::mp(false))`¹ to your block's "system" message port. That should "cancel" the 250 ms wait between work() calls (if one is currently going on). This will require an additional copy from a buffer you've filled, but that might be less costly than one would fear (the data is hot anyway).
* What I'll call the sleeper/waker pattern: keep your block as it is. Follow [1] to get a file descriptor for your libusb handle. Have a thread that uses `epoll`, `poll` or `select`² to passively monitor the USB endpoint (without using CPU for that). Then, when new data has arrived, wake up your block using the same "system" port method as above. In the block's work function, you'd then use the non-blocking libusb functions to get the data (which is now there, otherwise the callback wouldn't have been called).
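The sleeper/waker idea can be sketched without libusb or GNU Radio. This is only an illustration under assumptions, not a drop-in block: a pipe stands in for the USB file descriptor, a condition variable stands in for the message to the "system" port, and `fd_waker` and `demo` are hypothetical names. It uses Linux `epoll` in edge-triggered mode so that the waker reports each arrival once instead of spinning while unread data is pending.

```cpp
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper (not part of GNU Radio or libusb): sleeps in
// epoll_wait() on one file descriptor and calls `wake` when data arrives.
class fd_waker
{
public:
    fd_waker(int fd, std::function<void()> wake)
        : d_epfd(epoll_create1(0)), d_wake(std::move(wake))
    {
        assert(d_epfd >= 0);
        epoll_event ev{};
        ev.events = EPOLLIN | EPOLLET; // edge-triggered: report arrivals, not state
        ev.data.fd = fd;
        int rc = epoll_ctl(d_epfd, EPOLL_CTL_ADD, fd, &ev);
        assert(rc == 0);
        (void)rc;
        d_thread = std::thread([this] { loop(); });
    }

    ~fd_waker()
    {
        d_run = false;
        d_thread.join();
        close(d_epfd);
    }

private:
    void loop()
    {
        epoll_event ev{};
        while (d_run) {
            // The 50 ms timeout only exists so the thread notices d_run == false.
            if (epoll_wait(d_epfd, &ev, 1, 50) > 0)
                d_wake();
        }
    }

    int d_epfd;
    std::function<void()> d_wake;
    std::atomic<bool> d_run{ true };
    std::thread d_thread;
};

// Stand-in for the block; returns the number of bytes "work" received.
int demo()
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK); // reads in "work" must never block

    std::mutex m;
    std::condition_variable cv;
    bool woken = false;
    int received = -1;
    {
        // In a real block, this callback would post the message to the
        // "system" port instead of notifying a condition variable.
        fd_waker waker(fds[0], [&] {
            std::lock_guard<std::mutex> lock(m);
            woken = true;
            cv.notify_one();
        });

        const char data[] = "abcd";
        ssize_t written = write(fds[1], data, 4); // "the device" delivers 4 bytes
        assert(written == 4);
        (void)written;

        // "work()": sleep until woken (give up after 2 s), then read without blocking.
        std::unique_lock<std::mutex> lock(m);
        if (cv.wait_for(lock, std::chrono::seconds(2), [&] { return woken; })) {
            char buf[16];
            received = static_cast<int>(read(fds[0], buf, sizeof(buf)));
        }
    } // lock is released, then the waker thread is joined
    close(fds[0]);
    close(fds[1]);
    return received;
}
```

In this sketch the consumer uses no CPU while it waits, and the read succeeds immediately because the wake-up only happens once data is there. How the descriptors obtained via [1] behave with a real device is not covered here and would need checking against the libusb documentation.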

Best regards,
Marcus

¹ Performance hint: have a single object that you hold on to (e.g. `static pmt::pmt_t done_msg = pmt::cons...`) and re-send it; constructing PMT symbols is expensive, keeping a single reference-counted PMT isn't.
² I'm not too deep into the details of these APIs/syscalls, but epoll is probably the thing you want in most cases, especially if you're watching more than one file descriptor.

[0] http://libusb.sourceforge.net/api-1.0/group__libusb__asyncio.html
[1] http://libusb.sourceforge.net/api-1.0/group__libusb__poll.html

On 10/2/20 10:27 PM, wk@ire.pw.edu.pl wrote:
Hi Wojciech,

On 02.10.20 15:51, Wojciech Kazubski wrote:
I suppose that in 3.8 the source block has to signal required sample rate to
GR runtime. Is this correct?

No, that's not correct. The runtime doesn't care about required rates at
all. It just makes blocks produce items as fast as they can.

How to fix the code?

Good question! Unless you're doing something *very* specific in your
code, I'd honestly blame this on a bug that was introduced during
porting – but I might be wrong.

Two very common tools to investigate this are rather simple:

1. htop (a `top`-alike program with a bit better visualization), with
thread names enabled – that way you can see which block occupies the CPU
the most, because all blocks run in their own thread
2. `perf` (especially, `perf top -a`), which allows you to sample in
which function your CPU cores are most often. That way, you can identify
functions that might be the blockers there.

Best regards,
Marcus


I think that the problem is not related to the CPU load caused by my block. The total CPU load is very low, less than 5%. I did some debugging by adding test messages that are printed each time data is written to or read from the internal buffer. Data is written to the buffer by libusb and read by calls to the "work" function (in blocks of 4096 samples at a time).

In 3.7 I see a great number of messages indicating that data is read from the buffer. Part (about half) of the messages show 0 bytes read, due to lack of new data from USB. In this way, the data rate is reduced from ~30 Msps to 16.368 Msps.

In contrast, in 3.8 the work function is called only about 4 times a second, giving 16 ksps. This is some 2000 times less than in 3.7.

--
Best regards
Wojciech





