libmicrohttpd

Re: [libmicrohttpd] incoming-data boundary determination


From: Christian Grothoff
Subject: Re: [libmicrohttpd] incoming-data boundary determination
Date: Thu, 11 Apr 2019 12:31:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 4/11/19 4:54 AM, address@hidden wrote:
> I am starting in on an effort to use libmicrohttpd.  Having read
> through the complete tutorial and manual, I am still missing certain 
> key information.
> 
> When, exactly, is answer_to_connection() called with respect to
> the incoming content-data stream from the client? 
> 
> I'm writing a web server that will process certain types of
> POST data.  Suppose the client sends in a 100 KB package of data.
> Ignoring the first (setup) call to answer_to_connection(), does
> libmicrohttpd wait until the entire payload has been received before
> calling answer_to_connection() with a non-zero upload_data_size?
> Or does it make some decision on how to cut the payload into sections
> that are passed in successive calls?  On what basis is such a
> decision made, and how consistent and reliable is that logic in
> the face of varying payload sizes, large and small?

MHD will call you "when it can"; beyond "in a timely fashion as the
data arrives", there are no guarantees.  MHD will NOT wait for the
upload to complete, and given TCP, OS buffers and OS scheduling, you
may get 1 byte at a time or 100 KB in one shot. MHD makes no
'decisions' here: whenever it gets data from the OS, it passes it to
you. You can set the buffer size used by MHD, but if the buffer
contains only one byte, MHD will still hand that to you as soon as it
can (i.e. once it gets CPU time).

> My question is very general.  In my particular case, the format will
> be neither application/x-www-form-urlencoded nor multipart/form-data.
> So whatever the library might decide in those cases does not apply
> to my situation.  But even in those cases, what guarantees are there
> that data of specific application-level interest does not overlap 
> the boundary between sections that are passed in successive calls
> to answer_to_connection()?

None. Your application has to deal with this.

> I need to know if the incoming POST data may be passed to my
> answer_to_connection() routine in multiple sections not necessarily
> related to semantic boundaries in the payload.  If that is the case,
> it seems like I would need to keep accumulating the data passed in
> all the calls until I get one with a zero value for upload_data_size,
> and only then can I reliably process the full dataset.

That depends on your parser. MHD's PostProcessor can handle
multipart/form-data incrementally. But yes, often just accumulating
everything and then processing it once the full dataset has been
received is the simplest method that works (if you have the RAM for it).
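
A minimal sketch of the accumulate-then-process approach, written
without any MHD calls so it can be compiled on its own. The names
upload_buf, upload_buf_append, on_upload_call and UPLOAD_MAX are
hypothetical; only the upload_data / upload_data_size pair mirrors the
parameters of the access handler discussed above. The assumption is
that the handler sets *upload_data_size to 0 after consuming a chunk
and treats a call with a size of 0 as the end of the upload.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on the total body size we are willing to hold in RAM. */
#define UPLOAD_MAX (1024 * 1024)

/* Hypothetical per-connection accumulator; a real handler would keep a
   pointer to it in *con_cls. */
struct upload_buf
{
  char *data;
  size_t len;
  size_t cap;
};

/* Append one chunk.  Returns 0 on success, -1 if the limit would be
   exceeded or memory allocation fails. */
static int
upload_buf_append (struct upload_buf *b, const char *chunk, size_t n)
{
  assert (NULL != b);
  if (0 == n)
    return 0;
  if ( (n > UPLOAD_MAX) || (b->len > UPLOAD_MAX - n) )
    return -1;
  if (b->len + n > b->cap)
  {
    size_t ncap = (0 == b->cap) ? 4096 : b->cap;
    char *nd;

    while (ncap < b->len + n)
      ncap *= 2;
    nd = realloc (b->data, ncap);
    if (NULL == nd)
      return -1;
    b->data = nd;
    b->cap = ncap;
  }
  memcpy (b->data + b->len, chunk, n);
  b->len += n;
  return 0;
}

/* Stand-in for the upload part of an access handler.
   Returns 0 while more data is expected, 1 once the body is complete,
   -1 on error. */
static int
on_upload_call (struct upload_buf *b,
                const char *upload_data,
                size_t *upload_data_size)
{
  if (0 != *upload_data_size)
  {
    if (0 != upload_buf_append (b, upload_data, *upload_data_size))
      return -1;
    *upload_data_size = 0;   /* report the whole chunk as consumed */
    return 0;
  }
  return 1;  /* b->data[0 .. b->len) now holds the full body */
}
```

The chunk boundaries are irrelevant to the result: the same body split
into different pieces ends up as the same contiguous buffer, at the
cost of one copy per chunk.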

> Would that
> not entail extra data copying into a connection-specific buffer
> maintained by my application?  

Yes, it does. So if this is a concern, maybe you can use a parser that
does support incremental processing? Quite a few out there do, as you
basically just need to modify the tokenizer to 'suspend' when there is
only a fraction of the next token left and to 'resume' once there is
more data. Thus, the rest of the parser usually doesn't even have to be
changed for incremental processing.
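
As an illustration of the suspend/resume idea, here is a small sketch
of a tokenizer for a made-up newline-delimited format. All names
(tokenizer, tokenizer_feed, token_cb, token_stats, count_token,
TOK_MAX) are hypothetical and not part of MHD. The tokenizer keeps the
unfinished token in its own state when a chunk ends ("suspend") and
continues from there on the next call ("resume"), so the consumer of
the tokens never sees a chunk boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on the length of a single token. */
#define TOK_MAX 256

/* Called once per complete token; tok is NOT 0-terminated. */
typedef void (*token_cb) (const char *tok, size_t len, void *cls);

struct tokenizer
{
  char partial[TOK_MAX];  /* bytes of the token seen so far */
  size_t partial_len;
};

/* Feed one chunk of arbitrary size.  Complete tokens are passed to cb;
   an incomplete trailing token stays in t until the next call.
   Returns 0 on success, -1 if a token exceeds TOK_MAX. */
static int
tokenizer_feed (struct tokenizer *t,
                const char *data,
                size_t n,
                token_cb cb,
                void *cls)
{
  assert (NULL != t);
  for (size_t i = 0; i < n; i++)
  {
    if ('\n' == data[i])
    {
      cb (t->partial, t->partial_len, cls);
      t->partial_len = 0;
    }
    else
    {
      if (t->partial_len >= TOK_MAX)
        return -1;
      t->partial[t->partial_len++] = data[i];
    }
  }
  return 0;  /* "suspended": any partial token is kept for the next call */
}

/* Example consumer: counts tokens and remembers the last length. */
struct token_stats
{
  unsigned int count;
  size_t last_len;
};

static void
count_token (const char *tok, size_t len, void *cls)
{
  struct token_stats *s = cls;

  (void) tok;
  s->count++;
  s->last_len = len;
}
```

Feeding "ab\ncd" and then "e\nf" yields the tokens "ab" and "cde",
with "f" left pending. This variant still copies token bytes into a
small fixed buffer, but it never needs to hold the whole upload.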

> Is there some per-connection buffer
> within the libmicrohttpd package that will contain the entire
> payload as one contiguous section once it has all been received,
> that my application could access instead of spending precious time
> in extra data copying along the way? 

If you configured the IO-Buffer of MHD to be _guaranteed_ to be
sufficiently large for the entire upload, that might indeed work. In
this case you would have to check *upload_data_size against the expected
upload size and "refuse" to process the data until it hits that value.
Note that the current MHD implementation may end up busy waiting in that
case, as IIRC it may call you again if you didn't process all the data,
even in the absence of network traffic (but I'm not sure here).
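
The "refuse until complete" check could look roughly like the sketch
below. It is deliberately free of MHD calls; consume_when_complete and
its checksum "processing" are hypothetical stand-ins, and
expected_size is assumed to have been obtained elsewhere (for example
from the request's Content-Length header). The sketch relies on the
convention that leaving *upload_data_size unchanged signals that
nothing was consumed. The busy-waiting caveat above still applies.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 while the buffer does not yet hold the whole upload
   (nothing is consumed), 1 once the whole body was processed in place.
   The "processing" here is just a byte sum for illustration. */
static int
consume_when_complete (const char *upload_data,
                       size_t *upload_data_size,
                       size_t expected_size,
                       size_t *checksum_out)
{
  size_t sum = 0;

  assert (NULL != upload_data_size);
  if (*upload_data_size < expected_size)
    return 0;  /* leave *upload_data_size untouched: nothing consumed */
  for (size_t i = 0; i < expected_size; i++)
    sum += (unsigned char) upload_data[i];
  *checksum_out = sum;
  *upload_data_size -= expected_size;  /* bytes NOT processed */
  return 1;
}
```

This avoids the extra copy, but only works if the buffer is guaranteed
to be large enough for every upload the server accepts.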

> Or can I provide a buffer in
> response to the first (setup, *con_cls == NULL) call, that would
> tell libmicrohttpd where to directly place the incoming data,
> to avoid such extra copying?

With the current API, that is not possible. However, I would be happy to
review a patch adding this feature. :-)


