bug-mailutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: sieve, and header_t's api


From: Alain Magloire
Subject: Re: sieve, and header_t's api
Date: Sun, 22 Apr 2001 19:03:26 -0400 (EDT)

> 
> So I'm working on getting cmu's sieve implementation working with
> mailutils. It's got a callback structure, wherein it parses and
> evaluates scripts, but you provide the callbacks to get headers,
> message sizes, and perform actions on the message (discard, file to
> another mailbox, etc.). The getheader() cb fn looks like:
> 
> int (*getheader)(void* msg, const char* name, char*** value);
> 
> The idea is a field name can occur multiple times, so this function
> fills *value with a pointer to a null-terminated array of pointers
> to the values found for that field. Seems fine... but there's
> a problem. Who owns the data?
> 
> In cmu sieve, the implementor of getheader() owns the data, so it
> passes pointers to data that it keeps back to the caller. So what
> you do is you build a table of headers that have been asked about,
> and the arrays of all the values for that header that were in the
> message. But you can't call the header api's that get the field
> by name (they'll only return the first value, we want all of them).
> So, what you pretty much have to do is go through the headers
> in order (1, 2, ...) get the value, build a data structure that
> has a node for every field name that was found, and all the values
> for that field name are stored in that node.
> 
> Then you free this thing after the sieve script has run on the message.

Ok.

> This is all fine. Except that as I was looking through the mailbox
> code I see that the header_t *already* appears to have such a
> structure. In other words, it appears that for on-disk mailboxes,
> that the header is parsed completely, and a table of all the values
> found is built. For IMAP it looks like it cache's field values after
> looking them up.

Yes you are correct.  There is 3 approaches:
- header_get_fvalue();  //fast header:
   The concrete mailbox implementation caches header that it knows will
   always be call oftenly.  So for example the on-disk mailboxes, will
   when doing the parsing, always cache internally the headers:
   From, Subject, Date, etc ...  it then overload the function:
        header_set_get_fvalue ();
   this function is always call first by the header_t object.

  Another example, IMAP4 mailboxes could use it as away to "prefetch".
  In IMAP this is done by doing ENVELOPE, anyway it's not implemented 8-)

- header_get_value(); // way of getting fields.
   Retrieve the header, overload appropriately by mailboxes.

- Implement the parsing inside header_t:
  mailboxes like POP3 and on-disk must parse the entire  
  headers anyway, so the code was move inside the header.
  When the values are needed the entire header is sucked in
  inside header_t and parse (For example POP3: "TOP msgno 0\r\n")
  The parsing is done by keeping offsets of where header names and
  values start and end.

Those were the reasons, why I thought we could not use parse822()
inside header_t.

> There are two styles of api to get info that are character strings,
> one allocates a string for you, and passes it back, the other
> takes a pointer to data you allocated, and fills it in.

The allocating ones are convenient functions.  Should we have the same
for address_t also?

address_aget_email()?

> 
> I need a header_t API that passes me back pointers to const data
> that has a lifetime of the header_t. Since there isn't one, I
> have to pull all the header field values out of header_t, and put
> them in my own data structrure, and then query it for header
> values. Basically I build a cache of the header field's and pass
> back pointers into that cache.
> 
> I wouldn't even mention this, except that it appears that inside
> the mailbox API, the mailbox is doing the same thing...

The dangers here of exposing the header internal, in that case the
entire buffer, are:

- There is no real protection in C, you just (char *) cast and
  write to the buffer, which may change the offsets of the parsing
  and the next header_get_value() will go to "lala" land.

- when calling header_set_value (), this function for it needs
  can decide to realloc() the entire buffer (free()/malloc())
  which means that the pointer (const char *buffer) that the application
  is holding is simply gone and will cause a crash.

> The address_t is doing the same thing. The caller wants the value
> of a string. That string *is* already in existence inside the
> address_t. It won't have a lifetime past that of the address_t, and
> it can't be modified, but a lot of callers might just want to get
> a pointer to it so they can print it out, or do a quick test against
> it. They are forced to reallocate for no reason.

Yes, but the same reasonning will apply, at one point will want to
have functions like:
address_set_personal();
address_set_email();
address_set_route();
address_add_address();

To permit clients to construct RFC822 compliant headers, i.e if you
expose the buffer, and someone calls address_set_personal(), what
happens?

> ----
> 
> I'm sorry this was so long-winded, it would be fast with a whiteboard!

It's too bad, GNU mailutils was at a too early stage last year in
Vancouver, in front of sushi everything would have been clearrer 8-)
Brainstorming in email exchange is hell.

> A little, this is just some observations, but I think it would be
> worth considering doing some of the following:
> 
> 1 - use this inside the header_t code, and add APIs to header
> and addres that return pointers to internal data:
> 
> address_get_email_p(address_t a, size_t no, const char** email);
>   // *email will point into the address_t, and they can't change it,
>   // or access it after the address is destroyed.
> 
> header_get_field_values_p(header_t h, const char* name, const char*** values)
>   // similar style to above, but.
> 
> 2 - write a cache_t class, that stores name/value pairs, where there
> can be more one than one value per name. Sieve had this, and I'm
> cleaning it up a bit. Using it I'm implementing an API like -2- suggests.

I actually like cache_t(-2-) because I will have, in the future, to come up
with a "cache mailbox" implementation.  The cache mailbox is a container
that takes another mailbox, but the informations downloaded are cache.
This will permit connectionless operation, hmm not the right term,
disconnected operation?

For example we may cache the entire mailbox from a POP3 or a IMAP4 server,
the connection may go away (timeout, reset by peer, etc ....)
the information is cache to reduce the network traffic and
persistency in the face of failures.  So your cache_t maybe the first
step in this direction 8-)

> I guess that I can do this means that the API is ok, but it seems like
> such a general purpose thing, that I'd like to maybe clean it up, and
> sell it as an adapter to wrap around header_t.
> 
> create_header_cache(header_cache_t* ch, message_t m);
> header_cache_get_field_values_p(header_cache_t ch, const char* name, const 
> char*** values);

Looks cool/interesting.

> ----
> 
> Quick question because I'm lazy: what does IMAP do when you ask
> for the "received" fields?

It returns the "entire" received fields, in one block.
Here is the log of a session:
# telnet localhost imap
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
* OK [CAPABILITY IMAP4 IMAP4REV1 LOGIN-REFERRALS AUTH=LOGIN] localhost 
IMAP4rev1 2000.284 at Sun, 22 Apr 2001 18:17:06 -0400 (EDT)
a00 login alain glory67
* CAPABILITY IMAP4 IMAP4REV1 NAMESPACE IDLE MAILBOX-REFERRALS SCAN SORT 
THREAD=REFERENCES THREAD=ORDEREDSUBJECT MULTIAPPEND
a00 OK LOGIN completed
a00 select inbox
* 508 EXISTS
* 0 RECENT
* OK [UIDVALIDITY 977948196] UID validity status
* OK [UIDNEXT 510] Predicted next UID
* FLAGS (\Answered \Flagged \Deleted \Draft \Seen)
a00 OK [READ-WRITE] SELECT completed
a00 fetch 2 body.peek[header.fields (Received)]
* 2 FETCH (BODY[HEADER.FIELDS ("RECEIVED")] {1541}
Received: from spock2.ECE.McGill.CA (address@hidden [132.206.1.2])
        by qnx.com (8.8.8/8.8.8) with ESMTP id JAA21691
        for <address@hidden>; Tue, 16 May 2000 09:15:27 -0400
Received: from mescaline.gnu.org (mescaline.gnu.org [158.121.106.21])
        by spock2.ECE.McGill.CA  ..
....

> > 
> > The mailbox_t API is lacking API for :
> >   - searching

> 
> > Thinking of the IMAP search command, would it be possible to implement
> > this in term in terms of sieve.  I'm thinking of this because, it may not
> > be necessary to have an interface to the mailbox_t API for search.
> 
> If IMAP searches for messages satisfying a boolean condition, and
> the server builds a sieve script that contains a conditional check
> and runs it against all the messages in a mailbox, it could find
> messages. It might be necessary to add some types of checks and
> comparisons to sieve as extensions (which is allowed) to support
> the kinds of checks IMAP does. Or maybe not. This would be kindof
> cool.

Yes, I was thinking along those lines.

> >   - sorting(threading)
> > Since searching(sorting?) can be express in sieve...
> 
> Not sorting!
> 
> Sorting is just a way of looking at messages. Real sorting is looking
> at the "in-reply-to" field value, and then searching for the message
> that has that message-id. But doing soft threading based on subjects
> (a la how mutt can do, if wanted) is useful. It seems like an api to
> get messages by message-id would be all that's needed. Then you
> could have a mailbox_thread_t that took a mailbox, went through all
> the messages in it, and built the tree, but in each node put only
> the id of the message that node corresponded to, and perhaps some
> common information.

OK.

> This seems really application specific, I think
> it would be a good chunk of code to have, but that it shouldn't
> be part of mailbox, it should be a seperate type that operates on
> a mailbox. Unless IMAP has facitilities to "thread"... then I guess
> it would have to be a part of the mailbox.

Yes, I would have like to keep the sorting/threading outside the mailbox
and not to bloat this object to much and rather have a new object
sorting_t/threading_t or something.  But unfortunately, there are
an extensions to the IMAP rfcs that describe sorting/threading 8)

http://www.imc.org/draft-ietf-imapext-thread
http://www.imc.org/draft-ietf-imapext-sort

But we could have this
mailbox_get_sort (mailbox_t, sorting_t *);

Which return a sorting/threading object hat the concrete mailbox can
implement/overload.

> > This is a little confusing  ... I have to think it over 8-)
> 
> Indeed!

8-)

> 
> Enjoy the sun.

Already done that 8-) and still loving every minute, no snow ... wow!
I forgot what grass looked like.

-- 
au revoir, alain
----
Aussi haut que l'on soit assis, on est toujours assis que sur son cul !!!




reply via email to

[Prev in Thread] Current Thread [Next in Thread]