Re: [Nmh-workers] 1.3 release and Dcc/Bcc behaviour

From: Robert Elz
Subject: Re: [Nmh-workers] 1.3 release and Dcc/Bcc behaviour
Date: Mon, 09 Apr 2007 19:33:42 +0700

    Date:        Mon, 09 Apr 2007 13:38:15 +1000
    From:        Joel Reicher <address@hidden>
    Message-ID:  <address@hidden>

  | I was going to put it in the ChangeLog and give it a somewhat
  | prominent place in the 1.3 release announcement.

I can't convince myself that most people read those, or that many of
the ones who do would understand what it means.

  | I've been trying to think of what might break as a result of changing
  | this default, but can't come up with anything. That's one of the reasons
  | I'm not shy of making the change. Can you think of anything?

Non-unique message IDs are what I expect to happen.

Or perhaps more accurately, message IDs whose uniqueness isn't
something that we have any particular reason to expect.

  | Perhaps you are making a point about the way nmh generates message-ids?
  | Should the algorithm be smarter than it is for us to change the
  | default with a clear conscience?

No, that's not it, the algorithm is fine (look at my message-id header,
you'll see nothing there different from what nmh would generate, as that's
what I use...).    That is, I have no objection at all to the -msgid
option as it now is, only to applying it on hosts that aren't correctly
configured to use it.

You might be right that it should be the site admin (site admin, these
days, on someone's random home PC??) who should fix this, and know how to
fix this, but I just don't see that happening - rather what I see is lots
of other mail admins telling people they don't like MH users, because of
the stupid message-ids they generate.

  | I'm not sure I follow. From what I can see in RFC2822, the only requirement
  | for the message-id is that it be "globally" unique. Using a "proper"
  | hostname is a recommended way of generating such a message-id, but not
  | required. Are you worried about the uniqueness of improper hostnames,
  | or are you saying the hostnames should be proper regardless?

You answered this yourself (correctly) already, so I'll just confirm it.
Uniqueness is what is needed - the hostname issue is that I know of no
way to ensure uniqueness without some kind of global registry - something
unique that we can start from.  For message-IDs, that (and certainly with
the MH algorithm) is the DNS and the hostname.   I guess we could switch to
use IEEE identifiers (these days almost every host that's going to be
running MH will have one).

Statistically likely unique message-ids (ones that we expect to be
unique just because they're big and random) are OK - provided we do
the analysis over all messages, over all time - which is a very very
huge number of messages, all of which should have unique message-ids.  
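
As a rough illustration of why the analysis has to cover all messages over
all time, the usual birthday-bound approximation can be sketched as below.
The function name and the message counts are assumptions chosen for the
example, not measurements of real mail volume.

```python
import math


def collision_probability(n_messages: int, random_bits: int) -> float:
    """Birthday-bound estimate of at least one duplicate among n_messages
    identifiers drawn uniformly at random from 2**random_bits values.

    Uses p ~= 1 - exp(-n(n-1) / (2 * 2**bits)); expm1 keeps precision
    when the probability is tiny.
    """
    space = 2.0 ** random_bits
    return -math.expm1(-n_messages * (n_messages - 1) / (2.0 * space))


# Hypothetical volume: 10**12 messages in total.
p_64_bits = collision_probability(10**12, 64)    # duplicate almost certain
p_128_bits = collision_probability(10**12, 128)  # duplicate very unlikely
```

The point is only that "big and random" has to be sized against the total
number of messages ever generated, not against one site's traffic.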

  | I'm not really sold on the whole "better for the MTA to do it" idea because
  | message-ids are for messages, not email. Usenet posts have message-ids
  | too. Messages and their IDs are a transport-independent idea, as far as
  | I understand it.

Yes, I'm not disputing that it is better that way; in fact, I use MH-generated
message IDs (I explicitly said MH rather than nmh, because I've been doing it
that way since long before nmh existed).   I certainly wouldn't want the
option to vanish.   I just don't believe that it is a good idea to make it
the default without providing some kind of mechanism to actually make sure
that it works for everyone (more or less out of the box works, without
any special config, assuming only "normal" host config.)

I know that message-id generation has been kind of ignored - everyone just
does what they want, and it all just "kind of" works (they tend to be
pretty long, which means clashes are really fairly improbable, no matter
how they're generated, as long as there's anything variable - I know of one
system that deliberately sent the same message-id in every message, but that
was more of a statement about 822, and I think can be ignored).

We (on unix type systems) generally assume that if we include the time
(in any representation we like), the process id to guard against multiple
generations at the same time (to whatever resolution we're using, usually
1 second), and the local hostname (to guard against the same algorithm on
a different host) then we're pretty safe.   But that's not really good
enough - something may be using 20070409120000 as its time representation,
and another host, seconds since some epoch (1176120000 perhaps - standard
unix timestamp for the same (UTC) time).   However, the system using the
"seconds since" will eventually reach the value 20070409120000, and if at
that time (way off into the future) a message-id happens to be generated from
a process with the same process id as the one which generated the
20070409120000 message-id (using it as YYYYMMDDhhmmss), and it just happens
to be running on a host that (at that future time) has the same hostname
as the host (now) which uses epoch offsets, then we have a potential
duplicate, as nothing we were using to "ensure" uniqueness is now unique.
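
A minimal sketch of that scenario, assuming a made-up `<time.pid@host>`
layout (not necessarily the layout nmh or any other program uses), and with
the helper names, process id and hostname all hypothetical:

```python
from datetime import datetime, timezone


def msgid_from_calendar(when: datetime, pid: int, host: str) -> str:
    """Time written as YYYYMMDDhhmmss."""
    return f"<{when:%Y%m%d%H%M%S}.{pid}@{host}>"


def msgid_from_epoch(seconds: int, pid: int, host: str) -> str:
    """Time written as seconds since the unix epoch."""
    return f"<{seconds}.{pid}@{host}>"


when = datetime(2007, 4, 9, 12, 0, 0, tzinfo=timezone.utc)
epoch_seconds = int(when.timestamp())  # 1176120000

# Same instant, same pid, same host: the two schemes differ today.
calendar_id = msgid_from_calendar(when, 1234, "host.example.org")
epoch_id = msgid_from_epoch(epoch_seconds, 1234, "host.example.org")

# Far in the future the epoch counter reaches 20070409120000; with the
# same pid and hostname it reproduces the calendar-style id exactly.
future_epoch_id = msgid_from_epoch(20070409120000, 1234, "host.example.org")

print(calendar_id == epoch_id)         # False
print(calendar_id == future_epoch_id)  # True
```

The clash needs hundreds of thousands of years to come due, so it is the
principle that matters: each field is unique only within one algorithm, and
nothing ties the algorithms together.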

We (the community as a whole, not the nmh project) could reduce the effects
of that by making message-id generation a system function (ie: available in
libc or its equivalent) so there would at least be fewer algorithms to
have to worry about - rather than every application simply inventing its
own without considering how that affects anyone else's algorithm - but we
(the nmh community) don't get to make that happen - the best we can do is
perhaps avoid generating message ids (leave it for some other application,
like the MTA) unless we're told that our algorithm has been investigated
and found satisfactory for the local host (which is essentially what
turning on the -msgid option does).


ps: Apologies for the lengthy ramble - but message-ids have always been one
of my hangups - partly because I actually use the things - send me two
messages with the same message-id, and the later one to arrive will be
discarded as a duplicate (with no other checks whatever...)   For example,
I should have received 2 copies of your message (and most probably did, one
addressed to me, and one via the list) - the list copy however "never arrived"
(which almost certainly means it did arrive, then was discarded as a
duplicate, which, of course, is what it was).
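
The kind of duplicate suppression described here can be sketched roughly as
follows. This is an illustration only, not the actual filter in use; the
function name and the sample messages are hypothetical.

```python
from email import message_from_string


def drop_duplicates(raw_messages):
    """Keep the first message seen for each Message-ID and drop later
    ones, with no other checks. Messages lacking a Message-ID are kept."""
    seen = set()
    kept = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        msgid = (msg.get("Message-ID") or "").strip()
        if msgid and msgid in seen:
            continue  # treated as a duplicate purely on the id
        if msgid:
            seen.add(msgid)
        kept.append(msg)
    return kept


direct_copy = "Message-ID: <1@host.example.org>\nTo: me@example.org\n\nhello\n"
list_copy = "Message-ID: <1@host.example.org>\nTo: list@example.org\n\nhello\n"
kept = drop_duplicates([direct_copy, list_copy])
```

With a filter like this, two unrelated messages that happen to share a
message-id are indistinguishable from a direct copy plus a list copy, and
the second one is silently lost.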
