[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Pan-users] How would I debug Pan (intermittently) sending posts in
From: |
Duncan |
Subject: |
Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate? |
Date: |
Tue, 9 Jul 2013 13:54:16 +0000 (UTC) |
User-agent: |
Pan/0.140 (Chocolate Salty Balls; GIT 368aae4 /usr/src/portage/src/egit-src/pan2) |
Rock posted on Tue, 09 Jul 2013 05:11:42 +0000 as excerpted:
> Have you ever seen this problem (and do you have insight into the
> cause)?
>
> I'm using Pan 0.135, which is the latest easily possible on Centos6
> (long story)
> and, at times, but only with aioe.org:119, I end up with triplicate
> posts, each with its own message id, generally sent within the span of a
> few seconds to a minute, based on the time stamps.
I'm reasonably sure I know the issue, and it's not exactly pan's fault,
as I'll explain.
However, if you like, you can use the "ngrep" tool (homepage http://ngrep/
sourceforge.net , if you'd like to read a bit about it before
installing...), to grep your net connection for packets on port 119, etc,
and see what's actually being sent/received. That's one of the better
net-activity diagnostics I know of, and I too learned about it here. =:^)
> I can generally tell when this is going to happen, because when I hit
> the File->Send Article (or Control Return), Pan, when set up by default
> to not create a message ID, hangs saying "sending", for, oh, about 30
> seconds or so, and then it ends up sending three exact messages all with
> different message ids.
This matches the pattern I'd expect...
> The weird thing is that it doesn't happen frequently (but it's
> embarrassing when it does happen); and that it only happens with
> aioe.org:119 (and, from a look at the event log, in general, it seems to
> happen when there are numerous connection errors with aioe.org such that
> I often resort to setting aioe to 0 connections just to quiet down the
> log file.
So does this...
> Through extensive work on the aioe side, we've ruled out that AIOE is
> creating the three messages (they're actually receiving three separate
> messages) and, in fact, when it happens too much, they ban me for a
> week or two (so there are negative repercussions from what Pan is doing
> wrongly).
... and this...
> What I "think" is happening, is this sequence (but I can't prove it):
> a) Pan experiences intermittent connection errors
I'm with you so far.
> b) Pan sometimes sends single messages, and sometimes sends triplicate
> messages.
Well... sort of. But this is where you go wild, as I expect you're
missing some of the pieces to the puzzle.
> c) The aioe server is the innocent recipient of multiple messages, and
> creates multiple message IDs for those messages
Mostly true (tho the server could certainly try a bit harder to detect
duplicates sent without message-id from the same user within a narrow
timeframe), but misses the point, as you're missing pieces of the puzzle.
> After experiencing this anomaly perhaps a dozen times in the past year,
> I gave up trying to debug (because I just don't know how); and simply
> set Pan so that it created a (hopefully unique) message ID. It took
> about two weeks before aioe let me back on - and then, now, with Pan
> setting unique message IDs, the problem went away. So, in a sense, it's
> 'solved'.
It's worth noting here that in current pan, the option to let the server
assign message-id is gone. Pan always assigns a message-id now, but does
have an option to let you set the domain used in the message-id if you
like. Perhaps this sort of trouble is one reason why that option
disappeared.
> But, I ask, in the general sense: Have you seen something similar?
I'm not sure I've ever seen this personally, but I believe I've seen it
reported on the list a few times, and I understand enough about NNTP to
have a pretty good idea of what's happening.
If you like, you can have a look at RFC 3977, Network News Transfer
Protocol (NNTP) (dated October 2006 and obsoleting early RFC 977 while
updating RFC 2980, with a further update as RFC 6048, Nov 2010, altho
that updates the NNTP LIST command only and thus isn't of interest to us):
http://tools.ietf.org/html/rfc3977
Of interest to us is section 6.3, Article Posting, and in particular,
6.3.1, the POST command. Direct link to that:
http://tools.ietf.org/html/rfc3977#section-6.3.1
Basically, when a post is sent, the server MUST return a response code
indicating success (response code 240) or failure (probably 441). (Such
codes often have a plain English description appended as well, for use by
humans doing a manual telnet connection and/or for troubleshooting, but
the numeric code is what a news client will be looking for.) A 340 code
is the "OK to send article" response after the initial POST command,
before the actual article content is posted.
As covered in section 3.2, response codes...
http://tools.ietf.org/html/rfc3977#section-3.2
For the first digit, 2xx series responses indicate command completed OK,
4xx series indicate syntax correct but failed for some reason, and 3xx
indicates OK so far, send the rest. The second digit indicates the
function response, with x4x being the posting function. There's posting
examples at the end of section 6.3.1. =:^)
Back to section 6.3.1, the POST command, now quoting from the RFC:
>>>>
Following the presentation of the termination sequence by the client,
the server MUST return a response indicating success or failure of the
article transfer. [...]
A response of 240 SHOULD indicate that, barring unforeseen server errors,
the posted article will be made available on the server and/or
transferred to other servers, as appropriate, possibly following further
processing. In other words, articles not wanted by the server SHOULD be
rejected with a 441 response, rather than being accepted and then
discarded silently. However, the client SHOULD NOT assume that the
article has been successfully transferred unless it receives an
affirmative response from the server and SHOULD NOT assume that it is
being made available to other clients without explicitly checking (for
example, using the STAT command).
If the session is interrupted before the response is received, it is
possible that an affirmative response was sent but has been lost.
Therefore, in any subsequent session, the client SHOULD either check
whether the article was successfully posted before resending or ensure
that the server will allocate the same message-id to the new attempt (see
Appendix A.2). The latter approach is preferred since the article might
not have been made available for reading yet (for example, it may have to
go through a moderation process).
<<<<
So basically, what pan's doing is waiting for that 240/441 response and
acting accordingly.
It's worth noting here that NNTP is a TCP based protocol, and TCP is
connection-based, so if a packet is lost, there will be several
retransmit requests, before finally giving up on that connection as it
eventually times out at one end or the other, or is RESET.
Thus the hangs are pan waiting for that response. If the connection is
bad and it doesn't get it, there will be retransmission attempts, but if
the connection is bad enough that they fail to reach pan as well... that
last paragraph kicks in.
Of course if pan's not setting the message-ids but instead letting the
server do it, both it and the server have far less to go on when they try
to pick up the pieces from the lost connection. As you can see in the
quote, ensuring the same message-id is the recommended recovery option,
and of course pan can't do that if it didn't set the message-id and thus
doesn't know it.
Thus as I said, this very issue very likely has something to do with why
current pan always sets a message-id, but does allow the user to choose
the domain name that appears as part of that message-id, if desired.
So...
I'd /bet/ that both pan and the server are observing at least the letter
of the RFCs (the MUSTs), but not necessarily the spirit (as seen in the
SHOULDs). The primary culprit here is the bad network conditions, but
either side could do better. The server could do better by applying
common-sense checks to messages posted within seconds of each other,
particularly if they do not include the message-id, since in that case
obviously the client isn't observing the recommended SHOULD, and is thus
suspect. The client, pan, could do better by following the
recommendation to ensure the same message-id, basically by assigning the
message-id on its own, as is apparently still an option in the outdated
version you're using, but is simply done without giving the user the
choice, in current pan.
So for your older pan version, you've "solved" the issue the RFC-
recommended way, by ensuring that the same message-id gets used, by
having pan set it. =:^)
> Do you know how to debug why Pan is sending messages in triplicate?
As I mentioned up top, the ngrep (network grep) tool is very useful for
troubleshooting actual network traffic issues such as this, letting you
grep the specific connection via IP and port, as well as regex-matching
specific packets if you have a better idea what you're looking for than
simply "everything from that connection."
Here, for example, you could ngrep the connection (by IP and port) for
POST, and for result codes of 240, 340 and 441.
But I think the RFC and the behavior you see are clear enough that you
shouldn't actually need to do that for this problem, tho of course
following the wire-line logs of an actual session is likely to be quite
educational and is thus something I'd recommend in its own right. =:^)
FWIW, I used ngrep here, when configuring pan for the new header
compression as well as the secure connections features, to verify that it
was working as intended. (Obviously, ngrep won't yield much of use from
the content of the secure connection except for verifying that it's no
longer clear-text, so I set that up last, but the TCP packet headers will
still be in the clear, or the packets couldn't be routed.) Your pan is
of course too old to have those features but it's still worth doing an
ngrep on the traffic at least once, just to see how the protocol actually
works. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
- [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/09
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?,
Duncan <=
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/10
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/10
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/10
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Duncan, 2013/07/10
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/10
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/10
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/10
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/10
- Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?, Rock, 2013/07/10