pan-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Pan-users] How would I debug Pan (intermittently) sending posts in


From: Duncan
Subject: Re: [Pan-users] How would I debug Pan (intermittently) sending posts in triplicate?
Date: Tue, 9 Jul 2013 13:54:16 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT 368aae4 /usr/src/portage/src/egit-src/pan2)

Rock posted on Tue, 09 Jul 2013 05:11:42 +0000 as excerpted:

> Have you ever seen this problem (and do you have insight into the
> cause)?
> 
> I'm using Pan 0.135, which is the latest easily possible on Centos6
> (long story)
> and, at times, but only with aioe.org:119, I end up with triplicate
> posts, each with its own message id, generally sent within the span of a
> few seconds to a minute, based on the time stamps.

I'm reasonably sure I know the issue, and it's not exactly pan's fault, 
as I'll explain.

However, if you like, you can use the "ngrep" tool (homepage http://ngrep/
sourceforge.net , if you'd like to read a bit about it before 
installing...), to grep your net connection for packets on port 119, etc, 
and see what's actually being sent/received.  That's one of the better 
net-activity diagnostics I know of, and I too learned about it here. =:^)

> I can generally tell when this is going to happen, because when I hit
> the File->Send Article (or Control Return), Pan, when set up by default
> to not create a message ID, hangs saying "sending", for, oh, about 30
> seconds or so, and then it ends up sending three exact messages all with
> different message ids.

This matches the pattern I'd expect...

> The weird thing is that it doesn't happen frequently (but it's
> embarrassing when it does happen); and that it only happens with
> aioe.org:119 (and, from a look at the event log, in general, it seems to
> happen when there are numerous connection errors with aioe.org such that
> I often resort to setting aioe to 0 connections just to quiet down the
> log file.

So does this...

> Through extensive work on the aioe side, we've ruled out that AIOE is
> creating the three messages (they're actually receiving three separate
> messages) and, in fact, when it happens too much, they ban me for a
> week or two (so there are negative repercussions from what Pan is doing
> wrongly).

... and this...

> What I "think" is happening, is this sequence (but I can't prove it):
> a) Pan experiences intermittent connection errors

I'm with you so far.

> b) Pan sometimes sends single messages, and sometimes sends triplicate
> messages.

Well... sort of.  But this is where you go wild, as I expect you're 
missing some of the pieces to the puzzle.

> c) The aioe server is the innocent recipient of multiple messages, and
> creates multiple message IDs for those messages

Mostly true (tho the server could certainly try a bit harder to detect 
duplicates sent without message-id from the same user within a narrow 
timeframe), but misses the point, as you're missing pieces of the puzzle.

> After experiencing this anomaly perhaps a dozen times in the past year,
> I gave up trying to debug (because I just don't know how); and simply
> set Pan so that it created a (hopefully unique) message ID. It took
> about two weeks before aioe let me back on - and then, now, with Pan
> setting unique message IDs, the problem went away. So, in a sense, it's
> 'solved'.

It's worth noting here that in current pan, the option to let the server 
assign message-id is gone.  Pan always assigns a message-id now, but does 
have an option to let you set the domain used in the message-id if you 
like.  Perhaps this sort of trouble is one reason why that option 
disappeared.

> But, I ask, in the general sense: Have you seen something similar?

I'm not sure I've ever seen this personally, but I believe I've seen it 
reported on the list a few times, and I understand enough about NNTP to 
have a pretty good idea of what's happening.

If you like, you can have a look at RFC 3977, Network News Transfer 
Protocol (NNTP) (dated October 2006 and obsoleting early RFC 977 while 
updating RFC 2980, with a further update as RFC 6048, Nov 2010, altho 
that updates the NNTP LIST command only and thus isn't of interest to us):

http://tools.ietf.org/html/rfc3977

Of interest to us is section 6.3, Article Posting, and in particular, 
6.3.1, the POST command.  Direct link to that:

http://tools.ietf.org/html/rfc3977#section-6.3.1


Basically, when a post is sent, the server MUST return a response code 
indicating success (response code 240) or failure (probably 441).  (Such 
codes often have a plain English description appended as well, for use by 
humans doing a manual telnet connection and/or for troubleshooting, but 
the numeric code is what a news client will be looking for.)  A 340 code 
is the "OK to send article" response after the initial POST command, 
before the actual article content is posted.

As covered in section 3.2, response codes...

http://tools.ietf.org/html/rfc3977#section-3.2

For the first digit, 2xx series responses indicate command completed OK, 
4xx series indicate syntax correct but failed for some reason, and 3xx 
indicates OK so far, send the rest.  The second digit indicates the 
function response, with x4x being the posting function.  There's posting 
examples at the end of section 6.3.1. =:^)


Back to section 6.3.1, the POST command, now quoting from the RFC:

>>>>

Following the presentation of the termination sequence by the client,
the server MUST return a response indicating success or failure of the 
article transfer. [...]

A response of 240 SHOULD indicate that, barring unforeseen server errors, 
the posted article will be made available on the server and/or 
transferred to other servers, as appropriate, possibly following further 
processing.  In other words, articles not wanted by the server SHOULD be 
rejected with a 441 response, rather than being accepted and then 
discarded silently.  However, the client SHOULD NOT assume that the 
article has been successfully transferred unless it receives an 
affirmative response from the server and SHOULD NOT assume that it is 
being made available to other clients without explicitly checking (for 
example, using the STAT command).

If the session is interrupted before the response is received, it is
possible that an affirmative response was sent but has been lost. 
Therefore, in any subsequent session, the client SHOULD either check 
whether the article was successfully posted before resending or ensure 
that the server will allocate the same message-id to the new attempt (see 
Appendix A.2).  The latter approach is preferred since the article might 
not have been made available for reading yet (for example, it may have to 
go through a moderation process).

<<<<

So basically, what pan's doing is waiting for that 240/441 response and 
acting accordingly.

It's worth noting here that NNTP is a TCP based protocol, and TCP is 
connection-based, so if a packet is lost, there will be several 
retransmit requests, before finally giving up on that connection as it 
eventually times out at one end or the other, or is RESET.

Thus the hangs are pan waiting for that response.  If the connection is 
bad and it doesn't get it, there will be retransmission attempts, but if 
the connection is bad enough that they fail to reach pan as well... that 
last paragraph kicks in.

Of course if pan's not setting the message-ids but instead letting the 
server do it, both it and the server have far less to go on when they try 
to pick up the pieces from the lost connection.  As you can see in the 
quote, ensuring the same message-id is the recommended recovery option, 
and of course pan can't do that if it didn't set the message-id and thus 
doesn't know it.

Thus as I said, this very issue very likely has something to do with why 
current pan always sets a message-id, but does allow the user to choose 
the domain name that appears as part of that message-id, if desired.


So...

I'd /bet/ that both pan and the server are observing at least the letter 
of the RFCs (the MUSTs), but not necessarily the spirit (as seen in the 
SHOULDs).  The primary culprit here is the bad network conditions, but 
either side could do better.  The server could do better by applying 
common-sense checks to messages posted within seconds of each other, 
particularly if they do not include the message-id, since in that case 
obviously the client isn't observing the recommended SHOULD, and is thus 
suspect.  The client, pan, could do better by following the 
recommendation to ensure the same message-id, basically by assigning the 
message-id on its own, as is apparently still an option in the outdated 
version you're using, but is simply done without giving the user the 
choice, in current pan.

So for your older pan version, you've "solved" the issue the RFC-
recommended way, by ensuring that the same message-id gets used, by 
having pan set it. =:^)

> Do you know how to debug why Pan is sending messages in triplicate?

As I mentioned up top, the ngrep (network grep) tool is very useful for 
troubleshooting actual network traffic issues such as this, letting you 
grep the specific connection via IP and port, as well as regex-matching 
specific packets if you have a better idea what you're looking for than 
simply "everything from that connection."

Here, for example, you could ngrep the connection (by IP and port) for 
POST, and for result codes of 240, 340 and 441.


But I think the RFC and the behavior you see are clear enough that you 
shouldn't actually need to do that for this problem, tho of course 
following the wire-line logs of an actual session is likely to be quite 
educational and is thus something I'd recommend in its own right. =:^)

FWIW, I used ngrep here, when configuring pan for the new header 
compression as well as the secure connections features, to verify that it 
was working as intended.   (Obviously, ngrep won't yield much of use from 
the content of the secure connection except for verifying that it's no 
longer clear-text, so I set that up last, but the TCP packet headers will 
still be in the clear, or the packets couldn't be routed.)  Your pan is 
of course too old to have those features but it's still worth doing an 
ngrep on the traffic at least once, just to see how the protocol actually 
works. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




reply via email to

[Prev in Thread] Current Thread [Next in Thread]