From: Simon Goldschmidt
Subject: [lwip-devel] [bug #24212] Deadlocked tcp_retransmit due to exceeded pcb->cwnd
Date: Sat, 06 Sep 2008 08:44:54 +0000
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; de; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1

URL:
  <http://savannah.nongnu.org/bugs/?24212>

                 Summary: Deadlocked tcp_retransmit due to exceeded pcb->cwnd
                 Project: lwIP - A Lightweight TCP/IP stack
            Submitted by: goldsimon
            Submitted on: Sat 06 Sep 2008 08:44:51 GMT
                Category: TCP
                Severity: 4 - Important
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: 
            lwIP version: 1.3.0

    _______________________________________________________

Details:

Hans-Joerg Wagner found this, posted on lwip-users:


Hi there

Kieran asked me to do some further investigation of the topic
"Deadlocked tcp_retransmit due to exceeded pcb->cwnd" (see
http://lists.gnu.org/archive/html/lwip-users/2008-07/msg00098.html).

With some "segment loss emulation" code I was able to reproduce the deadlock
frequently.

My summary first:
It is not a high ACK loss problem. The congestion window test in tcp_output()
fails because the unacked queue gets misordered in the situation I describe
below. This is a severe bug in my opinion and I have no idea how to solve it
properly (I did write a workaround, but it is not a clean solution). I could
imagine that some ghost-like troubles of other lwIP users are caused by this
bug too.

The "segment loss emulation"
To gulp only one segment is not sufficient to reproduce the problem
frequently. Therefore I decided to gulp the first retransmission of every 10th
segment too. And here we are:

/* begin code snippet */

//variables used by the emulation (declared here for completeness):
static u32_t packets_sent; //counts the segments passed to tcp_output_segment()
static u32_t kept_seqno;   //sequence number of the currently gulped segment

static void
tcp_output_segment(struct tcp_seg *seg, struct tcp_pcb *pcb)
{
    //... Some statements ...

    packets_sent++;
    if ((packets_sent % 10) == 0) {
        //we enter here on every 10th segment and gulp it
        //(we omit the call to ip_output())
        //we keep the sequence number of the gulped segment
        kept_seqno = seg->tcphdr->seqno;
    }
    else if (kept_seqno == seg->tcphdr->seqno) {
        //if the gulped segment gets retransmitted the first
        //time, we gulp it once again
        kept_seqno = 0;
    }
    else {
        //in every other case we do normal output
        ip_output(seg->p, &(pcb->local_ip), &(pcb->remote_ip), pcb->ttl,
                  pcb->tos, IP_PROTO_TCP);
    }

    // ... Some statements ...
}

/* end of code snippet */

And this is what's happening on the "ether". In my queue representation I use
the sequence numbers of the segments (not tcp_seg pointers). The sequence
numbers are taken from the last traces I made on our GPRS system.

1. Segment 8720:10085 was the last acknowledged segment from our GPRS remote
   peer.
2. Segments 10085 to 14183 get enqueued by the local application; the unsent
   queue is as follows:
   unsent->10085->11453->12818->14183
3. Segment 10085 should be the next "in-sequence" segment to be sent, however
   the gulp mechanism of our local peer emulates a segment loss. The queues
   are as follows:
   unsent->11453->12818->14183
   unacked->10085
4. Due to the available congestion window (cwnd), segment 11453 is sent (not
   gulped).
   unsent->12818->14183
   unacked->10085->11453
5. Due to the available congestion window (cwnd), segment 12818 is sent (not
   gulped).
   unsent->14183
   unacked->10085->11453->12818
6. Due to the available congestion window (cwnd), segment 14183 is sent (not
   gulped).
   unsent->empty
   unacked->10085->11453->12818->14183
7. Due to the high round trip time in the GPRS network, we get the first
   dupack for segment 10085 from our remote peer.
8. We get the second dupack for 10085.
9. We get the third dupack for 10085. According to RFC2581 we shall start a
   fast retransmission now.
10. For the fast retransmission, tcp_process() calls tcp_receive(), which
    calls tcp_rexmit(), which calls tcp_output().
11. Because tcp_output() was invoked from within tcp_input(), it bails out on
    if (tcp_input_pcb == pcb)
    ==> !!! This violates RFC2581 IMHO !!!
12. But tcp_rexmit() has already tinkered with our queues by moving the first
    unacked segment to the unsent queue:
    unsent->10085
    unacked->11453->12818->14183
13. The next few output attempts bail out in tcp_output() due to the nagle
    algorithm (tcp_do_output_nagle()). Thus nothing more happens until a
    retransmission timeout occurs.
14. tcp_slowtmr() requires a retransmission (pcb->rtime >= pcb->rto). This
    shrinks the congestion window down to the maximum segment size (1390 in
    my case).
    BTW: The retransmission is triggered by segment 14183 and not by 10085 in
    this case, which is an aftereffect of the underlying bug IMHO.
15. tcp_slowtmr() calls tcp_rexmit_rto(). The rto function moves all unacked
    segments to the head of the unsent queue. This is the final step causing
    the deadlock in tcp_output(), because the smallest sequence number is now
    at the end of the queue:
    unsent->11453->12818->14183->10085
16. tcp_output() is finally called. Instead of retransmitting 10085 it tries
    to retransmit 11453, but fails on
    (seg->tcphdr->seqno - pcb->lastack + seg->len > wnd), because
    seg->tcphdr->seqno = 11453
    pcb->lastack = 10085
    seg->len = 12818-11453 = 1365
    wnd = 1390
    11453-10085+1365 = 2733, which is greater than wnd = 1390, so the segment
    is not sent.
17. From now on we have a deadlock, because the queue stays misordered and
    tcp_output() always fails on this test (see the sketch after this list).
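
To make the arithmetic of steps 15 to 17 concrete, here is a minimal
stand-alone sketch. It is not lwIP code: struct seg, main() and the segment
lengths for 10085 and 14183 are simplified stand-ins derived from the trace,
and the check mirrors the window test at the head of tcp_output()'s loop.

/* begin sketch (simplified stand-ins, not lwIP code) */

#include <stdio.h>

typedef unsigned int u32_t;

struct seg {
  u32_t seqno;      /* sequence number of the segment */
  u32_t len;        /* payload length of the segment  */
  struct seg *next;
};

int main(void)
{
  /* unsent queue after step 15: 11453->12818->14183->10085 */
  struct seg s10085 = { 10085, 1368, NULL };
  struct seg s14183 = { 14183, 1365, &s10085 };  /* length assumed */
  struct seg s12818 = { 12818, 1365, &s14183 };
  struct seg s11453 = { 11453, 1365, &s12818 };
  struct seg *unsent = &s11453;

  u32_t lastack = 10085;  /* pcb->lastack */
  u32_t wnd     = 1390;   /* window after the RTO shrank cwnd to one MSS */

  /* The same check is applied to the head of the unsent queue: as long as
     the head does not fit into the window, nothing is sent, and 10085 can
     never move back to the head. */
  if (unsent->seqno - lastack + unsent->len > wnd) {
    printf("head %u blocked: %u > %u => deadlock\n",
           unsent->seqno, unsent->seqno - lastack + unsent->len, wnd);
  } else {
    printf("head %u would be sent\n", unsent->seqno);
  }
  return 0;
}

/* end sketch */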

I needed a quick fix for our project, so I reorder the queue before the while
loop in tcp_output(). However, this is just a quick fix that fights the
symptoms. Therefore I ask for other suggestions or perhaps a patch.

Remark: This situation may be hard to reproduce. Without the "segment loss
emulation", the deadlock only occurred with very particular memory (pbuf etc.)
configurations and when using the GPRS network with its high round trip
delays and relays.

/* begin of quick fix */

  /* in tcp_output(), before the while loop: reorder the unsent queue so that
     the segment with the smallest sequence number is at its head again */
  tcp_reorder_segments(&seg);

  while ((seg != NULL) &&
         (seg->tcphdr->seqno - pcb->lastack + seg->len > wnd))
  {
     //.... Some statements ...//
  }

void tcp_reorder_segments(struct tcp_seg **seg_ptr)
{
    struct tcp_seg *left_seg;
    struct tcp_seg *right_seg;
    struct tcp_seg *head_seg;

    /* nothing to do for an empty or single-segment queue */
    if (*seg_ptr == NULL)
    {
        return;
    }
    if ((*seg_ptr)->next == NULL)
    {
        return;
    }
    left_seg = *seg_ptr;
    head_seg = *seg_ptr;
    right_seg = (*seg_ptr)->next;
    while (right_seg != NULL)
    {
        /* move any segment with a smaller sequence number than the current
           head to the front of the queue (note: a plain '<' does not handle
           sequence number wraparound) */
        if (right_seg->tcphdr->seqno < head_seg->tcphdr->seqno)
        {
            left_seg->next = right_seg->next;
            right_seg->next = head_seg;
            head_seg = right_seg;
            if (left_seg->next == NULL)
            {
                break;
            }
        }
        else
        {
            left_seg = right_seg;
        }
        right_seg = left_seg->next;
    }
    *seg_ptr = head_seg;
}

/* end of quick fix */
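
To check that the quick fix brings the smallest sequence number back to the
head, the reorder function can be exercised with a small stand-alone harness.
The struct definitions below are simplified stand-ins for the real
struct tcp_seg / struct tcp_hdr (only the fields the function touches), so
this is a sketch of a test, not lwIP code.

/* begin test harness sketch (simplified stand-ins, not lwIP code) */

#include <stdio.h>

typedef unsigned int u32_t;

struct tcp_hdr { u32_t seqno; };
struct tcp_seg { struct tcp_seg *next; struct tcp_hdr *tcphdr; };

void tcp_reorder_segments(struct tcp_seg **seg_ptr); /* the quick fix above */

int main(void)
{
  /* misordered unsent queue from step 15: 11453->12818->14183->10085 */
  struct tcp_hdr h1 = { 11453 }, h2 = { 12818 }, h3 = { 14183 }, h4 = { 10085 };
  struct tcp_seg s4 = { NULL, &h4 };
  struct tcp_seg s3 = { &s4, &h3 };
  struct tcp_seg s2 = { &s3, &h2 };
  struct tcp_seg s1 = { &s2, &h1 };
  struct tcp_seg *unsent = &s1;
  struct tcp_seg *s;

  tcp_reorder_segments(&unsent);

  /* expected output: 10085 11453 12818 14183 - the lowest sequence number is
     at the head again, so the window check in tcp_output() can pass */
  for (s = unsent; s != NULL; s = s->next) {
    printf("%u ", s->tcphdr->seqno);
  }
  printf("\n");
  return 0;
}

/* end test harness sketch */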

Kind regards
Hans-Joerg Wagner B.Sc.EE / PGDip. SE




    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/bugs/?24212>

_______________________________________________
  Message sent via Savannah
  http://savannah.nongnu.org/




