[mailto:address@hidden] On Behalf Of Bill
Sent: Friday, April 24, 2009 13:01
Subject: [lwip-devel] Curious struct packing issue - is it GCC?
I’m trying to inline SMEMCPY. I see it’s used only with lengths of 4, 6,
18, 20 and 28. By inlining it with byte copies, I see over a 5% increase in
outbound bandwidth (that’s all I’m focused on right now). As has come up
this week, calling memcpy to copy 4 or 6 bytes is very silly (IMO).
I thought I would copy a u16_t at a time, halving the number of copies,
since everything is either u32_t aligned for my processor (NIOS II, GCC as
mentioned) or *should* be u16_t aligned for IP-related items. Everything
*is* u16_t aligned except one item: hwaddr in struct netif.
struct netif does *not* have packing around its definition but, curiously,
it *does* include packed struct members. Have I found a problem where GCC
carries the packing override of an included packed struct through the
remainder of struct netif?
If I delete hwaddr_len from struct netif and then replace the only 2 uses
of it in dhcp.c (netif->hwaddr_len) with ETHARP_HWADDR_LEN, the remainder
of netif is properly aligned. Since we use ETHARP_HWADDR_LEN everywhere
else in lwIP, why keep it in the netif, especially if it’s used only 2
times in DHCP?
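
One way to see where hwaddr lands is to check member offsets with offsetof.
The sketch below uses two hypothetical stand-in structs (demo_u8_len and
demo_u32_len, not the real struct netif) and assumes a u8_t length sits
directly in front of the u8_t hwaddr array. If that matches your tree, the
odd offset could come from ordinary member layout rather than from a
packing override, but that is an assumption to verify, not a confirmed
diagnosis.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HWADDR_MAX 6  /* hypothetical size of the hardware address */

/* Hypothetical stand-in: a u8_t length directly before the u8_t array. */
struct demo_u8_len {
  void    *state;                     /* pointer-aligned member before the address fields */
  uint8_t  hwaddr_len;
  uint8_t  hwaddr[DEMO_HWADDR_MAX];
  uint16_t mtu;
};

/* Same layout with the length widened to 32 bits. */
struct demo_u32_len {
  void    *state;
  uint32_t hwaddr_len;
  uint8_t  hwaddr[DEMO_HWADDR_MAX];
  uint16_t mtu;
};

/* Returns nonzero if a byte offset is a multiple of 2. */
static int is_u16_aligned(size_t offset)
{
  return (offset % 2u) == 0u;
}

/* Offset of hwaddr in each variant, as laid out by the compiler in use. */
static size_t hwaddr_offset_u8(void)
{
  return offsetof(struct demo_u8_len, hwaddr);
}

static size_t hwaddr_offset_u32(void)
{
  return offsetof(struct demo_u32_len, hwaddr);
}
```

On the usual ABIs where pointers are 4 or 8 bytes, hwaddr_offset_u8() comes
out odd and hwaddr_offset_u32() comes out even. Running the same check
against the real struct netif would show whether packing is involved at all.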
Also, in icmp.c
*)q->payload + sizeof(struct icmp_dur_hdr),
IP_HLEN + ICMP_DEST_UNREACH_DATASIZE);
this can be changed to MEMCPY since I don’t think any compiler will inline
a 28-byte copy, and this call is only in the icmp_dest_unreach function. If
you do this, SMEMCPY is used only with 4, 6, 18 and 20 bytes. The following
macro does well if anyone wants to experiment with it. With GCC, only the
code needed to copy the 2, 3, 9 or 10 u16_t’s is generated. (Interestingly,
MSC++ 2003 doesn’t eliminate unreachable code!)
((l >  1) ? *((u16_t *)(d)+0) = *((u16_t *)(s)+0),\
 (l >  3) ? *((u16_t *)(d)+1) = *((u16_t *)(s)+1),\
 (l >  5) ? *((u16_t *)(d)+2) = *((u16_t *)(s)+2),\
 (l >  7) ? *((u16_t *)(d)+3) = *((u16_t *)(s)+3),\
 (l >  9) ? *((u16_t *)(d)+4) = *((u16_t *)(s)+4),\
 (l > 11) ? *((u16_t *)(d)+5) = *((u16_t *)(s)+5),\
 (l > 13) ? *((u16_t *)(d)+6) = *((u16_t *)(s)+6),\
 (l > 15) ? *((u16_t *)(d)+7) = *((u16_t *)(s)+7),\
 (l > 17) ? *((u16_t *)(d)+8) = *((u16_t *)(s)+8),\
 (l > 19) ? *((u16_t *)(d)+9) = *((u16_t *)(s)+9)\
 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0)
This was tested for DHCP, UDP and TCP on a processor where it fails if
alignment isn’t observed. That is, it works only after I removed hwaddr_len
from netif (or you can simply make it a u32_t).
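
For anyone who wants to try the idea outside lwIP, here is a self-contained
sketch of the same nested-conditional copy. The macro name COPY_U16_SKETCH
and the two helper functions are made up for this example, u16_t is
typedef'd locally so the file compiles on its own, and both buffers are
assumed to be u16_t aligned. Casting arbitrary struct memory to u16_t * can
also run into strict-aliasing rules, so treat this as an experiment rather
than a drop-in replacement.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t u16_t;  /* local stand-in for lwIP's u16_t */

/* Hypothetical name. d and s must be u16_t aligned, and l should be a
 * compile-time constant so the untaken branches can be optimized away.
 * Copies l/2 halfwords, for even l up to 20. */
#define COPY_U16_SKETCH(d, s, l) \
  (((l) >  1) ? *((u16_t *)(d) + 0) = *((const u16_t *)(s) + 0), \
   ((l) >  3) ? *((u16_t *)(d) + 1) = *((const u16_t *)(s) + 1), \
   ((l) >  5) ? *((u16_t *)(d) + 2) = *((const u16_t *)(s) + 2), \
   ((l) >  7) ? *((u16_t *)(d) + 3) = *((const u16_t *)(s) + 3), \
   ((l) >  9) ? *((u16_t *)(d) + 4) = *((const u16_t *)(s) + 4), \
   ((l) > 11) ? *((u16_t *)(d) + 5) = *((const u16_t *)(s) + 5), \
   ((l) > 13) ? *((u16_t *)(d) + 6) = *((const u16_t *)(s) + 6), \
   ((l) > 15) ? *((u16_t *)(d) + 7) = *((const u16_t *)(s) + 7), \
   ((l) > 17) ? *((u16_t *)(d) + 8) = *((const u16_t *)(s) + 8), \
   ((l) > 19) ? *((u16_t *)(d) + 9) = *((const u16_t *)(s) + 9)  \
   : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0)

/* Copies 6 bytes (the size of an Ethernet address) and returns 1 if exactly
 * those 6 bytes were written. u16_t arrays give the required alignment. */
static int copy6_ok(void)
{
  u16_t src[10], dst[10];

  memset(src, 0xAB, sizeof src);
  memset(dst, 0x00, sizeof dst);
  (void)COPY_U16_SKETCH(dst, src, 6);
  return memcmp(dst, src, 6) == 0 && dst[3] == 0;
}

/* Copies 20 bytes (the size of an IP header without options) and returns 1
 * if the whole range matches the source. */
static int copy20_ok(void)
{
  u16_t src[10], dst[10];
  unsigned i;

  for (i = 0; i < 10; i++) {
    src[i] = (u16_t)(i + 1);
    dst[i] = 0;
  }
  (void)COPY_U16_SKETCH(dst, src, 20);
  return memcmp(dst, src, 20) == 0;
}
```

Comparing the generated code for the two helpers at your usual optimization
level is a quick way to see whether your compiler drops the untaken
branches.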