From: Grubb, Jared
Subject: RE: [lwip-devel] Curious struct packing issue - is it GCC?
Date: Fri, 24 Apr 2009 13:45:47 -0700

After thinking some more, you MIGHT be able to squeeze out a few
more bytes if you declare the pointers as restrict-qualified pointers.
Depending on the context, this could give the compiler more freedom. This
(I think) also requires that SRC and DEST do not overlap. Something along
the lines of (note the “restrict” keyword):

    #define SMEMCPY(DEST, SRC, LEN) \
      do { \
        const size_t len = (LEN); \
        u16_t * restrict dest = (u16_t*)(DEST); \
        u16_t * restrict src = (u16_t*)(SRC); \
        assert( len%2==0 && len<21 ); \
        if (len > 1) dest[0] = src[0]; \
        if (len > 3) dest[1] = src[1]; \
        if (len > 5) dest[2] = src[2]; \
        if (len > 7) dest[3] = src[3]; \
        if (len > 9) dest[4] = src[4]; \
        if (len > 11) dest[5] = src[5]; \
        if (len > 13) dest[6] = src[6]; \
        if (len > 15) dest[7] = src[7]; \
        if (len > 17) dest[8] = src[8]; \
        if (len > 19) dest[9] = src[9]; \
      } while (0)

From: address@hidden
[mailto:address@hidden] On Behalf Of Grubb, Jared

A couple of minor suggestions: maybe add an assert( l % 2 == 0
&& l < 21 ), since otherwise your code would silently not handle
these cases if someone adds code and forgets about these limitations.
Also, perhaps change the variable names, because “l” looks a lot
like “1”, and “l>3” looks kind of funny in some
fonts.
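
As a rough sketch of how the two suggestions above (restrict-qualified pointers plus an assert on the length) might be combined, the same idea can be written as a static inline function instead of a macro. The name smemcpy_u16 and the u16_t typedef below are assumptions made for illustration and are not part of lwIP. C99 is assumed for restrict and inline, and both buffers are assumed to be 2-byte aligned and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16_t; /* stand-in for lwIP's u16_t (assumption) */

/* Hypothetical helper, not an lwIP function: copies an even number of
 * bytes (at most 20) as 16-bit words. The restrict qualifiers promise
 * the compiler that the two buffers do not overlap. */
static inline void smemcpy_u16(void *restrict dest_v,
                               const void *restrict src_v, size_t len)
{
  u16_t *restrict dest = (u16_t *)dest_v;
  const u16_t *restrict src = (const u16_t *)src_v;

  /* Catch odd or oversized lengths instead of silently truncating. */
  assert(len % 2 == 0 && len < 21);

  if (len > 1)  dest[0] = src[0];
  if (len > 3)  dest[1] = src[1];
  if (len > 5)  dest[2] = src[2];
  if (len > 7)  dest[3] = src[3];
  if (len > 9)  dest[4] = src[4];
  if (len > 11) dest[5] = src[5];
  if (len > 13) dest[6] = src[6];
  if (len > 15) dest[7] = src[7];
  if (len > 17) dest[8] = src[8];
  if (len > 19) dest[9] = src[9];
}
```

A call such as smemcpy_u16(dst, src, 6) with a constant length gives the compiler the chance to drop the branches that cannot be taken, much as with the macro. Whether it actually saves bytes over the macro would need to be measured on the target.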
From: address@hidden [mailto:address@hidden] On Behalf Of Bill Auerbach

I’m trying to inline SMEMCPY. I see it’s used only with lengths of
4, 6, 18, 20 and 28. By inlining it with byte copies, I see over a 5%
increase in outbound bandwidth (that’s all I’m focused on right now).
As has come up this week, calling memcpy to copy 4 or 6 bytes is very
silly (IMO).

I thought I would copy a u16_t at a time, halving the number of copies,
since everything is either u32_t aligned for my processor (NIOS II with
GCC, as mentioned) or *should* be u16_t aligned for IP-related items.
Everything *is* u16_t aligned except one item: hwaddr in struct netif.
Netif does *not* include packing around its definition but, curiously,
it *does* include packed struct members. Have I found a problem in that
GCC carried the packing override from the included packed struct through
the remainder of the netif struct?

If I delete hwaddr_len from struct netif and replace its only 2 uses in
dhcp.c (netif->hwaddr_len) with ETHARP_HWADDR_LEN, the remainder of
netif is properly aligned. Since we use ETHARP_HWADDR_LEN everywhere
else in lwIP, why keep hwaddr_len in netif, especially if it’s used only
twice, in DHCP?

Also, in icmp.c

    SMEMCPY((u8_t *)q->payload + sizeof(struct icmp_dur_hdr),
            p->payload,
            IP_HLEN + ICMP_DEST_UNREACH_DATASIZE);

I think this can be changed to MEMCPY since I don’t think any compiler
will inline a 28 byte copy. And this call is only in the
icmp_dest_unreach function. If we do this, SMEMCPY uses only 4, 6, 18
and 20 bytes.

The following macro does well if anyone wants to experiment with it.
With GCC, only the code needed to copy the 2, 3, 9 or 10 u16_t’s is
generated. (Interestingly, MSC++ 2003 doesn’t eliminate unreachable
code!)

    #define SMEMCPY(d,s,l)\
      ((l > 1) ? * ((u16_t *) (d)+0) = * ((u16_t *) (s)+0),\
      (l > 3) ? * ((u16_t *) (d)+1) = * ((u16_t *) (s)+1),\
      (l > 5) ? * ((u16_t *) (d)+2) = * ((u16_t *) (s)+2),\
      (l > 7) ? * ((u16_t *) (d)+3) = * ((u16_t *) (s)+3),\
      (l > 9) ? * ((u16_t *) (d)+4) = * ((u16_t *) (s)+4),\
      (l > 11) ? * ((u16_t *) (d)+5) = * ((u16_t *) (s)+5),\
      (l > 13) ? * ((u16_t *) (d)+6) = * ((u16_t *) (s)+6),\
      (l > 15) ? * ((u16_t *) (d)+7) = * ((u16_t *) (s)+7),\
      (l > 17) ? * ((u16_t *) (d)+8) = * ((u16_t *) (s)+8),\
      (l > 19) ? * ((u16_t *) (d)+9) = * ((u16_t *) (s)+9)\
      : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0)

This is tested for DHCP, UDP, TCP on a processor where it fails if
alignment isn’t observed. That is, only after I removed the hwaddr_len
from netif (or you can simply make it a u32_t).

Bill
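
One way to check where a member such as hwaddr actually lands is offsetof. The sketch below uses two hypothetical structs (they are not the real struct netif) and assumes that hwaddr is a byte array declared directly after a one-byte hwaddr_len. If that assumption matches the real definition, the odd offset would follow from ordinary layout rules, and widening hwaddr_len to 32 bits would move hwaddr to an even offset, in line with the observation above. All names starting with demo_ or DEMO_ are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a one-byte length field (illustration only). */
struct demo_u8_len {
  uint32_t flags;       /* offset 0 */
  uint8_t  hwaddr_len;  /* offset 4 */
  uint8_t  hwaddr[6];   /* offset 5: a byte array needs no padding */
  uint16_t mtu;
};

/* Same layout with the length field widened to 32 bits. */
struct demo_u32_len {
  uint32_t flags;       /* offset 0 */
  uint32_t hwaddr_len;  /* offset 4 */
  uint8_t  hwaddr[6];   /* offset 8 */
  uint16_t mtu;
};

/* Nonzero when the member starts on a 2-byte boundary within the struct. */
#define DEMO_IS_U16_ALIGNED(type, member) (offsetof(type, member) % 2 == 0)

/* Aborts if the compiler lays the structs out differently from the
 * offsets given in the comments above. */
static void demo_check_offsets(void)
{
  assert(offsetof(struct demo_u8_len, hwaddr) == 5);
  assert(offsetof(struct demo_u32_len, hwaddr) == 8);
}
```

Running the same offsetof check against the real struct netif, with and without hwaddr_len, would show whether the packed members are involved at all or whether the member order alone explains the misalignment.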