Re: [lwip-devel] [bug #39683] Assertion "seg->tcphdr not aligned" failed
From: Sylvain Rochet
Subject: Re: [lwip-devel] [bug #39683] Assertion "seg->tcphdr not aligned" failed with MEM_ALIGNMENT = 8
Date: Fri, 8 May 2015 21:05:04 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
Hello Simon,
On Fri, May 08, 2015 at 08:39:52PM +0200, address@hidden wrote:
> Sylvain Rochet wrote:
> >
> > Most x86_64 CPUs are able to do unaligned access without any cost
> > penalty so that's actually a fine default for unix ports.
>
> Without any cost penalty? How's that possible when the data to load
> spreads across 2 system bus addresses?
>
> It's true that these CPUs support loading unaligned data, but AFAIK
> there can still be a performance penalty due to executing 2 loads
> that are then merged and shifted to get the requested data, or has
> this changed?
AFAIK that's overcome by caches and by Intel's smart translation of
assembly into the core CPU language. A Sandy Bridge cache line is 64
bytes, meaning that fetching aligned and unaligned data from DDR costs
exactly the same most of the time. L1 byte accesses are so damn fast and
so heavily parallelised in the huge pipeline those CPUs have that any
shift, mask, or OR operation has a negligible impact on overall
performance. Even if you need to fetch 2 cache lines for cross-page data,
you are very probably going to need both cache lines in the software
anyway, or at least the second one.
I mean, if you write software just to trip over the worst case for
each access, yes, you are going to see a penalty. If you have just a few
unaligned cross-page accesses, hardware engineers tried their best to
hide the penalty, so you are not going to see one ;-)
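
As a rough illustration of the "most of the time" argument above, here is a minimal sketch that counts how many start offsets inside one cache line make an access split across two lines. It assumes 64-byte lines (the Sandy Bridge figure quoted above). The names CACHE_LINE_SIZE, spans_two_cache_lines, count_split_offsets and load_u32_unaligned are made up for this example and are not part of lwIP. It only counts split accesses and measures no timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64-byte cache lines, as quoted for Sandy Bridge. */
#define CACHE_LINE_SIZE 64u

/* Returns 1 if an access of `size` bytes starting at `addr` touches
 * two cache lines, 0 if it stays inside one. */
static int spans_two_cache_lines(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE_SIZE);
    return (addr / CACHE_LINE_SIZE) != ((addr + size - 1) / CACHE_LINE_SIZE);
}

/* Counts the start offsets within one line (0..63) for which a
 * `size`-byte access is split across two lines. */
static unsigned count_split_offsets(size_t size)
{
    unsigned n = 0;
    for (uintptr_t off = 0; off < CACHE_LINE_SIZE; off++) {
        n += (unsigned)spans_two_cache_lines(off, size);
    }
    return n;
}

/* Portable unaligned 32-bit load in host byte order. memcpy avoids
 * undefined behaviour from casting a misaligned pointer; on x86
 * compilers typically reduce it to a single load instruction. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

With these assumptions, count_split_offsets(4) is 3 and count_split_offsets(8) is 7, so only 3 of the 64 possible start offsets split a 4-byte load and 7 of 64 split an 8-byte load. Whether a split access actually costs anything measurable still depends on the CPU and the workload.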
Of course, we are only talking about Intel/AMD x86 CPUs, which are made
to run fast with software that is only available in binary form and
poorly written (hello .exe world).
Sylvain