gnustep-dev

Re: New ABI NSConstantString


From: David Chisnall
Subject: Re: New ABI NSConstantString
Date: Thu, 5 Apr 2018 18:41:19 +0100

On 5 Apr 2018, at 17:01, Ivan Vučica <address@hidden> wrote:
> 
> Layman question: does it make sense to optimize for space, too, and have a 
> smaller structure for tiny constant strings?

With the new ABI, we get much better deduplication across compilation units for 
selectors and protocols, which should extend to constant strings.

At run time, on 64-bit platforms, we generate GSTinyString instances, which are 
64 bits and are hidden inside a pointer.  I’m tempted to make the compiler 
generate those directly.
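
To illustrate what "hidden inside a pointer" means, here is a minimal sketch of
packing a short ASCII string into one 64-bit word.  The bit layout (3 tag bits,
5 length bits, eight 7-bit characters), the tag value and all the tiny_* names
are assumptions made for this example.  They are not copied from the GSTinyString
implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout: bits 0-2 tag, bits 3-7 length, bits 8-63 eight 7-bit chars. */
#define TINY_TAG        0x4u   /* hypothetical tag value */
#define TINY_TAG_MASK   0x7u
#define TINY_LEN_SHIFT  3
#define TINY_LEN_MASK   0x1fu
#define TINY_MAX_CHARS  8

/* Pack s into a 64-bit word; returns 0 if it is too long or not 7-bit ASCII. */
static uint64_t tiny_encode(const char *s)
{
    size_t len = strlen(s);
    if (len > TINY_MAX_CHARS)
        return 0;
    uint64_t word = TINY_TAG | ((uint64_t)len << TINY_LEN_SHIFT);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c > 0x7f)
            return 0;
        /* Character 0 occupies the top 7 bits, character 7 bits 8-14. */
        word |= (uint64_t)c << (57 - 7 * i);
    }
    return word;
}

/* True if the low bits carry the (hypothetical) tiny-string tag. */
static int tiny_is_tiny(uint64_t word)
{
    return (word & TINY_TAG_MASK) == TINY_TAG;
}

static size_t tiny_length(uint64_t word)
{
    return (size_t)((word >> TINY_LEN_SHIFT) & TINY_LEN_MASK);
}

/* Unpack the characters into out, which must hold TINY_MAX_CHARS + 1 bytes. */
static void tiny_decode(uint64_t word, char *out)
{
    size_t len = tiny_length(word);
    for (size_t i = 0; i < len; i++)
        out[i] = (char)((word >> (57 - 7 * i)) & 0x7f);
    out[len] = '\0';
}
```

A string such as "NSString" (8 characters) fits in the word and needs no
separate storage, while anything longer has to fall back to a real object.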

> For 32bit ptrs and longs, this would be 20 bytes without the string itself. I 
> don't think that's a lot, but I thought I'd ask.

20 bytes isn’t too bad, 36 (for 64-bit platforms) is a bit more.  On a 
CHERI-like platform, it grows to 52 bytes, which starts to feel a bit excessive.

The absolute minimum structure is an isa pointer immediately followed by the 
character data, with a null terminator.  That’s not a great idea, because the 
isa pointer needs to be mutable, so the whole object would have to live in 
writable memory, which would make the constant string accidentally mutable too.

The next smallest would be an isa pointer and a null-terminated string pointer, 
so 8 / 16 / 32 bytes on 32-bit, 64-bit and CHERI-like platforms respectively.
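
The sizes quoted in this thread can be reproduced as raw field sums, ignoring
alignment padding.  The sketch below is an inference from the numbers, not the
ABI definition: the helper names are hypothetical, and the field mix used for
the larger structure (two pointers, two longs, one 32-bit word) is an assumption
chosen only because it yields 20 / 36 / 52.  The two-pointer layout is the one
described above.

```c
#include <assert.h>
#include <stddef.h>

/* Raw sum of field sizes in bytes, ignoring any alignment padding. */
static size_t raw_size(size_t n_ptrs, size_t ptr_bytes,
                       size_t n_longs, size_t long_bytes,
                       size_t n_u32)
{
    return n_ptrs * ptr_bytes + n_longs * long_bytes + n_u32 * 4;
}

/* isa pointer + pointer to null-terminated character data. */
static size_t minimal_size(size_t ptr_bytes)
{
    return raw_size(2, ptr_bytes, 0, 0, 0);
}

/* ASSUMED field mix that happens to match the 20 / 36 / 52 figures. */
static size_t larger_size(size_t ptr_bytes, size_t long_bytes)
{
    return raw_size(2, ptr_bytes, 2, long_bytes, 1);
}
```

With 4-, 8- and 16-byte pointers, minimal_size gives 8, 16 and 32 bytes, and
larger_size(4, 4), larger_size(8, 8) and larger_size(16, 8) give 20, 36 and 52.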

Recomputing the hash is expensive enough that it’s probably worth using at 
least the 28 bits that we provide already for string hashes.
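
As a sketch of what storing a 28-bit hash next to a few flag bits could look
like, the example below packs both into one 32-bit word.  FNV-1a is used only as
a stand-in hash function; it is not the hash that -base uses.  The 4-bit flag
field, the "0 means not cached" convention and all names are assumptions for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_BITS 28
#define HASH_MASK ((1u << HASH_BITS) - 1u)

/* Stand-in hash (32-bit FNV-1a); NOT the function used by -base. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Truncate to 28 bits, reserving 0 to mean "hash not cached" (assumption). */
static uint32_t hash28(const char *s)
{
    uint32_t h = fnv1a(s) & HASH_MASK;
    return h != 0 ? h : HASH_MASK;
}

/* Pack 4 hypothetical flag bits above the 28-bit hash. */
static uint32_t pack_meta(uint32_t flags4, uint32_t hash)
{
    return ((flags4 & 0xfu) << HASH_BITS) | (hash & HASH_MASK);
}

/* Read the cached hash back without touching the character data. */
static uint32_t cached_hash(uint32_t meta)
{
    return meta & HASH_MASK;
}
```

If the compiler emitted such a word for each constant string, looking up the
hash at run time would be a single mask rather than a pass over the characters.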

I’ve done some measurements in -base.  The compiled binary contains a total of 
84976 bytes of string data in 3307 strings, an average of just under 26 bytes 
per string, so 36 bytes of overhead seems quite a lot, and even 20 is quite 
noticeable.  Excluding strings of 8 or fewer characters leaves 81637 bytes in 
2586 strings, an average length of just under 32 bytes.  At 36 bytes per 
string, the overhead is still more than 100% and adds up to about 90KB in the 
final binary.

With the current encoding, each constant string is 24 bytes, so that adds up to 
about 60KB (excluding the string data itself) on 64-bit platforms.  That’s 
about 0.5% of the total binary size, so I’m not too worried about making it 
bigger.  Even making it 80KB is a lot of overhead per string (roughly 100%), 
but isn’t that much of the total binary size.
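
The arithmetic behind these measurements can be checked with a few lines of C.
The byte and string counts are the ones quoted above; the helper names are made
up for this example.  The 60KB figure is consistent with 24 bytes per string
over the 2586 longer strings, though the message does not say which count it
was computed from.

```c
#include <assert.h>

/* Average string length in bytes. */
static double average_length(unsigned long bytes, unsigned long count)
{
    return (double)bytes / (double)count;
}

/* Total per-object overhead in bytes, excluding the character data. */
static unsigned long total_overhead(unsigned long count,
                                    unsigned long per_string)
{
    return count * per_string;
}

/* Overhead as a fraction of the average string length (1.0 == 100%). */
static double overhead_ratio(unsigned long per_string, double avg_len)
{
    return (double)per_string / avg_len;
}
```

average_length(84976, 3307) is about 25.7 and average_length(81637, 2586) is
about 31.6.  total_overhead(2586, 36) is 93096 bytes (about 90KB) and
total_overhead(2586, 24) is 62064 bytes (about 60KB).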


David



