Re: [PATCH] jit_size: Don't round up the size to a multiple of 4 KiB

From: Paul Cercueil
Subject: Re: [PATCH] jit_size: Don't round up the size to a multiple of 4 KiB
Date: Tue, 07 Jun 2022 17:13:32 +0100

Hi Paulo,

On Mon., June 6 2022 at 16:03:04 -0300, Paulo César Pereira de Andrade <> wrote:
On Sun., June 5 2022 at 07:25, Paul Cercueil
<> wrote:


  Applied, but some considerations below.

When using an external code buffer, a program using Lightning will call jit_get_code() to get the expected size of the code, then allocate some space in the code buffer, then call jit_set_code() to specify where the code should be written to.

There should be some API to reset _jit->user_code, and the behavior of the patch should depend on whether _jit->user_code is set. Also, instead of rounding to 4096 bytes, it should likely round to the page size. The rounding up is just to "adapt" to what mmap does.
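Rounding to the page size rather than a hard-coded 4096 could look like the sketch below. These are illustrative helpers, not Lightning code; they assume the value from sysconf(_SC_PAGESIZE) is a power of two (true on common platforms) and fall back to 4 KiB if the call fails. The helper also shows that the old (size + 4095) & -4096 expression is the usual power-of-two round-up, since -4096 and ~4095 have the same two's-complement bit pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Round size up to a multiple of align, which must be a power of two.
 * With align == 4096 this is the old (size + 4095) & -4096 idiom:
 * ~(4096 - 1) and -4096 share the same two's-complement bit pattern. */
static size_t round_up_pow2(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Round to the system page size instead of a hard-coded 4096. */
static size_t round_up_to_page(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        page = 4096;  /* assumption: fall back to 4 KiB if sysconf fails */
    return round_up_pow2(size, (size_t)page);
}
```

On a system with 16 KiB pages, round_up_to_page(100) would return 16384 rather than 4096.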

If the reported size is rounded up to a multiple of 4 KiB, then the allocator will always try to allocate 4 KiB for blocks of code that might even be smaller than a hundred bytes. The program can then choose to realloc() the allocated block to the actual size of the generated code, to reduce the memory used.

Some extra logic is required to allow easy reuse of a _jit context. The current logic is good for significantly large jit buffers, but too costly for code generating very small buffers. Resetting the internal state of a _jit context should be far faster than deleting a context and creating a new one.

That's something I could actually use. Right now I have $(nproc) worker threads compiling blocks of code in parallel. This can lead to compiling 1000+ blocks per second, each one with its own _jit context. I could use a _jit context per worker instead.

With that said, my setup has been working pretty well so far.
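A minimal sketch of the reset-instead-of-recreate idea, assuming a hypothetical context type: struct ctx, ctx_new_node, ctx_reset and ctx_destroy are illustrative names, not Lightning's jit_state_t API, which (per this thread) has no reset call yet. Resetting only drops the node count and keeps the allocation, so a per-worker context compiling block after block stops calling malloc/free once its pool is large enough.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a JIT context: a growable array of nodes.
 * This is NOT Lightning's jit_state_t; all names are illustrative. */
struct node { int code; };

struct ctx {
    struct node *nodes;
    size_t count;     /* nodes in use for the current block */
    size_t capacity;  /* nodes currently allocated */
};

/* Append a node, growing the array (doubling) only when it is full. */
static struct node *ctx_new_node(struct ctx *c, int code)
{
    if (c->count == c->capacity) {
        size_t cap = c->capacity ? c->capacity * 2 : 64;
        struct node *n = realloc(c->nodes, cap * sizeof *n);

        if (!n)
            return NULL;
        c->nodes = n;
        c->capacity = cap;
    }
    c->nodes[c->count].code = code;
    return &c->nodes[c->count++];
}

/* Reset: forget the nodes but keep the allocation for the next block. */
static void ctx_reset(struct ctx *c)
{
    assert(c->count <= c->capacity);
    c->count = 0;
}

/* Destroy: release everything; only needed when the worker exits. */
static void ctx_destroy(struct ctx *c)
{
    free(c->nodes);
    c->nodes = NULL;
    c->count = c->capacity = 0;
}
```

A worker thread would create one such context at startup, call ctx_reset() before each block, and call ctx_destroy() only when the thread exits.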

However, this will cause dramatic memory fragmentation. For instance, if working with a 2 MiB code buffer in which 512 blocks of 4 KiB are allocated but later realloc'd to 128 bytes each, the total amount of allocated memory will be 128 * 512 == 64 KiB, with over 1.9 MiB free, yet it will be impossible to allocate any new blocks, as there would be no way to find a contiguous 4 KiB area.
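The fragmentation scenario above can be checked with a small byte-map simulation. It assumes, as the example implies, that the 512 blocks sit back to back on 4 KiB boundaries and each is shrunk in place to its first 128 bytes; BUF_SIZE, SLOT_SIZE and USED_SIZE are illustrative names. The largest free run comes out at 4096 - 128 = 3968 bytes, so no new 4 KiB request fits even though 1984 KiB of the 2 MiB buffer is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SIZE  (2u * 1024u * 1024u)   /* 2 MiB code buffer */
#define SLOT_SIZE 4096u                  /* rounded size reported per block */
#define USED_SIZE 128u                   /* actual size after realloc() */
#define NSLOTS    (BUF_SIZE / SLOT_SIZE) /* 512 blocks */

/* used[i] is true if byte i of the code buffer belongs to a live block. */
static bool used[BUF_SIZE];

/* Every 4 KiB slot was allocated, then shrunk in place to 128 bytes,
 * so only the first USED_SIZE bytes of each slot remain in use. */
static void simulate(void)
{
    assert(BUF_SIZE % SLOT_SIZE == 0);
    for (size_t i = 0; i < BUF_SIZE; i++)
        used[i] = (i % SLOT_SIZE) < USED_SIZE;
}

/* Total bytes still allocated. */
static size_t total_used(void)
{
    size_t n = 0;

    for (size_t i = 0; i < BUF_SIZE; i++)
        n += used[i];
    return n;
}

/* Length of the longest run of free bytes: the biggest block that
 * could still be allocated from this buffer. */
static size_t largest_free_run(void)
{
    size_t best = 0, run = 0;

    for (size_t i = 0; i < BUF_SIZE; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}
```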

Resetting a _jit context could also include some "self-healing" code, in case any of its _jit->*.{count,length} fields has grown too large. The one most likely to use a lot of memory is _jit->pool.ptr, if some very large jit code buffer was generated (when it runs out of nodes, it always adds free nodes in batches of 1024).
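One possible "self-healing" policy for such a reset is sketched below. It is purely hypothetical (healed_pool_size and NODE_BATCH are illustrative names, not Lightning internals); it only assumes that nodes are allocated in batches of 1024, as described above, and shrinks an oversized pool back to the number of batches needed for a chosen limit.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_BATCH 1024  /* assumption: nodes are allocated 1024 at a time */

/* Hypothetical policy applied when a context is reset: if the pool holds
 * more than `limit` nodes, keep only as many whole batches as are needed
 * to cover `limit`; otherwise leave the pool as it is. */
static size_t healed_pool_size(size_t allocated, size_t limit)
{
    assert(allocated % NODE_BATCH == 0);
    if (allocated <= limit)
        return allocated;
    return (limit + NODE_BATCH - 1) / NODE_BATCH * NODE_BATCH;
}
```

A reset function would call this with the pool's current size and a limit of a few typical blocks' worth of nodes, then free the surplus batches whenever the result is smaller.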

Besides, I really don't understand why it was rounded up to a multiple of 4 KiB, as this is not a requirement for mmap().

Usually the extra memory added by rounding up to 4096 will either not be accessible (and not used by any other mmap call), or the code will just refuse to write to it. It was done so that, if the code size was miscalculated by a few bytes, there would be no need to mmap or mremap again. If mremap is available, that logic is mostly pointless...
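Both points can be illustrated with a Linux-only sketch: mmap() accepts a length that is not a page multiple, and if the size estimate turns out too small, mremap() can grow the mapping afterwards instead of padding every allocation up front. map_code and grow_code are hypothetical helpers; PROT_EXEC handling is omitted, and if the kernel moves the mapping, any already emitted code that depends on its absolute address would have to be regenerated.

```c
#define _GNU_SOURCE  /* mremap() and MREMAP_MAYMOVE are Linux-specific */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of anonymous read/write memory. len need not be a
 * multiple of the page size; the kernel rounds the mapping up to whole
 * pages. Returns NULL on failure. */
static void *map_code(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Grow the mapping if the size estimate was too small. MREMAP_MAYMOVE
 * lets the kernel relocate it; on failure NULL is returned and the old
 * mapping stays valid. */
static void *grow_code(void *old, size_t old_len, size_t new_len)
{
    void *p = mremap(old, old_len, new_len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```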

What I was wondering is why it uses the sum of the "maximum opcode sizes" as the code buffer size. Wouldn't it be possible to have two jit_emit() passes, the first one incrementing a byte counter and the second one actually writing the data?
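The two-pass idea can be sketched with a hypothetical emitter that only counts bytes when it has no buffer and writes them when it has one. struct emitter, emit_bytes, emit_load_imm and the toy two-encoding instruction format are invented for illustration; this is not how Lightning's jit_emit() works. Because the same generator runs in both passes, the first pass yields the exact size rather than a worst-case estimate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical emitter: buf == NULL means "sizing pass, count only". */
struct emitter {
    uint8_t *buf;
    size_t len;
};

static void emit_bytes(struct emitter *e, const uint8_t *bytes, size_t n)
{
    assert(n == 0 || bytes != NULL);
    if (e->buf)
        memcpy(e->buf + e->len, bytes, n);
    e->len += n;
}

/* Toy variable-length instruction: opcode 0x01 + 1-byte immediate, or
 * opcode 0x02 + 4-byte little-endian immediate for larger values. */
static void emit_load_imm(struct emitter *e, uint32_t imm)
{
    if (imm <= 0xff) {
        uint8_t ins[2] = { 0x01, (uint8_t)imm };
        emit_bytes(e, ins, sizeof ins);
    } else {
        uint8_t ins[5] = { 0x02, (uint8_t)imm, (uint8_t)(imm >> 8),
                           (uint8_t)(imm >> 16), (uint8_t)(imm >> 24) };
        emit_bytes(e, ins, sizeof ins);
    }
}

/* The same generator runs in both passes, so the sizes always agree. */
static void generate(struct emitter *e, const uint32_t *imms, size_t n)
{
    for (size_t i = 0; i < n; i++)
        emit_load_imm(e, imms[i]);
}
```

In a real code generator the catch is that some encodings, such as short versus long branch displacements, can depend on where code ends up, so the sizing pass must make the same encoding choices as the emitting pass, or the passes must be repeated until the size stops changing.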


Signed-off-by: Paul Cercueil <>
 lib/jit_size.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/jit_size.c b/lib/jit_size.c
index 61f1aa4..3a78394 100644
--- a/lib/jit_size.c
+++ b/lib/jit_size.c
@@ -105,7 +105,7 @@ _jit_get_size(jit_state_t *_jit)
     for (size = JIT_INSTR_MAX, node = _jitc->head; node; node = node->next)
         size += _szs[node->code];
 
-    return ((size + 4095) & -4096);
+    return size;


