Re: [avr-libc-dev] [untested PATCH] Save 11 instructions in vfprintf_flt
From: George Spelvin
Subject: Re: [avr-libc-dev] [untested PATCH] Save 11 instructions in vfprintf_flt.o
Date: 8 Dec 2016 13:04:47 -0500
>> + * Unlike "if (src & smask) dst |= dmask", which is also two instructions
> This is confusing because the BST + BLD code below is not a replacement
> for what the C code is indicating. For example the C code never clears
> the bit as opposed to BLD.
>> + * and two cycles, this overwrites the destination bit (clearing it
>> + * if necessary), and has fewer constraints; it can operate on the low
>> + * 16 registers.
That's *exactly* what I was trying to say in the rest of the sentence you
quoted back to me! Perhaps I should just give the equivalent C code.
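For reference, the difference under discussion can be sketched on a host compiler. This is only an illustrative model: the names or_only(), overwrite() and check_difference() are made up here and are not part of the patch or of avr-libc.

```c
#include <assert.h>
#include <stdint.h>

/* The C fragment quoted in the original comment: it can set the
 * destination bit but never clears it. */
static inline uint8_t or_only(uint8_t dst, uint8_t dmask,
                              uint8_t src, uint8_t smask)
{
    if (src & smask)
        dst |= dmask;
    return dst;
}

/* What a bst/bld pair does: the destination bit is overwritten,
 * so it is cleared when the source bit is clear. */
static inline uint8_t overwrite(uint8_t dst, uint8_t dmask,
                                uint8_t src, uint8_t smask)
{
    dst &= (uint8_t)~dmask;
    if (src & smask)
        dst |= dmask;
    return dst;
}

/* The two forms differ only when the destination bit starts out set
 * and the source bit is clear. */
static inline void check_difference(void)
{
    assert(or_only(0x80, 0x80, 0x00, 0x01) == 0x80);   /* bit stays set */
    assert(overwrite(0x80, 0x80, 0x00, 0x01) == 0x00); /* bit is cleared */
    assert(or_only(0x00, 0x80, 0x01, 0x01) == overwrite(0x00, 0x80, 0x01, 0x01));
}
```

Calling check_difference() exercises the one case where the two forms disagree.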
> + */
> +#define COPYBIT(dst, dmask, src, smask) \
> + asm( "bst %2,%3" \
> + "\n bld %0,%1" \
> + : "=r" (dst) \
> This is wrong because the old value of dst does not die here:
> all bits except %1 are surviving. Correct constraint is "+r".
Yes, I noticed that myself a few minutes after posting. It didn't
seem earth-shaking enough to warrant a followup.
It's now:
/*
* Copy bit (src & smask) to (dst & dmask). This expands to a pair of
* bst/bld instructions which transfer the bit via the T register.
*
* Equivalent to "dst &= ~dmask; if (src & smask) dst |= dmask;", but is
* only two instructions and has fewer constraints; it can operate on
* the low 16 registers.
*/
#ifdef __BUILTIN_AVR_INSERT_BITS
/*
* Using a GCC builtin is preferable; it gives the optimizer more
* information and saves a few more bytes.
*
* The first argument to __builtin_avr_insert_bits is an array of 8
* nibbles, each of which indicates the source of the corresponding
* bit of the result. All are 0xf (return the corresponding dst bit
* unchanged) except the one corresponding to dmask, which specifies
* the bit position in src to copy from.
*/
#define COPYBIT(dst, dmask, src, smask) \
((dst) = __builtin_avr_insert_bits( \
~((15ul-ntz(smask))*(dmask)*(dmask)*(dmask)*(dmask)), \
src, dst))
#else
#define COPYBIT(dst, dmask, src, smask) \
asm( "bst %2,%3" \
"\n bld %0,%1" \
: "+r" (dst) \
: "n" (ntz(dmask)), \
"r" (src), \
"n" (ntz(smask)))
#endif
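The map arithmetic in the builtin variant can be checked off-target with a small host-side model. This is a sketch under stated assumptions, not the avr-libc code: ntz_model(), insert_bits_model(), copybit_map(), copybit_model() and count_mismatches() are hypothetical names, ntz() is assumed to return the index of the lowest set bit of a single-bit mask, and the model follows the nibble rules GCC documents for __builtin_avr_insert_bits (nibble values 8 to 0xe give an undefined bit, modelled as 0 here).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of ntz(): index of the lowest set bit (mask != 0). */
static inline unsigned ntz_model(uint32_t mask)
{
    unsigned n = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* Host model of __builtin_avr_insert_bits(map, bits, val).
 * Nibble n of map: 0xf keeps bit n of val, 0..7 selects that bit of
 * bits, 8..0xe is undefined (treated as 0 in this model). */
static inline uint8_t insert_bits_model(uint32_t map, uint8_t bits, uint8_t val)
{
    uint8_t result = 0;
    for (unsigned n = 0; n < 8; n++) {
        unsigned sel = (unsigned)(map >> (4 * n)) & 0xfu;
        unsigned bit;
        if (sel == 0xfu)
            bit = (val >> n) & 1u;
        else if (sel < 8u)
            bit = (bits >> sel) & 1u;
        else
            bit = 0u;
        result |= (uint8_t)(bit << n);
    }
    return result;
}

/* Same expression as the macro, in explicit 32-bit arithmetic:
 * dmask^4 moves the value (15 - source bit index) into the nibble for
 * the destination bit; the complement turns that nibble into the source
 * bit index and every other nibble into 0xf. */
static inline uint32_t copybit_map(uint8_t dmask, uint8_t smask)
{
    uint32_t d = dmask;
    return ~((UINT32_C(15) - ntz_model(smask)) * d * d * d * d);
}

static inline uint8_t copybit_model(uint8_t dst, uint8_t dmask,
                                    uint8_t src, uint8_t smask)
{
    return insert_bits_model(copybit_map(dmask, smask), src, dst);
}

/* Compare against "dst &= ~dmask; if (src & smask) dst |= dmask;"
 * for every dst, src and pair of single-bit masks. */
static inline unsigned long count_mismatches(void)
{
    unsigned long bad = 0;
    for (unsigned db = 0; db < 8; db++)
        for (unsigned sb = 0; sb < 8; sb++)
            for (unsigned dst = 0; dst < 256; dst++)
                for (unsigned src = 0; src < 256; src++) {
                    uint8_t dmask = (uint8_t)(1u << db);
                    uint8_t smask = (uint8_t)(1u << sb);
                    uint8_t want = (uint8_t)(dst & ~(unsigned)dmask);
                    if (src & smask)
                        want |= dmask;
                    if (copybit_model((uint8_t)dst, dmask, (uint8_t)src, smask) != want)
                        bad++;
                }
    return bad;
}
```

For example, copying bit 0 of src into bit 7 of dst gives the map 0x0fffffff: nibble 7 holds the source bit index 0, and the other nibbles are 0xf so the remaining dst bits pass through unchanged.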