On 09/27/2016 10:45 PM, Rajalakshmi Srinivasaraghavan wrote:
+#if defined(HOST_WORDS_BIGENDIAN)
+#define VEXTULX_DO(name, elem) \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
+{ \
+ target_ulong r = 0; \
+ int i; \
+ int index = a & 0xf; \
+ for (i = 0; i < elem; i++) { \
+ r = r << 8; \
+ if (index + i <= 15) { \
+ r = r | b->u8[index + i]; \
+ } \
+ } \
+ return r; \
+}
+#else
+#define VEXTULX_DO(name, elem) \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
+{ \
+ target_ulong r = 0; \
+ int i; \
+ int index = 15 - (a & 0xf); \
+ for (i = 0; i < elem; i++) { \
+ r = r << 8; \
+ if (index - i >= 0) { \
+ r = r | b->u8[index - i]; \
+ } \
+ } \
+ return r; \
+}
+#endif
+
+VEXTULX_DO(vextublx, 1)
+VEXTULX_DO(vextuhlx, 2)
+VEXTULX_DO(vextuwlx, 4)
+#undef VEXTULX_DO
Ew.
This should be one 128-bit shift and one AND.
Since the shift amount is a multiple of 8, the 128-bit shift for vextub[lr]x
does not need to cross a double-word boundary, and so can be decomposed into
one 64-bit shift of (count & 64 ? hi : lo).
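To illustrate the byte case, here is a minimal sketch, not the actual helper. It assumes the vector is already available as two 64-bit halves, hi holding bytes 0..7 and lo holding bytes 8..15 in left-to-right (big-endian) numbering. The function name and the hi/lo parameters are hypothetical. Here count is measured in bits from the left, so the half selected when bit 6 is set is lo; with a count measured from the right the selection flips.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: extract the byte at left-indexed position
 * (index & 0xf) from a 128-bit value given as two 64-bit halves.
 * hi holds bytes 0..7, lo holds bytes 8..15 (assumed layout).
 */
static uint64_t vextub_left_sketch(uint64_t hi, uint64_t lo, unsigned index)
{
    unsigned count = (index & 0xf) * 8;      /* bit offset from the left: 0..120 */
    uint64_t half = (count & 64) ? lo : hi;  /* offsets >= 64 fall in lo */

    /* Move the selected byte down to bits 7..0, then mask it out. */
    return (half >> (56 - (count & 63))) & 0xff;
}
```

Because the offset is a multiple of 8 and only one byte is wanted, the byte never straddles the two halves, so no bits need to be merged across them.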
For vextu[hw][lr]x, you'd need to do the whole left-shift, right-shift, OR thing.
But still, fantastically better than a loop.
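
For the halfword and word cases, a rough sketch of the shift-and-OR form might look like the following. Again the name and the hi/lo split (hi = bytes 0..7, lo = bytes 8..15, left-to-right numbering) are assumptions for illustration, not the real helper signature. Bytes that would fall past byte 15 come out as zero, which matches what the quoted loop does.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: extract `size` bytes (1, 2 or 4) starting at
 * left-indexed byte (index & 0xf) from a 128-bit value given as two
 * 64-bit halves. hi holds bytes 0..7, lo holds bytes 8..15 (assumed).
 */
static uint64_t vextu_left_sketch(uint64_t hi, uint64_t lo,
                                  unsigned index, unsigned size)
{
    unsigned sh = (index & 0xf) * 8;   /* 128-bit left-shift amount: 0..120 */
    uint64_t top;                      /* top 64 bits after the shift */

    if (sh >= 64) {
        top = lo << (sh - 64);         /* everything comes from lo */
    } else if (sh == 0) {
        top = hi;                      /* avoid an undefined 64-bit shift by 64 */
    } else {
        top = (hi << sh) | (lo >> (64 - sh));  /* left-shift, right-shift, OR */
    }

    /* The wanted element is now at the top; move it down to the low bits. */
    return top >> (64 - size * 8);
}
```

For example, with hi = 0x0011223344556677 and lo = 0x8899aabbccddeeff, an index of 7 and a size of 4 straddles the two halves and yields 0x778899aa.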