Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
From: Chen Gang
Date: Tue, 18 Aug 2015 05:09:30 +0800
On 8/18/15 01:31, Richard Henderson wrote:
> On 08/15/2015 11:16 AM, Chen Gang wrote:
>
>> But what you said is really quite valuable to me!! We can treat the flag
>> as caller-saved context, so the caller can use the callee freely
>> (in fact, I guess, the real hardware treats it as caller context, too).
>>
>> - We have to define the flag format based on the existing format in the
>> related docs and tilegx.md (reserve bits 0-20 and 25-31).
>>
>> - We can only use bits 21-24 to mark an addsub, mul, or typecast result.
>> If bits 21-24 are all zero, it means a typecast result. For fsingle, bits
>> 32-63 are the input integer; for fdouble, srca is the input integer.
>
> Plausible.
>
>>
>> - For addsub and mul results, we use bits 32-63 for an index of a resource
>> handle (like the 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>> fdouble_mul_flags, and fdouble_addsub allocate the resource, and pack1
>> frees it.
>
> No, that's a bad idea. No state external to the inputs to the insns.
>
We can use bits 21-24 to mark the state external to the inputs to the insns.
My idea is below:
/*
 * Single floating point instructions' description.
 *
 * - fsingle_add1, fsingle_sub1, and fsingle_pack1/2 can be used individually.
 *
 * - when fsingle_pack1/2 is used individually, it is for type cast.
 *
 * - a result that is 4K (TILEGX_F_COUNT) entries old is assumed to be
 *   already useless to the caller.
 *
 * fsingle_add1     ; make context and calc result from rsrca and rsrcb.
 *                  ; save result in roundup array, and add index to context.
 *                  ; move context to rdst.
 *
 * fsingle_sub1     ; make context and calc result from rsrca and rsrcb.
 *                  ; save result in roundup array, and add index to context.
 *                  ; move context to rdst.
 *
 * fsingle_addsub2  ; skipped.
 *
 * fsingle_mul1     ; make context and calc result from rsrca and rsrcb.
 *                  ; save result in roundup array, and add index to context.
 *                  ; move context to rdst.
 *
 * fsingle_mul2     ; move rsrca to rdst.
 *
 * fsingle_pack1    ; skipped.
 *
 * fsingle_pack2    ; get context from rsrca (rsrca is context).
 *                  ; if context for add/sub/mul
 *                  ;     get result from roundup array based on index.
 *                  ;     move result to rdst.
 *                  ; else
 *                  ;     get (u)int32_t integer from context,
 *                  ;     (u)int32_to_float32.
 */
/*
 * Double floating point instructions' description.
 *
 * - fdouble_add_flags, fdouble_sub_flags, and fdouble_pack1/2 can be used
 *   individually.
 *
 * - when fdouble_pack1/2 is used individually, it is for type cast.
 *
 * - a result that is 4K (TILEGX_F_COUNT) entries old is assumed to be
 *   already useless to the caller.
 *
 * fdouble_unpack_max:  ; skipped.
 *
 * fdouble_unpack_min:  ; skipped.
 *
 * fdouble_add_flags:   ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fdouble_sub_flags:   ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fdouble_addsub:      ; skipped.
 *
 * fdouble_mul_flags:   ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fdouble_pack1:       ; get context from rsrcb.
 *                      ; if context for add/sub/mul
 *                      ;     get result from roundup array based on index.
 *                      ;     move result to rdst.
 *                      ; else
 *                      ;     get (u)int32_t integer from rsrca
 *                      ;     (u)int32_to_float64.
 *
 * fdouble_pack2:       ; skipped.
 */
#define TILEGX_F_COUNT  0x1000   /* Maximum number of saved results per array */
#define TILEGX_F_DUINT  0x21b00  /* exp for uint32_t to double typecast */
#define TILEGX_F_DINT   0xa1b00  /* exp for int32_t to double typecast */
#define TILEGX_F_SUINT  0x9e     /* exp for uint32_t to single typecast */
#define TILEGX_F_SINT   0x29e    /* exp for int32_t to single typecast */
#define TILEGX_F_TCAST  0        /* Result type is typecast, MUST BE 0 */
#define TILEGX_F_TCALC  1        /* Result type is add/sub/mul */
#pragma pack(push, 1)
typedef struct TileGXFPCtx {
    /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
    uint64_t exp : 20;          /* Exponent, for TILEGX_F_(D/S)(U)INT */

    /* Context type, defined and used by callee */
    uint64_t type : 5;          /* For TILEGX_F_T(CAST/CALC) */

    /* Come from TILE-Gx ISA document, Table 7-2 for floating point */
    uint64_t unordered : 1;     /* The two are unordered */
    uint64_t lt : 1;            /* 1st is less than 2nd */
    uint64_t le : 1;            /* 1st is less than or equal to 2nd */
    uint64_t gt : 1;            /* 1st is greater than 2nd */
    uint64_t ge : 1;            /* 1st is greater than or equal to 2nd */
    uint64_t eq : 1;            /* The two operands are equal */
    uint64_t neq : 1;           /* The two operands are not equal */

    /* Result data according to the context type */
    uint64_t data : 32;         /* The explanation is below */
#if 0
    /* This is the explanation for 'data' above */
    union {
        uint32_t idx;           /* Index for the add/sub/mul result */
        uint32_t aint;          /* Absolute input integer for fsingle typecast */
        /*
         * There is no input integer for fdouble typecast in context, it is in
         * rsrca parameter of fdouble_pack1 instruction.
         */
    };
#endif
} TileGXFPCtx;
#pragma pack(pop)

typedef struct FPUTLGState {
    float_status fp_status;         /* floating point status */
    int pos32;                      /* Current position for fsingle result */
    int pos64;                      /* Current position for fdouble result */
    float32 val32s[TILEGX_F_COUNT]; /* results roundup array for fsingle */
    float64 val64s[TILEGX_F_COUNT]; /* results roundup array for fdouble */
} FPUTLGState;
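
To make the roundup array idea concrete, here is a minimal standalone sketch of how a result could be saved and later looked up through the context word. It is only an illustration: Ring32, ring_put, ring_get and the CTX_* shifts are hypothetical names, plain float stands in for float32, and the bit positions assume the field order of TileGXFPCtx above (type right after the 20-bit exp, data in bits 32-63). Shifts and masks are used instead of bit-fields because bit-field layout is implementation-defined in C.

```c
#include <stdint.h>

#define F_COUNT         0x1000u  /* same size as TILEGX_F_COUNT */
#define CTX_TYPE_SHIFT  20       /* assumed: 'type' follows the 20-bit 'exp' */
#define CTX_TYPE_MASK   0x1fu    /* 5-bit 'type' field */
#define CTX_DATA_SHIFT  32       /* assumed: 'data' lives in bits 32-63 */
#define CTX_TCAST       0u       /* typecast context */
#define CTX_TCALC       1u       /* add/sub/mul context */

/* Hypothetical stand-in for pos32/val32s in FPUTLGState. */
typedef struct Ring32 {
    uint32_t pos;                /* next slot to overwrite */
    float vals[F_COUNT];         /* saved results */
} Ring32;

/* What add1/sub1/mul1 would do: save the result, return a context word. */
static uint64_t ring_put(Ring32 *r, float v)
{
    uint32_t idx = r->pos;

    r->vals[idx] = v;
    r->pos = (idx + 1) % F_COUNT;    /* wrap around after F_COUNT results */
    return ((uint64_t)idx << CTX_DATA_SHIFT) |
           ((uint64_t)CTX_TCALC << CTX_TYPE_SHIFT);
}

/* What pack2 would do: returns 1 and stores the result for a calc context,
 * returns 0 for a typecast context (the caller converts the integer). */
static int ring_get(const Ring32 *r, uint64_t ctx, float *out)
{
    if (((ctx >> CTX_TYPE_SHIFT) & CTX_TYPE_MASK) != CTX_TCALC) {
        return 0;
    }
    *out = r->vals[(ctx >> CTX_DATA_SHIFT) % F_COUNT];
    return 1;
}
```

Note that a context word kept across F_COUNT later operations silently reads an overwritten slot. That is the cost of keeping state outside the instruction inputs.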
>
> It really would be nice if we had the same documentation that was used
> to implement the gcc backend. Otherwise we have to rely on guesswork.
>
> For single-precision it appears that the format is
>
>   63                                      31          24  10   9 0
>   [ mantissa with implicit and guard bits | cmp flags | ?? | s | exp ]
>
> We are able to deduce the bias for the exponent based on the input gcc
> gives us for floatunssisf: 0x9e == 2**31 when the mantissa is normalized.
>
> So:
>
> fsingle_add1, fsingle_sub1: Perform the operation. Split the result
> such that all of the fields above are filled in.
>
> fsingle_mul1: Perform the operation. Split the result such that all
> of the fields above except for cmp-flags are filled in.
>
> fsingle_addsub2: Nop.
> fsingle_mul2: Move srca to dest.
>
> fsingle_pack1: Normalize and repack the above. In the add/sub/mul case,
> no normalization will be required, so no change to the result occurs.
>
> In the floatunssisf2 case, the input implicit bit may not be set, and
> guard bits may be set, so real rounding and normalization must occur,
> adjusting the exponent constructed by gcc in building the flags.
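
As a check on the 0x9e reading: 0x9e is 158 = 127 + 31, i.e. the IEEE single bias plus 31, so a 32-bit mantissa whose top bit is the implicit bit stands for 2**31. Below is a minimal sketch of the "normalize and repack" step under that reading. pack_f32 and u32_to_f32 are hypothetical helpers, not the real instruction semantics; it assumes the low 8 bits act as guard bits, rounds to nearest-even only, and ignores overflow, underflow and denormals.

```c
#include <stdint.h>
#include <string.h>

#define F32_EXP_U32 0x9eu   /* 127 + 31: exponent when bit 31 is the implicit bit */

/* Normalize a 32-bit mantissa, round to nearest-even, and build an IEEE single.
 * 'sign' must be 0 or 1. No overflow/underflow/denormal handling (sketch only). */
static float pack_f32(uint32_t mant, uint32_t exp, uint32_t sign)
{
    uint32_t bits, frac, rem;
    float f;

    if (mant == 0) {
        bits = sign << 31;                      /* signed zero */
    } else {
        while (!(mant & 0x80000000u)) {         /* move implicit bit to bit 31 */
            mant <<= 1;
            exp--;
        }
        frac = mant >> 8;                       /* implicit bit + 23 fraction bits */
        rem = mant & 0xffu;                     /* 8 guard bits */
        if (rem > 0x80u || (rem == 0x80u && (frac & 1u))) {
            frac++;                             /* round to nearest, ties to even */
        }
        if (frac == 0x1000000u) {               /* rounding carried out of bit 23 */
            frac >>= 1;
            exp++;
        }
        bits = (sign << 31) | (exp << 23) | (frac & 0x7fffffu);
    }
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* The floatunssisf2 case: the mantissa is the raw integer, the exponent is 0x9e. */
static float u32_to_f32(uint32_t x)
{
    return pack_f32(x, F32_EXP_U32, 0);
}
```

With the default rounding mode this should agree with a plain C cast, for example u32_to_f32(16777217u) gives 16777216.0f, the same as (float)16777217u.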
>
> For double-precision things are more complicated, precisely because there
> are no dedicated fdouble_mul[1-4] instructions; instead gcc is to use a
> normal 128-bit integer multiplication on the mantissa.
>
> For double-precision it appears that the format is
>
>          63               57                           4        0
> unpack [ overflow bits? | mantissa with implicit bit | guard bits ]
>
>          63   31          24  20  19 8    0
> flags  [ ?? | cmp flags | ?? | s | exp | ?? ]
>
> Similarly we can compute the bias for exp as 0x21b == 2**53.
> Or is it 20 bits of exponent and 0x21b00 == 2**53?
>
> So:
>
> fdouble_unpack_max, fdouble_unpack_min: Perform the operation as described,
> extracting the mantissa of the min/max absolute value.
>
> fdouble_add_flags, fdouble_sub_flags: Extract the signs and exponent of the
> sources, and compute the sign and exponent of the result. Set a bit,
> presumably one of [24:21], that tells fdouble_addsub whether to perform
> addition or subtraction. Set the comparison flags.
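
The "set the comparison flags" part could be sketched as below. The bit positions are an assumption: they follow the field order of the TileGXFPCtx struct earlier in this mail (unordered in bit 25 up to neq in bit 31), and the names F_UNORD etc. and cmp_flags are hypothetical. For the unordered case the sketch follows C/IEEE comparison semantics (only "not equal" holds), which may or may not match Table 7-2 of the ISA document.

```c
#include <math.h>
#include <stdint.h>

/* Assumed flag positions, taken from the field order of TileGXFPCtx. */
#define F_UNORD (UINT64_C(1) << 25)  /* the two are unordered */
#define F_LT    (UINT64_C(1) << 26)  /* 1st is less than 2nd */
#define F_LE    (UINT64_C(1) << 27)  /* 1st is less than or equal to 2nd */
#define F_GT    (UINT64_C(1) << 28)  /* 1st is greater than 2nd */
#define F_GE    (UINT64_C(1) << 29)  /* 1st is greater than or equal to 2nd */
#define F_EQ    (UINT64_C(1) << 30)  /* the two operands are equal */
#define F_NE    (UINT64_C(1) << 31)  /* the two operands are not equal */

/* Build the comparison flag bits for two operands. */
static uint64_t cmp_flags(double a, double b)
{
    uint64_t f = 0;

    if (isunordered(a, b)) {
        return F_UNORD | F_NE;       /* at least one NaN: only != holds */
    }
    if (a < b) {
        f |= F_LT | F_LE | F_NE;
    } else if (a > b) {
        f |= F_GT | F_GE | F_NE;
    } else {
        f |= F_LE | F_GE | F_EQ;
    }
    return f;
}
```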
>
> fdouble_mul_flags: Extract the signs and exponent of the sources, and
> compute the sign and exponent of the result. Note that the result of the
> 128-bit multiplication is guaranteed to be non-normalized: the 2 57-bit
> inputs will produce a 114-bit intermediate result. Which means that bits
> [63:51] are guaranteed to be zero on entry to the pack stages. Which means
> that some bias will need to be applied to the intermediate exponent.
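
The width argument can be checked with a small sketch. It assumes a compiler that provides unsigned __int128 (a GCC/Clang extension, not standard C), and mul_57x57 is a hypothetical helper name. Two operands below 2**57 give a product below 2**114, so the high 64-bit word of the product stays below 2**50.

```c
#include <stdint.h>

/* Multiply two mantissas (each assumed to be < 2**57) and split the
 * 128-bit product into high and low 64-bit words.
 * Relies on the unsigned __int128 extension of GCC/Clang. */
static void mul_57x57(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;

    *hi = (uint64_t)(p >> 64);   /* upper word: at most 50 significant bits */
    *lo = (uint64_t)p;           /* lower word */
}
```

For the largest inputs, a == b == 2**57 - 1, the high word is 2**50 - 1, so the top bits of the high word are clear, and the pack stage has to shift the mantissa up and adjust the exponent accordingly.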
>
> fdouble_addsub: Add or subtract the mantissas based on a bit in flags.
>
> fdouble_pack1: Move flags (srcb) to result (dest).
> fdouble_pack2: Take the 128-bit mantissa of srca+srcb, the flags of dest,
> and normalize and pack the result.
>
OK, thanks, what you said above sounds reasonable. It is more precise
than my current implementation (but also a little more complex).
If my current implementation cannot pass the gcc testsuite (I guess it
cannot), I shall try to implement what you described above next.
Thanks.
--
Chen Gang
Open, share, and attitude like air, water, and life which God blessed