Re: [Qemu-devel] [Consult] tilegx: About floating point instructions


From: Chen Gang
Subject: Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
Date: Thu, 13 Aug 2015 22:59:14 +0800

Hello all:

For the single-precision insns, I guess things are simple: the insns of
one calculation group cannot be mixed with those of another group. So
the current implementation should be OK.
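
As a rough illustration of the single-precision scheme (the operation is
done in add1, addsub2 only marks a pending result, pack2 hands it back),
here is a minimal sketch. It is not the QEMU code: FSingleEnv and the
helper names are hypothetical, native C float arithmetic stands in for
softfloat, the operands are assumed to sit in the low 32 bits of the
source registers, and the integer-packing path of pack2 is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the CPU state; not QEMU's real env struct. */
typedef struct {
    bool has_result;   /* set by addsub2, cleared by pack2 */
    uint32_t result;   /* IEEE-754 single bit pattern saved by add1 */
} FSingleEnv;

static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_f32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* fsingle_add1: perform the whole addition now and keep it in env.
 * The return value stands for the "calc flags"; their real layout is
 * undocumented, so this sketch simply returns 0 (an assumption). */
static uint64_t fsingle_add1(FSingleEnv *env, uint64_t srca, uint64_t srcb)
{
    float sum = bits_f32((uint32_t)srca) + bits_f32((uint32_t)srcb);
    env->result = f32_bits(sum);
    return 0;
}

/* fsingle_addsub2: only records that a result is pending. */
static void fsingle_addsub2(FSingleEnv *env)
{
    env->has_result = true;
}

/* fsingle_pack2: if a result is pending, clear the flag and return it
 * through *dest. Returns false when nothing is pending (the case where
 * the real insn would pack an integer, which is left out here). */
static bool fsingle_pack2(FSingleEnv *env, uint64_t *dest)
{
    assert(dest != NULL);
    if (!env->has_result) {
        return false;
    }
    env->has_result = false;
    *dest = env->result;
    return true;
}
```

With this model, an add1/addsub2/pack2 sequence yields the sum once, and
a second pack2 without a new calculation finds nothing pending.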

For the double-precision insns, I guess only the mul calculation group
can be mixed with other calculation groups (the add/sub groups or the
int2float/double groups), because of optimization: the mul calculation
group has many insns, so the compiler can schedule other groups among
them.

So the implementation is as below:

/*
 * Assume floating point mul operation group can mix with other groups.
 *
 * fdouble_unpack_max: ; skipped.
 *  
 * fdouble_unpack_min: ; skipped.
 *      
 * fdouble_add_flags:  ; move calc flags to dest.
 *                       save calc flags.
 *                       save calc addsub result.
 *
 * fdouble_sub_flags:  ; move calc flags to dest.
 *                       save calc flags.
 *                       save calc addsub result.
 *
 * fdouble_addsub:     ; move calc addsub result to dest.
 *                       set "addsub result" flag.
 *
 * fdouble_mul_flags:  ; move calc mul result to dest.
 *
 * fdouble_pack1:      ; if addsub result set
 *                         && srca == saved addsub result
 *                         && srcb == saved calc flags
 *                           move srca to dest.
 *                       else 
 *                           move srcb to dest.
 *
 * fdouble_pack2:      ; if srcb == r63 && "addsub result" flag
 *                           reset "addsub result" flag.
 *                       else if srcb == r63
 *                           pack srca dest (dest is orig srcb of pack1)
 *                           reference from tilegx.md: float(uns)sidf2.
 *                           get (u)int32_t a, then (u)int32_to_float64.
 *                       else
 *                           skipped.
 */
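
To make the comment above more concrete, here is a minimal sketch of the
same state machine in plain C. It is only an illustration under stated
assumptions, not the QEMU code: FDoubleEnv and FDOUBLE_FLAGS_PLACEHOLDER
are hypothetical (the real "calc flags" value is undocumented), native C
double arithmetic stands in for softfloat, the sub variant would mirror
add_flags, and the integer-packing branch of pack2 is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Made-up value standing in for the undocumented "calc flags". */
#define FDOUBLE_FLAGS_PLACEHOLDER UINT64_C(0x1)

/* Hypothetical stand-in for the CPU state; not QEMU's real env struct. */
typedef struct {
    bool addsub_pending;    /* the "addsub result" flag */
    uint64_t saved_flags;   /* calc flags saved by add_flags */
    uint64_t saved_addsub;  /* add result saved by add_flags */
} FDoubleEnv;

static uint64_t f64_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static double bits_f64(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* fdouble_add_flags: save the result and the flags, return the flags. */
static uint64_t fdouble_add_flags(FDoubleEnv *env, uint64_t srca, uint64_t srcb)
{
    env->saved_addsub = f64_bits(bits_f64(srca) + bits_f64(srcb));
    env->saved_flags = FDOUBLE_FLAGS_PLACEHOLDER;
    return env->saved_flags;
}

/* fdouble_addsub: return the saved result and set the pending flag. */
static uint64_t fdouble_addsub(FDoubleEnv *env)
{
    env->addsub_pending = true;
    return env->saved_addsub;
}

/* fdouble_mul_flags: the whole multiplication goes straight to dest. */
static uint64_t fdouble_mul_flags(uint64_t srca, uint64_t srcb)
{
    return f64_bits(bits_f64(srca) * bits_f64(srcb));
}

/* fdouble_pack1: pick srca only when it belongs to the pending add/sub
 * group; otherwise move srcb (e.g. the mul result) to dest. */
static uint64_t fdouble_pack1(const FDoubleEnv *env, uint64_t srca, uint64_t srcb)
{
    if (env->addsub_pending && srca == env->saved_addsub
        && srcb == env->saved_flags) {
        return srca;
    }
    return srcb;
}

/* fdouble_pack2: dest passes through unchanged; only the pending flag is
 * reset when srcb is the zero register (r63). */
static uint64_t fdouble_pack2(FDoubleEnv *env, uint64_t dest, bool srcb_is_r63)
{
    assert(env != NULL);
    if (srcb_is_r63 && env->addsub_pending) {
        env->addsub_pending = false;
    }
    return dest;
}
```

Calling these helpers in the order of an interleaved add/mul sequence
(add_flags, addsub, mul_flags, pack1 twice, pack2 twice) leaves the sum
in the add destination and the product in the mul destination.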


On 8/11/15 21:18, Chen Gang wrote:
> 
> Oh, it seems a little complex: one testsuite case interleaves a double
> add with a double mul! We need to save more information for the correct
> calculation in pack1.
> 
> It is 20020314-1.exe; the related code (I guess it is correct):
> 
>         ...
> 
>         fdouble_unpack_max      r10, r3, zero
> .LVL2:
>         fdouble_unpack_max      r15, r2, zero
>         fdouble_add_flags       r12, r0, r1
>         mul_hu_lu       r13, r15, r10
>         mul_lu_lu       r16, r15, r10
>         mula_hu_lu      r13, r10, r15
>         fdouble_unpack_min      r11, r0, r1
>         {
>         shli    r14, r13, 32
>         fdouble_unpack_max      r17, r0, r1
>         }
>         {
>         mul_hu_hu       r15, r15, r10
>         add     r16, r16, r14
>         }
>         {
>         shrui   r13, r13, 32
>         fdouble_addsub  r17, r11, r12
>         }
>         {
>         cmpltu  r14, r16, r14
>         fdouble_mul_flags       r3, r2, r3
>         }
> .LVL3:
>         {
>         add     r13, r15, r13
>         fdouble_pack1   r12, r17, r12
>         }
>         {
>         add     r13, r13, r14
>         fdouble_unpack_max      r10, r0, zero
>         }
>         fdouble_pack1   r3, r13, r3
>         fdouble_pack2   r12, r17, zero
>         fdouble_pack2   r3, r13, r16
> 
>         ... 
> 
> Any additional ideas, suggestions, and completions are welcome.
> 
> Thanks.
> 
> On 8/9/15 09:14, Chen Gang wrote:
>> On 8/9/15 09:10, Chen Gang wrote:
>>>
>>> On 8/9/15 01:23, Chen Gang wrote:
>>>> Hello all:
>>>>
>>>> Below is my current idea for all floating point insns. It is not a
>>>> precise implementation, and not even a complete one: it assumes that
>>>> pack insns only pack a (u)int32_t when they are used individually:
>>>>
>>>>   fsingle_add1        ; return calc flags, save calc result to env.
>>>>
>>>>   fsingle_sub1        ; return calc flags, save calc result to env.
>>>>
>>>>   fsingle_addsub2     ; set "has result" flag.
>>>>
>>>>   fsingle_mul1        ; skip return value, save calc result to env.
>>>>                         set "has result" flag.
>>>>
>>>>   fsingle_mul2        ; skipped.
>>>>
>>>>
>>>>   fsingle_pack1       ; skipped.
>>>>
>>>>   fsingle_pack2       ; if "has result"
>>>>                             reset "has result" flag.
>>>>                             return calc result from env.
>>>>                         else
>>>>                             pack srca 
>>>>                             reference from tilegx.md: float(uns)sisf2.
>>>>                             get (u)int32_t a, then (u)int32_to_float32.
>>>
>>> For "pack srca and srcb", the related demo like below (srca and srcb
>>> are uint64_t):
>>>
>>
>> Oh, sorry, for "pack srca" (not for "pack srca and srcb")
>>
>>>     switch (srca & 0x3ff) {
>>>
>>>     /* treat it as uint32_t */
>>>     case 0x9e:
>>>         return uint32_to_float32(srca >> 32, &FP_STATUS);
>>>
>>>     /* treat it as int32_t, must be negative number */
>>>     case 0x29e:
>>>         return int32_to_float32(srca >> 32 | 0x80000000, &FP_STATUS);
>>>
>>>     default:
>>>         /* unimplemented: raise an exception (gen_exception) */
>>>     }
>>>
>>>>
>>>>   fdouble_unpack_max: ; skipped.
>>>>
>>>>   fdouble_unpack_min: ; skipped.
>>>>
>>>>   fdouble_add_flags:  ; return calc flags, save calc result to env.
>>>>
>>>>   fdouble_sub_flags:  ; return calc flags, save calc result to env.
>>>>
>>>>   fdouble_addsub:     ; set "has result" flag.
>>>>
>>>>   fdouble_mul_flags:  ; skip return flags, save calc result to env.
>>>>                         set "has result" flag.
>>>>
>>>>   fdouble_pack1:      ; if "has result" 
>>>>                             reset "has result" flag.
>>>>                             return calc result from env.
>>>>                         else
>>>>                             pack srca and srcb.
>>>>                             reference from tilegx.md: float(uns)sidf2.
>>>>                             get (u)int32_t a, then (u)int32_to_float64.
>>>>
>>>  
>>> For "pack srca and srcb", the related demo like below (srca and srcb
>>> are uint64_t):
>>>
>>>     switch (srcb & 0xffff) {
>>>
>>
>> Oh, sorry, should use 0xfffff instead of 0xffff.
>>
>>>     /* treat it as uint32_t */
>>>     case 0x21b00:
>>>         return uint32_to_float64(srca >> 4, &FP_STATUS);
>>>
>>>     /* treat it as int32_t, must be negative number */
>>>     case 0xa1b00:
>>>         return int32_to_float64(srca >> 4 | 0x80000000, &FP_STATUS);
>>>
>>>     default:
>>>         /* unimplemented: raise an exception (gen_exception) */
>>>     }
>>>
>>>>   fdouble_pack2:      ; skipped.
>>>>
>>>>
>>>>   (fsingle_add1/sub1 and fdouble_add/sub_flags can be used individually,
>>>>    e.g. in the gcc testsuite for complex numbers).
>>>>
>>>>
>>>> Next, I shall implement the floating point insns. Any related ideas,
>>>> suggestions, and completions are welcome.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On 8/5/15 22:16, Chen Gang wrote:
>>>>> On 8/4/15 23:04, Richard Henderson wrote:
>>>>>> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>>>>>>
>>>>>>> On 8/4/15 04:47, Chen Gang wrote:
>>>>>>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>>>>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>>>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>>>>>>> but for me, I can not find any details about them (the ISA
>>>>>>>>>> documents only give a summary description, but not details), e.g.
>>>>>>>>>
>>>>>>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>>>>>>> black-box instructions.  You need only really implement one of the
>>>>>>>>> four, with the rest of them being implemented as nops or moves.
>>>>>>>>>
>>>>>>>>> Looking at what gcc produces gives the hints:
>>>>>>>>>
>>>>>>>>> fdouble_unpack_min      min, srca, srcb
>>>>>>>>> fdouble_unpack_max      max, srca, srcb
>>>>>>>>> fdouble_add_flags       flg, srca, srcb
>>>>>>>>> fdouble_addsub          max, min, flg
>>>>>>>>> fdouble_pack1           dst, max, flg
>>>>>>>>> fdouble_pack2           dst, max, zero
>>>>>>>>>
>>>>>>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>>>>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>>>>>>> from "flg" to "dst".
>>>>>>>>>
>>>>>>>>> Similarly for the single-precision:
>>>>>>>>>
>>>>>>>>> fsingle_add1            tmp, srca, srcb
>>>>>>>>> fsingle_addsub2         tmp, srca, srcb
>>>>>>>>> fsingle_pack1           flg, tmp
>>>>>>>>> fsingle_pack2           dst, tmp, flg
>>>>>>>>>
>>>>>>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>>>>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>>>>>>
>>>>>>>
>>>>>>> After checking tilegx.md completely, I think we still need to
>>>>>>> implement each of them precisely, or we cannot emulate all cases
>>>>>>> (e.g. muldf3).
>>>>>>
>>>>>> No, you can still implement all of muldf3 in fdouble_mul_flags.
>>>>>> Again, the fdouble_pack1 copies from the flag input to the output.
>>>>>>
>>>>>> Yes, there is a 64-bit multiply in there, but the tcg optimizer
>>>>>> should be able to delete all of that as unused.  Especially if you
>>>>>> have the fdouble_unpack* insns store zero into their destinations.
>>>>>>
>>>>>
>>>>> I am not quite sure, but I guess what you said should be OK (at
>>>>> least, it is very useful for the implementation).
>>>>>
>>>>>
>>>>>> Don't get me wrong -- more accurate implementation of the actual
>>>>>> insns would be nice, especially for debugging.  But if the insns
>>>>>> aren't accurately documented I don't see what choice we have.
>>>>>>
>>>>>
>>>>> I guess we can still try to implement the details:
>>>>>
>>>>>  - The document has a summary of every floating point instruction, so
>>>>>    we can think through, or guess, the whole implementation.
>>>>>
>>>>>  - gcc uses all of them, so it is a good sample and a good reference
>>>>>    (but we should not assume gcc must be correct, since we are using
>>>>>    qemu to run the gcc testsuite).
>>>>>
>>>>>  - The tilegx floating point format should be standard (or at least
>>>>>    refer to the standard format), so we can look up the related
>>>>>    information with google/baidu.
>>>>>
>>>>>
>>>>>> On the good side, implementing the entire operation as part of the
>>>>>> "flags" step probably results in faster emulation.
>>>>>>
>>>>>
>>>>> I guess so, too.
>>>>>
>>>>>
>>>>> I shall try to finish the simple implementation first, then try to
>>>>> implement the floating point instructions in detail in the future
>>>>> (it should be lower priority).
>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>
>>
> 

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed


