qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH] Add minimal Hexagon target - First in a series of patches -


From: Taylor Simpson
Subject: RE: [PATCH] Add minimal Hexagon target - First in a series of patches - linux-user changes + linux-user/hexagon + skeleton of target/hexagon - Files in target/hexagon/imported are from another project and therefore do not conform to qemu coding standards
Date: Wed, 20 Nov 2019 15:17:36 +0000

Are you saying there's a way to tell qemu to put the stack at a certain 
location?  Or are you suggesting we do something on the hardware side?

Taylor


-----Original Message-----
From: Richard Henderson <address@hidden>
Sent: Wednesday, November 20, 2019 8:43 AM
To: Taylor Simpson <address@hidden>; address@hidden; address@hidden; 
address@hidden
Subject: Re: [PATCH] Add minimal Hexagon target - First in a series of patches 
- linux-user changes + linux-user/hexagon + skeleton of target/hexagon - Files 
in target/hexagon/imported are from another project and therefore do not 
conform to qemu coding standards

-------------------------------------------------------------------------
CAUTION: This email originated from outside of the organization.
-------------------------------------------------------------------------

On 11/20/19 1:51 PM, Taylor Simpson wrote:
>>> +static inline target_ulong hack_stack_ptrs(CPUHexagonState *env,
>>> +                                           target_ulong addr) {
>>> +    target_ulong stack_start = env->stack_start;
>>> +    target_ulong stack_size = 0x10000;
>>> +    target_ulong stack_adjust = env->stack_adjust;
>>> +
>>> +    if (stack_start + 0x1000 >= addr && addr >= (stack_start - 
>>> stack_size)) {
>>> +        return addr - stack_adjust;
>>> +    }
>>> +    return addr;
>>> +}
>>
>> An explanation would be welcome here.
>> [Taylor]  I will add a comment.  One of the main debugging techniques is to 
>> use "-d cpu" and compare against LLDB output when single stepping.  However, 
>> the target and qemu put the stacks in different locations.  This is used to 
>> compensate so the diff is cleaner.
>
> Ug.  I understand why you want this, but... ug.
>
> [Taylor] Not sure what to say - I guess there's a fine line between elegance 
> and mayhem.

The hack may work for now, but I would certainly expect this code to go away 
eventually.  Sooner rather than later.

If you want dumps between LLDB and QEMU to match, then just put the stack at 
the same spot.  Messing about with the pointers after the fact is insane.


> [Taylor] To clarify, this is for a rare case when a hardware loop body is a 
> single packet.  Here's an example.  It's the portion of memset that gets 
> called when the number of bytes is small
>   400404:       10 40 02 60     60024010 {      loop0(0x40040c,r2)
>   400408:       08 c0 02 10     1002c008        p0 = cmp.eq(r2,#0); if 
> (p0.new) jump:nt 0x400414 <memset+0x34> }
>   40040c:       01 81 03 a1     a1038101 {      memb(r3+#1) = r1
>   400410:       10 c1 03 ab     ab03c110        memb(r3++#2) = r1 }  :endloop0
> The loop0 instruction sets the start address and the iteration count.  The 
> :endloop0 marks the end of the loop and tells the hardware to decrement the 
> counter and go back to the start if it's not zero.  So, the loop at 
> 0x40040c-0x400410 is a single packet.  In this case the hardware single step 
> will execute the entire loop.  Finally, if the loop has more than one packet, 
> the single stepping will go through each iteration as you describe.

The behaviour inconsistency between a single packet loop and a two packet loop 
sounds to me like a HW bug.
Perhaps of the WONTFIX variety, but still...


>>> +static void decode_packet(CPUHexagonState *env, DisasContext *ctx) {
>>> +    size4u_t words[4];
>>> +    int i;
>>> +    /* Brute force way to make sure current PC is set */
>>> +    tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next);
>>
>> What's this for?
>> [Taylor] Honestly, I'm not sure.  If I remove it, nothing works - not even 
>> "hello world".

It's because you forgot to update PC before raising the exception for trap #0.
Which also ought to set DISAS_NORETURN to end the TB, since it longjmp's out.


>>> +    for (i = 0; i < 4; i++) {
>>> +        words[i] = cpu_ldl_code(env, ctx->base.pc_next + i * 
>>> sizeof(size4u_t));
>>> +    }
>>
>> And this?
>> [Taylor] It's reading from the instruction memory.  The switch statement 
>> below uses it.
>
> I thought packets are variable length and ended by a bit set.  Therefore why 
> the fixed iteration to 4?  Also...
>
>
> [Taylor] The maximum size of a packet is 4 words, so I go ahead and read that 
> much.  Once the packet is decoded, I set ctx->base.pc_next appropriately.  
> Note that most of the instructions in the switch add 4, but one of them adds 
> 8.

This is where you will need to take more care wrt the end of the packet and the 
end of the page.  I would expect something like

    max = -(dc->base.pc_next | TARGET_PAGE_MASK) / 4;
    if (max < 4) {
        /* this packet *might* cross a page boundary */
        if (dc->base.num_insns == 1) {
            /* this is the first packet in the TB -- allow it */
            max = 4;
        }
    } else {
        max = 4;
    }

    found_end = false;
    for (i = 0; i < 4; i++) {
        words[i] = load(...);
        if (end_of_packet(words[i])) {
            max = i + 1;
            found_end = true;
            break;
        }
    }
    if (!found_end) {
        if (i == 4) {
            /* illegal packet -- read all 4 without an end */
            gen_illegal_insn_or_whatever();
            dc->base.is_jmp = DISAS_NORETURN;
            return;
        }
        /* the packet crosses a page boundary -- defer to next tb */
        dc->base.is_jmp = DISAS_TOO_MANY;
        return;
    }

    ...

    case insn that needs two words:
        assert(i + 2 <= max);
        ...


r~

reply via email to

[Prev in Thread] Current Thread [Next in Thread]