Re: [Qemu-devel] [PATCH 08/10] target-avr: adding instruction translation

qemu-devel


From:	Richard Henderson
Subject:	Re: [Qemu-devel] [PATCH 08/10] target-avr: adding instruction translation
Date:	Sun, 5 Jun 2016 16:34:28 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0

On 06/05/2016 02:47 PM, Michael Rolnik wrote:

    Is there a reason this code isn't going into translate.c?
    You wouldn't need the declarations in translate-inst.h or translate.h.

I see here two levels of logic
a. instruction translation
b. general flow of program translation.

FWIW, static functions can be automatically inlined by the compiler, whereas external function calls can't. In theory, the compiler could auto-inline the entire translator into a single function.

    Order these functions properly and you don't need forward declarations.

Is it a requirement? This way it looks cleaner.

Does it? In my experience it just means you've got to edit two places when one changes things.

    While this is exactly the formula in the manual, it's also equal to

        ((Rd ^ Rr) ^ R) & 16

Please explain. I don't see it.

http://www.wolframalpha.com/input/?i=A+and+B+or+A+and+not+C+or+B+and+not+C,+A+xor+B+xor+C


I did explain:

    where we examine the difference between the non-carry addition (Rd ^ Rr)
    and the carry addition (R) to find the carry out of bit 3.  This reduces
    the operation count to 2 from 5.

It's not a manipulation of the original expression, but a different way of looking at the problem.

You want to compute carry-out of bit 3. Given the *result* of the addition, it's easier to examine bit 4, into which carry-in has happened, rather than examine bit 3 and re-compute the carry-out.

The AVR hardware probably computes it exactly as described in the manual, because that can be done in parallel with the addition, and has a lower total gate latency. This is fairly common in the industry, where the documentation follows the implementation more closely than it perhaps should.
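The equivalence can be checked by brute force. The following is a minimal sketch in plain C, not code from the patch: the function names are made up for illustration, and the manual's formula is taken to be H = Rd3&Rr3 | Rr3&~R3 | ~R3&Rd3, as in the AVR instruction set manual for ADD/ADC. Note that the manual form looks at bit 3 of all three values, while the XOR form looks at bit 4.

```c
#include <assert.h>
#include <stdint.h>

/* Half-carry as the manual writes it for ADD/ADC, using bit 3 only:
 * H = Rd3 & Rr3 | Rr3 & ~R3 | ~R3 & Rd3.  Returns 0 or 1. */
unsigned half_carry_manual(uint8_t rd, uint8_t rr, uint8_t r)
{
    unsigned rd3 = (rd >> 3) & 1;
    unsigned rr3 = (rr >> 3) & 1;
    unsigned nr3 = ((r >> 3) & 1) ^ 1;   /* ~R3 */
    return (rd3 & rr3) | (rr3 & nr3) | (nr3 & rd3);
}

/* The same flag read from bit 4: the carry-less sum (rd ^ rr) differs
 * from the real sum r in bit 4 exactly when a carry came out of bit 3. */
unsigned half_carry_xor(uint8_t rd, uint8_t rr, uint8_t r)
{
    return (((rd ^ rr) ^ r) & 16) >> 4;
}

/* Count the (rd, rr, carry-in) combinations where the two forms disagree. */
unsigned half_carry_add_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned c = 0; c < 2; c++) {
        for (unsigned rd = 0; rd < 256; rd++) {
            for (unsigned rr = 0; rr < 256; rr++) {
                uint8_t r = (uint8_t)(rd + rr + c);
                bad += half_carry_manual(rd, rr, r) != half_carry_xor(rd, rr, r);
            }
        }
    }
    return bad;
}
```

Calling half_carry_add_mismatches() walks all 2 * 256 * 256 cases, so it covers ADC as well as ADD.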

    Note that carry and borrow are related, and thus this is *also* computable
    via ((Rd ^ Rr) ^ R) on bit 4.

please explain, I don't see it
http://www.wolframalpha.com/input/?i=not+A+and+B+or+not+A+and+C+or++C+and+B,+A+xor+B+xor+C

As above: given the *result* of the subtraction, examine bit 4, into which borrow-in has happened.

Once you accept that, you'll note that the same expression can be used to re-create both carry-in and borrow-in.
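As a rough illustration of that last point, here is a plain C sketch (hypothetical helper names, not part of the patch) that uses one expression for both cases and compares it against the definition of the flag on the low nibbles: a carry means the low nibbles sum past 0xf, a borrow means the low nibble of Rd is too small.

```c
#include <assert.h>
#include <stdint.h>

/* One expression for both flags: bit 4 of rd ^ rr ^ r is the carry
 * (after an addition) or the borrow (after a subtraction) that
 * entered bit 4. */
unsigned nibble_flag(uint8_t rd, uint8_t rr, uint8_t r)
{
    return ((rd ^ rr ^ r) >> 4) & 1;
}

/* Carry out of bit 3, by definition: the low nibbles overflow 4 bits. */
unsigned carry_by_definition(uint8_t rd, uint8_t rr, unsigned c)
{
    return (rd & 0xfu) + (rr & 0xfu) + c > 0xfu;
}

/* Borrow from bit 3, by definition: the low nibble of rd is too small. */
unsigned borrow_by_definition(uint8_t rd, uint8_t rr, unsigned c)
{
    return (rd & 0xfu) < (rr & 0xfu) + c;
}

/* Count disagreements over every rd, rr and carry/borrow-in value. */
unsigned nibble_flag_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned c = 0; c < 2; c++) {
        for (unsigned rd = 0; rd < 256; rd++) {
            for (unsigned rr = 0; rr < 256; rr++) {
                uint8_t sum  = (uint8_t)(rd + rr + c);
                uint8_t diff = (uint8_t)(rd - rr - c);
                bad += nibble_flag(rd, rr, sum)  != carry_by_definition(rd, rr, c);
                bad += nibble_flag(rd, rr, diff) != borrow_by_definition(rd, rr, c);
            }
        }
    }
    return bad;
}
```

The only difference between the add and subtract cases is which result is passed in as r; the flag computation itself is shared.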

    I'll also note that the piece-wise store is big-endian, so you could
    perform this in 1 store for 2_BYTE and 2 stores for 3_BYTE.

I got the impression that the platform is little-endian.

Then you've got the order of the stores wrong. Your code pushes the LSB before pushing the MSB, which results in the MSB at the lower address, which means big-endian.
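To make the layout concrete, here is a small plain C model, a sketch under stated assumptions rather than the patch's code: a 256-byte memory, a hypothetical stack top of 0xff, and a push that stores at SP and then decrements. It shows where the two bytes land, and that a single big-endian 16-bit store produces the same memory image.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[256];     /* toy data memory (size is an assumption) */
static uint8_t sp = 0xff;    /* hypothetical initial stack pointer */

/* Push one byte: store at SP, then decrement (stack grows downward). */
void push8(uint8_t v)
{
    mem[sp--] = v;
}

/* Piece-wise push, LSB first and MSB second.
 * Returns the lowest address occupied by the value. */
uint8_t push16_lsb_first(uint16_t v)
{
    push8((uint8_t)(v & 0xff));
    push8((uint8_t)(v >> 8));
    return (uint8_t)(sp + 1);
}

/* One big-endian 16-bit store: MSB at addr, LSB at addr + 1. */
void store16_be(uint8_t addr, uint16_t v)
{
    assert(addr < 0xff);     /* addr + 1 must stay inside mem[] */
    mem[addr] = (uint8_t)(v >> 8);
    mem[addr + 1] = (uint8_t)(v & 0xff);
}

/* Same push done as one SP adjustment plus one big-endian store. */
uint8_t push16_one_store(uint16_t v)
{
    sp -= 2;
    store16_be((uint8_t)(sp + 1), v);
    return (uint8_t)(sp + 1);
}
```

After push16_lsb_first(0x1234) on a fresh stack, 0x12 sits at the lower address and 0x34 at the higher one, which is the big-endian layout.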

    Wow.  Um... Surely it would be better to store X and Y internally as whole
    24-bit quantities, and Z as a 16-bit quantity (to be extended with rampz,
    rampd, or eind as needed).

rampX/Y/Z are represented now as 0x00ff0000.
X/Y/Z can be represented as 16-bit registers; however, I do not know if and
when r26-r31 are used as 8-bit registers, so if X/Y/Z are represented as 16
bits it would be hard to use r26-r31 in arithmetic.

You would use a setup like the following, and use these functions instead of other direct accesses to the cpu registers. This setup requires similar functions in cpu.h for use by e.g. gdbstub.c.



TCGv cpu_rb[24];
TCGv cpu_rw[4];

TCGv read_byte(unsigned rb)
{
    TCGv byte = tcg_temp_new();
    if (rb < 24) {
        tcg_gen_mov_tl(byte, cpu_rb[rb]);
    } else {
        unsigned rw = (rb - 24) / 2;
        if (rb & 1) {
            tcg_gen_shri_tl(byte, cpu_rw[rw], 8);
        } else {
            tcg_gen_ext8u_tl(byte, cpu_rw[rw]);
        }
    }
    return byte;
}

void write_byte(unsigned rb, TCGv val)
{
    if (rb < 24) {
        tcg_gen_mov_tl(cpu_rb[rb], val);
    } else {
        unsigned rw = (rb - 24) / 2;
        tcg_gen_deposit_tl(cpu_rw[rw], cpu_rw[rw], val, (rb & 1) * 8, 8);
    }
}

/* Return RB+1:RB.  */
TCGv read_word(unsigned rb)
{
    TCGv word = tcg_temp_new();
    if (rb < 24) {
        tcg_gen_deposit_tl(word, cpu_rb[rb], cpu_rb[rb + 1], 8, 8);
    } else {
        unsigned rw = (rb - 24) / 2;
        tcg_gen_mov_tl(word, cpu_rw[rw]);
    }
    return word;
}

void write_word(unsigned rb, TCGv val)
{
    if (rb < 24) {
        tcg_gen_ext8u_tl(cpu_rb[rb], val);
        tcg_gen_shri_tl(cpu_rb[rb + 1], val, 8);
    } else {
        unsigned rw = (rb - 24) / 2;
        tcg_gen_mov_tl(cpu_rw[rw], val);
    }
}
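
The intended behaviour of these accessors can be modelled in plain C, without TCG. This is only an illustrative sketch: the model_* names are hypothetical, all four word registers are treated as plain 16-bit values (the ramp extension is ignored), and word accesses are assumed to use even register numbers.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t  rb[24];   /* r0..r23, stored as bytes */
static uint16_t rw[4];    /* r25:r24, X, Y, Z, stored as words */

uint8_t model_read_byte(unsigned r)
{
    assert(r < 32);
    if (r < 24) {
        return rb[r];
    }
    uint16_t w = rw[(r - 24) / 2];
    /* Odd register numbers are the high byte of the pair. */
    return (uint8_t)((r & 1) ? (w >> 8) : (w & 0xff));
}

void model_write_byte(unsigned r, uint8_t val)
{
    assert(r < 32);
    if (r < 24) {
        rb[r] = val;
        return;
    }
    /* Equivalent of a deposit: replace one byte, keep the other. */
    unsigned shift = (r & 1) * 8;
    uint16_t *w = &rw[(r - 24) / 2];
    *w = (uint16_t)((*w & ~(0xffu << shift)) | ((unsigned)val << shift));
}

/* Return R+1:R for an even register number R. */
uint16_t model_read_word(unsigned r)
{
    assert(r < 32 && (r & 1) == 0);
    if (r < 24) {
        return (uint16_t)(rb[r] | (rb[r + 1] << 8));
    }
    return rw[(r - 24) / 2];
}

void model_write_word(unsigned r, uint16_t val)
{
    assert(r < 32 && (r & 1) == 0);
    if (r < 24) {
        rb[r] = (uint8_t)(val & 0xff);
        rb[r + 1] = (uint8_t)(val >> 8);
    } else {
        rw[(r - 24) / 2] = val;
    }
}
```

Byte writes to r26 and r27 are then visible as one word read of X, and a word write to r1:r0 is visible as two byte reads, so callers do not need to know which storage backs a given register.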

        +int    avr_translate_DEC(CPUAVRState *env, DisasContext *ctx, uint32_t

...

        +    tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_Vf, Rd, 0x7f);  /* cpu_Vf = Rd == 0x7f */


    This is INC overflow.

please explain, I don't see a problem here


You have swapped the overflow conditions for INC and DEC.

    127 + 1 -> -128
    -128 - 1 -> 127
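
A quick way to convince yourself is to compare the two constant tests against the definition of signed overflow for every 8-bit input. The sketch below is plain C with hypothetical function names, and it tests the value of Rd *before* the operation.

```c
#include <assert.h>
#include <stdint.h>

/* INC overflows only for 127 + 1, i.e. when the old value is 0x7f. */
unsigned inc_overflow(uint8_t rd)
{
    return rd == 0x7f;
}

/* DEC overflows only for -128 - 1, i.e. when the old value is 0x80. */
unsigned dec_overflow(uint8_t rd)
{
    return rd == 0x80;
}

/* Reference: signed overflow means the exact result of adding delta
 * to the signed interpretation of rd does not fit in 8 bits. */
unsigned overflow_by_definition(uint8_t rd, int delta)
{
    int as_signed = rd < 128 ? (int)rd : (int)rd - 256;
    int exact = as_signed + delta;
    return exact < -128 || exact > 127;
}

/* Count inputs where the constant tests disagree with the definition. */
unsigned incdec_overflow_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned rd = 0; rd < 256; rd++) {
        bad += inc_overflow((uint8_t)rd) != overflow_by_definition((uint8_t)rd, +1);
        bad += dec_overflow((uint8_t)rd) != overflow_by_definition((uint8_t)rd, -1);
    }
    return bad;
}
```

With the conditions this way round the mismatch count is zero; comparing Rd against 0x7f in DEC would flag 127 - 1 as an overflow and miss -128 - 1.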


r~
