qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi


From: Edgar E. Iglesias
Subject: Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi
Date: Wed, 14 Jan 2009 19:46:16 +0100
User-agent: Mutt/1.5.16 (2007-06-09)

On Tue, Jan 13, 2009 at 09:21:18AM +0900, Shin-ichiro KAWASAKI wrote:
> I've completely passed over Vladimir-san's nice patch on movca-ocbi,
> even though it is already in qemu-sh staging repository.
>
> http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg00605.html

Hah funny, that patch looks very much like yours :)
I'll try to review it too.

Cheers

>
> But now, I can review the patch.  I'll do it later.
>
> Regards,
> Shin-ichiro KAWASAKI
>
>
> Shin-ichiro KAWASAKI wrote:
>> Thank you for your commmnts!
>> Edgar E. Iglesias wrote:
>>> On Sun, Jan 11, 2009 at 06:14:40PM +0900, Shin-ichiro KAWASAKI wrote:
>>>> Current sh4's "movca.l" instruction is implemented in a same way
>>>> as "mov.l". Then, write to memory cannot be canceled by "ocbi"
>>>> cache control instrunction.  This makes text area broken on
>>>> cache flush by linux kernel.
>>>>
>>>> This patch delays "movca.l" execution and provide the chance to cancel 
>>>> it for "ocbi".
>>>> # Thank you Edgar, for your advice!
>>>>
>>>> This patch does,
>>>> - on executing "movca.l", just records what and where movca   should 
>>>> store data.
>>>> - on executing "ocbi", find the corresponding record by movca,
>>>>   and delete it to cancel.
>>>> - lets TCG produce "delayed_movca" instruction at the first
>>>>   instruction which is neither "movca.l" nor "ocbi".
>>>> - on executing "delayed_movca", does the data store task,
>>>>   according to the record.
>>>
>>> Hello,
>>>
>>> I think your patch will catch the linux dflush sequence but I've got a
>>> few comments.
>>>
>>> There is a complication with the delayed stores and potential mmu
>>> exceptions that they might generate. I think with this approach
>>> you'll take those faults at the next instruction which might not
>>> even be a load/store. Not sure if that can cause problems for sh.
>>> Maybe you could avoid that by making movca a load + store and the
>>> ocbi a store with the original value loaded at movca. But you'll
>>> still get into trouble with exceptions that might break the
>>> movca/ocbi sequence.
>> That's an idea. Current implementation is "delay and do-or-cancel" way.
>> But your idea is "do and commit-or-rollback" way.
>> I'll try to implement it to avoid mmu fault by movca.
>> The mmu fault by movca will cause invoke exception handler, and break
>> movca/ocbi sequence.  It will be a quite rare case, however, I'll consider
>> it for next version.
>>> I don't it's worth having a dynamically sized movca delay buffer.
>>> The movca insn has a restricted addressing mode making it a bit hard
>>> to use back-to-back without alu insns in between. IIU your code
>>> correctly you flush the buffer before every non movca/ocbi insn so I
>>> guess that for most realistic use-cases you'll only need very few
>>> (maybe only 1) entries in the delay buffer.
>> That is a point I wavered.
>> I investigated linux kernel and found following lines in
>> "__flush_dcache_segment_4way()" in "arch/sh/mm/cache-sh4.c".
>>     a0 = base_addr;
>>     a1 = a0 + way_incr;
>>     a2 = a1 + way_incr;
>>     a3 = a2 + way_incr;
>>     a0e = base_addr + extent_per_way;
>>     do {
>>         asm volatile("ldc %0, sr" : : "r" (sr_with_bl));
>>         asm volatile("movca.l r0, @%0\n\t"
>>                  "movca.l r0, @%1\n\t"
>>                  "movca.l r0, @%2\n\t"
>>                  "movca.l r0, @%3\n\t"
>>                  "ocbi @%0\n\t"
>>                  "ocbi @%1\n\t"
>>                  "ocbi @%2\n\t"
>>                  "ocbi @%3\n\t" : :
>>                  "r" (a0), "r" (a1), "r" (a2), "r" (a3));
>> Multiple movca.l insns executed consequently, and they access different 
>> lines.
>> At least, 4 entries are needed for it.  This code shows kernel 
>> implementation
>> decides how many number of movcas are invoked consequently.
>> That's the reason why I introudcued dynamic size change. 
>>>> Signed-off-by: Shin-ichiro KAWASAKI <address@hidden>
>>>>
>>>> Index: trunk/target-sh4/op_helper.c
>>>> ===================================================================
>>>> --- trunk/target-sh4/op_helper.c    (revision 6133)
>>>> +++ trunk/target-sh4/op_helper.c    (working copy)
>>>> @@ -20,6 +20,7 @@
>>>>  #include <assert.h>
>>>>  #include "exec.h"
>>>>  #include "helper.h"
>>>> +void *qemu_mallocz(size_t size);
>>>>   #ifndef CONFIG_USER_ONLY
>>>>  @@ -604,3 +605,57 @@
>>>>      d.ll = t0;
>>>>      return float64_to_int32_round_to_zero(d.d, &env->fp_status);
>>>>  }
>>>> +
>>>> +void helper_movca(uint32_t val, uint32_t addr)
>>>> +{
>>>> +    delayed_movca_t *cur = &env->movca_list;
>>>> +    delayed_movca_t *prev = NULL;
>>>> +    while (cur) {
>>>> +    if (!cur->valid) {
>>>
>>> Might be overkill but you can merge here, i.e:
>>>
>>>     if (!cur->valid || cur->addr == addr) {
>>>
>>>
>>>> +        cur->valid = 1;
>>>> +        cur->value = val;
>>>> +        cur->addr = addr;
>>>> +        return;
>>>> +    }
>>>> +    prev = cur;
>>>> +    cur = cur->next;
>>>> +    }
>>>> +
>>>> +    /* movca entry shortage. allocate it. */
>>>> +    prev->next = cur = qemu_mallocz(sizeof(delayed_movca_t));
>>>> +    if (cur == NULL) {
>>>> +    printf("out of memory for delayed movca. @%08x\n", addr);
>>>> +    return;
>>>> +    }
>>>> +    cur->valid = 1;
>>>> +    cur->value = val;
>>>> +    cur->addr = addr;
>>>> +}
>>>> +
>>>> +void helper_ocbi(uint32_t addr)
>>>> +{
>>>> +    delayed_movca_t *cur = &env->movca_list;
>>>> +
>>>> +    while (cur) {
>>>> +    if (cur->valid && cur->addr == addr) { /* found! */
>>>> +        cur->valid = 0;
>>>> +        return;
>>>> +    }
>>>> +    cur = cur->next;
>>>> +    }
>>>
>>> I think you can quite easy catch a few more movca use cases if you
>>> dont compare line offsets, i.e for sh4 you'd mask the addr & ~(32 - 1).
>>> Also, you should not return until you've scanned the entire movca delay
>>> buffer as there might be multiple stores to the same line that need to be
>>> ignored.
>>>
>>> example:
>>>     addr &= ~31;
>>>     while (cur) {
>>>        if (cur->valid && (cur->addr & ~31) == addr)
>>>            cur->valid = 0;
>>>        cur = cur->next;
>>>     }
>> That's a good advice.  I'll try to implement it for next version.
>> Thanks again for your review!
>> Regards,
>> Shin-ichiro KAWASAKI
>
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]