qemu-s390x
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [qemu-s390x] [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_


From: Richard Henderson
Subject: Re: [qemu-s390x] [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
Date: Wed, 21 Aug 2019 15:31:58 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 8/21/19 2:33 PM, David Hildenbrand wrote:
>> NOTDIRTY cannot fault at all.  The associated rcu critical section is ugly
>> enough to make me not want to do anything except continue to go through the
>> regular MMIO path.
>>
>> In any case, so long as we eliminate *access* faults by probing the page 
>> table,
>> then falling back to the byte-by-byte loop is, AFAICS, sufficient to 
>> implement
>> the instructions correctly.
> 
> "In any case, so long as we eliminate *access* faults by probing the
> page table" - that's what I'm doing in this patch (and even more correct
> in the prototype patch I shared), no? (besides the watchpoint madness below)

Correct.

My main objection to your current patch is that you perform the access checks
within MVC, and then do some more tlb lookups in fast_memmove.

I think that fast_memmove is where the access checks should live.  That allows
incremental improvement to combine access checks + host address lookup, which
cannot currently be done in one step with existing interfaces.

I guess you would still want access checks within MVC for the case in which you
must fall back to byte-by-byte because of destructive overlap.

> "falling back to the byte-by-byte loop is, AFAICS, sufficient"
> 
> I don't think this is sufficient. E.g., LAP protected pages
> (PAGE_WRITE_INV which immediately requires a new MMU walk on the next
> access) will trigger a new MMU walk on every byte access (that's why I
> chose to pre-translate in my prototype).

LAP protected pages is exactly why probe_write should return the host address,
so that we can do the access check + host address lookup in one step.

But in the meantime...

> If another CPU modified the
> page tables in between, we could suddenly get a fault - although we
> checked early. What am I missing?

You're concerned with a bare write to the page table by cpu B, while cpu A is
executing, and before cpu B issues the cross-cpu tlb flush?

The tlb victim cache should prevent having to re-read a tlb entry from memory,
at least for MVC.  The unlimited size we currently support for MVCL and MVCLE
could act weird, but would be fixed by limiting the execution as discussed.

Honestly, the os has to make sure that the page remains valid until after the
flush completes, otherwise it's an os bug.  The cross-cpu tlb flush happens via
async_run_on_cpu, and of course never occurs while we are executing a TB.  Yet
another reason to limit the amount of work any one instruction does.  ;-)


> I see that we use BP_STOP_BEFORE_ACCESS for PER (Program Event
> Recording) on s390x. I don't think that's correct. We want to get
> notified after the values were changed.
> 
> "A storage-alteration event occurs whenever a CPU,
> by using a logical or virtual address, makes a store
> access without an access exception to the storage
> area designated by control registers 10 and 11. ..."
> 
> "For a PER instruction-fetching nullification event, the
> unit of operation is nullified. For other PER events,
> the unit of operation is completed"
> 
> Oh man, why is everything I take a look at broken.

Heh.

>> In the latter case, if the instruction has had any side effects prior to the
>> longjmp, they will be re-done when we re-start the current instruction.
>>
>> To me this seems like a rather large bug in our implementation of 
>> watchpoints,
>> as it only really works properly for simple load/store/load-op-store type
>> instructions.  Anything that works on many addresses and doesn't delay side
>> effects until all accesses are complete will Do The Wrong Thing.
>>
>> The fix, AFAICS, is for probe_write to call check_watchpoint(), so that we
>> take the debug exit early.
> 
> Indeed. I see what you mean now. (I was ignoring the "before access"
> because I was assuming we don't need it on s390x)
> 
> probe_write() would have to check for all BP_STOP_BEFORE_ACCESS watchpoints.

!BP_STOP_BEFORE_ACCESS watchpoints exit to the main loop as well, so that it
can restart and then single-step the current instruction.

We need it the check in probe_write for all cases.

> Yes, that's what I mean, TARGET_PAGE_SIZE, but eventually crossing a
> page boundary. The longer I stare at the MVCL code, the more broken it
> is. There are more nice things buried in the PoP. MVCL does not detect
> access exceptions beyond the next 2k. So we have to limit it there
> differently.

That language is indeed odd.

The only reading of that paragraph that makes sense to me is that the hardware
*must* interrupt MVCL after every 2k bytes processed.  The idea that the user
can magically write to a read-only page simply by providing length = 2MB and
page that is initially writable is dumb.  I cannot imagine that is a correct
reading.

Getting clarification from an IBM engineer on that would be good; otherwise I
would just ignore that and proceed as if all access checks are performed.

> So what I understand is that
> 
> - we should handle watchpoints in probe_write()
> - not bypass IO memory (especially NOTDIRTY). We cannot always relay on
>   getting access to a host page.

Correct.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]