[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Bug 1859384] [NEW] arm gic: interrupt model never 1 on non-mpcore a
From: |
Alex Bennée |
Subject: |
Re: [Bug 1859384] [NEW] arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq |
Date: |
Mon, 13 Jan 2020 13:44:16 +0000 |
User-agent: |
mu4e 1.3.6; emacs 28.0.50 |
Alex Longwall <address@hidden> writes:
> Public bug reported:
>
> For a 1-N interrupt (any SPI on the GICv2), as mandated by the TRM, only
> one CPU can acknowledge the IRQ until it becomes inactive.
>
> The TRM also mandates that SGIs and PPIs follow the N-N model and that
> SPIs follow the 1-N model.
>
> However this is not currently the case with QEMU. I have locally (no
> minimal test case) seen e.g. uart interrupts being acknowledged twice
> before having been deactivated (expected: irqId on one CPU and 1023 on
> the other instead).
You might find there is enough in kvm-unit-tests GIC tests already to
build a test case for what you are seeing.
>
> I have narrowed the issue down to the following:
>
> 1) arm_gic_common_reset resets all irq_state[id] fields to 0. This means
> all IRQ will use the N-N model, and if s->revision != REV_11MPCORE, then
> there's no way to set any interrupt to 1-N.
>
> If ""fixed"" locally with a hackjob, I still have the following trace:
>
> pl011_irq_state 534130.800 pid=2424 level=0x1
> gic_set_irq 2.900 pid=2424 irq=0x21 level=0x1 cpumask=0xff target=0xff
> gic_update_set_irq 3.300 pid=2424 cpu=0x0 name=irq level=0x1
> gic_update_set_irq 4.200 pid=2424 cpu=0x1 name=irq level=0x1
> gic_acknowledge_irq 539.400 pid=2424 s=cpu cpu=0x1 irq=0x21
> gic_update_set_irq 269.800 pid=2424 cpu=0x0 name=irq level=0x1
> gic_cpu_read 4.100 pid=2424 s=cpu cpu=0x1 addr=0xc val=0x21
> gic_acknowledge_irq 15.600 pid=2424 s=cpu cpu=0x0 irq=0x21
> gic_cpu_read 265.000 pid=2424 s=cpu cpu=0x0 addr=0xc val=0x21
> pl011_write 1594.700 pid=2424 addr=0x44 value=0x50
> pl011_irq_state 2.000 pid=2424 level=0x0
> gic_set_irq 1.300 pid=2424 irq=0x21 level=0x0 cpumask=0xff target=0xff
> pl011_write 30.700 pid=2424 addr=0x38 value=0x0
> pl011_irq_state 1.200 pid=2424 level=0x0
> gic_cpu_write 110.600 pid=2424 s=cpu cpu=0x0 addr=0x10 val=0x21
> gic_cpu_write 193.400 pid=2424 s=cpu cpu=0x0 addr=0x1000 val=0x21
> pl011_irq_state 1169.500 pid=2424 level=0x0
>
> This is because:
>
> 2) gic_acknowledge_irq calls gic_clear_pending which uses
> GIC_DIST_CLEAR_PENDING but this usually has no effect on level-sensitive
> interrupts.
>
> With this often being a no-op (ie. assuming ispendr was not written to),
> any 1-n level-sensitive interrupt is still improperly pending on all the
> other cores.
>
> (Also, I don't really know how the qemu thread model works, there might
> be race conditions in the acknowledgment logic if gic_acknowledge_irq is
> called by multiple threads, too.)
All updates to the GIC internals should be protected by the BQL which
applies to all mmio emulated devices.
>
> Option used:
> -nographic -machine virt,virtualization=on,accel=tcg,gic-version=2 -cpu
> cortex-a57 -smp 4 -m 1024
> -kernel whatever.elf -d unimp,guest_errors -semihosting-config
> enable,target=native
> -chardev stdio,id=uart -serial chardev:uart -monitor none
> -trace gic_update_set_irq -trace gic_acknowledge_irq -trace pl011_irq_state
> -trace pl011_write -trace gic_cpu_read -trace gic_cpu_write
> -trace gic_set_irq
>
> Commit used: dc65a5bdc9fa543690a775b50d4ffbeb22c56d6d "Merge remote-
> tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200108' into
> staging"
>
> ** Affects: qemu
> Importance: Undecided
> Status: New
>
>
> ** Tags: arm gic
>
> ** Description changed:
>
> For a 1-N interrupt (any SPI on the GICv2), as mandated by the TRM, only
> one CPU can acknowledge the IRQ until it becomes inactive.
>
> The TRM also mandates that SGIs and PPIs follow the N-N model and that
> SPIs follow the 1-N model.
>
> However this is not currently the case with QEMU. I have locally (no
> minimal test case) seen e.g. uart interrupts being acknowledged twice
> before having been deactivated (expected: irqId on one CPU and 1023 on
> the other instead).
>
> I have narrowed the issue down to the following:
>
> 1) arm_gic_common_reset resets all irq_state[id] fields to 0. This means
> all IRQ will use the N-N model, and if s->revision != REV_11MPCORE, then
> there's no way to set any interrupt to 1-N.
>
> **If fixed locally** with a hackjob, I still have the following trace:
>
> pl011_irq_state 534130.800 pid=2424 level=0x1
> gic_set_irq 2.900 pid=2424 irq=0x21 level=0x1 cpumask=0xff target=0xff
> gic_update_set_irq 3.300 pid=2424 cpu=0x0 name=irq level=0x1
> gic_update_set_irq 4.200 pid=2424 cpu=0x1 name=irq level=0x1
> gic_acknowledge_irq 539.400 pid=2424 s=cpu cpu=0x1 irq=0x21
> gic_update_set_irq 269.800 pid=2424 cpu=0x0 name=irq level=0x1
> gic_cpu_read 4.100 pid=2424 s=cpu cpu=0x1 addr=0xc val=0x21
> gic_acknowledge_irq 15.600 pid=2424 s=cpu cpu=0x0 irq=0x21
> gic_cpu_read 265.000 pid=2424 s=cpu cpu=0x0 addr=0xc val=0x21
> pl011_write 1594.700 pid=2424 addr=0x44 value=0x50
> pl011_irq_state 2.000 pid=2424 level=0x0
> gic_set_irq 1.300 pid=2424 irq=0x21 level=0x0 cpumask=0xff target=0xff
> pl011_write 30.700 pid=2424 addr=0x38 value=0x0
> pl011_irq_state 1.200 pid=2424 level=0x0
> gic_cpu_write 110.600 pid=2424 s=cpu cpu=0x0 addr=0x10 val=0x21
> gic_cpu_write 193.400 pid=2424 s=cpu cpu=0x0 addr=0x1000 val=0x21
> pl011_irq_state 1169.500 pid=2424 level=0x0
>
> This is because:
>
> 2) gic_acknowledge_irq calls gic_clear_pending which uses
> GIC_DIST_CLEAR_PENDING but this usually has no effect on level-sensitive
> interrupts.
>
> With this often being a no-op (ie. assuming ispendr was not written to),
> any 1-n level-sensitive interrupt is still improperly pending on all the
> other cores.
>
> (Also, I don't really know how the qemu thread model works, there might
> be race conditions in the acknowledgment logic if gic_acknowledge_irq is
> called by multiple threads, too.)
> +
> + Option used:
> + -nographic -machine virt,virtualization=on,accel=tcg,gic-version=2 -cpu
> cortex-a57 -smp 4 -m 1024
> + -kernel whatever.elf -d unimp,guest_errors -semihosting-config
> enable,target=native
> + -chardev stdio,id=uart -serial chardev:uart -monitor none
> + -trace gic_update_set_irq -trace gic_acknowledge_irq -trace pl011_irq_state
> -trace pl011_write -trace gic_cpu_read -trace gic_cpu_write
> + -trace gic_set_irq
> +
> + Commit used: dc65a5bdc9fa543690a775b50d4ffbeb22c56d6d "Merge remote-
> + tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200108' into
> + staging"
>
> ** Description changed:
>
> For a 1-N interrupt (any SPI on the GICv2), as mandated by the TRM, only
> one CPU can acknowledge the IRQ until it becomes inactive.
>
> The TRM also mandates that SGIs and PPIs follow the N-N model and that
> SPIs follow the 1-N model.
>
> However this is not currently the case with QEMU. I have locally (no
> minimal test case) seen e.g. uart interrupts being acknowledged twice
> before having been deactivated (expected: irqId on one CPU and 1023 on
> the other instead).
>
> I have narrowed the issue down to the following:
>
> 1) arm_gic_common_reset resets all irq_state[id] fields to 0. This means
> all IRQ will use the N-N model, and if s->revision != REV_11MPCORE, then
> there's no way to set any interrupt to 1-N.
>
> - **If fixed locally** with a hackjob, I still have the following trace:
> + If ""fixed"" locally with a hackjob, I still have the following trace:
>
> pl011_irq_state 534130.800 pid=2424 level=0x1
> gic_set_irq 2.900 pid=2424 irq=0x21 level=0x1 cpumask=0xff target=0xff
> gic_update_set_irq 3.300 pid=2424 cpu=0x0 name=irq level=0x1
> gic_update_set_irq 4.200 pid=2424 cpu=0x1 name=irq level=0x1
> gic_acknowledge_irq 539.400 pid=2424 s=cpu cpu=0x1 irq=0x21
> gic_update_set_irq 269.800 pid=2424 cpu=0x0 name=irq level=0x1
> gic_cpu_read 4.100 pid=2424 s=cpu cpu=0x1 addr=0xc val=0x21
> gic_acknowledge_irq 15.600 pid=2424 s=cpu cpu=0x0 irq=0x21
> gic_cpu_read 265.000 pid=2424 s=cpu cpu=0x0 addr=0xc val=0x21
> pl011_write 1594.700 pid=2424 addr=0x44 value=0x50
> pl011_irq_state 2.000 pid=2424 level=0x0
> gic_set_irq 1.300 pid=2424 irq=0x21 level=0x0 cpumask=0xff target=0xff
> pl011_write 30.700 pid=2424 addr=0x38 value=0x0
> pl011_irq_state 1.200 pid=2424 level=0x0
> gic_cpu_write 110.600 pid=2424 s=cpu cpu=0x0 addr=0x10 val=0x21
> gic_cpu_write 193.400 pid=2424 s=cpu cpu=0x0 addr=0x1000 val=0x21
> pl011_irq_state 1169.500 pid=2424 level=0x0
>
> This is because:
>
> 2) gic_acknowledge_irq calls gic_clear_pending which uses
> GIC_DIST_CLEAR_PENDING but this usually has no effect on level-sensitive
> interrupts.
>
> With this often being a no-op (ie. assuming ispendr was not written to),
> any 1-n level-sensitive interrupt is still improperly pending on all the
> other cores.
>
> (Also, I don't really know how the qemu thread model works, there might
> be race conditions in the acknowledgment logic if gic_acknowledge_irq is
> called by multiple threads, too.)
>
> Option used:
> -nographic -machine virt,virtualization=on,accel=tcg,gic-version=2 -cpu
> cortex-a57 -smp 4 -m 1024
> -kernel whatever.elf -d unimp,guest_errors -semihosting-config
> enable,target=native
> -chardev stdio,id=uart -serial chardev:uart -monitor none
> -trace gic_update_set_irq -trace gic_acknowledge_irq -trace pl011_irq_state
> -trace pl011_write -trace gic_cpu_read -trace gic_cpu_write
> -trace gic_set_irq
>
> Commit used: dc65a5bdc9fa543690a775b50d4ffbeb22c56d6d "Merge remote-
> tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200108' into
> staging"
--
Alex Bennée
- [Bug 1859384] [NEW] arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Alex Longwall, 2020/01/12
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Alex Longwall, 2020/01/12
- Re: [Bug 1859384] [NEW] arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq,
Alex Bennée <=
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Peter Maydell, 2020/01/13
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Alex Longwall, 2020/01/13
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Peter Maydell, 2020/01/13
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Alex Longwall, 2020/01/13
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Alex Longwall, 2020/01/13
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Alex Longwall, 2020/01/13
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Alex Longwall, 2020/01/13
- [Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq, Alex Bennée, 2020/01/14
- [Bug 1859384] Re: arm gic: gic_acknowledge_irq doesn't clear line level for other cores for 1-n level-sensitive interrupts and gic_clear_pending uses GIC_DIST_TEST_MODEL (even on v2 where it always read 0 - "N-N"), Alex Longwall, 2020/01/17