From: Auger Eric
Subject: Re: [Qemu-devel] [PATCH v9 03/14] hw/arm/smmu-common: VMSAv8-64 page table walk
Date: Wed, 7 Mar 2018 17:23:38 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0

Hi Peter,

On 06/03/18 20:43, Peter Maydell wrote:
> On 17 February 2018 at 18:46, Eric Auger <address@hidden> wrote:
>> This patch implements the page table walk for VMSAv8-64.
>>
>> Signed-off-by: Eric Auger <address@hidden>
>>
>> ---
>> v8 -> v9:
>> - remove guest error log on PTE fetch fault
>> - rename  trace functions
>> - fix smmu_page_walk_level_res_invalid_pte last arg
>> - fix PTE_ADDRESS
>> - turn functions into macros
>> - make sure to return the actual pte access permission
>>   into tlbe->perm
>> - change proto of smmu_ptw*
>>
>> v7 -> v8:
>> - rework get_pte
>> - use LOG_LEVEL_ERROR
>> - remove error checking in get_block_pte_address
>> - page table walk simplified (no VFIO replay anymore)
>> - handle PTW error events
>> - use dma_memory_read
>>
>> v6 -> v7:
>> - fix wrong error handling in walk_page_table
>> - check perm in smmu_translate
>>
>> v5 -> v6:
>> - use IOMMUMemoryRegion
>> - remove initial_lookup_level()
>> - fix block replay
>>
>> v4 -> v5:
>> - add initial level in translation config
>> - implement block pte
>> - rename must_translate into nofail
>> - introduce call_entry_hook
>> - small changes to dynamic traces
>> - smmu_page_walk code moved from smmuv3.c to this file
>> - remove smmu_translate*
>>
>> v3 -> v4:
>> - reworked page table walk to prepare for VFIO integration
>>   (capability to scan a range of IOVA). Same function is used
>>   for translate for a single iova. This is largely inspired
>>   from intel_iommu.c
>> - as the translate function was not straightforward to me,
>>   I tried to stick more closely to the VMSA spec.
>> - remove support of nested stage (kernel driver does not
>>   support it anyway)
>> - use error_report and trace events
>> - add aa64[] field in SMMUTransCfg
>> ---
>>  hw/arm/smmu-common.c         | 232 +++++++++++++++++++++++++++++++++++++++++++
>>  hw/arm/smmu-internal.h       |  96 ++++++++++++++++++
>>  hw/arm/trace-events          |  10 ++
>>  include/hw/arm/smmu-common.h |   6 ++
>>  4 files changed, 344 insertions(+)
>>  create mode 100644 hw/arm/smmu-internal.h
>>
>> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
>> index d0516dc..24cc4ba 100644
>> --- a/hw/arm/smmu-common.c
>> +++ b/hw/arm/smmu-common.c
>> @@ -27,6 +27,238 @@
>>
>>  #include "qemu/error-report.h"
>>  #include "hw/arm/smmu-common.h"
>> +#include "smmu-internal.h"
>> +
>> +/* VMSAv8-64 Translation */
>> +
>> +/**
>> + * get_pte - Get the content of a page table entry located at
>> + * @baseaddr[@index]
>> + */
>> +static int get_pte(dma_addr_t baseaddr, uint32_t index, uint64_t *pte,
>> +                   SMMUPTWEventInfo *info)
>> +{
>> +    int ret;
>> +    dma_addr_t addr = baseaddr + index * sizeof(*pte);
>> +
>> +    ret = dma_memory_read(&address_space_memory, addr,
>> +                          (uint8_t *)pte, sizeof(*pte));
> 
> I think last time round I asked that these be done with the
> "read a 64-bit quantity" APIs and a comment that they're
> supposed to be atomic.

I was unsure about the fetch of the page descriptors themselves. I added
the comment on the fetch of the STE and CD; section 3.21.3 deals with
configuration structures. I will add the comment here as well.
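Just to make the atomicity concern concrete, here is a standalone
illustration (plain C, not the QEMU dma API) of the two things get_pte
does — compute the descriptor address and assemble the 64-bit
little-endian value. In the real code the whole descriptor has to be
fetched with one 64-bit access so a concurrent guest update of the PTE
cannot be observed half-written:

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 64-bit little-endian PTE from raw bytes. The SMMU must
 * fetch the whole descriptor as a single 64-bit quantity so that a
 * concurrent guest update cannot be seen torn across two 32-bit reads. */
static uint64_t pte_from_le_bytes(const uint8_t *buf)
{
    uint64_t pte = 0;
    for (int i = 0; i < 8; i++) {
        pte |= (uint64_t)buf[i] << (8 * i);
    }
    return pte;
}

/* Byte address of descriptor @index in a table based at @baseaddr,
 * mirroring get_pte()'s addr computation. */
static uint64_t pte_addr(uint64_t baseaddr, uint32_t index)
{
    return baseaddr + index * sizeof(uint64_t);
}
```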

> 
>> +
>> +    if (ret != MEMTX_OK) {
>> +        info->type = SMMU_PTW_ERR_WALK_EABT;
>> +        info->addr = addr;
>> +        return -EINVAL;
>> +    }
>> +    trace_smmu_get_pte(baseaddr, index, addr, *pte);
>> +    return 0;
>> +}
>> +
>> +/* VMSAv8-64 Translation Table Format Descriptor Decoding */
>> +
>> +/**
>> + * get_page_pte_address - returns the L3 descriptor output address,
>> + * ie. the page frame
>> + * ARM ARM spec: Figure D4-17 VMSAv8-64 level 3 descriptor format
>> + */
>> +static inline hwaddr get_page_pte_address(uint64_t pte, int granule_sz)
>> +{
>> +    return PTE_ADDRESS(pte, granule_sz);
>> +}
>> +
>> +/**
>> + * get_table_pte_address - return table descriptor output address,
>> + * ie. address of next level table
>> + * ARM ARM Figure D4-16 VMSAv8-64 level0, level1, and level 2 descriptor formats
>> + */
>> +static inline hwaddr get_table_pte_address(uint64_t pte, int granule_sz)
>> +{
>> +    return PTE_ADDRESS(pte, granule_sz);
>> +}
>> +
>> +/**
>> + * get_block_pte_address - return block descriptor output address and block size
>> + * ARM ARM Figure D4-16 VMSAv8-64 level0, level1, and level 2 descriptor formats
>> + */
>> +static hwaddr get_block_pte_address(uint64_t pte, int level, int granule_sz,
>> +                                    uint64_t *bsz)
>> +{
>> +    int n = 0;
>> +
>> +    switch (granule_sz) {
>> +    case 12:
>> +        if (level == 1) {
>> +            n = 30;
>> +        } else if (level == 2) {
>> +            n = 21;
>> +        }
>> +        break;
>> +    case 14:
>> +        if (level == 2) {
>> +            n = 25;
>> +        }
>> +        break;
>> +    case 16:
>> +        if (level == 2) {
>> +            n = 29;
>> +        }
>> +        break;
>> +    }
>> +    if (!n) {
>> +        error_setg(&error_fatal,
>> +                   "wrong granule/level combination (%d/%d)",
>> +                   granule_sz, level);
> 
> If this is guest-provokable then it shouldn't be a fatal error.
> If it isn't guest provokable then you can just assert.
> I think you should be able to sanitize the SMUTransTableInfo when
> you construct it from the CD (and give a C_BAD_CD event), and then
> you can trust the granule_sz and level when you're doing table walks.
OK.
> 
> You can calculate n as
>     n = (granule_sz - 3) * (4 - level) + 3;
> (compare target/arm/helper.c:get_phys_addr_lpae() calculation
> of page_size, and in the pseudocode the line
>     addrselectbottom = (3-level)*stride + grainsize;
> where stride is grainsize-3 and so comes out to the same thing.)

OK. I will simplify it and check when decoding the CD.
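For what it's worth, the closed form can be checked against every value
in the switch above (a standalone sketch, using the same granule_sz and
level conventions as the patch):

```c
#include <assert.h>

/* Peter's closed form for the block-size shift n, replacing the
 * switch over granule_sz/level combinations: each level below the
 * current one resolves granule_sz - 3 IA bits, plus the 3 low bits
 * of the 8-byte descriptor. */
static int block_shift(int granule_sz, int level)
{
    return (granule_sz - 3) * (4 - level) + 3;
}
```

It reproduces n = 30 and 21 for the 4KB granule at levels 1 and 2, 25
for 16KB at level 2, and 29 for 64KB at level 2.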
> 
>> +    }
>> +    *bsz = 1 << n;
>> +    return PTE_ADDRESS(pte, n);
>> +}
>> +
>> +static inline bool check_perm(int access_attrs, int mem_attrs)
>> +{
>> +    if (((access_attrs & IOMMU_RO) && !(mem_attrs & IOMMU_RO)) ||
>> +        ((access_attrs & IOMMU_WO) && !(mem_attrs & IOMMU_WO))) {
>> +        return false;
>> +    }
>> +    return true;
>> +}
> 
> This function doesn't seem to ever be used in this patchset?
> (clang will complain about that, though gcc won't.)
OK, will remove that.
> 
>> +
>> +SMMUTransTableInfo *select_tt(SMMUTransCfg *cfg, dma_addr_t iova)
>> +{
>> +    if (!extract64(iova, 64 - cfg->tt[0].tsz, cfg->tt[0].tsz - cfg->tbi)) {
>> +        return &cfg->tt[0];
>> +    }
>> +    return &cfg->tt[1];
> 
> I'm confused by your handling of the TBI bits here. In
> decode_cd() you take the 2-bit TBI field into cfg->tbi. That's
> a pair of bits, one of which is "top byte ignore" for TTB0 and the
> other is "top byte ignore" for TTB1. But here you're subtracting
> the whole field from cfg->tt[0].tsz. Which of the two tbi bits
> you need to use depends on bit 55 of the iova (compare the code
> in get_phys_addr_lpae() and the pseudocode function AddrTop()),
> and then if that bit is 1 it means that 8 bits of address should
> be ignored when determining whether to use TTB0 or TTB1.
> 
> You also need to consider the case where the input address is in
> neither the TTB0 range nor the TTb1 range (cf fig D4-14 in the
> v8A Arm ARM DDI0487C.a, and the code in get_phys_addr_lpae()).

Yes I totally misunderstood the way to decode that :-(
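To spell out Peter's description, a hypothetical standalone sketch of
the selection (function and parameter names here are illustrative, not
the patch's API): IOVA bit 55 picks which TBI bit applies, TBI caps the
range check at bit 55 instead of 63, and an address in neither the TTB0
nor the TTB1 range is a fault:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits hi..lo (inclusive) of v. */
static uint64_t bits(uint64_t v, int hi, int lo)
{
    return (v >> lo) & ((1ULL << (hi - lo + 1)) - 1);
}

/* Illustrative TTB selection following AddrTop(): returns 0 for the
 * TTB0 range, 1 for the TTB1 range, -1 for neither (translation
 * fault). Assumes tsz <= 39 so the checked range never reaches
 * above bit 55 when TBI applies. */
static int select_ttb(uint64_t iova, int tsz0, int tsz1,
                      bool tbi0, bool tbi1)
{
    bool sel = (iova >> 55) & 1;           /* bit 55 picks the TBI bit */
    int top = (sel ? tbi1 : tbi0) ? 55 : 63;

    if (bits(iova, top, 64 - tsz0) == 0) {
        return 0;                           /* all zeroes: TTB0 */
    }
    if (bits(iova, top, 64 - tsz1) == bits(~0ULL, top, 64 - tsz1)) {
        return 1;                           /* all ones: TTB1 */
    }
    return -1;                              /* neither range */
}
```

With tsz = 25 (a 39-bit region), 0x00FFFF8000001234 selects TTB1 only
when TTB1's TBI bit is set, and faults otherwise.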

> 
>> +}
>> +
>> +/**
>> + * smmu_ptw_64 - VMSAv8-64 Walk of the page tables for a given IOVA
>> + * @cfg: translation config
>> + * @iova: iova to translate
>> + * @perm: access type
>> + * @tlbe: IOMMUTLBEntry (out)
>> + * @info: handle to an error info
>> + *
>> + * Return 0 on success, < 0 on error. In case of error, @info is filled
>> + * and tlbe->perm is set to IOMMU_NONE.
>> + * Upon success, @tlbe is filled with translated_addr and entry
>> + * permission rights.
>> + */
>> +static int smmu_ptw_64(SMMUTransCfg *cfg,
>> +                       dma_addr_t iova, IOMMUAccessFlags perm,
>> +                       IOMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
>> +{
>> +    dma_addr_t baseaddr;
>> +    int stage = cfg->stage;
>> +    SMMUTransTableInfo *tt = select_tt(cfg, iova);
>> +    uint8_t level;
>> +    uint8_t granule_sz;
>> +
>> +    if (tt->disabled) {
>> +        info->type = SMMU_PTW_ERR_TRANSLATION;
>> +        goto error;
>> +    }
>> +
>> +    level = tt->initial_level;
>> +    granule_sz = tt->granule_sz;
>> +    baseaddr = extract64(tt->ttb, 0, 48);
> 
> The spec says that bits specified by the TTB0/TTB1 fields
> in a CD that are outside the effective IPS range are ILLEGAL;
> you should detect that when you set up tt->ttb and then you
> don't need the extract64() here, I think.
OK
> 
>> +
>> +    tlbe->iova = iova;
>> +    tlbe->addr_mask = (1 << tt->granule_sz) - 1;
> 
> you could just use "granule_sz" here since you have it in a local.
sure
> 
>> +
>> +    while (level <= 3) {
>> +        uint64_t subpage_size = 1ULL << level_shift(level, granule_sz);
>> +        uint64_t mask = subpage_size - 1;
>> +        uint32_t offset = iova_level_offset(iova, level, granule_sz);
>> +        uint64_t pte;
>> +        dma_addr_t pte_addr = baseaddr + offset * sizeof(pte);
>> +        uint8_t ap;
>> +
>> +        if (get_pte(baseaddr, offset, &pte, info)) {
>> +                goto error;
>> +        }
>> +        trace_smmu_ptw_level(level, iova, subpage_size,
>> +                             baseaddr, offset, pte);
>> +
>> +        if (is_invalid_pte(pte) || is_reserved_pte(pte, level)) {
>> +            trace_smmu_ptw_invalid_pte(stage, level, baseaddr,
>> +                                       pte_addr, offset, pte);
>> +            info->type = SMMU_PTW_ERR_TRANSLATION;
>> +            goto error;
>> +        }
>> +
>> +        if (is_page_pte(pte, level)) {
>> +            uint64_t gpa = get_page_pte_address(pte, granule_sz);
>> +
>> +            ap = PTE_AP(pte);
>> +            if (is_permission_fault(ap, perm)) {
>> +                info->type = SMMU_PTW_ERR_PERMISSION;
>> +                goto error;
>> +            }
>> +
>> +            tlbe->translated_addr = gpa + (iova & mask);
>> +            tlbe->perm = PTE_AP_TO_PERM(ap);
>> +            trace_smmu_ptw_page_pte(stage, level, iova,
>> +                                    baseaddr, pte_addr, pte, gpa);
>> +            return 0;
>> +        }
>> +        if (is_block_pte(pte, level)) {
>> +            uint64_t block_size;
>> +            hwaddr gpa = get_block_pte_address(pte, level, granule_sz,
>> +                                               &block_size);
>> +
>> +            ap = PTE_AP(pte);
>> +            if (is_permission_fault(ap, perm)) {
>> +                info->type = SMMU_PTW_ERR_PERMISSION;
>> +                goto error;
>> +            }
>> +
>> +            trace_smmu_ptw_block_pte(stage, level, baseaddr,
>> +                                     pte_addr, pte, iova, gpa,
>> +                                    (int)(block_size >> 20));
> 
> I don't think you should need this cast, because the argument to the
> trace function is an int anyway ?
indeed
> 
>> +
>> +            tlbe->translated_addr = gpa + (iova & mask);
>> +            tlbe->perm = PTE_AP_TO_PERM(ap);
>> +            return 0;
>> +        }
>> +
>> +        /* table pte */
>> +        ap = PTE_APTABLE(pte);
>> +
>> +        if (is_permission_fault(ap, perm)) {
>> +            info->type = SMMU_PTW_ERR_PERMISSION;
>> +            goto error;
>> +        }
>> +        baseaddr = get_table_pte_address(pte, granule_sz);
>> +        level++;
>> +    }
>> +
>> +    info->type = SMMU_PTW_ERR_TRANSLATION;
>> +
>> +error:
>> +    tlbe->perm = IOMMU_NONE;
>> +    return -EINVAL;
>> +}
>> +
>> +/**
>> + * smmu_ptw - Walk the page tables for an IOVA, according to @cfg
>> + *
>> + * @cfg: translation configuration
>> + * @iova: iova to translate
>> + * @perm: tentative access type
>> + * @tlbe: returned entry
>> + * @info: ptw event handle
>> + *
>> + * return 0 on success
>> + */
>> +int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
>> +             IOMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
>> +{
>> +    if (!cfg->aa64) {
>> +        error_setg(&error_fatal,
>> +                   "SMMUv3 model does not support VMSAv8-32 page walk yet");
> 
> This sort of guest-provokable error should not be fatal -- log
> it with LOG_UNIMP and carry on as best you can (in this case
> the spec should say what happens for SMMUv3 implementations which
> don't support AArch32 tables).
This code should never be entered, as the check is done when decoding
the CD in the smmuv3 code. However, since this is a base class I wanted
to emphasize that the ptw only supports AArch64 tables for now.

> 
>> +    }
>> +
>> +    return smmu_ptw_64(cfg, iova, perm, tlbe, info);
>> +}
>>
>>  SMMUPciBus *smmu_find_as_from_bus_num(SMMUState *s, uint8_t bus_num)
>>  {
>> diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
>> new file mode 100644
>> index 0000000..3ed97ee
>> --- /dev/null
>> +++ b/hw/arm/smmu-internal.h
>> @@ -0,0 +1,96 @@
>> +/*
>> + * ARM SMMU support - Internal API
>> + *
>> + * Copyright (c) 2017 Red Hat, Inc.
>> + * Copyright (C) 2014-2016 Broadcom Corporation
>> + * Written by Prem Mallappa, Eric Auger
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HW_ARM_SMMU_INTERNAL_H
>> +#define HW_ARM_SMMU_INTERNAL_H
>> +
>> +#define ARM_LPAE_MAX_ADDR_BITS          48
> 
> You define this, but nothing in the series uses it.
removed. Actually we don't seem to need that anymore.
> 
> Also, from an SMMU perspective, it would be helpful to clarify
> exactly what addresses are involved here (input addresses,
> intermediate addresses, output addresses?) -- cf the spec section 3.4.
> 
>> +#define ARM_LPAE_MAX_LEVELS             4
> 
> This doesn't seem to be used either.
removed
> 
>> +
>> +/* PTE Manipulation */
>> +
>> +#define ARM_LPAE_PTE_TYPE_SHIFT         0
>> +#define ARM_LPAE_PTE_TYPE_MASK          0x3
>> +
>> +#define ARM_LPAE_PTE_TYPE_BLOCK         1
>> +#define ARM_LPAE_PTE_TYPE_TABLE         3
>> +
>> +#define ARM_LPAE_L3_PTE_TYPE_RESERVED   1
>> +#define ARM_LPAE_L3_PTE_TYPE_PAGE       3
>> +
>> +#define ARM_LPAE_PTE_VALID              (1 << 0)
>> +
>> +#define PTE_ADDRESS(pte, shift) \
>> +    (extract64(pte, shift, 47 - shift + 1) << shift)
>> +
>> +#define is_invalid_pte(pte) (!(pte & ARM_LPAE_PTE_VALID))
>> +
>> +#define is_reserved_pte(pte, level)                                      \
>> +    ((level == 3) &&                                                     \
>> +     ((pte & ARM_LPAE_PTE_TYPE_MASK) == ARM_LPAE_L3_PTE_TYPE_RESERVED))
>> +
>> +#define is_block_pte(pte, level)                                         \
>> +    ((level < 3) &&                                                      \
>> +     ((pte & ARM_LPAE_PTE_TYPE_MASK) == ARM_LPAE_PTE_TYPE_BLOCK))
>> +
>> +#define is_table_pte(pte, level)                                        \
>> +    ((level < 3) &&                                                     \
>> +     ((pte & ARM_LPAE_PTE_TYPE_MASK) == ARM_LPAE_PTE_TYPE_TABLE))
>> +
>> +#define is_page_pte(pte, level)                                         \
>> +    ((level == 3) &&                                                    \
>> +     ((pte & ARM_LPAE_PTE_TYPE_MASK) == ARM_LPAE_L3_PTE_TYPE_PAGE))
>> +
>> +#define PTE_AP(pte) \
>> +    (extract64(pte, 6, 2))
>> +
>> +#define PTE_APTABLE(pte) \
>> +    (extract64(pte, 61, 2))
>> +
>> +#define is_permission_fault(ap, perm) \
>> +    (((perm) & IOMMU_WO) && ((ap) & 0x2))
> 
> Don't we also need to check AP bit 1 in some cases?
> (when the StreamWorld is S or NS EL1 and either (a) the incoming
> transaction has its attrs.user = 1 and STE.PRIVCFG is 0b0x, or
> (b) STE.PRIVCFG is 0b10).
I think I don't need to, as I don't support this feature at the moment.
The spec says:
"When SMMU_IDR1.ATTR_PERMS_OVR=0, this field is RES0 and the incoming
PRIV attribute is used."
But to be honest I was not aware this existed :-(
> 
>> +
>> +#define PTE_AP_TO_PERM(ap) \
>> +    (IOMMU_ACCESS_FLAG(true, !((ap) & 0x2)))
> 
> Similarly here.
?
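For the record, a sketch of what the extended check might look like if
the PRIV attribute were honoured (names and parameters here are
illustrative, not the patch's API): with PTE_AP extracting descriptor
bits 7:6, ap & 0x2 is AP[2] (read-only when set) and ap & 0x1 is AP[1]
(unprivileged access allowed when set):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stage-1 AP[2:1] check. AP[2] set forbids writes;
 * AP[1] clear forbids unprivileged (user) accesses when the incoming
 * PRIV attribute is taken into account. */
static bool perm_fault(int ap, bool want_write, bool user_access)
{
    if (want_write && (ap & 0x2)) {
        return true;        /* write to a read-only mapping */
    }
    if (user_access && !(ap & 0x1)) {
        return true;        /* user access to a privileged-only mapping */
    }
    return false;
}
```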
> 
>> +
>> +/* Level Indexing */
>> +
>> +static inline int level_shift(int level, int granule_sz)
>> +{
>> +    return granule_sz + (3 - level) * (granule_sz - 3);
>> +}
>> +
>> +static inline uint64_t level_page_mask(int level, int granule_sz)
>> +{
>> +    return ~((1ULL << level_shift(level, granule_sz)) - 1);
>> +}
>> +
>> +/**
>> + * TODO: handle the case where the level resolves less than
>> + * granule_sz -3 IA bits.
>> + */
>> +static inline
>> +uint64_t iova_level_offset(uint64_t iova, int level, int granule_sz)
>> +{
>> +    return (iova >> level_shift(level, granule_sz)) &
>> +            ((1ULL << (granule_sz - 3)) - 1);
> 
> Maybe "... & MAKE_64BIT_MASK(0, granule_sz - 3)" ?
OK
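Concretely, for the 4KB granule (granule_sz = 12) these helpers resolve
9 IA bits per level (a standalone copy of the two functions above, for
illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Shift of the IA bits resolved at @level: 12/21/30/39 for levels
 * 3/2/1/0 with the 4KB granule. */
static int level_shift(int level, int granule_sz)
{
    return granule_sz + (3 - level) * (granule_sz - 3);
}

/* Index into the level's table: the granule_sz - 3 IA bits above the
 * level's shift (MAKE_64BIT_MASK(0, granule_sz - 3) in QEMU terms). */
static uint64_t iova_level_offset(uint64_t iova, int level, int granule_sz)
{
    return (iova >> level_shift(level, granule_sz)) &
           ((1ULL << (granule_sz - 3)) - 1);
}
```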
> 
>> +}
>> +
>> +#endif
>> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
>> index 193063e..3584974 100644
>> --- a/hw/arm/trace-events
>> +++ b/hw/arm/trace-events
>> @@ -2,3 +2,13 @@
>>
>>  # hw/arm/virt-acpi-build.c
>>  virt_acpi_setup(void) "No fw cfg or ACPI disabled. Bailing out."
>> +
>> +# hw/arm/smmu-common.c
>> +
>> +smmu_page_walk(int stage, uint64_t baseaddr, int first_level, uint64_t start, uint64_t end) "stage=%d, baseaddr=0x%"PRIx64", first level=%d, start=0x%"PRIx64", end=0x%"PRIx64
>> +smmu_lookup_table(int level, uint64_t baseaddr, int granule_sz, uint64_t start, uint64_t end, int flags, uint64_t subpage_size) "level=%d baseaddr=0x%"PRIx64" granule=%d, start=0x%"PRIx64" end=0x%"PRIx64" flags=%d subpage_size=0x%lx"
>> +smmu_ptw_level(int level, uint64_t iova, size_t subpage_size, uint64_t baseaddr, uint32_t offset, uint64_t pte) "level=%d iova=0x%lx subpage_sz=0x%lx baseaddr=0x%"PRIx64" offset=%d => pte=0x%lx"
>> +smmu_ptw_invalid_pte(int stage, int level, uint64_t baseaddr, uint64_t pteaddr, uint32_t offset, uint64_t pte) "stage=%d level=%d baseaddr=0x%"PRIx64" pteaddr=0x%"PRIx64" offset=%d pte=0x%"PRIx64
>> +smmu_ptw_page_pte(int stage, int level,  uint64_t iova, uint64_t baseaddr, uint64_t pteaddr, uint64_t pte, uint64_t address) "stage=%d level=%d iova=0x%"PRIx64" baseaddr=0x%"PRIx64" pteaddr=0x%"PRIx64" pte=0x%"PRIx64" page address = 0x%"PRIx64
>> +smmu_ptw_block_pte(int stage, int level, uint64_t baseaddr, uint64_t pteaddr, uint64_t pte, uint64_t iova, uint64_t gpa, int bsize_mb) "stage=%d level=%d baseaddr=0x%"PRIx64" pteaddr=0x%"PRIx64" pte=0x%"PRIx64" iova=0x%"PRIx64" block address = 0x%"PRIx64" block size = %d MiB"
>> +smmu_get_pte(uint64_t baseaddr, int index, uint64_t pteaddr, uint64_t pte) "baseaddr=0x%"PRIx64" index=0x%x, pteaddr=0x%"PRIx64", pte=0x%"PRIx64
> 
> These have some uses of "%lx" for uint64_t, which should use PRIx64
> to avoid compile issues on 32 bit systems.
OK

Thanks

Eric
> 
>> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
>> index aee96c2..0fb27f7 100644
>> --- a/include/hw/arm/smmu-common.h
>> +++ b/include/hw/arm/smmu-common.h
>> @@ -127,4 +127,10 @@ static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
>>  {
>>      return  ((pci_bus_num(sdev->bus) & 0xff) << 8) | sdev->devfn;
>>  }
>> +
>> +int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
>> +             IOMMUTLBEntry *tlbe, SMMUPTWEventInfo *info);
>> +
>> +SMMUTransTableInfo *select_tt(SMMUTransCfg *cfg, dma_addr_t iova);
>> +
>>  #endif  /* HW_ARM_SMMU_COMMON */
>> --
>> 2.5.5
> 
> thanks
> -- PMM
> 


