qemu-devel

Re: [PATCH v5 8/8] hw/mem/cxl_type3: Add CXL RAS Error Injection Support


From: Philippe Mathieu-Daudé
Subject: Re: [PATCH v5 8/8] hw/mem/cxl_type3: Add CXL RAS Error Injection Support
Date: Tue, 21 Feb 2023 23:15:49 +0100

Hi Jonathan,

On 21/2/23 16:21, Jonathan Cameron wrote:
CXL uses PCI AER Internal errors to signal to the host that an error has
occurred. The host can then read more detailed status from the CXL RAS
capability.

For uncorrectable errors: support multiple injection in one operation
as this is needed to reliably test multiple header logging support in an
OS. The equivalent feature doesn't exist for correctable errors, so only
one error need be injected at a time.

Note:
  - Header content needs to be manually specified in a fashion that
    matches the specification for what can be in the header for each
    error type.

Injection via QMP:
{ "execute": "qmp_capabilities" }
...
{ "execute": "cxl-inject-uncorrectable-errors",
   "arguments": {
     "path": "/machine/peripheral/cxl-pmem0",
     "errors": [
         {
             "type": "cache-address-parity",
             "header": [ 3, 4]
         },
         {
             "type": "cache-data-parity",
             "header": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
                        16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
         },
         {
             "type": "internal",
             "header": [ 1, 2, 4]
         }
         ]
   }}
...
{ "execute": "cxl-inject-correctable-error",
     "arguments": {
         "path": "/machine/peripheral/cxl-pmem0",
         "type": "physical"
     } }

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
v5: (Thanks to Dave Jiang for review)
- Spell out Implementation Defined (previously typo as Imdef which did
   not help)
v4:
- Improved QMP help text with more detail (following request in review
   of the Poison injection series)
---
  hw/cxl/cxl-component-utils.c   |   4 +-
  hw/mem/cxl_type3.c             | 281 +++++++++++++++++++++++++++++++++
  hw/mem/cxl_type3_stubs.c       |  10 ++
  hw/mem/meson.build             |   2 +
  include/hw/cxl/cxl_component.h |  26 +++
  include/hw/cxl/cxl_device.h    |  11 ++
  qapi/cxl.json                  | 118 ++++++++++++++
  qapi/meson.build               |   1 +
  qapi/qapi-schema.json          |   1 +
  9 files changed, 453 insertions(+), 1 deletion(-)


diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
new file mode 100644
index 0000000000..b6b51ced54
--- /dev/null
+++ b/hw/mem/cxl_type3_stubs.c
@@ -0,0 +1,10 @@
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-commands-cxl.h"
+
+void qmp_cxl_inject_uncorrectable_errors(const char *path,
+                                         CXLUncorErrorRecordList *errors,
+                                         Error **errp) {

What about:

    error_setg(errp, "CXL support is not compiled in");

}
+
+void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
+                                      Error **errp) {}

Ditto.
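
To make the suggestion concrete, here is a minimal standalone sketch of the stub pattern. So that it builds outside the QEMU tree, `Error`, `error_setg()` and the two QAPI types below are simplified stand-ins (assumptions for illustration, not QEMU's real definitions). In the tree, the stubs would include "qapi/error.h" and call the real error_setg() instead.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds outside QEMU; NOT the real definitions. */
typedef struct Error { char msg[128]; } Error;
typedef struct CXLUncorErrorRecordList CXLUncorErrorRecordList; /* opaque */
typedef int CxlCorErrorType;

/*
 * Simplified stand-in for QEMU's error_setg(): allocate an Error and
 * format the message into it, unless the caller passed errp == NULL.
 */
static void error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return; /* caller is not interested in the error */
    }
    assert(*errp == NULL); /* an error must not already be pending */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Stubs: report that the feature is absent instead of silently succeeding. */
void qmp_cxl_inject_uncorrectable_errors(const char *path,
                                         CXLUncorErrorRecordList *errors,
                                         Error **errp)
{
    error_setg(errp, "CXL support is not compiled in");
}

void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
                                      Error **errp)
{
    error_setg(errp, "CXL support is not compiled in");
}
```

With empty stub bodies, a QMP client on a build without the CXL type 3 device would see the command succeed even though nothing was injected. Setting an error makes that case visible to the caller.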


diff --git a/qapi/cxl.json b/qapi/cxl.json
new file mode 100644
index 0000000000..ac7e167fa2
--- /dev/null
+++ b/qapi/cxl.json
@@ -0,0 +1,118 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# = CXL devices
+##
+
+##
+# @CxlUncorErrorType:

Likely missing a "(since 8.0)" mention somewhere.

+#
+# Type of uncorrectable CXL error to inject. These errors are reported via
+# an AER uncorrectable internal error with additional information logged at
+# the CXL device.
+#
+# @cache-data-parity: Data error such as data parity or data ECC error on
+#                     CXL.cache
+# @cache-address-parity: Address parity or other errors associated with the
+#                        address field on CXL.cache
+# @cache-be-parity: Byte enable parity or other byte enable errors on CXL.cache
+# @cache-data-ecc: ECC error on CXL.cache
+# @mem-data-parity: Data error such as data parity or data ECC error on CXL.mem
+# @mem-address-parity: Address parity or other errors associated with the
+#                      address field on CXL.mem
+# @mem-be-parity: Byte enable parity or other byte enable errors on CXL.mem.
+# @mem-data-ecc: Data ECC error on CXL.mem.
+# @reinit-threshold: REINIT threshold hit.
+# @rsvd-encoding: Received unrecognized encoding.
+# @poison-received: Received poison from the peer.
+# @receiver-overflow: Buffer overflows (first 3 bits of header log indicate which)
+# @internal: Component specific error
+# @cxl-ide-tx: Integrity and data encryption tx error.
+# @cxl-ide-rx: Integrity and data encryption rx error.
+##
+
+{ 'enum': 'CxlUncorErrorType',

Don't these need

     'if': 'CONFIG_CXL_MEM_DEVICE',

?

+  'data': ['cache-data-parity',
+           'cache-address-parity',
+           'cache-be-parity',
+           'cache-data-ecc',
+           'mem-data-parity',
+           'mem-address-parity',
+           'mem-be-parity',
+           'mem-data-ecc',
+           'reinit-threshold',
+           'rsvd-encoding',
+           'poison-received',
+           'receiver-overflow',
+           'internal',
+           'cxl-ide-tx',
+           'cxl-ide-rx'
+           ]
+ }
+
+##
+# @CXLUncorErrorRecord:
+#
+# Record of a single error including header log.
+#
+# @type: Type of error
+# @header: 16 DWORD of header.
+##
+{ 'struct': 'CXLUncorErrorRecord',
+  'data': {
+      'type': 'CxlUncorErrorType',
+      'header': [ 'uint32' ]
+  }
+}
+
+##
+# @cxl-inject-uncorrectable-errors:
+#
+# Command to allow injection of multiple errors in one go. This allows testing
+# of multiple header log handling in the OS.
+#
+# @path: CXL Type 3 device canonical QOM path
+# @errors: Errors to inject
+##
+{ 'command': 'cxl-inject-uncorrectable-errors',
+  'data': { 'path': 'str',
+             'errors': [ 'CXLUncorErrorRecord' ] }}
+
+##
+# @CxlCorErrorType:
+#
+# Type of CXL correctable error to inject
+#
+# @cache-data-ecc: Data ECC error on CXL.cache
+# @mem-data-ecc: Data ECC error on CXL.mem
+# @crc-threshold: Component specific and applicable to 68 byte Flit mode only.
+# @cache-poison-received: Received poison from a peer on CXL.cache.
+# @mem-poison-received: Received poison from a peer on CXL.mem
+# @physical: Received error indication from the physical layer.
+##
+{ 'enum': 'CxlCorErrorType',
+  'data': ['cache-data-ecc',
+           'mem-data-ecc',
+           'crc-threshold',
+           'retry-threshold',
+           'cache-poison-received',
+           'mem-poison-received',
+           'physical']
+}
+
+##
+# @cxl-inject-correctable-error:
+#
+# Command to inject a single correctable error.  Multiple error injection
+# of this error type is not interesting as there is no associated header log.
+# These errors are reported via AER as a correctable internal error, with
+# additional detail available from the CXL device.
+#
+# @path: CXL Type 3 device canonical QOM path
+# @type: Type of error.
+##
+{ 'command': 'cxl-inject-correctable-error',
+  'data': { 'path': 'str',
+            'type': 'CxlCorErrorType'
+  }
+}


