From: Maxim Cournoyer
Subject: branch master updated: doc: Add a Problems/solutions knowledge base section.
Date: Thu, 10 Nov 2022 21:09:27 -0500

This is an automated email from the git hooks/post-receive script.

apteryx pushed a commit to branch master
in repository maintenance.

The following commit(s) were added to refs/heads/master by this push:
     new eee43c5  doc: Add a Problems/solutions knowledge base section.
eee43c5 is described below

commit eee43c569c1a87ee3cf9991c649cec1d2522c04f
Author: Maxim Cournoyer <maxim.cournoyer@gmail.com>
AuthorDate: Thu Nov 10 21:01:23 2022 -0500

    doc: Add a Problems/solutions knowledge base section.
    
    * doc/infra-handbook.org (Specifications): Mention the QLogic
    adapters.
    (Btrfs compression and mount options): Use 'compress' instead of
    'compress-force', as the latter can cause too many file extents, which
    in turn translate into a slow mount for a very large file system.
    (Problems/solutions knowledge base): New section.
---
 doc/infra-handbook.org | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/doc/infra-handbook.org b/doc/infra-handbook.org
index 956173d..0a67037 100644
--- a/doc/infra-handbook.org
+++ b/doc/infra-handbook.org
@@ -22,6 +22,7 @@ Dell PowerEdge R7425 server with the following specifications:
 
 - 2x AMD EPYC 7451 24-Core processors
 - Storage Area Network (SAN) of 100 TiB
+- SAN connected to two QLogic QLE2692 16G Fibre Channel adapters (qla2xxx)
 - 188 GiB of memory
 
 The machine can be remotely administered via iDRAC, the Dell server
@@ -162,7 +163,7 @@ file:../hydra/deploy-node-129.scm for a build machine when high
 availability is preferred over data safety (degraded):
 
 #+begin_src scheme
-(define %common-btrfs-options '(("compress-force" . "zstd")
+(define %common-btrfs-options '(("compress" . "zstd")
                                 ("space_cache" . "v2")
                                 "degraded"))
 #+end_src
@@ -191,3 +192,48 @@ file:../hydra/deploy-node-129.scm machine configuration:
                   "balance" "start" "-dusage=5" "/"))
          "btrfs-balance"))
 #+end_src
+
+* Problems/solutions knowledge base
+** The boot fails with a kernel panic on qla2xxx-related errors
+Here's an example:
+#+begin_example
+[   51.266790] Call Trace:
+[   51.266792]  <TASK>
+[   51.266794]  _raw_spin_lock_irqsave+0x46/0x60
+[   51.266799]  qla2xxx_dif_start_scsi_mq+0x2b7/0xe60 [qla2xxx 124f4fec4ef588623af420625c6af8b5bcce53fd]
+[   51.266823]  qla2xxx_mqueuecommand+0x222/0x2d0 [qla2xxx 124f4fec4ef588623af420625c6af8b5bcce53fd]
+[   51.266838]  qla2xxx_queuecommand+0x1a1/0x3d0 [qla2xxx 124f4fec4ef588623af420625c6af8b5bcce53fd]
+[   51.266852]  scsi_queue_rq+0x390/0xc00
+[   51.266857]  __blk_mq_try_issue_directly+0x176/0x1e0
+[   51.266861]  blk_mq_plug_issue_direct.constprop.0+0x93/0x180
+[   51.266865]  blk_mq_flush_plug_list+0x23d/0x2a0
+[   51.266868]  __blk_flush_plug+0xed/0x130
+[   51.266872]  blk_finish_plug+0x31/0x50
+[   51.266874]  read_pages+0x1f5/0x300
+[   51.266879]  page_cache_ra_unbounded+0x131/0x180
+[   51.266882]  force_page_cache_ra+0xc7/0x100
+[   51.266885]  page_cache_sync_ra+0x34/0x90
+[   51.266887]  filemap_get_pages+0x127/0x700
+[   51.266893]  filemap_read+0xde/0x420
+[   51.266898]  blkdev_read_iter+0xbd/0x1e0
+[   51.266901]  new_sync_read+0x13e/0x1c0
+[   51.266905]  vfs_read+0x151/0x1a0
+[   51.266908]  ksys_read+0x73/0xf0
+[   51.266911]  __x64_sys_read+0x1e/0x30
+[   51.266913]  do_syscall_64+0x60/0xc0
+[   51.266919]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
+[   51.266922] RIP: 0033:0x4e73de
+[   51.266924] Code: 0f 1f 40 00 48 c7 c2 bc ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
+[   51.266926] RSP: 002b:00007ffc403f39e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
+[   51.266928] RAX: ffffffffffffffda RBX: 0000000001a98738 RCX: 00000000004e73de
+[   51.266929] RDX: 0000000000000100 RSI: 0000000001a98748 RDI: 0000000000000006
+[   51.266930] RBP: 0000000001a51bc0 R08: 0000000001a98720 R09: 0000000001a3ef10
+[   51.266932] R10: 0000000000000007 R11: 0000000000000246 R12: 000009ffffffe000
+[   51.266933] R13: 0000000000000100 R14: 0000000001a98720 R15: 0000000001a51c10
+[   51.266936]  </TASK>
+[   54.246148] NMI watchdog: Watchdog detected hard LOCKUP on cpu 64
+#+end_example
+Solution: Stop the server, update the firmware of the QLogic cards,
+then start the server.  The exact cause of the failure is unknown, but
+it is possible that the firmware of the QLogic cards becomes
+incompatible with that of the SAN, which is always kept up to date.


