
Re: [Sks-devel] SKS Performance oddity


From: Todd Fleisher
Subject: Re: [Sks-devel] SKS Performance oddity
Date: Fri, 8 Mar 2019 22:52:31 -1000

I've been having similar issues this week, though the main problem is high IO
load/wait. It's also not my primary nodes that recon with the outside world,
but some of my secondary nodes that only peer internally. I've been restoring
them by replacing the DB & PTree files/dirs from another node, and that seems
to do the trick for a while, but I've already done it twice in the last few
days, so it's not really a sustainable approach. I just haven't had time to
dig deeper to determine why it is happening and/or how to better protect
against it.
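
For spotting when a node stops answering (the stats page timing out is what
gets a node dropped from the pool, per the quoted message below), a minimal
Python sketch along these lines might help. It is only an illustration: the
hostname is a placeholder, the 5 second timeout is an arbitrary choice, and
the URL assumes the usual HKP port 11371 and the op=stats lookup path, so
adjust both to your setup.

```python
import time
import urllib.request

# Assumptions: placeholder hostname; stats page on the usual HKP port/path.
STATS_URL = "http://sks01.example.org:11371/pks/lookup?op=stats"


def probe_stats(url, timeout=5.0):
    """Fetch url once and return (responded_ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # pull the whole page so slow bodies count too
            ok = 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are all
        # OSError subclasses; treat any of them as "did not respond".
        ok = False
    return ok, time.monotonic() - start
```

Calling probe_stats(STATS_URL) from cron every minute and logging the result
alongside iowait should at least show whether the stalls line up with recon.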

Sent from the Fleishphone

> On Mar 8, 2019, at 19:22, Jeremy T. Bouse <address@hidden> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
>    I don't know what is going on here with my cluster but I have 3 of 4
> nodes that absolutely perform as I would expect... They have 2 vCPU
> with 4GB RAM each along with an extra 50GB drive exclusively for SKS
> use under /var/lib/sks. The three behaving fine are my sks02, sks03
> and sks04 secondary nodes. My primary node on the other hand is
> another story. First I tried increasing it from 2 vCPU/4GB RAM like
> the others to 2 vCPU/8GB RAM and then 4 vCPU/8GB RAM without it making
> any change. I then built out a new physical server with a quad-core
> Xeon 2.4GHz processor and 4GB RAM and a dedicated 3TB RAID5 array and
> I'm seeing the same problem. SKS is constantly pegging the CPU at 100%
> and eating up nearly all the memory whether it's running on a virtual
> or physical server. Recon service is working and I'm ingesting keys
> from peers and peering with my internal cluster nodes but every time it
> goes into recon mode the node starts failing to respond as the CPU and
> RAM spike which then leads to the node being dropped from the pool as
> the stats page can't be hit before it times out.
> 
>    I've been fighting with this for several days now... Anyone else
> out there seeing this behavior or if not and have similar resourced
> servers care to share details to see if I'm missing something here.
> 
>    The particulars are that all nodes are Debian 9.8 (Stretch) 64-bit.
> Only the primary node handles running NGINX configured for load
> balancing the cluster. The only other daemons running across all nodes
> besides SKS are OpenSSH for remote access, SSSD for centralized
> authentication, Haveged for entropy and Postfix configured for
> smarthost relaying.
> -----BEGIN PGP SIGNATURE-----
> 
> iQGzBAEBCgAdFiEEakJ0F+CHS9VzhSFg6lYpTv4TPXUFAlyDTX0ACgkQ6lYpTv4T
> PXUB0Qv/fRbDkGPes3eq3xDkv6MQHfVFLXuUNdjOtrgpvCwkiS8b340dDKmI5a+x
> NufUzvSHX4GjOc3Joxivc/N1rA7ENrwEX+2T/cwrE8iu+himuvAJkQtXp2qo2Dye
> 9CgzGKR/J0BO50tdmNCJLp6xuR4eY4ISBo0FeeGplipmZIv5BSqKcTcYWaFCNddr
> FLqk6gKT1yzVHb8aO4KzIyB9CqcJEBbTL/RTaJWslCewYcmikw6NBOc1dV/BoxBA
> uGXK3o48o3mo7LJj+sH8/U6F0Ffqnn/tbwIIe/dZSnyonTyP1ENAN2zBWgdzyiRK
> qp1/TDoFC6FuujJgJNKOSsPMNw9bVd5gXYUIIDIE9YK7SeCEP2us4TWS4LQJmuB9
> 7aidQ0rseyN9cSKrswUyWq7k3pM8iLnzx7D8BwW2uvO2SjKo+ALce5UtjyOhgg9v
> ECnxoKjeUTujle/0ZRyi5AbC3AfKi/CoREIJ98w+tAh7jdM5w34vYH8plekRGbFp
> 4bNo9Fyl
> =EdIY
> -----END PGP SIGNATURE-----
> 
> _______________________________________________
> Sks-devel mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/sks-devel
> 



