Re: [Sks-devel] SKS Performance oddity
From: Michiel van Baak
Subject: Re: [Sks-devel] SKS Performance oddity
Date: Sat, 9 Mar 2019 11:29:14 +0100
User-agent: NeoMutt/20180716
On Sat, Mar 09, 2019 at 12:22:14AM -0500, Jeremy T. Bouse wrote:
> I don't know what is going on here with my cluster, but I have 3 of 4
> nodes that absolutely perform as I would expect... They have 2 vCPU
> with 4GB RAM each, along with an extra 50GB drive exclusively for SKS
> use under /var/lib/sks. The three behaving fine are my sks02, sks03
> and sks04 secondary nodes. My primary node, on the other hand, is
> another story. First I tried increasing it from 2 vCPU/4GB RAM like
> the others to 2 vCPU/8GB RAM, and then 4 vCPU/8GB RAM, without any
> change. I then built out a new physical server with a quad-core
> Xeon 2.4GHz processor, 4GB RAM and a dedicated 3TB RAID5 array, and
> I'm seeing the same problem. SKS is constantly pegging the CPU at 100%
> and eating up nearly all the memory, whether it's running on a virtual
> or physical server. The recon service is working and I'm ingesting keys
> from peers and peering with my internal cluster nodes, but every time it
> goes into recon mode the node starts failing to respond as the CPU and
> RAM spike, which then leads to the node being dropped from the pool
> because the stats page can't be hit before it times out.
>
> I've been fighting with this for several days now... Is anyone else
> out there seeing this behavior? If not, would those with similarly
> resourced servers care to share details, so I can see if I'm missing
> something here?
>
> The particulars: all nodes are Debian 9.8 (Stretch) 64-bit. Only the
> primary node runs NGINX, configured for load balancing the cluster.
> The only other daemons running across all nodes besides SKS are
> OpenSSH for remote access, SSSD for centralized authentication,
> Haveged for entropy, and Postfix configured for smarthost relaying.
Hey,
I have exactly the same problem.
Several times in the last month I have done the following steps:
- Stop all nodes
- Destroy the datasets (both db and ptree)
- Load in a new dump from max 2 days old
- Create the ptree database
- Start sks on the primary node, without peering configured (comment out
all peers)
- Give it some time to start
- Check the stats page and run a couple of searches
# Up until here everything works fine #
- Add the outside peers on the primary node and restart it
- After 5 minutes the machine takes 100% CPU, is stuck in I/O most of
the time and falls off the grid
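For reference, the rebuild steps above can be sketched as a small script. This is a minimal sketch, not our exact procedure: the service name, dump location, directory names (KDB/PTree) and ownership are assumptions for a typical systemd-managed install, and the build/pbuild flags are the ones commonly quoted in the SKS README, so tune them for your hardware.

```shell
#!/bin/sh
# Sketch of the rebuild procedure, assuming: a systemd unit named "sks",
# databases under /var/lib/sks, and a fresh keydump (<= 2 days old)
# unpacked into /var/lib/sks/dump. Adjust paths/flags to your setup.
set -e

systemctl stop sks                 # stop the node before touching the DBs

cd /var/lib/sks
rm -rf KDB PTree                   # destroy both the key DB and the ptree

# Load the dump and build the key database; flags as commonly quoted
# in the SKS README -- sizing may need adjusting for your machine.
sks build dump/*.pgp -n 10 -cache 100

# Build the prefix-tree (recon) database from the fresh key DB.
sks pbuild -cache 20 -ptree_cache 70

chown -R debian-sks: /var/lib/sks  # ownership varies per distro packaging

# Start with all peers commented out of the membership file first,
# verify the stats page and a few searches, then re-add peers.
systemctl start sks
```

Re-adding the external peers to the membership file and restarting is exactly the point at which both of us see the CPU and I/O spike.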
It doesn't matter whether I enable peering with the internal nodes or not.
Just having one SKS instance running and peering it with the network is
enough to render the instance basically unusable.
Like you, I tried in a VM first, and also on a physical machine (dual
6-core Xeon E5-2620 0 @ 2.00GHz with 96GB RAM and 2 Samsung EVO 840 Pro
SSDs for storage).
I see exactly the same every time I follow the steps outlined above.
The systems I tried run Debian Linux and FreeBSD, with the same result on both.
--
Michiel van Baak
address@hidden
GPG key: http://pgp.mit.edu/pks/lookup?op=get&search=0x6FFC75A2679ED069
NB: I have a new GPG key. Old one revoked and revoked key updated on keyservers.