
[Taler] taler-exchange-aggregator scalability update

From: Christian Grothoff
Subject: [Taler] taler-exchange-aggregator scalability update
Date: Sat, 4 Sep 2021 20:17:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0

Dear all,

I just did a first set of benchmarks on the taler-exchange-aggregator
sharding logic that was implemented over the last week.

Executive summary

taler-exchange-aggregator now supports sharding. Single-system
performance was measured at around 33k transactions/second. At that
point the system is no longer CPU bound, so the database is most likely
IO bound.

Experiments and results

Using taler-aggregator-benchmark, I populated a database with 1 M
deposits to 1000 different merchants (1000 deposits/merchant).
The baseline for this system, i.e. running the aggregator logic on it
without sharding, takes 253s. That is roughly 4000 deposits/second, with
90% of the CPU load in Postgres.
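In other words (a quick sanity check on the arithmetic, not part of the
benchmark tooling):

```python
# Baseline numbers reported above: 1 M deposits aggregated in 253 s
# without sharding.
deposits = 1_000_000
seconds = 253
tps = deposits / seconds
print(round(tps))  # 3953, i.e. roughly 4000 deposits/second
```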

For a first sharding test, I configured the system to use 8 shards, but
only ran a single worker. Performance did not change, so the sharding
itself is virtually free.  Then I ran 2 workers on the 8 shards: now it
took 134s (real time), or 7462 TPS, almost doubling throughput. Using 8
workers on the 8 shards (the maximum), it took 40s, for 25000 TPS --
close to the quadrupling over the 2-worker run that perfect linear
scalability would predict for four times the workers.
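The sharding idea can be sketched roughly as follows. This is a
hypothetical illustration, not the actual taler-exchange-aggregator
code (the function names and the hash-based assignment are mine):
deposits are partitioned into shards by hashing the merchant's account,
and each worker claims a disjoint set of shards, so workers never
contend for the same merchant's deposits.

```python
from hashlib import sha256

NUM_SHARDS = 8  # matches the 8-shard configuration above


def shard_of(merchant_account: str) -> int:
    """Map a merchant deterministically to one of NUM_SHARDS shards."""
    digest = sha256(merchant_account.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def shards_for_worker(worker_index: int, num_workers: int) -> list:
    """Disjoint shard set for one worker; with 8 workers each gets one shard."""
    return [s for s in range(NUM_SHARDS) if s % num_workers == worker_index]


# Two workers split the 8 shards 4/4, so they can aggregate in parallel
# without ever touching the same merchant's deposits.
print(shards_for_worker(0, 2))  # [0, 2, 4, 6]
print(shards_for_worker(1, 2))  # [1, 3, 5, 7]
```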

Now, the 1 M records used for this test basically fit into memory, so
next I wanted to see what happens with 25 M records (the *disk* backing
the database only has 128 GB of storage, and we seem to store
2-3 kB/transaction including indices and overheads, so it doesn't take
much more than that).  Again I pre-initialized a database, this time
with 25 M deposits, 5000 each for 5000 merchants, and configured 32
shards (empty shards cause trouble with my timing/benchmark logic, but
with this many transaction records we can safely use more shards).
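The back-of-the-envelope storage estimate behind picking 25 M records,
assuming the 2-3 kB/transaction figure observed above:

```python
records = 25_000_000
bytes_per_record = 2500  # midpoint of the observed 2-3 kB/transaction
disk_gb = 128
used_gb = records * bytes_per_record / 1e9
print(f"{used_gb} GB of {disk_gb} GB")  # about 62 GB -- roughly half the disk
```

With the 3 kB upper bound it is still only about 75 GB, leaving some
headroom for WAL and temporary files.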

 8 workers: 1175s = 21k trans/sec @ 90% CPU load on Postgres (0% idle!)
16 workers:  792s = 31k trans/sec @ 85% CPU load on Postgres
32 workers:  756s = 33k trans/sec @ 70% CPU load on Postgres (25% idle)
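Dividing the 25 M records by the wall-clock times above shows the
diminishing per-worker returns directly (plain arithmetic on the
reported numbers, nothing more -- the per-worker rate drops sharply
between 8 and 32 workers):

```python
records = 25_000_000
runs = {8: 1175, 16: 792, 32: 756}  # workers -> wall-clock seconds

for workers, secs in runs.items():
    tps = records / secs
    # Total throughput keeps rising, but throughput per worker falls.
    print(f"{workers:2d} workers: {tps / 1000:.0f}k TPS total, "
          f"{tps / workers:.0f} TPS/worker")
```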

The benchmarks were done on a 32-core system, so more workers make no
sense. Besides, we can already see that performance is no longer growing
linearly. I suspect the single (SSD) disk backing the Postgres database
is limiting us here -- iotop reports 50 MB/s read and 300 MB/s write
load. At least that is the best explanation I have for the drop in CPU
utilization (note that with X workers I only look at the X most loaded
CPUs when reporting CPU utilization).


* I did not even attempt to tune the Postgres installation;
* Taler was compiled without optimization (-O0 -g -Wall), but
  I doubt this matters much given that Postgres dominates the CPU usage;
* the CPU is an AMD Ryzen Threadripper 1950X with 16 cores, so the drop
  beyond 16 workers may ALSO be due to hyper-threading not being as
  effective as full cores;
* taler-exchange-wirewatch was already scaled to close to 100k TPS
  on this system;
* taler-exchange-httpd in its latest incarnation should scale
  trivially by running more processes, however, secmod sharing
  is required and the secmod's will likely be the bottleneck;
* taler-exchange-transfer is next on my list to scale, and should
  be comparably trivial;
* taler-exchange-closer doesn't _need_ to scale IMO;
* taler-merchant-httpd should already trivially scale by running
  more processes;
* that leaves the auditor (which will require a larger rewrite to
  scale nicely on paper), but the auditor at least only needs to
  handle the average transaction rate, not the peak rate.


So except for taler-exchange-transfer (work in progress), this basically
means the exchange components scale _individually_. Over the next
months, I plan to deploy Taler on Grid5000 as part of NGI Fed4Fire+ and do an
_integrated_ system scalability test with exchange, many merchants and
many, many wallets (the auditor may then be added into the mix in 2022).

Why do this?

Because I want to produce hard(er) evidence that Taler can handle the
transaction rates central banks need in their central bank digital
currency efforts.

How can you help?

Well, my experience with Postgres is limited, so I would love to see a
Postgres DBA with experience in sharding/scaling Postgres get involved.
I would also not be surprised if someone could speed things up
substantially, without any extra hardware, by optimizing DB settings,
indices, query plans or table structure.
