400 concurrent clients on a 12-core/24-thread CPU is, quite frankly, nuts. CPU utilization never tops 40%, with significant I/O wait times, and I also see delays of up to 2 s per statement due to locking.
To see how the test configuration derived so far behaves in different environments, I will iterate over three file systems: btrfs, ext4, and xfs.
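The sweep can be sketched as a pair of nested loops: one over file systems, one over client counts. This is only a sketch of the procedure described here; the client counts, run duration (`-T 300`), and database name (`bench`) are assumptions, and each run presumes the cluster's data directory actually lives on the file system in question. The script only prints the commands so the plan can be inspected before anything runs.

```shell
#!/bin/sh
# Dry-run sketch of the benchmark sweep: for each file system, rerun the
# same pgbench workload at increasing client counts.
# -j 24 matches the 24 hardware threads mentioned above; -T 300 (5-minute
# runs) and the database name "bench" are assumptions, not from the post.
cmds=""
for fs in btrfs ext4 xfs; do
  for clients in 200 400 600; do
    # Each line is printed rather than executed; remove the collection
    # step and invoke pgbench directly to actually run the sweep.
    cmds="$cmds$fs: pgbench -c $clients -j 24 -T 300 bench\n"
  done
done
printf "%b" "$cmds"
```

Rerunning the identical command matrix per file system is what makes the latency numbers below directly comparable.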
Latency improves drastically with ext4 over btrfs (6.8 ms vs. 13.7 ms), which translates into significantly higher tps. Interestingly, peak throughput on ext4 is reached with even more clients (600). xfs improves latency further still (3.5 ms), and we again see much improved peak throughput, this time at 400 clients.
CPU utilization improves, but still barely exceeds 50%.
WAL reaches a maximum of 8.5 GB.
The longest delay due to locking shrinks to ~200 ms with xfs.
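Lock-induced delays like these can be surfaced directly from PostgreSQL. A minimal sketch, assuming `log_lock_waits` is enabled so that waits longer than `deadlock_timeout` show up in the server log, plus a live query against `pg_stat_activity` for sessions currently blocked on a lock:

```sql
-- postgresql.conf (assumed settings, shown here as comments):
--   log_lock_waits  = on      -- log waits longer than deadlock_timeout
--   deadlock_timeout = 200ms  -- lower than the 1s default to catch shorter waits

-- Sessions currently waiting on a heavyweight lock:
SELECT pid, wait_event_type, wait_event, state, left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```

The `deadlock_timeout` value is an assumption for illustration; the defaults (1 s) would hide the ~200 ms waits observed on xfs.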