Suggestions for improving the pts/postgresql test

  • Suggestions for improving the pts/postgresql test

    TL;DR:
    Michael, please update or enhance your pts/postgresql test to use prepared query mode.
    Also update the pts/postgresql test to allow significantly larger numbers of clients (up to 5000) to account for modern hardware capabilities in your public tests.

    Long story...

    PostgreSQL is a very mature open-source RDBMS that can be operated effectively across a huge span of hardware configurations, from a low-powered Raspberry Pi (Arm) to dual-EPYC servers with hundreds of cores.
    PostgreSQL can also be tuned to support a large range of usage profiles, from fast-paced OLTP lookups in small data sets to complex analytics queries across terabytes of data.
    The latter, I assume, is the reason why the default configuration of PostgreSQL is rather limiting and requires adjustment before any meaningful conclusion can be drawn about the performance of the software.

    Following the 320-CPU-thread limit of Clear Linux discussed in the recent AMD EPYC Genoa articles, I wondered how the postgresql test configuration in those articles was affected, given that only 250 clients were simulated.
    I assumed that an optimal result would be achieved with at least as many simulated clients as the hardware has compute threads. But how many? I decided this was the opportunity to look under the covers of the pts/postgresql test.

    To figure that out I went into my home lab, ran the test on my hardware first, and then looked at the options and implementation of the PTS test.

    pgbench is a relatively simple tool that by default implements a light OLTP workload loosely based on the well-known TPC-B benchmark. It can be extended with custom scripts to simulate other workloads.
    When simulating the workload from the article (scale 100, 250 clients, select-only), the first thing I noticed was that by default the tool uses only a single thread for load generation. That made it impossible to fully stress my AM4 Ryzen 5900X CPU. With the -j option, pgbench can scale load generation across multiple threads. I quickly observed that it takes 5 threads to saturate my 24-thread CPU - meaning ~20% of CPU capacity is consumed by load generation alone. Interesting.
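For reference, a sketch of the flags involved (the database name test and the 60s duration are my assumptions, not values from the articles):

```shell
# Build the pgbench command line for a select-only run.
# -S : built-in select-only script
# -c : number of simulated clients
# -j : worker threads for load generation (defaults to 1!)
# -T : run duration in seconds
clients=250
jobs=5
echo "pgbench -S -c ${clients} -j ${jobs} -T 60 test"
```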

    Reading further in the pgbench man page, I stumbled across the query mode parameter (-M). By default pgbench sends the full query text to the server for every transaction (simple mode). It also supports an optimized prepared query mode which caches the "parse analysis", meaning it skips the step of compiling an execution plan from the query text on repeated executions. This is a very important optimization in transactional workloads.
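Switching modes is a single flag; a sketch of the two variants (commands shown via echo - run them against your own database):

```shell
# Simple mode (the default): the query text is parsed and planned every time.
echo "pgbench -S -c 250 -j 5 -M simple -T 60 test"
# Prepared mode: statements are prepared once per session and then reused.
echo "pgbench -S -c 250 -j 5 -M prepared -T 60 test"
```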

    Wondering if pts/postgresql takes advantage of this, I looked at the test code. To my delight I saw that
    1. postgresql is using the generally recommended 25% of main memory for the shared_buffers allocation. Good!
    2. postgresql allows 500 connections, required for a successful simulation of that many clients.
    3. pgbench is using num_of_cores threads to simulate the workload (-j parameter). Good!
    4. pgbench is executing the test in the default simple query mode.

    I decided to run tests with the goal of evaluating how many clients it takes for optimal results. I did that with a simple logarithmic sweep over the client parameter (-c), starting at 5 and ending at 500 (one step beyond the workload given to the dual-EPYC server).
    Knowing that huge pages can have an effect on performance, I added a test run with huge pages (2MB) enabled.
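The sweep can be scripted; a minimal sketch (the client steps are the ones from my range; everything else as above):

```shell
# Logarithmic-ish sweep over the client count (-c), from 5 to 500.
for c in 5 10 25 50 100 250 500; do
  echo "pgbench -S -M prepared -c ${c} -j 5 -T 60 test"
done
```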

    My test configuration:
    CPU: AMD Ryzen 9 5900X
    Mobo: ASUS WS-X570-ASE
    RAM: 2x 32GB DDR4-4000
    Storage: 1x Samsung 980 Pro 1TB
    OS: Fedora 37, fully updated with default postgresql install except for adjustment in number of connections (500) and shared buffers (16GB) - same as in PTS configuration.
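In postgresql.conf terms, the two adjustments are (values from my setup above):

```
# postgresql.conf - adjustments matching the PTS configuration
max_connections = 500     # allow up to 500 concurrent clients
shared_buffers = 16GB     # ~25% of 64GB main memory
```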
    [image: benchmark results graph]

    Observations:
    • tps are significantly lower when clients < # CPU threads (24).
    • the highest tps is reached at 250 clients in all tests - that's 10x the CPU thread count.
    • prepared query mode has about half the latency of simple mode and conversely achieves about twice the tps in fully loaded configurations (>= 25 clients).
    • huge pages improve latency and tps in prepared query mode, but not significantly.
    • the maximum tps topped 1M transactions per second on my 12-core CPU.
    Conclusion:
    The simulated load pattern queries a relatively small dataset (scale 100, ~1.5GB of data) repeatedly. Most hardware configurations should handle this workload entirely in memory. If the hardware has 8GB or more of main memory, the test runs in access-optimized shared buffers in postgresql. This configuration is well suited to finding the largest number of supported transactions for any given hardware. However, maximum tps requires the use of prepared query mode.

    A naive reading might suggest that simple vs. prepared mode simulates a much larger population of clients that happen to arrive 250 at a time, vs. a population of 250 clients that repeatedly query the system. In reality, however, simple mode just represents non-optimized software.

    The less than 3M tps documented on Phoronix across two 96-core EPYC server CPUs pales in comparison to the 1M tps reached on a last-gen 12-core desktop CPU. The likely culprit is that the test scenario was chosen too small to saturate such monster hardware. Adding a simple tuning parameter promises to double the tps yet again.

    It is time to update the pts/postgresql test definition!

  • #2
    Just to be sure I am not missing something, for the prepared query mode it's just a matter of adding --protocol=prepared to the pgbench options, correct? Also can easily extend the client count as well. Thanks.
    Michael Larabel
    https://www.michaellarabel.com/



    • #3
      Originally posted by Michael View Post
      Just to be sure I am not missing something, for the prepared query mode it's just a matter of adding --protocol=prepared to the pgbench options, correct? Also can easily extend the client count as well. Thanks.
      Yep - that's it!



      • #4
        Michael - if I may provide some constructive criticism on the pts suite...

        The breadth of pts is staggering, and few, if any, will understand all the things that can be tested with it.
        Most tests have parameters of a highly technical nature. One needs inherent insight into the tested software to make sound decisions.

        I am working on a suggestion to change this for postgresql: moving away from technical specifications of scale and clients toward offering a list of simulated use cases.

        The use case discussed above is valid - it represents a read-only, in-memory (in all but the smallest hw configurations) OLTP workload. Based on my documented test results it would be easy to set pgbench parameters for any hardware configuration: simply choose "--scale=100 --client=4*$NR_CPU_CORES --protocol=prepared".
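A minimal sketch of that auto-tuning idea (hypothetical; note that nproc counts hardware threads, so the multiplier may need adjusting if physical cores are intended):

```shell
# Derive the client count from the machine itself: 4x the CPU count,
# as proposed above. Worker threads (-j) set to the CPU count.
cores=$(nproc)
clients=$((4 * cores))
echo "pgbench -S -M prepared -c ${clients} -j ${cores} -T 60 test"
```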

        Similarly, I think I can define a meaningful read-write workload that is auto-tuned to a given hardware. That would enable meaningful comparison across hw and software configurations.

        Will present my proposal in a followup post in this thread.



        • #5
          Looking at the current PTS implementation of the default operation of pgbench, the "tpcb-like" script, we first need to recognize that this script contains select, insert, update, and delete operations. Invariably, data will be altered, which means this test stresses more system components than the simpler "select-only" script.

          I see several issues in the current test implementation that lead to lower-than-possible test results:
          1. Table locking. The pgbench man page explains that a number of clients higher than the scale factor will lead to lock contention in the default pgbench script. At that point the benchmark mostly measures how long rows stay locked before transactions complete.
            I tried this out with scale=50 (smaller than the optimal number of clients from my first post) and scale=3210 (a database sized at ~3/4 of memory). As the graph below shows, both latency and tps are much improved with the larger test database.
            [image: scale comparison graph]



          • #6
            2. Write-ahead log (WAL). WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage.
            Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. Checkpoints are triggered either after a set timeout has occurred or when the allotted space for the WAL is full. Checkpoints are fairly expensive: first because they require writing out all currently dirty buffers, and second because they result in extra subsequent WAL traffic.
            In the default configuration the WAL is too small, resulting in too-frequent checkpoints. Checkpoints that occur during a test run lower the test score because they take disk resources away from the test. And because checkpoints in the default configuration are triggered at different points within and between test runs, the results are not very consistent. Production deployments tune the WAL such that checkpoints are expected to be time-triggered based on expected loads, and the checkpoint frequency is monitored.
            I tried this out with the default WAL configuration and with the WAL size increased to the point that a checkpoint is guaranteed not to be triggered for the duration of the test. I also forced a checkpoint before every run of pgbench. As a result the latencies are lower and more consistent, and a 60% higher tps is achieved.



            • #7
              Originally posted by jochendemuth View Post
              I tried this out with the default WAL configuration and with the WAL size increased to the point that a checkpoint is guaranteed not to be triggered for the duration of the test. I also forced a checkpoint before every run of pgbench. As a result the latencies are lower and more consistent, and a 60% higher tps is achieved.
              What was the WAL configuration values you used? Thanks.
              Michael Larabel
              https://www.michaellarabel.com/



              • #8
                Originally posted by Michael View Post

                What was the WAL configuration values you used? Thanks.
                I set max_wal_size = '100GB'.

                I created a checkpoint by issuing the command "checkpoint;" as superuser. Here as a command ahead of the pgbench command:
                Code:
                $ psql test -c 'checkpoint;'; pgbench ...
                It's possible to query the amount of WAL used by postgresql with the following command:
                Code:
                $ psql test -c 'select sum(size) from pg_ls_waldir();'
                In my testing WAL reached about 4GB in size. The size is dependent on the number of transactions between checkpoints. Fast setups will consume more.



                • #9
                  An upper bound for the max_wal_size parameter is given by the sequential write throughput of the storage medium multiplied by the test duration.

                  So, in the case of the dual AMD EPYC Genoa on an Optane P5800X, I'd assume the WAL will take about 7GB/s for 120s, meaning up to ~1TB of storage. Hard-coding this amount of storage would not be a great choice for PTS. I wonder if you have a way to query free storage capacity in the setup phase and, more crucially, to delete the postgresql folder after a successful test run to free up space.
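Querying free capacity from a setup script is straightforward; a sketch (the data-directory path is a placeholder - substitute the real PGDATA):

```shell
# Free space, in bytes, on the filesystem holding the data directory.
datadir="."   # placeholder; e.g. /var/lib/pgsql/data
avail=$(df -B1 --output=avail "$datadir" | tail -n 1 | tr -d ' ')
echo "$avail"
```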

                  I personally find that pgbench reaches steady state relatively quickly, so I only run it for 30s at a time.

                  I used the following bash calculation to initialize pgbench with a database of ~3/4 of memory size. I assume that PTS has an easier way to do that.

                  Code:
                  $ scale=$(echo "mem = $(grep MemTotal /proc/meminfo | grep -o '[[:digit:]]*')/1024; s = mem/15*0.75; scale = 0; (s+0.5)/1" | bc -l)
                  $ pgbench -i -s $scale test
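The same calculation can be done without bc, e.g. with awk (an equivalent sketch, assuming the same ~15MB per scale unit as above):

```shell
# MemTotal is reported in kB; one pgbench scale unit is roughly 15 MB.
# Target a database of ~3/4 of main memory.
scale=$(awk '/MemTotal/ { printf "%d", $2 / 1024 / 15 * 0.75 }' /proc/meminfo)
echo "$scale"
```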



                  • #10
                    The last set of tests still reaches only ~40% CPU utilization, completely bottlenecked by the default storage config on Fedora 37. Now I'm adding the current default optimizations (shared_buffers = 1/4 of memory) and the suggested option from the select-only test (--protocol=prepared) for completeness.
                    We can see that these optimizations predictably lead to slightly lower latency and, especially at high concurrency (400 clients), to significantly higher tps.

                    [image: read-write workload results graph]

                    Sorry about spreading the information across many posts, but my account is limited to one picture per post and the graphs really tell the story.

