How to minimise missed slot leader checks

Are you getting a high number of missed blocks or missed slot leader checks in gLiveView or on the Grafana dashboard for your Cardano block producer node? Not to worry, so are most people. There are a

If you find this guide useful consider staking some Cardano to SHARD stake pool. It helps support us to keep creating content for Cardano Stakepool Operators and other Crypto

Originally posted on our other site

Assumptions:

This guide assumes you are using a Linux-based server and a Cardano node built using the CoinCashew guide; however, you can easily adapt it if you installed using another method.

Garbage Collection Optimisation

Assuming your node is healthy and correctly configured, the usual cause of missed slot leader checks is garbage collection.

Default settings can see missed slot leader percentages around 4-5% which isn't great. With a bit of tuning you can get it down to around 0.4%.

To change this you will need to modify your startup script. We will get to that in a moment; first, let's touch on a few of the key settings that can impact garbage collection.

--disable-delayed-os-memory-return

Optional. Results in more accurate resident memory usage figures in tools such as top and htop.

--nonmoving-gc

Enables the concurrent mark-and-sweep garbage collector for the old generation.

-N ⟨x⟩

Use ⟨x⟩ simultaneous threads when running the program. In most cases you want the number to match the processor cores available on the machine, e.g. a 6 core machine would use -N6. You can also just use -N and the runtime will choose the value itself based on how many processors the machine has.
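If you prefer to pin the thread count rather than rely on a bare -N, a quick way to see what the machine offers is nproc from GNU coreutils. This is only a small sketch; the variable names are made up for illustration.

```shell
# Count the processing units available to this process (GNU coreutils).
cores="$(nproc)"

# Build the pinned RTS flag, e.g. -N6 on a 6 core machine.
rts_threads="-N${cores}"

echo "Detected ${cores} cores; the pinned flag would be ${rts_threads}"
```

Remember that a pinned value will not follow the machine if you later resize it.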

-n

(details from Haskell)

When set to a non-zero value, this option divides the allocation area (-A value) into chunks of the specified size. During execution, when a processor exhausts its current chunk, it is given another chunk from the pool until the pool is exhausted, at which point a collection is triggered.

This option is only useful when running in parallel (-N2 or greater). It allows the processor cores to make better use of the available allocation area, even when cores are allocating at different rates. Without -n, each core gets a fixed-size allocation area specified by -A, and the first core to exhaust its allocation area triggers a GC across all the cores. This can result in a collection happening when the allocation areas of some cores are only partially full, so the purpose of -n is to allow cores that are allocating faster to get more of the allocation area. This means less frequent GC, leading to a lower GC overhead for the same heap size.

This is particularly useful in conjunction with larger -A values, for example -A64m -n4m is a useful combination on larger core counts (8+).
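To get a feel for the numbers, here is a rough worked example of the -A64m -n4m combination on an 8 core machine. It assumes the shared pool is simply the per-core allocation areas added together and cut into -n sized chunks, which is our reading of the description above; the core count is a hypothetical value.

```shell
# Hypothetical inputs: core count plus the -A and -n values in MB.
cores=8
alloc_area_mb=64   # -A64m
chunk_mb=4         # -n4m

# Each core's allocation area is split into chunks of the -n size.
chunks_per_core=$(( alloc_area_mb / chunk_mb ))

# Assumption: the pool is the sum of all per-core allocation areas.
total_chunks=$(( cores * chunks_per_core ))
total_mb=$(( cores * alloc_area_mb ))

echo "chunks per core: ${chunks_per_core}"
echo "chunks in pool:  ${total_chunks}"
echo "allocation area: ${total_mb} MB in total"
```

The total also shows why large -A values add up quickly on machines with many cores.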

-T

Collects GC stats – you can query the data using the GHC.Stats module

-I ⟨seconds⟩

Sets the amount of idle time that must pass before an idle (major) garbage collection is performed. -I0 disables idle garbage collection.

-Iw ⟨seconds⟩

Sets the minimum wait between idle garbage collections, so that an idle GC runs at most once every ⟨seconds⟩.

-A16m

Sets the allocation area size to 16 MB.

-F ⟨factor⟩

(details from Haskell)

This option controls the amount of memory reserved for the older generations (and in the case of a two space collector the size of the allocation area) as a factor of the amount of live data. For example, if there was 2M of live data in the oldest generation when we last collected it, then by default we’ll wait until it grows to 4M before collecting it again.

The default seems to work well here. If you have plenty of memory, it is usually better to use -H ⟨size⟩ (see -H [⟨size⟩]) than to increase -F ⟨factor⟩.

The -F ⟨factor⟩ setting will be automatically reduced by the garbage collector when the maximum heap size (the -M ⟨size⟩ setting) is approaching.
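The effect of the factor is easy to work out by hand. The sketch below repeats the 2M example from the quoted text with the default factor of 2 and then with the -F1.5 used later in this guide. The function name is made up for illustration, and the real collector has more inputs than this single multiplication.

```shell
# Rough old-generation size (in MB) at which the next collection happens.
# $1 = live data in MB at the last collection, $2 = the -F factor.
gc_threshold_mb() {
  awk -v live="$1" -v factor="$2" 'BEGIN { printf "%.1f\n", live * factor }'
}

default_threshold="$(gc_threshold_mb 2 2)"     # default factor
tuned_threshold="$(gc_threshold_mb 2 1.5)"     # -F1.5

echo "default: collect again at ${default_threshold} MB"
echo "-F1.5:   collect again at ${tuned_threshold} MB"
```

A lower factor trades a little more collection work for a smaller old generation.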

Optimise Processor Usage and Garbage Collection

This depends on how cardano-node was compiled on your machine. As long as it was built in threaded mode, you can pass the additional RTS options when starting your node.
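If you are not sure how your binary was built, GHC compiled programs will normally print their runtime details with +RTS --info. The sketch below looks for a threaded "RTS way" in that output. The helper name and the sample text are made up for illustration, and the exact output layout can vary between GHC versions, so treat it as a rough check only.

```shell
# Succeeds if the text on stdin reports a threaded runtime.
is_threaded_rts() {
  grep -q '"RTS way", "rts_thr'
}

# Shortened sample of what +RTS --info prints (illustrative only).
sample_info=' [("GHC RTS", "YES")
 ,("RTS way", "rts_thr")
 ]'

if printf '%s\n' "$sample_info" | is_threaded_rts; then
  echo "sample: threaded runtime"
fi

# On a real node (only runs if cardano-node is on the PATH):
if command -v cardano-node >/dev/null 2>&1; then
  if cardano-node +RTS --info -RTS | is_threaded_rts; then
    echo "cardano-node: threaded runtime"
  else
    echo "cardano-node: no threaded runtime reported"
  fi
fi
```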

Edit the script you use to start cardano-node. If you followed the CoinCashew guide, that is the following file; otherwise edit your own start script.

sudo nano $NODE_HOME/startBlockProducingNode.sh

Look for the part of the script containing the following:

cardano-node run --topology

Between cardano-node run and --topology, insert the following:

+RTS -N --disable-delayed-os-memory-return -I0.3 -Iw600 -A16m -F1.5 -H2500M -RTS

The start of the line should then look something like this:

cardano-node run +RTS -N --disable-delayed-os-memory-return -I0.3 -Iw600 -A16m -F1.5 -H2500M -RTS --topology
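If you would rather script the change than edit by hand, something along these lines should work. It is only a sketch: the function name and the sample start line are made up, and it assumes your script contains the exact text cardano-node run --topology. The demo runs against a throwaway file, so point it at your real script only once you are happy with the result.

```shell
RTS_OPTS='+RTS -N --disable-delayed-os-memory-return -I0.3 -Iw600 -A16m -F1.5 -H2500M -RTS'

# Insert the RTS options between "cardano-node run" and "--topology".
# $1 = path to the start script. A backup is kept as <script>.bak.
insert_rts_opts() {
  sed -i.bak "s|cardano-node run --topology|cardano-node run ${RTS_OPTS} --topology|" "$1"
}

# Demo on a temporary file with a hypothetical start line.
demo_script="$(mktemp)"
echo 'cardano-node run --topology ${TOPOLOGY} --port ${PORT}' > "$demo_script"

insert_rts_opts "$demo_script"
cat "$demo_script"
```

Running it a second time changes nothing, because the original text no longer matches once the options are in place.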

-N will use all available cores. To set the number of cores explicitly, use for example -N4 to use 4 cores. If you ever plan to resize your VM or change your processor, it is probably safer to use -N.

At this point you will need to restart your node for the settings to take effect; however, read the optional section below first, and if you don't feel the need to turn off TraceMempool then skip ahead to Restart your Block Producer.

Turn off TraceMempool (optional)

If you are still having issues, consider disabling TraceMempool, which can consume additional CPU and cause issues on some installations. Typically, if you are not dancing on the edge of the CPU and memory minimums, you shouldn’t have a problem.

Edit mainnet-config.json and change the following line from

"TraceMempool": true,

to

"TraceMempool": false,

Once you are done with the settings changes, restart your node.
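The TraceMempool change can also be made from the command line before you restart. This sketch uses sed on a throwaway copy; the function name and the sample file contents are made up for illustration, so check your own mainnet-config.json before running it there.

```shell
# Flip "TraceMempool": true to false in a node config file.
# $1 = path to the config file. A backup is kept as <file>.bak.
disable_trace_mempool() {
  sed -i.bak 's/"TraceMempool"[[:space:]]*:[[:space:]]*true/"TraceMempool": false/' "$1"
}

# Demo on a temporary file with hypothetical, shortened contents.
demo_config="$(mktemp)"
printf '{\n  "TraceMempool": true,\n  "TraceMux": false\n}\n' > "$demo_config"

disable_trace_mempool "$demo_config"
grep TraceMempool "$demo_config"
```

To undo it, restore the .bak file or set the value back to true.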

Restart your Block Producer

sudo systemctl restart cardano-node
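After the node has been running for a while you can check whether the tuning helped by working out the missed percentage yourself. The sketch below assumes the percentage is the missed checks divided by all checks (missed plus made), and that you read both counters from gLiveView or your metrics endpoint. The function name and the sample numbers are made up for illustration.

```shell
# Percentage of slot leader checks that were missed.
# $1 = missed checks, $2 = checks that were made.
missed_pct() {
  awk -v missed="$1" -v made="$2" 'BEGIN {
    total = missed + made
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", missed / total * 100
  }'
}

# Hypothetical counters before and after tuning.
before="$(missed_pct 450 9550)"
after="$(missed_pct 40 9960)"

echo "before tuning: ${before}%"
echo "after tuning:  ${after}%"
```

Compare readings taken over a similar length of time, as the counters reset when the node restarts.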

Tested Configs for 1.30.1

+RTS -N -T -I0 -A16m --disable-delayed-os-memory-return --nonmoving-gc -RTS (low number of missed slot checks, stable)
+RTS -N -A16m -qg -qb -RTS (CoinCashew default, resulted in many missed slot leader checks)
+RTS -N --disable-delayed-os-memory-return -I0.3 -Iw600 -A16m -F1.5 -H2500M -RTS (low missed slots stable, faster startup)
+RTS -N --disable-delayed-os-memory-return -I0.3 -Iw600 -A16m -F1.5 -H3000M -RTS (low missed slots stable, faster startup)

Best Config Tested for 1.30.1

+RTS -N6 -I0.3 -Iw600 -A16m -F1.5 -H2500M -RTS

Tested Configs for 1.32.1

Coming soon
