Ursadb configuration is a simple set of key-value pairs. The defaults are sane, and you don't need to change them unless you want to tune ursadb for your system.
They can be read by issuing a `config get` command with `ursacli`:

```
$ ursacli
ursadb> config get;
```
The response format is:
```json
{
    "result": {
        "keys": {
            "database_workers": 10,
            "merge_max_datasets": 10,
            "merge_max_files": 1073741824,
            "query_max_edge": 2,
            "query_max_ngram": 16
        }
    },
    "type": "config"
}
```
To change a config value, you may issue a `config set` command:

```
$ ursacli
ursadb> config set "database_workers" 10;
```
### query_max_edge

Maximum number of values the first or last character in a sequence can take for the sequence to be considered when planning a query. The default is a conservative 1, so a query plan will never start or end with a wildcard.

Recommendation: Stick to the default value. If you have a good disk and want to reduce false positives, increase it (but to no more than 16).
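For example, to relax this limit slightly, use the `config set` syntax shown above (the value 4 here is purely illustrative; any value up to the recommended maximum of 16 works the same way):

```
$ ursacli
ursadb> config set "query_max_edge" 4;
```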
### query_max_ngram

Maximum number of values an ngram can take to be considered when planning a query. For example, with the default value of 16, the trigram `11 2? 33` will be expanded and included in the query, but `11 ?? 33` will be ignored (`2?` expands to 16 possible values, while `??` expands to 256).

Recommendation: Stick to the default value at first. If your queries are fast and use many wildcards but return many false positives, increase it to 256.
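If you do hit that many-wildcards, many-false-positives case, the change uses the same `config set` syntax shown earlier (256 is the value from the recommendation above):

```
$ ursacli
ursadb> config set "query_max_ngram" 256;
```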
### database_workers

How many tasks can be processed at once? The default of 4 is a very conservative value for most workloads. Increasing it will make the database faster, but at some point the disk becomes the bottleneck. In the worst case, it will also increase memory usage linearly.

Recommendation: If your server is dedicated to ursadb, or your IO latency is high (for example, files are stored on NFS), increase it to 8 or more.
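For instance, on a server dedicated to ursadb you might raise the worker count to 8, the lower end of the recommendation above:

```
$ ursacli
ursadb> config set "database_workers" 8;
```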
### merge_max_datasets

How many datasets can be merged at once? This has severe memory usage implications: before merging, datasets must be fully loaded, and every loaded dataset consumes a bit over 128MiB. Increasing this number makes compacting huge datasets faster, but the database may run out of RAM.

Recommendation: `merge_max_datasets * 128MiB` can safely be set to around 1/4 of the RAM dedicated to the database, so for example 8 for a 4GiB server or 32 for a 16GiB server. Increasing it past 10 has diminishing returns, so unless you have a lot of free RAM you can leave it at the default.
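As a worked example for a 16GiB server: 32 * 128MiB = 4GiB, which is about 1/4 of RAM, matching the guideline above:

```
$ ursacli
ursadb> config set "merge_max_datasets" 32;
```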
### merge_max_files

When merging, what is the maximum allowed number of files in the resulting dataset? Large datasets make the database faster, but also need more memory to run efficiently.

Recommendation: Ursadb has been used with multi-million-file datasets in the wild, but we currently recommend keeping datasets smaller than 1 million files.
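Following that recommendation, you could cap merged datasets at one million files (a much lower value than the default of 1073741824 shown in the example response earlier):

```
$ ursacli
ursadb> config set "merge_max_files" 1000000;
```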