ursadb

Indexing

First, start the ursadb client command prompt:

ursacli
[2020-05-10 05:23:27.216] [info] Connecting to tcp://localhost:9281
[2020-05-10 05:23:27.219] [info] Connected to UrsaDB v1.3.2+be20951 (connection id: 006B8B45B4)
ursadb>

Now, type:

ursadb> index "/mnt/samples";

This indexes the "/mnt/samples" directory. By default, only the gram3 index is used. It’s a good idea to use more index types for better results:

ursadb> index "/mnt/samples" with [gram3, text4, wide8, hash4];
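When the same command has to be generated for many directories, it can help to build the command string programmatically. The following is a minimal sketch: `build_index_command` and `KNOWN_INDEX_TYPES` are hypothetical names introduced for illustration, not part of ursadb, and the only index types it accepts are the four named above.

```python
# Index types named in this document; ursadb itself may accept others.
KNOWN_INDEX_TYPES = ("gram3", "text4", "wide8", "hash4")


def build_index_command(path, index_types=None):
    """Return an ursadb `index` command string for `path` (hypothetical helper)."""
    if '"' in path:
        # Escaping rules are not covered here, so refuse rather than guess.
        raise ValueError("path must not contain double quotes")
    command = f'index "{path}"'
    if index_types:
        unknown = [t for t in index_types if t not in KNOWN_INDEX_TYPES]
        if unknown:
            raise ValueError(f"unknown index types: {unknown}")
        command += " with [" + ", ".join(index_types) + "]"
    return command + ";"


# Default form (the server then uses gram3 only).
print(build_index_command("/mnt/samples"))
# Explicit list of index types.
print(build_index_command("/mnt/samples", ["gram3", "text4", "wide8", "hash4"]))
```

The resulting strings match the two commands shown above and can be pasted into the ursacli prompt.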

There are more variations of this command. See the query syntax documentation for details.

All indexing is part of a single transaction, so if the server crashes, indexing has to be restarted. This is intentional: it means it’s always possible to tell which files have been indexed, and the database stays in a consistent state. However, it makes indexing really large collections harder.

To avoid this problem, use the utils/index.py script shipped with mquery.
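Independently of that script, one way to limit how much work a crash can undo is to issue a separate index command for each subdirectory instead of one command for the whole collection. The sketch below illustrates only that idea and is not a description of how utils/index.py works. It assumes each index command runs as its own transaction; `commands_per_directory` is a hypothetical helper, and the subdirectory paths are made up for the example.

```python
def commands_per_directory(directories,
                           index_types=("gram3", "text4", "wide8", "hash4")):
    """Return one ursadb `index` command per directory (hypothetical helper).

    Each command covers a single directory, so a crash would only force
    a restart of the directory being indexed at that moment.
    """
    kinds = ", ".join(index_types)
    return [f'index "{directory}" with [{kinds}];' for directory in directories]


# Hypothetical subdirectories of the collection.
for command in commands_per_directory(["/mnt/samples/2019", "/mnt/samples/2020"]):
    print(command)
```

Files stored directly in the top-level directory are not covered by this split and would need their own command.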