
Integrate mquery with S3

One very common question is “how do I use mquery with S3?”. S3 is a file storage API exposed by many open- and closed-source solutions, like MinIO or, of course, AWS. Mquery does not support S3 natively, but it can work with S3 thanks to its very flexible plugin system. Unfortunately, this is not completely transparent, and S3 deployment is not easy. In this guide, I’ll explain how to integrate mquery with an existing S3 deployment.

Requirements

Caveats

The integration has some rough edges and makes some assumptions. Most importantly:

This integration works in the following way:

During indexing, samples are temporarily downloaded to the UrsaDB machine, but don’t worry - after indexing, the samples can be safely removed, so the UrsaDB machine only has to store the index.

Integration procedure

1. Install additional dependencies

We will need to install the minio package. This can be done with a simple pip install minio. If you followed our native install guide, you can install it in the virtual environment like this:

cd /opt/mquery/src/
source /opt/mquery/venv/bin/activate
pip install minio

If your installation slightly differs, adjust this command to your needs.

By the way, minio is a Python library for S3 communication - using it doesn’t mean that you must run a MinIO server.
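
You can also use it to sanity-check your S3 connection later on, straight from the same virtual environment. A minimal sketch, assuming the demo server and credentials from the next step (replace them with your own deployment’s settings):

from minio import Minio

# Connect to the S3 server - these values match the demo MinIO
# server from step 2, not necessarily your deployment.
client = Minio(
    "localhost:9000",
    access_key="minio",
    secret_key="minio123",
    secure=False,  # the demo server speaks plain HTTP
)

# List buckets to confirm that the connection and credentials work.
print([bucket.name for bucket in client.list_buckets()])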

2. Deploy a minio server for test purposes

(This is optional - if you already have an S3 server, you can use it)

We will use docker to keep things simple. Remember that this is just for demonstration - our server will be neither secure nor persistent. Install docker if you don’t have it already:

apt install docker.io

And run the minio server (username: minio, password: minio123):

docker run -p 9000:9000 -p 9001:9001 \
    -e "MINIO_ROOT_USER=minio" \
    -e "MINIO_ROOT_PASSWORD=minio123" \
    quay.io/minio/minio server /data --console-address ":9001"

You should be able to log in to minio at http://localhost:9001.

Log in using username minio and password minio123. Click “Create Bucket” and call it mquery.
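
If you prefer to script this instead of clicking through the console, here’s a minimal sketch using the minio package installed in step 1 (same demo credentials as above; the local file path is just an example):

from minio import Minio

client = Minio("localhost:9000", access_key="minio",
               secret_key="minio123", secure=False)

# Create the bucket this guide uses, if it doesn't exist yet.
if not client.bucket_exists("mquery"):
    client.make_bucket("mquery")

# Upload a test sample, so there is something to index later on.
client.fput_object("mquery", "test-sample.bin", "/tmp/test-sample.bin")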

3. Enable S3 plugin

Open the mquery config file (/opt/mquery/src/config.py in the example installation):

vim /opt/mquery/src/config.py

Change PLUGINS key to:

PLUGINS = ["plugins.s3_plugin:S3Plugin"]

And exit vim with [esc]:x[enter].

Restart mquery workers and the web interface.

4. Configure the plugin

If you did it correctly, workers should print the following error message (this is expected - the plugin is enabled, but not configured yet):

[12/01/2023 00:23:40][ERROR] Failed to load S3Plugin plugin
Traceback (most recent call last):
  File "/opt/mquery/src/plugins/__init__.py", line 49, in __init__
    active_plugins.append(plugin_class(db, plugin_config))
  File "/opt/mquery/src/plugins/s3_plugin.py", line 24, in __init__
    super().__init__(db, config)
  File "/opt/mquery/src/metadata.py", line 28, in __init__
    raise KeyError(
KeyError: "Required configuration key 's3_url' is not set"

Navigate to the mquery config page at http://localhost/config. You should see the plugin configuration there. Set all the fields - they mirror the parameters of the indexing script described below:

s3_url - address of the S3 server (localhost:9000 for the demo server)
s3_access_key - S3 access key (minio for the demo server)
s3_secret_key - S3 secret key (minio123 for the demo server)
s3_bucket - name of the bucket with your samples (mquery in this guide)
s3_secure - whether to use TLS (0 for the demo server)

At this point, workers should be able to load plugins correctly.

5. Index your files

Good news - that’s everything you need for querying! Bad news - you still need to index your files.

To do this, you will need to run a dedicated S3 indexing script. If your UrsaDB instance is on a different server than your workers, you must run this script on the UrsaDB server (to be more precise, the script needs storage shared with UrsaDB - the easiest way to get this is to run it on the same machine).

Go to the mquery directory and execute the following (fix the parameters depending on your use case):

cd /opt/mquery/src/
source /opt/mquery/venv/bin/activate
python3 -m utils.s3index \
    --workdir /root/mquery_tmp \
    --s3-url localhost:9000 \
    --s3-secret-key YOUR-SECRET-KEY \
    --s3-access-key YOUR-ACCESS-KEY \
    --s3-bucket mquery \
    --s3-secure 0

--workdir specifies the directory where the samples are temporarily downloaded. This is important, because this is the path that UrsaDB sees and stores in the index - for example, with --workdir /root/mquery_tmp, a sample called evil.exe ends up in the index as /root/mquery_tmp/evil.exe. To make things simple for yourself, you should always use the same working directory, for example /var/s3/mquery.

This may take a while (or a lot of time), depending on how many samples you have. Unfortunately, this script is not parallelised, and it’s not safe to run multiple instances of it at once. Future versions of this script will improve the performance.
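
Conceptually, the script does something like the following. This is a simplified sketch for illustration only, not the actual implementation - the object names, loop structure and the UrsaDB call are assumptions:

from pathlib import Path
from minio import Minio

workdir = Path("/root/mquery_tmp")
client = Minio("localhost:9000", access_key="YOUR-ACCESS-KEY",
               secret_key="YOUR-SECRET-KEY", secure=False)

# Download every object to the shared workdir, index it, then remove
# the local copy - only the path remains recorded in the index.
for obj in client.list_objects("mquery", recursive=True):
    local_path = workdir / obj.object_name
    client.fget_object("mquery", obj.object_name, str(local_path))
    # ...here the real script asks UrsaDB to index local_path...
    local_path.unlink()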

Next steps

Congratulations, that’s all! You can index files in S3 and query them using mquery.