Rspamd statistics configuration with Redis

Rspamd can use combinations of words to classify messages as spam or ham. We will configure it to learn these tokens per user and store them in Redis with a TTL of 100 days, since spammers may develop new methods over time.

The key idea of the OSB (Orthogonal Sparse Bigrams) algorithm is to use as tokens not merely single words but combinations of words weighted by their positions.
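To make the idea concrete, here is a simplified Python sketch of OSB-style feature generation. It assumes a window of five words and plain whitespace tokenization. It is not Rspamd's tokenizer, which, to our understanding, hashes the word pairs and folds the distance into the hash instead of producing readable tuples. The function name and window constant are illustrative only.

```python
from typing import List, Tuple

# Assumed window size for illustration: each word is paired with
# the following WINDOW - 1 words.
WINDOW = 5


def osb_tokens(text: str, window: int = WINDOW) -> List[Tuple[str, int, str]]:
    """Return simplified OSB features as (first, distance, second) tuples.

    The distance between the two words is part of the feature, so the
    same word pair seen adjacent and seen two positions apart counts
    as two different tokens.
    """
    words = text.lower().split()
    tokens: List[Tuple[str, int, str]] = []
    for i, first in enumerate(words):
        for distance in range(1, window):
            j = i + distance
            if j >= len(words):
                break  # the window runs past the end of the message
            tokens.append((first, distance, words[j]))
    return tokens


if __name__ == "__main__":
    for token in osb_tokens("Buy cheap meds now"):
        print(token)
```

Because the distance is part of each token, "buy cheap" and "buy ... meds" are learned as separate evidence, which is how word positions end up influencing the classification.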

Once enough samples have been gathered, Rspamd will classify new messages using our tokens in addition to its symbols. We will use separate Redis instances to store the data. Our custom configuration will be stored in /etc/rspamd/local.d/classifier-bayes.conf.

/etc/rspamd/statistic.conf

classifier "bayes" {
  tokenizer {
    name = "osb";
  }
  cache {
  }
  new_schema = true; # Always use new schema
  store_tokens = false; # Redefine if storing of tokens is desired
  signatures = false; # Store learn signatures
  #per_user = true; # Enable per user classifier
  min_tokens = 11;
  backend = "redis";
  min_learns = 200;

  statfile {
    symbol = "BAYES_HAM";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM";
    spam = true;
  }
  learn_condition = 'return require("lua_bayes_learn").can_learn';

  # Autolearn sample
  # autolearn {
  #  spam_threshold = 6.0; # When to learn spam (score >= threshold and action is reject)
  #  junk_threshold = 4.0; # When to learn spam (score >= threshold and action is rewrite subject or add header, and has two or more positive results)
  #  ham_threshold = -0.5; # When to learn ham (score <= threshold and action is no action, and score is negative or has three or more negative results)
  #  check_balance = true; # Check spam and ham balance
  #  min_balance = 0.9; # Keep diff for spam/ham learns for at least this value
  #}

  .include(try=true; priority=1) "$LOCAL_CONFDIR/local.d/classifier-bayes.conf"
  .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/classifier-bayes.conf"
}

.include(try=true; priority=1) "$LOCAL_CONFDIR/local.d/statistic.conf"
.include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/statistic.conf"
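The min_learns and min_tokens settings above most likely correspond to the "enough samples" mentioned in the introduction. The Python sketch below models the gate as we understand it: classification only happens once both ham and spam have been learned at least min_learns times, and only for messages that yield at least min_tokens tokens. Whether Rspamd's comparisons are inclusive is an assumption, and the class name BayesGate is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BayesGate:
    """Simplified model of the min_learns/min_tokens checks (assumed semantics)."""
    min_learns: int = 200  # value from statistic.conf
    min_tokens: int = 11   # value from statistic.conf

    def can_classify(self, ham_learns: int, spam_learns: int, tokens: int) -> bool:
        # Both classes must have been trained often enough...
        trained = min(ham_learns, spam_learns) >= self.min_learns
        # ...and the message must yield enough tokens to be meaningful.
        long_enough = tokens >= self.min_tokens
        return trained and long_enough


gate = BayesGate()
print(gate.can_classify(ham_learns=350, spam_learns=120, tokens=80))  # False: too few spam learns
print(gate.can_classify(ham_learns=350, spam_learns=240, tokens=80))  # True
```

The takeaway is that an unbalanced corpus (plenty of ham, little spam) keeps the classifier silent until the smaller class catches up.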

We keep the standard configuration that comes with Rspamd untouched and apply our changes in the local.d file it includes.

/etc/rspamd/local.d/classifier-bayes.conf

servers = "/run/redis-6380/redis-server.sock";
expire = 100d;
per_user = true;
autolearn = true;
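The value 100d relies on the UCL time suffix. Assuming Rspamd's parser converts it to seconds as UCL normally does, the equivalent plain number can be computed as follows, which is handy if you prefer to spell the value out. The helper name is illustrative.

```python
from datetime import timedelta


def expire_seconds(days: int) -> int:
    """Convert a retention period in days to whole seconds."""
    return int(timedelta(days=days).total_seconds())


# Equivalent line for classifier-bayes.conf without the "d" suffix.
print(f"expire = {expire_seconds(100)};")  # prints "expire = 8640000;"
```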

The autolearn mechanism can also be given individual thresholds, e.g. autolearn = [2, 10]. This learns messages scoring below 2 as ham and messages scoring 10 or higher as spam.
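A minimal Python sketch of that rule, using the comparisons exactly as described here (strictly below 2 for ham, 10 or above for spam). Rspamd may apply further checks before actually learning a message (the learn_condition in statistic.conf is one such hook), so treat this only as a model of the thresholds; the function and constant names are hypothetical.

```python
from typing import Optional

# Thresholds from the example autolearn = [2, 10]
HAM_THRESHOLD = 2.0
SPAM_THRESHOLD = 10.0


def autolearn_class(score: float,
                    ham_threshold: float = HAM_THRESHOLD,
                    spam_threshold: float = SPAM_THRESHOLD) -> Optional[str]:
    """Return 'ham', 'spam', or None when the message should not be learned."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # the grey zone in between is left alone


for s in (-1.5, 2.0, 7.3, 10.0, 15.2):
    print(s, autolearn_class(s))
```

Leaving the grey zone unlearned is the point of two separate thresholds: only messages the other rules are already sure about feed the Bayes statistics.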

Set up Redis instances

We will use individual Redis instances rather than Redis databases to store the Bayes data produced by Rspamd. The main benefit of this approach is that each instance writes its own dump, which can be backed up separately; multiple databases in one Redis instance share a single dump.

/etc/redis/redis-6380.conf

port 6380
unixsocket /run/redis-6380/redis-server.sock
pidfile /run/redis-6380/redis-server.pid
logfile /var/log/redis/redis-server-6380.log
dbfilename dump-6380.rdb

systemctl start redis-server@6380.service
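Because every instance differs only in its port, the file above can be treated as a template. The Python sketch below renders it for any port and derives the config path. It assumes the redis-server@.service unit reads /etc/redis/redis-&lt;instance&gt;.conf, as the Debian packaging does to our knowledge; check your distribution's unit file before relying on it. The function names are hypothetical.

```python
def redis_instance_conf(port: int) -> str:
    """Render the per-instance settings shown above for a given port."""
    lines = [
        f"port {port}",
        f"unixsocket /run/redis-{port}/redis-server.sock",
        f"pidfile /run/redis-{port}/redis-server.pid",
        f"logfile /var/log/redis/redis-server-{port}.log",
        f"dbfilename dump-{port}.rdb",
    ]
    return "\n".join(lines) + "\n"


def redis_conf_path(port: int) -> str:
    # Assumption: redis-server@<port>.service loads /etc/redis/redis-<port>.conf
    return f"/etc/redis/redis-{port}.conf"


if __name__ == "__main__":
    print(redis_conf_path(6380))
    print(redis_instance_conf(6380), end="")
```

Deriving the file name and its settings from the same port number keeps them consistent, so the systemd instance name, socket path and dump file always match.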

Display all members of the ham key:

redis-cli -s /run/redis-6380/redis-server.sock smembers BAYES_HAM_keys

About tlx@leuxner.net