Rspamd can use combinations of words to classify messages as spam or ham. We will configure it to learn these tokens per user and store them in Redis with a TTL of 100 days, because spammers may develop new methods over time.
The key idea of the OSB algorithm is to use not merely single words as tokens but combinations of words weighted by their positions.
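To make the idea of position-weighted word combinations more concrete, here is a minimal sketch in Python. It is only an illustration and not Rspamd's actual tokenizer: the function name osb_tokens is hypothetical, the window size of 5 is an assumption, and real implementations hash the pairs instead of keeping readable tuples.

```python
def osb_tokens(words, window=5):
    """Return (first_word, gap, second_word) triples for word pairs in a sliding window.

    The gap records how far apart the two words are, so "cheap pills" and
    "cheap <something> pills" produce different tokens.
    """
    tokens = []
    for i, first in enumerate(words):
        # Pair the current word with each of the next (window - 1) words.
        for gap in range(1, window):
            j = i + gap
            if j >= len(words):
                break  # ran past the end of the message
            tokens.append((first, gap, words[j]))
    return tokens


if __name__ == "__main__":
    for token in osb_tokens("buy cheap pills online now".split(), window=3):
        print(token)
```

A classifier built on such tokens counts how often each triple was seen in spam and in ham, which is why the same words in a different order or distance contribute differently to the result.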
Once we have gathered enough samples, Rspamd will classify new messages using our tokens in addition to its symbols. We will use separate Redis instances to store the data. Our custom configuration will be stored as
    classifier "bayes" {
      tokenizer {
        name = "osb";
      }
      cache {
      }
      new_schema = true; # Always use new schema
      store_tokens = false; # Redefine if storing of tokens is desired
      signatures = false; # Store learn signatures
      #per_user = true; # Enable per user classifier
      min_tokens = 11;
      backend = "redis";
      min_learns = 200;

      statfile {
        symbol = "BAYES_HAM";
        spam = false;
      }
      statfile {
        symbol = "BAYES_SPAM";
        spam = true;
      }
      learn_condition = 'return require("lua_bayes_learn").can_learn';

      # Autolearn sample
      # autolearn {
      #  spam_threshold = 6.0; # When to learn spam (score >= threshold and action is reject)
      #  junk_threshold = 4.0; # When to learn spam (score >= threshold and action is rewrite subject or add header, and has two or more positive results)
      #  ham_threshold = -0.5; # When to learn ham (score <= threshold and action is no action, and score is negative or has three or more negative results)
      #  check_balance = true; # Check spam and ham balance
      #  min_balance = 0.9; # Keep diff for spam/ham learns for at least this value
      #}

      .include(try=true; priority=1) "$LOCAL_CONFDIR/local.d/classifier-bayes.conf"
      .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/classifier-bayes.conf"
    }

    .include(try=true; priority=1) "$LOCAL_CONFDIR/local.d/statistic.conf"
    .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/statistic.conf"
We will use the standard configuration that comes with Rspamd and apply our changes.
    servers = "/run/redis-6380/redis-server.sock";
    expire = 100d;
    per_user = true;
    autolearn = true;
The autolearn mechanism can be configured to use individual thresholds: autolearn = [2, 10]. This will learn messages with a score of less than 2 as ham and messages with a score of 10 or greater as spam.
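The effect of the two thresholds can be sketched as a small decision function. This is a simplified model of the rule as described here, not Rspamd's internal logic, which may take further conditions into account; the function name autolearn_decision and its parameter names are hypothetical.

```python
def autolearn_decision(score, ham_below=2.0, spam_from=10.0):
    """Model autolearn = [2, 10]: return "ham", "spam", or None (do not learn)."""
    if score < ham_below:
        return "ham"   # score below the lower threshold
    if score >= spam_from:
        return "spam"  # score at or above the upper threshold
    return None        # scores in between are not learned automatically


if __name__ == "__main__":
    for score in (-1.5, 2.0, 7.3, 10.0):
        print(score, autolearn_decision(score))
```

Messages scoring between the two thresholds are left alone, which keeps ambiguous mail out of the training data.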
Set up Redis instances
We will use individual Redis instances rather than Redis databases to store the Bayes data produced by Rspamd. The main benefit of this approach is that each instance writes its own dump, which can be backed up separately, whereas multiple databases in one Redis instance share a single dump.
    port 6380
    unixsocket /run/redis-6380/redis-server.sock
    pidfile /run/redis-6380/redis-server.pid
    logfile /var/log/redis/redis-server-6380.log
    dbfilename dump-6380.rdb
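Since every instance follows the same naming pattern, with the port number embedded in the socket, PID file, log file and dump file names, the per-instance settings can be generated rather than typed by hand. The following is a minimal sketch; the helper name redis_instance_settings is hypothetical, and the port 6381 used in the demo is only an example of a further instance, not one defined in this setup.

```python
def redis_instance_settings(port):
    """Render the per-instance Redis directives for the given port."""
    run_dir = f"/run/redis-{port}"
    lines = [
        f"port {port}",
        f"unixsocket {run_dir}/redis-server.sock",
        f"pidfile {run_dir}/redis-server.pid",
        f"logfile /var/log/redis/redis-server-{port}.log",
        f"dbfilename dump-{port}.rdb",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # 6380 is the Bayes instance from this setup; 6381 is a made-up second instance.
    for port in (6380, 6381):
        print(redis_instance_settings(port))
        print()
```

The output only covers the directives that differ between instances; everything else would still come from the regular Redis configuration.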
systemctl start redis-server@6380.service
Display all members of the ham key:
redis-cli -s /run/redis-6380/redis-server.sock smembers BAYES_HAM_keys