The Anomy Mail Sanitizer

The Anomy Sanitizer is what most people would call “an email virus scanner”. That description is not totally accurate, but it does cover one of the more important jobs that the sanitizer can do for you - it can scan email attachments for viruses.

It is a rather old piece of software (the latest release, 1.76, is dated 2006), but it is still included in Debian 11 Bullseye and it can perform rather sophisticated email processing as a simple filter operation.

I use it as a personal mail filter on GNU/Linux mail servers, because it can be activated on a per-user basis by the Local Delivery Agent (LDA) that Postfix calls. The LDA can be as simple as procmail or as complex as the Dovecot LDA with the Pigeonhole Sieve interpreter.

Perl unescaped left brace warning

The Sanitizer version included in Debian Bullseye uses a deprecated syntax in its Perl code, which triggers the warning message:

Unescaped left brace in regex is passed through in regex;

The offending code is in the file /usr/share/perl5/Anomy/Sanitizer/MacroScanner.pm, at lines 120 and 127. The fix is to escape the literal left brace with a backslash:

$score +=  4 while ($buff =~ s/\000(ID="\{[-0-9A-F]+)$/x$1/i);
$score +=  1 while ($buff =~ s/\000(ID="\{[-0-9A-F]+\}"|ThisWorkbook\000|PrivateProfileString)/x$1/i);
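
To check the escaped form in isolation, here is a minimal sketch that runs one simplified substitution of the same kind under "use warnings". The sample buffer is hypothetical and the pattern is a shortened variant of the second line above, not the exact code from MacroScanner.pm.

```perl
use strict;
use warnings;

# Hypothetical sample buffer: a NUL byte followed by a macro-style ID string.
my $buff  = "\000ID=\"{0123-ABCD}\"";
my $score = 0;

# \{ and \} match literal braces, so Perl has nothing to warn about.
# Each successful substitution replaces the NUL byte with "x" and adds 1.
$score += 1 while ($buff =~ s/\000(ID="\{[-0-9A-F]+\}")/x$1/i);

print "score=$score buff=$buff\n";
```

The loop ends after one pass, because the substitution removes the NUL byte that the pattern needs in order to match again.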

The HTML MIME multipart problem

Several mail user agents nowadays compose email messages in HTML format, sometimes without including a text-only copy of the same message. Some agents include the HTML as one part of a multipart MIME message, correctly marked as text/html. Other agents compose the message body directly in HTML, without using the MIME multipart system.

In some circumstances the Sanitizer defangs the HTML message or the HTML part (changing its content type), so a modern email reader does not display it correctly. In the best case an anonymous attachment is shown; in the worst case an empty message is shown.

The Anomy Sanitizer uses several methods to detect the HTML parts in a message, relying on the Content-Type: text/html header or on the filename of the MIME part (if specified). Once it detects an HTML part, it performs some operations on it; one of them is a match against a regular expression to confirm that the content is actually HTML text. If that regex test fails, the Sanitizer neutralizes (defangs) the part by changing its content type from text/html to something like application/DEFANGED-14789 (the type name is composed using the msg_defanged configuration option).

That behaviour is triggered by the feat_files = 1 configuration option (enable filename-based policy decisions).

Unfortunately the regex used by the Sanitizer to detect an HTML part is very naive: the part must simply match this expression:

<html|<body|<p>|<b>|<i>|<br>|</a>

Notably, the Gmail application currently (Jan 2023) composes mail messages using only <div> tags, thus fooling the Sanitizer into defanging that part.
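
The decision described above can be sketched as a small Perl function. This is only an illustration under stated assumptions: effective_type is a hypothetical helper, not part of the Sanitizer's code, the extension list is the one from the HTML file type definition, and the numeric suffix of the defanged type is passed in as a plain parameter.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Sanitizer's real API): return the content
# type that one MIME part would end up with after the HTML check.
sub effective_type {
    my ($content_type, $filename, $body, $defang_id) = @_;

    # Step 1: the part claims to be HTML by its type or by its file name.
    my $claims_html = ($content_type =~ m{^text/html\b}i)
        || (defined $filename && $filename =~ /\.(?:html|htm|shtml)$/i);
    return $content_type unless $claims_html;

    # Step 2: confirm the claim with the content regex quoted above.
    return $content_type if $body =~ m{<html|<body|<p>|<b>|<i>|<br>|</a>};

    # Step 3: the confirmation failed, so the part is defanged.
    return "application/DEFANGED-$defang_id";
}

# A Gmail-style body made only of <div> tags fails the check.
print effective_type('text/html', undef, '<div dir="ltr">Hello</div>', 14789), "\n";
# A body containing <p> passes and keeps its type.
print effective_type('text/html', undef, '<p>Hello</p>', 14789), "\n";
```

The first call prints application/DEFANGED-14789 and the second prints text/html, which mirrors the symptom: a valid HTML message loses its content type only because it uses none of the tags in the pattern.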

I fixed the Perl code in /usr/share/perl5/Anomy/Sanitizer/FileTypes.pm, changing the regular expression in this way:

my $HTML = {
    id         => "html",
    risk       => $low,
    name       => "HTML text file",
    extensions => [ "html", "htm", "shtml" ],
    mime_types => [ 'text/html' ],
    magic      => [ ],
    regexp     => '<html|<body|<div|<span|<p>|<b>|<i>|<br>|</a>',
};
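
A quick way to compare the two patterns is to run both against a few message fragments. The sketch below is a stand-alone test, not Sanitizer code: looks_like_html is a hypothetical helper and the sample bodies are made up for illustration.

```perl
use strict;
use warnings;

# The original pattern and the extended one from the definition above.
my $original = qr{<html|<body|<p>|<b>|<i>|<br>|</a>};
my $extended = qr{<html|<body|<div|<span|<p>|<b>|<i>|<br>|</a>};

# Hypothetical helper: 1 if the body matches the pattern, 0 otherwise.
sub looks_like_html {
    my ($pattern, $body) = @_;
    return $body =~ $pattern ? 1 : 0;
}

# Made-up sample bodies: div-only, span-only and plain text.
for my $body ('<div dir="ltr">Hello</div>', '<span>Hi</span>', 'Just plain text') {
    printf "%-28s original=%d extended=%d\n",
        $body,
        looks_like_html($original, $body),
        looks_like_html($extended, $body);
}
```

Only the extended pattern accepts the div-only and span-only bodies, while plain text is still rejected by both.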

It is also possible to remove the regexp element from the dictionary; in that case the Sanitizer will recognize an HTML part only by its content type or filename.

The customized Perl module can be installed as /etc/perl/Anomy/Sanitizer/FileTypes.pm, without changing the file installed by the Debian package.
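
This works because Perl loads the first copy of a module it finds along @INC, and on Debian /etc/perl is normally listed before /usr/share/perl5. The following sketch checks which copy would win on a given system; first_match is a hypothetical helper written for this check.

```perl
use strict;
use warnings;

# Hypothetical helper: return the full path of the first copy of a module
# file found in the given directories, or nothing if there is none.
sub first_match {
    my ($rel_path, @dirs) = @_;
    for my $dir (@dirs) {
        next if ref $dir;    # skip code-reference hooks that may sit in @INC
        return "$dir/$rel_path" if -f "$dir/$rel_path";
    }
    return;
}

my $found = first_match('Anomy/Sanitizer/FileTypes.pm', @INC);
print defined $found
    ? "Perl would load: $found\n"
    : "Anomy/Sanitizer/FileTypes.pm not found in \@INC\n";
```

If the reported path is under /usr/share/perl5 even though the customized copy exists, printing @INC shows the actual search order on that system.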