It used to be a trickle of two or three a day, but lately we’ve been hammered with hundreds of automated SPAM submissions through the cformsII contact form plugin we use with WordPress. The contact form feeds into a Google Groups mailing list that forwards to the whole team.
99% of the SPAM advertised a dozen designer brand knock-offs: North Face, Gucci handbags, Burberry, etc. This stuff isn’t hard to filter by keyword. First we looked at Google Groups to see if there was a keyword filter to protect group lists from SPAM, naughty language, or whatever. Surprisingly nothing.
Next we checked the cformsII plugin. As versatile and flexible as it is, there’s no easy way to add a banned word list. There are a couple of antispam features, but the honeypot requires CSS changes to the site theme and we’d die before burdening you with a captcha to contact us.
cformsII allows us to use a regular expression to evaluate each field of the submission form. Searching the support forum turned up tons of requests for a simple regex to filter SPAM words. Each was met with suggestions to search the forum for some epic post on the topic, but we couldn’t find it anywhere.
^(?i)(\b(red bottoms|timberland|beats by dre|burberry|Louis Vuitton|Gucci|uggs|ray ban|north face|Tiffany|Michael Kors|coach)\b)$
First we worked up a regex to match SPAM words with The Regex Coach. The problem is that a field passes validation only when the regex matches, so this pattern would accept only the very submissions we want to block. We needed a way to negate the regex result: the equivalent of ! or NOT in languages like C, PHP, Perl, and Basic.
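To make the problem concrete, here is a minimal Python sketch of that first pattern. It is not cformsII's actual code: field_is_valid is a hypothetical stand-in for a validator that accepts a field only when the regex matches, and the re.IGNORECASE flag stands in for the inline (?i), which recent Python versions only allow at the very start of a pattern.

```python
import re

# First-attempt pattern from the post; re.IGNORECASE replaces the inline (?i).
BANNED = re.compile(
    r"^(\b(red bottoms|timberland|beats by dre|burberry|Louis Vuitton|Gucci"
    r"|uggs|ray ban|north face|Tiffany|Michael Kors|coach)\b)$",
    re.IGNORECASE,
)


def field_is_valid(value: str) -> bool:
    """Hypothetical validator: the field is accepted only if the regex matches."""
    return BANNED.search(value) is not None


print(field_is_valid("Gucci"))                     # True  (spam gets through)
print(field_is_valid("Hello, I have a question"))  # False (real message blocked)
```

In ordinary code we would just put not in front of the match. A validation field that takes nothing but a pattern gives us nowhere to put it, so the negation has to live inside the regex itself.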
^(?i)(?:(?!\b(red bottoms|timberland|beats by dre|burberry|Louis Vuitton|Gucci|uggs|ray ban|north face|Tiffany|Michael Kors|coach)\b).)*$
Here’s the final regex we’re using after much painful tinkering. ^ and $ anchor the regex to the start and end of the field. (?i) makes it case insensitive. (?:(?! and .)* are what negate the list of SPAM words: the negative lookahead (?! ... ) is checked before every character, so the match can only reach $ if no SPAM word starts anywhere in the field, and only messages without these words are allowed. \b( and )\b wrap the list of bad words that are rejected from the message body and name; \b marks a word boundary, so the words only match as whole words.
Hope this helps out anyone else dealing with repetitive SPAM from the same few bots. Obviously it’s not a perfect solution for everything, but it stopped the flood instantly without modifying site themes or forcing you to use a captcha on the contact form.
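If you want to try the pattern outside of WordPress before pasting it into the plugin, a rough Python sketch like the one below works. The is_clean helper is a hypothetical name, not part of cformsII. Two things differ from the regex above and are assumptions on our part: (?i) is replaced by the re.IGNORECASE flag, because recent Python versions only allow inline flags at the very start of a pattern, and re.DOTALL is added so that . also matches line breaks in multi-line messages.

```python
import re

# Final pattern from the post, minus the inline (?i).
# re.DOTALL is an addition for this sketch so multi-line text is scanned too.
NO_SPAM = re.compile(
    r"^(?:(?!\b(red bottoms|timberland|beats by dre|burberry|Louis Vuitton"
    r"|Gucci|uggs|ray ban|north face|Tiffany|Michael Kors|coach)\b).)*$",
    re.IGNORECASE | re.DOTALL,
)


def is_clean(message: str) -> bool:
    """Return True when no banned word appears anywhere in the message."""
    return NO_SPAM.search(message) is not None


print(is_clean("Hi, I'd like a quote for a new website."))  # True
print(is_clean("Cheap GUCCI handbags here"))                # False
print(is_clean("I coached a team last year"))               # True, \b keeps "coached" from matching
```

The last line shows why the word boundaries matter: without \b, an innocent word like "coached" would trip the filter.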