Scientists in the bioinformatics research group at IBM's Thomas J. Watson Research Center in New York have developed an algorithm named Chung-Kwei that can catch nearly 97 per cent of spam.
According to New Scientist, it is based on the Teiresias algorithm, which was designed to search DNA and amino acid sequences for recurring patterns.
The algorithm was fed 65,000 examples of known spam. Each email was treated as a long, DNA-like chain of characters. Teiresias identified six million recurring patterns in this collection, such as "Viagra".
Each pattern represented a common sequence of letters and numbers that had appeared in more than one unsolicited message. A collection of known non-spam (dubbed "ham") was run through the same process, and the patterns that occurred in both groups were removed.
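The pattern-mining and filtering step can be sketched roughly as follows. This is not Teiresias itself, which finds variable-length patterns; as a simplifying assumption it uses fixed-length character substrings as a stand-in. The function names, the `length` and `min_support` parameters, and the sample messages are all hypothetical.

```python
from collections import Counter


def recurring_patterns(messages, length=6, min_support=2):
    """Return substrings of `length` characters found in at least
    `min_support` different messages (a crude stand-in for Teiresias)."""
    support = Counter()
    for text in messages:
        # A set ensures each message counts at most once per substring.
        seen = {text[i:i + length] for i in range(len(text) - length + 1)}
        support.update(seen)
    return {p for p, n in support.items() if n >= min_support}


def spam_only_patterns(spam, ham, length=6, min_support=2):
    """Keep patterns that recur in spam but do not also recur in ham."""
    spam_patterns = recurring_patterns(spam, length, min_support)
    ham_patterns = recurring_patterns(ham, length, min_support)
    return spam_patterns - ham_patterns


# Made-up sample data, for illustration only.
spam = ["viagra deals, reply today", "cheap viagra, reply today"]
ham = ["please reply today", "can you reply today?"]
patterns = spam_only_patterns(spam, ham)
```

Here "viagra" survives because it recurs only in the spam samples, while substrings such as "reply " are dropped because they recur in the ham samples too.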
Incoming email was given a score based on how many spam patterns it contained. A long email with only a few spammy sentences would get a relatively low score, but one with many patterns spread across the length of the message would score much higher.
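One way to get that behaviour is to score a message by the fraction of its characters covered by spam patterns. The article does not give the exact formula, so the coverage measure below is an assumption, and `spam_score` and the sample strings are hypothetical.

```python
def spam_score(text, patterns):
    """Fraction of characters in `text` covered by at least one pattern.
    Coverage is an assumed scoring rule, not the published one."""
    if not text:
        return 0.0
    covered = [False] * len(text)
    for pattern in patterns:
        if not pattern:
            continue  # ignore empty patterns
        start = text.find(pattern)
        while start != -1:
            # Mark every character of this occurrence as covered.
            for i in range(start, start + len(pattern)):
                covered[i] = True
            start = text.find(pattern, start + 1)
    return sum(covered) / len(text)


# Made-up patterns and messages, for illustration only.
known_patterns = {"viagra", "click here"}
spammy = "cheap viagra click here"
mostly_clean = ("Hi Bob, the quarterly report is attached. "
                "PS someone sent me a viagra ad.")
assert spam_score(spammy, known_patterns) > spam_score(mostly_clean, known_patterns)
```

Because the score is normalised by message length, a single spam pattern in a long, otherwise ordinary email barely moves it, while a short message made up mostly of patterns scores close to 1.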
Chung-Kwei correctly identified 96.56 per cent of the spam in the test messages. Its rate of misidentifying genuine email as spam was just 1 in 6,000 messages.
DNA doubles up as spam buster
Posted by: DTB at 9:01 AM