Tech Analysis - Channel Insider
    Slamming Spam for Free

    in Tech Analysis


    Open-source anti-spam applications that use statistical analysis can pick out spam much more accurately than heuristic filters, once they get enough training.

    Jonathan Zdziarski knows how open-source spam-filtering programs work. He wrote one.

    Even so, Zdziarski's creation DSPAM—and similar programs that are designed to get better at their job over time—sometimes gives its creator the willies. It usually happens when the application blocks spam messages that appear absolutely legitimate at first blush.

    "It gets really accurate after the first 12 months," said Zdziarski, referring to DSPAM's built-in ability to "learn" how to filter mail. "As you train it more, and as it learns more on its own, it gets to this level of accuracy that's really spooky."

    Could DSPAM and other spam-blocking programs using statistical language filtering, such as POPFile, Bogofilter and SpamProbe, be the long-awaited killer apps for a world weary of unsolicited Viagra ads or Nigerian money scams?

    Zdziarski, the author of a new book titled "Ending Spam," believes these new-generation programs have finally tipped the scales away from spammers and in favor of beleaguered e-mail users.

    "Spammers have been trying to evade statistical filters for a few years, to no avail," he writes in his book. "As spammers change their messages to evade the filters, the filters always seem to have an eerie way of detecting their new tricks—usually without the end user even noticing."

    They do that not only by statistically analyzing the content and subject lines of the e-mail, but also by using a form of artificial intelligence to learn and adapt to their users' personal preferences.

    Most current commercial spam filters do not use statistical analysis to filter spam. Instead, they are usually based on a "heuristic" model, similar to most anti-virus applications, in which programmers compile lists of "detection rules" and blacklists that the software uses to compare the contents of incoming mail with telltale factors found in known spam.
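
To make the heuristic model concrete, here is a minimal sketch of a rule-and-blacklist filter. The rules, weights, threshold and domain names are all hypothetical, invented for illustration; they are not taken from any commercial product.

```python
import re

# Hypothetical detection rules: (pattern, weight). A real product would
# ship a much larger list that the vendor updates over time.
RULES = [
    (re.compile(r"viagra", re.IGNORECASE), 2.5),
    (re.compile(r"money transfer", re.IGNORECASE), 2.0),
    (re.compile(r"click here", re.IGNORECASE), 1.0),
]
BLACKLIST = {"bulk.example.com"}  # hypothetical blacklisted sender domain
THRESHOLD = 3.0                   # assumed cutoff for calling a message spam


def heuristic_score(sender_domain, text):
    """Add up the weights of every rule that matches the message."""
    score = 0.0
    if sender_domain in BLACKLIST:
        score += THRESHOLD  # a blacklisted sender is enough on its own
    for pattern, weight in RULES:
        if pattern.search(text):
            score += weight
    return score


def is_spam(sender_domain, text):
    return heuristic_score(sender_domain, text) >= THRESHOLD
```

With these assumed rules, "Cheap Viagra, click here" is flagged, while the obfuscated "V1agra, cl1ck here" matches nothing and passes until someone writes a new rule for it.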

    Based on a theorem developed by 18th-century mathematician Thomas Bayes, and proposed for spam filtering in a 2002 essay by Paul Graham titled "A Plan for Spam," Bayesian language classification filters break down e-mail into smaller components called "tokens," which are assigned numeric values based on their "spamminess."
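
A rough sketch of tokenizing and scoring follows. It is a simplified variant in the spirit of Graham's essay, not DSPAM's actual code; the tokenizer pattern, the 0.5 value for unseen tokens and the 0.01 to 0.99 clamp are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split a message into lowercase word-like tokens (assumed pattern)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def spamminess(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how strongly a token indicates spam, from 0.01 to 0.99.

    spam_counts / ham_counts map a token to the number of spam / legitimate
    messages it appeared in; n_spam / n_ham are the message totals.
    """
    spam_freq = spam_counts.get(token, 0) / max(n_spam, 1)
    ham_freq = ham_counts.get(token, 0) / max(n_ham, 1)
    if spam_freq + ham_freq == 0:
        return 0.5  # never-seen token: treated as neutral (an assumption)
    value = spam_freq / (spam_freq + ham_freq)
    return min(0.99, max(0.01, value))  # clamp so no token is absolute
```

For example, a token seen in 8 of 10 spam messages and in no legitimate mail scores 0.99, while one seen in 1 of 10 spam messages and 3 of 10 legitimate ones scores about 0.25.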

    Savvy spammers are continuously concocting ingenious ways of tricking outdated heuristic programs. For end users of heuristic spam blockers, the only way to keep their in-boxes relatively free of junk mail is to repeatedly update the software by downloading the latest detection rules and blacklists.

    This scenario benefits the companies selling heuristic filters (and that includes most of the popular, commercially available offerings) because they require users to purchase subscriptions in order to download the newest detection rules.

    "There are some applications out there that do a good job, but for the most part they are driven by this business model," complains Zdziarski. "Any [spam blocker] company out there that wants to stay alive needs monthly or annually recurring revenue."

    If a heuristic spam blocker is not injected with updates, it quickly loses effectiveness.

    Statistical filters, by contrast, use historical data sets, which serve as their memory of "characteristics that the filter has learned to be identifying characteristics of spam [and nonspam]," Zdziarski writes.
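
One simple way to picture such a historical data set is as a pair of token counters that grow each time the user, or the filter itself, labels a message. The class and method names below are hypothetical and are not DSPAM's storage format.

```python
from collections import Counter


class HistoricalData:
    """Hypothetical in-memory record of what the filter has learned."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> spam messages containing it
        self.ham_tokens = Counter()   # token -> legitimate messages containing it
        self.spam_messages = 0
        self.ham_messages = 0

    def train(self, tokens, is_spam):
        """Record one labeled message; each token counts once per message."""
        unique = set(tokens)
        if is_spam:
            self.spam_tokens.update(unique)
            self.spam_messages += 1
        else:
            self.ham_tokens.update(unique)
            self.ham_messages += 1


# Usage: two labeled messages are enough to start building the memory.
data = HistoricalData()
data.train(["cheap", "viagra", "cheap"], is_spam=True)
data.train(["lunch", "meeting"], is_spam=False)
```

Because the counts come from one user's own mail, two users running the same software can end up with very different data sets, which is how these filters adapt to personal preferences.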

    The Bayesian filter's analysis engine is its reasoning tool, using input from the tokenizer and historical data set to determine whether messages are spam or ham (legitimate mail).
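
The decision step can be sketched as follows, using the naive Bayes combination that Graham's essay describes: take the tokens whose values lie farthest from neutral and combine them into a single probability. The 15-token limit and 0.9 threshold are assumed defaults; real filters such as DSPAM use their own, more elaborate algorithms.

```python
from math import prod


def combine(probabilities, max_tokens=15):
    """Combine per-token spam probabilities into one message probability.

    Only the most "interesting" tokens (farthest from 0.5) are used.
    Inputs are expected to be clamped away from 0 and 1.
    """
    interesting = sorted(probabilities, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:max_tokens]
    spam_evidence = prod(interesting)
    ham_evidence = prod(1.0 - p for p in interesting)
    return spam_evidence / (spam_evidence + ham_evidence)


def classify(probabilities, threshold=0.9):
    """Label a message 'spam' or 'ham' from its token probabilities."""
    return "spam" if combine(probabilities) >= threshold else "ham"
```

A message whose tokens score 0.99, 0.99 and 0.2 combines to well above 0.9 and is labeled spam, while one scoring 0.01, 0.3 and 0.5 is labeled ham.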

    Zdziarski said most commercial spam blockers are about 95 percent accurate. A well-tuned statistical filter can consistently be 10 times more accurate. "It's a huge difference," he said.
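
One plausible reading of "10 times more accurate" is one-tenth the error rate; that interpretation is an assumption, not something Zdziarski spells out here. Under it, the arithmetic works out like this:

```python
def improved_accuracy(baseline_accuracy, error_reduction_factor):
    """Accuracy after dividing the baseline error rate by a given factor."""
    error_rate = 1.0 - baseline_accuracy
    return 1.0 - error_rate / error_reduction_factor


def mistakes_per_thousand(accuracy):
    """Expected misclassified messages out of every 1,000 processed."""
    return (1.0 - accuracy) * 1000


baseline = 0.95                              # figure quoted for commercial blockers
tuned = improved_accuracy(baseline, 10)      # about 0.995 under this reading
```

On that reading, a 95 percent filter gets about 50 messages wrong in every 1,000, and a well-tuned statistical filter about 5.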




