Channel News and Analysis - Channel Insider
Empowering the next generation Channel
 


    Microsoft Research Automates Hunt for Search Engine Spam





    Researchers at Microsoft are working on an ambitious new project to hunt down and neutralize large-scale search engine and blog comment spammers.



    The Redmond, Wash., software giant's Cybersecurity and Systems Management Research Group has taken the wraps off Strider Search Defender, an experimental project that automates the discovery of search spammers through non-content analysis.

    The project integrates technology from two previous Microsoft Research prototypes—Strider HoneyMonkey and Strider URL Tracer—and promises a new approach to removing junk results from search engine queries.

    "The Web is so badly spammed, you can find a spam site on just about every search query," said Yi-Min Wang, the researcher heading up the project at Microsoft, in an interview with eWEEK. "We think this approach can pinpoint the big spammers and use their own tactic against them."

    According to data from Automattic Kismet, a tool that helps bloggers thwart comment spammers, a whopping 93 percent of all blog comments are spam. With Strider Search Defender, Wang's team is taking a context-based approach that uses URL-redirection analysis to pinpoint spammers.

    "For the spammers to be successful, they have to post millions of fake comments on message boards and blogs. That's the only way to get picked up by search engines. If we can find a way to pinpoint them before they get indexed by search engines, the problem is solved," Wang said.

    "They want to be found by search engines, that's why they're spamming. Well, now we're finding you," he added.

    The problem is tied to the use of spam blogs, or splogs, to earn money from pay-per-click advertising programs offered by Google, Yahoo and MSN. Content on fake blogs often contains text stolen from legitimate Web sites and includes an unusually high number of links to sites associated with the splog creator. The sole purpose is to boost the search engine rank of the affiliated sites and cash in on ad impressions from unsuspecting surfers.


    During the early stages of the Microsoft research, Wang discovered that successful large-scale spammers create a huge number of "doorway pages" on reputable domains to trick search engine users into clicking on a fake site. It is well known that Google's BlogSpot, Yahoo's GeoCities and AOL's Hometown services are all used by spammers to create doorway pages.

    The doorway pages are then spammed to millions of forums, blog comments and archived newsgroups, pushing the pages up the search engine results for certain target keywords. A user clicking on a doorway-page link in search listings gets redirected to a target page controlled by the spammer. In some cases, Wang explained, the browser is instead instructed to fetch ad listings operated by the spammer.


    The Microsoft Research team is now proposing to treat each spam page as a dynamic program rather than a static page and use a "monkey program" to analyze the traffic resulting from visiting each page with an actual browser. "By identifying those domains that serve target pages for a large number of doorway pages, we can catch major spammers' domains together with all their doorway pages and doorway domains," Wang explained.


    Strider Search Defender starts with a seed list of confirmed spam URLs and uses a homegrown tool called Spam Hunter to run link queries on search engines. This is an automated process that pinpoints the forums and guest books on which the known spam URLs were posted. On these pages, additional spam links are scraped to automatically generate a list of potential spam URLs.
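
    The seed-list expansion described above can be sketched roughly as follows. This is only an illustration, not Microsoft's Spam Hunter: the function name, the two lookup callbacks and the toy data are all hypothetical, and a real tool would query a search engine and fetch live pages instead of reading in-memory dictionaries.

```python
from collections import deque


def expand_spam_urls(seed_urls, find_pages_linking_to, extract_links, max_rounds=2):
    """Grow a list of candidate spam URLs from a seed list.

    find_pages_linking_to(url) -> pages (forums, guest books) where url was posted
    extract_links(page)        -> every link found on that page
    Both callbacks are hypothetical stand-ins for a link query and a page scraper.
    """
    known = set(seed_urls)
    visited_pages = set()
    frontier = deque((url, 0) for url in seed_urls)
    while frontier:
        url, depth = frontier.popleft()
        if depth >= max_rounds:
            continue  # do not expand further from URLs found this deep
        for page in find_pages_linking_to(url):
            if page in visited_pages:
                continue
            visited_pages.add(page)
            for link in extract_links(page):
                if link not in known:
                    known.add(link)
                    frontier.append((link, depth + 1))
    return known, visited_pages


# Toy data standing in for search-engine link queries and scraped pages.
LINK_INDEX = {
    "http://spam-a.example/": ["http://forum1.example/t/1"],
    "http://spam-b.example/": ["http://forum2.example/guestbook"],
}
PAGE_LINKS = {
    "http://forum1.example/t/1": ["http://spam-a.example/", "http://spam-b.example/"],
    "http://forum2.example/guestbook": ["http://spam-b.example/", "http://spam-c.example/"],
}

candidates, forums = expand_spam_urls(
    ["http://spam-a.example/"],
    lambda url: LINK_INDEX.get(url, []),
    lambda page: PAGE_LINKS.get(page, []),
)
print(sorted(candidates))
```

    The result is only a list of candidates. Legitimate links posted on the same pages would also be collected, which is why a separate filtering step is needed.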

    To filter out false positives, Microsoft feeds the list of potential spam URLs to the Strider URL Tracer, a tool released earlier this year by Microsoft to help trademark owners find typo-squatting domains of their Web sites.

    Using the URL Tracer, Wang's team can launch an actual browser to visit each URL and record all secondary URLs visited as a result. At the end of that automated scan, the researchers can figure out which target-page domains are associated with a large number of doorway-page URLs.
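
    As a rough sketch of that aggregation step, assume the browser traces have already been recorded as a mapping from each doorway URL to the secondary URLs it triggered. The function names, the threshold and the sample traces below are hypothetical, and the host name is used as a simple stand-in for the domain.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased host name of a URL (a simplification of 'domain')."""
    return (urlsplit(url).hostname or "").lower()


def rank_target_domains(traces, min_doorways=2):
    """Rank third-party hosts by how many distinct doorway URLs led to them.

    traces: dict mapping a doorway URL to the list of secondary URLs the
            browser visited as a result of loading it.
    """
    doorways_by_target = defaultdict(set)
    for doorway, secondary_urls in traces.items():
        doorway_host = host_of(doorway)
        for url in secondary_urls:
            target = host_of(url)
            # Ignore traffic that stays on the doorway page's own host.
            if target and target != doorway_host:
                doorways_by_target[target].add(doorway)
    ranked = [
        (target, len(doorways))
        for target, doorways in doorways_by_target.items()
        if len(doorways) >= min_doorways
    ]
    # Most doorway pages first; ties broken alphabetically for stable output.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical recorded traces.
traces = {
    "http://a.blog-host.example/": [
        "http://target-one.example/ads?q=1",
        "http://cdn.example/x.js",
    ],
    "http://b.blog-host.example/": ["http://target-one.example/ads?q=2"],
    "http://c.blog-host.example/": ["http://target-two.example/land"],
}
print(rank_target_domains(traces))
```

    A host that appears behind only one doorway page is dropped here, on the assumption that widely shared targets are the ones worth investigating first.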

    In one scenario, Wang said the Spam Hunter collected more than 17,000 BlogSpot URLs and fed them into the URL Tracer. The group was able to identify the top 25 target-page domains that are behind the Google-hosted splogs. The top six are particularly active, Wang said, identifying them as s-e-arch.com, speedsearcher.net, abcsearcher.com, eash.info, paysefeed.net and veryfastsearch.com, which collectively were responsible for approximately 45 percent of the BlogSpot URLs.
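
    A figure such as the 45 percent above is a simple share: the doorway URLs attributed to the top few target domains, divided by all URLs collected. The sketch below shows that arithmetic with made-up counts. The function name and the numbers are illustrative only and are not the project's data.

```python
def top_k_share(doorway_counts, k, total_urls):
    """Fraction of all collected URLs attributed to the k largest target domains.

    doorway_counts: dict mapping a target domain to its number of doorway URLs
    total_urls:     number of URLs collected overall (the denominator)
    """
    if total_urls <= 0:
        return 0.0
    largest = sorted(doorway_counts.values(), reverse=True)[:k]
    return sum(largest) / total_urls


# Made-up counts for illustration only.
example_counts = {
    "t1.example": 40,
    "t2.example": 30,
    "t3.example": 20,
    "t4.example": 10,
}
share = top_k_share(example_counts, k=2, total_urls=200)
print(f"{share:.0%}")
```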

    Wang said the Strider Search Defender project has already helped to remove junk results from MSN Search. "The more widely spammed a URL is, the easier it is for the Spam Hunter to find it. Once a spammed forum is identified, it becomes a 'HoneyForum' that can be used to capture new spam URLs in new comment postings," he said. "Ideally, since there is a delay between spamming and its effect on search engine results, our spam hunter should be able to identify new spam URLs and notify the search engine before the URLs enter top search results."
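
    The "HoneyForum" idea amounts to watching an already-spammed forum and picking out URLs that have not been seen before. A minimal sketch, assuming new comments arrive as plain-text strings, is shown below. The function name, the URL pattern and the sample comments are hypothetical, and any URL captured this way would still need to be verified before being reported.

```python
import re

# Deliberately simple pattern: a scheme followed by non-space, non-bracket characters.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def new_spam_candidates(comments, known_urls):
    """Return URLs found in new comments that are not already known, in order seen."""
    seen = set(known_urls)
    fresh = []
    for text in comments:
        for url in URL_PATTERN.findall(text):
            url = url.rstrip(".,)")  # drop trailing sentence punctuation
            if url not in seen:
                seen.add(url)
                fresh.append(url)
    return fresh


# Hypothetical new postings on a forum already identified as spammed.
comments = [
    "Great post! http://old-spam.example/ and http://new-spam.example/page.",
    "see http://new-spam.example/page",
]
known = {"http://old-spam.example/"}
print(new_spam_candidates(comments, known))
```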



     
     
    By Ryan Naraine
     


     


