
Sunday, January 28, 2007

Along Came a Spider

Data mining for tax cheats:

Tax Takers Send in the Spiders, by Quinn Norton, Wired: Websites around the world are getting a new computerized visitor among the Googlebots and Yahoo web spiders: The taxman. A five-nation tax enforcement cartel has been quietly cracking down on suspected internet tax cheats, using a sophisticated web crawling program to monitor transactions on auction sites, and track operators of online shops...

The "Xenon" program ... was started in The Netherlands in 2004 by the Dutch equivalent of the IRS, Belastingdienst. It has since been expanded and enhanced by … Austria, Denmark, Britain and Canada, with the assistance of Amsterdam-based data mining firm Sentient Machine Research. ...

Xenon, explained Marten den Uyl of Sentient, is in some ways the opposite of something like Google's web crawler, which traverses a tree of links and grabs a copy of everything it sees. Xenon is smart about link selection and context, and uses a "slow search paradigm," he said.

Whereas a spider like the Googlebot might hit thousands of websites in a second, "With Xenon it may take minutes, hours or even days to do a slow search." The slow search prevents the crawler from creating excessive traffic on a website, or drawing attention in the sites' server logs. ...
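To make the "slow search" idea concrete, here is a minimal sketch of a crawler that follows only links it scores as relevant and pauses between requests. It is not Xenon's actual code, which the article does not show. The names `slow_crawl`, `keyword_score`, `SITE`, and the 60-second delay are all assumptions made for illustration, and the "website" is an in-memory dictionary so the example runs without touching the network.

```python
import heapq
import time
from typing import Callable, List, Tuple

# Hypothetical page type: (page text, outgoing links).
Page = Tuple[str, List[str]]


def slow_crawl(
    start: str,
    fetch: Callable[[str], Page],
    score: Callable[[str], float],
    delay_seconds: float = 60.0,
    max_pages: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Visit pages best-first by link score, pausing between requests."""
    frontier = [(-score(start), start)]  # negated score turns heapq into a max-heap
    seen = {start}
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        _neg_score, url = heapq.heappop(frontier)
        if visited:
            sleep(delay_seconds)  # pace requests so traffic stays low
        _text, links = fetch(url)
        visited.append(url)
        for link in links:
            # Selective link following: skip links that look irrelevant.
            if link not in seen and score(link) > 0:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited


# Hypothetical in-memory site standing in for a real web server.
SITE = {
    "shop/": ("home", ["shop/about", "shop/auctions", "shop/cars"]),
    "shop/about": ("about us", []),
    "shop/auctions": ("listings", ["shop/auctions/42"]),
    "shop/auctions/42": ("one item", []),
    "shop/cars": ("cars for sale", []),
}

# Hypothetical keywords describing the economic niche of interest.
KEYWORDS = ("auction", "cars")


def keyword_score(url: str) -> int:
    """Score a link by how many niche keywords appear in its URL."""
    return sum(url.count(k) for k in KEYWORDS)


if __name__ == "__main__":
    pauses: List[float] = []  # record the pauses instead of really sleeping
    order = slow_crawl("shop/", SITE.__getitem__, keyword_score,
                       sleep=pauses.append)
    print(order)
    print(len(pauses), "pauses of", pauses[0], "seconds")
```

The `sleep` parameter is injectable only so the sketch can be tried without waiting. A real crawler would leave the default in place, and would presumably score links on page context rather than URL keywords alone.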

The spider can also be configured and trained to look at particular economic niches -- a useful feature for compiling lists of businesses in industries that traditionally have high rates of non-filing. ...

Once the web pages are screen-scraped, Xenon's Identity Information Extraction Module interfaces with national databases containing information like street and city names. It uses that data to automatically identify mailing addresses and other identity information…, which it puts into a database that can be matched in bulk with national tax records.
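The extraction-and-matching step described above might look roughly like the following sketch. The article gives no detail on how the Identity Information Extraction Module works, so everything here is an assumption: the tiny `KNOWN_STREETS` and `KNOWN_CITIES` sets stand in for the national street and city databases, the "street, number, city" address pattern is simplified, and `extract_address` and `bulk_match` are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-ins for national street and city name databases.
KNOWN_STREETS = {"Keizersgracht", "Damrak"}
KNOWN_CITIES = {"Amsterdam", "Utrecht"}

Address = Dict[str, str]


def extract_address(text: str) -> Optional[Address]:
    """Return the first known street + house number + known city in text."""
    city = next(
        (c for c in sorted(KNOWN_CITIES)
         if re.search(rf"\b{re.escape(c)}\b", text)),
        None,
    )
    if city is None:
        return None
    for street in sorted(KNOWN_STREETS):
        # A known street name followed by a house number such as "12" or "12a".
        m = re.search(rf"\b{re.escape(street)}\s+(\d+\w?)\b", text)
        if m:
            return {"street": street, "number": m.group(1), "city": city}
    return None


def bulk_match(
    addresses: List[Address],
    registered: Set[Tuple[str, str, str]],
) -> Tuple[List[Address], List[Address]]:
    """Split scraped addresses into those found in the registry and the rest."""
    matched: List[Address] = []
    unmatched: List[Address] = []
    for a in addresses:
        key = (a["street"], a["number"], a["city"])
        (matched if key in registered else unmatched).append(a)
    return matched, unmatched


if __name__ == "__main__":
    pages = [
        "Vintage lamps. Pick up at Keizersgracht 12, Amsterdam.",
        "Used bikes, Damrak 7a, Amsterdam. Cash only.",
        "Contact us by email.",
    ]
    found = [a for a in map(extract_address, pages) if a is not None]
    # Hypothetical registry of addresses already known to the tax office.
    registry = {("Keizersgracht", "12", "Amsterdam")}
    matched, unmatched = bulk_match(found, registry)
    print("matched:", matched)
    print("unmatched:", unmatched)
```

An address that turns up on a shop page but not in the registry is only a lead for a human to follow up. Real matching would also have to cope with misspellings, abbreviations, and several addresses on one page.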

As illuminating as Xenon is for the tax man, the data-mining effort poses dangers to citizen privacy, said Par Strom, a noted privacy advocate... "Of course it's not illegal," said Strom. "I don't feel quite comfortable having a tax office sending out those kind of spiders."

One issue has to do with how the information Xenon captures is protected. ... [T]he Swedish government ... is currently keeping a copy of everything it spiders. That means that someone's long-expired actions have the potential to come back and haunt them. "We can scan and store all actions for every e-marketplace in Sweden, it's about 55,000 per day," said Hardyson. ...

In the United States, the IRS is not a part of the Xenon project, but would neither confirm nor deny that it uses spidering software in its investigations.

Strom said now that the cat is out of the bag, there's no way to get governments or corporations to forgo technologies like spiders and data mining. "The information is public of course, because it's posted on the internet," Strom says. "It wasn't meant to be used this way ... (this is) using the naivete of people. It's on the limit of what is ethical."

    Posted on Sunday, January 28, 2007 at 12:09 AM in Economics, Technology

