http://blogs.wsj.com/digits/2012/12/07/how-dataium-watches-you/

December 7, 2012

How Dataium Watches You

By Jeremy Singer-Vine

If you've shopped for a car online on Cars.com or other automotive websites recently, there's a good chance Dataium LLC was watching most of your mouse-clicks.

Dataium, which is the subject of an article in Saturday's Wall Street Journal, [1] says 10,000 automotive websites use its code.

The Wall Street Journal observed Dataium logging information about nearly every action a visitor took -- not just what pages were viewed, but also what parts of the page were clicked, which dropdown options were selected, and what information (such as name, email address, and phone number) was entered in dealer-contact forms.

The Journal's tests indicated that Dataium does not collect Social Security numbers or credit card numbers, even if users enter them on a dealer-contact form.

Click-tracking is common among analytics programs, says Jules Polonetsky, director of the Future of Privacy Forum. Many companies use such tracking to see, for example, how well new page-layouts perform. But some techniques may overstep visitors' expectations of privacy, Polonetsky says. "The question is, are you reaching further than I can reasonably imagine?"

The Journal also saw on several occasions that Dataium software used a controversial technique to attempt to determine whether a visitor had been to nearly 100 other sites, including edmunds.com, bmw.com, usatoday.com, google.com, and linkedin.com.

Known as "CSS history sniffing," this technique exploits a security vulnerability in older Web browsers, such as Internet Explorer 8. Modern browsers have plugged this privacy hole. [2] Dataium CEO Eric Brown told the Journal the company has used the technique intermittently for testing.

On December 5, the Federal Trade Commission announced [3] it had settled with Epic Marketplace, Inc., an advertising network that had been using history sniffing to target ads. "This type of unscrupulous behavior undermines consumers' confidence, and we won't tolerate it," FTC Chairman Jon Leibowitz said in the announcement.

Dataium obscures its techniques under layers of ciphers and other obfuscation methods.

Here's how Dataium's code works: Websites load a computer file, written in the JavaScript language, called vcu.js from Dataium's servers.

This file sets tracking cookies, small text files that are associated with a person's Web browser and can follow a person from website to website. The vcu.js file also loads Dataium's main tracking code, JavascriptInsert.js.

The purpose of the JavascriptInsert.js computer code is buried behind at least four layers of obfuscation.

The bulk of the JavascriptInsert.js file is a string of 53,000 characters that, at first glance, looks like gibberish, with bits such as "ff.ot;zs-=;}=zm-tetAzt=oaj;Aj+=o;}h.z=buzsi+;e;hifffyCkee;od(baX'&.otXse+=)f." The meaning of this huge chain of characters doesn't become clear until a separate bit of code transforms the gibberish into computer code.

This code unscrambles yet another set of gibberish and turns that into more computer code, which in turn creates a piece of code (specifically, a set of rules known as a function) named "ste."

The "ste" computer code unscrambles the 53,000-character original string of gibberish, swaps around its characters, and replaces certain symbols with others. The result is the JavaScript code that runs Dataium's actual tracking program.
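To make the idea of layered unscrambling concrete, here is a toy sketch in Python. It is not Dataium's code: the names toy_obfuscate and toy_ste, the symbol substitutions, and the use of simple string reversal as the "swap" step are all assumptions chosen for illustration. It only shows the general pattern of a scrambled string that becomes runnable code once a decoding function has been applied.

```python
# Toy illustration only. The substitutions and the reversal step are
# hypothetical stand-ins, not the rules Dataium's "ste" function used.
SUBSTITUTIONS = {"(": "~", ")": "^"}


def toy_obfuscate(source):
    """Replace certain symbols, then reorder the characters (here: reverse)."""
    for plain, masked in SUBSTITUTIONS.items():
        source = source.replace(plain, masked)
    return source[::-1]


def toy_ste(scrambled):
    """Undo toy_obfuscate: restore the character order, then the symbols."""
    source = scrambled[::-1]
    for plain, masked in SUBSTITUTIONS.items():
        source = source.replace(masked, plain)
    return source


# The payload looks like gibberish until toy_ste() is applied to it.
payload = toy_obfuscate("result = sum(range(5))")

# Run the recovered code in its own namespace, as an obfuscated
# loader would do once the real program has been unscrambled.
namespace = {}
exec(toy_ste(payload), namespace)
```

Anyone who can read the decoding function can recover the hidden program, which is why obfuscation of this kind deters readers but cannot keep them out.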

Companies that want to hide their code's mechanisms from competitors often use such obfuscation techniques, says Nicholas C. Zakas, an independent web technologist and consultant who has written several books on JavaScript. But obfuscation can only deter, rather than prevent, outsiders from understanding a given piece of code.

Dataium also uses a homegrown -- and reversible -- cipher to scramble the information it collects about visitors to Web pages containing its code. (By default, Dataium does not send the data using the industry-standard encryption known as "HTTPS.") The Journal decoded the cipher, which works roughly like this:

Let's say we're scrambling the string "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG".

First, we'll break up the data into two-character chunks:

TH EQ UI CK BR OW NF OX JU MP SO VE RT HE LA ZY DO G

Then, we'll swap the characters in each pair:

HT QE IU KC RB WO FN XO UJ PM OS EV TR EH AL YZ OD G

And recombine into a single string:

HTQEIUKCRBWOFNXOUJPMOSEVTREHALYZODG

Next, we'll split what we have so far into alternating eight-character and five-character chunks, keeping whatever is left over at the end as its own chunk:

HTQEIUKC RBWOF NXOUJPMO SEVTR EHALYZODG

... swap neighboring chunks:

RBWOF HTQEIUKC SEVTR NXOUJPMO EHALYZODG

And finally recombine them:

RBWOFHTQEIUKCSEVTRNXOUJPMOEHALYZODG
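The steps above can be sketched in Python. This is a reconstruction from the worked example, not Dataium's code: the function names are made up for illustration, and the handling of leftover characters (a trailing run shorter than 13 characters is left in place) is an assumption inferred from the final chunk in the example.

```python
def swap_pairs(text):
    """Swap the two characters in each pair; an odd final character stays put."""
    chars = list(text)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def swap_chunks(text, first=8, second=5):
    """Within each block of first+second characters, move the second chunk
    in front of the first. A short trailing block is left unchanged
    (an assumption based on the worked example)."""
    block = first + second
    pieces = []
    for start in range(0, len(text), block):
        piece = text[start:start + block]
        if len(piece) == block:
            piece = piece[first:] + piece[:first]
        pieces.append(piece)
    return "".join(pieces)


def scramble(text):
    """Apply the pair swap, then the eight/five chunk swap."""
    return swap_chunks(swap_pairs(text))


def unscramble(text):
    """Reverse the cipher: undo the chunk swap (now five/eight), then the pairs."""
    return swap_pairs(swap_chunks(text, first=5, second=8))


plain = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"
scrambled = scramble(plain)
restored = unscramble(scrambled)
```

Because every step only rearranges characters and involves no secret key, running the same steps in the opposite order recovers the original text.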

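Dataium also scrambles the history-sniffing data with a cipher known as ROT13. This cipher replaces each letter in the alphabet with the letter 13 places later in the alphabet, wrapping around to the start after Z. Because the alphabet has 26 letters, applying ROT13 a second time restores the original text. It is not considered to be "secure."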
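A minimal Python sketch of ROT13 shows how little it hides. The function name rot13 is chosen for illustration, and the sample input is simply one of the site names mentioned earlier, not data captured from Dataium.

```python
import string

# Build a translation table that maps each letter to the one 13 places
# later, wrapping around the alphabet. Non-letters are left unchanged.
_LOWER = string.ascii_lowercase
_UPPER = string.ascii_uppercase
_ROT13_TABLE = str.maketrans(
    _LOWER + _UPPER,
    _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13],
)


def rot13(text):
    """Return text with every ASCII letter rotated 13 places."""
    return text.translate(_ROT13_TABLE)


encoded = rot13("edmunds.com")   # "rqzhaqf.pbz"
decoded = rot13(encoded)         # applying ROT13 twice undoes it
```

The same function both scrambles and unscrambles, so anyone who recognizes the cipher can read the data.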

Ashkan Soltani contributed to this article.

[1] http://online.wsj.com/article/SB10001424127887324784404578143144132736214.html

[2] http://blog.mozilla.org/security/2010/03/31/plugging-the-css-history-leak/

[3] http://www.ftc.gov/opa/2012/12/epic.shtm