The Wall Street Journal reports on online data harvesting in “‘Scrapers’ Dig Deep for Data on the Web.”
The crux of the story is something many of my generation already knew: anything you put online is public information. Companies are systematically trawling the web for data and piecing the scattered bits back together. They can combine your professional LinkedIn account with your personal Facebook account, your rants on YouTube, the cell phone number on the resume you posted to Monster.com, and maybe even whatever your friend Crazy Betty tweeted about you after her bachelorette party. Yikes.
These companies “harvest online conversations and collect personal details from social-networking sites, résumé sites and online forums where people might discuss their lives.”
“Some companies collect personal information for detailed background reports on individuals, such as email addresses, cell numbers, photographs and posts on social-network sites.”
“Others offer what are known as listening services, which monitor in real time hundreds or thousands of news sources, blogs and websites to see what people are saying about specific products or topics.”
“Many scrapers and data brokers argue that if information is available online, it is fair game, no matter how personal.”
“New York-based PeekYou LLC has applied for a patent for a method that, among other things, matches people’s real names to the pseudonyms they use on blogs, Twitter and other social networks. PeekYou’s people-search website offers records of about 250 million people, primarily in the U.S. and Canada.”
Thanks to Flowing Data for the pointer. I can’t summarize any better than they did:
“My rule of thumb: if I put anything on the Web, I’m assuming it’s public.”