Wednesday, February 9, 2011

How to use your browser URL history to estimate gender

Thanks to Paul Cook for the initial link to this fascinating little JavaScript script, Social History. The script analyzes the CSS color of various links to determine whether or not the user has been to that site. If the link has the “visited” style, it marks the user as having been to that site. Now the Social History implementation of this is rather innocuous: it’s a clever way of displaying only the sharing buttons of sites that the user actively participates in. Of course, there are far more interesting applications for advertising.
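
To make the link-color trick concrete, here is a minimal sketch of the check. It is not the Social History code itself: the function name looksVisited, its parameters, and the assumed stylesheet rule (a:visited { color: rgb(255, 0, 0) }) are all made up for illustration. It also assumes the browser reports the visited color through getComputedStyle, which browsers of this era do and which later browsers deliberately stopped doing.

```javascript
// Hypothetical helper, not part of Social History.
// Assumes the page's CSS contains: a:visited { color: rgb(255, 0, 0); }
// `doc` and `win` are normally the global `document` and `window`;
// they are passed in so the function can also be exercised with stubs.
function looksVisited(url, doc, win, visitedColor) {
  // Create a throwaway link pointing at the site we want to probe.
  const link = doc.createElement('a');
  link.href = url;
  doc.body.appendChild(link);

  // Read back the color the browser actually applied to the link.
  const color = win.getComputedStyle(link).color;

  // Clean up so the probe link never stays on the page.
  doc.body.removeChild(link);

  // If the applied color matches the a:visited rule, treat the site as visited.
  return color === visitedColor;
}
```

In a page you would call it as looksVisited('http://example.com/', document, window, 'rgb(255, 0, 0)') once per site in your list.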
One of the things that I always wanted to do but never got around to was to analyze a user’s browsing history to estimate age and gender. Of course the idea is definitely not new; in fact Xerox (of all companies??) has a patent on the whole process, and I’m certain plenty of networks already do something of the sort… but what the heck, let’s have some fun!
So what I did is I modified the SocialHistory JS so that it polls the browser to find out which of the Quantcast top 10k sites were visited. I then apply the ratio of male to female users for each site and, with some basic math, determine a guesstimate of your gender. The math is really quite simple; I just take:


1 / (1 + r_1 * r_2 * … * r_n)

where r_i is the ratio of men to women for the i-th site you have visited, and the result is the estimated probability that you are female. For example, if you had been to two sites that had a 2-1 ratio of men to women, the probability of you being female would be:

1 / (1 + 2 * 2) = 1/5 = 20%
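
The same calculation can be sketched in a few lines of JavaScript. The function name probabilityFemale and the plain array of ratios are assumptions made for this illustration, not names from the modified script.

```javascript
// Hypothetical helper: estimate the probability that a visitor is female.
// `ratios` holds one men-to-women ratio per visited site (2 means 2 men per woman).
function probabilityFemale(ratios) {
  // Multiply all the per-site ratios together: r_1 * r_2 * ... * r_n.
  const product = ratios.reduce(function (acc, r) { return acc * r; }, 1);

  // Apply the formula from the post: 1 / (1 + product).
  return 1 / (1 + product);
}

// Two visited sites, each with a 2-1 ratio of men to women, as in the example.
console.log(probabilityFemale([2, 2]));
```

With no visited sites the product stays at 1, so the estimate falls back to an even 50%.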