Thursday, January 17, 2008

Britney-Amy: Celebrity Deathmatch!

Two websites, discovered on French television last week, offer an interesting challenge: guess when the two divas will die, and the closest guess wins an iPod Touch or a PS3. A huge buzz, of course: thousands of people went there to take their chance and leave a pre-condolences message. Both sites are of course optimized to make money from ads (unlike this more confidential game, which is just as sweet though: the "TopMort", where you can pick people you think will die within the year, and who have not been ), and only provide the raw data entered by the people who signed. No stats at all, what a shame.

I was very lucky that Matthieu Muffato, a friend who happens to be an impressive Python expert, wrote a few lines of code and spent a few hours of execution time retrieving the data, then mailed it to me.

The initial question I had about it was simple: what is the biggest time interval not yet chosen, which would a priori maximize the chance to win? By "a priori", I mean assuming that all time intervals of a given length are equally dangerous for Amy and Britney, and equally likely to be chosen by other visitors.
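Here is a minimal sketch of how that biggest unchosen interval could be found, assuming the votes are available as a plain list of dates. The function name, the search window and the sample votes are hypothetical, made up for illustration; this is not the script Matthieu wrote.

```python
from datetime import date, timedelta


def largest_unchosen_gap(chosen, start, end):
    """Return (first_free_day, last_free_day) for the longest run of
    unchosen days inside [start, end], or None if every day is taken."""
    # Keep only distinct dates inside the window, in chronological order.
    taken = sorted({d for d in chosen if start <= d <= end})
    one_day = timedelta(days=1)
    # Sentinels just outside the window let the edges be handled like votes.
    bounds = [start - one_day] + taken + [end + one_day]
    best = None
    for left, right in zip(bounds, bounds[1:]):
        free_days = (right - left).days - 1
        if free_days > 0 and (best is None or free_days > best[0]):
            best = (free_days, left + one_day, right - one_day)
    return None if best is None else (best[1], best[2])


# Hypothetical votes, for illustration only (one date is entered twice).
votes = [date(2008, 1, 20), date(2008, 1, 25), date(2008, 2, 14), date(2008, 1, 25)]
gap = largest_unchosen_gap(votes, date(2008, 1, 18), date(2008, 2, 29))
print(gap)  # first and last free day of the longest gap
```

With these sample votes the longest free run goes from January 26 to February 13, 2008. The same idea works at month granularity by replacing each date with the first day of its month.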

Unfortunately, those ideal conditions are far from being true in the real world, for a very simple reason: the visitor wants his iPod or PS3 right now, not in 30 years! So if you wish to target a month that has not yet been chosen for Britney, you will have to wait for February 2023. For Amy, there have been fewer voters so far, so if nothing has changed since the data was retrieved, November 2016 is still available, or you can try the year 2031, where only October has been chosen. I must add, as Matthieu told me, that those websites contain no date after January 2038, probably because of some date coding problem. Now let's move on to more serious stuff: here is an overview of the number of votes per month (with a simple vertical normalization for Amy, who received fewer votes; sorry for the title in French...):
I guess you are as flabbergasted as I was when the curves appeared: they are almost identical! The correlation coefficient is 0.98, and we get the same power law! We can check that it is indeed a power law using a log-log scatter plot, which also gives us approximately the equation Y = 4 - 3X in logarithmic coordinates, that is, when we go back to linear coordinates: y = 10 000 × x^(-3), which is the equation of the pale blue curve.
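To make the log-log check concrete: a straight line Y = a + bX in base-10 logarithmic coordinates corresponds to y = 10^a × x^b in linear coordinates. The sketch below fits such a line by ordinary least squares and computes a Pearson correlation coefficient. The data are synthetic, generated from an exact power law, and the 0.4 scaling factor for the second curve is an arbitrary assumption; none of this is the real vote data.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares line through (log10 x, log10 y).
    Returns (intercept a, slope b), i.e. y is modelled as 10**a * x**b."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(us, vs):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    norm_u = math.sqrt(sum((u - mean_u) ** 2 for u in us))
    norm_v = math.sqrt(sum((v - mean_v) ** 2 for v in vs))
    return cov / (norm_u * norm_v)


# Synthetic curves (assumption): x is a month index starting at 1.
months = list(range(1, 51))
curve_a = [10_000 * x ** -3 for x in months]
curve_b = [0.4 * y for y in curve_a]  # same shape, fewer votes

intercept, slope = fit_power_law(months, curve_a)
r = pearson(curve_a, curve_b)
print(round(intercept, 6), round(slope, 6), round(r, 6))
```

Note that the correlation coefficient does not depend on the vertical normalization: multiplying one curve by a positive constant leaves it unchanged.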

In fact, power laws are everywhere in real data (especially in small-world graphs, which have a power-law degree distribution). What is surprising here is that both laws have approximately the same parameters. If we check the details, however, we notice that voters preferred 2008 for Britney and 2009 for Amy.

By checking the curves carefully, one also notices some kind of periodicity. At least they are not monotone, and I've put on the left a representation of the percentage of votes per month, each year from 2008 to 2013, for Miss Winehouse. The variations are quite strange, as August attracts twice as many voters as November! I don't have any explanation for the smaller share of November, December and February; it may be a mechanism similar to what Knuth describes in one of the first exercises of Volume 2: ask a friend (or an enemy) for a random digit, and he will most likely say 7.
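The percentage of votes per month within a given year can be computed in a few lines. This is only a sketch on made-up dates, and the function name is hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_share(votes, year):
    """Percentage of the given year's votes falling in each month (1 to 12).
    Returns an empty dict when the year received no vote."""
    counts = Counter(d.month for d in votes if d.year == year)
    total = sum(counts.values())
    if total == 0:
        return {}
    # A Counter returns 0 for months nobody picked.
    return {month: 100.0 * counts[month] / total for month in range(1, 13)}


# Hypothetical votes, for illustration only.
sample = [
    date(2009, 8, 1), date(2009, 8, 15),
    date(2009, 11, 3), date(2009, 3, 9),
    date(2010, 1, 1),
]
shares = monthly_share(sample, 2009)
print(shares[8], shares[11])
```

Normalizing within each year, rather than over the whole dataset, keeps the strong year-to-year decrease from hiding the seasonal pattern.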

Here is the representation of the choices per day, for any year. I've removed January 1, which was artificially large (due to the date coding problem, which produced a lot of 01/01/1970).
We can observe a new surprising periodicity phenomenon: voters prefer the middle of the month. Note also the vicious voters who chose February 14, poor Britney! Even the dot of her birthday, December 2, is quite high compared to its neighbors...
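A side note on the pile of 01/01/1970 votes and on the absence of dates after January 2038: both would be consistent with dates stored as signed 32-bit Unix timestamps, that is, seconds counted from January 1, 1970. This is only my assumption about how the sites work, nothing confirms it. The sketch below shows the two boundary dates and a hypothetical filter for the placeholder votes.

```python
from datetime import date, datetime, timedelta, timezone

# Timestamp 0: what a missing or unparsable date typically turns into.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit counter of seconds can hold.
LAST_INT32 = EPOCH + timedelta(seconds=2**31 - 1)


def drop_epoch_votes(votes):
    """Discard votes dated 1970-01-01, assumed to be coding artifacts."""
    return [d for d in votes if d != EPOCH.date()]


# Hypothetical votes, for illustration only.
raw = [date(1970, 1, 1), date(2008, 2, 14), date(1970, 1, 1), date(2009, 12, 2)]
clean = drop_epoch_votes(raw)

print(EPOCH.date())   # 1970-01-01
print(LAST_INT32)     # 2038-01-19 03:14:07+00:00
print(len(clean))     # 2
```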

So, to forget about those sad things, let's end with emotion and poetry: here are the pre-condolences tag clouds (made with Freecorp TagCloud Builder) for both stars.

This post originally appeared in French: Britney-Amy, duel mortel.

Vote spreadsheet files by day and by month; contact me if you would like to get the other source files.
