Wednesday, January 16, 2008

What does veronising mean?

Well, to get some idea of what veronising is, maybe you should check Jean Veronis's blog. My definition would be "to design and publish on a blog programs or methods that help analyze data". Jean has created a whole bunch of useful tools, which work mainly on texts (he is a researcher in natural language processing) or internet corpora (search engine results, for example). Among the most impressive are the Nébuloscope, which makes tag clouds out of the words that appear frequently in the results of a search engine query, and the Chronologue, which used to plot the evolution of a keyword's use on the internet (it used the "date" function of a search engine, which has now disappeared).

Inspired by his impressive results, I've started to analyze data I find interesting myself, and to program some little tools to help me do that. I may translate some of my previous posts; in the meantime, here are some topics I've worked on, with links to the French posts until they are translated into English.

Phylogenetic trees are used to represent the evolution of species, based on the idea that species close to each other will appear in the same subtree, and a lot of algorithms exist to build them from biological data. But phylogenetic trees are also an excellent means of visualizing data, and I've tried building trees of country votes at the Eurovision song contest and of French "députés" (our congressmen) according to the proximity of their votes (as well as a DNA chip visualization of those votes). More recently I've been working on building what I call a "tree cloud" from a text: the same idea as a tag cloud, except that the words are not in alphabetical order but are displayed as the leaves of a tree. Until the program is finished, I still rely on tag clouds (with nice colors and a logarithmic scale, pleaaase, not those ugly and unexpressive ones we often find on the internet!). I've tried using them to analyze someone's writing style (with instant messaging logs) or speaking style (with the planned version and the spoken version of a press conference talk by President Sarkozy).
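To make the "logarithmic scale" remark concrete, here is a minimal Python sketch of how word counts could be mapped to font sizes. It is only an illustration under stated assumptions: the function name `log_scaled_sizes`, the size bounds and the sample sentence are all made up for this example, and it is not the code behind the tag clouds mentioned in this post.

```python
import math
from collections import Counter


def log_scaled_sizes(counts, min_size=10.0, max_size=40.0):
    """Map positive word counts to font sizes on a logarithmic scale.

    Hypothetical helper: the name and the default sizes are assumptions.
    """
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # Position of this word between the rarest (0.0) and the most
        # frequent (1.0) word, measured on the log of the counts.
        # If every word has the same count, use the middle size.
        t = (math.log(n) - lo) / span if span > 0 else 0.5
        sizes[word] = min_size + t * (max_size - min_size)
    return sizes


if __name__ == "__main__":
    # Made-up sample text, only to show the call.
    text = "the tree and the cloud and the tree of the words"
    counts = Counter(text.split())
    for word, size in sorted(log_scaled_sizes(counts).items()):
        print(f"{word}: {size:.1f}pt")
```

With a linear scale, one very frequent word squashes all the others down to the minimum size; taking the logarithm of the counts spreads the medium-frequency words over the available range, which is what makes the cloud more expressive.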
I like doing some search engine statistics, to help with spelling, to visualize and date the birth of the web, or to send massive numbers of requests to compare the popularity of people or concepts. Those statistical analyses often make critical use of spreadsheet programs, which also helped me track the evolution of a petition, giving me a glimpse of the time of day people connect to the internet depending on their job (students, teachers, engineers...). I could also get nice summary pictures of French polls before the first round of the presidential election, in 2002 and 2007. I'm very interested in informative and original visualizations, like Voronoi diagrams (for McDonald's restaurants in Paris) or metro map views (building them from a genuine metro map is a GI-complete problem).

I also analyzed a blog meme last year, the "Z-list", which appeared in France as "la F-list". Even though I did not publish my data on the "Z-list", I still have the files, as well as the "infection tree", somewhere on my computer. This year I've created a little utility, the "CaptuCourbe", which extracts data from the picture of a curve into a spreadsheet file (some "unscan" programs do this, but they are quite complicated to use, or expensive). It helps compare the evolution of a buzz across many buzz tracking systems (Google Trends, Technorati, site stats systems...). Currently the program is in French only, but Jean motivated me to translate it into English, which will soon be done.

And you will never guess the topic of my most visited blog post, which is not the one I'm most proud of: I had noticed a bug on a French TV channel's website which gave access to the channel live on the internet. It lasted about 3 days, but since then Google has been sending me everyone who wants to watch "M6" on the web. I've put up links to other French channels which can be viewed for free anyway, to avoid frustration.

See you soon for some new computer-powered experiments!
