
April 12, 2012

Email, Gource, Hadoop, and Python

I never knew that one of the guys (Andrew C) who works at Catalyst wrote a fantastic time series visualisation tool called Gource. It's incredible what people have done with it; just look on YouTube. Most people seem to use it to visualise source code repository activity, but I think there is more mileage to be had from Gource than this. I wrote a simple Map/Reduce map chain for Hadoop (not really necessary for my volume of data) that extracted the from/to/date information from all my mbox history since 1996. It really is simple: all you need to do is generate a file in Gource's custom log format (one event per line, as timestamp|user|type|path), e.g.:

0970518767|"DJ Adams" |M|Andrew_Powis/RVSUK/FES/Rank@rank.com
...
and then pump this through Gource:
gource --start-position 0.28 --stop-position 0.29 --title 'Communication sphere since 1996' -s 1 --log-format custom email-log.txt
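The extraction step needs nothing Hadoop-specific. Here is a minimal sketch, assuming Python 3 and local mbox files, that produces the same kind of log with the standard library's `mailbox` and `email.utils` modules. It is not the original Hadoop job, and the helper names (`clean`, `message_to_entries`, `mbox_to_gource`) are hypothetical. Mirroring the example line above, it uses the sender's display name (or address) as the Gource user, `M` as the action type, and each To/Cc address as the "path". The output is sorted because Gource expects the log in chronological order.

```python
import mailbox
import sys
from email.utils import getaddresses, parsedate_to_datetime


def clean(field):
    # '|' separates Gource fields, so it must not appear inside one.
    return str(field).replace("|", " ").strip()


def message_to_entries(msg):
    """Return (timestamp, sender, recipient) tuples for one message."""
    date_hdr = msg.get("Date")
    from_hdr = msg.get("From")
    if date_hdr is None or from_hdr is None:
        return []
    try:
        ts = int(parsedate_to_datetime(str(date_hdr)).timestamp())
    except (TypeError, ValueError, OverflowError):
        return []  # unparseable Date header: skip the message
    pairs = getaddresses([str(from_hdr)])
    name, addr = pairs[0] if pairs else ("", "")
    sender = clean(name or addr)
    if not sender:
        return []
    # Every To/Cc recipient becomes one Gource "file" touched by the sender.
    raw = [str(v) for v in msg.get_all("To", []) + msg.get_all("Cc", [])]
    return [(ts, sender, clean(a)) for _, a in getaddresses(raw) if a]


def mbox_to_gource(paths):
    """Build sorted Gource custom-format lines from one or more mbox files."""
    entries = []
    for path in paths:
        for msg in mailbox.mbox(path):
            entries.extend(message_to_entries(msg))
    entries.sort()  # Gource wants events in time order
    return [f"{ts}|{sender}|M|{rcpt}" for ts, sender, rcpt in entries]


if __name__ == "__main__":
    for line in mbox_to_gource(sys.argv[1:]):
        print(line)
```

Saved under a hypothetical name such as `mbox2gource.py`, it could be run as `python mbox2gource.py ~/mail/*.mbox > email-log.txt` and the result fed to the `gource` command above. Splitting each address into `domain/user` would give Gource a directory tree to draw, but that is a design choice, not something the original log necessarily did.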

You can record it as a video too:

gource --start-position 0.28 --stop-position 0.29 --title 'Communication sphere since 1996' -s 1 \
    --log-format custom email-log.txt  -1280x720 -o - | ffmpeg -y -r 60 \
    -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset ultrafast -crf 1 -threads 0 -bf 0 gource-video-of-email.mp4
And this is what it looks like:

Posted by PiersHarding at April 12, 2012 8:27 PM