<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">
<title>Where on Earth is Piers?</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/" />
<modified>2012-04-28T07:25:33Z</modified>
<tagline></tagline>
<id>tag:www.piersharding.com,2012:/blog//1</id>
<generator url="http://www.movabletype.org/" version="4.24-en">Movable Type</generator>
<copyright>Copyright (c) 2012, PiersHarding</copyright>

<entry>
<title>R and Hadoop</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2012/04/r_and_hadoop.html" />
<modified>2012-04-28T07:25:33Z</modified>
<issued>2012-04-28T07:04:09Z</issued>
<id>tag:www.piersharding.com,2012:/blog//1.94</id>
<created>2012-04-28T07:04:09Z</created>
<summary type="text/plain">R is my hacker&apos;s language of choice for analysis work. It really appeals to my sense of iteratively refining a solution. To my delight, I stumbled across this set of libraries for calling out to Hadoop MapReduce, HDFS, and...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>Hadoop</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>
<a href="http://www.r-project.org/">R</a> is my hacker's language of choice for analysis work.  It really appeals to my sense of iteratively refining a solution.  To my delight, I stumbled across this set of libraries for calling out to Hadoop MapReduce, HDFS, and HBase directly from R - <a href="https://github.com/RevolutionAnalytics/RHadoop">RHadoop</a>.
<br/>
It was surprisingly easy to get going - especially with some patient help from <a href="https://github.com/piccolbo">Antonio</a> - the project owner.  RHadoop relies on the same fixes that <a href="https://github.com/klbostee/dumbo/wiki">Dumbo</a> requires, but the game changer here is that from <a href="http://hadoop.apache.org/common/releases.html#3+Apr%2C+2012%3A+Release+1.0.2+available">Hadoop 1.0.2</a>, all the key patches that both require are now part of core.<br/>
The thing that tripped me up was a custom .Rprofile file I was using to load and print things at R startup.  This caused R to write to stdout, which is what Hadoop Streaming uses to pass data between tasks.  That corrupted the data transfer, which was killing RHadoop with a weird Java heap error.  Anyway, once that was sorted out, everything ran smoothly, and I like the intuitive, R-esque way things are handled, e.g. take the example from the <a href="https://github.com/RevolutionAnalytics/RHadoop/wiki/Tutorial">tutorial</a>:
<pre>
> library(rmr)
Loading required package: RJSONIO
Loading required package: itertools
Loading required package: iterators
Loading required package: digest
> small.ints = to.dfs(1:10)
Warning: $HADOOP_HOME is deprecated.

12/04/28 19:17:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/04/28 19:17:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/04/28 19:17:45 INFO compress.CodecPool: Got brand-new compressor
> out = mapreduce(input = small.ints, map = function(k,v) keyval(v, v^2))
Warning: $HADOOP_HOME is deprecated.

packageJobJar: [/tmp/RtmpXlELmY/rmr-local-env, /tmp/RtmpXlELmY/rmr-global-env, 
                              /tmp/RtmpXlELmY/rhstr.map2cf71bf8a3a9, 
                              /home/piers/hadoop/tmp/hadoop-unjar1509588906818235502/] []
                              /tmp/streamjob912555254031649512.jar tmpDir=null
12/04/28 19:18:04 INFO mapred.FileInputFormat: Total input paths to process : 1
12/04/28 19:18:05 INFO streaming.StreamJob: getLocalDirs(): [/home/piers/hadoop/tmp/mapred/local]
12/04/28 19:18:05 INFO streaming.StreamJob: Running job: job_201204281916_0001
12/04/28 19:18:05 INFO streaming.StreamJob: To kill this job, run:
12/04/28 19:18:05 INFO streaming.StreamJob: /usr/libexec/../bin/hadoop job 
                             -Dmapred.job.tracker=192.168.1.3:9001 -kill job_201204281916_0001
12/04/28 19:18:15 INFO streaming.StreamJob: Tracking URL: http://192.168.1.3:50030/jobdetails.jsp?jobid=job_201204281916_0001
12/04/28 19:18:16 INFO streaming.StreamJob:  map 0%  reduce 0%
12/04/28 19:18:45 INFO streaming.StreamJob:  map 100%  reduce 0%
12/04/28 19:18:54 INFO streaming.StreamJob:  map 100%  reduce 17%
12/04/28 19:18:57 INFO streaming.StreamJob:  map 100%  reduce 67%
12/04/28 19:19:06 INFO streaming.StreamJob:  map 100%  reduce 100%
12/04/28 19:19:21 INFO streaming.StreamJob: Job complete: job_201204281916_0001
12/04/28 19:19:21 INFO streaming.StreamJob: Output: /tmp/RtmpXlELmY/file2cf7546b881b
> from.dfs('/tmp/RtmpXlELmY/file2cf7546b881b')
Warning: $HADOOP_HOME is deprecated.

Warning: $HADOOP_HOME is deprecated.

Warning: $HADOOP_HOME is deprecated.

[[1]]
[[1]]$key
[1] 1

[[1]]$val
[1] 1

attr(,"rmr.keyval")
[1] TRUE

[[2]]
[[2]]$key
[1] 2

[[2]]$val
[1] 4
...
</pre>
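<br/>
The stdout problem is not specific to R: any Hadoop Streaming task has to keep stdout clean, because stdout is the data channel. As a rough illustration in Python rather than R (the function name and the log message are made up for this sketch), a mapper equivalent to the tutorial's keyval(v, v^2) writes only records to stdout and sends everything else to stderr:
<pre>
```python
import sys


def square_mapper(lines, out=sys.stdout, log=sys.stderr):
    """Emit tab-separated 'value, value squared' records on out.

    Diagnostics go to log only, so the record stream stays clean.
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank input lines
        value = int(text)
        log.write("processing %d\n" % value)  # safe: not on the data channel
        out.write("%d\t%d\n" % (value, value * value))
```
</pre>
Under streaming this would be called as square_mapper(sys.stdin). Anything a startup file prints to stdout ends up mixed into those records, which is what the .Rprofile above was doing.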
</p>
]]>

</content>
</entry>

<entry>
<title>Moodle bulk management of users, courses, and course categories</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2012/04/moodle_bulk_man.html" />
<modified>2012-04-28T07:24:04Z</modified>
<issued>2012-04-14T07:41:36Z</issued>
<id>tag:www.piersharding.com,2012:/blog//1.93</id>
<created>2012-04-14T07:41:36Z</created>
<summary type="text/plain"> One of the (many) new features of Moodle 2.2 is the ability to create administration Tools plugins. This enables us developers to create and package (hopefully) useful tools that make the management of Moodle easier. One of the things...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>moodle</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>
One of the (many) new features of <a href="http://moodle.org/">Moodle 2.2</a> is the ability to create administration Tools <a href="http://docs.moodle.org/dev/Admin_tools">plugins</a>.  This enables us developers to create and package (hopefully) useful tools that make the management of Moodle easier.  One of the things that I've seen wished for is the ability to bulk upload courses and related material, and over recent months, this is something that I've been working on.
</p>
<p>The key things that people want (from an administration point of view) are to manage people and courses.  Often these activities are a tiresome bulk process at set times of the year, with relatively minor tweaks in between.  For managing users (create/update/delete) and enrolments, we already have the built-in functionality to do <a href="http://docs.moodle.org/22/en/Upload_users">bulk user upload</a>.  I have added to this for <a href="https://gitorious.org/moodle-tool_uploadcourse">Courses</a> and for <a href="https://gitorious.org/moodle-tool_uploadcoursecategory">Course categories</a>.<br/>
The course upload admin tool can be used to create and manage course outlines, but it can also populate courses using either a nominated course as a template (copies the course contents using the Moodle backup/restore facility), or populate the course from a Moodle backup file.
</p>]]>

</content>
</entry>

<entry>
<title>Hadoop and Dumbo</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2012/04/hadoop_and_dumb.html" />
<modified>2012-04-28T07:24:11Z</modified>
<issued>2012-04-13T04:20:43Z</issued>
<id>tag:www.piersharding.com,2012:/blog//1.92</id>
<created>2012-04-13T04:20:43Z</created>
<summary type="text/plain">Dumbo is a Python framework for writing Map Reduce flows with or without Hadoop. It&apos;s been a pain up until now, trying to get it going as it has relied on a number of patches to Hadoop for different byte...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>Data</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<a href="https://github.com/klbostee/dumbo">Dumbo</a> is a <a href="http://www.python.org/">Python</a> framework for writing Map Reduce flows with or without <a href="http://hadoop.apache.org/">Hadoop</a>.  It's been a pain up until now, trying to get it going as it has relied on a number of patches to Hadoop for different byte streams, type codes etc. to make it work.  No longer - as the necessary patches have now made it into core as of <a href="http://hadoop.apache.org/common/docs/r1.0.2/releasenotes.html">1.0.2</a>.
<br/>
On Ubuntu 12.04 all I needed was the Debian package from <a href="http://hadoop.apache.org/common/releases.html#Download">here</a> (<a href="http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/">install</a> as per these instructions), and then to run sudo easy_install dumbo.
<br/>
The only catch is that Dumbo does not currently recognise the Debian package layout used by the Hadoop package maintainers, so I found that I had to make a one-line patch to compensate for it:
<pre>
diff --git a/dumbo/util.py b/dumbo/util.py
index a57166d..cd35df3 100644
--- a/dumbo/util.py
+++ b/dumbo/util.py
@@ -267,6 +267,7 @@ def findjar(hadoop, name):
     hadoop home directory and component base name (e.g 'streaming')"""
 
     jardir_candidates = filter(os.path.exists, [
+        os.path.join(hadoop, 'share', 'hadoop', 'contrib', name),
         os.path.join(hadoop, 'mapred', 'build', 'contrib', name),
         os.path.join(hadoop, 'build', 'contrib', name),
         os.path.join(hadoop, 'mapred', 'contrib', name, 'lib'),
</pre>
<br/>
And then run the quick tutorial example from <a href="https://github.com/klbostee/dumbo/wiki/Short-tutorial">here</a> like so:
<pre>
hadoop fs -copyFromLocal /var/log/apache2/access.log /user/hduser/access.log
hadoop fs -ls /user/hduser/
dumbo start ipcount.py -hadoop /usr -input /user/hduser/access.log -output ipcounts
dumbo cat ipcounts/part* -hadoop /usr | sort -k2,2nr | head -n 5
</pre>]]>

</content>
</entry>

<entry>
<title>Email, Gource, Hadoop, and Python</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2012/04/email_gource_ha.html" />
<modified>2012-04-28T07:24:15Z</modified>
<issued>2012-04-12T07:27:43Z</issued>
<id>tag:www.piersharding.com,2012:/blog//1.91</id>
<created>2012-04-12T07:27:43Z</created>
<summary type="text/plain">I never knew that one of the guys (Andrew C) who works at Catalyst wrote a fantastic time series visualisation tool called Gource. It&apos;s incredible what people have done with it - just look on YouTube. The focus of...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>Data</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>I never knew that one of the guys (Andrew C) who works at <a href="http://www.catalyst.net.nz/">Catalyst</a> wrote a fantastic time series visualisation tool called <a href="http://code.google.com/p/gource/">Gource</a>.  It's incredible what people have done with it - just look on <a href="http://www.youtube.com/results?search_query=gource">YouTube</a>.
The focus of use seems to have been on analysis of source code repository activity, but I think there is more mileage to be had from Gource than this.  I wrote a simple Map/Reduce map chain for Hadoop (not really necessary for my volume of data) that stripped out the from/to/date information from all my mbox history since 1996.  It really is simple - all you need is to generate a file in the custom format, e.g.:
</p>
<pre>
0970518767|"DJ Adams" <DJ_Adams@rank.com>|M|Andrew_Powis/RVSUK/FES/Rank@rank.com
...
</pre>

and then pump this through Gource:<pre>
gource --start-position 0.28 --stop-position 0.29 --title 'Communication sphere since 1996' -s 1 --log-format custom email-log.txt
</pre>
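<p>
Producing lines in that custom log format takes very little code. The following is a minimal Python sketch, not the original map chain: the function name is made up, and it assumes each message carries parseable From, To and Date headers. It emits one line per recipient, with the sender as the Gource user, M as the action type, and the recipient address in the file path position:
</p>
<pre>
```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def gource_lines(raw_message):
    """Yield one 'timestamp|sender|M|recipient' line per recipient."""
    msg = message_from_string(raw_message)
    # Gource expects a Unix timestamp in the first field.
    timestamp = int(parsedate_to_datetime(msg["Date"]).timestamp())
    sender = msg["From"]
    # A message may have several To headers, each with several addresses.
    for _name, address in getaddresses(msg.get_all("To", [])):
        yield "%d|%s|M|%s" % (timestamp, sender, address)
```
</pre>
<p>
Gource expects the log in chronological order, so the combined output needs sorting on the first field before it is written to email-log.txt.
</p>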
<p>
You can record it as a video too:
</p>
<pre>
gource --start-position 0.28 --stop-position 0.29 --title 'Communication sphere since 1996' -s 1 \
    --log-format custom email-log.txt  -1280x720 -o - | ffmpeg -y -r 60 \
    -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset ultrafast -crf 1 -threads 0 -bf 0 gource-video-of-email.mp4
</pre>
And this is what it looks like:<br/>
<iframe width="560" height="315" src="http://www.youtube.com/embed/i3nag9vSdjo?rel=1&modestbranding=1" frameborder="0" allowfullscreen></iframe>]]>

</content>
</entry>

<entry>
<title>CSV files need SQL</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2012/03/csv_files_need.html" />
<modified>2012-04-28T07:24:22Z</modified>
<issued>2012-03-30T17:46:27Z</issued>
<id>tag:www.piersharding.com,2012:/blog//1.90</id>
<created>2012-03-30T17:46:27Z</created>
<summary type="text/plain">As part of learning about R it soon has become apparent that the basic unit of currency is a CSV file - there are lots of other ways of getting data in and out of the R environment (JSON with...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>python</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>As part of learning about <a href="http://www.r-project.org/">R</a> it has soon become apparent that the basic unit of currency is a CSV file - there are lots of other ways of getting data in and out of the R environment (JSON with library(RJSONIO), DB interfaces with library(RPostgreSQL) ...) but for the majority of work (which consists of hackery and experimentation, which is why R is so attractive) CSV is the transportation mechanism.</p>
<p>
I have found that, particularly at the beginning, it is often harder to think of basic data munging concepts in R: typical tasks like sorting, grouping, and data type conversion.  Often your language of choice (Perl, Python, or even bash) is just quicker for doing these things in the first instance, when I'm paring the data down to what I want to apply some form of statistical analysis or charting to.</p>
<p>
With this in mind - I basically wanted to be able to perform SQL against a CSV file, without the hassle of loading it into a database first.
Enter a clever tool written in Haskell called <a href="http://keithsheppard.name/txt-sushi/tssql.html">txt-sushi</a>.  This enables you to do interesting things like:
</p>
<pre>
cat test.csv | tssql -table x - 'select a,b, sum(hours) AS hours_sum from x group by a,b'
</pre>
<p>
However, for my purposes tssql is too strict on handling data types, and is dependent on Haskell, so I've built my own simple CSV SQL processor - <a href="https://github.com/piersharding/csvtable">csvtable</a> in Python using <a href="http://www.sqlite.org">SQLite</a> as a backend.  This is surprisingly easy to do, and lets you have the benefit of the convenience and power of SQLite syntax:
</p>
<pre>
python csvtable.py \
  --where="system_code != 'LEAVE'" \
  --convert='date_epoch:date,hours:int' \
  --list="*, sum(hours) AS hours_sum, min(date_epoch) AS date_epoch_min, 
               max(date_epoch) AS date_epoch_max, count(*) AS days,
               ROUND(AVG(hours), 2) AS avg_time, MIN(hours) AS hours_min,
               MAX(hours) AS hours_max" \
  --group='organisation_code, system_code, request_id' \
  --file=test1.csv | \
 python csvtable.py --list='*, ROUND(((date_epoch_max - date_epoch_min) / (60 * 60 * 24)) + 1, 2) AS duration' > test2.csv
</pre>
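<p>
The underlying idea is small enough to sketch. The following is not the csvtable source, just an illustration using Python's standard csv and sqlite3 modules. The function name and the default table name x are assumptions, the first CSV row is taken to be the header, and every column is loaded untyped (there is no equivalent of --convert here):
</p>
<pre>
```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="x"):
    """Load CSV text into an in-memory SQLite table and run sql against it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Quote the identifiers so odd column names do not break the DDL.
    columns = ", ".join('"%s"' % name for name in header)
    marks = ", ".join("?" for _ in header)
    db = sqlite3.connect(":memory:")
    db.execute('CREATE TABLE "%s" (%s)' % (table, columns))
    db.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), data)
    return db.execute(sql).fetchall()
```
</pre>
<p>
For example, query_csv(open('test.csv').read(), 'select a, b, sum(hours) AS hours_sum from x group by a, b') runs the same kind of query as the tssql example above, with SQLite doing the grouping.
</p>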

]]>

</content>
</entry>

<entry>
<title>Hadoop and single file to mapper processing flow</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2012/03/hadoop_and_sing.html" />
<modified>2012-04-28T07:24:27Z</modified>
<issued>2012-03-26T17:13:50Z</issued>
<id>tag:www.piersharding.com,2012:/blog//1.89</id>
<created>2012-03-26T17:13:50Z</created>
<summary type="text/plain">It seems like a trivial thing to want to do, but it appears that the standard Hadoop workflow is to treat all input files as line oriented transactions, which does not help at all when I want to process on...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>Hadoop</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>It seems like a trivial thing to want to do, but it appears that the standard Hadoop workflow is to treat all input files as line-oriented transactions, which does not help at all when I want to process on a file-by-file basis.  The example I was working through is where I have 20 years' worth of mbox email files.  Each file needs to be broken into individual emails, the contents parsed, and useful information in the headers stripped out into a convenient format for subsequent processing.  To do this in the context of Hadoop is slightly odd.  It appears that the usual approach is to create an input file of mbox file names (loaded into HDFS), and then each mapper execution uses the HDFS API to pull the file and process it.</p>
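<p>
A rough Python sketch of that pattern, under some assumptions: the function names are made up, and fetch is a hypothetical caller-supplied function that copies a named mbox file out of HDFS and returns a local path. Each input line is treated as a file name rather than as data, and the mapper emits one tab-separated from/to/date record per email:
</p>
<pre>
```python
import mailbox
import sys


def emit_headers(mbox_path, out=sys.stdout):
    """Write one tab-separated from/to/date record per message in a local mbox."""
    for message in mailbox.mbox(mbox_path):
        fields = [message.get(name, "") for name in ("From", "To", "Date")]
        out.write("\t".join(fields) + "\n")


def run_mapper(lines, fetch, out=sys.stdout):
    """Treat each input line as an mbox file name and process the whole file.

    fetch(name) is assumed to copy the file locally and return its path.
    """
    for line in lines:
        name = line.strip()
        if name:
            emit_headers(fetch(name), out)
```
</pre>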

<p>This presented another problem - in Python, how do you access the HDFS API?  There are two existing integrations that I can find - https://github.com/traviscrawford/python-hdfs, and http://code.google.com/p/libpyhdfs/.  <a href="https://github.com/traviscrawford">Travis Crawford's</a> is easy to get going, but as it uses a JNI binding I didn't relish the prospect of trying to make sure CLASSPATHs etc. are right across all my Hadoop nodes (which for my purposes are any machine that I can beg, borrow or steal).  In light of this I created my own cheap and cheerful library that uses subprocess to call the 'hadoop' executable for 'fs' - <a href="https://github.com/piersharding/hdfsio">hdfsio</a>.<br />
I admit this isn't the height of efficiency (or possibly elegance), but it is surprisingly robust and very simple.<br />
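The subprocess approach can be sketched in a few lines of Python. This is an illustration only and not the hdfsio API: the function names are made up, and it assumes the 'hadoop' executable is on the PATH of every node:
<pre>
```python
import subprocess


def hdfs_command(action, *paths, hadoop="hadoop"):
    """Build the argument list for 'hadoop fs -ACTION PATH...'."""
    return [hadoop, "fs", "-" + action] + list(paths)


def hdfs_cat(path, hadoop="hadoop"):
    """Return the bytes of an HDFS file by shelling out to 'hadoop fs -cat'.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    return subprocess.check_output(hdfs_command("cat", path, hadoop=hadoop))
```
</pre>
Every call starts a new JVM, which is where the inefficiency comes from, but there is no CLASSPATH to manage from the Python side.<br />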
</p>]]>

</content>
</entry>

<entry>
<title>Journey into Hadoop</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2012/03/journey_into_ha.html" />
<modified>2012-04-28T07:24:32Z</modified>
<issued>2012-03-25T20:23:14Z</issued>
<id>tag:www.piersharding.com,2012:/blog//1.88</id>
<created>2012-03-25T20:23:14Z</created>
<summary type="text/plain">I&apos;ve been building up my background knowledge on current toolsets used in Data Science, and part of this is R and another is Hadoop. Hadoop is a big thing, and takes (to my mind) quite a lot of effort to...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>python</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>I've been building up my background knowledge on current toolsets used in Data Science, and part of this is <a href="http://www.r-project.org/">R</a> and another is <a href="http://hadoop.apache.org/">Hadoop</a>.</p>

<p>Hadoop is a big thing, and takes (to my mind) quite a lot of effort to get going, and to understand how you can bend it to your will.  Part of this learning process has been about finding a comfortable installation pattern for Linux - in particular Ubuntu, and the best help I've found so far has been from <a href="http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/">Michael Noll</a>.  Things that I had to be careful about were getting ssh working, and name resolution exactly right on all nodes that you put in your cluster, as you distribute things like /etc/hadoop/masters and the *-site.xml config files.</p>

<p>The next stage was to find a development pattern that enabled me to avoid Java.  The answer to this for me is <a href="http://wiki.apache.org/hadoop/HadoopStreaming">Hadoop Streaming</a>.  This basically allows you to pipe IO in and out of programs written in your favourite language - and in this case Michael does brilliantly again with <a href="http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/">Python and MapReduce</a>.<br />
</p>]]>

</content>
</entry>

<entry>
<title>Web Services for Mahara</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2011/06/web_services_fo.html" />
<modified>2012-03-25T20:21:24Z</modified>
<issued>2011-06-24T20:12:37Z</issued>
<id>tag:www.piersharding.com,2011:/blog//1.87</id>
<created>2011-06-24T20:12:37Z</created>
<summary type="text/plain">As part of some work for the Ministry of Education for LMS -&gt; myPortfolio (Mahara) integration, it became apparent that we needed a Web Services stack. This is not particularly interesting in itself, but it is something that...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>mahara</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="mahara.png" src="http://www.piersharding.com/blog/mahara.png" width="160" height="160" class="mt-image-none" style="" /></span></p>

<p>As part of some work for the <a href='http://www.minedu.govt.nz'>Ministry of Education</a> for LMS -> <a href="http://myportfolio.school.nz">myPortfolio</a>  (<a href="http://www.mahara.org">Mahara</a>) integration, it became apparent that we needed a Web Services stack.  This is not particularly interesting in itself, but it is something that an interconnected service needs, in order to participate in a Socially Networked world. <br />
Building a WS framework is not a difficult thing, but it is relatively time consuming (anything that takes more than a few weeks is considered expensive here), so the problem was how to develop, quickly and cheaply, an unexciting feature that in itself does not deliver any great new user experience.  At this point it occurred to me that there might be a solution in what <a href="http://www.moodle.org">Moodle</a>  has achieved with its <a href="http://docs.moodle.org/dev/Web_services">Web Services Framework</a> - after all, Mahara is (in a previous life) based on Moodle.</p>

<p>It turned out that the way Peta Skoda has developed the Moodle WSF is fundamentally based on Zend data services, and is quite portable.</p>

<p>To this end, I have ported it as an <a href="https://wiki.mahara.org/index.php/Plugins"> auth plugin</a> which can be <a href="https://gitorious.org/mahara-contrib/auth-webservice">downloaded here</a>, and the documentation is <a href="https://wiki.mahara.org/index.php/Plugins/Artefact/WebServices">here</a>.</p>

<p>This gives the basic features of token, and user based auth, with SOAP, XML-RPC and JSON emitting REST based services.  There are a number of other things that I'd like to add to this, the most important being OAuth based authentication, and JSON based import parameter consumption.</p>

<p>Edit: JSON based import parameter consumption has been done, but I want to add replacing MNet to the list of things to do.<br />
</p>]]>

</content>
</entry>

<entry>
<title>Moodle, OAuth, and Google Fusion</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2010/09/moodle_oauth_an.html" />
<modified>2011-03-23T22:01:56Z</modified>
<issued>2010-09-05T18:06:30Z</issued>
<id>tag:www.piersharding.com,2010:/blog//1.86</id>
<created>2010-09-05T18:06:30Z</created>
<summary type="text/plain">Convergence is a strange and reoccurring theme, and it&apos;s happened again for me over the last few months with BI reporting, Moodle, OAuth, and Google. I&apos;ve looked at a few BI (well SAP, Business Objects, and Pentaho) implementations over the...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>moodle</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>Convergence is a strange and reoccurring theme, and it's happened again for me over the last few months with BI reporting, <a href="http://www.moodle.org">Moodle</a>, <a href="http://oauth.net">OAuth</a>, and Google.</p>

<p>I've looked at a few BI (well <a href="http://www.sdn.sap.com/irj/sdn/edw">SAP</a>, <a href="http://en.wikipedia.org/wiki/Crystal_Reports">Business Objects</a>, and <a href="http://www.pentaho.com/">Pentaho</a>) implementations over the years, and one of the things that I have always found frustrating/off putting is what I consider the huge startup costs for such implementations.  This has usually been characterised by expensive infrastructure implementations in both hardware and software coupled with the difficulty that most businesses have in visualising what data they need to have access to, and how it should be most effectively presented.</p>

<p>I've found this dilemma more acute in the <a href="http://www.moodle.org">Moodle</a> world, as so many of the customers involved are on a very tight to non-existent budget, yet their requirement to analyse Learning Management System performance data is still there.</p>

<p>A year ago, I concluded that Pentaho was my first choice, for the twin reasons that it's OpenSource (specifically no license fees), and that it has sufficiently good data modelling tools to enable a suite of reports customised to Moodle to be delivered.  While this reduces the cost of delivering a flexible reporting solution for Moodle, it still falls short on a couple of points:</p>

<p>(1) Most people who implement Moodle are not Data Warehousing, or Modelling experts so they are unlikely to be able to sufficiently accurately determine what their requirements are in advance (actually a common business problem, not unique to the Moodle community).<br />
(2) Pentaho, while reasonably straightforward to install, is still another complex piece of software to host - a major barrier to entry for most Moodle implementations.</p>

<p>What I started looking for, then, was a set of visualisation tools that could be integrated with Moodle's PHP environment - at least users would then be able to do more complex reporting and analysis.  What I found exceeded my expectations, in the form of a Labs project from Google called <a href="http://tables.googlelabs.com/Home">Fusion Tables</a>.</p>

<p>Fusion Tables is shaping up to be Business Intelligence reporting with the twist of collaborative, and Geo encoding capabilities.  The basic mode is that CSV files of data can be uploaded into a flexible storage engine, datasets can be joined and merged, automatically Geo encoded, and then consumed through a good set of graphical presentation tools.  <a href="http://tables.googlelabs.com/DataSource?dsrcid=197026">Datasets</a> can be shared and collaboratively edited.<br />
<script src="http://www.gmodules.com/ig/ifr?url=http://www.google.com/ig/modules/bar-chart.xml&up__table_query_url=http://tables.googlelabs.com/gvizdata?tq=select+col0%252Ccol5+from+191509++skip+0+limit+228&up__table_query_refresh_interval=0&w=600&h=400&border=%23ffffff%7C3px%2C1px+solid+%23999999&synd=open&output=js"></script></p>

<p><br />
As luck would have it, this service is firstly free, and secondly exposed via an SQL-like <a href="http://code.google.com/apis/fusiontables/">API</a> integrated with the standard Google OAuth mechanism.  This makes it attractive as a generic data analysis and reporting tool for a low cost operating environment like Moodle and the education sector.</p>

<p>To test out the theory of all this, I've implemented 3 things:<br />
 * OAuth integration for Moodle including a site, and secret registry<br />
 * A generic Fusion Tables data proxy for Moodle<br />
 * A Gradebook export module that enables the export of the standard gradebook data to Fusion Tables</p>

<p>For the curious, this can be found at Gitorious - <a href="http://gitorious.org/moodle-local_oauth/moodle-local_oauth">moodle-local_oauth</a>.</p>]]>

</content>
</entry>

<entry>
<title>New release for sapnwrfc PHP and Python</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2009/08/new_release_for.html" />
<modified>2009-08-27T21:27:18Z</modified>
<issued>2009-08-26T18:43:52Z</issued>
<id>tag:www.piersharding.com,2009:/blog//1.85</id>
<created>2009-08-26T18:43:52Z</created>
<summary type="text/plain">Been a busy month, working on the NW SAP RFC connectors. With build help from Menelaos, I now have a working Python build system for Windows on the Python NW RFC Connector as of version 0.07 - this is available...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>general</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>Been a busy month, working on the NW SAP RFC connectors.  With build help from Menelaos, I now have a working Python build system for Windows on the Python NW RFC Connector as of version 0.07 - this is available <a href="http://www.piersharding.com/download/python/sapnwrfc/">here</a>.</p>

<p>Also, with help from Joachim, I've added a static function sapnwrfc_removefunction(&lt;sysid&gt;, &lt;function name&gt;) to the PHP connector that allows the removal of function definitions from the local cache.  This is most useful when developing RFC applications in PHP, as you can modify your RFC definition without having to restart the web server every time.  This is available from version 0.09 <a href="http://www.piersharding.com/download/php/sapnwrfc/">here</a>.</p>]]>

</content>
</entry>

<entry>
<title>Auth SAML 2.0 for Mahara</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2009/08/auth_saml_20_fo.html" />
<modified>2009-08-14T20:31:34Z</modified>
<issued>2009-08-14T20:25:49Z</issued>
<id>tag:www.piersharding.com,2009:/blog//1.84</id>
<created>2009-08-14T20:25:49Z</created>
<summary type="text/plain">Following on from the SAML 2.0 work that I&apos;ve done recently for Moodle, I thought it was useful to do the same for the Mahara ePortfolio service, while I was in the same space. Details of the first release can...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>catalyst</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>Following on from the SAML 2.0 work that I've done recently for Moodle, I thought it was useful to do the same for the <a href="http://www.mahara.org">Mahara</a> ePortfolio service, while I was in the same space.  Details of the first release can be found <a href="http://wiki.mahara.org/Plugins/Auth/Saml">here</a>, with tested versions for both trunk and 1.1_STABLE.</p>]]>

</content>
</entry>

<entry>
<title>Moodle and SAML 2.0 Web SSO</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2009/08/moodle_and_saml.html" />
<modified>2009-08-02T20:03:17Z</modified>
<issued>2009-08-02T19:41:44Z</issued>
<id>tag:www.piersharding.com,2009:/blog//1.83</id>
<created>2009-08-02T19:41:44Z</created>
<summary type="text/plain">Of late I have been doing a lot of SSO integration work for the NZ Ministry of Education, and during this time I came across an excellent project FEIDE. One of the offshoots of this has been the development...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>moodle</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>Of late I have been doing a lot of SSO integration work for the NZ Ministry of Education, and during this time I came across an excellent project, <a href="http://rnd.feide.no/">FEIDE</a>.  One of the offshoots of this has been the development of a high quality PHP library for SAML 2.0 Web SSO - <a href="http://rnd.feide.no/simplesamlphp">SimpleSAMLPHP</a>.</p>

<p>For Moodle integration, Erlend Strømsvik of Ny Media AS developed an authentication plugin, to which I've made a number of changes around configuration options and <a href="http://www.moodle.org">Moodle</a> session integration.  This has now been documented and added to Moodle Contrib to give it better visibility to the Moodle community at large.  Documentation is <a href="http://docs.moodle.org/en/AUTHSAML_authentication_plugin">here</a> and the contrib entry is <a href="http://moodle.org/mod/data/view.php?d=13&rid=2574">here</a>.</p>]]>

</content>
</entry>

<entry>
<title>Perl sapnwrfc 0.30</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2009/06/perl_sapnwrfc_0.html" />
<modified>2009-07-24T03:44:20Z</modified>
<issued>2009-06-27T18:20:41Z</issued>
<id>tag:www.piersharding.com,2009:/blog//1.82</id>
<created>2009-06-27T18:20:41Z</created>
<summary type="text/plain">While doing some work for a client recently, I got the opportunity to do some major performance work on sapnwrfc for Perl. The net result is that a number of memory leaks, mainly of Perl values not going out of...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>general</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>While doing some work for a client recently, I got the opportunity to do some major performance work on <a href="http://search.cpan.org/dist/sapnwrfc/">sapnwrfc</a> for Perl.  The net result is that a number of memory leaks, mainly of Perl values not going out of scope properly, have been fixed.</p>

<p>Additionally, I've had some time to put together a proper cookbook-style set of examples in the <a href="http://search.cpan.org/dist/sapnwrfc/sapnwrfc-cookbook.pod">sapnwrfc-cookbook</a>.  These examples, while written specifically for Perl, are almost identical for sapnwrfc for <a href="http://cheeseshop.python.org/pypi/sapnwrfc/">Python</a>, <a href="http://raa.ruby-lang.org/project/sapnwrfc">Ruby</a>, and <a href="http://www.piersharding.com/download/php/sapnwrfc/">PHP</a> too.</p>]]>

</content>
</entry>

<entry>
<title>Dynamic Weather Map</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2009/04/dynamic_weather.html" />
<modified>2009-04-26T05:38:45Z</modified>
<issued>2009-04-26T05:29:14Z</issued>
<id>tag:www.piersharding.com,2009:/blog//1.81</id>
<created>2009-04-26T05:29:14Z</created>
<summary type="text/plain">I had once seen, on a colleague&apos;s Mac, a weather widget of New Zealand that gave him an animated view of the weather situation as seen by the satellite passing over. I had not been able to find...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>general</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>I had once seen, on a colleague's Mac, a weather widget of New Zealand that gave him an animated view of the weather situation as seen by the satellite passing over.  I had not been able to find this animation on the <a href="http://www.metservice.co.nz">Met Service</a> website, and was puzzling over where the data was coming from.  Then I realised it was staring me in the face in the <a href="http://www.metservice.co.nz/public/maps/tasman-sea-nz-infrared-series.html">Infrared series</a>.  So - in a bit of retro shell script coding, using GET, perl, convert, and gifsicle - I built my own <a href="http://www.piersharding.com/download/weather_anim.gif"><img border="0" src="http://www.piersharding.com/download/weather_anim.gif"/></a>.</p>]]>

</content>
</entry>

<entry>
<title>OpenERP and Pentaho</title>
<link rel="alternate" type="text/html" href="http://www.piersharding.com/blog/archives/2009/03/openerp_and_pen.html" />
<modified>2009-03-23T00:48:41Z</modified>
<issued>2009-03-23T00:36:15Z</issued>
<id>tag:www.piersharding.com,2009:/blog//1.80</id>
<created>2009-03-23T00:36:15Z</created>
<summary type="text/plain">As part of some ongoing investigation into the potential use of OpenERP, I have had a look at connecting OpenERP with Pentaho. At other times, I have implemented a limited form of Pentaho BI reporting for previous employers, but...</summary>
<author>
<name>PiersHarding</name>
<url>http://www.piersharding.com</url>
<email>piers@ompka.net</email>
</author>
<dc:subject>general</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://www.piersharding.com/blog/">
<![CDATA[<p>As part of some ongoing investigation into the potential use of <a href="http://www.openerp.com">OpenERP</a>, I have had a look at connecting OpenERP with <a href="http://www.pentaho.org">Pentaho</a>.  At other times, I have implemented a limited form of Pentaho BI reporting for previous employers, but had mostly confined my activities to using the Metadata reporting object designer, which gives users the ability to create their own simple reports that they can generate as HTML, PDF, or spreadsheet.  This time, I wanted to get to grips with the far more powerful forms of interactive reporting, which meant <a href="http://mondrian.pentaho.org/">Mondrian</a>.</p>

<p>It's not easy getting it up and running (you need to set up datasources, build Mondrian schemas, design MDX queries, and then implement the xaction), but once you are there, there is so much potential.  The ability to tune the queries interactively using the Pivot reporting engine and the drill-down features are excellent.</p>]]>

</content>
</entry>

</feed>
