July 20, 2014
RStudio Server and SSO
RStudio Server is a powerful analytics workbench that I have implemented for customers as a standalone service. Out of the box it provides fantastic tools for code management, data access and interactive visualisations. However, the open source release cannot be integrated with web-based authentication solutions. This is a problem when delivering the application as SaaS, as most clients come with SAML, Shibboleth, CAS or OAuth integration requirements.
RStudio Server is a mixture of C++ and GWT, with the HTTP server-side component being predominantly C++, and it turns out that it is not too hard to hack (in the old school meaning of the word). For my purposes, the simplest way to add external web-based authentication is to support identity being passed through in headers. It works like this:
- RStudio Server sits behind a Proxy (in this case Apache2), which is a typical implementation pattern as the proxy can handle SSL termination or integration with other domain services.
- The Proxy (Apache2) authenticates the user, and on success inserts a header - X-Remote-User - identifying them.
- RStudio Server (or any other application) then uses this header to identify the user and log them in as appropriate.
Because a header can obviously be injected by the client, it is imperative that:
- RStudio Server is locked down to listen only to the Proxy service, typically by listening locally on 127.0.0.1:8787.
- The Proxy takes care to strip out any attempt to spoof the authentication header, X-Remote-User.
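The two rules above can be sketched in a few lines of Python. This is only an illustration of the header handling a trusted proxy has to perform, not the Apache or mod_perl code used in this setup; the function name `build_upstream_headers` and its arguments are hypothetical.

```python
from typing import Dict, Optional

TRUSTED_HEADER = "X-Remote-User"


def build_upstream_headers(client_headers: Dict[str, str],
                           authenticated_user: Optional[str]) -> Dict[str, str]:
    """Return the headers a proxy should forward to the backend (hypothetical helper)."""

    def is_trusted_name(name: str) -> bool:
        # HTTP header names are case-insensitive, so compare in lower case.
        # CGI-style environments map "_" and "-" to the same name, so
        # normalise that as well to avoid a spoofed "X_Remote_User".
        return name.lower().replace("_", "-") == TRUSTED_HEADER.lower()

    # Drop every client-supplied copy of the identity header.
    forwarded = {k: v for k, v in client_headers.items() if not is_trusted_name(k)}

    # Only the proxy's own authentication result may set the header.
    if authenticated_user is not None:
        forwarded[TRUSTED_HEADER] = authenticated_user
    return forwarded


# A client tries to log in as "root"; the proxy has authenticated "alice".
spoofed = {"Host": "example.org", "x-remote-user": "root"}
print(build_upstream_headers(spoofed, "alice"))
# prints {'Host': 'example.org', 'X-Remote-User': 'alice'}
```

The point of the design is that the backend never sees a value for the header that did not originate from the proxy's own authentication step, which is only safe as long as the backend accepts connections from the proxy alone.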
As the customer base that I'm interested in has a particular focus on SAML-based authentication, I like to use SimpleSAMLphp. This has strong support for SAML 2.0 and Shibboleth 1.3, and as an added bonus can multiplex to a wide variety of other authentication sources such as Google Apps, Yahoo, OpenID, Facebook and Twitter, to name a few.
With SimpleSAMLphp comes a component called authmemcookie. This enables SimpleSAMLphp to be set up as a Service Provider that is triggered on the HTTP 401 ErrorDocument state. In conjunction with this, I have written a mod_perl authentication handler (Apache::Auth::AuthMemCookie) that accesses the authmemcookie data and passes the identity on in the X-Remote-User header to the protected application, namely RStudio Server.
Setting Up The Proxy - Apache2
ProxyRequests Off
ErrorDocument 401 "/simplesaml/authmemcookie.php"
PerlModule Apache::Auth::AuthMemCookie

# Prompt for authentication:
<Location /rstudio>
    AuthType Cookie
    AuthName "RStudio Server"
    Require valid-user
    PerlAuthenHandler Apache::Auth::AuthMemCookie::authen_handler
    PerlSetVar AuthMemCookie "AuthMemCookie"
    PerlSetVar AuthMemServers "127.0.0.1:11211, /var/sock/memcached"
    PerlSetVar AuthMemAttrsInHeaders 1
    PerlSetVar AuthMemDebug 1
</Location>

ProxyPass /rstudio http://localhost:8787
ProxyPassReverse /rstudio http://localhost:8787
Building RStudio Server
For my purposes, I have forked RStudio on GitHub. It would be great if this change could be upstreamed, though.
General build instructions are:
# get the code base
git clone git@github.com:piersharding/rstudio.git
cd rstudio

# install dependencies and build for Debian
./dependencies/linux/install-dependencies-debian

# installation target directory is the same as for the RStudio packages
cmake . -DRSTUDIO_TARGET=Server -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_CXX_FLAGS=-I/usr/share/R/include \
    -DCMAKE_C_FLAGS=-I/usr/share/R/include \
    -DCMAKE_INSTALL_PREFIX=/usr/lib/rstudio-server
make
sudo make install

# now configure /etc/rstudio/rserver.conf as you would normally
RStudio Server configuration
The key configuration elements required in /etc/rstudio/rserver.conf are:
# make sure that it only receives local requests from the Apache2 proxy
www-address=127.0.0.1

# enable checking of the X-Remote-User HTTP header
auth-sso-remote-user=1

# provide a URL for redirection after RStudio Server logout
# - this enables 3rd party signout triggering
auth-sso-signout-url=http://<my host>/simplesaml/module.php/core/authenticate.php?as=default-sp&logout
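A quick sanity check of these settings might look like the Python sketch below. It assumes the file uses plain key=value lines with # comments, as in the fragment above; the helper names are hypothetical, and `auth-sso-remote-user` is an option of the forked build described in this post.

```python
import ipaddress
from typing import Dict, List


def parse_rserver_conf(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since URLs may contain "=" themselves.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_sso_settings(settings: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    address = settings.get("www-address", "")
    try:
        if not ipaddress.ip_address(address).is_loopback:
            problems.append("www-address is not a loopback address")
    except ValueError:
        problems.append("www-address is missing or not an IP address")
    if settings.get("auth-sso-remote-user") != "1":
        problems.append("auth-sso-remote-user is not enabled")
    return problems


sample = """
# only accept requests from the local proxy
www-address=127.0.0.1
auth-sso-remote-user=1
"""
print(check_sso_settings(parse_rserver_conf(sample)))  # prints []
```

In practice the text would be read from /etc/rstudio/rserver.conf rather than from an inline sample.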
Posted by PiersHarding at 9:13 AM
September 10, 2013
Integrating R with Pentaho
RPentaho - R integration for Pentaho based on community tools.
Recently, I've been involved in a project that has implemented Pentaho as an Analytics solution for Moodle. This is a large (and probably will become a very large) Moodle implementation, so standard Moodle reporting is just not up to it.
One of the requirements was to be able to export student interaction and activity completion data. This can quickly become huge, and the standard Pentaho CSV exporting interfaces can't cope, but there is a good solution to this based on the WebDetails CDA and CDB work. What WebDetails have done is provide an excellent authenticated JSON API for common Pentaho queries, whether they be Saiku Analytics or Saiku Adhoc queries. With this, a user can use the familiar tools to design a query, and then bookmark it.
Posted by PiersHarding at 9:02 AM
February 10, 2013
Hosting an R Repository for RSAP and RMonet
It's a very easy process, as documented here.
This repository can be generally accessed by doing the following:
setRepositories(addURLs = c(PiersHarding = "http://piersharding.com/R"))
Or for an individual package:
Posted by PiersHarding at 8:49 AM
January 31, 2013
Data Hackery - R, SAP, and OpenSource in-memory databases
I've just completed a post on SAP SCN regarding using the in-memory, column-oriented database MonetDB with SAP and R for exploratory data analysis, titled "Data Hackery - R, SAP, and OpenSource in-memory databases". This uses an R library that I've created as a database interface to MonetDB called RMonet.
Posted by PiersHarding at 5:38 PM
July 18, 2012
Google Drive repository plugin for Moodle
Just added a Google Drive repository plugin for Moodle to my moodle-google set of applications here: https://github.com/piersharding/moodle-google/tree/master/repository/googledrive.
Posted by PiersHarding at 2:31 PM