July 20, 2014

RStudio Server and SSO

RStudio Server is a powerful analytics workbench that I have implemented for customers as a standalone service. Out of the box it provides fantastic tools for code management, data access and interactive visualisations. However, the open source release cannot be integrated with web-based authentication solutions. This is a problem when delivering the application as SaaS, as most clients come with SAML, Shibboleth, CAS or OAuth integration requirements.
RStudio Server is a mixture of C++ and GWT, with the HTTP server side component being predominantly C++, and it turns out that it is not too hard to hack (in the old school meaning of the word). For my purposes, the simplest way to add external web-based authentication is to support an identity passed through in an HTTP header. It works like this:
  • RStudio Server sits behind a Proxy (in this case Apache2), which is a typical implementation pattern as the proxy can handle SSL termination or integration with other domain services.
  • The Proxy (Apache2) authenticates the user, and on success inserts a header - X-Remote-User - identifying them.
  • RStudio Server (or any other application) then uses this header to identify the user and log them in as appropriate.
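The backend half of this flow can be sketched in a few lines. This is a hypothetical Python illustration, not the C++ code in the RStudio fork: the function name `resolve_remote_user` and the loopback-only trust rule are assumptions chosen to mirror the design described above. The backend accepts the X-Remote-User header only when the request arrives from the local proxy.

```python
import ipaddress
from typing import Mapping, Optional

# Header name taken from the post; everything else here is a hypothetical sketch.
REMOTE_USER_HEADER = "X-Remote-User"


def resolve_remote_user(peer_addr: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the user named by the proxy, or None if the request is not trusted.

    Only requests arriving over the loopback interface (i.e. from the local
    Apache2 proxy) are allowed to assert an identity via the header.
    """
    if not ipaddress.ip_address(peer_addr).is_loopback:
        return None
    # HTTP header names are case-insensitive, so compare them case-insensitively.
    for name, value in headers.items():
        if name.lower() == REMOTE_USER_HEADER.lower():
            user = value.strip()
            return user or None
    return None
```

For example, `resolve_remote_user("127.0.0.1", {"x-remote-user": "alice"})` yields `"alice"`, while the same header arriving from a non-loopback peer yields `None`.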

Safety considerations
If the user is identified by a header, that header can obviously be injected by the client, so it is imperative that:
  • RStudio Server must be locked down so that it only accepts requests from the proxy, typically by listening locally on 127.0.0.1:8787.
  • The proxy must take care to strip out any attempt to spoof the authentication header - X-Remote-User.
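The stripping rule in the second point can be illustrated with a small, hypothetical Python function (`forward_headers` is an illustrative name, not part of Apache or the mod_perl handler). It shows the order of operations a proxy should follow: discard any client-supplied X-Remote-User in any letter case, then add the header only for a user it has authenticated itself.

```python
from typing import Dict, Mapping, Optional

REMOTE_USER_HEADER = "X-Remote-User"


def forward_headers(client_headers: Mapping[str, str],
                    authenticated_user: Optional[str]) -> Dict[str, str]:
    """Build the header set the proxy forwards to RStudio Server.

    Any client-supplied X-Remote-User (in any letter case) is discarded first,
    so the only value the backend can ever see is the one the proxy sets after
    successful authentication.
    """
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != REMOTE_USER_HEADER.lower()
    }
    if authenticated_user:
        forwarded[REMOTE_USER_HEADER] = authenticated_user
    return forwarded
```

Stripping before injecting matters: if the proxy merely appended its own value, a backend that reads the first matching header could still pick up the spoofed one.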

Authentication
As the customer base that I'm interested in has a particular focus on SAML-based authentication, I like to use SimpleSAMLphp. This has strong support for SAML 1.1, SAML 2.0 and Shibboleth 1.3, and as an added bonus can multiplex to a wide variety of other authentication sources such as Google Apps, Yahoo, OpenID, Facebook and Twitter, to name a few.
SimpleSAMLphp ships with a component called authmemcookie. This enables SimpleSAMLphp to be set up as a Service Provider that is triggered by the HTTP 401 ErrorDocument. In conjunction with this, I have written a mod_perl authentication handler (Apache::Auth::AuthMemCookie) that reads the authmemcookie data and passes the identity on in the X-Remote-User header to the protected application, namely RStudio Server.
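The lookup the handler performs can be sketched as follows. This is a hypothetical Python illustration, not the Perl handler itself: a plain dict stands in for memcached, and the stored session format (`Key=value` lines with a `UserName` entry) is an assumption modelled on the mod_auth_memcookie convention. The cookie name `AuthMemCookie` comes from the `PerlSetVar AuthMemCookie` setting in the configuration.

```python
from typing import Dict, Mapping, Optional


def parse_session(raw: str) -> Dict[str, str]:
    """Parse 'Key=value' lines (CRLF- or LF-separated) into a dict.

    The line format is an assumption about what authmemcookie stores.
    """
    attrs: Dict[str, str] = {}
    for line in raw.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def user_from_cookie(cookies: Mapping[str, str], store: Mapping[str, str],
                     cookie_name: str = "AuthMemCookie") -> Optional[str]:
    """Resolve the session cookie to a user name via the session store.

    `store` is a stand-in for memcached: session id -> stored session text.
    """
    session_id = cookies.get(cookie_name)
    if session_id is None:
        return None  # no cookie: Apache answers 401 and the ErrorDocument starts SAML login
    raw = store.get(session_id)
    if raw is None:
        return None  # unknown or expired session
    return parse_session(raw).get("UserName") or None
```

When this returns a user, the handler's job is simply to place that value in X-Remote-User before the request is proxied on.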

Setting Up The Proxy - Apache2
Apache2 Configuration:
ProxyRequests Off
ErrorDocument 401 "/simplesaml/authmemcookie.php"
PerlModule Apache::Auth::AuthMemCookie

    # Prompt for authentication:
    <Location /rstudio>
        AuthType Cookie
        AuthName "RStudio Server"
        Require valid-user
        PerlAuthenHandler Apache::Auth::AuthMemCookie::authen_handler
        PerlSetVar AuthMemCookie "AuthMemCookie"
        PerlSetVar AuthMemServers "127.0.0.1:11211, /var/sock/memcached"
        PerlSetVar AuthMemAttrsInHeaders 1
        PerlSetVar AuthMemDebug 1
    </Location>
ProxyPass /rstudio http://localhost:8787
ProxyPassReverse /rstudio http://localhost:8787

Building RStudio Server
For my purposes, I have forked RStudio on GitHub. It would be great if this change could be up-streamed though.
General build instructions are:
# get the code base
git clone https://github.com/piersharding/rstudio.git
cd rstudio
# install dependencies and build for Debian
./dependencies/linux/install-dependencies-debian
# installation target directory is the same as for the RStudio packages
cmake . -DRSTUDIO_TARGET=Server -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS=-I/usr/share/R/include -DCMAKE_C_FLAGS=-I/usr/share/R/include -DCMAKE_INSTALL_PREFIX=/usr/lib/rstudio-server
make
sudo make install
# now configure /etc/rstudio/rserver.conf as you would normally

RStudio Server configuration
The key configuration elements required in /etc/rstudio/rserver.conf are:
# make sure that it only receives local requests from the Apache2 proxy
www-address=127.0.0.1
# enable checking of the X-Remote-User HTTP header
auth-sso-remote-user=1 
# provide a URL for redirection after RStudio Server logout 
# - this enables 3rd party signout triggering
auth-sso-signout-url=http://<my host>/simplesaml/module.php/core/authenticate.php?as=default-sp&logout 
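Because the whole scheme depends on these settings being right, a quick sanity check can help. The following is a hypothetical Python sketch, not an RStudio tool: it assumes rserver.conf is a simple `key=value` file with `#` comments, and it checks only the three options shown above (the `auth-sso-*` options exist in the forked build, not in stock RStudio Server).

```python
import ipaddress
from typing import Dict, List


def parse_rserver_conf(text: str) -> Dict[str, str]:
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so URLs with query strings survive intact.
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def sso_config_problems(options: Dict[str, str]) -> List[str]:
    """Return a list of problems with the SSO-related settings (empty if OK)."""
    problems = []
    address = options.get("www-address", "")
    try:
        if not ipaddress.ip_address(address).is_loopback:
            problems.append("www-address should be a loopback address such as 127.0.0.1")
    except ValueError:
        problems.append("www-address is missing or not an IP address")
    if options.get("auth-sso-remote-user") != "1":
        problems.append("auth-sso-remote-user=1 is not set")
    if not options.get("auth-sso-signout-url"):
        problems.append("auth-sso-signout-url is not set")
    return problems
```

Usage would be along the lines of `sso_config_problems(parse_rserver_conf(open("/etc/rstudio/rserver.conf").read()))`, which returns an empty list when the three settings look sane.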

Posted by PiersHarding at 9:13 AM

September 10, 2013

Integrating R with Pentaho

RPentaho - R integration for Pentaho based on community tools.

Recently, I've been involved in a project that uses Pentaho to provide an analytics solution for Moodle. This is a large Moodle implementation (and will probably become very large), so standard Moodle reporting is just not up to it.
One of the requirements was to be able to export student interaction and activity completion data. This can quickly become huge, and the standard Pentaho CSV export interfaces can't cope, but there is a good solution based on the WebDetails CDA and CDB work. What WebDetails have done is provide an excellent authenticated JSON API for common Pentaho queries, whether they are Saiku Analytics or Saiku Adhoc queries. With this, a user can design a query with the familiar tools and then bookmark it.

To complete the loop, I've written an R library, RPentaho, that uses the JSON interface to access these stored queries and import the data as a standard data.frame object.
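The core of that conversion is turning a row-oriented JSON result into named columns, which is what a data.frame holds. Here is a hypothetical Python sketch of the step; it is not RPentaho's code, and the field names `metadata`, `resultset`, `colIndex` and `colName` reflect my understanding of CDA's JSON output, so treat them as assumptions.

```python
import json
from typing import Any, Dict, List


def cda_json_to_columns(payload: str) -> Dict[str, List[Any]]:
    """Turn a CDA-style JSON result into a column-oriented table.

    Assumes the payload has a 'metadata' list describing columns (with
    'colIndex' and 'colName') and a 'resultset' list of row arrays.
    """
    doc = json.loads(payload)
    columns = sorted(doc["metadata"], key=lambda col: col["colIndex"])
    table: Dict[str, List[Any]] = {col["colName"]: [] for col in columns}
    for row in doc["resultset"]:
        for col in columns:
            table[col["colName"]].append(row[col["colIndex"]])
    return table
```

A dict of equal-length lists maps directly onto a data.frame (or a pandas DataFrame), with column names taken from the query's metadata.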

Posted by PiersHarding at 9:02 AM

February 10, 2013

Hosting an R Repository for RSAP and RMonet

I've just set up an R repository to host the R extensions that I've published. This currently contains RSAP, the SAP RFC connector, and RMonet, the MonetDB connector built on the MonetDB MAPI C API.

It's a very easy process, as documented here.

This repository can be generally accessed by doing the following:
setRepositories(addURLs = c(PiersHarding = "http://piersharding.com/R"))

Or for an individual package:
install.packages('RMonet', repos=c('http://piersharding.com/R'))
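What makes this work is the standard CRAN-style layout: source packages live under `src/contrib/` next to a `PACKAGES` index in Debian Control File format, which is what install.packages() reads to find a package and its version. The following hypothetical Python sketch (function names are illustrative) parses such an index and builds the tarball URL a client would fetch.

```python
from typing import Dict, List, Optional


def parse_packages_index(text: str) -> List[Dict[str, str]]:
    """Parse a DCF-format PACKAGES file: blank-line-separated 'Field: value' records."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_field: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field:
            # Continuation line: folded onto the previous field's value.
            current[last_field] += " " + line.strip()
        else:
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        records.append(current)
    return records


def source_tarball_url(repo: str, record: Dict[str, str]) -> str:
    """Build the URL of a source package in a CRAN-style repository."""
    return f"{repo.rstrip('/')}/src/contrib/{record['Package']}_{record['Version']}.tar.gz"
```

Fetching `http://piersharding.com/R/src/contrib/PACKAGES` and passing it through `parse_packages_index` would list what the repository offers, assuming it follows the standard layout.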

Posted by PiersHarding at 8:49 AM

January 31, 2013

Data Hackery - R, SAP, and OpenSource in-memory databases

I've just completed a post on SAP SCN about using the in-memory, column-oriented database MonetDB with SAP and R for exploratory data analysis, titled "Data Hackery - R, SAP, and OpenSource in-memory databases". This uses an R library that I've created as a database interface to MonetDB, called RMonet.

Posted by PiersHarding at 5:38 PM

July 18, 2012

Google Drive repository plugin for Moodle

Just added a Google Drive repository plugin for Moodle to my moodle-google set of applications here: https://github.com/piersharding/moodle-google/tree/master/repository/googledrive.

Posted by PiersHarding at 2:31 PM