Setting up a quick Rserve in AWS

Here is a quick way to get Rserve up and running in AWS, ready to connect via Tableau’s External Service Connection, with some base libraries installed to get going. It is aimed primarily at demos and proofs of concept.

Firstly, I’m not an R expert! The installation shown here is designed to quickly set up an Rserve environment so you can explore some Tableau workbooks with embedded R. You may hit errors on execution, and if so, you will probably need to install additional libraries in your R instance.

This won’t be a comprehensive walkthrough of the AWS EC2 installation, as there are lots of resources out there. However, I’m using a 64-bit CentOS Linux instance (currently a t2.2xlarge). A couple of handy utilities you will also need, if you don’t have them already, are PuTTY (I also use mRemoteNG) and WinSCP for copying any necessary files into and out of your Linux instance. One other thing to note about my setup is that I don’t have a firewall configured on the instance; I’m just using the AWS security group port configurations. It’s a demo box that’s powered down most of the time 🙂 so it’s not really an issue for me. Yours might be different…

Once you have your fresh Linux machine up and running, you will want to update your instance and install the Development Tools package as per below.

Install and update packages, then reboot:

sudo yum install epel-release nano unzip wget curl -y
sudo yum groupinstall "Development Tools" -y
sudo yum install igraph-devel openssl-devel curl-devel libxml2-devel freetype-devel -y
sudo yum update -y
sudo shutdown -r now

Install R:

sudo yum install R -y

Check the RStudio site and grab the latest version of the rstudio-server package (the commands below use 1.1.383):

wget https://download2.rstudio.org/rstudio-server-rhel-1.1.383-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-rhel-1.1.383-x86_64.rpm

Start and enable the RStudio Server service, then check its status:

sudo systemctl start rstudio-server.service
sudo systemctl enable rstudio-server.service
sudo systemctl status rstudio-server.service

Run a quick verification and fix any errors it reports.

sudo rstudio-server verify-installation

The next steps configure the external access ports and a user.

Create a new user that will manage the R services. In the example below I have created a user called ‘ruser’. By default, R won’t allow external access from a privileged user.

sudo adduser ruser
sudo passwd ruser
<mypassword>

Once done, you need to configure R to allow external access for Rserve and the web-based RStudio. Firstly, make sure you have opened the ports in your AWS security group configuration. By default, Rserve uses port 6311 and the RStudio web interface is on 8787.
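
If you prefer the command line to the AWS Console, the same two inbound rules can be added with the AWS CLI. This is only a sketch: the security group ID and CIDR are placeholders, the function name print_ingress_cmds is made up for illustration, and the script only prints the commands so you can review them before running anything.

```shell
#!/usr/bin/env bash

# Placeholder values: replace with your own security group ID and client IP range.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
MY_CIDR="${MY_CIDR:-203.0.113.10/32}"

# Hypothetical helper: prints one AWS CLI command per port (dry run only).
print_ingress_cmds() {
  local port
  for port in "$@"; do
    echo "aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol tcp --port $port --cidr $MY_CIDR"
  done
}

# Rserve (6311) and the RStudio web interface (8787).
print_ingress_cmds 6311 8787
```

Once you are happy with the printed commands, you can paste them into a shell where the AWS CLI is configured.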

Create the Rserve configuration files.

The Ruser.txt file lists the users that can log in and use R via Rserve, and the Rserv.conf file tells Rserve how to behave when it comes up.

sudo nano /etc/Ruser.txt

Create an entry with a username and password separated by a space, one user per line, choosing whichever username and password you like. For example:

ruser <mypassword>

Next, create the /etc/Rserv.conf file as follows:

sudo nano /etc/Rserv.conf

…with the following entries:


remote enable
port 6311
plaintext enable
auth required
pwdfile /etc/Ruser.txt

The entries should be fairly self-explanatory: remote access is enabled on port 6311, the password is sent in plaintext, authentication is required, and the user details are stored in the Ruser.txt file in the /etc/ directory.
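
For a repeatable setup, the two files can also be written from a script instead of edited by hand in nano. The sketch below is only an illustration: the function name write_rserve_conf and the CONF_DIR variable are made up for this example, it writes into a temporary directory rather than /etc so it is safe to try, and it assumes the password file holds one "username password" pair per line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: writes Rserv.conf and Ruser.txt into the given directory.
write_rserve_conf() {
  local conf_dir="$1" user="$2" password="$3"

  # Assumed pwdfile layout: one "username password" pair per line.
  printf '%s %s\n' "$user" "$password" > "$conf_dir/Ruser.txt"

  # Same five settings as above, with pwdfile pointing at the file just written.
  cat > "$conf_dir/Rserv.conf" <<EOF
remote enable
port 6311
plaintext enable
auth required
pwdfile $conf_dir/Ruser.txt
EOF
}

# Write to a scratch directory first; copy to /etc with sudo once it looks right.
CONF_DIR="$(mktemp -d)"
write_rserve_conf "$CONF_DIR" ruser 'change-me'
cat "$CONF_DIR/Rserv.conf"
```

When copying the result into /etc, remember to point the pwdfile line back at /etc/Ruser.txt.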

If required, open up the firewall ports. First check which zone is active:

sudo firewall-cmd --get-active-zones

…my one is “public” so…

sudo firewall-cmd --zone=public --add-port=8787/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6311/tcp --permanent
sudo firewall-cmd --reload

Restart the server just for good measure.

sudo reboot

We’re now going to install the packages.

Your RStudio service will be running on port 8787 by default. Jump onto your AWS Console and grab the public IP or external hostname of your server to connect to.

Open up a web browser and go to RStudio, for example:

 http://your.external.ip.address:8787/

[Screenshot: RStudio login page]

Log in with the credentials for your ‘ruser’ account (or whichever Linux user you created to manage your R services). RStudio Server logs you in with the system account, not an entry from Ruser.txt.

We now need to install a bunch of libraries. Copy and paste the below into the Console. This installs a whole bunch of packages and dependencies to get a baseline R installation going to support some Tableau demo workbooks. Some of these may produce errors which you might then need to investigate and troubleshoot. You could also install more or fewer depending on what you need to do.

install.packages(c("assertthat", "BH", "bindr", "bindrcpp", "bitops", "car", "chron", "colorspace", "curl", "cvTools", "data.table", "DEoptimR", "dichromat", "digest", "diptest", "dplyr", "dtt", "e1071", "english", "flexmix", "foreach", "forecast", "fpc", "fracdiff", "gdata", "gender", "geosphere", "GGally", "ggmap", "ggplot2", "glmnet", "glue", "gridExtra", "gtable", "gtools", "httr", "igraph", "irlba", "iterators", "jpeg", "jsonlite", "kernlab", "labeling", "laeken", "lars", "lazyeval", "lexicon", "lme4", "lmtest", "magrittr", "mapproj", "maps", "MatrixModels", "mclust", "mime", "minqa", "modeltools", "moments", "munsell", "mvoutlier", "mvtnorm", "ngramrr", "nloptr", "NLP", "openNLP", "openNLPdata", "openssl", "pbkrtest", "pcaPP", "pkgconfig", "plogr", "plotrix", "pls", "plyr", "png", "prabclus", "prettyunits", "progress", "proto", "purrr", "qdap", "qdapDictionaries", "qdapRegex", "qdapTools", "quadprog", "quantmod", "quantreg", "R6", "randomForest", "RColorBrewer", "Rcpp", "RcppArmadillo", "RcppEigen", "RCurl", "reports", "reshape", "reshape2", "RgoogleMaps", "rJava", "rjson", "rlang", "robCompositions", "robustbase", "rrcov", "Rserve", "scales", "SentimentAnalysis", "sentimentr", "sgeostat", "slam", "SnowballC", "sp", "SparseM", "spikeslab", "sROC", "stringdist", "stringi", "stringr", "syuzhet", "tau", "textclean", "textshape", "tibble", "tidyr", "tidyselect", "timeDate", "tm", "trimcluster", "tseries", "TTR", "vcd", "venneuler", "VIM", "viridisLite", "wordcloud", "xlsx", "xlsxjars", "XML", "xml2", "xts", "zoo"))

The above will take a while, so grab a coffee and chill 🙂 …

It should finish and return you to the console prompt:

[Screenshot: RStudio console prompt]

Load and run the R services:

library(Rserve)
Rserve(port=6311, args="--RS-conf /etc/Rserv.conf --no-save")
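
Once Rserve() returns, a quick way to confirm the daemon is actually listening is to check the server's open sockets from an SSH session. This is a minimal sketch, assuming the ss utility is available on your instance; the helper name listening_on is made up for this example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `ss -ltn` style output on stdin and succeeds if a
# socket is listening on the given port (the local address is the 4th column).
listening_on() {
  local port="$1"
  awk -v p=":$port" '
    $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
    END { exit !found }
  '
}

if command -v ss >/dev/null 2>&1 && ss -ltn | listening_on 6311; then
  echo "Rserve port 6311 is listening"
else
  echo "Nothing is listening on port 6311 yet"
fi
```

The same check with 8787 in place of 6311 covers the RStudio web interface.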

Open up Tableau and make sure you can connect to your Rserve instance by going to Help -> Settings and Performance -> Manage External Service Connection:

Your ‘Server’ will be your AWS server’s public IP address or public hostname, available from your AWS Console, and the port will be 6311 as set in Rserv.conf.

Your username and password will be the ones you entered in your Ruser.txt file. Once entered, hit “Test Connection” and Tableau should confirm that the connection succeeded. If not, double check your installation.
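
If the connection test fails, it can help to rule out the network first by checking whether the two ports are reachable from another machine. The sketch below is one way to do that from a bash shell (for example another Linux box); it relies on bash's /dev/tcp feature and the coreutils timeout command. The helper name port_open and the RSERVE_HOST variable are assumptions for this example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if a TCP connection to host:port can be opened
# within 3 seconds, using bash's built-in /dev/tcp pseudo-device.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# RSERVE_HOST is an assumed variable name; set it to your server's public address.
host="${RSERVE_HOST:-127.0.0.1}"
for port in 6311 8787; do
  if port_open "$host" "$port"; then
    echo "$host:$port is reachable"
  else
    echo "$host:$port is NOT reachable"
  fi
done
```

If a port is not reachable, the AWS security group rules and any instance firewall are the first things to recheck; if it is reachable but Tableau still fails, the credentials in Ruser.txt are the more likely culprit.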

You should now be able to get connected and work through your analysis with R and Tableau.  If you get errors, you may need to install additional libraries in R.
