A quick preview of the Data Model-enabled Tableau Desktop

One of the cool new features being showcased in a preview release of Tableau is the new Data Model capability.  It was announced at Tableau Conference (TC18), and what I really like is that it greatly simplifies the data modelling process so you can get to analysis and insight faster.  The Data Model capability leverages your existing star or snowflake schema to automatically build the relationships, without needing specialist database skills – great for non-technical folks.  Very cool!

Here’s a view straight out of SQL Server Management Studio from the Database Diagrams option for the “BikeStores” sample database.

Snag_e4b011.png

Here’s the new view in a pre-release version of a data model enabled Tableau Desktop connecting live to the BikeStores database in SQL Server.

Snag_ea09e1.png

Getting started with your analysis is now super-easy…

B00CC0D9-8E83-4850-A0BA-E27BC2AD95D3.GIF


Covering the Crypto craziness with Google Sheets and Tableau

Snag_3d1270f

If you’re like most people, you’ve probably dabbled a little in the crypto craziness and now need a nice neat way to see how much money you’re…losing…making

Most cryptocurrency exchanges and trading sites have a way to access their data via an API.  Independent Reserve in Australia is one such cryptocurrency exchange that provides APIs to access its feeds: a publicly accessible API, and a private one that requires obtaining an API key before it returns any results.

Here is an example of their “GetTradeHistorySummary” API:

Snag_21fe06.png

The data is returned in JSON format, for example:

Snag_3ab39f4.png

So the challenge is to get this into Tableau and have it automatically update and track the changes.  Their “…API is rate limited to 1 call per second…”, which is way more than we need 🙂   They already provide good info if you’re into hardcore trading and prepared to sit in front of your screen day in, day out… However, most of us don’t need that.  Every few hours, or even a couple of updates a day, is fine for us mere mortals 🙂

Although there are many methods to get data into Tableau (e.g. a Web Data Connector would be cool!), the method I’ll cover is using Google Sheets to periodically pull the data in on a schedule.

  1. Open Google Sheets, sign in and start a new spreadsheet
  2. Go to Tools -> Script Editor
  3. Using this importJSON script, simply copy and paste it over the current content.
  4. Rename “Code.gs” to something like “importJSON.gs”
  5. Save.

Back in the spreadsheet, you can use the “=importJSON()” function in a cell with the API URL as a parameter.  For example:

=importJSON("https://api.independentreserve.com/Public/GetTradeHistorySummary?primaryCurrencyCode=xbt&secondaryCurrencyCode=aud&numberOfHoursInThePastToRetrieve=240")

The above will bring in the last 240 hours (10 days – currently the maximum supported by Independent Reserve) worth of trade transactions for Bitcoin (XBT) priced in AUD.

Snag_385340.png

Independent Reserve currently supports three cryptocurrencies – Bitcoin, Bitcoin Cash and Ethereum – with the codes XBT, BCH and ETH respectively.  These codes are referenced in the API call:

Snag_38c247f.png

Note the entry for “XBT” to indicate Bitcoin.  You will need to copy and modify this entry for each of the sheets.

Create another two sheets (three in total) pulling in the last 10 days (240 hours) worth of transactions for the other currencies.  I named the sheet-tabs XBT Trades, ETH Trades and BCH Trades.

Snag_34e2b2.png

=importJSON("https://api.independentreserve.com/Public/GetTradeHistorySummary?primaryCurrencyCode=xbt&secondaryCurrencyCode=aud&numberOfHoursInThePastToRetrieve=240")
=importJSON("https://api.independentreserve.com/Public/GetTradeHistorySummary?primaryCurrencyCode=bch&secondaryCurrencyCode=aud&numberOfHoursInThePastToRetrieve=240")
=importJSON("https://api.independentreserve.com/Public/GetTradeHistorySummary?primaryCurrencyCode=eth&secondaryCurrencyCode=aud&numberOfHoursInThePastToRetrieve=240")

Once done, you should have three sheet-tabs with data being pulled from Independent Reserve into your Google Sheets.

This is all well and good, but it doesn’t automatically update…

Automatically updating Google Sheets data

The previous example gets the JSON data into a nice neat format in Google Sheets…

The next thing to do is to automate pulling the data from Independent Reserve’s API on a schedule.  There are probably a few different ways to do this; I’ve settled on using a Google Apps Script and Google’s Triggers to automate the process.

In Google Sheets…

  1. Go to Tools -> Script Editor
  2. Copy and Paste the following:
function getDataIReth() {

  // Set the active sheet to the tab that holds the Ethereum trades
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var active = ss.getSheetByName("ETH Trades");
  active.activate();

  // The importJSON formula to (re)write into cell A1
  var cellFunction = '=importJSON("https://api.independentreserve.com/Public/GetTradeHistorySummary?primaryCurrencyCode=eth&secondaryCurrencyCode=aud&numberOfHoursInThePastToRetrieve=240")';

  // Clear A1 and re-set the formula so Google Sheets re-evaluates importJSON
  // rather than serving a cached result
  active.getRange('A1').clearContent();
  SpreadsheetApp.flush();
  active.getRange('A1').setValue(cellFunction);
}

Snag_15dde19.png

Save it, give it a name, and create as many more as you require.  I’m using three scripts in total to cover the three currencies that Independent Reserve supports.

In my setup I created three separate Google Scripts for the three different cryptocurrencies, feeding into the three different sheet-tabs.  I simply copied and pasted the code into separate scripts, saved each under a different name and adjusted the variables as shown above and below:

  1. function getDataIRxbt()
  2. function getDataIReth()
  3. function getDataIRbch()

..and the 3x sheet tabs (as per above) are referenced as:

  1. var active = ss.getSheetByName("XBT Trades");
  2. var active = ss.getSheetByName("ETH Trades");
  3. var active = ss.getSheetByName("BCH Trades");
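
If you’d rather maintain a single script than three near-identical copies, the same thing can be done with one parameterised helper plus three thin wrappers.  This is a sketch only, assuming the sheet-tab names and currency codes used above:

function refreshTrades(sheetName, currencyCode) {

  // Grab the sheet-tab for the given currency
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName(sheetName);

  // Build the importJSON formula for that currency
  var cellFunction = '=importJSON("https://api.independentreserve.com/Public/GetTradeHistorySummary?primaryCurrencyCode=' + currencyCode + '&secondaryCurrencyCode=aud&numberOfHoursInThePastToRetrieve=240")';

  // Clear A1 and re-set the formula so importJSON is re-evaluated
  sheet.getRange('A1').clearContent();
  SpreadsheetApp.flush();
  sheet.getRange('A1').setValue(cellFunction);
}

function getDataIRxbt() { refreshTrades("XBT Trades", "xbt"); }
function getDataIReth() { refreshTrades("ETH Trades", "eth"); }
function getDataIRbch() { refreshTrades("BCH Trades", "bch"); }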

Next, we need to set up the triggers to tell Google sheets how often to retrieve the data.

Snag_5b160d.png

  1. In the Script Editor view, click on the “Current project’s triggers” button:  Snag_16469e1.png
  2. In the drop-down, choose the function you created above (e.g. getDataIReth)
  3. …and then choose the timers and frequencies that are appropriate.
  4. For example, Time-Driven -> Hour Timer -> Every Hour.  (A scripted alternative is sketched after the screenshot below.)

Snag_1672b44.png
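
If you prefer to script the schedule rather than click through the dialog, the same time-driven trigger can also be created from the Script Editor.  A minimal sketch, using the getDataIReth function from above (adjust the function name and interval to suit):

function createHourlyTrigger() {
  // Equivalent to Time-Driven -> Hour Timer -> Every Hour for getDataIReth
  ScriptApp.newTrigger('getDataIReth')
    .timeBased()
    .everyHours(1)
    .create();
}

Run createHourlyTrigger once from the Script Editor and the new trigger should then appear in the triggers list.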

Tableau-ize it

The initial instinct is to try to get the data shaped (pivoted, calculations etc.) in Google Sheets; however, we’ll do what we need in Tableau.  Google Sheets is simply there to hold our data.

Fire up Tableau and use the Google Sheets connector.  A browser window will open asking you to sign in to your Google account so Tableau can access your Google Sheets.  Select the sheet – mine is creatively called “Testing”:

Snag_4908a7.png

Click Connect and choose New Union.

Snag_1727017.png

Drag all 3 worksheets into the dialog box and then click on OK.

If you get a series of F1, F2, F3 etc. then check the “Use Data Interpreter” option to have Tableau automatically clean the data and assign the first row as column headers.  Pretty cool.

Snag_174d000.png

Tableau’s union makes it easy to combine the sheet-tabs for analysis.  The other great feature is that Tableau creates additional metadata such as the Sheet Name and Table Name, so we now know which entries are for Bitcoin, Bitcoin Cash and Ethereum – the main reason I decided to use three separate worksheets.

Snag_4e1263.png

Change “Sheet” to “Currency” and hide “Table Name”.

The data is hourly, as per the Independent Reserve API.

Snag_177d10b.png

Adjust the data types for the date fields as “Date & Time”.  It’s also a good idea to rename some of these fields as well.

Tableau will auto-detect the date for you.  If, however, it doesn’t pick it up, you can use a calculated field such as the one I used below:

DATEPARSE("yyyy-MM-dd'T'hh:mm:ss'Z'", [Start Time Stamp])

Snag_394b7ee.png

The DATEPARSE function I’ve used matches the literal structure of the date format returned by Independent Reserve’s API (below) so that Tableau can translate it.  Depending on your source, yours may vary…

Snag_397ec1f.png
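
As a quick illustration (using a made-up timestamp of the same shape), the pattern above turns the raw text into a proper datetime:

DATEPARSE("yyyy-MM-dd'T'hh:mm:ss'Z'", "2018-01-05T03:00:00Z")   // returns #2018-01-05 03:00:00#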

We now have a correctly formatted date column titled “Start Date” that we can work with.

Snag_399378b.png

From here, you can start to build your analysis…

 

Since we’re connected live to Google Sheets, the data should update regularly, so when you open up your workbook (or better yet, publish it to your Tableau Server or to Tableau Public and embed your credentials) you should get the latest results when people view your dashboard.

Final Dashboard shown below (hosted on Tableau Public).

Snag_3a01d69.png

Finally, a quick shout-out to a couple of great resources:

Enjoy…

Dynamic process resourcing with Tableau Server for Linux

One of the coolest features of the Linux version of Tableau Server is the ability to dynamically add and remove Tableau processes on your running instance of Tableau Server.

The cool part is that we can add these dynamically without the need to stop/start Tableau Server – provided the node is already a member of the cluster and already has one of these process types running.

Think of the flexibility you now have to dynamically adapt your Tableau Server infrastructure to accommodate your corporate-wide analytics.

Architecturally, Tableau Server runs multiple components (processes) which can be tweaked to adapt to your environment.  In a lot of cases the defaults work fine, and as a result they’re a great baseline to work from.

Some of the processes shown in the architecture diagram below can be increased/decreased and/or moved to additional nodes if you have a multi-node setup to optimize Tableau Server in your environment.

Snag_dcfc15

The section in the online help covers this in more detail and what to look for, including the tuning examples baseline.

At a high level this is a good guide for when to look at adding nodes and reconfiguring the processes on your Tableau Server (quoted below):

  • More than 100 concurrent users: If your deployment is user-intensive (>100 simultaneous viewers), it’s important to have enough VizQL processes—but not so many that they exceed your hardware’s capacity to handle them. Also, enabling the Tableau Server Guest User account can increase the number of potential simultaneous viewers beyond the user list you may think you have. The administrative view can help you gauge this. For more information, see Actions by Specific User.
  • Heavy use of extracts: Extracts can consume a lot of memory and CPU resources. There’s no one measurement that qualifies a site as extract-intensive. Having just a few, extremely large extracts could put your site in this category, as would having very many small extracts. Extract heavy sites benefit from isolating the Backgrounder process on its own machine.
  • Frequent extract refreshes: Refreshing an extract is a CPU-intensive task. Sites where extracts are frequently refreshed (for example, several times a day) are often helped by more emphasis on the background process, which handles refresh tasks. Use the Background Tasks for Extracts administrative view to see your current refresh rate.
  • Downtime potential: If your server system is considered mission critical and requires a high level of availability, you can configure it so there’s redundancy for the server processes that handle extracts, the repository, and the gateway.

So let’s get to tweaking!

Taking stock of our default installation and process/node configuration, the command below gives us a breakdown of the nodes and the processes running on each.

tsm topology list-nodes -v

 

Snag_1405bb.png

The above is a single-node 64-bit Tableau Server 10.5 running on CentOS 7.  You’ll note the additional processes, which match the default recommendations from the performance tuning examples in the online help.

… and how it looks from Status -> Server Status -> Process Status in the web-interface.

Snag_157e90.png

Let’s say that after some testing and evaluation we need to optimise our environment by increasing the number of VizQL Server processes and the number of backgrounders to assist with extract refreshes.

It’s always good to check whether there are any pending changes yet to be applied to the server.

tsm pending-changes list

…gives us a list of any changes currently in the queue to be applied to the server.

Snag_1ba286.png

The example above shows I am increasing the number of backgrounder processes from 2 to 4 and the number of vizqlserver processes from 2 to 4 as well.  All of this is to be applied on ‘node1’, i.e. the first (and only) node in this example.

Entering the commands below will adjust the number of processes running on node1:

tsm topology set-process -n node1 -pr backgrounder -c 4
tsm topology set-process -n node1 -pr vizqlserver -c 4

A quick view of the pending changes:

tsm pending-changes list

Snag_1f3d40.png

The above shows what’s currently in the queue to be applied to the server.

To apply the changes to the server, initiate the command:

tsm pending-changes apply

Tableau Services Manager (tsm) will then apply the configuration to the running Tableau Server.

Snag_207246.png

A quick view in the web-based interface shows the addition of the new processes.  The red ‘X’ indicates that Tableau has increased the process counts and is in the process of starting them up.

Snag_21152d.png

…after a few minutes the processes will go ‘green’, indicating that the additional processes are now running on the node, supporting your workload.

Snag_22875b.png

Enjoy…

The Magic of Maestro!

So you have this interesting concept, idea or burning question that needs to be answered.  You can get your hands on the data readily enough, but when you do it’s really a bit of a mess…  No fields really correlate, there are some typos, and it’s not really in a format to ‘mash’ together with anything else logically…  Maybe you can engage your IT teams to help you ETL it into the data warehouse… maybe your own SQL skills aren’t what they used to be.  Or it might not be valuable enough to warrant your specialist team’s time… or it might be – you just won’t know unless you dig a little deeper and discover those nuggets of information…  So how do I get my data sets workable for a quick analysis?

Enter Project Maestro.  Maestro is designed to help you with your data prep in a user-friendly, drag-and-drop, non-‘super-technical’ way for analysis within Tableau – so grab a copy of the Beta now!

2017-12-09_10-05-20

…and yes, note that as of December 2017 it’s in Beta.  That means everything you see will change, and since we’re at Beta 1 a lot will change between now and its release date.  As a result, be mindful of how you’re using it and how you plan on implementing and supporting it…

Before I go on, you really need to check out the demo at Tableau Conference 2017 by Anushka Anand to get an overview.  Anushka’s session starts at around the 54-minute mark.

Some key features of Maestro:

  • It’s visual so you can get immediate feedback on the results
  • It has the look and feel of Tableau so it’ll feel just like building a Tableau Viz.
  • It’s tightly integrated with Tableau so you can keep in the flow of analyzing your data.

So where were we…

A friend and I were musing over the crazy property prices in Melbourne and what it would take to afford to purchase something inner city.  He joked, “…you’d have to resort to a life of crime to afford one…” and  “…they must all be criminals living in the inner city...” 🙂  So naturally this led to:

  • Which inner city suburb has the highest crime?
  • What are the most common crime or offence types in the more affluent inner-city suburbs?

Where do we start?  Well, the data wasn’t all that hard to find.  Firstly, prices for properties sold within the last year or so are readily available from Kaggle.com, and secondly, what better site to source crime data for Victoria than the Crime Statistics Agency of Victoria at crimestats.vic.gov.au.  So I got to downloading the respective CSVs…

As you can imagine, having data from two totally separate sources means the data doesn’t really match, but we do have something in common: the suburb names of inner-city Melbourne.  So we need to do a bit of data prep.

So after starting Maestro, I imported the Melbourne Housing Data from Kaggle first, and then the Crimestats data, by opening and connecting to the CSV files:

2017-12-08_23-18-54

Maestro samples the data and gives you an instant snapshot of your data set.

I want to look at relatively recent data only, as I don’t have a complete set for 2017, so last year’s (2016) will do just fine.  We have date fields in both data sets, so this is easy to filter in Maestro – simply choose the “Melbourne_housing” data, right-click the Date field and choose “Filter Field”.

2017-12-08_23-31-41

Maestro has a calculation editor similar to Tableau’s to keep you in your flow, and in the example below I’m only looking for the year 2016 in my house pricing data.  Build a boolean calculation to capture just the 2016 year.

2017-12-08_23-28-38
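
The calculation itself is just a boolean along these lines (a sketch, assuming the housing data’s date field is simply called Date):

YEAR([Date]) = 2016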

A filter is now applied to only look at 2016 data:

Snag_4e3d0d.png

Choose “Add Step”

Meastro-2

Maestro then gives me a visual snapshot of my data with the results…in real time…

snag_51bb28.png

To compare the same years (i.e. 2016) we now need to adjust our Offences/Crime data set to reflect 2016.

Here, I’m going to Add a Step and then perform some filtering on that field.  Basically I’ll ask Maestro to change the format to Date and then Keep Only 2016 data.

So let’s “Add Step”

36429C1B-0824-4481-AF89-102F337CAEA2.GIF

“Add Step”

 

…and then convert the numerical values from ‘2,016’ to a Date and then “Keep Only” that year (2016).
2E260FAE-C08A-4ADB-B404-E0BE0B01030F.GIF
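
If you’d rather do the conversion with a calculation than via the field-type menu, something along these lines should also work in the calculation language – a sketch, assuming the numeric year column is called Year:

MAKEDATE(INT([Year]), 1, 1)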

 

So now how do I correlate the two data sets?

The common field between the two data sets is the Suburb field, so let’s get Maestro to join the two on that field.  You can simply drag the connector to the “New Join” function:

Maestro join

…however, after the join you’ll note that nothing shows up, as per below.

The reason is that Maestro has chosen the Record_ID field as the default Join clause.  This won’t work for our situation, so change that to Suburb.

2017-12-09_0-17-45

Let’s change the “Applied Join Clause” to Suburb and perform a left join.  I want to bring in everything from the Melbourne Housing Data and only the suburbs that match from the Crime data:

D5FAB99B-3332-4CEB-81E0-DA2CC546D4FD

Note the Join Clause is now Suburb.

The summary of join results shows we have some data excluded.  It’s worth checking what they are before we continue – it might be nothing important, but it might be something we need…

2017-12-09_13-05-55

Maestro gives you real time feedback on the excluded items.  So by clicking on the Excluded entries, the display updates to show what’s been excluded in the join.

Note the join result is ~400,000 rows!  This is expected: the join is only on Suburb, and one of my data sets alone has 161,000 rows, so every housing row for a suburb is matched against every offence row for that suburb.  No real problem for Maestro and Tableau of course, but something you need to be mindful of as you analyse your data – especially if you’re working with very large data sets.  We’ll deal with this a little later.
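
As a rough illustration with made-up numbers: if a single suburb has 500 property sales and 80 offence rows, a join on Suburb alone produces 500 × 80 = 40,000 rows for that suburb before we even look at the rest of the data.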

A quick glance through the Suburb data shows that these are mainly country Victoria suburbs, and this is okay as I’m only interested in the inner city suburbs – so no real concern for us and certainly not a show-stopper.

By choosing the Unmatched Rows in the Included field (i.e. the 8,406 value), I can see that some fields didn’t match –  this is a good area to start looking for anomalies in the data, like below…

2017-12-09_13-10-09

This one looks pretty straightforward.  It’s safe to assume that ‘WEST FOOTSCRAY’ is recorded as ‘FOOTSCRAY WEST’ in the other data set… so let’s change that.

CA45D36B-24CD-4683-8697-B111CF0B2B87

…and we have a Match!  Note how the item disappears as it is now not ‘unmatched’. Maestro allows us to edit the values directly!  Very cool…

The ‘IVANHOE EAST’ Suburb does not show up in the Offences data set, whereas IVANHOE does.  So for our scenario and the following analysis it’s safe to combine that with ‘IVANHOE’.  Simply edit the value and change ‘IVANHOE EAST’ to ‘IVANHOE’ – and we get a match.

So from here, we can do a quick check of the data and see how it looks.  Simply right click the Join and choose “Open Sample in Tableau”.  This will create an extract of the data at the current point in the workflow so you can take a quick look in Tableau – Very cool…

Snag_5dd155.png

Now, as mentioned before, I have nearly half a million rows from a relatively small subset of data.  Since we have thousands of addresses as well as different combinations of offences, offence types etc., the join process needs to correlate all of these together.  I won’t go into too much detail here, but sometimes this is useful and sometimes it’s simply unavoidable, especially when working with disparate and unrelated data sources.  However, with Tableau and also with Maestro we can manage this… Click “Add Step” and choose “Aggregate”:

snag_6902ef.png

Since for this analysis we don’t need to bring in every piece of data (i.e. address-level detail), we can exclude some fields, such as the addresses, to help us overcome the issue with all the rows.

So let’s just bring in:

  • Suburb, Price, Offence Count, Record-ID, Offence Division, Offence Group, Offence Subdivision and Postcode

Because we’re not bringing in every field (especially the ‘Address’ field) we have significantly fewer rows – we’re working with more aggregated data as opposed to the address-level detail.

Snag_6f7f84.png

You can do a quick sample check in Tableau from Maestro.  Click the ‘+’ sign and choose Output.  Maestro gives you the option to output to a Tableau Data Extract or the new Hyper extract format.  Here I will choose Hyper (note: as of December 2017, you will need the Tableau 10.5 Beta to open Hyper extracts).

Snag_72ecb7.png

Once saved, open your Extract in Tableau and you can then begin analysing your data and get to answering those questions much more quickly.

Snag_74da1f.png

I have excluded Melbourne (postcode 3000) for the analysis as it skews the values (lots of offences here!) – I only wanted to look at the surrounding inner-city suburbs.  So what are some of the high level insights?

  • No surprise that the South Eastern Suburbs are the priciest of suburbs here in Victoria…
  • For 2016, property and deception related offences were the most prevalent.
  • The inner city suburbs of Preston, St. Kilda and Sunshine show the highest number of offences in 2016 (excluding central Melbourne), with Preston a real standout.
  • Inner city expensive suburbs (>$1.5m median) such as Kooyong, Eaglemont, Canterbury and Kew/Kew East recorded the lowest reported Offences.
  • In these suburbs, the most prevalent Offences in 2016 were:
    1. Stealing from a Motor Vehicle
    2. Residential non-aggravated burglary
    3. Breach Family Violence Order

So, in a nutshell, the inner south-eastern suburbs are a great place to live – if you can afford it, of course.  Get a place with off-street parking, or even better a secure garage, and set up some good old security cameras, a home alarm system and a big guard dog!

Snag_123651b.png

 

What’s the most popular Social Media app with American teens today?

…according to Piper Jaffray that’ll be Snapchat!

But obviously it wasn’t always that way.  Apparently the ‘Story’ feature in Snapchat and its closest rival Instagram is hugely popular – so much so that, according to Business Insider, “…By April, more people were using Instagram Stories on a daily basis than Snapchat…”

Submitting my one for the #ANZMonthlyMakeover

Dashboard 2

Tableau Server 10.5 Beta on Linux: Adding an additional node.

Update: As of 15 January 2018, Tableau 10.5 has been released and I have updated the links to the online help.

One of the great things about cloud deployments is the ability to stand infrastructure up in next to no time.  Literally a few clicks and you’re done… and then you can get to the more interesting stuff.

And this means we can play… 🙂

The Linux version of Tableau Server comes with the new Tableau Services Manager (TSM), announced in conjunction with the Linux release – expect to see a Windows variant coming soon.  TSM is planned to replace the well-known utilities (currently on Windows):

  • Tableau Server Configuration utility
  • tabadmin command line utility
  • Tableau Server Monitor

Since it’s separate from the core Tableau processes, it’s operational even before the main Tableau services are running, allowing a much easier way to manage your server nodes in a Tableau context – especially handy in Linux + cloud environments.  Think of this when managing multiple environments, scaling with demand, scripting, and looking after a mixed Linux/Windows setup 🙂

For this exercise, again, we have not considered security; refer to the Tableau online help guide if you need more information on this.

Check your AWS Security Groups.  For this setup I have opened all ports – but for your specific scenario you will want to consider which ports you actually need.

2017-11-10_22-54-50.png

Scenario

In this scenario we have a Primary Tableau Server node running and, due to demand, we now require additional ‘grunt’ to service our expanding user requirements.

Scale - additional node-2017-11-11_15-30-20.gif

So the basic arrangement is: take an existing node running Tableau Server, add an additional node, and then re-balance the Tableau Server processes across the nodes to handle the new demand.

2017-11-11_15-28-53-e1510375075713.png

Once we have the additional node, we want to balance the Tableau Services or resources running across the nodes to optimize for performance.

These resources, as shown below, are covered in detail here so I won’t go through them all.  Suffice it to say we can run multiples of them across multiple nodes to optimize Tableau’s performance.

Current Linux Server processes - 2017-11-07_13-38-37.png

Jump back onto your Primary node and generate the bootstrap file.  This node configuration or bootstrap file contains the configuration details that the secondary node needs to configure itself as a member node.  Please note that the node-configuration file contains keystore information as outlined here and as a result you may want to look at securing this file.

In my case, I’ll just generate the node-configuration (or bootstrap) file as is:

tsm topology nodes get-bootstrap-file --file cluster_bootstrap.json

Copy the bootstrap config from the primary node.  I used WinSCP to log in to both instances and then used the “Duplicate” function to copy the “cluster_bootstrap.json” file from the primary node to the secondary node.

WinSCP Duplicate.png
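
If you’d rather stay on the command line, a plain scp from the primary node does the same job – a sketch, where the key file, user and IP address are placeholders for your own:

scp -i my-key.pem cluster_bootstrap.json centos@10.0.0.12:/home/centos/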

On the secondary node, install Tableau Server from the installer (rpm) file.

sudo yum install tableau-server-10.5-beta5-1.x86_64.rpm -y

Run the configuration, note the ‘-b cluster_bootstrap.json’ option.

sudo /opt/tableau/tableau_server/packages/scripts.10500.17.1030.1652/initialize-tsm -b cluster_bootstrap.json -u centos --accepteula

Jump on the Primary Node and check to see if the new node appears.  You can also jump into Tableau Server’s web interface:

tsm topology list-nodes -v
New node -2017-11-10_23-09-26.png

Note the “node16” node, which was added to the primary node (node1) to create the Tableau Server cluster.

Now it’s time to distribute the resources we have across the available nodes.  For our scenario, let’s optimize for an extract-heavy environment, as outlined in the Performance Tuning examples.  Please note that the configurations listed are examples and, as mentioned, it’s best to use them as starting points for tuning your specific environment.

Optimized for user traffic - 2017-11-10_23-14-46.png

Make sure you’re on the Primary Node and then run the following:

tsm topology set-process -n node16 -pr clustercontroller -c 1
tsm topology set-process -n node16 -pr gateway -c 1
tsm topology set-process -n node16 -pr backgrounder -c 4
tsm topology set-process -n node16 -pr cacheserver -c 2
tsm topology set-process -n node1 -pr backgrounder -c 0
tsm topology set-process -n node1 -pr vizqlserver -c 2
tsm topology set-process -n node1 -pr cacheserver -c 2
tsm topology set-process -n node1 -pr dataserver -c 2
tsm pending-changes apply

Below is the newly configured cluster once Tableau finishes applying the configuration.

newly configured processes-2017-11-10_23-28-26.png

Enjoy…

Tableau Server for Linux 10.5 Beta in AWS EC2 (Part 1)

Update: As of 15 January 2018, Tableau 10.5 has been released and I have updated the links to the online help.

First things first: go ahead and check out the Pre-release site to register, check out the release notes and download the latest Beta of the Linux version of Tableau Server.  I’ll be setting this up on AWS EC2 using a 64-bit CentOS Linux instance.  This machine will primarily be used for testing and will spend most of its life powered down, so I have decided not to implement things such as firewalls and to use local authentication only.  Your situation may be different, so please adjust accordingly 🙂

Assumptions:

  • You already have an account in AWS
  • …and you have access to your AWS Access Keys
  • You are aware that this is a Beta and hence not recommended in Production environments.

One handy utility is curlWget which, as per the author, “Builds a command line for ‘curl/wget’ tools to enable the download of data on a console only session.”  Exactly what we need 🙂 – it’s really helpful for handling those complex URLs when downloading the RPM packages.  Another utility you will want is WinSCP.

For reference, you will want to bookmark the site below, which (as of November 2017) is still in Beta:

http://onlinehelp.tableau.com/current/server-linux/en-us/server_linux.htm

Before I go on, it is assumed that you are aware of Tableau Server’s requirements for Development and Production use.  If not, take a look here:

2017-11-05_22-52-24

…and for Production-spec instances – not that you should be running the Beta in production:

2017-11-05_22-52-08

As of today (Nov. 2017) the currently supported Linux distributions are:

  • Red Hat Enterprise Linux (RHEL) 7, CentOS 7, and Oracle Linux 7.
  • Ubuntu 16.04 LTS only.  Version 17.04 is not supported.

For this installation I’ll use CentOS 7, and this one looks like a good one to use from the AWS Marketplace.

AWS Marketplace-2017-11-07_10-07-48.png

This won’t be a detailed walk-through of setting up your AWS EC2 instance, as there are lots of resources around for that; however, here are a few highlights to ensure a smoother Tableau install later on:

Instance details – ensure you pick a supported instance as per above:

EC2 instance details (1)

Storage capacity – ensure you pick at least 50 GB of storage during Step 4: “Add Storage”.  You can of course add storage afterwards using Linux’s disk and file management tools.

EC2 storage

Security Group configuration in EC2 – Ensure you open the ports shown below at a minimum.  These ports are outlined in the jump-start install for Tableau Server on Linux:

EC2 security groups-ports-2.png
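
If you script your environment rather than click through the console, equivalent rules can be added with the AWS CLI – a sketch, with a placeholder security-group ID and source range:

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 80 --cidr 203.0.113.0/24
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8850 --cidr 203.0.113.0/24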

…again, as a caveat, security for this installation is not the primary concern for my specific use, and you should do your own investigation as to what’s appropriate for your use case, i.e.:

security group warnings in EC2

…and finally you should have access to that public/private key pair, right?

AWS EC2 keypair.png

Launch, wait a few minutes, and then connect using your key pair via PuTTY or mRemoteNG and you should get a command prompt as the ‘centos’ user:

CentOS prompt
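
From a Linux or macOS machine, the plain OpenSSH equivalent is a one-liner – key file and host are placeholders:

ssh -i my-key.pem centos@<ec2-public-dns>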

Finally, double-check you have everything ready for the Tableau Server installation, namely:

  • Operating system: CentOS 7 or Ubuntu
  • Opened ports: TCP 80, TCP 8850, UDP 2233
  • Tableau Server administrator account: admin (yours might be different)

Once all done, go to Part 2…