Display Control Plane API Operations using Amazon CloudWatch Logs Insights

Introduction

For small organisations that cannot afford to spend much on their network security, moving to the cloud enables them to easily uplift their security posture. The organisation can concentrate on innovating and scaling their workloads, while AWS provides them with a secure environment to use. More information regarding AWS security and compliance can be found at https://docs.aws.amazon.com/whitepapers/latest/aws-overview/security-and-compliance.html

The sentence above is not entirely accurate. The security of workloads in AWS is a shared responsibility between AWS and the customer. AWS is responsible for “Security of the cloud”. This means that AWS is responsible for protecting the infrastructure that runs the workloads (hardware, software, networking, physical facilities). The customer is responsible for “Security in the cloud”. This means that the customer, based on the service they select, must perform all the necessary security configuration and management tasks to keep the workload secure. A good place to learn more about this is at https://aws.amazon.com/compliance/shared-responsibility-model/

As a good security practice, one must always monitor all the activities that happen in their cloud environment, especially those that involve the management of resources. For instance, if one notices a lot of large Amazon EC2 instances being provisioned, this could possibly be an indication of a breach (or someone authorised is provisioning these without notifying others). The management operations performed on resources in your AWS account are referred to as Control Plane operations.

There are many commercial products that can help you with monitoring your AWS environment. However, you can quite adequately benefit from the tools that are natively provided by AWS as well.

In this blog, I will take you through the steps of using Amazon CloudWatch Logs Insights to display control plane operations in a meaningful way.

Implementation

Let’s start:

  1. Open the AWS Console and then navigate to the AWS CloudTrail service page. Change to the appropriate AWS region.
  2. Create a new Trail to record just Management events. Ensure it is applied to all the regions and that it delivers events to Amazon CloudWatch Logs as well. Below is a screenshot of the required settings

  3. Open the Amazon CloudWatch service page and ensure it is in the same region as the AWS CloudTrail that was just configured (in step 2 above)
  4. Click on Dashboards and then click on Create dashboard. Give the dashboard a meaningful name (I called my dashboard ControlPlaneOperations-Dashboard)
  5. In the next screen, from the top menu, click on Add widget. Another screen will open. Select Query results and then click on Configure

  6. In the next screen, use the drop-down arrow (pointed to by the red arrow below) to select the CloudWatch Logs log group that was configured for the new trail in step 2 above.

    In the formula section (denoted by the red rectangle), delete everything and replace it with the text below (the screenshot above already has the correct formula)

    fields eventTime, userIdentity.userName, userIdentity.accessKeyId, sourceIPAddress, awsRegion, eventSource, eventName | sort eventTime desc
    

    The above formula directs Amazon CloudWatch Logs Insights to display the event time, the user name and access key of the identity that performed the control plane operation, the IP address from which the operation was performed, the AWS region in which it was performed, and the event’s source and name. The results are sorted by event time in descending order.

    Next, set the time range for events that CloudWatch Logs Insights must process. To configure this, pick the appropriate duration from those displayed on the top right (as displayed inside the green rectangle in the screenshot above). For my setup, I chose 1hr.

    Once completed, click Create widget

  7. The next screen should look similar to the screenshot below. Click Save dashboard.

     

    Amazon CloudWatch Logs Insights - Create Dashboard

  8. Your Amazon CloudWatch dashboard is now complete. To refresh the events, you can press the refresh button (pointed to by the red arrow in the screenshot below).

    Amazon CloudWatch Logs Insights - Refresh Events

  9. You can also enable auto refresh of the events by clicking the small arrow beside the refresh button. You will get a menu option similar to the screenshot below.

    Tick Auto refresh and choose the Refresh interval of your choice.

  10. Pro Tip 1 – if you want your dashboard to be displayed under the Favorite section when you open the Amazon CloudWatch service page, go into the Dashboards section of Amazon CloudWatch and click on Favorite (star) beside your dashboard name.
  11. Pro Tip 2 – if you want your dashboard to appear on the default Amazon CloudWatch service page, rename your dashboard to CloudWatch-Default.
  12. Pro Tip 3 – at the beginning of each row in your dashboard, you will notice a small arrowhead. If you click on the arrowhead, it expands that event and provides additional information.

That’s it! You should now have a dashboard similar to the one below that shows the control plane operations as they happen.

Amazon CloudWatch Logs Insights Dashboard

At times, I found there to be approximately five minutes of delay between when an event happened and when it was displayed. This could be due to the delay between when the event was generated and when the service that generated it delivered the logs to AWS CloudTrail.

The dashboard should allow you to easily monitor any suspicious control plane activities in your AWS account.
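
If you prefer to run the same query programmatically (for example, from a script or a scheduled Lambda function) rather than through the console, the sketch below shows one way to do it with boto3. Treat it as a minimal sketch; the log group name is a placeholder, so substitute the CloudWatch Logs log group that your trail delivers to.

import time
import boto3

logs = boto3.client('logs')

# the same query used in the dashboard widget
QUERY = (
    "fields eventTime, userIdentity.userName, userIdentity.accessKeyId, "
    "sourceIPAddress, awsRegion, eventSource, eventName | sort eventTime desc"
)

LOG_GROUP = 'CloudTrail/DefaultLogGroup'   # placeholder - use your trail's log group

# query the last hour of control plane events
end_time = int(time.time())
start_time = end_time - 3600

query_id = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=start_time,
    endTime=end_time,
    queryString=QUERY,
    limit=100
)['queryId']

# poll until the query finishes, then print each event as a dictionary
response = logs.get_query_results(queryId=query_id)
while response['status'] in ('Scheduled', 'Running'):
    time.sleep(1)
    response = logs.get_query_results(queryId=query_id)

for row in response['results']:
    print({field['field']: field['value'] for field in row})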

I hope the above was useful. Till the next time, Enjoy!

Using Serverless Framework and AWS to map Near-Realtime Positions of Trains

Background

A couple of months back, I found out about the Open Data initiative from Transport for New South Wales. This is an awesome undertaking, to provide data to developers and other interested parties, so that they can develop great applications. For those interested, the Open Data hub can be accessed at https://opendata.transport.nsw.gov.au.

I had been playing with data for a few months, and when I looked through the various APIs that I could access via the Open Data Hub, I became extremely interested.

In this blog, I will take you through one of my mini projects based on the data from the Open Data Hub. I will be using the Public Transport – Vehicle Positions API to plot the near-realtime positions of Sydney trains on a map. The API provides access to more than just train position data, however to keep things simple, I will concentrate only on trains in this blog.

Solution Architecture

As I am a huge fan of serverless, I decided to architect my solution with as many serverless components as possible. The diagram below shows a high-level architecture of how the Transport Positioning System (this is what I will call my solution, TPS for short) will be created.

Let’s go through the steps (as marked in the diagram above)

  1. The lambda function will query the Open Data API every 5 minutes for the position of all trains
  2. After the data has been received, the lambda function will go through each record and assign a label to each train. To ensure the labels are consistent across each lambda invocation, the train to label association will be stored in an Amazon DynamoDB table. The lambda function will query the table to check if a train has already been allocated a label. If it has, then that label will be used. Otherwise, a new label will be created, and the Amazon DynamoDB table will be updated to store this new train to label association.
  3. I found Bing Maps to be much easier (and cheaper) to use for plotting items on a map. The only disadvantage is that it can show, at most, 100 points (called pushpins) on the map. The lambda function will go through the first 100 items returned from the Open Data API and, using the labels that were found/created in step 2, create a pushpin URL for the locations of the first 100 trains. This URL will then be used to generate the map by sending a request to Bing Maps.
  4. The lambda function will then create a static webpage that displays the map showing the position of the trains, along with a key that provides more information about the label used for each train (for example, label 1 could have a description of “19:10 Central Station to Penrith Station”). The label description is taken from the train’s trip label, which is obtained from the Open Data API query results.

This is what I cooked up earlier

Be warned! This blog is quite lengthy as a lot was done to make this solution work. However, if you would rather see the final result before delving into the details, check it out at https://sls-tps-website-dev.s3-ap-southeast-2.amazonaws.com/vehiclelocation.html.

Screenshot of TPS

Above is a screenshot of TPS in action. It is a static webpage that is being generated every 5 minutes, showing the position of trains. I will keep the lambda function running for at least three months, so you have plenty of time available to check it out.

Okay, let’s get our hands dirty and start coding.

Prerequisites

Before we start, the following must be in place

  1. You must have an AWS account. If you don’t have one already, you can sign up for a free tier at https://aws.amazon.com/free/
  2. You must have Serverless Framework installed on your computer. If you don’t have it, follow the details at https://serverless.com/framework/docs/getting-started/
  3. Set up the AWS access key and secret access key that Serverless Framework will use to provision resources into your AWS account. Instructions to get this done can be obtained from https://serverless.com/framework/docs/providers/aws/guide/credentials/
  4. After items 1 – 3 have been completed, create a python runtime Serverless Framework service (my service is called tps)
  5. Within the tps service folder, install the following Serverless Framework plugins
    1. serverless-python-requirements – this plugin adds all the required python modules into a zip file containing our lambda function script, which then gets uploaded to AWS (the required python modules must be defined in the requirements.txt file)
    2. serverless-prune-plugin – this plugin ensures that only the specified number of lambda function versions exist.
  6. Serverless-python-requirements requires a file called requirements.txt. For this solution, create a file called requirements.txt at the root of the service folder and put the following lines inside it
    requests==2.22.0
    gtfs-realtime-bindings==0.0.6
  7. Register an account with Transport for New South Wales Open Data Hub (https://opendata.transport.nsw.gov.au/). This is free. Once registered, login to the Open Data Hub portal and then under My Account, click on Applications and then create an Application that has permissions to Public Transport – Realtime Vehicle Positions API. Note down the API key that is generated as it will be used by the python script later.
  8. Create an account with Bing Maps and use the Website licence plan. This will provide 125,000 billable transactions for generating maps per calendar year at no charge. For generating a map every 5 minutes, this is more than enough. Note down the API key that is provided. Details on creating a Bing Maps account are available at https://www.microsoft.com/en-us/maps/licensing/options

This project has two parts to it. The first is to create the AWS resources that will host our project. For this, we will be using Serverless Framework to create our AWS Lambda function, Amazon Simple Storage Service (S3) bucket, Amazon DynamoDB table, AWS CloudWatch Logs and AWS CloudWatch Events.

The second part is the API query and data ingestion from the Open Data API. The next sections will cover each of these parts.

AWS Resource Creation

As previously mentioned, we will use Serverless Framework to create our AWS resources. Serverless Framework uses serverless.yml to specify which resources need to be created. This file is created by default whenever a Serverless Framework service is created.

In this section, I will take you through the serverless.yml file I used for this project.

The file starts like this.

It defines the service name, plugins and variables that will be used in this file (notice the plugins serverless-python-requirements and serverless-prune-plugin)

The following default values have been configured in the above serverless.yml file.

  • application name is set to tps
  • environment is set to dev
  • Amazon CloudWatch logs is set to 14 days retention
  • the AWS region is set to ap-southeast-2 (Sydney)
  • the Amazon DynamoDB table is set to transportPosition
  • the billing mode for the Amazon DynamoDB table is set to PAY_PER_REQUEST.

The next section defines the details for the cloud provider where resources will be provisioned.

As previously mentioned, I am using AWS. The lambda function will use python 3.7 runtime. The deployment bucket to host all artefacts is also defined. This is an Amazon Simple Storage Service (S3) bucket. Ensure that this S3 bucket exists before deploying the serverless service.

The next section defines the IAM role that will be created for the lambda function.

The specified IAM role will allow the lambda function to carry out all required operations on the Amazon DynamoDB table (currently the IAM role permits all DynamoDB actions, however if required, this can be tightened) and to upload objects to the Amazon S3 bucket that will host the static website.

The next section provides instructions on what to include and exclude when creating the serverless package.

Now we come to the important sections within serverless.yml. The next section defines the lambda function, its handler and what events will trigger it.

For the tps lambda function, the handler is at src.tps_vehiclepos.run (this means that there is a subfolder within the service folder called src, within which there is a python file called tps_vehiclepos.py. Inside this file is a function called run). The lambda function will be run every 5 minutes. To achieve this, I am using a CloudWatch Events schedule. Two environment variables are also passed to the lambda function (BUCKET and TRANSPORTPOSITION_TABLE).
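
As an illustration, a minimal src/tps_vehiclepos.py matching that handler path could start out like the sketch below. The environment variable names come from serverless.yml; everything else here is just a placeholder.

# src/tps_vehiclepos.py - minimal skeleton for the handler defined in serverless.yml
import os

BUCKET = os.environ['BUCKET']                         # S3 bucket that hosts the website and maps
TABLE_NAME = os.environ['TRANSPORTPOSITION_TABLE']    # DynamoDB table for train/label mappings

def run(event, context):
    # 'event' is the scheduled CloudWatch Events payload, fired every 5 minutes
    print(f'Using bucket {BUCKET} and table {TABLE_NAME}')
    return {'status': 'ok'}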

The last section defines all the resources that will be created by the Serverless Framework.

The following resources will be created

  • Amazon S3 bucket. This bucket will store the Bing Maps that show the train positions. It will also serve the website which will be used to display the position maps.
  • Amazon DynamoDB table. The table will be used to store details for trains found in the Open Data API query. Note that we are also using the DynamoDB TTL feature. Since we don’t need to retain items older than a day, this will allow us to easily prune the Amazon DynamoDB table, to reduce costs.

The full serverless.yml file can be downloaded from https://gist.github.com/nivleshc/951ff2f235a89d58d4abec6de91ef738

Generating the map

In this section we will go through the script that does all the magic, which is querying the Open Data API, processing the data, generating the map and then displaying the results in a webpage.

The script is called tps_vehiclepos.py. I will discuss the script in parts below.

The first section lists the python modules that will be imported. It also defines the variables that will be used within the script.

Remember to replace <insert your opendata api key> and <insert your bing maps api key here> with your own Open Data and Bing Maps api key. Do not include the ‘<‘ ‘>’ in the script.

Next, I will take you through the various functions that have been created to carry out specific tasks.

First up is the function that initialises the global variables. As you might be aware, AWS Lambda execution environments can get reused across invocations. During my experimentation, I found this happening, and the issue was that my global variables were not being reinitialised automatically. This caused erroneous results. To circumvent this issue, I decided to write an explicit function that initialises all global variables at the beginning of each lambda function invocation.
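
The pattern itself is simple. Below is a hedged sketch of what such a function can look like; the variable names are illustrative rather than the exact ones from my script.

# module-level globals that can survive between lambda invocations
vehicle_labels = {}   # train id -> label associations downloaded from DynamoDB
new_labels = {}       # labels created during the current invocation
next_label = 1        # next consecutive label number to hand out

def initialise_global_variables():
    # explicitly reset everything so a reused execution environment starts clean
    global vehicle_labels, new_labels, next_label
    vehicle_labels = {}
    new_labels = {}
    next_label = 1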

When placing each train’s location on the map (this will be done using pushpins), a label will be used to identify each pushpin. Unfortunately, Bing Maps doesn’t allow more than three characters for each label. This is not enough to provide a meaningful label. The solution I devised was to use a consecutive numbering scheme for the labels and then provide a key on the website page. The key provides a description for each label. One complication with this approach is that I need to ensure the same label is used for the same train across lambda invocations. This is achieved by storing the label to train mapping in the Amazon DynamoDB table.

The next function downloads all label to train associations stored in Amazon DynamoDB. This ensures that labels remain consistent across lambda invocations.
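
A minimal sketch of that download is below. I am assuming the table uses the train (trip) identifier as its partition key and stores the label and its description as attributes; the attribute names are placeholders, not necessarily the ones in the real table.

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)   # TABLE_NAME comes from the TRANSPORTPOSITION_TABLE environment variable

def download_existing_labels():
    # read every stored train -> label association so labels stay consistent across invocations
    response = table.scan()
    items = response['Items']
    # keep scanning while DynamoDB indicates there are more pages of results
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        items.extend(response['Items'])

    labels = {}
    for item in items:
        labels[item['tripId']] = {'label': item['label'], 'description': item['description']}
    return labels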

The next function just queries Open Data API for the train positions.
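
Conceptually, this boils down to an HTTP GET with the Open Data API key, followed by parsing the protobuf response with the gtfs-realtime-bindings module. The sketch below shows the idea; the endpoint URL and header format are my assumptions based on the Open Data Hub documentation, so verify them against your own application's details.

import requests
from google.transit import gtfs_realtime_pb2

# assumed Sydney Trains vehicle positions endpoint - confirm against the Open Data Hub docs
VEHICLE_POS_URL = 'https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/sydneytrains'
OPENDATA_API_KEY = '<insert your opendata api key>'

def get_train_positions():
    headers = {'Authorization': 'apikey ' + OPENDATA_API_KEY}
    response = requests.get(VEHICLE_POS_URL, headers=headers)
    response.raise_for_status()

    # the payload is a GTFS-realtime protobuf FeedMessage, not JSON
    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(response.content)

    # return (trip id, trip label, latitude, longitude) for each vehicle in the feed
    positions = []
    for entity in feed.entity:
        vehicle = entity.vehicle
        positions.append((vehicle.trip.trip_id,
                          vehicle.vehicle.label,
                          vehicle.position.latitude,
                          vehicle.position.longitude))
    return positions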

The following function takes the data from the above function and processes it.

The function goes through each vehicle record that was returned and checks whether the vehicle already has a label associated with it. If there is one, that label will be used; otherwise a new one is created, and the Amazon DynamoDB table will be updated with this newly created label. This function uses the first 100 trains returned by the Open Data API to generate a pushpin URL. This URL will be used to generate the Bing Maps image showing the positions of the trains.
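
To make the label lookup and the pushpin string a little more concrete, here is a rough sketch. The pushpin parameter format (pp=latitude,longitude;iconStyle;label) follows the Bing Maps static map conventions, but treat it as an assumption and verify it against the Bing Maps documentation.

def build_pushpin_url(positions, labels):
    # 'positions' comes from get_train_positions(), 'labels' from download_existing_labels()
    global next_label, new_labels
    pushpins = []
    for trip_id, trip_label, latitude, longitude in positions[:100]:   # Bing Maps shows at most 100 pushpins
        if trip_id in labels:
            label = labels[trip_id]['label']
        else:
            label = str(next_label)
            next_label += 1
            # remember the new association so it can be written back to DynamoDB later
            new_labels[trip_id] = {'label': label, 'description': trip_label}
            labels[trip_id] = new_labels[trip_id]
        # assumed pushpin format: latitude,longitude;iconStyle;label
        pushpins.append(f'pp={latitude},{longitude};66;{label}')
    return '&'.join(pushpins)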

The following function uses the pushpin URL to create a map using Bing Maps.

The function provides the pushpin URL to Bing Maps. Bing Maps returns a map showing the positions of the trains. The map is then uploaded to the Amazon S3 bucket that will serve the website. The function then generates a description page showing the description for each label. As you can imagine, hundreds of labels are created each day. Not all of these labels will be displayed on the map, however they will all be listed in the key area. To provide quick access to the key descriptions for trains that are currently displayed on the map, the corresponding entries are shown in blue. This ensures that people don’t go on a wild goose chase, trying to locate a train on the map which might not be displayed.
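
A simplified sketch of that flow is shown below. The Bing Maps static imagery endpoint and its parameters are my assumptions from the public documentation (with many pushpins, Bing Maps also lets you send the pushpin list in the body of a POST request), and the S3 object key is a placeholder.

import requests
import boto3

s3 = boto3.client('s3')

# assumed Bing Maps static map (Imagery) REST endpoint - verify against the Bing Maps docs
BING_MAPS_URL = 'https://dev.virtualearth.net/REST/v1/Imagery/Map/Road'
BING_MAPS_API_KEY = '<insert your bing maps api key here>'

def generate_and_upload_map(pushpin_url):
    # ask Bing Maps for a static map image containing all the pushpins
    url = f'{BING_MAPS_URL}?mapSize=800,600&{pushpin_url}&key={BING_MAPS_API_KEY}'
    response = requests.get(url)
    response.raise_for_status()

    # upload the returned image into the S3 bucket that serves the website
    s3.put_object(
        Bucket=BUCKET,                # from the BUCKET environment variable
        Key='trainpositions.png',     # placeholder object key
        Body=response.content,
        ContentType='image/png'
    )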

The next function updates the transport position Amazon DynamoDB table with any new labels that were created during this invocation. This ensures that the labels persist for all subsequent lambda invocations.
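
A sketch of that update is below. It also sets a TTL attribute so that DynamoDB can expire items after a day, as mentioned in the resources section earlier. The attribute names are again placeholders, and 'table' and 'new_labels' come from the earlier sketches.

import time

def save_new_labels():
    # write every label created during this invocation back to DynamoDB
    expiry = int(time.time()) + 86400   # let the DynamoDB TTL feature expire items after one day
    with table.batch_writer() as batch:
        for trip_id, details in new_labels.items():
            batch.put_item(Item={
                'tripId': trip_id,
                'label': details['label'],
                'description': details['description'],
                'ttl': expiry           # must match the TTL attribute configured on the table
            })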

Now that we know what all the functions do, let’s move on to the handler function that the tps lambda will call. The handler function coordinates all the other functions.

The handler function calls the respective functions to get the following tasks done, in the order listed below (a sketch of the handler is shown after the list)

  • initialises the global variables
  • downloads the previously associated labels from the Amazon DynamoDB table
  • calls the Open Data API to get the latest positions of the trains
  • processes the train data and generates the pushpin URL
  • uses the pushpin URL to generate the map with Bing Maps
  • creates a landing page; this webpage shows the map along with a key containing the description for each label
  • finally, uploads all labels that were created within this lambda invocation to the Amazon DynamoDB table, ensuring that for all subsequent invocations of the lambda function, the respective trains get the same label assigned to them
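
Tying the earlier sketches together, the handler could look roughly like the following. The helper function names are the ones I used in the sketches above, not necessarily those in the real script.

def run(event, context):
    # start from a clean slate, since the execution environment may be reused
    initialise_global_variables()

    # existing train -> label associations, so labels stay consistent
    labels = download_existing_labels()

    # latest vehicle positions from the Open Data API
    positions = get_train_positions()

    # assign labels and build the Bing Maps pushpin string for the first 100 trains
    pushpin_url = build_pushpin_url(positions, labels)

    # generate the static map and upload it, along with the landing page, to S3
    generate_and_upload_map(pushpin_url)

    # persist any labels created during this invocation
    save_new_labels()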

The full tps_vehiclepos.py file can be downloaded from https://gist.github.com/nivleshc/03cb06bd6ae9cb969192af0ee2a1a15b

That’s it! Now you have a good idea about how the AWS resources were generated and how the data was acquired, processed and then visualised.

Cost to run the solution

When I started developing this solution, to view what type of data was being provided by the Open Data API, I tried to ingest everything into DynamoDB. This was a VERY VERY bad idea as it cost me quite a lot. However, since then I have modified my code to ingest and store only the required fields in the Amazon DynamoDB table. This has drastically reduced my running costs. Currently I am being charged approximately USD 0.05 or less per day. You can easily run this within your free tier without incurring any additional costs (just as a precaution, monitor your costs to ensure there are no surprises).

As mentioned at the beginning of this blog, the final result can be seen at https://sls-tps-website-dev.s3-ap-southeast-2.amazonaws.com/vehiclelocation.html. The webpage is refreshed every minute; however, the map is generated every five minutes. I will keep this project running for at least three months, so if you want to check it out, you have ample time.

I thoroughly enjoyed creating this project, and I hope you all enjoy it as well. Till the next time, Enjoy!

Building a Breakfast Ordering Skill for Amazon Alexa – Part 1

Introduction

At the AWS Summit Sydney this year, Telstra decided to host a breakfast session for some of their VIP clients. This was more of a networking session, to get to know the clients much better. However, instead of having a “normal” breakfast session, we decided to take it up one level 😉

Breakfast ordering is quite “boring” if you ask me 😉 The waitress comes to the table, gives you a menu and asks what you would like to order. She then takes the order and after some time your meal is with you.

As it was AWS Summit, we decided to sprinkle a bit of technical fairy dust on the ordering process. Instead of having the waitress take the breakfast orders, we contemplated the idea of using Amazon Alexa instead 😉

I decided to give Alexa skill development a go. However, not having any prior Alexa skill development experience, I anticipated an uphill battle, having to first learn the product and then develop for it. To my amazement, the learning curve wasn’t too steep and over a weekend, spending just 12 hours in total, I had a working proof of concept breakfast ordering skill ready!

Here is a link to the proof of concept skill https://youtu.be/Z5Prr31ya10

I then spent a week polishing the Alexa skill, giving it more “personality” and adding a more “human” experience.

All the work paid off when I got told that my Alexa skill would be used at the Telstra breakfast session! I was over the moon!

For the final product, to make things even more interesting, I created a business intelligence chart using Amazon QuickSight, showing the popularity of each of the food and drink items on the menu. The popularity was based on the orders that were being received.


Using a laptop, I displayed the chart near the Amazon Echo Dot. This was to help people choose what food or drink they wanted to order (a neat marketing trick 😉). If you would like to know more about Amazon QuickSight, you can read about it at Amazon QuickSight – An elegant and easy to use business analytics tool

Just as a teaser, you can watch one of the ordering scenarios for the finished breakfast ordering skill at https://youtu.be/T5PU9Q8g8ys

In this blog, I will introduce the architecture behind Amazon Alexa and prepare you for creating an Amazon Alexa Skill. In the next blog, we will get our hands dirty with creating the breakfast ordering Alexa skill.

How does Amazon Alexa actually work?

I have heard a lot of people use the name “Alexa” interchangeably for the Amazon Echo devices. As good as that is for Amazon’s marketing team, unfortunately, I have to set the record straight. Amazon Echo devices are the physical devices that Amazon sells, which interface with the Alexa Cloud. You can see the whole range at https://www.amazon.com/Amazon-Echo-And-Alexa-Devices/b?ie=UTF8&node=9818047011. These devices don’t have any smarts in them. They sit in the background listening for the “wake” command, and then they start streaming the audio to the Alexa Cloud. The Alexa Cloud is where all the smarts are located. Using speech recognition, machine learning and natural language processing, the Alexa Cloud converts the audio to text. It identifies the skill name that the user requested, the intent and any slot values it finds (these will be explained further in the next blog). The intent and slot values (if any) are passed to the identified skill. The skill uses the input and processes it using some form of compute (AWS Lambda in my case) and then passes the output back to the Alexa Cloud. The Alexa Cloud converts the skill output to Speech Synthesis Markup Language (SSML) and sends it to the Amazon Echo device. The device then converts the SSML to audio and plays it to the user.
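
To give you a feel for the skill side of that exchange, below is a minimal sketch of a Lambda handler that receives an intent from the Alexa Cloud and returns a spoken response, using the standard Alexa skill request/response JSON structure. The intent name (OrderBreakfastIntent) and slot name (food) are made-up examples, not the ones from my actual skill.

def lambda_handler(event, context):
    request = event['request']

    if request['type'] == 'LaunchRequest':
        speech = 'Welcome to the breakfast ordering skill. What would you like to order?'
    elif request['type'] == 'IntentRequest' and request['intent']['name'] == 'OrderBreakfastIntent':
        # slot values arrive already extracted by the Alexa Cloud
        food = request['intent']['slots']['food'].get('value', 'something tasty')
        speech = f'Great choice! One {food} coming right up.'
    else:
        speech = 'Sorry, I did not understand that.'

    # response structure that the Alexa Cloud turns into speech on the Echo device
    return {
        'version': '1.0',
        'response': {
            'outputSpeech': {'type': 'PlainText', 'text': speech},
            'shouldEndSession': True
        }
    }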

Below is an overview of the process.

Alexa Skills Kit diagram

Diagram is from https://developer.amazon.com/blogs/alexa/post/1c9f0651-6f67-415d-baa2-542ebc0a84cc/build-engaging-skills-what-s-inside-the-alexa-json-request

Getting things ready

Getting an Alexa enabled device

The first thing to get is an Alexa enabled device. Amazon has released quite a few different varieties of Alexa enabled devices. You can check out the whole family here.

If you are keen to try a side project, you can build your own Alexa device using a Raspberry Pi. A good guide can be found at https://www.lifehacker.com.au/2016/10/how-to-build-your-own-amazon-echo-with-a-raspberry-pi/

You can also try out EchoSim (Amazon Echo Simulator). This is a browser-based interface to Amazon Alexa. Please ensure you read the limits of EchoSim on their website; for instance, it cannot stream music.

For developing the breakfast ordering skill, I decided to purchase an Amazon Echo Dot. It’s a nice compact device, which doesn’t cost much and can run off any USB power source. For the Telstra breakfast session, I actually ran it off my portable battery pack 😉

Create an Amazon Account

Now that you have got yourself an Alexa enabled device, you will need an Amazon account to register it with. You can use one that you already have or create a new one. If you don’t have an Amazon account, you can either create one beforehand by going to https://www.amazon.com or you can create it straight from the Alexa app (the Alexa app is used to register the Amazon Echo device).

Setup your Amazon Echo Device

Use the Alexa app to setup your Amazon Echo device. When you login to the app, you will be asked for the Amazon Account credentials. As stated above, if you don’t have an Amazon account, you can create it from within the app.

Create an Alexa Developer Account

To create skills for Alexa, you need a developer account. If you don’t have one already, you can create one by going to https://developer.amazon.com/alexa. There are no costs associated with creating an Alexa developer account.

Just make sure that the username you choose for your Alexa developer account matches the username of the Amazon account to which your Amazon Echo is registered. This will enable you to test your Alexa skills on your Amazon Echo device without having to publish them on the Alexa Skills Store (the skills will show under Your Skills in the Alexa App).

Create an AWS Free Tier Account

In order to process any of the requests sent to the breakfast ordering Alexa skill, we will make use of AWS Lambda. AWS Lambda provides a cost-effective way to run code because you are only charged for the time that the code runs. There are no costs for any idle time.

If you already have an AWS account, you can use that; otherwise, you can sign up for an AWS Free Tier account by going to https://aws.amazon.com. AWS provides a lot of services for free for the first 12 months under the Free Tier, with some services continuing the free tier allowance even beyond the 12 months (AWS Lambda is one such service). For a full list of Free Tier services, visit https://aws.amazon.com/free/

High Level Architecture for the Breakfast Ordering Skill

Below is the architectural overview for the Breakfast Ordering Skill that I built. I will introduce you to the various components over the next few blogs.

Breakfast Ordering System – High Level Architecture

In the next blog, I will take you through the Alexa Developer console, where we will use the Alexa Skills Kit (ASK) to start creating our breakfast ordering skill. We will define the invocation name, intents and slot names for our Alexa Skill. Not familiar with these terms? Don’t worry, I will explain them in the next blog. I hope to see you there.

See you soon.

 

Amazon QuickSight – An elegant and easy to use business analytics tool

Introduction

Recently, I had a requirement for a tool to visualise some data I had collected. My requirements were very simple. I didn’t want something that would cost me a lot, and at the same time I wanted the reports to be elegant and informative. Most of all, I didn’t want to have to go through pages and pages of documentation to learn how to use it.

As my data was within Amazon Web Services (AWS), I thought I would check if AWS had any such offerings. Guess what, there was indeed a tool for just what I wanted, and after using it, I was amazed at how simple and elegant it is.

In this blog, I will show how you can easily get started with Amazon QuickSight. I will take you through the steps to import your data into Amazon QuickSight and then create some informative visualisations.

Some background on Amazon QuickSight

Pricing

Amazon QuickSight is very inexpensive; in fact, if you don’t have too much data, you won’t have to pay anything!

For Standard edition use, Amazon QuickSight provides 1 GB of SPICE for the first user free per month. SPICE is an acronym for Super-fast, Parallel, In-memory Calculation Engine, and it uses a combination of columnar storage, in-memory technologies enabled through the latest hardware innovations, machine code generation, and data compression to allow users to run interactive queries on large datasets and get rapid responses. SPICE is the calculation engine that Amazon QuickSight uses.

Any additional SPICE is priced at USD 0.25 per GB per month. For the latest pricing, please refer to https://aws.amazon.com/quicksight/#Pricing

Data Sources

Currently Amazon QuickSight supports the following data sources

  • Relational Data Sources
    • Amazon Athena
    • Amazon Aurora
    • Amazon Redshift
    • Amazon Redshift Spectrum
    • Amazon S3
    • Amazon S3 Analytics
    • Apache Spark 2.0 or later
    • Microsoft SQL Server 2012 or later
    • MySQL 5.1 or later
    • PostgreSQL 9.3.1 or later
    • Presto 0.167 or later
    • Snowflake
    • Teradata 14.0 or later
  • File Data Sources
    • CSV/TSV – comma separated and tab separated value text files
    • ELF/CLF – Extended and common log format files
    • JSON – Flat or semi-structured data files
    • XLSX – Microsoft Excel files

Unfortunately, currently Amazon DynamoDB is not supported as a native data source. Since my data is in Amazon DynamoDB, I had to write some custom lambda functions to export it to a csv file, so that it could be imported into Amazon QuickSight.
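
For anyone in the same situation, the sketch below shows the general shape of such an export: scan the DynamoDB table, write the items out as CSV and upload the file to the S3 bucket that QuickSight reads from. The table name, bucket name and column names are placeholders, not the ones from my setup.

import csv
import io
import boto3

dynamodb = boto3.resource('dynamodb')
s3 = boto3.client('s3')

def export_table_to_csv(event, context):
    # read the entire table (fine for small tables)
    table = dynamodb.Table('orders')            # placeholder table name
    response = table.scan()
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        items.extend(response['Items'])

    # write the items into an in-memory CSV file
    fieldnames = ['orderId', 'foodName', 'drinkName', 'orderTime']   # placeholder columns
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction='ignore')
    writer.writeheader()
    for item in items:
        writer.writerow(item)

    # upload the CSV into the S3 bucket that the QuickSight data set points at
    s3.put_object(Bucket='sample', Key='orders.csv', Body=buffer.getvalue().encode('utf-8'))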

Ok, time for that walk-through I promised earlier.  For this blog, I will be using an S3 bucket as my data source. It will contain the CSV files that I will use for analysis in Amazon QuickSight.

Step 1 – Create S3 buckets

If you haven’t already done so, create an S3 bucket that will contain the csv files. The S3 bucket does not have to be publicly accessible. Once created, upload the csv files into the S3 bucket.

In my case, the csv file is called orders.csv and its location is https://s3.amazonaws.com/sample/orders.csv (to get the URL to your S3 file, login to the S3 console and navigate to the S3 bucket that contains the file. Click the S3 bucket to open it, then click the file name to open its properties. Under Overview you will see Link. This is the URL to the file)

Step 2 – Create an Amazon QuickSight Account

Before you start using Amazon QuickSight, you must create an account. Unfortunately, I couldn’t find a way to create an Amazon QuickSight account without creating an Amazon AWS account. If you don’t have an existing Amazon AWS account, you can create an AWS Free Tier account. Once you have an AWS account, go ahead and create an Amazon QuickSight account at https://aws.amazon.com/quicksight/.

While creating your Amazon QuickSight account, you will be asked if you would like Amazon QuickSight to auto-discover your Amazon S3 buckets. Enable this and then click Choose S3 buckets. Choose the S3 bucket that you created in Step 1 above. This will give Amazon QuickSight read-only access to the S3 bucket, so that it can read the data for analysis.

Step 3 – Create a manifest file

A manifest file is a JSON file that provides the location and format of the data files to Amazon QuickSight. This is required when creating a data set for S3 data sources. Please refer to https://docs.aws.amazon.com/quicksight/latest/user/supported-manifest-file-format.html if you would like more information about manifest files.

Below is my manifest file, which I have affectionately named ordersmanifest.json.

{
   "fileLocations": [
      {
         "URIs": [
            "https://s3.amazonaws.com/sample/orders.csv"
         ]
      }
   ],
   "globalUploadSettings": {
      "format": "CSV",
      "delimiter": ",",
      "textqualifier": "'",
      "containsHeader": "true"
   }
}

Once created, upload the manifest file into the same S3 bucket as to where the csv file is stored.

Step 4 – Create a data set

  • Login to your Amazon QuickSight account. From the top right, click on Manage data
  • In the next screen, click on New data set
  • In the next screen, for Create a Data Set FROM NEW DATA SOURCES, click on S3
  • In the next screen
    • provide a name for the data source
    • for Upload a manifest file, ensure URL is selected and enter the URL to the manifest file (you can get the URL by logging into the S3 console and then clicking on the manifest file to reveal its properties. Under the Overview tab, you will see Link. This is the URL to the manifest file).
    • Click Connect
    • Amazon QuickSight will now read the manifest file and then import the csv file into SPICE. You will see the following screen.
    • Click on Edit/Preview data.
    • In the next screen, you will see the contents of the data file that was imported, along with the field names on the left. If you want to exclude any columns from the analysis, simply untick them (I unticked orderTime (S) since I didn’t need it).
    • By default, the data is called Group 1. To customise the name, replace Group 1 with text of your choice (I have renamed my data to Orders Data).
    • Click Save & visualize from the top menu

Step 5 – Create Visualisations

Now that you have imported the data into SPICE, you can start analysing it and creating visualisations.

After step 4, you should be in the Analysis section.

  • Depending on which visualisation you want, you can select the respective type under Visual types from the bottom left hand side of the screen. For my visualisations, I chose Pie Chart (side note – you will notice that orderTime (S) isn’t listed under the Fields list. This is because we had unticked it in the previous screen).
  • I want to create two Pie Charts, one to show which foodName is the most popular and another to find out which drinkName is the most popular. For the first Pie Chart, drag foodName (S) from the Fields list to the Value – Add a measure here box at the top of the screen. Then drag foodName (S) from the Fields list to the Group/Color – Add a dimension here box at the top of the screen. You will see the following.
  • You can customise the visualisation title Count of Foodname (S) by Foodname (S) by clicking it and then changing the text (I have changed the title to Popularity of Food Types).
  • If you look closely, the legend on the right hand side doesn’t serve much purpose since the pie slices are already labelled quite well. You can get rid of the legend and gain more space for your visual. To do this, click on the down arrow above FoodName (S) on the right and then select Hide legend.
  • Next, let’s create a Pie Chart visualisation for drinkName. From the top menu, click on Add and then Add visual.
  • You will now have another canvas at the bottom of the first Pie Chart. Click this new canvas area to select it (a blue border will appear to show that it is selected). From Visual types at the bottom left hand side, click on the Pie Chart visual. Then from the top, click on Field wells to expose the Value and Group/Color boxes for the second canvas.
  • From the Fields list on the left, drag drinkName (S) to the Value – Add a measure here box at the top of the screen. Then drag drinkName (S) from the Fields list to the Group/Color – Add a dimension here box at the top of the screen. You will now see the following.
  • We are almost done. I actually want the two Pie Charts to sit side by side, instead of one on top of the other. To do this, I will show you a neat trick. In each of the visuals, at the bottom right border, you will see two diagonal lines. If you move your mouse pointer over them, they change to a resizing cursor. Use this to resize the visual’s canvas area. Also, in the middle of the top border of the visual, you will see two rows of gray dots. Click your mouse pointer on this and drag to the location you want to move the visual to.
  • I have hidden the legend for the second visual, customised the title, and resized and moved both visuals side by side. Voila! Below is what I get. Not bad aye!

Step 6 – Create a dashboard

Now that the visuals have been created, they can be shared with others. This can be done by creating a dashboard. A dashboard is a read-only snapshot of the analysis. When you share the dashboard with others, they can view and filter the dashboard data, however any filters applied to a dashboard visual exist only while the user is viewing the dashboard, and aren’t saved once it is closed.

One thing to note about sharing dashboards – you can only share dashboards with users who have an Amazon QuickSight account.

Creating a dashboard is very easy.

  • In the Analysis screen, on the top right corner, click on Share and then select Create dashboard.
  • You can either replace an existing dashboard or create a new one. In our case, since we are creating a new dashboard, select Create a new dashboard as and enter a name for the dashboard. Once finished, click Create dashboard.
  • You will then be asked to enter the username or email address of those you want to share the dashboard with. Enter this and click on Share.
  • That’s it, your dashboard is now created. To access it, go to the Amazon QuickSight home screen (click on the Amazon QuickSight icon on the top left hand side of the screen) and then click on All dashboards. Those that you have shared the dashboard with will also be able to see it once they login to their Amazon QuickSight account.

Step 7 – Refreshing the Data Set

If your data set continually changes, your visualisations/dashboards will not show the updated information automatically. To show the latest information, refresh the data set. Doing this will import the new data into SPICE, which will then automatically update the analysis/visualisations and dashboards.

Note: you will have to manually reload the webpage to see the updated visualisations and dashboard

There are two ways of refreshing data sets. One is to do it manually while the other is to use a schedule. The scheduled data refresh allows for the data to be automatically refreshed at a certain time daily, weekly or monthly. A maximum of five scheduled refreshes can be configured.

The steps below show how you can manually refresh the data or create schedules to refresh the data

  • From the Amazon QuickSight main screen, click on Manage data from the top left of the screen.
  • In the next screen, you will see all your currently configured data sets. Click the Orders Data dataset (this is the one we had created previously).
  • In the next screen, you will see Refresh Now and Schedule refresh.
  • Clicking on Refresh Now will manually refresh the data. Clicking on Schedule refresh will bring up the screen where you can configure a schedule for refreshing the data automatically.

 

That’s it folks! Wasn’t that simple? If you already have an Amazon AWS account, I would strongly recommend giving Amazon QuickSight a try for all your analytics needs. Even if you don’t have an Amazon AWS account, I would still suggest getting an AWS free tier account to try it out.

Enjoy 😉