Create an Omnichannel Chatbot using Amazon Lex and Amazon Connect

Background

These days, chatbots are used pretty much everywhere. From getting quotes to resetting passwords, their use cases are endless. They also elevate the customer experience: customers no longer need to call during your helpdesk's staffed hours; instead, they can call at a time that is convenient to them. One of the biggest business benefits is that they lessen the load on your support staff.

A good practice to adhere to when deploying chatbots is to make them channel agnostic. This means that your chatbot is available via the internet and also via a phone call, and it provides the same customer experience on both. This enables your customers to choose whichever channel suits them best, without any loss of customer experience.

In this blog, I will demonstrate how you can use Amazon Connect and Amazon Lex to create an omnichannel chatbot.

I will be extending one of my previous blogs, so if you haven’t read it already, I would highly recommend that you do so before continuing. Here is the link to the blog: https://nivleshc.wordpress.com/2020/04/08/create-a-web-chatbot-for-generating-life-insurance-quotes-using-amazon-lex/

High Level Architecture Diagram

Below is the high-level architecture diagram for the solution. We will build the section inside the blue box.

For steps 1 – 5 please refer to https://nivleshc.wordpress.com/2020/04/08/create-a-web-chatbot-for-generating-life-insurance-quotes-using-amazon-lex/

For steps 6 – 8 please refer to https://nivleshc.wordpress.com/2020/05/24/publish-a-web-chatbot-frontend-using-aws-amplify-console/

Steps 9 – 12 will be created in this blog and are described below:

9. The customer calls the phone number for the chatbot. This is attached to a contact flow in Amazon Connect

10. Amazon Connect proxies the customer to the Amazon Lex chatbot (this is the web chatbot created in https://nivleshc.wordpress.com/2020/04/08/create-a-web-chatbot-for-generating-life-insurance-quotes-using-amazon-lex/ )

11. The output from the Amazon Lex chatbot is sent back to Amazon Connect.

12. Amazon Connect converts the output from the Amazon Lex chatbot into audio and then plays it to the customer.

Prerequisites

This blog assumes the following:

  • you already have an Amazon Connect instance deployed and configured.
  • you have a working Amazon Lex web chatbot

You can refer to the following blogs, if you need to deploy either of the above prerequisites

https://nivleshc.wordpress.com/2020/04/08/create-a-web-chatbot-for-generating-life-insurance-quotes-using-amazon-lex/

https://nivleshc.wordpress.com/2020/05/24/publish-a-web-chatbot-frontend-using-aws-amplify-console/

Implementation

Let’s get started.

  1. Login to your AWS Management Console, open the Amazon Connect console and change to the respective AWS region.
  2. Within the Amazon Connect console, choose the instance that will be used and click its Instance Alias.
  3. In the next screen, from the left-hand side menu, click Contact flows. Then, in the right-hand side screen, under Amazon Lex select the Region where the Amazon Lex bot resides. From the Bot drop-down list, select the name of the Amazon Lex bot. Once done, click +Add Lex Bot.
  4. Click Overview from the left-hand side menu, and then click Login URL to open the Amazon Connect administration portal. Enter your administrator credentials (currently only the following internet browsers are supported – latest three versions of Google Chrome, Mozilla Firefox ESR, Mozilla Firefox).
  5. Once logged in, from the left hand-side menu, click Routing and then Contact flows.
  6. Click Create contact flow (located on the top-right). You will now see the contact flow designer.
  7. Enter a name for the contact flow (top left where it says Enter a name)
  8. From the left-hand side menu, expand Interact and drag Get customer input to the right-hand side screen.
  9. Click on the circle to the right of Start in the Entry point block and drag the arrow to the Get customer input block. This will connect the two blocks.
  10. Click on the title of the Get customer input block to open its configuration.
  11. Select Text-to-speech or chat text. Click Enter text and in the textbox underneath, type the message you want to play to the customer when they call the chatbot.
  12. Click Amazon Lex and then under Lex bot Name select the Amazon Lex bot that you had created earlier (if the bot doesn’t show, ensure you had carried out step 3 above). Under Intents type the Amazon Lex bot intent that should be invoked. Click Save.

13. From the left-hand side menu, expand Interact and drag two Play prompt blocks to the right-hand side screen.

14. The first Play prompt block will be used to play a goodbye message, after the Amazon Lex bot has finished execution. Click on the title of this block to open its configuration. Click Text-to-speech or chat text and then click Enter text. Enter a message to be played before the call is ended. Click Save.

  15. The second Play prompt block will be used to play a message when an error occurs. Click on the title of this block to open its configuration. Click Text-to-speech or chat text and then click Enter text. Enter a message to be played when an error occurs. Click Save.

16. From the left-hand side menu, expand Terminate / Transfer and drag the Disconnect / hang up block to the right-hand side screen.

17. In the designer (right-hand side screen), in the Get customer input block, click the circle beside startConversation (this is the name of your Amazon Lex bot intent) and drag the arrow to the first Play prompt block.

18. Repeat step 17 for the circle beside Default in the Get customer input block.

19. In the Get customer input block, click the circle beside Error and drag it to the second Play prompt block.

20. Inside both the Play prompt blocks, click the circle beside Okay and drag the arrow to the Disconnect / hang up block.

21. From the top-right, click Save. This will save the work you have done.

22. From the top-right, click Publish. You will get a prompt Are you sure you want to publish this contact flow? Click Publish.

23. Once done, you should see a screen similar to the one below:

24. Next, we need to ensure that whenever someone calls, the newly created contact flow is invoked. To do this, from the left-hand side menu, click Routing and then click Phone numbers.

25. In the right-hand side, click the phone number that will be used for the chatbot. This will open its settings. Enter a description (optional) and then from the drop-down list underneath Contact flow / IVR, choose the contact flow that was created above. Click Save.

Give it a few minutes for the settings to take effect. Now, call the phone number that was assigned to the contact flow above. You should be greeted by the welcome message you had created above. The phone chatbot experience would be the same as what you experienced when interacting with the chatbot over the internet!
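By the way, if you prefer to script step 3 instead of doing it via the console, the Lex bot association can also be done with boto3. Below is a minimal sketch only; the instance ID and bot name are placeholders for your own values, and it assumes a Lex (classic/V1) bot like the one created in my previous blog.

import boto3

connect = boto3.client('connect', region_name='ap-southeast-2')

connect.associate_lex_bot(
    InstanceId='11111111-2222-3333-4444-555555555555',  # placeholder - your Amazon Connect instance ID
    LexBot={
        'Name': 'LifeInsuranceQuoteBot',  # placeholder - the name of your Amazon Lex bot
        'LexRegion': 'ap-southeast-2'     # the region where the Amazon Lex bot resides
    }
)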

Congratulations! You just created your first omnichannel chatbot! How easy was that?

Till the next time, Enjoy!

Publish A Web Chatbot Frontend Using AWS Amplify Console

Background

In my previous blog (https://nivleshc.wordpress.com/2020/04/08/create-a-web-chatbot-for-generating-life-insurance-quotes-using-amazon-lex/), I demonstrated how easy it was to create a web chatbot using Amazon Lex. As discussed in that blog, one of the challenges with Amazon Lex is not having an out-of-the-box frontend solution for the bots. This can throw a spanner in the works if you are planning on showcasing your chatbots to customers without wanting to overwhelm them with code. Luckily, with some work, you can create a frontend that exposes just the bot. I provided instructions for achieving this in the same blog.

Having a static website hosted out of an Amazon S3 bucket is good, however it does come with a few challenges. As the website gains popularity, it becomes more integral to your business. Soon, you will not be able to afford any website outages. In such situations, how do you deploy changes to the website without breaking it? How do you track the changes, to ensure they can be rolled back, if something does break? How do you ensure your end-users don’t suffer from slow webpage loads? These are some of the questions that need to be answered, as your website achieves popularity.

AWS Amplify Console provides an out-of-the-box solution for deploying static websites. The contents of the website can be hosted in a source code repository. This provides an easy solution to track changes, and to rollback, should the need arise. AWS Amplify Console serves the website using Amazon CloudFront, AWS’s Content Delivery Network. This ensures speedy page loads for end-users. These are just some of the features that make hosting a static website using AWS Amplify Console a great choice.

In this blog, I will modify my life Insurance quote web chatbot solution, by migrating its frontend from an Amazon S3 bucket to AWS Amplify Console.

High Level Architecture Diagram

Below is a high-level architecture diagram for the solution described in this blog.

Steps 1 – 5 are from my previous blog https://nivleshc.wordpress.com/2020/04/08/create-a-web-chatbot-for-generating-life-insurance-quotes-using-amazon-lex

Steps 6 – 8 will be created in this blog and are described below:

6. Website developer will push changes for the Amazon Lex web chatbot frontend to GitHub.

7. GitHub will inform AWS Amplify about the new changes.

8. AWS Amplify will retrieve the changes from GitHub, build the new web chatbot frontend and deploy it, thereby updating the previous web chatbot frontend.

Implementation

Before we continue, if you haven’t already, I would highly recommend that you read my Amazon Lex Life Insurance quote generating web chatbot blog ( https://nivleshc.wordpress.com/2020/04/08/create-a-web-chatbot-for-generating-life-insurance-quotes-using-amazon-lex)

Let’s get started.

  1. Login to your AWS Management Console, open the AWS Amplify Console and change to your AWS Region of choice.
  2. Click on Connect app.
  3. Next, choose the location where the source code is hosted. As I have stored the web chatbot frontend files in GitHub, I chose GitHub. Note that the repository should contain only the frontend files. Click Continue.
  4. Unless you had previously authorised AWS Amplify Console access to your source code repository, a screen will pop-up requesting access to your source code repository. Click Authorize aws-amplify-console.
  5. Once successfully authorised, you will be returned to the AWS Amplify Console. Using the dropdown menu beside Select a repository select the repository that contains the frontend code.
  6. Next, choose the Branch to use and then click Next.
  7. The next screen shows the Configure build settings page. AWS Amplify Console will inspect the source code and deduce the appropriate App build and test settings. If what is shown is incorrect, or you would like to modify it, you can use the Edit button.

    In my case, I did not find anything needed changing from what AWS Amplify Console had provided.

    You can change the App name, if this needs to be different from what is automatically provided.

    Click Next.

  8. In the next screen, review all the settings. Once confirmed, click Save and deploy.
  9. AWS Amplify Console will start creating the application. You will be redirected to the application’s configuration page, where on the right, a continuous deployment pipeline, similar to the one below, will be shown.

  10. Wait for all stages of the pipeline to complete successfully and then click on the url on the left (the one similar to https://master…amplifyapp.com). The page that opens next is the Insurance Chatbot Frontend, served by AWS Amplify Console! (below is how the web chatbot looks)

  11. Now, whenever the frontend files are modified and pushed into the master branch of the source code repository (GitHub in this case), AWS Amplify Console will automatically update the Insurance Chatbot frontend, with all changes easily trackable from within GitHub.
  12. You can use a custom domain name, to make the application URL more personalised (by default, AWS Amplify Console applications are allocated the amplifyapp.com domain urls). To do this, from your application’s configuration page, click Domain management in the left-hand side menu. Then click add domain and follow the instructions.
  13. You might also benefit from email notifications whenever AWS Amplify Console updates your application. To configure this, from your application’s configuration page, click Notifications in the left-hand side menu. Then click Add notification and add an email address to receive notifications for successful and failed builds.
  14. To view site access logs, from your application's configuration page, click Access logs in the left-hand side menu.
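As a side note, the builds described in step 11 are triggered by GitHub webhooks; however, you can also kick one off yourself using boto3. The sketch below is only an example of that idea, with the app id and branch name being placeholders for your own values.

import boto3

amplify = boto3.client('amplify', region_name='ap-southeast-2')

response = amplify.start_job(
    appId='d1a2b3c4d5e6f7',  # placeholder - your Amplify app id
    branchName='master',     # the branch that was connected earlier
    jobType='RELEASE'        # build and deploy the latest commit on the branch
)
print('Started job:', response['jobSummary']['jobId'])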

There you go. Hopefully this provides valuable information for those looking for an easy solution to deploy their static websites in a consistent, auditable and highly available manner.

Till the next time, Enjoy!

Using Serverless Framework and AWS to map Near-Realtime Positions of Trains

Background

A couple of months back, I found out about the Open Data initiative from Transport for New South Wales. This is an awesome undertaking, to provide data to developers and other interested parties, so that they can develop great applications. For those interested, the Open Data hub can be accessed at https://opendata.transport.nsw.gov.au.

I had been playing with data for a few months and, when I looked through the various APIs that I could access via the Open Data Hub, I became extremely interested.

In this blog, I will take you through one of my mini projects based on the data from the Open Data Hub. I will be using the Public Transport – Vehicle Positions API to plot the near-realtime positions of Sydney trains on a map. The API provides access to more than just train position data; however, to keep things simple, I will concentrate only on trains in this blog.

Solution Architecture

As I am a huge fan of serverless, I decided to architect my solution with as many serverless components as possible. The diagram below shows a high-level architecture of how the Transport Positioning System (this is what I will call my solution, TPS for short) will be created.

Let’s go through the steps (as marked in the diagram above)

  1. The lambda function will query the Open Data API every 5 minutes for the position of all trains
  2. After the data has been received, the lambda function will go through each record and assign a label to each train. To ensure the labels are consistent across each lambda invocation, the train to label association will be stored in an Amazon DynamoDB table. The lambda function will query the table to check if a train has already been allocated a label. If it has, then that label will be used. Otherwise, a new label will be created, and the Amazon DynamoDB table will be updated to store this new train to label association.
  3. I found Bing Maps to be much easier (and cheaper) to use for plotting items on a map. The only disadvantage is that it can show, at most, 100 points (called pushpins) on a map. The lambda function will go through the first 100 items returned from the Open Data API and, using the labels that were found or created in step 2, build a pushpin URL that will be used to generate a map of the locations of the first 100 trains. This URL will then be used to generate the map by sending a request to Bing Maps (see the short sketch just after this list).
  4. The lambda function will then create a static webpage that displays the map showing the position of the trains, along with a key that provides more information about the label used for each train (for example, label 1 could have a description of “19:10 Central Station to Penrith Station”). The label description is set to the train’s label, which is obtained from the results of the Open Data API query.
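To make the pushpin URL mentioned in step 3 concrete, here is a purely illustrative Python sketch of the request that gets sent to Bing Maps. The coordinates, icon style and key placeholder are made up; the actual script, shown later in this post, assembles this string from the live Open Data feed.

base_url = "https://dev.virtualearth.net/REST/v1/Imagery/Map/Road?mapsize=20000,20000&"

trains = [
    (-33.8688, 151.2093, "1"),  # (latitude, longitude, pushpin label) - made-up positions
    (-33.7510, 150.6940, "2"),
]

pushpin_url = ""
for latitude, longitude, label in trains:
    pushpin_url += "pp={},{};{};{}&".format(latitude, longitude, "4", label)  # icon style 4 = active vehicle

map_request_url = base_url + pushpin_url + "format=jpeg&dcl=1&key=<insert your bing maps api key here>"
print(map_request_url)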

This is what I cooked up earlier

Be warned! This blog is quite lengthy, as a lot was done to make this solution work. However, if you would rather see the final result before delving into the details, check it out at https://sls-tps-website-dev.s3-ap-southeast-2.amazonaws.com/vehiclelocation.html.

Screenshot of TPS

Above is a screenshot of TPS in action. It is a static webpage that is being generated every 5 minutes, showing the position of trains. I will keep the lambda function running for at least three months, so you have plenty of time available to check it out.

Okay, let’s get our hands dirty and start coding.

Prerequisites

Before we start, the following must be in place

  1. You must have an AWS account. If you don’t have one already, you can sign up for a free tier at https://aws.amazon.com/free/
  2. You must have Serverless Framework installed on your computer. If you don’t have it, follow the details at https://serverless.com/framework/docs/getting-started/
  3. Setup the AWS access keys and secret access key that Serverless Framework will use to provision resources into your AWS account. Instructions to get this done can be obtained from https://serverless.com/framework/docs/providers/aws/guide/credentials/
  4. After items 1 – 3 have been completed, create a python runtime Serverless Framework service (my service is called tps)
  5. Within the tps service folder, install the following Serverless Framework plugins
    1. serverless-python-requirements – this plugin adds all the required python modules into a zip file containing our lambda function script, which then gets uploaded to AWS (the required python modules must be defined in the requirements.txt file)
    2. serverless-prune-plugin – this plugin ensures that only the specified number of lambda function versions exist.
  6. Serverless-python-requirements requires a file called requirements.txt. For this solution, create a file called requirements.txt at the root of the service folder and put the following lines inside it
    requests==2.22.0
    gtfs-realtime-bindings==0.0.6
  7. Register an account with Transport for New South Wales Open Data Hub (https://opendata.transport.nsw.gov.au/). This is free. Once registered, login to the Open Data Hub portal and then under My Account, click on Applications and then create an Application that has permissions to Public Transport – Realtime Vehicle Positions API. Note down the API key that is generated as it will be used by the python script later.
  8. Create an account with Bing Maps, using the Website licence plan. This provides 125,000 billable transactions for generating maps per calendar year at no charge. For generating a map every 5 minutes, this is more than enough. Note down the API key that is provided. Details on creating a Bing Maps account are available at https://www.microsoft.com/en-us/maps/licensing/options

This project has two parts to it. The first is to create the AWS resources that will host our project. For this, we will be using Serverless Framework to create our AWS Lambda function, Amazon Simple Storage Service (S3) bucket, Amazon DynamoDB table, Amazon CloudWatch Logs log group and Amazon CloudWatch Events schedule.

The second part is the API query and data ingestion from the Open Data API. The next sections will cover each of these parts.

AWS Resource Creation

As previously mentioned, we will use Serverless Framework to create our AWS resources. Serverless Framework uses serverless.yml to specify which resources need to be created. This file is created by default whenever a Serverless Framework service is created.

In this section, I will take you through the serverless.yml file I used for this project.

The file starts like this.

service: sls-${self:custom.application}

plugins:
  - serverless-python-requirements
  - serverless-prune-plugin

custom:
  pythonRequirements:
    dockerizePip: true
    slim: true
    noDeploy:
      - typing
  stage: ${opt:stage, self:custom.defaults.stage}
  region: ${opt:region, self:custom.defaults.region}
  application: ${env:APPLICATION, self:custom.defaults.application}
  logRetentionInDays: ${opt:logretentionindays, env:logretentionindays, self:custom.defaults.logretentionindays}
  s3:
    websiteBucket: sls-${self:custom.application}-website-${self:custom.stage}
  dynamodb:
    transportPositionTableName: transportPosition
    billingMode: PAY_PER_REQUEST
  defaults:
    application: tps
    stage: dev
    region: ap-southeast-2
    logretentionindays: 14
  prune:
    automatic: true
    number: 1

It defines the service name, plugins and variables that will be used in this file (notice the plugins serverless-python-requirements and serverless-prune-plugin)

The following default values have been configured in the above serverless.yml file.

  • the application name is set to tps
  • the environment (stage) is set to dev
  • Amazon CloudWatch Logs retention is set to 14 days
  • the AWS region is set to ap-southeast-2 (Sydney)
  • the Amazon DynamoDB table name is set to transportPosition
  • the billing mode for the Amazon DynamoDB table is set to PAY_PER_REQUEST.

The next section defines the details for the cloud provider where resources will be provisioned.

provider:
  name: aws
  runtime: python3.7
  endpointType: regional
  stage: ${self:custom.stage}
  region: ${self:custom.region}
  memorySize: 256
  timeout: 300
  versionFunctions: false
  deploymentBucket: sls-${self:custom.application}-deploymentbucket
  logRetentionInDays: ${self:custom.logRetentionInDays}
  environment:
    STAGE: ${self:custom.stage}
    REGION: ${self:custom.region}

As previously mentioned, I am using AWS as the cloud provider. The lambda function will use the Python 3.7 runtime. The deployment bucket that hosts all artefacts is also defined. This is an Amazon Simple Storage Service (S3) bucket. Ensure that this S3 bucket exists before deploying the serverless service.
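If the deployment bucket doesn't exist yet, a few lines of boto3 will create it. This is just a sketch; the bucket name below matches sls-${self:custom.application}-deploymentbucket with application set to tps, and since S3 bucket names are globally unique you might have to pick a different name (and update serverless.yml to match).

import boto3

s3_client = boto3.client('s3', region_name='ap-southeast-2')

s3_client.create_bucket(
    Bucket='sls-tps-deploymentbucket',  # must match the deploymentBucket value in serverless.yml
    CreateBucketConfiguration={'LocationConstraint': 'ap-southeast-2'}
)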

The next section defines the IAM role that will be created for the lambda function.

# define IAM roles
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:HeadObject
        - s3:GetObject
        - s3:GetObjectAcl
        - s3:PutObject
        - s3:PutObjectAcl
        - s3:CreateMultiPartUpload
      Resource:
        - arn:aws:s3:::${self:custom.s3.websiteBucket}/*
    - Effect: Allow
      Action:
        - dynamodb:*
      Resource:
        - !GetAtt transportPositionTable.Arn
        - !Join
          - ''
          - - !GetAtt transportPositionTable.Arn
            - '/index/TripDate-GSI'

The specified IAM role will allow the lambda function to carry out all required operations on the Amazon DynamoDB table (currently the IAM role permits all DynamoDB actions, however if required, this can be tightened) and to upload objects to the Amazon S3 bucket that will host the static website.

The next section provides instructions on what to include and exclude when creating the serverless package.

# you can add packaging information here
package:
  include:
    - src/csv/*.csv
  exclude:
    - __cache__/**
    - __pycache__/**
    - events/**
    - node_modules/**
    - 'conf/**/*'
    - 'gulp-libs/**/*'
    - 'yarn.lock'
    - 'package.json'
    - '.dotenv'

Now we come to the important sections within serverless.yml. The next section defines the lambda function, its handler and the events that will trigger it.

functions:
  tpsFindVehiclePositionsCron:
    handler: src.tps_vehiclepos.run
    memorySize: 3008
    timeout: 900
    environment:
      BUCKET: ${self:custom.s3.websiteBucket}
      TRANSPORTPOSITION_TABLE: ${self:custom.dynamodb.transportPositionTableName}
    events:
      - schedule:
          rate: cron(0/5 * * * ? *)
          enabled: true

For the tps lambda function, the handler is src.tps_vehiclepos.run (this means that there is a subfolder within the service folder called src, containing a python file called tps_vehiclepos.py, and inside this file is a function called run). The lambda function will run every 5 minutes. To achieve this, I am using a scheduled Amazon CloudWatch Events rule. Two environment variables are also passed to the lambda function (BUCKET and TRANSPORTPOSITION_TABLE).

The last section defines all the resources that will be created by the Serverless Framework.

resources:
  Description: Transport Positioning System Resources
  Resources:
    websiteBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${self:custom.s3.websiteBucket}
    # DynamoDB will be used to store the label that each transport will be given.
    # The label will be used as the pushpin label on the map.
    # DynamoDB will have attributes TripDate, TripId, VehicleId, ExpirationTime (TTL),
    # TimeAdded, Latitude, Longitude, PushpinLabel, PushPinLabelDescr
    transportPositionTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: ${self:custom.dynamodb.transportPositionTableName}
        AttributeDefinitions:
          - AttributeName: TripDate
            AttributeType: S
          - AttributeName: VehicleId
            AttributeType: S
        KeySchema:
          - AttributeName: TripDate
            KeyType: HASH
          - AttributeName: VehicleId
            KeyType: RANGE
        BillingMode: ${self:custom.dynamodb.billingMode}
        GlobalSecondaryIndexes:
          - IndexName: TripDate-GSI
            KeySchema:
              - AttributeName: TripDate
                KeyType: HASH
            Projection:
              ProjectionType: ALL
        TimeToLiveSpecification:
          AttributeName: ExpirationTime
          Enabled: true

The following resources will be created

  • Amazon S3 bucket. This bucket will store the Bing Maps that show the train positions. It will also serve the website which will be used to display the position maps.
  • Amazon DynamoDB table. The table will be used to store details for trains found in the Open Data API query. Note that we are also using the DynamoDB TTL feature. Since we don’t need to retain items older than a day, this will allow us to easily prune the Amazon DynamoDB table, to reduce costs.
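If you want to confirm that the TTL configuration defined in serverless.yml has been applied after deployment, a quick boto3 check (purely a verification sketch, not part of the solution) looks like this.

import boto3

dynamodb_client = boto3.client('dynamodb', region_name='ap-southeast-2')

response = dynamodb_client.describe_time_to_live(TableName='transportPosition')
print(response['TimeToLiveDescription'])  # expect AttributeName 'ExpirationTime' with status ENABLED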

The full serverless.yml file can be downloaded from https://gist.github.com/nivleshc/951ff2f235a89d58d4abec6de91ef738

Generating the map

In this section we will go through the script that does all the magic, which is querying the Open Data API, processing the data, generating the map and then displaying the results in a webpage.

The script is called tps_vehiclepos.py. I will discuss the script in parts below.

The first section lists the python modules that will be imported. It also defines the variables that will be used within the script.

#this script will query OpenData API to get the current positions for Sydney Trains. It will then use this information to plot the
#positions on a map
from google.transit import gtfs_realtime_pb2
import requests
import datetime
import boto3
import os
import time
from pytz import timezone
from decimal import Decimal
#list all variables --start
baseURL = 'https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/' # Define URL for opendata API
headers = {'Authorization': 'apikey <insert your opendata api key here>'} # Define header for opendata API
bucketName = os.environ['BUCKET'] # Set up bucket from environment variable
transportPositionTable = os.environ['TRANSPORTPOSITION_TABLE'] #dynamodb table for pushpin labels
bucketImagesKey = "images/" # set bucket key for images
HtmlLandingpagename = "vehiclelocation.html" #this is the landing html page that will load the image file to show the current position of vehicles
#Define operators for opendata API. Possible operators are ['sydneytrains', 'buses', 'ferries', 'lightrail', 'nswtrains', 'regionbuses', 'metro']
operators = ['sydneytrains']
#we will use bing maps to plot the locations
maps_bing_prefix = "https://dev.virtualearth.net/REST/v1/Imagery/Map/Road?mapsize=20000,20000&"
maps_bing_apikey = "format=jpeg&dcl=1&key=<insert your bing maps api key here>"
maps_error_imageURL = "https://image.shutterstock.com/image-photo/image-white-text-message-will-600w-1012395526.jpg" #show this image when there are errors
#generating map
maps_bing_pushpin_limit = 100 #this is the maximum pushpins that bing maps allows
inactive_vehicle_datetime_limit = 10 #(in minutes) this will be used to identify inactive vehicles. If a vehicle's
#last stop id timestamp is at least inactive_vehicle_datetime_limit minutes old,
# then it is considered to be inactive
maps_active_vehicle_icon = '4' #this is the icon that will be used to show active vehicles on the map
maps_inactive_vehicle_icon = '10' #this is the icon that will be used to show inactive vehicles on the map
legend_vehicle_displayed_on_map_color = "#00bfff" #this color will be used to show legend entries for which the vehicle is displayed on map
lastpushpinLabel = 0 #this will be used to generate labels for pushpins. Each pushpin will have a number, which will be associated with a description.
#the association will be displayed under a legends section on the right hand side of the web page showing the map.
#the vehicle's label will be used as the description
#the dicts below allow generating labels for the pushpins that will be mapped. Each pushpin will have a label and a description.
dict_pushpinLabel_to_desc_map = {} #this dict will associate a pushpin label to its description.
dict_vehicleId_to_pushpinLabel_map = {} #this dict will associate a vehicleId to the pushpin label it has been associated to on the map.
pushpin_url = "" #this is the url for the map with the pushpins containing the vehicle positions
total_vehicles_found = 0 #this is the total vehicles found, mapped or not
dynamodb_item_expiry_in_days = 2 #number of days after which items in dynamodb will be automatically deleted. Using DynamoDB TTL
tz_sydney = timezone('Australia/Sydney') # Set up timezone
#list all variables --end
s3 = boto3.resource('s3') # Define boto3 object to do s3 operations
dynamodb = boto3.resource('dynamodb') # Define boto3 object to do dynamodb operations
pushpinLabelsTable = dynamodb.Table(transportPositionTable)
# Initialize gtfs_realtime_pb2.FeedMessage
feed = gtfs_realtime_pb2.FeedMessage()

Remember to replace <insert your opendata api key here> and <insert your bing maps api key here> with your own Open Data and Bing Maps API keys. Do not include the ‘<‘ and ‘>’ characters in the script.
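If you would rather not hardcode the keys in the script, one option (my suggestion, not part of the original script) is to read them from environment variables, which can be passed in via the environment section of serverless.yml. The variable names below are examples only.

import os

headers = {'Authorization': 'apikey ' + os.environ['OPENDATA_API_KEY']}        # assumes OPENDATA_API_KEY has been set
maps_bing_apikey = "format=jpeg&dcl=1&key=" + os.environ['BING_MAPS_API_KEY']  # assumes BING_MAPS_API_KEY has been set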

Next, I will take you through the various functions that have been created to carry out specific tasks.

First up is the initialise global variables function. As you might be aware, AWS Lambda execution environments can get reused across invocations. During my experiments I found this happening, and the issue was that my global variables were not being automatically re-initialised, which caused erroneous results. To circumvent this, I wrote an explicit function that initialises all global variables at the beginning of each lambda invocation.

#most of the time, lambda functions get re-used by AWS for consecutive runs. To ensure the global variables are sanitised and do not
#cause issues because they contain values from the last run, we will initialise them
def initialise_global_variables():
    global lastpushpinLabel
    global dict_pushpinLabel_to_desc_map
    global dict_vehicleId_to_pushpinLabel_map
    global pushpin_url
    global total_vehicles_found

    #initialise the values for the above global variables
    print("Initialising global variables")
    lastpushpinLabel = 0
    dict_pushpinLabel_to_desc_map = {}
    dict_vehicleId_to_pushpinLabel_map = {}
    pushpin_url = ""
    total_vehicles_found = 0

When placing each train location on the map (this is done using pushpins), a label will be used to identify each pushpin. Unfortunately, Bing Maps doesn’t allow more than three characters per label, which is not enough for a meaningful label. The solution I devised was to use a consecutive numbering scheme for the labels and then provide a key on the website page, giving a description for each label. One complication with this approach is that I need to ensure the same label is used for the same train across lambda invocations. This is achieved by storing the label-to-train mapping in the Amazon DynamoDB table.
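To make the mapping concrete, here is an illustrative example (with made-up values) of the two in-memory dictionaries the script maintains and persists to the Amazon DynamoDB table.

# illustrative only - the values below are made up
dict_vehicleId_to_pushpinLabel_map = {
    "A123": {"label": "1", "desc": "19:10 Central Station to Penrith Station"}
}

dict_pushpinLabel_to_desc_map = {
    "1": {
        "desc": "19:10 Central Station to Penrith Station",
        "vehicleId": "A123",
        "inDynamodbTable": True,
        "latitude": "-33.8688",
        "longitude": "151.2093",
        "tripId": "trip-001"
    }
}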

The next function downloads all label to train associations stored in Amazon DynamoDB. This ensures that labels remain consistent across lambda invocations.

#read the contents of pushpinLabelsTable and populate pushpinLabels_LabelNum and pushpinLabels_Label dict
def load_pushpinLabels_from_dynamodb(today):
    global dict_pushpinLabel_to_desc_map
    global dict_vehicleId_to_pushpinLabel_map
    global lastpushpinLabel

    rdate = today.strftime("%Y-%m-%d")
    pushpinLabelsTableContents = pushpinLabelsTable.query(
        IndexName='TripDate-GSI',
        KeyConditionExpression='TripDate = :tripdate',
        ExpressionAttributeValues={":tripdate": rdate}
    )

    #check to see if there were any items returned from the above query
    if len(pushpinLabelsTableContents['Items']) > 0:
        #there were some items returned
        #the vehicle's label could be the same across different vehicles however its id is unique for the day. For this reason
        #we must store the vehicle id along with the pushpinLabel and pushpinLabelDesc
        for item in pushpinLabelsTableContents['Items']:
            pushpinLabel = item['pushpinLabel']
            pushpinLabelDesc = item['pushpinLabelDesc']
            vehicleId = item['VehicleId']
            inDynamodbTable = True #we will mark all entries that are being populated from dynamodb, so that when we update dynamodb
                                   #we only write items that are missing, not everything. This saves on write capacity units

            #the following fields were added progressively to dynamodb. Ensure that missing items don't break the code. Just add blanks if missing
            try:
                latitude = item['latitude']
            except:
                latitude = ""
            try:
                longitude = item['longitude']
            except:
                longitude = ""
            try:
                tripId = item['tripId']
            except:
                tripId = ""

            dict_pushpinLabel_to_desc_map[str(pushpinLabel)] = {"desc": str(pushpinLabelDesc), "vehicleId": str(vehicleId), "inDynamodbTable": inDynamodbTable, "latitude": latitude, "longitude": longitude, "tripId": tripId}
            dict_vehicleId_to_pushpinLabel_map[str(vehicleId)] = {"label": str(pushpinLabel), "desc": str(pushpinLabelDesc)}

            #if a higher pushpinLabel was found in pushpinLabels table, then set the lastpushpinLabel to this
            if int(pushpinLabel) > int(lastpushpinLabel):
                lastpushpinLabel = int(pushpinLabel)
    #else there were no results for the above query, which means there were no entries in the dynamodb table for today. Let's start fresh

    print("load_pushpinLabels_from_dynamodb: Loaded ", len(pushpinLabelsTableContents['Items']), "pushpinLabels from dynamodb table. lastpushpinLabel:", lastpushpinLabel)

The next function just queries Open Data API for the train positions.

def callOpenData(operator, feed):
    print("callOpenData:Obtaining vehicle location for", operator, end="...")
    response = requests.get(baseURL + operator, headers=headers)
    feed.ParseFromString(response.content)
    print("done")

The following function takes the data from the above function and processes it.

def process_feed(feed, today):
    global dict_pushpinLabel_to_desc_map
    global dict_vehicleId_to_pushpinLabel_map
    global lastpushpinLabel
    global pushpin_url
    global total_vehicles_found

    #we will use the python module 'time' to convert epoch time (this is what gtfsr timestamps are in) to local time
    #set the timezone for time
    os.environ['TZ'] = 'AEST-10AEDT-11,M10.5.0,M3.5.0'
    time.tzset()
    print(f'timezone set to {time.tzname}')

    num_pushpin_assigned = 0
    total_feed_entity = len(feed.entity)
    total_vehicles_found += total_feed_entity
    print('Total feed.entity:', total_feed_entity, " Total Vehicles Found:", total_vehicles_found)

    #As we are using Bing maps, there is a limit to the number of pushpins we can specify for our map. If we
    #reach this limit, we will break out of the loop below as there is no benefit in continuing processing
    for entity in feed.entity:
        if num_pushpin_assigned >= maps_bing_pushpin_limit:
            break #we have exceeded the number of pushpins that can be used with Bing Maps. Exit loop

        tripupdatetimestamp_autz = time.ctime(entity.vehicle.timestamp)
        vehicleId = entity.vehicle.vehicle.id

        #we need to make sure that the fields used for dynamodb keys are not null. If they are, then skip this record. Currently these are TripDate, which is
        #today's date, and vehicleId. So just check vehicleId for being not null and TripDate won't be null.
        if (vehicleId): #only go ahead if vehicleId is present. Otherwise just print that vehicleId for this record is missing
            #lets find out if this vehicle already has a pushpinLabel assigned for today
            if vehicleId in dict_vehicleId_to_pushpinLabel_map.keys():
                #this vehicle already has a pushpinLabel. Get the label
                pushpinLabel = dict_vehicleId_to_pushpinLabel_map[vehicleId]['label']
            else:
                #this vehicle doesn't have any pushpinLabel already assigned. Generate a new pushpinLabel for it
                lastpushpinLabel += 1 #increment the lastpushpinLabel so that it now points to a new number
                pushpinLabel = lastpushpinLabel
                tripId = entity.vehicle.trip.trip_id

                #the pushpinLabelDesc will be set to the vehicle's label. There have been instances where I noticed the vehicle's label is missing/null.
                #In these cases, set the pushpinLabelDesc to TripId
                if not entity.vehicle.vehicle.label:
                    pushpinLabelDesc = tripId
                else:
                    pushpinLabelDesc = entity.vehicle.vehicle.label

                #since this vehicle had not been previously assigned a pushpinLabel for today, add its details to the two dicts
                inDynamodbTable = False #this item has not been read from or written to dynamodb yet
                latitude = entity.vehicle.position.latitude
                longitude = entity.vehicle.position.longitude
                dict_pushpinLabel_to_desc_map[str(pushpinLabel)] = {"desc": str(pushpinLabelDesc), "vehicleId": str(vehicleId), "inDynamodbTable": inDynamodbTable, "latitude": latitude, "longitude": longitude, "tripId": tripId}
                dict_vehicleId_to_pushpinLabel_map[str(vehicleId)] = {"label": str(pushpinLabel), "desc": str(pushpinLabelDesc)}

            #this vehicle will be displayed on the map. Update dict_pushpinLabel_to_desc_map for this vehicle's entry so that when the legend is
            #generated, it will be coloured differently to show that it is currently displayed on the map
            dict_pushpinLabel_to_desc_map[str(pushpinLabel)]['isDisplayedOnMap'] = True

            #add this vehicle's details to the pushpin url
            pushpin_url += "pp=" + str(entity.vehicle.position.latitude) + "," + str(entity.vehicle.position.longitude)

            #based on how long ago the laststopid timestamp is, calculate if the vehicle is active or inactive and assign the respective icon
            if (datetime.datetime.now() - datetime.datetime.strptime(tripupdatetimestamp_autz, '%a %b %d %H:%M:%S %Y')) > datetime.timedelta(minutes=inactive_vehicle_datetime_limit):
                #this vehicle is inactive
                pushpin_url += ";" + maps_inactive_vehicle_icon + ";"
            else:
                pushpin_url += ";" + maps_active_vehicle_icon + ";"
            pushpin_url += str(pushpinLabel) + "&"

            num_pushpin_assigned += 1 #increment the counter that denotes the number of pushpins added to the pushpin url
        else:
            print("process_feed:MissingVehicleId:VehicleWillBeSkipped:TripId:", entity.vehicle.trip.trip_id, " VehicleLabel:", entity.vehicle.vehicle.label, " Latitude: ", entity.vehicle.position.latitude, " Longitude: ", entity.vehicle.position.longitude)

The function goes through each vehicle record that was returned and checks whether the vehicle already has a label associated with it. If there is one, that label is used; otherwise a new one is created and the Amazon DynamoDB table is updated with this newly created label. The function uses the first 100 trains returned by the Open Data API to generate a pushpin URL, which will be used to request the Bing Maps image showing the positions of the trains.

The following function uses the pushpin URL to create a map using Bing Maps.

def generate_vehicle_position_webpage(map_url, body, s3Bucket, s3ImageKey, imagefilename, mainHtmlfilename, today):
    global dict_pushpinLabel_to_desc_map
    global dict_vehicleId_to_pushpinLabel_map
    global lastpushpinLabel
    global total_vehicles_found

    #get a handle on the s3 bucket
    s3 = boto3.resource('s3')
    imgObject = s3.Object(s3Bucket, s3ImageKey + imagefilename)

    map_webrequest = requests.post(map_url, data=body)

    #check if the maps were successfully obtained.
    if map_webrequest.status_code != 200:
        #there was an error. show the error
        print("generate_vehicle_position_webpage:Bing Map request failed")
        print("generate_vehicle_position_webpage:Reason:", map_webrequest.reason)
        print("generate_vehicle_position_webpage:ErrorMessage:", map_webrequest.text)
        print("generate_vehicle_position_webpage:RequestBody:", body)
        #as there has been an error generating the map, display an image denoting that there has been an error
        map_webrequest = requests.get(maps_error_imageURL)
    else:
        print("generate_vehicle_position_webpage:Bing Map request was successful. Status code:", map_webrequest.status_code)

    #upload the image file to S3 bucket, set it for public read and ensure content-type is image/jpeg
    print("generate_vehicle_position_webpage:Uploading map to s3 bucket")
    upload_img_result = imgObject.put(Body=map_webrequest.content, ACL='public-read', ContentType='image/jpeg')

    print("generate_vehicle_position_webpage:Generating landing html page:", mainHtmlfilename, end="...")

    #Generate the label description using all the pushpin labels for the day. The first entry will be the header field
    pushpinLabels = "<li>Label - Description <font color=" + legend_vehicle_displayed_on_map_color + ">[blue label descriptions show currently displayed vehicles]</font></li>"
    for index in range(1, lastpushpinLabel + 1):
        pushpinLabel_desc = dict_pushpinLabel_to_desc_map[str(index)]['desc']

        #the isDisplayedOnMap attribute was added later to DynamoDB so there might be items that don't have it. This ensures that a call to get this
        #attribute will not break the program
        try:
            isDisplayedOnMap = dict_pushpinLabel_to_desc_map[str(index)]['isDisplayedOnMap']
        except:
            isDisplayedOnMap = False

        #for all vehicles currently displayed on map, show their label description in a different color. This makes it easy to differentiate between a vehicle
        #that is currently displayed and one that was previously displayed today however it is now not mapped.
        if isDisplayedOnMap:
            pushpinLabels += "<font color=" + legend_vehicle_displayed_on_map_color + "><li>" + str(index) + " - " + str(pushpinLabel_desc) + "</li></font>"
        else:
            pushpinLabels += "<li>" + str(index) + " - " + str(pushpinLabel_desc) + "</li>"

    htmlObject = s3.Object(s3Bucket, mainHtmlfilename)
    htmlContent = """<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body {
  font-family: Arial;
  color: white;
}
.split {
  height: 100%;
  width: 50%;
  position: fixed;
  z-index: 1;
  top: 0;
  overflow-x: hidden;
  padding-top: 20px;
}
.left {
  left: 0;
  width: 70%;
  background-color: #111;
}
.right {
  right: 0;
  width: 30%;
  background-color: black;
}
.centered {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
  text-align: center;
}
</style>
<script>
<!--
function timedRefresh(timeoutPeriod) {
  setTimeout("location.reload(true);", timeoutPeriod);
}
window.onload = timedRefresh(60000);
// -->
</script>
</head>"""
    htmlContent += """
<body>
<div class="split left">
  <div class="centered">
    <img src={} alt="Vehicle Location Map" width="1250" height="1000">
    <p> image source {}</p>
  </div>
</div>
<div class="split right">
  <p>Map last updated at {} [updated every 5 min].
  <br>Total vehicles found {} (at most, only the first 100 will be mapped)
  <br>
  <br>Icon Description
  <br>Blue = active vehicles (location reported within last {} minutes)
  <br>Red = inactive vehicles (location last reported at least {} minutes ago)
  </p>
  <ul style="list-style-type:square;">""".format(s3ImageKey + imagefilename, imagefilename, imagefilename[:-4], total_vehicles_found, inactive_vehicle_datetime_limit, inactive_vehicle_datetime_limit)
    htmlContent += pushpinLabels
    htmlContent += """
  </ul>
</div>
</body>
</html>"""
    upload_html_result = htmlObject.put(Body=htmlContent, ACL='public-read', ContentType='text/html')
    print("done")
    print("generate_vehicle_position_webpage:Uploaded map:", imagefilename, " UploadResult:", upload_img_result)
    print("generate_vehicle_position_webpage:Uploaded htmlfile:", mainHtmlfilename, " UploadResult:", upload_html_result)

The function provides the pushpin URL to Bing Maps, which returns a map showing the positions of the trains. The map is then uploaded to the Amazon S3 bucket that serves the website. The function then generates a description page showing the description for each label. As you can imagine, hundreds of labels are created each day. Not all of these labels will be displayed on the map, however they will all be listed in the key area. To provide quick access to the key descriptions for trains that are currently displayed on the map, the corresponding entries are shown in blue. This ensures that people don’t go on a wild goose chase, trying to locate a train on the map which might not be displayed.

The next function updates the transport position Amazon DynamoDB table with any new labels that were created during this invocation. This ensures that the labels persist for all subsequent lambda invocations.

def update_pushpinLabelsTable(tripdate):
    global lastpushpinLabel

    #update the pushpinLabels dynamodb table with all the pushpinLabels that were created in this invocation
    print("update_pushpinLabelsTable:Uploading new pushpinLabels to dynamodb Table")
    time_now = datetime.datetime.now(tz_sydney) #each item uploaded to dynamodb table will have the time it was inserted
    time_now_str = time_now.strftime("%Y-%m-%d %H:%M:%S")
    epoch_time_now = time_now.timestamp()
    expirationTime = int(epoch_time_now + (dynamodb_item_expiry_in_days * 24 * 3600)) #convert expiry days to seconds
    print("update_pushpinLabelsTable:time_now:", time_now_str, " epoch_time_now:", epoch_time_now, " expirationTime:", expirationTime)

    num_items_added_to_dynamodb = 0
    for index in range(1, lastpushpinLabel + 1):
        pushpinLabel = index
        pushpinLabelDesc = dict_pushpinLabel_to_desc_map[str(index)]['desc']
        vehicleId = dict_pushpinLabel_to_desc_map[str(index)]['vehicleId']
        inDynamodbTable = dict_pushpinLabel_to_desc_map[str(index)]['inDynamodbTable']
        latitude = dict_pushpinLabel_to_desc_map[str(index)]['latitude']
        longitude = dict_pushpinLabel_to_desc_map[str(index)]['longitude']
        tripId = dict_pushpinLabel_to_desc_map[str(index)]['tripId']

        #only write back to dynamodb table those entries that are new. Dynamodb items are to be regarded as immutable and should not be changed.
        #using TTL (which is set to attribute ExpirationTime) allows for easy cleanup of items, as we don't want items older than 24 hours since they
        #are not being mapped (the default expirationTime has been set to 2 days)
        if (not inDynamodbTable):
            try:
                pushpinLabelsTable.put_item(
                    Item={
                        'TripDate': str(tripdate),
                        'VehicleId': str(vehicleId),
                        'tripId': str(tripId),
                        'pushpinLabel': str(pushpinLabel),
                        'pushpinLabelDesc': str(pushpinLabelDesc),
                        'latitude': str(latitude),
                        'longitude': str(longitude),
                        'TimeAdded': time_now_str,
                        'ExpirationTime': expirationTime
                    }
                )
                num_items_added_to_dynamodb += 1
            except Exception as e:
                print("update_pushpinLabelsTable:Error with put_item operation ", str(e))
                print("update_pushpinLabelsTable:TripDate:", str(tripdate), " VehicleId:", str(vehicleId), " pushpinLabel:", str(pushpinLabel), " pushpinLabelDesc:", str(pushpinLabelDesc), " Latitude:", str(latitude), " Longitude:", str(longitude), " TimeAdded:", time_now_str, " ExpirationTime:", expirationTime)

    print("update_pushpinLabelsTable:Uploaded ", num_items_added_to_dynamodb, " new pushpinLabel(s) to dynamodb table. LastpushpinLabel:", lastpushpinLabel)

Now that we know what all the functions do, let’s move on to the handler function that the tps lambda function will call. The handler function coordinates all the other functions.

def run(event, context):
    print(datetime.datetime.now(), 'Started')
    today = datetime.datetime.now(tz_sydney)

    initialise_global_variables()
    load_pushpinLabels_from_dynamodb(today)

    #loop through each operator and get their vehicle positions
    for operator in operators:
        callOpenData(operator, feed)
        process_feed(feed, today)

    rdate = today.strftime("%Y-%m-%d")
    time_now = datetime.datetime.now(tz_sydney)
    mapImageName = time_now.strftime("%Y-%m-%dT%H%M") + '.jpg'

    #get the image file and upload it to s3 (pushpin_url[:-1] strips the trailing '&')
    generate_vehicle_position_webpage(maps_bing_prefix + maps_bing_apikey, pushpin_url[:-1], bucketName, bucketImagesKey, mapImageName, HtmlLandingpagename, rdate)
    update_pushpinLabelsTable(rdate)

The handler function calls the respective functions to get the following tasks done (in the order listed below)

  • Initialises the global variables
  • downloads the previously associated labels from the Amazon DynamoDB table
  • calls Open Data API to get the latest position of the trains.
  • the train data is then processed, and the pushpin URL generated
  • Using the pushpin URL, the map is generated using Bing Maps.
  • A landing page is created. This webpage shows the map along with a key to show descriptions for each label.
  • Finally, all labels that were created within this lambda invocation are uploaded to the Amazon DynamoDB table. This ensures that for all subsequent invocations of the lambda function, the respective trains get the same label assigned to them.
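If you want to smoke-test the handler locally before deploying, something like the following works (assuming your AWS credentials are configured and the S3 bucket and DynamoDB table already exist). Note that the environment variables must be set before the module is imported, because the script reads them at import time.

import os

os.environ['BUCKET'] = 'sls-tps-website-dev'                 # the website bucket created by serverless.yml
os.environ['TRANSPORTPOSITION_TABLE'] = 'transportPosition'  # the DynamoDB table name

from src import tps_vehiclepos

tps_vehiclepos.run({}, None)  # the handler does not use the event or context arguments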

The full tps_vehiclepos.py file can be downloaded from https://gist.github.com/nivleshc/03cb06bd6ae9cb969192af0ee2a1a15b

That’s it! Now you have a good idea about how the AWS resources were generated and how the data was acquired, processed and then visualised.

Cost to run the solution

When I started developing this solution, to see what type of data was being provided by the Open Data API, I tried to ingest everything into DynamoDB. This was a VERY VERY bad idea, as it cost me quite a lot. However, since then I have modified my code to ingest and store only the fields that are required in the Amazon DynamoDB table. This has drastically reduced my running costs. Currently I am being charged approximately USD 0.05 or less per day. You can easily run this within your free tier without incurring any additional costs (just as a precaution, monitor your costs to ensure there are no surprises).

As mentioned at the beginning of this blog, the final result can be seen at https://sls-tps-website-dev.s3-ap-southeast-2.amazonaws.com/vehiclelocation.html. The webpage refreshes every minute, however the map is generated every five minutes. I will keep this project running for at least three months, so if you want to check it out, you have ample time.

I thoroughly enjoyed creating this project, and I hope you all enjoy it as well. Till the next time, Enjoy!

Using Ansible to create an inventory of your AWS resources

Background

I was recently at a customer site, to perform an environment review of their AWS real-estate. As part of this engagement, I was going to do an inventory of all their AWS resources. Superficially, this sounds like an easy task, however when you consider the various regions that resources can be provisioned into, the amount of work required for a simple inventory can easily escalate.

Not being a big fan of manual work, I started to look at ways to automate this task. I quickly settled on Ansible as the tool of choice, and not long after, I had two Ansible playbooks (a main playbook and a worker playbook) ready to perform the inventory.

In this blog, I will introduce the two Ansible playbooks that I wrote. The first playbook is the main actor. This is where the variables are defined. This playbook iterates over the specified AWS regions, calling the worker playbook each time to check if any resources have been provisioned in those regions. The output is written to comma separated value (csv) files (I am using semicolons instead of commas as the delimiter), which can be easily imported into Microsoft Excel (or any spreadsheet program of your choice) for analysis.
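If Excel isn't your thing, the semicolon-delimited output files are just as easy to pull into Python for analysis. The filename below is hypothetical; the playbook prefixes each output file with the run's timestamp.

import csv

with open('2020-06-01T10:00:00Z_ec2.csv', newline='') as f:  # hypothetical output filename
    for row in csv.reader(f, delimiter=';'):
        print(row)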

Introducing the Ansible playbooks

The playbooks have been configured to check the following AWS resources

  • Virtual Private Cloud (vpc)
  • Subnets within the VPCs
  • Internet Gateways
  • Route Tables
  • Security Groups
  • Network Access Control Lists
  • Customer Gateways
  • Virtual Private Gateways
  • Elastic IP Addresses
  • Elastic Compute Cloud Instances
  • Amazon Machine Images that were created
  • Elastic Block Store Volumes
  • Elastic Block Store Snapshots
  • Classic Load Balancers
  • Application Load Balancers
  • Relational Database Service Instances
  • Relational Database Service Snapshots
  • Simple Storage Service (S3) Buckets

The two Ansible playbooks are described below.

Filename: ansible-aws-inventory-main.yml
Purpose: This is the controller playbook. It iterates over each of the specified regions, calling the worker playbook to check for any resources that are provisioned in these regions.

Filename: ansible-aws-inventory-worker.yml
Purpose: This playbook does all the heavy lifting. It checks for any provisioned resources in the region that is provided to it by the controller playbook.

Let’s go through each of the sections in the main ansible playbook (ansible-aws-inventory-main.yml), to get a better understanding of what it does.

First off, the variables that will be used are defined

- hosts: localhost
  connection: local
  gather_facts: yes
  vars:
    aws_regions:
      - us-east-1       #North Virginia
      - us-east-2       #Ohio
      - us-west-1       #North California
      - us-west-2       #Oregon
      - ap-south-1      #Mumbai
      - ap-northeast-2  #Seoul
      - ap-southeast-1  #Singapore
      - ap-southeast-2  #Sydney
      - ap-northeast-1  #Tokyo
      - ca-central-1    #Canada Central
      - eu-central-1    #Frankfurt
      - eu-west-1       #Ireland
      - eu-west-2       #London
      - eu-west-3       #Paris
      - eu-north-1      #Stockholm
      - sa-east-1       #Sao Paulo
    verbose: true #set this to true to display results to screen or false to not display to screen
    owner_id: {your aws account number} #this is used to find which ami's belong to you

aws_regions – this defines all the AWS regions which will be checked for provisioned resources

verbose – set this to true to display the results on screen as well as write them to file; setting it to false writes the results to file only.

owner_id – this is the account id for the AWS account that is being inventoried. It is used to retrieve all the Amazon Machine Images (AMI) that are owned by this account

Next, the column headers for each of the csv files are defined.

#define all output file headers
vpc_outputfile_header: "Region;VPC ID;Is_Default;State;CIDR Block;Enable DNS Hostnames;Enable DNS Support;DHCP Options ID;Instance Tenancy"
subnet_outputfile_header: "Region;Subnet ID;VPC ID;availability zone;cidr_block;available_ip_address_count;default_for_az;map_public_ip_on_launch;state"
igw_outputfile_header: "Region;IGW ID;VPC ID;State;Tags"
cgw_outputfile_header: "Region;CGW ID;BGP ASN;ip address;state;type;tags"
vgw_outputfile_header: "Region;VGW ID;State;type;attachments;Tags"
ami_outputfile_header: "Region;image_id;name;creation_date;state;is_public;description"
eip_outputfile_header: "Region;allocation_id;association_id;domain;attached to(instance_id);network_interface_id;private_ip_address;public_ip;public_ipv4_pool"
snapshot_outputfile_header: "Region;snapshot_id;owner_id;start_time;progress;state;encrypted;volume_id;volume_size;description"
volume_outputfile_header: "Region;volume_id;volume_type;size;iops;encrypted;status;region;zone;create_time;attach_time;attached_to;attached_as;delete on termination;volume_status"
routetable_outputfile_header: "Region;Routetable ID;VPC ID;Routes (use http://www.yamllint.com)"
securitygroup_outputfile_header: "Region;SG Name;SG ID;VPC ID;Description;Ingress Rules (use http://www.yamllint.com); Egress Rules (use http://www.yamllint.com)"
nacl_outputfile_header: "Region;NACL ID;VPC ID;Is Default;Subnets Associated with;Ingress Rules (use http://www.yamllint.com); Egress Rules (use http://www.yamllint.com)"
ec2_outputfile_header: "Region;EC2_Instance_ID;EC2_Instance_Name;EC2_Instance_Type;EC2_Image_ID;Private IP;Availability Zone;Public IP;SubnetID;Source Dest Check;Security Groups;VPC ID;EC2_Launch_Time;EC2_Current_State"
elb_outputfile_header: "Region;ELB_Type;ELB_Name;ELB_DNS_Name;Zones;Subnets;VPC_ID;Instances;Scheme;Security Groups;Listeners;State"
rds_instance_outputfile_header: "Region;db_instance_identifier;availability_zone;allocated_storage;auto_minor_version_upgrade;availability_zone;backup_retention_period;instance_class;db_instance_port;db_instance_status;db_parameter_groups;db_security_groups;db_subnet_group;engine;engine_version;preferred_backup_window;preferred_maintenance_window;publicly_accessible;storage_type;security_groups;tags"
rds_snapshot_outputfile_header: "Region;db_snapshot_identifier;snapshot_create_time;snapshot_type;db_instance_identifier;encrypted;percent_progress;allocated_storage;availability_zone;tags"
s3_outputfile_header: "Region;Bucket Name;Creation Date"

After this, the output filenames are defined. Do note that the filenames use timestamps (for when the playbook is run) as prefixes. This ensures that they don’t overwrite any output files from previous runs.

#define output file names. These will be prepended with run date/time in iso6801 format
output_root_folder: /Users/nivleshc/Documents/OneDrive/Personal/Studies/AWS/output/raw/
outputfile_variablename_suffix: "_outputfile"
outputfileheader_variablename_suffix: "_outputfile_header"
vpc_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_vpc.csv"
subnet_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_subnet.csv"
igw_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_igw.csv"
cgw_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_cgw.csv"
vgw_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_vgw.csv"
ami_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_ami.csv"
eip_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_eip.csv"
snapshot_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_snapshot.csv"
volume_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_volume.csv"
routetable_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_routetable.csv"
securitygroup_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_securitygroup.csv"
nacl_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_nacl.csv"
ec2_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_ec2.csv"
elb_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_elb.csv"
rds_instance_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_rds_instance.csv"
rds_snapshot_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_rds_snapshot.csv"
s3_outputfile: "{{ output_root_folder }}{{ ansible_date_time.iso8601 }}_s3.csv"

When I was generating the inventory list, at times I found that I needed only a subset of resource types inventoried, instead of all (for instance when I was looking for only EC2 instances). For this reason, I found it beneficial to have boolean variables to either enable or disable inventory checks for specific resource types.

The next section lists the boolean variables that control whether a particular resource type is checked. Set a variable to true to check that resource type, or to false to skip it. You can set these to your own preference.

#the following variables are used to enable or disable inventory for a particular resource. Use true to enable and false to disable
inventory_vpc: true
inventory_subnet: true
inventory_igw: true
inventory_cgw: true
inventory_vgw: true
inventory_ami: true
inventory_eip: true
inventory_snapshot: true
inventory_volume: true
inventory_routetable: true
inventory_securitygroup: true
inventory_nacl: true
inventory_ec2: true
inventory_elb: true
inventory_rds_instance: true
inventory_rds_snapshot: true
inventory_s3: true

After all the variables have been defined, the tasks that will be carried out are configured.

The first task initialises the output csv files with the column headers.

tasks:
  - name: initialise output files with headers
    lineinfile:
      state: present
      create: yes
      path: "{{ lookup('vars', item + outputfile_variablename_suffix) }}"
      line: "{{ lookup('vars', item + outputfileheader_variablename_suffix) }}"
      insertbefore: BOF
    with_items:
      - vpc
      - subnet
      - igw
      - cgw
      - vgw
      - ami
      - eip
      - snapshot
      - volume
      - routetable
      - securitygroup
      - nacl
      - ec2
      - elb
      - rds_instance
      - rds_snapshot
      - s3

Once the initialisation has been completed, the inventory process is started by looping through each of the specified AWS regions and calling the worker ansible playbook to check for provisioned resources.

  - name: find all applicable resources within region
    include_tasks: ansible-aws-inventory-worker.yml
    loop: "{{ aws_regions }}"
    loop_control:
      loop_var: aws_region
      label: "{{ aws_region }}"

The last task displays the path for the output files.

  - name: print out the output filenames for each of the resources inventoried
    debug:
      msg: "{{ item }} output filename: {{ lookup('vars', item + outputfile_variablename_suffix) }}"
    with_items:
      - vpc
      - subnet
      - igw
      - cgw
      - vgw
      - ami
      - eip
      - snapshot
      - volume
      - routetable
      - securitygroup
      - nacl
      - ec2
      - elb
      - rds_instance
      - rds_snapshot
      - s3

The main ansible playbook (ansible-aws-inventory-main.yml) can be downloaded from https://gist.github.com/nivleshc/64ea7201fb0ba8cb6f87d06adc6152de

The worker playbook (ansible-aws-inventory-worker.yml) has the following structure:

  • go through each of the defined resource types and confirm that it is to be checked (checks for a particular resource type are enabled using the boolean variables defined in the main playbook)
  • if checks are enabled for that particular resource type, find all provisioned resources of that type in the region provided by the main ansible playbook
  • write the results to the respective output file
  • if verbose is enabled, also write the results to the screen

The worker file (ansible-aws-inventory-worker.yml) can be downloaded from https://gist.github.com/nivleshc/bedd2c440c816ebc86dbaeddef50d500
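To illustrate the structure described above, here is a rough sketch of what a single resource check (for VPCs) could look like inside the worker playbook. This is a simplified illustration only – the module and field names are based on the Ansible 2.x AWS facts modules, and the actual tasks in the downloadable worker playbook may differ:

- name: get all VPCs in region {{ aws_region }}
  ec2_vpc_net_facts:
    region: "{{ aws_region }}"
  register: vpc_facts
  when: inventory_vpc

- name: write VPC details to the output csv file
  lineinfile:
    path: "{{ vpc_outputfile }}"
    line: "{{ aws_region }};{{ item.id }};{{ item.is_default }};{{ item.state }};{{ item.cidr_block }}"
  with_items: "{{ vpc_facts.vpcs }}"
  when: inventory_vpc

- name: display VPC details on screen
  debug: var=vpc_facts.vpcs
  when: inventory_vpc and verbose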

Running the ansible playbooks

Use the following steps to run the above mentioned ansible playbooks to perform an inventory of your AWS account.

1. On a computer that has ansible installed, create a folder and name it appropriately (for instance inventory)

2. Download ansible-aws-inventory-main.yml from https://gist.github.com/nivleshc/64ea7201fb0ba8cb6f87d06adc6152de and put it in the folder that was created in step 1 above

3. Download ansible-aws-inventory-worker.yml from https://gist.github.com/nivleshc/bedd2c440c816ebc86dbaeddef50d500 and put it in the folder that was created in step 1 above

4. Download the ansible inventory file from https://gist.github.com/nivleshc/bc2e300fe1d2779ecc15c0876fc4db62 , rename it to hosts and put it in the folder that was created in step 1 above

5. Customise ansible-aws-inventory-main.yml by adding your account id as the owner_id and change the output folder by updating the output_root_folder variable. If you need to disable inventory for certain resource types, you can set the respective boolean variable to false.

6. Create a user account with access keys enabled within your AWS account. For checking all the resources defined in the playbook, the account must have at least the following AWS managed policies attached

AmazonVPCReadOnlyAccess
AmazonEC2ReadOnlyAccess
ElasticLoadBalancingReadOnly
AmazonRDSReadOnlyAccess
AmazonS3ReadOnlyAccess

7. Open a command line and run the following to configure environment variables with the credentials of the account that was created in step 6 above (the following commands are specific to macOS)

export AWS_ACCESS_KEY_ID="xxxxx"
export AWS_SECRET_ACCESS_KEY="xxxxxxx"

8. There is a possibility that you might encounter an error with boto complaining that it is unable to access region us-west-3. To fix this, define the following environment variable as well

export BOTO_USE_ENDPOINT_HEURISTICS=True

9. Run the ansible playbook using the following command line

ansible-playbook -i hosts ansible-aws-inventory-main.yml

Depending on how many resources are being inventoried, the playbook can take anywhere from five to ten minutes to complete. So, sit back and relax, while the playbook runs.

I found a bug with the ansible aws_s3_bucket_facts module. It ignores the region parameter, so instead of returning the S3 buckets in a specific region, it returns the buckets from all regions. Due to this, the S3 buckets csv file will have the same buckets repeated for every region.

I hope you enjoy the above ansible playbooks and that they make your life much easier when trying to find all the resources that are deployed within your AWS account.

Till the next time, enjoy!

Feature Photo by Samuel Zeller on Unsplash

A scenario-based tutorial for Azure Kubernetes Service – Part 2

Introduction

In this blog, we will dig a little deeper into Azure Kubernetes Service (AKS). What better way to do this than by building an AKS cluster ourselves! Just a heads-up, I will be using terminology that was introduced in part 1 of this mini-blog series. If you haven’t read it, or need a refresher, you can access it at https://nivleshc.wordpress.com/2019/03/04/a-scenario-based-tutorial-for-azure-kubernetes-service-part-1

Let’s start by describing the AKS cluster architecture. The diagram below provides a great overview.

(Image copied from https://docs.microsoft.com/en-au/azure/aks/media/concepts-clusters-workloads/cluster-master-and-nodes.png)

The AKS Cluster is made up of two components. These are described below

  • cluster master node – an Azure managed service that runs the core Kubernetes control-plane services and ensures the application workloads are running properly.
  • node – where the application workloads run.

The cluster master node is comprised of the following components

  • kube-apiserver – this api server provides a way to interface with the underlying Kubernetes API. Management tools such as kubectl or Kubernetes dashboard interact with this to manage the Kubernetes cluster.
  • etcd – this provides a key-value store within Kubernetes, and is used for maintaining the state and configuration of the Kubernetes cluster
  • kube-scheduler – the role of this component is to decide which nodes the newly created or scaled up application workloads can run on, and then it starts these workloads on them.
  • kube-controller-manager – the controller manager looks after several smaller controllers that perform actions such as replicating pods and handling node operations

The node is comprised of the following

  • kubelet – this is an agent that processes the orchestration requests from the cluster master node and takes care of running the requested containers on the node
  • kube-proxy – this component provides networking services on each node. It takes care of routing network traffic and managing IP addresses for services and pods
  • container runtime – this allows the container application workloads to run and interact with other resources within the node.

For more information about the above, please refer to https://docs.microsoft.com/en-au/azure/aks/concepts-clusters-workloads
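As a quick example of how these components fit together in practice: once a cluster is up and kubectl has been configured to talk to it, standard commands such as the ones below are sent to the kube-apiserver, which queries etcd and the nodes on your behalf.

kubectl get nodes
kubectl get pods --all-namespaces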

Now that you have a good understanding of the Kubernetes architecture, let’s move on to the preparation stage, after which we will deploy our AKS cluster.

Preparation

AKS subnet size

AKS uses a subnet to host nodes, pods, and any other Kubernetes and Azure resources that are created for the AKS cluster. As such, it is extremely important that the subnet is appropriately sized, to ensure it can accommodate the resources that will be initially created, and still have enough room for any future updates.

There are two networking methods available when deploying an Azure Kubernetes Service cluster

  • Kubenet
  • Azure Container Networking Interface (CNI)

AKS uses kubenet by default, and in doing so, it automatically creates the virtual network and subnets required to host the pods. This is a great solution if you are learning about AKS, however if you need more control, it is better to go with Azure CNI. With Azure CNI, you get the option to use an existing virtual network and subnet, or you can create a custom one. This is a much better option, especially when deploying into a production environment.

In this blog, we will use Azure CNI.

The formula below provides a good estimate on how large your subnet must be, in order to accommodate your AKS resources.

Subnet size = (number of nodes + 1) + ((number of nodes + 1) * maximum number of pods per node that you configure)

When using Azure CNI, by default each node is set up to run 30 pods. If you need to change this limit, you will have to deploy your AKS cluster using the Azure CLI or Azure Resource Manager templates.
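If you do go down the Azure CLI route, a deployment that raises the pod-per-node limit could look roughly like the following sketch (the resource group, cluster name and subnet resource id are placeholders, and additional parameters may be required for your environment):

az aks create --resource-group <resource-group> --name <cluster-name> --node-count 1 --network-plugin azure --vnet-subnet-id <subnet-resource-id> --max-pods 50 --generate-ssh-keys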

Just as an example, for a default AKS cluster deployment, using Azure CNI with 4 nodes, the subnet size at a minimum must be

IPs required = (4 + 1) + ((4 + 1) * (30 pods per node)) = 5 + (5 * 30) = 155

This means that the subnet must be at least a /24.

For this blog, create a new resource group called myAKS-resourcegroup. Within this new resource group, create a virtual network called AKSVNet with an address space of 10.1.0.0/16. Inside this virtual network, create a subnet called AKSSubnet1 with an address range of 10.1.3.0/24.
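If you prefer the command line over the portal for these prerequisite resources, a roughly equivalent Azure CLI sketch is shown below (this assumes the australiaeast region, which is the region used for the cluster later in this blog):

az group create --name myAKS-resourcegroup --location australiaeast
az network vnet create --resource-group myAKS-resourcegroup --name AKSVNet --address-prefixes 10.1.0.0/16
az network vnet subnet create --resource-group myAKS-resourcegroup --vnet-name AKSVNet --name AKSSubnet1 --address-prefixes 10.1.3.0/24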

Deploying an Azure Kubernetes Service Cluster

Let’s proceed on to deploying our AKS cluster.

  1. Login to your Azure Portal and add Kubernetes Service
  2. Once you click on Create, you will be presented with a screen to enter your cluster’s configuration information
  3. Under Basics
  • Choose the subscription into which you want to deploy the AKS cluster
  • Choose the resource group into which you want to deploy the AKS cluster. One thing to point out here is that the cluster master node will be deployed in this resource group, however a new resource group with a name matching the naming format MC_<AKS master node resource group name>_<AKS cluster name>_region will be created to host the nodes where the containers will run (if you use the values specified in this blog, your node resource group will be named MC_myAKS-resourcegroup_mydemoAKS01_australiaeast)
  • Provide the Kubernetes cluster name (for this blog, let’s call this mydemoAKS01)
  • Choose the region you want to deploy the AKS cluster in (for this blog, we are deploying in australiaeast region)
  • Choose the Kubernetes version you want to deploy (you can choose the latest version, unless there is a reason to choose a specific version)
  • DNS name prefix – for simplicity, you can set this to the same as the cluster name
  • Choose the Node size (for this blog, let’s choose D2s v3 – 2 vcpus, 8 GB memory)
  • Set the Node count to 1 (the Node count specifies the number of nodes that will be initially created for the AKS cluster)
  • Leave Virtual nodes set to disabled

Under Authentication

  • Leave the default option to create a service principal (you can also provide an existing service principal, however for this blog, we will let the provisioning process create a new one for us)
  • RBAC allows you to control who can view the Kubernetes configuration (kubeconfig) information and to limit the permissions that they have. For now, leave RBAC turned off

Under Networking

  • Leave HTTP application routing set to No
  • As previously mentioned, by default AKS uses kubenet for networking. However, we will use Azure CNI. Change the Network configuration from Basic to Advanced
  • Choose the virtual network and subnet that was created as per the prerequisites (AKSVNet and AKSSubnet1)
  • Kubernetes uses a separate address range to allocate IP addresses to internal services within the cluster. This is referred to as the Kubernetes service address range. This range must NOT be within the virtual network range and must not be used anywhere else. For our purposes we will use the range 10.2.4.0/24. Technically, it is possible to use IP addresses for the Kubernetes service address range from within the cluster virtual network, however this is not recommended due to the potential for IP address overlaps, which could cause unpredictable behaviour. To read more about this, you can refer to https://docs.microsoft.com/en-au/azure/aks/configure-azure-cni.
  • Leave the Kubernetes DNS service IP address as the default 10.2.4.10 (the default is set to the tenth IP address within the Kubernetes service address range)
  • Leave the Docker bridge address as the default 172.17.0.1/16. The Docker Bridge lets AKS nodes communicate with the underlying management platform. This IP address must not be within the virtual network IP address range of your cluster, and shouldn’t overlap with other address ranges in use on your network

Under Monitoring

  • Leave enable container monitoring set to Yes
  • Provide an existing Log Analytics workspace or create a new one

Under Tags

  • Create any tags that need to be attached to this AKS cluster
     4.  Click on Next: Review + create to get the settings validated. After validation has passed, click on Create. Be aware that it can take anywhere from 10 – 15 minutes for the AKS cluster provisioning to complete.

While you are waiting

During the AKS cluster provisioning process, there are a number of things that are happening under the hood. I managed to track down some of them and have listed them below.

  • Within the resource group that you specified for the AKS cluster to be deployed in, you will now see a new AKS cluster with the name mydemoAKS01
  • If you open the virtual network that the AKS cluster has been configured to use and click on Connected devices, you will notice that a lot of IP addresses have already been allocated.

    I have noticed that the number of IP addresses equals

    ((number of pods per node) + 1) * number of nodes

   FYI – for the AKS cluster that is being deployed in this blog, it is 31

  • A new resource group with the name complying to the naming format MC_<AKS master node resource group name>_<AKS cluster name>_region will be created. In our case it will be called MC_myAKS-resourcegroup_mydemoAKS01_australiaeast. This resource group will contain the virtual machine for the node (not the cluster master node), including all the resources that are needed for the virtual machines (availability set, disk, network card, network security group)

What will this cost me?

The cluster master node is a managed service and you are not charged for it. You only pay for the nodes on which the application workloads run (these are the resources inside the new resource group that is automatically created when you provision the AKS cluster).

In the next blog, we will delve deeper into the newly deployed AKS cluster, exposing its configuration using command line tools.

Happy sailing and till the next time, enjoy!

Using Ansible to deploy an AWS environment

Background

Over the past few weeks, I have been looking at various automation tools for AWS. One tool that seems to get a lot of limelight is Ansible, an open source automation tool from Red Hat. I decided to give it a go, and to my amazement, I was surprised at how easy it was to learn Ansible, and how powerful it can be.

All that one must do is write up a list of tasks using YAML notation in a file (called a playbook) and get Ansible to execute it. Ansible reads the playbook and executes the tasks in the order that they are written. Here is the biggest advantage: there are no agents to be installed on the managed computers! Ansible connects to each of the managed computers using ssh or winrm.

Another nice feature of Ansible is that it supports third-party modules. This allows Ansible to be extended to support many services that it does not natively understand.

In this blog, we will be focusing on one of the third-party modules, the AWS module. Using this, we will use Ansible to deploy an environment within AWS.

Scenario

For this blog, we will use Ansible to provision an AWS Virtual Private Cloud (VPC) in the North Virginia (us-east-1) region. Within this VPC, we will create a public and a private subnet. We will then deploy a jumphost in the public subnet and a server within the private subnet.

Below is a diagram depicting what will be done.

Figure 1: Environment that will be deployed within AWS using Ansible Playbook

Preparation

The computer that is used to run Ansible to manage all other computers is referred to as the control machine. Currently, Ansible can be run from any machine with Python 2 (version 2.7) or Python 3 (version 3.5 or higher) installed. The Ansible control machine can run the following operating systems

  • Red Hat
  • Debian
  • CentOS
  • macOS
  • any of the BSD variants

Note: Currently, the Windows operating system is not supported for running the control machine.

For this blog, I am using a MacBook to act as the control machine.

Before we run Ansible, we need to get a few things done. Let’s go through them now.

  1. We will use pip (Python package manager) to install Ansible. If you do not already have pip installed, run the following command to install it
    sudo easy_install pip
  2. With pip installed, use the following command to install Ansible
    sudo pip install ansible

    For those that are not using macOS for their control machine, you can get the relevant installation commands from https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html.

  3. Next, we must install the AWS Command Line Interface (CLI) tools. Use the following command for this.
    sudo pip install awscli

    More information about the AWS CLI tools is available at https://aws.amazon.com/cli/

  4. To provision items within AWS, we need to provide Ansible with a user account that has the necessary permissions. Using the AWS console, create a user account ensuring it is assigned an access key and a secret access key. At a minimum, this account must have the following policies assigned to it.
    AmazonEC2FullAccess
    AmazonVPCFullAccess

    Note: As this is a privileged user account, please ensure that the access key and secret access key are kept in a safe place.

  5. To provision AWS Elastic Compute Cloud (EC2) instances, we require key pairs created in the region that the EC2 instances will be deployed in. Ensure that you already have key pairs for the North Virginia (us-east-1) region. If not, please create them.
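    If you do not already have a key pair in us-east-1 and prefer using the AWS CLI, a key pair can be created with something like the following sketch (my_northvirginia_keypair is just an example name – use whatever name you will reference in the playbook):

    aws ec2 create-key-pair --key-name my_northvirginia_keypair --region us-east-1 --query 'KeyMaterial' --output text > my_northvirginia_keypair.pem
    chmod 400 my_northvirginia_keypair.pem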

Instructions

Create an Ansible Playbook

Use the following steps to create an Ansible playbook to provision an AWS environment.

Open your favourite YAML editor and paste the following code

- hosts: localhost
  connection: local
  gather_facts: yes
  vars:
    vpc_region: us-east-1
    my_useast1_key: my_northvirginia_keypair

The above code instructs Ansible that it should connect to the local computer, to run all the defined tasks. This means that Ansible modules will use the local computer to connect to AWS APIs in order to carry out the tasks.

Another thing to note is that we are declaring two variables. These will be used later in the playbook.

  • vpc_region – this is the AWS region where the AWS environment will be provisioned (currently set to us-east-1)
  • my_useast1_key – provide the name of your key pair for the us-east-1 region that will be used to provision EC2 instances

Next, we will define the tasks that Ansible must carry out. The format of the tasks is as follows

  • name – this gives a descriptive name for the task
  • module name – this is the module that Ansible will use to carry out the task
  • module Parameters – these are parameters passed to the module, to carry out the specific task
  • register – this is an optional keyword and is used to record the output that is returned from the module, after the task has been carried out.

Copy the following lines of code into your YAML file.

  tasks:
    - name: create VPC for Ansible
      ec2_vpc_net:
        name: ansibleVPC
        state: present
        cidr_block: 172.32.0.0/16
        region: "{{ vpc_region }}"
      register: ansibleVPC

    - name: display ansibleVPC results
      debug: var=ansibleVPC

The above code contains two tasks.

  • the first task creates an AWS Virtual Private Cloud (VPC) using the ec2_vpc_net module. The output of this module is recorded in the variable ansibleVPC using the register command
  • the second task outputs the contents of the variable ansibleVPC using the debug command (this displays the output of the previous task)

Side Note

  • Name of the VPC has been set to ansibleVPC
  • The CIDR block for the VPC has been set to 172.32.0.0/16
  • The state keyword controls what must be done to the VPC. In our case, we want it created and to exist, as such, the value for state has been set to present.
  • The region is being set by referencing the variable that was defined earlier. Variables are referenced with the notation “{{ variable name }}”

Copy the following code to create an AWS internet gateway and associate it with the newly created VPC. The second task in the below code displays the result of the internet gateway creation.

    - name: create internet gateway for ansibleVPC
      ec2_vpc_igw:
        state: present
        region: "{{ vpc_region }}"
        vpc_id: "{{ ansibleVPC.vpc.id }}"
        tags:
          Name: ansibleVPC_IGW
      register: ansibleVPC_igw

    - name: display ansibleVPC IGW details
      debug: var=ansibleVPC_igw

The next step is to create the public and private subnets. However, instead of hardcoding the availability zones into which these subnets will be deployed, we will pick the first availability zone in the region for our public and the second availability zone in the region for our private subnet. Copy the following code into your YAML file to show all the availability zones that are present in the region, and which ones will be used for the public and private subnets.

    - name: obtain all AZ present in region {{ vpc_region }}
      aws_az_facts:
        region: "{{ vpc_region }}"
      register: az_in_region

    - name: display all AZ present in region {{ vpc_region }}
      debug: var=az_in_region

    #create public subnet in first az and private subnet in second az
    - name: display AZ that will be used for public and private Subnets
      debug:
        msg:
          - "public subnet in AZ: {{ az_in_region.availability_zones[0].zone_name }}"
          - "private subnet in AZ: {{ az_in_region.availability_zones[1].zone_name }}"

Copy the following code to create the public subnet in the first availability zone in the us-east-1 region. Do note that we are provisioning our public subnet with the CIDR range 172.32.1.0/24.

    - name: create public subnet in AZ {{ az_in_region.availability_zones[0].zone_name }}
      ec2_vpc_subnet:
        state: present
        cidr: 172.32.1.0/24
        az: "{{ az_in_region.availability_zones[0].zone_name }}"
        vpc_id: "{{ ansibleVPC.vpc.id }}"
        region: "{{ vpc_region }}"
        map_public: yes
        tags:
          Name: public subnet
      register: public_subnet

    - name: show public subnet details
      debug: var=public_subnet

Copy the following code to deploy the private subnet in the second availability zone in the us-east-1 region. It will use the CIDR range 172.32.2.0/24.

    - name: create private subnet in AZ {{ az_in_region.availability_zones[1].zone_name }}
      ec2_vpc_subnet:
        state: present
        cidr: 172.32.2.0/24
        az: "{{ az_in_region.availability_zones[1].zone_name }}"
        vpc_id: "{{ ansibleVPC.vpc.id }}"
        region: "{{ vpc_region }}"
        resource_tags:
          Name: private subnet
      register: private_subnet

    - name: show private subnet details
      debug: var=private_subnet

Hold on! To make a public subnet, it is not enough to just create a subnet. We also need to create a route from that subnet to the internet gateway! The code below addresses this. The private subnet does not need any such routes; it will use the default route table.

    - name: create new route table for public subnet
      ec2_vpc_route_table:
        state: present
        region: "{{ vpc_region }}"
        vpc_id: "{{ ansibleVPC.vpc.id }}"
        tags:
          Name: rt_ansibleVPC_PublicSubnet
        subnets:
          - "{{ public_subnet.subnet.id }}"
        routes:
          - dest: 0.0.0.0/0
            gateway_id: "{{ ansibleVPC_igw.gateway_id }}"
      register: rt_ansibleVPC_PublicSubnet

    - name: display public route table
      debug: var=rt_ansibleVPC_PublicSubnet

As planned, we will be deploying jumphosts within the public subnet. By default, you won’t be able to externally connect to the EC2 instances deployed within the public subnet because the default security group does not allow this.

To remediate this, we will create a new security group that will allow RDP access and assign it to the jumphost server. For simplicity, the security group will allow RDP access from anywhere, however please ensure that for your environment, you have locked it down to a few external IP addresses.

    - name: create a security group for jumphosts
      ec2_group:
        state: present
        name: sg_ansibleVPC_publicsubnet_jumphost
        description: security group for jumphosts within the public subnet of ansible VPC
        vpc_id: "{{ ansibleVPC.vpc.id }}"
        region: "{{ vpc_region }}"
        rules:
          - proto: tcp
            ports:
              - 3389
            cidr_ip: 0.0.0.0/0
            rule_desc: allow rdp to jumphost
      register: sg_ansibleVPC_publicsubnet_jumphost

    - name: display details for jumphost security group
      debug: var=sg_ansibleVPC_publicsubnet_jumphost

Phew! Finally, we are ready to deploy our jumphost! Copy the following code for this

    - name: deploy a windows 2016 jumphost
      ec2:
        key_name: "{{ my_useast1_key }}"
        instance_type: t2.micro
        image: ami-0bf148826ef491d16
        group_id: "{{ sg_ansibleVPC_publicsubnet_jumphost.group_id }}"
        vpc_subnet_id: "{{ public_subnet.subnet.id }}"
        assign_public_ip: yes
        region: "{{ vpc_region }}"
        instance_tags:
          Name: win2016jh
        count_tag:
          Name: win2016jh
        exact_count: 1
      register: win2016jh

    - name: display details for windows 2016 jumphost
      debug: var=win2016jh

I would like to point out a few things

  • The jumphost is running on a t2.micro instance. This instance type is usually sufficient for a jumphost in a lab environment, however if you need more performance, this can be changed (changing the instance type from t2.micro can take you over the AWS free tier limits and subsequently add to your monthly costs)
  • The image parameter refers to the AMI ID of the Windows 2016 base image that is currently available within the AWS console. AWS, from time to time, changes the images that are available. Please check within the AWS console to ensure that the AMI ID is valid before running the playbook
  • Instance tags are tags that are attached to the instance. In this case, the instance tags have been used to name the jumphost win2016jh.

Important Information

The following parameters are extremely important if you do not intend to deploy a new EC2 instance for the same server every time you re-run this Ansible playbook.

exact_count – this parameter specifies the number of EC2 instances of a server that should be running whenever the Ansible playbook is run. If the current number of instances doesn’t match this number, Ansible either creates new EC2 instances for this server or terminates the extra EC2 instances. The servers are identified using the count_tag

count_tag – this is the instance tag that is used to identify a server. Multiple instances of the same server will have the same tag applied to them. This allows Ansible to easily count how many instances of a server are currently running.

Next, we will deploy the servers within the private subnet. Wait a minute! By default, the servers within the private subnet will be assigned the default security group. The default security group allows unrestricted access to all EC2 instances that have been attached to the default security group. However, since the jumphost is not part of this security group, it will not be able to connect to the servers in the private subnet!

Let’s remediate this issue by creating a new security group that will allow RDP access from the public subnet to the servers within the private subnet (in a real environment, this should be restricted further, so that the incoming connections are from particular servers within the public subnet, and not from the whole subnet itself). This new security group will be associated with the servers within the private subnet.

Copy the following code into your YAML file.

    #create a security group for the private subnet which allows restricted access from public subnet
    - name: create a security group for servers in private subnet with only tcp 3389 incoming
      ec2_group:
        state: present
        name: sg_ansibleVPC_privatesubnet_servers
        description: security group for private subnet that allows limited access from public subnet
        vpc_id: "{{ ansibleVPC.vpc.id }}"
        region: "{{ vpc_region }}"
        rules:
          - proto: tcp
            ports: 3389
            group_name: sg_ansibleVPC_publicsubnet_jumphost
            rule_desc: allow only rdp access from public to private subnet servers
      register: sg_ansibleVPC_privatesubnet_servers

    - name: display details for private subnet security group
      debug: var=sg_ansibleVPC_privatesubnet_servers

We are now at the end of the YAML file. Copy the code below to provision the windows 2016 server within the private subnet (the server will be tagged with name=win2016svr)

    - name: deploy a windows 2016 server in private subnet
      ec2:
        key_name: "{{ my_useast1_key }}"
        instance_type: t2.micro
        image: ami-0bf148826ef491d16
        group_id: "{{ sg_ansibleVPC_privatesubnet_servers.group_id }}"
        vpc_subnet_id: "{{ private_subnet.subnet.id }}"
        assign_public_ip: no
        region: "{{ vpc_region }}"
        instance_tags:
          Name: win2016svr
        count_tag:
          Name: win2016svr
        exact_count: 1
      register: win2016svr

    - name: display details for windows 2016 server in private subnet
      debug: var=win2016svr

Save the playbook with a meaningful name. I named my playbook Ansible-create-AWS-environment.yml

The full Ansible playbook can be downloaded from https://gist.github.com/nivleshc/344dca91e3d0349c8a359b03853886be

Running the Ansible Playbook

Before we run the playbook, we need to tell Ansible about all the computers that are within the management scope. This is done using an inventory file, which contains a group name within square brackets, e.g. [webservers], and below that, all the computers that will be in that group. Then, in the playbook, we just target the group, which in turn targets all the computers in that group.

However, in our scenario, we are directly targeting the local computer (refer to the second line in the YAML file that shows hosts: localhost). In this regard, we can get away with not providing an inventory file. However, do note that doing so will mean that we can’t use anything other than localhost to reference a computer within our playbook.

Let’s create an inventory file called hosts in the same folder as where the playbook is saved. The contents of the file will be as listed below.

[local]
localhost

We are ready to run the playbook now.

Open a terminal session and change to the folder where the playbook was saved.

We need to create some environment variables to store the user details that Ansible will use to connect to AWS. This is where the access key and secret access key that we created initially will be used. Run the following command

export AWS_ACCESS_KEY_ID={access key id}
export AWS_SECRET_ACCESS_KEY={secret access key}

Now run the playbook using the following command (as previously mentioned, we could get away with not specifying the inventory file, however this means that we can only use localhost within the playbook)

ansible-playbook -i hosts ansible-create-aws-environment.yml

You should now see each of the tasks being executed, with the output being shown (remember that after each task, we have a follow-up task that shows the output using the debug keyword?)

Once the playbook execution has completed, check your AWS console to confirm that the following items have been created within the us-east-1 (North Virginia) region

  • A VPC called ansibleVPC with the CIDR 172.32.0.0/16
  • An internet gateway called ansibleVPC_igw
  • A public subnet in the first availability zone with CIDR 172.32.1.0/24
  • A private subnet in the second availability zone with CIDR 172.32.2.0/24
  • A route table called rt_ansibleVPC_PublicSubnet
  • A security group for jumphosts called sg_ansibleVPC_publicsubnet_jumphost
  • A security group for the servers in private subnet called sg_ansibleVPC_privatesubnet_servers
  • An EC2 instance in the public subnet representing a jumphost named win2016jh
  • An EC2 instance in the private subnet representing a server named win2016svr

Once the provisioning is complete, to test it, connect to the jumphost and then from there connect to the server within the private subnet.

Don’t forget to turn off the EC2 instances if you don’t intend to keep using them.

Closing Remarks

Ansible is a great automation tool and can be used to both provision and manage infrastructure within AWS.

Having said that, I couldn’t find an easy way to do post-provisioning tasks (e.g. assigning roles, installing additional packages) after the server has been provisioned, without getting Ansible to connect directly to the provisioned server. This can be a challenge if the Ansible control machine is external to AWS and the provisioned server is within an AWS private subnet. With AWS CloudFormation, this is easily done. If anyone has any advice on this, I would appreciate it if you could leave it in the comments below.

I will surely be using Ansible for most of my automations from now on.

Till the next time, enjoy!

A scenario-based tutorial for Azure Kubernetes Service – Part 1

Introduction

Containers are gaining a lot of popularity these days. They provide an easy way to run applications, without having to worry about the underlying infrastructure.

As you might imagine, managing all these containers can become quite daunting, especially if there are numerous containers. This is where orchestration tools such as Kubernetes are very useful.

Kubernetes was developed by Google and is heavily based on their internal Borg system. It is an excellent tool to manage containers, where you provide a desired state for your containers and Kubernetes takes care of everything to ensure the containers are always in that state (for example, if a pod dies, Kubernetes will automatically start a new pod for that container, to ensure that the defined number of pods are always running). Kubernetes also provides an easy process to scale the number of pods or the number of nodes.

Soon after releasing Kubernetes, Google partnered with the Linux Foundation to form the Cloud Native Computing Foundation (CNCF). Kubernetes was then made open-source, with the Cloud Native Computing Foundation acting as its guardian. A nice writeup for Kubernetes history can be found at https://en.wikipedia.org/wiki/Kubernetes.

Kubernetes is often abbreviated as k8s. If you are like me and are wondering how the word Kubernetes can possibly be shortened to k8s, here is the answer: the 8 in k8s represents the number of characters between the letters k and s in the word Kubernetes.

With the popularity of Kubernetes soaring, Microsoft recently adopted it for its Azure environment, providing Azure Kubernetes Service as a managed service. The service entered general availability in June 2018. If you are interested in reading about this announcement, a good article to read is https://redmondmag.com/articles/2018/06/13/azure-kubernetes-service-ga.aspx .

This blog is the first in the mini-series that I will be publishing about Azure Kubernetes Service. I will take you through the process of creating an Azure Kubernetes Service (AKS) Cluster and then we will create an environment within the AKS cluster using some custom docker images.

In this first blog I will introduce some key Kubernetes terminologies and map out the scenario that the blog mini-series will focus on.

Terminology

Below are some of the key concepts which I believe will help immensely in understanding Kubernetes.

Pods

If you think about a pea pod, there can be one or many peas inside it. Treating each pea as a container, this translates to a pod being an encapsulation of an application container (or, in some cases, multiple containers).

As per the formal definition, a pod is an encapsulation of an application container (or, in some cases, multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run. A pod represents a unit of deployment, a single instance of an application in Kubernetes, which might consist of either a single container or a small number of containers that are tightly coupled and that share resources. A more detailed explanation is available at https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/.

One key point to remember is that pods are ephemeral, they are created and at times they die as well. In that regard, any application that directly accesses pods will eventually fail when the pod dies. Instead, you should always interact with Services, when trying to access containers deployed within Kubernetes.
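To make the concept a little more concrete, a minimal pod definition might look like the following (mynginx is just an example name used for illustration, not something we will deploy in this series):

apiVersion: v1
kind: Pod
metadata:
  name: mynginx
  labels:
    app: mynginx
spec:
  containers:
    - name: nginx
      image: nginx:latest
      ports:
        - containerPort: 80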

Services

Due to the ephemeral nature of pods, any application that is directly accessing a pod will eventually suffer a downtime (when the pod dies, and another is created to replace it). To get around this, Kubernetes provides Services.

Think of a Service to be like an application load balancer, it provides a front end for your container, and then routes the traffic to a pod running that container. Since your applications are always connecting to a Service (the properties for the Service remain unchanged during its lifetime), they are shielded from any pod deaths. For information about services, refer to https://kubernetes.io/docs/concepts/services-networking/service/ .
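Continuing the hypothetical mynginx example from the Pods section, a minimal Service fronting that pod could look roughly like this (again, a sketch for illustration only):

apiVersion: v1
kind: Service
metadata:
  name: mynginx-service
spec:
  selector:
    app: mynginx
  ports:
    - port: 80
      targetPort: 80
  type: LoadBalancer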

Namespaces

Namespaces provide a logical way of grouping your Kubernetes cluster. This allows you to give different sets of users access to different resources. Namespaces also provide a scope for names. Names must be unique within a namespace; however, they do not need to be unique across namespaces. A more in-depth description can be found at https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/

Kubernetes Control Plane (master)

The Kubernetes master (a collection of processes) ensures the Kubernetes cluster is working as expected by maintaining the cluster’s desired state.

Kubernetes Nodes

The nodes are where the containers and workflows are run. The nodes can be virtual machines, physical machines etc. The Kubernetes master controls each node.

Scenario

The diagram below shows the environment we will be deploying within our Azure Kubernetes Service (AKS) cluster.

In summary, we will deploy three pods, each running a customised nginx container. The nginx containers will be listening on non-http/https ports. As Kubernetes does not natively provide a way to route non-http/https traffic to services, we will be deploying nginx ingress controllers to enable this functionality.

Figure 1 – Infrastructure that will be deployed within the Azure Kubernetes Service cluster

In the next blog in this mini-series, we will deploy the Azure Kubernetes Service cluster.

Happy sailing and see you soon!

Enabling Source Control for locally stored code using Git, Visual Studio Code and Sourcetree

Introduction

Coming from a system administration background, I am used to writing scripts to get mundane tasks done. Whenever I saw repeatable tasks, I saw an opportunity to script them, and pass them onto a junior to do 😉

However, writing scripts brings about its own challenges.

Ok, time to fess up 😉 Hands up those that have modified a script, only to realise that the modifications broke it! To make matters worse, you forgot to take a copy of the original!

Don’t worry, I have been in that boat, and can remember the countless hours I spent, getting the script back to what it was (mind you, I am not talking about a formal business change here, which is governed by strict change control, but about personal scripts, that you have created to make your daily tasks easier)

To make a copy of a script, I would normally suffix the file name with the current time and date. This provided me with a timestamp of when I changed the file and a way of reverting my changes. However, there were also instances where I made backups of the modified script because I had tested a modification and it worked, but I didn’t want to risk breaking it when modifying the file further. Guess what, these are the times when I made the worst mistakes! I used to get so engrossed in my modifications that I would forget to make a backup of the changes and end up with an unworkable script. The only version to revert to was the original, which meant all my hard work went to waste!

This is why I started my search for a better change tracking system. One that will show me the changes I had made, and which will allow me to easily revert to a previous version.

Guess what! I think I just found this golden goose and it is truly amazing!

In this blog I will show you how you can use Git, an open source version control system, to track changes to scripts stored locally on your computer. The main use of Git is for source control of files that a team contributes to; in these situations, a Git server is used to store the repository.

Please ensure that the local folder you are tracking for source control is backed up either to the cloud or to an external hard disk.

For editing our code/script, we will use Microsoft’s Visual Studio Code, a free IDE that has Git support in-built. We will also use Sourcetree, Atlassian’s free Git client.

 

Introducing Git

Git is an awesome open source distributed version control system. When working in a team, it allows you to have your files centrally managed while letting multiple people work on them. Team members can pull the repository to their local computer. They can also branch a part of the repository, update the files in that part and then merge them with the master. If there are no conflicts, Git will update the files in its repository. However, if there are conflicts, Git will inform that team member and show them the conflicts. The team member can then either resolve the conflicts and re-merge, or discard their changes altogether.
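To put that workflow into commands, a typical sequence for a team member might look roughly like the following (the repository URL, branch name and commit message are placeholders):

git clone https://your-git-server/team/project.git
cd project
git checkout -b my-feature
# edit files, then stage and commit the changes
git add .
git commit -m "update the backup script"
# merge the feature branch back into master
git checkout master
git merge my-feature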

If you want to read more on git, check out https://git-scm.com

To host the repositories for your team, two commonly used solutions are a Git server or Visual Studio Team Services. You can also use GitHub, however your repositories will be public unless you sign up for a paid account.

For personal use, you can store your git repositories in a local directory that is backed up to the cloud. For my personal projects, I use a Dropbox synchronised folder.

To use Git, you need a Git client. If you have a MacBook, a git client comes built-in. For Windows, there are lots of clients available, however in my view, Sourcetree is one of the best (more about this a bit later).

For MacBook users, below are some basic commands you can use from a terminal session

#change to directory where you will store your repository
cd /Users/tomj/Documents/git-repo/personalproject

#create a git repo in this folder
git init

#you can copy files into this folder
#to get git to start tracking the changes in the newly added files use the following command
git add .

 

As mentioned above, https://git-scm.com is an awesome site to learn more about Git.

You can also check out this page https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet for some quick git commands.

Using Sourcetree

For those that prefer a GUI client, I found that Sourcetree, from Atlassian, is an awesome Git client.

It gives you all the features of a good Git client plus it also shows you the history of all the changes made to the repository.

For this blog, I will be using Sourcetree to create and manage the Git repository.

So head over to https://www.sourcetreeapp.com to download and then install the client.

After installing Sourcetree, you will be prompted for a login account. Follow the links provided in the Sourcetree app to create a free Bitbucket account and then login.

Ok, lets begin.

Create a new repository

A repository is essentially a collection of files (or file) that we will track for changes. You can think of it as a directory.

To create a new repository, open Sourcetree.

From the menu, click on File and then click New. You will get the following screen.

Sourcetree_newRepo

Next, click on New and then click on Create Local Repository.

In the next window, for Destination Path, select the folder that will contain the scripts that you want to monitor for source control.

For Name, leave it as the default (the name of the folder). Ensure the Type is Git and then click Create.

Guess what, that’s all it takes to create a local repository! Simple?

Once the repository is created, you will see a screen similar to the one shown below (my repository is called temp)

Sourcetree_newRepoCreated.

Double click on the newly created repository (as shown above). This will show the dashboard where everything happens 😉

Sourcetree_RepoDashboard

 

To see all the changes that have been made to the repository, click on History in the above screen.

Visual Studio Code

Ok, so we have created our repository and it is being monitored for changes. Now, we can start coding.

As mentioned above, we will be using Visual Studio Code, a free IDE from Microsoft. If you haven’t got it already, download it from https://code.visualstudio.com

Once installed, open Visual Studio Code.

From the menu, click on File and then click on Open. Next, choose the folder for which you created the repository above and then click on Open.

You will now see the folder structure, with all the files inside it in the left pane.

You can open any of the existing files or create new ones. For new ones, ensure you save them in the repository’s folder.

As soon as you save the file, you will notice the Source Control icon shows the number of changes that are currently ready to be staged (Source Control section is denoted by the “stethoscope” icon – ok it’s not really that but it surely looks like it 😉 )

VSC_SourceControl_Update

Now, one thing to note about source control via Git is that you have to stage your changes. When you stage your changes, those changes will be written to the Git repository when you click Commit.

Click into the Source Control section and then under Changes click the + for each of the files, to stage the change.

VSC_SourceControl_Update_Changes

To commit the changes, enter a short description of what the changes were and then click on the tick at the top.

VSC_SourceControl_Update_Changes_Commit

That’s it. Your changes have now been successfully committed to the Git repository.

To view a history of all the changes that have been done to your repository, open Sourcetree and then click on History.

Notice the description column. This contains the comments you wrote when committing your staged changes. This provides a quick reminder of what the changes were. To drill down deeper into the changes, check the pane at the bottom right. Here, you will see the actual changes that were made (green denotes additions and red denotes deletion of characters). If there are multiple people committing to the same repo (as would be the case in a team), the names of each person will be shown beside each line in the History section.

Sourcetree_ViewHistory

Now, let’s say that after you did your commit, you realised that you didn’t want that change and in fact preferred the file as it was before the commit. All you need to do is go into Sourcetree, find the change in the History section, right click on it and then click on Reverse commit. This reverses the commit and restores the file to what it previously was. What if, after that, you want to get the change back? Well, you can reverse the reverse commit 😉 (this is so much better than my method of copying the last suffixed version to the current version)

Soucetree_ReverseCommit
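For those who prefer the command line, the equivalent of Sourcetree’s Reverse commit is git revert. A rough sketch (the commit hash is a placeholder that you would copy from the git log output):

git log --oneline
git revert <commit-hash>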

Closing Remarks

I am absolutely loving Git. It is an awesome tool and I would highly recommend it to everyone. For me personally, it helps in controlling the various changes I make to my code, with easy auditability and a clear view of the changes between versions.

For teams, Git provides even more benefits. Using a central server (Git Server or Visual Studio Team Services) to host the Git repositories, the whole team can work on the files without blocking each other. The files will be stored centrally (actually with Git, when you pull a repo, you download the full repo to your local computer. If you merge your changes, the files are merged to the copy on the server). The changes to the files are easily trackable and there is an easy way to revert to a previous version should issues arise due to modifications.

I hope you embrace Git as I have and use it to track all your code changes.

Till the next time, Enjoy 😉