Visualise Network Traffic in AWS using VPC Flow Logs and Grafana – Part 1 – VPC Setup

Introduction

Have you ever wondered what happens at the edge of the internet? Let me give you some context. Suppose you have provisioned an Amazon EC2 instance in your AWS Account. It is deployed in a public subnet, has a public IP address, and some open ports. Now, answer me this. Do you believe that no one else tries to connect to your server? That it is left alone by the millions of netizens?

Not long ago, I woke up with this exact question. From what I had read ages ago, the internet is filled with bots, which try to scan machines connected to the internet. These could range from benign scripts that are just trying to map out the devices on the internet, to something more nefarious. For instance, a reconnaissance tool that hackers use to find vulnerable machines!

Reading about such things is good, however it’s even better when you can see it for yourself. And that is exactly what I did!

In this blog, I will share my journey in unravelling the answer to the above-mentioned question. It was an enlightening experience, and it got more interesting as I dug deeper.

Have I got you interested yet? Without further ado let’s begin!

A sneak peek into what I built

Before we start, let me give you a sneak peek into what I built. Below is a screenshot of the Grafana dashboard I created. It gives a nice view of the number of accepted and rejected connections to my server. The left two gauges show statistics since I provisioned the server, which at that point was approximately 2 hours earlier. The right two gauges show statistics for the last 30 minutes. Holy smokes! I was extremely surprised to see so many connections in such a short time!

High Level Architecture

Let’s get some ideas on the board as to how we will create the above dashboard. I have pasted the high-level architecture diagram for the solution below. The important parts are labelled with a number and are explained in detail underneath the diagram.

The key points in the above diagram, as denoted by the numbers, are explained below.

  1. Traffic from the internet reaches our Amazon EC2 instance running inside the public subnet. We will use this instance to find out who all are trying to connect to servers on the internet (well, in my corner of the internet anyway). The same instance will also be used to host our Grafana server.
  2. The traffic is first interrogated by the AWS network access control lists (NACL) and then by the AWS security groups. Based on this, the traffic is either allowed through to the Amazon EC2 instance or it is dropped. The details of the network traffic, along with the action that was taken on it, are recorded in an Amazon CloudWatch Logs stream by AWS VPC Flow Logs (a sample flow log record is shown after this list).
  3. The Grafana dashboard connects to the Amazon CloudWatch Logs stream and fetches the data. It then displays the data using a dashboard (as shown in the sneak peek).
  4. A user can access the Grafana console and view the dashboard.
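
To make this a little more concrete, below is a representative flow log record in the default format. The values are illustrative only; the fields, in order, are: version, account ID, interface ID, source address, destination address, source port, destination port, protocol, packets, bytes, start time, end time, action and log status.

2 123456789012 eni-0a1b2c3d4e5f6a7b8 198.51.100.7 10.0.2.15 54321 22 6 1 44 1418530010 1418530070 REJECT OK

In this example record, a host at 198.51.100.7 (a documentation address) tried to reach port 22 (SSH) on our instance over TCP (protocol 6), and the connection was rejected. Once such records are in the log group, a CloudWatch Logs Insights query along the lines of the one below could count the rejections (Logs Insights automatically discovers fields such as action for VPC flow logs).

filter action = "REJECT" | stats count(*) as rejected_connections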

In this blog, we will create the Amazon Virtual Private Cloud (VPC) along with the public and private subnets and route tables. We will also create an Identity and Access Management (IAM) role, which we will use to configure the VPC flow logs. An Amazon CloudWatch log group will also be created. This will be used to store the VPC flow logs.

In part 2 of this blog, we will complete the solution by deploying an Amazon EC2 instance. This will be used to find out who all are trying to connect to our server, and we will also install a Grafana server on it. This will allow us to visualise the data that VPC flow logs produce.

Now that we have had a 10,000-foot view, let’s get into the weeds and see how this will actually be implemented in code.

Walkthrough of the Code

  1. Start by cloning the code from my GitHub repository using the following command.
    git clone -b vpc https://github.com/nivleshc/blog-visualise-network-traffic.git
  2. Open the folder that contains the cloned repository files (blog-visualise-network-traffic) and then browse into the folder named vpc. Open the file named main.tf using your favourite IDE.
  3. The first resource block in this file creates the VPC.
# create the VPC
resource "aws_vpc" "main" {
  cidr_block       = var.vpc["cidr_block"]
  instance_tenancy = var.vpc["instance_tenancy"]
  tags             = var.vpc["tags"]
}

4. The next two resource blocks create the private and public subnets respectively.

# create the private subnet
resource "aws_subnet" "private" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet["cidr_block"]
  availability_zone = var.private_subnet["availability_zone"]
  tags              = var.private_subnet["tags"]
}

# create the public subnet. This is where the Grafana Server will be installed
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet["cidr_block"]
  availability_zone       = var.public_subnet["availability_zone"]
  map_public_ip_on_launch = true
  tags                    = var.public_subnet["tags"]
}

5. The internet gateway is created by the next resource block.

# create the internet gateway so that we can connect to the internet
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
  tags = {
    Name = "igw"
  }
}

6. The private and public route tables are created by the next two resource blocks.

# create a private route table. This will be associated with the private subnet
resource "aws_route_table" "private-rt" {
  vpc_id = aws_vpc.main.id
  tags = {
    Name = "private route table"
  }
}

# create a public route table. This will be associated with the public subnet
resource "aws_route_table" "public-rt" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
  tags = {
    Name = "public route table"
  }
}

7. The last two resource blocks in this file attach the route tables to the respective subnets.

# associate the private route table with the private subnet
resource "aws_route_table_association" "private-rt" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private-rt.id
}

# associate the public route table with the public subnet
resource "aws_route_table_association" "public-rt" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public-rt.id
}

8. Open the file named cloudwatch-logs.tf. The resource block in this file creates an Amazon CloudWatch log group. This will be used by VPC Flow logs to store the traffic information.

resource "aws_cloudwatch_log_group" "vpc_flow_logs" {
name = var.vpc["vpc_flow_logs"]["cloudwatch_log_group_name"]
retention_in_days = var.vpc["vpc_flow_logs"]["cloudwatch_log_group_retention_in_days"]
skip_destroy = false # force delete the log group when destroying the VPC
}

9. Open the file named iam.tf. This contains the resource blocks that create an IAM role. VPC flow logs will use this role when storing traffic data into the Amazon CloudWatch log group.

# create a role that will be used for enabling vpc flow logs
data "aws_iam_policy_document" "assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["vpc-flow-logs.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "vpc_flow_log_role" {
  name               = "vpc_flow_log_role"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

data "aws_iam_policy_document" "vpc_flow_log_policy_document" {
  statement {
    effect = "Allow"
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:DescribeLogGroups",
      "logs:DescribeLogStreams",
    ]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "vpc_flow_log_policy" {
  name   = "vpc_flow_log_policy"
  role   = aws_iam_role.vpc_flow_log_role.id
  policy = data.aws_iam_policy_document.vpc_flow_log_policy_document.json
}
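
For reference, the trust policy generated by the assume_role policy document above renders to JSON roughly like this. It allows the VPC Flow Logs service, and only that service, to assume the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "vpc-flow-logs.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}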

10. Open the file named vpc-flow-logs.tf. This file contains the resource block that will create the VPC flow log.

# enable vpc flow logs
resource "aws_flow_log" "vpc_flow_log" {
  iam_role_arn    = aws_iam_role.vpc_flow_log_role.arn
  log_destination = aws_cloudwatch_log_group.vpc_flow_logs.arn
  traffic_type    = var.vpc["vpc_flow_logs"]["traffic_to_capture"]
  vpc_id          = aws_vpc.main.id
  tags = {
    Name = format("%s-%s", "vpc-flow-logs", var.vpc["vpc_flow_logs"]["traffic_to_capture"])
  }
}

11. Open the file named outputs.tf. This contains the module’s outputs, which can be consumed by its caller.

output "vpc_id" {
description = "The VPC ID"
value = aws_vpc.main.id
}
output "private_subnet_id" {
description = "The ID of the private subnet"
value = aws_subnet.private.id
}
output "public_subnet_id" {
description = "The ID of the public subnet"
value = aws_subnet.public.id
}
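
As an illustration of how a caller could consume these outputs, the sketch below places an Amazon EC2 instance into the public subnet. This is only a sketch; the AMI ID is a placeholder, and the real instance definition will be covered in part 2 of this blog.

# a minimal sketch of consuming the vpc module outputs - not part of this project's code
resource "aws_instance" "grafana" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"
  subnet_id     = module.vpc.public_subnet_id # output from the vpc module
}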

12. Open the file named variables.tf. It contains declarations for the variables that are used in the vpc module. The values for these variables will be passed in when this module is called. The variables declared for this module are objects, which means that you have to supply their values in a specific format, otherwise terraform will fail with an error.

variable "vpc" {
description = "Configuration values for the VPC"
type = object({
cidr_block = string
instance_tenancy = string
vpc_flow_logs = object({
cloudwatch_log_group_name = string
cloudwatch_log_group_retention_in_days = number
traffic_to_capture = string
})
tags = map(string)
})
}
variable "private_subnet" {
description = "Configuration values for the private subnet"
type = object({
cidr_block = string
availability_zone = string
tags = map(string)
})
}
variable "public_subnet" {
description = "Configuration values for the public subnet"
type = object({
cidr_block = string
availability_zone = string
tags = map(string)
})
}

13. Go one level up in the folder structure. Open the file named locals.tf. The local values defined in this file are the values that will be assigned to the variables declared in the vpc module. Let’s go through each of these local values to understand them better.
The vpc local value contains the following configuration items:

  • the CIDR block that will be assigned to the VPC
  • the instance tenancy to use for the VPC
  • the VPC flow log settings, which include the following items:
    • the name to assign to the Amazon CloudWatch log group that will be created
    • the number of days that the Amazon CloudWatch log group streams should be retained for
    • the type of traffic that should be captured. This has to be one of ALL, ACCEPT or REJECT.
  • the tags that should be assigned to the VPC resource

The subnets local value contains the following configuration items:

  • a declaration for the private subnet, which contains the following items:
    • the CIDR block that will be assigned to this subnet
    • the availability zone that this subnet will be created in
    • the tags that will be assigned to this subnet
  • a declaration for the public subnet, which contains the following items:
    • the CIDR block that will be assigned to this subnet
    • the availability zone that this subnet will be created in
    • the tags that will be assigned to this subnet

The file also contains a declaration for default tags. The aws provider will add these to all resources that it creates. At a minimum, this needs to contain the Project tag, which will be used by the resource group configuration (explained later on). The resource group will be used to view all the provisioned resources for this project from a single portal.

locals {
  vpc = {
    cidr_block       = "10.0.0.0/16"
    instance_tenancy = "default"
    vpc_flow_logs = {
      cloudwatch_log_group_name              = "/aws/vpc/flowlogs"
      cloudwatch_log_group_retention_in_days = 7
      traffic_to_capture                     = "ALL"
    }
    tags = {
      Name = "traffic-analysis"
    }
  }

  subnets = {
    private = {
      cidr_block        = "10.0.1.0/24"
      availability_zone = "ap-southeast-2a"
      tags = {
        Name = "private-subnet"
      }
    }
    public = {
      cidr_block        = "10.0.2.0/24"
      availability_zone = "ap-southeast-2b"
      tags = {
        Name = "public-subnet"
      }
    }
  }

  default_tags = {
    Project = "traffic-analysis"
  }
}

14. Next, open the providers.tf file. It defines the version constraints for the providers that will be used. You will also see that the region and the default tags are being set for the aws provider in this file.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.41.0"
    }
  }
}

provider "aws" {
  region = "ap-southeast-2"
  default_tags {
    tags = local.default_tags
  }
}

15. Now open the file named main.tf. The first resource block creates an AWS resource group. A resource group allows you to easily view resources that match certain tags. In our case, the resource group will query for resources that match the supplied value for the Project tag. Remember that in the providers.tf file, we configured the aws provider to add the Project tag to all provisioned resources by default?

# create a resource group. This will allow us to easily see all resources
# provisioned by this project using the AWS Management Console
resource "aws_resourcegroups_group" "project_resources" {
  name        = "${local.default_tags["Project"]}-resources"
  description = format("%s %s %s", "All resources provisioned by the", local.default_tags["Project"], "project")

  resource_query {
    query = <<JSON
{
  "ResourceTypeFilters": [
    "AWS::AllSupported"
  ],
  "TagFilters": [
    {
      "Key": "Project",
      "Values": ["${local.default_tags["Project"]}"]
    }
  ]
}
JSON
  }

  tags = {
    Name = "${local.default_tags["Project"]}-resources"
  }
}

16. The second resource block inside main.tf uses the vpc module to provision the vpc. It supplies the local values as inputs to the module’s variables.

# create the vpc
module "vpc" {
  source         = "./vpc"
  vpc            = local.vpc
  private_subnet = local.subnets["private"]
  public_subnet  = local.subnets["public"]
}

Deploying the code

Now it’s time to have some fun. Let’s deploy the code.

  1. The following tools are required to deploy this code into your AWS Account. Confirm that they are installed on the machine that you will be using.
    • Terraform
    • AWS CLI utility installed and configured with a default profile that has credentials to connect to your AWS Account.
  2. Using a command line utility, such as terminal, go to the root of the folder where you cloned the repository for this project.
  3. Run the command listed below to initialise terraform. This will download all the required terraform providers and modules.
    terraform init
  4. Next, run the command listed below to provision the vpc resources into your AWS Account. When asked for your confirmation to proceed, type yes and press enter.
    terraform apply
  5. After terraform has successfully created the resources, login to your AWS Account using the AWS Management Console and confirm that all expected resources have been created.
    Oh wait! Do you remember that resource group that we saw in the Terraform code? And that we added a default “Project” tag to all our provisioned project resources?
    Well, you are in luck my little grasshopper! Instead of going through the various AWS Service portals to confirm that the resources exist, you can just view them all from one service portal! Isn’t that easy?
    • Open the Resource Groups & Tag Editor service in the AWS Management Console.
    • In the left-hand side menu, click on Saved Resource Groups. Then in the right-hand side, look for a group named {Project}-resources where {Project} is the value for the tag that you defined in the default_tags block in the locals.tf file. By default, this will be set to traffic-analysis, which means your resource group name will be traffic-analysis-resources. Click on this and scroll down to Group resources.
      Voila! You should see all the resources that Terraform provisioned for your project here! Isn’t this amazing!

That’s all, folks! We have successfully provisioned the VPC resources and enabled VPC flow logs. In the next part of this blog, we will build on this and deploy a Grafana server in the public subnet. This will be an interesting exercise because after this, you will be able to see who all are trying to connect to your server!

Before we finish, let’s destroy everything that we created. We will re-create it in the next part before continuing on. To destroy, type the following command and, when asked for your confirmation to proceed, type yes and press enter.

terraform destroy

A few callouts

Here are a couple of things to be mindful of regarding this project:

  1. The terraform state file is stored locally. If it gets corrupted or accidentally deleted, then you won’t be able to manage the provisioned resources. Worse yet, if you need to destroy the resources, you will have to do it manually.
    You can always use the resource group we created to see all the provisioned resources in one place, and then easily delete them from there.
    If you want to protect your terraform state file, you can use an Amazon S3 backend with an Amazon DynamoDB table for state file locking (a minimal sketch is shown after this list). Here are some good articles to get you started:
    https://developer.hashicorp.com/terraform/language/state/backends
    https://developer.hashicorp.com/terraform/language/settings/backends/s3
  2. The Amazon CloudWatch log group that is created in this project is not encrypted. If you require this functionality, update the supplied terraform code to first create an AWS KMS customer managed key, and then modify the Amazon CloudWatch log group resource block so that it uses the key to encrypt the logs (see the second sketch after this list). Here are some good articles on this:
    https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_key
    https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group
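
To make the first callout more concrete, here is a minimal sketch of what an Amazon S3 backend with Amazon DynamoDB state locking could look like. The bucket and table names below are hypothetical, and both resources must exist before terraform init is run.

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket" # hypothetical bucket name - must already exist
    key            = "traffic-analysis/terraform.tfstate"
    region         = "ap-southeast-2"
    dynamodb_table = "terraform-state-lock" # hypothetical table name - needs a LockID partition key
    encrypt        = true
  }
}

And for the second callout, here is a minimal sketch of encrypting the Amazon CloudWatch log group with a customer managed key. Note that a key policy granting the CloudWatch Logs service permission to use the key is also required; it is omitted here for brevity.

# create a customer managed key for encrypting the log group
# (a key policy allowing the CloudWatch Logs service to use this key is also required)
resource "aws_kms_key" "vpc_flow_logs" {
  description         = "CMK used to encrypt the VPC flow logs log group"
  enable_key_rotation = true
}

# modified log group resource block that encrypts its logs with the CMK
resource "aws_cloudwatch_log_group" "vpc_flow_logs" {
  name              = var.vpc["vpc_flow_logs"]["cloudwatch_log_group_name"]
  retention_in_days = var.vpc["vpc_flow_logs"]["cloudwatch_log_group_retention_in_days"]
  kms_key_id        = aws_kms_key.vpc_flow_logs.arn
  skip_destroy      = false
}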

I hope you enjoyed this blog, but I promise you will have even more fun in the second part! This is where things get extremely interesting, because you will get to find out who all are trying to connect to your poor ol’ server! Spoiler alert! I found a couple of security companies scanning my server for open ports! I also found some traffic that most probably came from hacker bots!

Till the next time, stay safe and I will see you soon.