Using AWS EC2 Instances to train a Convolutional Neural Network to identify Cows and Horses

Background

Machine Learning (ML) and Artificial Intelligence (AI) have been hobbies of mine for years now. After playing with them approximately 8 years back, I let it lapse till early this year, and boy oh boy, how things have matured! There are products in the market these days that use some form of ML – some examples are Apple’s Siri, Google Assistant and Amazon Alexa.

Computational power has increased to the point where calculations that took months can now be done within days. However, the biggest change has come about due to the vast amounts of data that models can now be trained on. More data means better accuracy in models.

If you have taken any programming course, you will remember the hello world program. This is a foundational program, which introduces you to the language and gives you the confidence to continue on. The hello world of ML is identifying cats and dogs. In almost every online course I have taken, this is the first project you build.

For anyone wanting a background on Machine Learning, I would highly recommend Andrew Ng’s https://www.coursera.org/learn/machine-learning on Coursera. However, be warned, it has a lot of maths 🙂 If you are able to get through it, you will gain a very good foundational knowledge of ML.

If theory is not your cup of tea, another way to approach ML is to just implement it and learn as you go. You don’t need a PhD in ML to start implementing it. This is the philosophy behind Jeremy Howard’s and Rachel Thomas’s http://www.fast.ai. They take you through the implementation steps and introduce the theory on a need-to-know basis; in essence, you take a top-down approach.

I am still a few lessons away from finishing the fast.ai course; however, I have learnt so much and I cannot recommend it enough.

In this blog, I will take you through the steps to implement a Convolutional Neural Network (CNN) that will be able to pick out horses from cows. CNNs are quite complicated in nature, so we won’t go into the nitty-gritty details of creating one from scratch. Instead, we will use the foundational libraries from fast.ai’s lesson 1 and modify it a bit, so that instead of identifying cats and dogs, we will use it to identify cows and horses.

In the process, I will introduce you to a tool that will help you scrape Google for your own image dataset.

Most important of all, I will show you how the amount of data used to train your CNN model affects its accuracy.

So, put your seatbelts on and let’s get started!


1. Setting up the AWS EC2 Instance

ML requires a lot of processing power. To get really good throughput, it is recommended to use GPUs instead of CPUs. If you were to build a kit to try this at home, it could easily cost you a few thousand dollars, not to mention the bill for the cooling and electricity usage.

However, with Cloud Computing, we don’t need to go out and buy the whole kit; instead, we can just rent it for as long as we want. This provides a much more affordable way to learn ML.

In this blog, we will be using AWS EC2 instances. For the cheapest GPU cores, we will use a p2.xlarge instance. Be warned, these cost $0.90/hr, so I would suggest turning them off when you are not using them, otherwise you will surely rack up a huge bill.

Reshma has done a fantastic job of putting together the instructions on setting up an AWS Instance for running fast.ai course lessons. I will be using her instructions, with a few modifications. Reshma’s instructions can be found here.

Ok, let’s begin.

  • Login to your AWS Console
  • Go to the EC2 section
  • On the top left menu, you will see EC2 Dashboard. Click on Limits under it
  • (screenshot: AWS Dashboard – EC2 Limits)
  • Now, on the right you will see all the types of EC2 instances you are allowed to run. Search for p2.xlarge instances. These have a current limit of zero, meaning you cannot launch them. Click on Request limit increase and then fill out the form to justify why you want a p2.xlarge instance. Once done, click on Submit. In my case, within a few minutes, I received an email saying that my limit increase had been approved.
  • Click on EC2 Dashboard from the left menu
  • Click on Launch Instance
  • In the next screen, in the left hand side menu, click on Community AMIs
  • On the right side of the screen, search for fast.ai
  • From the results, select fastai-part1v2-p2
  • In the next screen (Instance Type) filter by GPU compute and choose p2.xlarge
  • In the next screen, configure the instance details. Ensure you get a public IP address (Auto-assign Public IP) because you will be connecting to this instance over the internet. Once done, click Next: Add Storage
  • In the next screen, you don’t need to do anything. Just be aware that the community AMI comes with an 80GB hard disk (at $0.10/GB/month, this will amount to $8/month). Click Next
  • In the next screen, add any tags for the EC2 Instance. To give the instance a name, you can set the Key to Name and the Value to fastai. Click Next
  • For security groups, all you need to do is allow SSH to the instance. You can leave the source as 0.0.0.0/0 (this allows connections to the EC2 instance from any public IP address). If you want to be super secure, you can set the source to your current public IP address. However, doing this means that should your public IP address change (hardly any ISPs give you a static IP address unless you pay extra), you will have to go back into the AWS Console and update the source in the security group. Click Next
  • In the next section, check that all details are correct and then click on Launch. You will be asked for your key pair. You can either choose an existing key pair or create a new one. Ensure you keep the key pair in a safe place because whoever possesses it can connect to your EC2 instance.
  • Now, sit back and relax. Within a few minutes, your EC2 instance will be ready. You can monitor the progress in the EC2 Dashboard

DON’T FORGET TO SHUTDOWN THE INSTANCE WHEN NOT USING IT. AT $0.90/hr, IT MIGHT NOT SEEM MUCH, HOWEVER THE COST CAN EASILY ACCUMULATE TO SOMETHING QUITE EXPENSIVE

2. Creating the dataset

To train our Convolutional Neural Network (CNN), we need lots of images of cows and horses. This got me thinking. Why not get them off Google? But then this presented another challenge. How do I download all the images? Surely I didn’t want to be sitting there right-clicking each search result and saving it!

After some googling, I landed on https://github.com/hardikvasa/google-images-download. It does exactly what I wanted. It will do a Google image search using a keyword and download the results.

Install it using the instructions provided in the link above. By default, it only downloads 100 images. As CNNs need lots more, I would suggest installing chromedriver. The instructions to do this are in the Troubleshooting section under ## Installing the chromedriver (with Selenium)

To download 1000 images each of cows and horses, use the following command lines (for some reason the tool only downloads around 800 images)

  • the downloaded images will be stored in the subfolders cows/downloaded and horses/downloaded in the /Users/x/Documents/images folder.
  • keywords denotes what we are searching for in Google. For cows, we will use cow because we want photos of a single cow. The same goes for horses.
  • --chromedriver provides the path to where the chromedriver has been stored
  • the images will be in jpg format
googleimagesdownload --keywords "cow" --format jpg --output_directory "/Users/x/Documents/images/" --image_directory "cows/downloaded" --limit 1000 --chromedriver /Users/x/Documents/tools/chromedriver
googleimagesdownload --keywords "horse" --format jpg --output_directory "/Users/x/Documents/images/" --image_directory "horses/downloaded" --limit 1000 --chromedriver /Users/x/Documents/tools/chromedriver

3. Finding and Removing Corrupt Images

One disadvantage of using the googleimagesdownload script is that, at times, a downloaded image cannot be opened. This will cause issues when our CNN tries to use it for training/validating. To ensure our CNN does not encounter any issues, we will do some housekeeping beforehand and remove all corrupt images (images that cannot be opened).

I wrote the following python script to find and move the corrupt images to a separate folder. The script uses the matplotlib library (the same library used by the fast.ai CNN framework). If you don’t have it, you will need to download it from https://matplotlib.org/users/installing.html.

The script assumes that within the root folder, there is a subfolder called downloaded which contains all the images. It also assumes there is a subfolder called corrupt within the root folder. This is where the corrupt images will be moved to. Set the root_folder_path to the parent folder of the folder where the images are stored.

#this script will go through the downloaded images and find those that cannot be opened. These will be moved to the corrupt folder.

#load libraries
import matplotlib.pyplot as plt
import os

#image folder
root_folder_path = '/Users/x/Documents/images/cows/'
image_folder_path = root_folder_path + 'downloaded/'
corrupt_folder_path = root_folder_path + 'corrupt' #folder where the corrupt images will be moved to

#get a list of all files in the image folder
image_files = os.listdir(image_folder_path)

print(f'Total Image Files Found: {len(image_files)}')
num_images_moved = 0

#go through each image file and see if we can read it
for image_file in image_files:
    file_path = image_folder_path + image_file
    #print(f'Reading {file_path}')
    try:
        valid_img = plt.imread(file_path)
    except:
        print(f'Error reading {file_path}. File will be moved to corrupt folder')
        os.rename(file_path, os.path.join(corrupt_folder_path, image_file))
        num_images_moved += 1

print(f'Moved {num_images_moved} images to corrupt folder')

For some unknown reason, the script at times moves good images into the corrupt folder as well. I would suggest that you go through the corrupt folder and see if any of the images can be opened (there won’t be many in there). If they can, just manually move them back into the downloaded folder.

To make the images easier to handle, let’s rename them using the following format (a script to automate the renaming is shown after this list).

  • For the images in the cows/downloaded folder rename them to a format CowXXX.jpg where XXX is a number starting from 1
  • For the images in the horses/downloaded folder rename them to a format HorseXXX.jpg where XXX is a number starting from 1
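
If you would rather not rename hundreds of files by hand, the following Python script will do it for you. It is a minimal sketch assuming the folder layout described above (change root_folder_path if yours differs).

#rename the downloaded images to CowXXX.jpg / HorseXXX.jpg
import os

root_folder_path = '/Users/x/Documents/images'

for animal, prefix in [('cows', 'Cow'), ('horses', 'Horse')]:
    folder = os.path.join(root_folder_path, animal, 'downloaded')
    #sort the directory listing so the numbering is deterministic
    for i, old_name in enumerate(sorted(os.listdir(folder)), start=1):
        new_name = f'{prefix}{i}.jpg'
        os.rename(os.path.join(folder, old_name), os.path.join(folder, new_name))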


4. Transferring the images to the AWS EC2 Instance

In the following sections, I am using ssh and scp, which come built in with macOS. For Windows, you can use PuTTY for ssh and WinSCP for scp.

A CNN (or any other Neural Network model) is trained using a set of images. Once training has finished, to find out how accurate the model is, we give it a set of validation images (these are different to the ones it was trained on; however, we know what they contain) and ask it to identify them. We then compare the results with what the actual images were, to find the accuracy.


In this blog, we will first train our CNN on a small set of images.

Do the following

  • create a subfolder inside the cows folder and name it train
  • create a subfolder inside the cows folder and name it valid
  • move 100 images from the cows/downloaded folder into the cows/train folder
  • move 20 images from the cows/downloaded folder into the cows/valid folder

Make sure the images in the cows/train folder are not the same as those in cows/valid folder

Do the same for the horses images (if you prefer to script these moves, see the sketch after this list), so basically

  • create a subfolder inside the horses folder and name it train
  • create a subfolder inside the horses folder and name it valid
  • move 100 images from the horses/downloaded folder into the horses/train folder
  • move 20 images from the horses/downloaded folder into the horses/valid folder
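
Here is a minimal Python sketch that automates the above moves. It assumes the folder layout described earlier and picks the images at random (adjust the counts to suit).

#move a random sample of images from downloaded/ into train/ and valid/
import os
import random
import shutil

root_folder_path = '/Users/x/Documents/images'

for animal in ['cows', 'horses']:
    downloaded = os.path.join(root_folder_path, animal, 'downloaded')
    images = os.listdir(downloaded)
    random.shuffle(images) #randomise so train/valid form a random split

    for subfolder, count in [('train', 100), ('valid', 20)]:
        destination = os.path.join(root_folder_path, animal, subfolder)
        os.makedirs(destination, exist_ok=True)
        for image in images[:count]:
            shutil.move(os.path.join(downloaded, image), destination)
        images = images[count:] #ensures train and valid never share an image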

Now connect to the AWS EC2 instance using the following command

ssh -i key.pem ubuntu@public-ip

where

  • key.pem is the key pair that was used to create the AWS EC2 instance (if the key pair is not in the current folder then provide the full path to it)
  • public-ip is the public ip address for your AWS EC2 instance (this can be obtained from the EC2 Dashboard)

Once connected, use the following commands to create the required folders

cd data
mkdir cowshorses
mkdir cowshorses/train
mkdir cowshorses/valid
mkdir cowshorses/train/cows
mkdir cowshorses/train/horses
mkdir cowshorses/valid/cows
mkdir cowshorses/valid/horses

Close your ssh session by typing exit

Run the following commands to transfer the images from your local computer to the AWS EC2 instance

To transfer the cows training set
scp -i key.pem /Users/x/Documents/images/cows/train/* ubuntu@public-ip:~/data/cowshorses/train/cows

To transfer the horses training set
scp -i key.pem /Users/x/Documents/images/horses/train/* ubuntu@public-ip:~/data/cowshorses/train/horses

To transfer the cows validation set
scp -i key.pem /Users/x/Documents/images/cows/valid/* ubuntu@public-ip:~/data/cowshorses/valid/cows

To transfer the horses validation set
scp -i key.pem /Users/x/Documents/images/horses/valid/* ubuntu@public-ip:~/data/cowshorses/valid/horses

5. Starting the Jupyter Notebook

Jupyter Notebooks are one of the most popular tools used by ML practitioners and data scientists. For those that aren’t familiar with Jupyter Notebooks, in a nutshell, a notebook is a web page that contains descriptions and interactive code. The user can run the code live from within the document. This is possible because Jupyter Notebooks execute the code on the server they are running on and then display the result in the web page. For more information, you can check out http://jupyter.org

In our case, we will be running the Jupyter Notebook on the AWS EC2 instance. However, we will be accessing it through our local computer. For security reasons, we will not publish our Jupyter Notebook to the whole wide world (lol that does spell www).

Instead, we will use the following ssh command to bind our local computer’s tcp port 8888 to the AWS EC2 instance’s tcp port 8888 (this is the port on which the Jupyter Notebook will be running) when we connect to it. This will allow us to access the Jupyter Notebook as if it is running locally on our computer, however the connection will be tunnelled to the AWS EC2 instance.

ssh -i key.pem ubuntu@public-ip -L8888:localhost:8888

Next, run the following commands to start an instance of Jupyter Notebook

cd fastai
jupyter notebook

After the Jupyter Notebook starts, it will provide a URL to access it, along with the token to authenticate with. Copy it and then paste it into a browser on your local computer.

You will now be able to access the fastai Jupyter Notebook.

Follow the steps below to open Lesson 1.

  • click on the courses folder
  • once inside the courses folder, click on the dl1 folder

In the next screen, find the file lesson1.ipynb and double-click it. This will launch the lesson1 Jupyter Notebook in another tab.

Give yourself a big round of applause for making it this far!

Now, start from the top of lesson1 and go through the first three code sections and execute them. To execute the code, put the mouse pointer in the code section and then press Shift+Enter.

In the next section, change the path to where we moved the cows and horses pictures to. It should look like below

PATH = "data/cowshorses/"

Then, execute this code section.

Skip the following sections

  • Extra steps if NOT using Crestle or Paperspace or our scripts
  • Extra steps if using Crestle

Just a word of caution. The original Jupyter Notebook is meant to distinguish between cats and dogs. However, since we are using it to distinguish between cows and horses, whenever you see a mention of cats, change it to cows and whenever you see a mention of dogs, change it to horses.

The following lines don’t need any changing, so just execute them as they are

os.listdir(PATH)
os.listdir(f'{PATH}valid')

In the next line, replace cats with cows so that you end up with the following

files = !ls {PATH}valid/cows | head
files

Execute the above code. A list of the first 10 cow image files will be displayed.

Next, let’s see what the first cow image looks like.

In the next line, change cats to cows to get the following.

img = plt.imread(f'{PATH}valid/cows/{files[0]}')
plt.imshow(img);

Execute the code and you will see the cow image displayed.

Execute the next two code sections. Leave the section after that commented out.

Now, instead of creating a CNN model from scratch, we will use one that was pre-trained on ImageNet which had 1.2 million images and 1000 classes. So it already knows quite a lot about how to distinguish objects. To make it suitable to what we want to do, we will now train it further on our images of cows and horses.

The following defines which model to use and provides the data to train on (the CNN model that we will be using is called resnet34). Execute the below code section.

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(resnet34, sz))
learn = ConvLearner.pretrained(resnet34, data, precompute=True)
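
Note: sz is the image size the photos are resized to before being fed to the model. It is defined near the top of the lesson1 notebook, so if you have been executing the notebook from the top it will already be set. In lesson1 it is defined as

sz = 224 #images are resized to 224x224 pixels before training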

And now for the best part! Let’s train the model and give it a learning rate of 0.01.

learn.fit(0.01, 1)

After you execute the above code, the model will be trained on the cows and horses images that were provided in the train folders. The model will then be tested for accuracy by getting it to identify the images contained in the valid folders. Since we already know what the images are of, we can use this to calculate the model’s accuracy.

When I ran the above code, I got an accuracy of 0.75. This is quite good, since it means the model can identify cows from horses 75% of the time. Not to forget, we used only 100 cow and 100 horse images to train it, and it didn’t even take that long to train!
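
If you are wondering what that 0.75 actually means: accuracy is simply the fraction of validation images the model labels correctly. A toy illustration (with made-up predictions, not output from our model):

import numpy as np

#hypothetical predicted classes vs actual classes for 4 validation images
#(0 = cow, 1 = horse)
predictions = np.array([0, 1, 1, 0])
actuals     = np.array([0, 1, 0, 0])

accuracy = (predictions == actuals).mean()
print(accuracy) #0.75 - three of the four images were identified correctly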

Now, let’s see what happens when we give it loads more images to train on.

BTW, to get more insights into the results from the trained model, you can go through all the sections between the lines learn.fit(0.01, 1) and Choosing a learning rate.

Another take at training the model

From all the literature I have been reading, one point keeps on repeating. More data means better models. Let’s put this to the test.

This time around we will give the model ALL the images we downloaded.

Do the following.

  • on your local computer, move the photos back to the downloaded folder
    • move photos from cows/train to cows/downloaded
    • move photos from cows/valid to cows/downloaded
    • move photos from horses/train to horses/downloaded
    • move photos from horses/valid to horses/downloaded
  • on your local computer, move 100 photos of cows to cows/valid folder and the rest to the cows/train folder
    • move 100 photos from cows/downloaded to cows/valid folder
    • move the rest of the photos from cows/downloaded to cows/train folder
  • on your local computer, move 100 photos of horses to horses/valid and the rest to horses/train folder
    • move 100 photos from horses/downloaded to horses/valid folder
    • move the rest of the photos from horses/downloaded to horses/train folder
  • on the AWS EC2 instance, delete all the photos under the following folders
    • /data/cowshorses/train/cows
    • /data/cowshorses/train/horses
    • /data/cowshorses/valid/cows
    • /data/cowshorses/valid/horses

Use the following commands to copy the images from the local computer to the AWS EC2 Instance

To transfer the cows training set
scp -i key.pem /Users/x/Documents/images/cows/train/* ubuntu@public-ip:~/data/cowshorses/train/cows

To transfer the horses training set
scp -i key.pem /Users/x/Documents/images/horses/train/* ubuntu@public-ip:~/data/cowshorses/train/horses

To transfer the cows validation set
scp -i key.pem /Users/x/Documents/images/cows/valid/* ubuntu@public-ip:~/data/cowshorses/valid/cows

To transfer the horses validation set
scp -i key.pem /Users/x/Documents/images/horses/valid/* ubuntu@public-ip:~/data/cowshorses/valid/horses

Now that everything has been prepared, re-run the Jupyter Notebook, as described under Starting the Jupyter Notebook above (ensure you start from the top of the Notebook).

When I trained the model on ALL the images (less those in the valid folders), I got an accuracy of 0.95! Wow, that is so amazing! I didn’t do anything other than increase the amount of images in the training set.

Final thoughts

In a future blog post, I will show you how you can use the trained model to identify cows and horses in an unlabelled set of photos.

For now, I would highly recommend that you use the above-mentioned image downloader to scrape Google for some other datasets. Then use the above instructions to train the model on those images and see what kind of accuracy you can achieve (maybe try identifying chickens and ducks?)

As mentioned before, once finished, don’t forget to shut down your AWS EC2 instance. If you don’t need it anymore, you can terminate it, to save on storage costs as well.

If you are keen about ML, you can check out the courses at http://www.fast.ai (they are free)

If you want to dabble in the maths behind ML, as previously mentioned, Andrew Ng’s https://www.coursera.org/learn/machine-learning is one of the finest.

Lastly, if you are keen to take on some ML challenges, check out https://www.kaggle.com They have lots and lots of competitions running all the time, some of which pay out actual money. There are lots of resources as well, and you can learn from others on the site.

Till the next time, Enjoy 😉


Enabling Source Control for locally stored code using Git, Visual Studio Code and Sourcetree

Introduction

Coming from a system administration background, I am used to writing scripts to get mundane tasks done. Whenever I saw repeatable tasks, I saw an opportunity to script them, and pass them on to a junior to do 😉

However, writing scripts brings about its own challenges.

Ok, time to fess up 😉 Hands up those that have modified a script, only to realise that the modifications broke it! To make matters worse, you forgot to take a copy of the original!

Don’t worry, I have been in that boat, and I can remember the countless hours I spent getting a script back to what it was (mind you, I am not talking about a formal business change here, which is governed by strict change control, but about personal scripts that you have created to make your daily tasks easier)

To make a copy of a script, I would normally suffix the file name with the current time and date. This provided me with a timestamp of when I changed the file and a way of reverting my changes. However, there were instances when I was making backups of an already-modified script: I had tested a modification and it worked, but I didn’t want to risk breaking it when modifying the file further. Guess what, these are the times when I found I made the worst mistakes! I used to get so engrossed in my modifications that I would forget to make a backup of the changes and end up with an unworkable script. The only version to revert to was the original, which meant all my hard work went to waste!

This is why I started my search for a better change tracking system. One that would show me the changes I had made, and which would allow me to easily revert to a previous version.

Guess what! I think I just found this golden goose and it is truly amazing!

In this blog I will show you how you can use Git, an open source version control system, to track changes to scripts stored locally on your computer. The main use of Git is for source control of files that a team contributes to. In these situations, a Git Server is used to store the repository.

Please ensure that the local folder you are tracking for source control is backed up either to the cloud or to an external hard disk.

For editing our code/script, we will use Microsoft’s Visual Studio Code, a free IDE that has Git support in-built. We will also use Sourcetree, Atlassian’s free Git client.


Introducing Git

Git is an awesome open source distributed version control system. When working in a team, it allows you to have your files centrally managed, while at the same time allowing multiple people to work on them. Team members can pull the repository to their local computer. They can also create a branch, update the files in that branch and then merge it back into the master branch. If there are no conflicts, Git will update the files in its repository. However, if there are conflicts, Git will inform that team member, showing them the conflicts. The team member can then either resolve the conflicts and re-merge or discard their changes altogether.

If you want to read more on git, check out https://git-scm.com

To host the repositories for your team, two commonly used solutions are a self-hosted Git server or Visual Studio Team Services. You can also use GitHub; however, your repositories will be public unless you sign up for a paid account.

For personal use, you can store your git repositories in a local directory that is backed up to the cloud. For my personal projects, I use a Dropbox synchronised folder.

To use Git, you need a Git client. If you have a MacBook, a git client comes built in. For Windows, there are lots of clients available; however, in my view, Sourcetree is one of the best (more about this a bit later).

For MacBook users, below are some basic commands you can use from a terminal session

#change to directory where you will store your repository
cd /Users/tomj/Documents/git-repo/personalproject

#create a git repo in this folder
git init

#you can copy files into this folder
#to get git to start tracking the changes in the newly added files use the following command
git add .
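
#finally, commit the staged files so the changes are recorded in the repository
#(the commit message can be anything that describes the change)
git commit -m "initial commit"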


As mentioned above, https://git-scm.com is an awesome site to learn more about Git.

You can also check out this page https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet for some quick git commands.

Using Sourcetree

For those that prefer a GUI client, I found that Sourcetree, from Atlassian, is an awesome Git client.

It gives you all the features of a good Git client plus it also shows you the history of all the changes made to the repository.

For this blog, I will be using Sourcetree to create and manage the Git repository.

So head over to https://www.sourcetreeapp.com to download and then install the client.

After installing Sourcetree, you will be prompted for a login account. Follow the links provided in the Sourcetree app to create a free Bitbucket account and then login.

Ok, let’s begin.

Create a new repository

A repository is essentially a collection of files (or file) that we will track for changes. You can think of it as a directory.

To create a new repository, open Sourcetree.

From the menu, click on File and then click New. You will get the following screen.

(screenshot: Sourcetree – new repository window)

Next, click on New and then click on Create Local Repository.

In the next window, for Destination Path, select the folder that will contain the scripts that you want to monitor for source control.

For Name, leave it as the default (the name of the folder). Ensure the Type is Git and then click Create.

Guess what, that’s all it takes to create a local repository! Simple?

Once the repository is created, you will see a screen similar to the one shown below (my repository is called temp)

(screenshot: Sourcetree – new repository created)

Double click on the newly created repository (as shown above). This will show the dashboard where everything happens 😉

(screenshot: Sourcetree – repository dashboard)


To see all the changes that have been made to the repository, click on History in the above screen.

Visual Studio Code

Ok, so we have created our repository and it is being monitored for changes. Now, we can start coding.

As mentioned above, we will be using Visual Studio Code, a free IDE from Microsoft. If you haven’t got it already, download it from https://code.visualstudio.com

Once installed, open Visual Studio Code.

From the menu, click on File and then click on Open. Next, choose the folder that you created the repository for above and then click on Open.

You will now see the folder structure, with all the files inside it in the left pane.

You can open any of the existing files or create new ones. For new ones, ensure you save them in the repository’s folder.

As soon as you save the file, you will notice the Source Control icon shows the number of changes that are currently ready to be staged (the Source Control section is denoted by the “stethoscope” icon – ok, it’s not really that, but it surely looks like it 😉 )

(screenshot: Visual Studio Code – Source Control change count)

Now, one thing to note about source control via Git is that you have to stage your changes. When you stage your changes, those changes will be written to the Git repository when you click Commit.

Click into the Source Control section and then under Changes click the + for each of the files, to stage the change.

(screenshot: Visual Studio Code – staging changes)

To commit the changes, enter a short description of what the changes were and then click on the tick at the top.

(screenshot: Visual Studio Code – committing staged changes)

That’s it. Your changes have now been successfully committed to the Git repository.

To view a history of all the changes that have been done to your repository, open Sourcetree and then click on History.

Notice the description column. This contains the comments you wrote when committing your staged changes, providing a quick reminder of what the changes were. To drill down deeper into the changes, check the pane at the bottom right. Here, you will see the actual changes that were made (green denotes additions and red denotes deletions). If there are multiple people committing to the same repo (as would be the case in a team), the name of each person will be shown beside their commits in the History section.

(screenshot: Sourcetree – viewing history)

Now, let’s say that after you did your commit, you realised that you didn’t want that change, and in fact you prefer what the file was before the commit. All you need to do is go into Sourcetree, find the change in the History section, right click on it and then click on Reverse commit. This reverses the commit and changes the file back to what it previously was. What if, after that, you want the change back? Well, you can reverse the reverse commit 😉 (this is so much better than my method of copying the last suffixed version to the current version)

(screenshot: Sourcetree – Reverse commit)

Closing Remarks

I am absolutely loving Git. It is an awesome tool and I would highly recommend it to each and everyone. For me personally, it helps in controlling the various changes I make to my code, with easy auditability and a view of the changes I make between versions.

For teams, Git provides even more benefits. Using a central server (a Git server or Visual Studio Team Services) to host the Git repositories, the whole team can work on the files without blocking each other. The files will be stored centrally (actually with Git, when you pull a repo, you download the full repo to your local computer; when you merge your changes, the files are merged into the copy on the server). The changes to the files are easily trackable and there is an easy way to revert to a previous version should issues arise due to modifications.

I hope you embrace Git as I have and use it to track all your code changes.

Till the next time, Enjoy 😉


Deploying an Active Directory Forest using AWS CloudFormation

Introduction

Wow, it is amazing how time flies. Almost two years ago, I wrote a set of blogs that showed how one can use Azure Resource Manager (ARM) templates and Desired State Configuration (DSC) scripts to deploy an Active Directory Forest automatically.

For those that would like to take a trip down memory lane, here is the link to the blog.

Recently, I have been playing with AWS CloudFormation and I am simply in awe of its power. For those that are not familiar with AWS CloudFormation, it is a tool, similar to Azure Resource Manager, that allows you to “code” your computing infrastructure in Amazon Web Services. Long gone are the days when you would have to sit down, pressing each button and choosing each option to deploy your environment. Cloud computing provides you with a way to interface with the fabric, so that you can script the build of your environment. The benefits of this are enormous. Firstly, it allows you to standardise all your builds. Secondly, it allows you to have a live as-built document (the code is the as-built document). Thirdly, the code is re-usable. Most important of all, since the deployment is now scripted, you can automate it.

In this blog I will show you how to create an AWS CloudFormation template to deploy an AWS Elastic Compute Cloud (EC2) Windows Server instance. The template will also include steps to promote the EC2 instance to a Domain Controller in a new Active Directory Forest.

Guess what the best part is? Once the template has been created, all you will have to do is load it into AWS CloudFormation, provide a few values and sit back and relax. AWS CloudFormation will do everything for you from there on!

Sounds interesting? Lets begin.

Creating the CloudFormation Template

A CloudFormation template starts with a definition of the parameters that will be used. The person running the template (let’s refer to them as the operator) will be asked to provide the values for these parameters.

When defining a parameter, you will provide the following

  • a name for the parameter
  • its type
  • a brief description for the parameter so that the operator knows what it will be used for
  • any constraints you want to put on the parameter, for instance
    • a maximum length (for strings)
    • a list of allowed values (in this case a drop down list is presented to the operator, to choose from)
  • a default value for the parameter

For our template, we will use the following parameters.

Next, we will define some mappings. Mappings allow us to define the values for variables, based on what value was provided for a parameter.

When creating EC2 instances, we need to provide a value for the Amazon Machine Image (AMI) to be used. In our case, we will use the OS version to decide which AMI to use.

To find the subnet into which the EC2 instance will be deployed, we will use the Environment and AvailabilityZone parameters.

The code below defines the mappings we will use

The next section in the CloudFormation template is Resources. This defines all resources that will be created.

If you have any experience deploying Active Directory Forests, you will know that it is extremely simple to do it using PowerShell scripts. Guess what, we will be using PowerShell scripts as well 😉 Now, after the EC2 instance has been created, we need to provide the PowerShell scripts to it, so that it can run them. We will use AWS Simple Storage Service (S3) buckets to store our PowerShell scripts.

To ensure our PowerShell scripts are stored securely, we will allow access to them only via a certain role and policy.

The code below will create an AWS Identity and Access Management (IAM) role and policy to access the S3 Bucket where the PowerShell scripts are stored.

We will use cfn-init to do all the heavy lifting for us once the EC2 instance has been created. cfn-init is a utility that is present by default in EC2 instances, and we can ask it to perform tasks for us.

To trigger cfn-init, we will use the UserData feature of EC2 instance provisioning. cfn-init, when started, will check the EC2 metadata for the credentials it will use, and it will also check it for all the tasks it needs to perform.

Below is the metadata that will be used. For simplicity, I have hardcoded the URL to the files in the S3 bucket.

As you can see, I have first defined the role that cfn-init will use to access the S3 bucket. Next, the following tasks will be carried out, in the order defined in the configuration set

  • get-files
    • it will download the files from S3 and place them in the local directory c:\s3-downloads\scripts.
  • configure-instance (the commands in this section are run in alphabetical order, which is why I have prefixed them with a number, to ensure they follow the order I want)
    • It will change the execution policy for PowerShell to unrestricted (please note that this is just for demonstration purposes and the execution policy should not be made this relaxed).
    • next, the name of the server will be changed to what was provided in the Parameters section
    • the following Windows Components will be installed (as defined in the Add-WindowsComponents.ps1 script file)
      • RSAT-AD-PowerShell
      • AD-Domain-Services
      • DNS
      • GPMC
    • the Active Directory Forest will be created, using the Configure-ADForest.ps1 script and the values provided in the Parameters section

In the last part of the CloudFormation template, we will provide the UserData information that will trigger cfn-init to run and do all the configuration. We will also tag the EC2 instance, based on values from the Parameters section.

For simplicity, I have hardcoded the security group that will be attached to the EC2 instance (this is defined as GroupSet under NetworkInterfaces). You can easily create an additional parameter for this, if you want.

Finally, our template will output the instance’s hostname, the environment it has been created in and its private IP. This provides an easy way to identify the EC2 instance once it has been created.

Below is the last part of the template

Now all you have to do is log in to AWS CloudFormation, load the template we have created, provide the parameter values and sit back and relax.

AWS CloudFormation will take it from here and do everything for you 😉

How easy was that? Magic 🙂
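
As an aside, if you prefer to script the deployment itself instead of using the console, you can launch the template with a few lines of Python using boto3 (the AWS SDK for Python). Below is a rough sketch; the stack name, template file name and parameter names are placeholders, so substitute the ones from your own template.

import boto3

#create the CloudFormation stack from our template (illustrative values only)
cloudformation = boto3.client('cloudformation')

with open('deploy-adforest.template') as f: #hypothetical template file name
    template_body = f.read()

cloudformation.create_stack(
    StackName='adforest-demo', #hypothetical stack name
    TemplateBody=template_body,
    Parameters=[
        #each parameter defined in the template gets a key/value pair here
        {'ParameterKey': 'DomainName', 'ParameterValue': 'tailspintoys.local'},
    ],
    Capabilities=['CAPABILITY_IAM'] #required because the template creates an IAM role
)

#block until the stack has finished creating
waiter = cloudformation.get_waiter('stack_create_complete')
waiter.wait(StackName='adforest-demo')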

The complete CloudFormation template is available at https://gist.github.com/nivleshc/867b1a2ca119c7d22cf215b5a9a5de02

The two PowerShell Scripts that are used in the CloudFormation template can be downloaded using the links below

Add-WindowsComponents.ps1

Configure-ADForest.ps1

For anyone deploying an Active Directory Forest in AWS, I hope the above comes in handy.

Enjoy 😉

Notes From The Field – Enabling GAL Segmentation in Exchange Online

Introduction

A few weeks back, I was tasked with configuring Global Address List (GAL) Segmentation for one of my clients. GAL Segmentation is not a new concept, and if you were to Google it (as you would do in this day and age), you will find numerous posts on it.

However, during my research, I didn’t find any ONE article that helped me. Instead, I had to rely on multiple articles/blog posts to reach the result.

For those that are new to GAL Segmentation, this can be a daunting task. This is actually the inspiration for this blog: to provide the steps from an implementer’s view, so that you get the full picture of the preparation, the steps involved and the gotchas, and feel confident about carrying out this simple yet scary change.

This blog will focus on GAL Segmentation for an Exchange Online hybrid setup.

So what is GAL Segmentation?

I am glad you asked 😉

By default, in Exchange Online (and in an on-premises Exchange environment as well), a global address list is present. This GAL contains all mail enabled objects in the Exchange organisation – mailboxes, contacts, rooms, etc.

This is all well and good; however, at times a company might not want everyone to see all the objects in the Exchange environment. This might be for various reasons. For instance, the company has too many employees and it won’t make sense to have a GAL a mile long. Or the company might have different divisions that do not need to correspond with each other. Or the company might be trying to sell off one of its divisions and, to start the process, is trying to separate that division from the rest of the company.

For this blog, we will use the last reason stated above. A “filter” will be applied to all users in the division to be sold off, so that when they open their GAL, they only see objects from their own division and not everyone in the company. In similar fashion, the rest of the company will see all objects except those of the division being sold off. Users will still be able to send/receive emails with that particular division; however, the GAL will not show them.

I would like to make it extremely clear that GAL Segmentation DOES NOT DELETE any mail enabled objects. It just creates a filtered version of the GAL for the user.

Introducing the stars

Let’s assume there was once a company called TailSpin Toys. They owned the email namespace tailspintoys.com and had their own Exchange Online tenant.

One day, the board of TailSpin Toys decided to acquire a similar company called WingTip Toys. WingTip Toys had their own Exchange Online tenant and used the email namespace wingtiptoys.com. After the acquisition, WingTip Toys’ email resources were merged into the TailSpin Toys Exchange Online tenant; however, WingTip Toys still used their wingtiptoys.com email namespace.

After a few years, the board of TailSpin Toys decided it was time to sell off WingTip Toys. As a first step, they decided to implement GAL Segmentation between TailSpin Toys and WingTip Toys users.

Listed below is what was decided

  • TailSpin Toys users should only see email objects in their GAL corresponding to their own email namespace (any object with the primary smtp address of @tailspintoys.com). They should not be able to see any WingTip Toys email objects.
  • Only TailSpin Toys users will be able to see Public Folders in their GAL
  • WingTip Toys users should only see email objects in their GAL corresponding to their own email namespace (any object with the primary smtp address of @wingtiptoys.com). They should not be able to see any TailSpin Toys email objects.
  • The All Contacts address list in the GAL will be accessible to both WingTip Toys and TailSpin Toys users.

The Steps

Performing a GAL Segmentation is a very low risk change. The steps that will be carried out are as follows

  • Create new Global Address Lists, Address Lists, Offline Address Book and Address Book Policy for TailSpin Toys and WingTip Toys users.
  • Assign the respective policy to TailSpin Toys users and WingTip Toys users

The only issue is that by default, no users are assigned an Address Book Policy (ABP) in Exchange Online (ABPs are the “filter” that specifies what a user sees in the GAL).

Due to this, when we are creating the new address lists, users might see them in their GAL as well and get confused as to which one to use. If you wish to carry out this change within business hours, the simple remedy to the above issue is to provide clear communications to the users about what they could expect during the change window and what they should do (in this case use the GAL that they always use). Having said that, it is always a good practice to carry out changes out of business hours.

Ok, let’s begin.

  • By default, the Address Lists Management role is not assigned in Exchange Online. The easiest way to assign it is to log in to the Exchange Online Portal using a Global Administrator account and add this role to the Organization Management role group. This will then provide all the Address List commands to Global Administrators.
  • Next, connect to Exchange Online using PowerShell
  • For TailSpin Toys
    • Create a default Global Address List called Default TST Global Address List
    • New-GlobalAddressList -Name "Default TST Global Address List" -RecipientFilter {((Alias -ne $null) -and (((ObjectClass -eq 'user') -or (ObjectClass -eq 'contact') -or (ObjectClass -eq 'msExchSystemMailbox') -or (ObjectClass -eq 'msExchDynamicDistributionList') -or (ObjectClass -eq 'group') -or (ObjectClass -eq 'publicFolder'))) -and (WindowsEmailAddress -like "*@tailspintoys.com") )}
    • Create the following Address Lists
      • All TST Distribution Lists
      • New-AddressList -Name "All TST Distribution Lists" -RecipientFilter {((Alias -ne $null) -and (ObjectCategory -like 'group') -and (WindowsEmailAddress -like "*@tailspintoys.com"))}
      • All TST Rooms
      • New-AddressList -Name "All TST Rooms" -RecipientFilter {((Alias -ne $null) -and (((RecipientDisplayType -eq 'ConferenceRoomMailbox') -or (RecipientDisplayType -eq 'SyncedConferenceRoomMailbox'))) -and (WindowsEmailAddress -like "*@tailspintoys.com"))}
      • All TST Users
      • New-AddressList -Name "All TST Users" -RecipientFilter {((Alias -ne $null) -and (((((((ObjectCategory -like 'person') -and (ObjectClass -eq 'user') -and (-not(Database -ne $null)) -and (-not(ServerLegacyDN -ne $null)))) -or (((ObjectCategory -like 'person') -and (ObjectClass -eq 'user') -and (((Database -ne $null) -or (ServerLegacyDN -ne $null))))))) -and (-not(RecipientTypeDetailsValue -eq 'GroupMailbox')))) -and (WindowsEmailAddress -like "*@tailspintoys.com"))}
    • Create an Offline Address Book called TST Offline Address Book (this uses the Default Global Address List that we had just created)
    • New-OfflineAddressBook -Name "TST Offline Address Book" -AddressLists "Default TST Global Address List"
    • Create an Address Book Policy called TST ABP
    • New-AddressBookPolicy -Name "TST ABP" -AddressLists "All Contacts", "All TST Distribution Lists", "All TST Users", "Public Folders" -RoomList "All TST Rooms" -OfflineAddressBook "TST Offline Address Book" -GlobalAddressList "Default TST Global Address List"
  • For WingTip Toys
    • Create a default Global Address List called Default WTT Global Address List
    • New-GlobalAddressList -Name "Default WTT Global Address List" -RecipientFilter {((Alias -ne $null) -and (((ObjectClass -eq 'user') -or (ObjectClass -eq 'contact') -or (ObjectClass -eq 'msExchSystemMailbox') -or (ObjectClass -eq 'msExchDynamicDistributionList') -or (ObjectClass -eq 'group') -or (ObjectClass -eq 'publicFolder'))) -and (WindowsEmailAddress -like "*@wingtiptoys.com") )}
    • Create the following Address Lists
      • All WTT Distribution Lists
      • New-AddressList -Name "All WTT Distribution Lists" -RecipientFilter {((Alias -ne $null) -and (ObjectCategory -like 'group') -and (WindowsEmailAddress -like "*@wingtiptoys.com"))}
      • All WTT Rooms
      • New-AddressList -Name "All WTT Rooms" -RecipientFilter {((Alias -ne $null) -and (((RecipientDisplayType -eq 'ConferenceRoomMailbox') -or (RecipientDisplayType -eq 'SyncedConferenceRoomMailbox'))) -and (WindowsEmailAddress -like "*@wingtiptoys.com"))}
      • All WTT Users
      • New-AddressList -Name "All WTT Users" -RecipientFilter {((Alias -ne $null) -and (((((((ObjectCategory -like 'person') -and (ObjectClass -eq 'user') -and (-not(Database -ne $null)) -and (-not(ServerLegacyDN -ne $null)))) -or (((ObjectCategory -like 'person') -and (ObjectClass -eq 'user') -and (((Database -ne $null) -or (ServerLegacyDN -ne $null))))))) -and (-not(RecipientTypeDetailsValue -eq 'GroupMailbox')))) -and (WindowsEmailAddress -like "*@wingtiptoys.com"))}
    • Create an Offline Address Book called WTT Offline Address Book (this uses the Default Global Address List that we had just created)
    • New-OfflineAddressBook -Name "WTT Offline Address Book" -AddressLists "Default WTT Global Address List"
    • Create an Address Book Policy called WTT ABP
    • New-AddressBookPolicy -Name "WTT ABP" -AddressLists "All Contacts", "All WTT Distribution Lists", "All WTT Users" -RoomList "All WTT Rooms" -OfflineAddressBook "WTT Offline Address Book" -GlobalAddressList "Default WTT Global Address List"
  • Once you create all the Address Lists, after a few minutes you will be able to see them using the Outlook client or Outlook Web Access. One of the obvious things you will notice is that they are all empty! If you are wondering whether the recipient filter is correct or not, you can use the following to confirm the membership
  • Get-Recipient -RecipientPreviewFilter (Get-AddressList -Identity {your address list name here}).RecipientFilter

    Aha, you might say at this stage. I will just run the Update-AddressList cmdlet. Unfortunately, this won’t work, since this cmdlet is only available for on-premises Exchange Servers. There is none for Exchange Online. Hmm. How do I update my Address Lists? It’s not too difficult. All you have to do is change some attribute of the members and they will start popping into the Address Lists! For a hybrid setup, this means we will have to change the setting using the on-premises Exchange Server and use the Azure Active Directory Connect server to replicate the changes to Azure Active Directory, which in turn will update the Exchange Online objects, thereby updating the newly created Address Lists. Simple? Yes. Lengthy? Yes indeed

  • I normally use a CustomAttribute for such occasions. Before using any CustomAttribute, ensure it is not used by anything else. You might be able to ascertain this by checking whether that CustomAttribute currently holds any value for any object. Let’s assume CustomAttribute10 can be used.
    #Get all On-Premise Mailboxes
    $OnPrem_MBXs = Get-Mailbox -Resultsize unlimited
    
    #Get all Exchange Online Mailboxes
    $EXO_MBXs = Get-RemoteMailbox -Resultsize Unlimited
    
    #Get all the Distribution Groups
    $All_DL = Get-DistributionGroup -Resultsize unlimited
    
    #Update the CustomAttribute10 Value
    #Since Room mailboxes are a special type of Mailbox, the following update will
    #address Room Mailboxes as well

    $OnPrem_MBXs | Set-Mailbox -CustomAttribute10 "GAL"
    $EXO_MBXs | Set-RemoteMailbox -CustomAttribute10 "GAL"

    $All_DL | Set-DistributionGroup -CustomAttribute10 "GAL"
  • Using your Azure Active Directory Connect server run a synchronization cycle so that the updates are synchronized to Azure Active Directory and subsequently to Exchange Online
  • One gotcha here is if you have any Distribution Groups that are not synchronised from on-premises. You will have to find these and update their settings as well. One simple way to find them is to use the property isDirSynced. Connect to Exchange Online using PowerShell and then use the following commands
  • $All_NonDirsyncedDL = Get-DistributionGroup -Resultsize unlimited| ?{$_.isdirsynced -eq $FALSE}   
    
    #Now, we will update CustomAttribute10 (please check to ensure this customAttribute doesn't have any values)
     
    $All_NonDirSyncedDL | Set-DistributionGroup -CustomAttribute10 "GAL"
  • Check using Outlook Client or Outlook Address Book to see that the new Address Lists are now populated
  • Once confirmed that the new Address Lists have been populated, let’s go assign the new Address Book Policies to TailSpin Toys and WingTip Toys users. It can take anywhere from 30 min to 1 hr for the Address Book Policy to take effect
  • $allUserMbx = Get-Mailbox -RecipientTypeDetails UserMailbox -Resultsize unlimited

    #assign "TST ABP" Address Book Policy to TailSpin Toys users

    $allUserMbx | ?{($_.primarysmtpaddress -like "*@tailspintoys.com")} | Set-Mailbox -AddressBookPolicy "TST ABP"

    #assign "WTT ABP" Address Book Policy to WingTip Toys users
    $allUserMbx | ?{($_.primarysmtpaddress -like "*@wingtiptoys.com")} | Set-Mailbox -AddressBookPolicy "WTT ABP"
  • While waiting, remove the CustomAttribute10 values you had populated. Using PowerShell on On-Premises Exchange Server, run the following
  • #Get all On-Premise Mailboxes
    
    $OnPrem_MBXs = Get-Mailbox -Resultsize unlimited
    
    #Get all Exchange Online Mailboxes
    
    $EXO_MBXs = Get-RemoteMailbox -Resultsize Unlimited
    
    #Get all the Distribution Groups
    
    $All_DL = Get-DistributionGroup -Resultsize unlimited
    
    #Set the CustomAttribute10 Value to null
    
    #Since Room mailboxes are a special type of Mailbox, the following update will
    
    #address Room Mailboxes as well
    
    $OnPrem_MBXs | Set-Mailbox -CustomAttribute10 $null
    
    $EXO_MBXs | Set-RemoteMailbox -CustomAttribute10 $null
    
    $All_DL | Set-DistributionGroup -CustomAttribute10 $null
  • Connect to Exchange Online using PowerShell and remove the value that was set for CustomAttribute10 for nonDirSynced Distribution Groups
  • $All_NonDirsyncedDL = Get-DistributionGroup -Resultsize unlimited| ?{$_.isdirsynced -eq $FALSE}   
    
    #Change CustomAttribute10 to $null
    
    $All_NonDirSyncedDL | Set-DistributionGroup -CustomAttribute10 $null


    That’s it, folks! Your GAL Segmentation is now complete! Users from TailSpin Toys will only see TailSpin Toys mail enabled objects and WingTip Toys users will only see WingTip Toys mail enabled objects

A few words of wisdom

In the above steps, I would advise that once the new Address Lists have been populated

  • apply the Address Book Policy to a few test mailboxes
  • wait between 30min – 1 hour, then confirm that the Address Book Policy has been successfully applied to the test mailboxes and has the desired result
  • once you have confirmed that the test mailboxes had the desired result for ABP, then and ONLY then continue to apply the ABP to the rest of the mailboxes

This will give you confidence that the change will be successful. Also, if you find that there are issues, the rollback is neither too difficult nor time consuming.

Another thing to note is that when users have their Outlook client configured to use cached mode, they might notice that their new GAL is not fully populated. This is because their Outlook client uses the Offline Address Book to show the GAL, and at that time the Offline Address Book would not have regenerated to include all the new members. Unfortunately, in Exchange Online the Offline Address Book cannot be regenerated on demand; we have to wait for the Exchange Online servers to do this for us. I have noticed the regeneration happens twice in 24 hours, around 4am and 4pm AEST (your times might vary). So if users are complaining that their Outlook client GAL doesn’t show all the users, confirm using Outlook Web Access that the members are there (or you can run Outlook in non-cached mode) and then advise the users that the issue will be resolved when the Offline Address Book is regenerated (in approximately 12 hours). Also, once the Offline Address Book has regenerated, it is best for users to manually download the latest Offline Address Book; otherwise, the Outlook client will download it at a random time within the next 24 hours.

The next gotcha is around which Address Lists are available in Offline mode (refer to the screenshot below)

(screenshot: Outlook address list drop-down, showing which lists are available in Offline mode)

When in Offline mode, the only list available is the Offline Global Address List, the one pointed to by the green arrow. Note that the red arrow also points to an entry named Offline Global Address List; however, that one is an Address List that Microsoft has named Offline Global Address List, seemingly just to confuse people! To repeat: the Offline Global Address List pointed to by the green arrow is available in Offline mode, while the one pointed to by the red arrow is not!

In our case, the Offline Global Address List is named Default TST Global Address List (for TailSpin Toys users) and Default WTT Global Address List (for WingTip Toys users).

If you try to access any of the others in the drop down list when in Offline mode, you will get the following error

(screenshot: error shown when accessing other address lists in Offline mode)

This has always been the case; unfortunately, hardly anyone tries to access all the Address Lists in Offline mode. However, after GAL Segmentation, if users receive the above error, it is very easy to blame the GAL Segmentation implementation 😦 Rest assured, this is not the case and this “feature” has always been present.

Lastly, the user on-boarding steps will have to be modified to ensure that when a mailbox is created, the appropriate Address Book Policy is applied. This will ensure the user only sees the address lists that they are supposed to (on the flip side, if no address book policy is applied, they will see all address lists, which will cause a lot of confusion!)

With these words, I will now stop. I hope this blog comes in handy to anyone trying to implement GAL Segmentation.

If you have any more gotchas or things you can think of regarding GAL Segmentation, please leave them in the comments below.

Till the next time, Enjoy 😉


Amazon QuickSight – An elegant and easy to use business analytics tool

Introduction

Recently, I had a requirement for a tool to visualise some data I had collected. My requirements were very simple. I didn’t want something that would cost me a lot, and at the same time I wanted the reports to be elegant and informative. Most of all, I didn’t want to have to go through pages and pages of documentation to learn how to use it.

As my data was within Amazon Web Services (AWS), I thought I would check if AWS had any such offerings. Guess what, there was indeed a tool for just what I wanted, and after using it, I was amazed at how simple and elegant it is.

In this blog, I will show how you can easily get started with Amazon QuickSight. I will take you through the steps to import your data into Amazon QuickSight and then create some informative visualisations.

Some background on Amazon QuickSight

Pricing

Amazon QuickSight is very inexpensive. In fact, if you don’t have much data, you won’t have to pay anything!

For standard edition use, Amazon QuickSight provides the first user with 1GB of SPICE free per month. SPICE (Super-fast, Parallel, In-memory Calculation Engine) is the calculation engine behind Amazon QuickSight. It uses a combination of columnar storage, in-memory technologies enabled through the latest hardware innovations, machine code generation, and data compression to allow users to run interactive queries on large datasets and get rapid responses.

Any additional SPICE is priced at $USD0.25 per GB/month. For the latest pricing, please refer to https://aws.amazon.com/quicksight/#Pricing

Data Sources

Currently Amazon QuickSight supports the following data sources

  • Relational Data Sources
    • Amazon Athena
    • Amazon Aurora
    • Amazon Redshift
    • Amazon Redshift Spectrum
    • Amazon S3
    • Amazon S3 Analytics
    • Apache Spark 2.0 or later
    • Microsoft SQL Server 2012 or later
    • MySQL 5.1 or later
    • PostgreSQL 9.3.1 or later
    • Presto 0.167 or later
    • Snowflake
    • Teradata 14.0 or later
  • File Data Sources
    • CSV/TSV – comma-separated and tab-separated value text files
    • ELF/CLF – Extended and common log format files
    • JSON – Flat or semi-structured data files
    • XLSX – Microsoft Excel files

Unfortunately, Amazon DynamoDB is currently not supported as a native data source. Since my data was in Amazon DynamoDB, I had to write some custom lambda functions to export it to a csv file, so that it could be imported into Amazon QuickSight.
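
Since I can’t share those exact lambda functions, below is a hedged sketch of the same idea done locally in PowerShell with the AWS CLI (the aws dynamodb scan command is real; the table name orders and the assumption that all attributes are strings are mine):

# Assumes the AWS CLI is installed and configured with credentials.
# Scan the (hypothetical) "orders" table and parse the JSON output
$json  = (aws dynamodb scan --table-name orders --output json) -join "`n"
$items = ($json | ConvertFrom-Json).Items

# Flatten DynamoDB's typed attribute format ({"attr":{"S":"value"}}) into rows
$rows = foreach ($item in $items) {
    $row = [ordered]@{}
    foreach ($attr in $item.PSObject.Properties) {
        $row[$attr.Name] = $attr.Value.S   # assumes string (S) attributes only
    }
    [pscustomobject]$row
}

# Write the csv file that Amazon QuickSight will ingest
$rows | Export-Csv -Path .\orders.csv -NoTypeInformation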

Ok, time for that walk-through I promised earlier.  For this blog, I will be using an S3 bucket as my data source. It will contain the CSV files that I will use for analysis in Amazon QuickSight.

Step 1 – Create S3 buckets

If you haven’t already done so, create an S3 bucket that will contain the csv files. The S3 bucket does not have to be publicly accessible. Once created, upload the csv files into the S3 bucket.

In my case, the csv file is called orders.csv and its location is https://s3.amazonaws.com/sample/orders.csv (to get the URL to your S3 file, login to the S3 console and navigate to the S3 bucket that contains the file. Click the S3 bucket to open it, then click the file name to open its properties. Under Overview you will see Link. This is the URL to the file)
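
If you prefer to script this step, here is a hedged sketch using the AWS Tools for PowerShell (New-S3Bucket and Write-S3Object are real cmdlets; the bucket name sample is simply the one used in this blog):

# Assumes the AWSPowerShell module is installed and credentials are configured
Import-Module AWSPowerShell

# Create the bucket (skip this if it already exists)
New-S3Bucket -BucketName sample

# Upload the csv file; it will then be reachable at
# https://s3.amazonaws.com/sample/orders.csv
Write-S3Object -BucketName sample -File .\orders.csv -Key orders.csv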

Step 2 – Create an Amazon QuickSight Account

Before you start using Amazon QuickSight, you must create an account. Unfortunately, I couldn’t find a way to create an Amazon QuickSight account without an Amazon AWS account. If you don’t have an existing Amazon AWS account, you can create an AWS Free Tier account. Once you have an AWS account, go ahead and create an Amazon QuickSight account at https://aws.amazon.com/quicksight/.

While creating your Amazon QuickSight account, you will be asked if you would like Amazon QuickSight to auto-discover your Amazon S3 buckets. Enable this and then click Choose S3 buckets. Choose the S3 bucket that you created in Step 1 above. This gives Amazon QuickSight read-only access to the S3 bucket, so that it can read the data for analysis.

Step 3 – Create a manifest file

A manifest file is a JSON file that provides the location and format of the data files to Amazon QuickSight. This is required when creating a data set for S3 data sources. Please refer to https://docs.aws.amazon.com/quicksight/latest/user/supported-manifest-file-format.html if you would like more information about manifest files.

Below is my manifest file, which I have affectionately named ordersmanifest.json.

{
   "fileLocations": [
      {
         "URIs": [
            "https://s3.amazonaws.com/sample/orders.csv"
         ]
      }
   ],
   "globalUploadSettings": {
      "format": "CSV",
      "delimiter": ",",
      "textqualifier": "'",
      "containsHeader": "true"
   }
}

Once created, upload the manifest file into the same S3 bucket where the csv file is stored.

Step 4 – Create a data set

  • Login to your Amazon QuickSight account. From the top right, click on Manage data
  • In the next screen, click on New data set
  • In the next screen, for Create a Data Set FROM NEW DATA SOURCES, click on S3
  • In the next screen
    • provide a name for the data source
    • for Upload a manifest file ensure URL is selected and enter the URL of the manifest file (you can get the URL by logging into the S3 console, and then clicking on the manifest file to reveal its properties. Under the Overview tab, you will see Link; this is the URL of the manifest file)
      NewS3DataSource
    • Click Connect
    • Amazon QuickSight will now read the manifest file and then import the csv file to SPICE. You will see the following screen
      FinishDataSetCreation
    • Click on Edit/Preview data.
    • In the next screen, you will see the contents of the data file that was imported, along with the field names on the left. If you want to exclude any columns from the analysis, simply untick them (I unticked orderTime (S) since I didn’t need it)
      EditPreviewDataSet
    • By default, the data is called Group 1. To customise the name, replace Group 1 with text of your choice (I have renamed my data to Orders Data)
      RenameGroup1Label
    • Click Save & visualize from the top menu

Step 5 – Create Visualisations

Now that you have imported the data into SPICE, you can start analysing it and creating visualisations.

After step 4, you should be in the Analysis section.

  • Depending on which visualisation you want, you can select the respective type under Visual types at the bottom left hand side of the screen. For my visualisations, I chose Pie Chart (side note – you will notice that orderTime (S) isn’t listed under Fields list. This is because we unticked it in the previous screen)
    OrdersDataAnalysis-01
  • I want to create two Pie Charts, one to show which foodName is the most popular and another to find the most popular drinkName. For the first Pie Chart, drag foodName (S) from the Fields list to the Value – Add a measure here box at the top of the screen. Then drag foodName (S) from the Fields list to the Group/Color – Add a dimension here box at the top of the screen. You will see the following
    OrdersDataAnalysis-02
  • You can customise the visualisation title Count of Foodname (S) by Foodname (S) by clicking it and then changing the text (I have changed the title to Popularity of Food Types)
    FoodNamePopularity
  • If you look closely, the legend on the right hand side doesn’t serve much purpose since the pie slices are already labelled quite well. You can get rid of the legend and gain more space for your visual. To do this, click on the down arrow above FoodName (S) on the right and then select Hide legend
    FoodNameHideLengend
  • Next, let’s create a Pie Chart visualisation for drinkName. From the top menu, click on Add and then Add visual
    drinkNameAddVisual
  • You will now have another canvas below the first Pie Chart. Click this new canvas area to select it (a blue border will appear to show that it is selected). From Visual types at the bottom left hand side, click on the Pie Chart visual. Then from the top, click on Field wells to expose the Value and Group/Color boxes for the second canvas
    drinkNameCanvas
  • From the Fields list on the left, drag drinkName (S) to the Value – Add a measure here box at the top of the screen. Then drag drinkName (S) from the Fields list to the Group/Color – Add a dimension here box at the top of the screen. You will now see the following
    foodanddrinkvisual
  • We are almost done. I actually want the two Pie Charts to sit side by side, instead of one on top of the other. To do this, I will show you a neat trick. In each of the visuals, at the bottom right border, you will see two diagonal lines. If you move your mouse pointer over them, they change to a resizing cursor. Use this to resize the visual’s canvas area. Also, in the middle of the top border of the visual, you will see two rows of gray dots. Click on this and drag the visual to the location you want.
    VisualResizeandMove
  • I have hidden the legend for the second visual, customised the title, and resized and moved both visuals side by side. Voila! Below is what I get. Not bad, aye!
    BothVisualsSidebySide

Step 6 – Create a dashboard

Now that the visuals have been created, they can be shared with others. This can be done by creating a dashboard. A dashboard is a read-only snapshot of the analysis. When you share the dashboard with others, they can view and filter the dashboard data; however, any filters applied to a dashboard visual exist only while the user is viewing the dashboard, and aren’t saved once it is closed.

One thing to note about sharing dashboards – you can only share dashboards with users who have an Amazon QuickSight account.

Creating a dashboard is very easy.

  • In the Analysis screen, on the top right corner, click on Share and then select Create dashboard
    CreateDashboard
  • You can either replace an existing dashboard or create a new one. In our case, since we are creating a new dashboard, select Create a new dashboard as and enter a name for the dashboard. Once finished, click Create dashboard
    CreateDashboard-Name
  • You will then be asked to enter the username or email address of those you want to share the dashboard with. Enter this and click on Share
    ShareDashboard
  • That’s it, your dashboard is now created. To access it, go to the Amazon QuickSight home screen (click on the Amazon QuickSight icon on the top left hand side of the screen) and then click on All dashboards. Those that you have shared the dashboard with will also be able to see it once they login to their Amazon QuickSight account.
    AllDashboards

Step 7 – Refreshing the Data Set

If your source data continually changes, your visualisations and dashboards will not show the updated information until you refresh the data set. Refreshing imports the new data into SPICE, which then automatically updates the analyses, visualisations and dashboards.

Note: you will have to manually reload the webpage to see the updated visualisations and dashboard

There are two ways of refreshing data sets. One is to do it manually while the other is to use a schedule. The scheduled data refresh allows for the data to be automatically refreshed at a certain time daily, weekly or monthly. A maximum of five scheduled refreshes can be configured.

The steps below show how you can manually refresh the data or create schedules to refresh the data

  • From the Amazon QuickSight main screen, click on Manage data from the top left of the screen
    ManageData
  • In the next screen, you will see all your currently configured data sets. Click the Orders Data dataset (this is the one we had created previously).
  • In the next screen, you will see Refresh Now and Schedule refresh
    ManualScheduleDataRefresh
  • Clicking on Refresh Now will manually refresh the data. Clicking on Schedule refresh will bring up the screen where you can configure a schedule for refreshing the data automatically.

 

That’s it folks! Wasn’t that simple? If you already have an Amazon AWS account, I would strongly recommend giving Amazon QuickSight a try for all your analytics needs. Even if you don’t have an Amazon AWS account, I would still suggest getting an AWS free tier account to try it out.

Enjoy 😉

 

Using Microsoft Identity Manager Synchronisation Server’s Global Address List Synchronisation feature to create a shared global address book across three Exchange Forests

Introduction

Over the life of a company, there can be many acquisitions and mergers. During such events, the parent and the newly acquired entities have their IT “merged”. This allows for the removal of redundant systems and the reduction of expenses, and it also fosters collaboration between the two entities. Unfortunately, the marriage of the two IT systems can, at times, take a long time.

To enable a more collaborative space between the parent and the newly acquired company, a shared “global address book” can be created, allowing employees to quickly and easily look up each other’s contact details.

In this blog, I will show how we can use Microsoft Identity Manager (MIM) 2016 Synchronisation Server’s  GALSync feature to extend the Global Address Book (GAL) of three Exchange Forests. The GAL will be populated with contacts corresponding to  mailboxes in the other Exchange Forests, and this will be automatically maintained, to ensure the contacts remain up-to-date.

Though this blog focuses on three Exchange Forests, it can easily be adapted for two: simply remove all references to the third AD Forest, AD Domain and Exchange Forest.

For reference, we will be using the following:

Name: Contoso Limited (parent company)
Active Directory Forest: contoso.com
Active Directory Domain: contoso.com
Active Directory Forest Level: Windows Server 2008 R2
Exchange Server FQDN: CEX01.contoso.com
Exchange Server Version: Exchange 2010 SP3
Email Address Space owned: contoso.com, contoso.com.au
Number of employees: 2000

Name: Northwind Traders (newly acquired)
Active Directory Forest: northwind.com
Active Directory Domain: northwind.com
Active Directory Forest Level: Windows Server 2008 R2
Exchange Server FQDN: NWEX01.northwind.com
Exchange Server Version: Exchange 2010 SP3
Email Address Space owned: northwind.com, northwind.com.au
Number of employees: 400

Name: WingTip Toys (newly acquired)
Active Directory Forest: wingtiptoys.com
Active Directory Domain: wingtiptoys.com
Active Directory Forest Level: Windows Server 2008 R2
Exchange Server FQDN: WTTEX01.wingtiptoys.com
Exchange Server Version: Exchange 2010 SP3
Email Address Space owned: wingtiptoys.com, wingtiptoys.com.au
Number of employees: 600

 

Contoso, Northwind and WingTip Toys are connected using a wide area network, and it has been decided that the MIM Synchronisation Server will be installed and configured in the Contoso domain.

Preparation

Before we start, some preparation work has to be done to ensure there are no roadblocks or issues.

  • Cleanup of “inter forest” email objects
    • This is one of the most important tasks and I can’t stress it enough. Go through all your email objects (mailboxes, contacts, mailuser objects) in each of the three Exchange Forests (Contoso, Northwind, WingTip Toys) and find any that forward to one of the other Exchange Forests. If there are any, these must be removed. GALSync will create email-enabled contacts corresponding to the mailboxes in the other Exchange Forests, with the externalemailaddress of these new objects set to the primary email address of the other Exchange Forest’s objects. If duplicates arise because the local Exchange Forest already had objects corresponding to the other Exchange Forest’s objects, the local Exchange Server will get confused: it will keep queuing emails for these objects and will not deliver them. [If, after implementing GALSync, some users complain about not receiving emails from a certain Exchange Forest, this could be a possible reason.] A hedged sketch for finding such objects follows this preparation list.
  • Creation of Organisational Units (OU) that will be used by GALSync
    • Create the following Organisational Units in the three Active Directory domains
      • contoso.com\GALSync\LocalForest\Contacts
      • contoso.com\GALSync\RemoteForest\Contacts
      • northwind.com\GALSync\LocalForest\Contacts
      • northwind.com\GALSync\RemoteForest\Contacts
      • wingtiptoys.com\GALSync\LocalForest\Contacts
      • wingtiptoys.com\GALSync\RemoteForest\Contacts
  • Service Accounts
    • The following service accounts must be created in the specified Active Directory domains. You can change the name to comply with your own naming standards
      • MIM Synchronisation Server Service Account
        • UPN: svc-mimsync@contoso.com
        • AD Domain to create in: contoso.com
        • Permissions: non-privileged Active Directory service account
      • Management Agent Account to connect to Contoso.com AD Domain
        • UPN: svc-mimadma@contoso.com
        • AD Domain to create in: contoso.com
        • Permissions
          • non-privileged Active Directory service account
          • Grant “Replicating Directory Changes” permission
          • Grant the following permissions on the GALSync OU in the Contoso AD Domain that was created above. Ensure the permissions propagate to all sub-OUs within the GALSync OU
            • Create Contact Objects
            • Delete Contact Objects
            • Read all Properties
            • Write all Properties
          • Add to the Exchange Organization Management Active Directory security group in Contoso AD Domain
      • Management Agent Account to connect to Northwind.com AD Domain
        • UPN: svc-mimadma@northwind.com
        • AD Domain to create in: northwind.com
        • Permissions
          • non-privileged Active Directory service account
          • Grant “Replicating Directory Changes” permission
          • Grant the following permissions on the GALSync OU in the Northwind AD Domain that was created above. Ensure the permissions propagate to all sub-OUs within GALSync OU
            • Create Contact Objects
            • Delete Contact Objects
            • Read all Properties
            • Write all Properties
          • Add to the Exchange Organization Management Active Directory security group in Northwind AD Domain
      • Management Agent Account to connect to WingTiptoys.com AD Domain
        • UPN: svc-mimadma@wingtiptoys.com
        • AD Domain to create in: wingtiptoys.com
        • Permissions
          • non-privileged Active Directory service account
          • Grant “Replicating Directory Changes” permission
          • Grant the following permissions on the GALSync OU in the WingTipToys AD Domain that was created above. Ensure the permissions propagate to all sub-OUs within GALSync OU
            • Create Contact Objects
            • Delete Contact Objects
            • Read all Properties
            • Write all Properties
          • Add to the Exchange Organization Management Active Directory security group in WingTipToys AD Domain
      • Service account used for the scheduled task job that will run the MIM RunProfiles script on the MIM Synchronisation Server
        • UPN: svc-mimscheduler@contoso.com
        • AD Domain to create in: Contoso.com (this can also be a local account on the MIM Synchronisation Server)
        • Permissions
          • non-privileged Active Directory service account
          • Grant “Log on as a batch job” user right on the MIM Synchronisation Server
          • Add to FIMSyncOperators security group on the MIM Synchronisation Server (this security group is created locally on the MIM Synchronisation Server after MIM Synchronisation Server has been installed)
  • SQL Server Permissions
    • MIM Synchronisation Server requires a Microsoft SQL Server to host its database. On the SQL Server, grant the SQL sysadmin role to the account you will be logged on as when installing MIM Synchronisation Server
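
As promised in the cleanup item above, here is a hedged sketch for finding candidate “inter forest” objects. Run it in each Exchange Forest’s Exchange Management Shell (shown for the Contoso forest, using the domain names from this blog’s scenario). It only lists candidates for review; it does not remove anything:

# Email domains belonging to the OTHER two Exchange Forests
$remoteDomains = @("northwind.com", "northwind.com.au",
                   "wingtiptoys.com", "wingtiptoys.com.au")

# Mail contacts whose external address points at one of the other forests
Get-MailContact -ResultSize Unlimited |
    Where-Object {
        $ext = $_.ExternalEmailAddress.ToString()
        ($remoteDomains | Where-Object { $ext -like "*@$_" })
    } |
    Select-Object Name, ExternalEmailAddress

# Mailboxes that forward somewhere - review any whose forwarding target
# resolves to an address in one of the other forests
Get-Mailbox -ResultSize Unlimited |
    Where-Object { $_.ForwardingAddress } |
    Select-Object Name, ForwardingAddress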

Configuration

Provision a Microsoft Windows Server 2012 R2 server in the Contoso.com Active Directory domain and install MIM 2016 Synchronisation Server on it. During installation, specify svc-mimsync@contoso.com as the account under which the MIM Synchronisation Service will run.

One thing to note is that GALSync will update the proxyAddresses attribute of all mailboxes in its scope (mailboxes for which it will be creating contacts in the other Exchange Forests) with X500 entries.
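
A quick, hedged way to inspect these X500 entries after GALSync has run (standard ActiveDirectory module cmdlets; shown for the Contoso domain):

Import-Module ActiveDirectory

# List users whose proxyAddresses contain GALSync-added X500 entries
Get-ADUser -Filter * -SearchBase "DC=contoso,DC=com" -Properties proxyAddresses |
    Where-Object { $_.proxyAddresses -match '^X500:' } |
    Select-Object Name, @{n='X500Entries';e={ ($_.proxyAddresses -match '^X500:') -join ';' }}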

Management Agent Configuration

  1. Once the MIM Synchronisation Server has been successfully installed, use the following steps to create the GALSync Management Agents. Open the Synchronisation Service Manager
    • Create GALSync Management Agent for Contoso.com AD Forest
      • From Tools menu, click Management Agents and then click Create
      •  In the Management Agent drop-down list, click Active Directory global address list (GAL) 
      • In the name type GALSyncMA for Contoso.com
      • On the Connect to an Active Directory Forest page, type the forest name, the MIM MA account details (svc-mimadma@contoso.com) and the domain name
      • In the next screen, specify the OUs that GALSync will query to find mailboxes to create contacts for in the other forests. Also, place a tick beside contoso.com\GALSync (this selects GALSync and all sub-OUs)
      • In the Containers screen, for
        • Target Container select Contoso.com\GALSync\RemoteForest\Contacts. This is the OU where MIM GALSync will create contacts corresponding to the mailboxes in the Northwind and WingTipToys Exchange Forests
        • Source Container select Contoso.com\GALSync\LocalForest\Contacts. This is where MIM GALSync will create contacts corresponding to Contoso.com mailboxes. These will be sent to the GALSync\RemoteForest\Contacts OU in the Northwind and WingTipToys AD Domains (personally, I haven’t seen any objects created in this OU)
      • In Exchange Configuration click Edit and enter all the email suffixes that belong to Contoso.com. The email suffixes listed here are used to filter which email addresses from the original email object are added to the corresponding contact in the other Exchange Forests. In this case the email suffixes will be @contoso.com and @contoso.com.au (note the @ before each email suffix)
      • Leave everything else as default and proceed to the Configure Extensions section. One thing I would like to mention here is that in the Configure Connection Filter section, the Filter Type for user is supposed to be Declared (and is the default setting), not Rules extension as stated in https://technet.microsoft.com/en-us/library/cc708642(v=ws.10).aspx
      • In the Configure Extensions section, specify the Exchange 2010 RPS URI (the remote PowerShell URL of the Contoso Exchange server, CEX01.contoso.com); GALSync uses this to mail enable the contacts it creates
      • Click OK
    • Create GALSync Management Agent for Northwind.com AD Forest
      • From Tools menu, click Management Agents and then click Create
      •  In the Management Agent drop-down list, click Active Directory global address list (GAL) 
      • In the name type GALSyncMA for Northwind.com
      • On the Connect to an Active Directory Forest page, type the forest name, the MIM MA account details (svc-mimadma@northwind.com) and the domain name
      • In the next screen, specify the OUs that GALSync will query to find mailboxes to create contacts for in the other forests. Also, place a tick beside northwind.com\GALSync (this selects GALSync and all sub-OUs)
      • In the Containers screen, for
        • Target Container select Northwind.com\GALSync\RemoteForest\Contacts. This is the OU where MIM GALSync will create contacts corresponding to the mailboxes in the Contoso and WingTipToys Exchange Forests
        • Source Container select Northwind.com\GALSync\LocalForest\Contacts. This is where MIM GALSync will create contacts corresponding to Northwind.com mailboxes. These will be sent to the GALSync\RemoteForest\Contacts OU in the Contoso and WingTipToys AD Domains (personally, I haven’t seen any objects created in this OU)
      • In Exchange Configuration click Edit and enter all the email suffixes that belong to Northwind.com. The email suffixes listed here are used to filter which email addresses from the original email object are added to the corresponding contact in the other Exchange Forests. In this case the email suffixes will be @northwind.com and @northwind.com.au (note the @ before each email suffix)
      • Leave everything else as default and proceed to the Configure Extensions section. One thing I would like to mention here is that in the Configure Connection Filter section, the Filter Type for user is supposed to be Declared (and is the default setting), not Rules extension as stated in https://technet.microsoft.com/en-us/library/cc708642(v=ws.10).aspx
      • In the Configure Extensions section, set the same options as for the Contoso management agent, using the Northwind Exchange server’s (NWEX01.northwind.com) RPS URL
      • Click OK
    • Create GALSync Management Agent for WingTipToys.com AD Forest
      • From Tools menu, click Management Agents and then click Create
      •  In the Management Agent drop-down list, click Active Directory global address list (GAL) 
      • In the name type GALSyncMA for WingTipToys.com
      • On the Connect to an Active Directory Forest page, type the forest name, the MIM MA account details (svc-mimadma@wingtiptoys.com) and the domain name
      • In the next screen, specify the OUs that GALSync will query to find mailboxes to create contacts for in the other forests. Also, place a tick beside wingtiptoys.com\GALSync (this selects GALSync and all sub-OUs)
      • In the Containers screen, for
        • Target Container select WingTipToys.com\GALSync\RemoteForest\Contacts. This is the OU where MIM GALSync will create contacts corresponding to the mailboxes in the Contoso and Northwind Exchange Forests
        • Source Container select WingTipToys.com\GALSync\LocalForest\Contacts. This is where MIM GALSync will create contacts corresponding to WingTipToys.com mailboxes. These will be sent to the GALSync\RemoteForest\Contacts OU in the Contoso and Northwind AD Domains (personally, I haven’t seen any objects created in this OU)
      • In Exchange Configuration click Edit and enter all the email suffixes that belong to WingTipToys.com. The email suffixes listed here are used to filter which email addresses from the original email object are added to the corresponding contact in the other Exchange Forests. In this case the email suffixes will be @wingtiptoys.com and @wingtiptoys.com.au (note the @ before each email suffix)
      • Leave everything else as default and proceed to the Configure Extensions section. One thing I would like to mention here is that in the Configure Connection Filter section, the Filter Type for user is supposed to be Declared (and is the default setting), not Rules extension as stated in https://technet.microsoft.com/en-us/library/cc708642(v=ws.10).aspx
      • In the Configure Extensions section, set the same options as for the Contoso management agent, using the WingTipToys Exchange server’s (WTTEX01.wingtiptoys.com) RPS URL
      • Click OK
  2. Enable provisioning by using the following steps
    • In the Synchronisation Service Manager, from Tools select Options
    • Under Metaverse Rules Extensions ensure the following have been ticked
      • Enable metaverse rules extensions
      • Enable Provisioning Rules Extension

Run Profiles Execution Order

Congratulations! All configuration has now been completed. All we have to do now is to run the synchronisation jobs to get the mailbox object information from the three AD Forests into the MIM metaverse, let MIM GALSync do a bit of processing to find out which contacts are to be created in the other Exchange Forests, and then carry out an export, to create those contacts in the other Exchange Forests. Unfortunately, MIM has no way of finding out if the exports were successful, and that is why we will have to do a confirming import on all the management agents, so that MIM can find out if everything had been exported as expected.

From my testing, I have found that when MIM GALSync does its processing, it compares the mailboxes that an Exchange Forest has with what is in the MIM metaverse. MIM then exports, as contacts, all objects that are in the metaverse but not in that particular Exchange Forest. These are created as AD objects in that AD Domain’s GALSync\RemoteForest\Contacts OU and subsequently mail enabled using the Exchange RPS URI (remote PowerShell URL).

CAUTION! Before you continue, you need to find out if a synchronisation solution had previously been deployed in the environment.

If any of the AD Forests had previously had a synchronisation solution deployed, then we will need to follow the run profile execution order mentioned below. This is done to ensure no duplicate contacts are created during the initial GAL synchronisation.

  1. Full Import (Staging Only) on GALSyncMA for Contoso.com
  2. Full Import (Staging Only) on GALSyncMA for Northwind.com
  3. Full Import (Staging Only) on GALSyncMA for WingTipToys.com
  4. Delta Synchronisation on GALSyncMA for Contoso.com
  5. Delta Synchronisation on GALSyncMA for Northwind.com
  6. Delta Synchronisation on GALSyncMA for WingTipToys.com
  7. Repeat Delta Synchronisation on GALSyncMA for Contoso.com
  8. Repeat Delta Synchronisation on GALSyncMA for Northwind.com
  9. Repeat Delta Synchronisation on GALSyncMA for WingTipToys.com
  10. Export on GALSyncMA for Contoso.com
  11. Export on GALSyncMA for Northwind.com
  12. Export on GALSyncMA for WingTipToys.com
  13. Delta Import on GALSyncMA for Contoso.com
  14. Delta Import on GALSyncMA for Northwind.com
  15. Delta Import on GALSyncMA for WingTipToys.com

 

If no synchronisation solution has previously been deployed in any of the AD Forests, then use the following runprofile order for the initial run

  1. Full Import (Staging Only) on GALSyncMA for Contoso.com
  2. Full Import (Staging Only) on GALSyncMA for Northwind.com
  3. Full Import (Staging Only) on GALSyncMA for WingTipToys.com
  4. Full Synchronisation on GALSyncMA for Contoso.com
  5. Full Synchronisation on GALSyncMA for Northwind.com
  6. Full Synchronisation on GALSyncMA for WingTipToys.com
  7. Export on GALSyncMA for Contoso.com
  8. Export on GALSyncMA for Northwind.com
  9. Export on GALSyncMA for WingTipToys.com
  10. Delta Import on GALSyncMA for Contoso.com
  11. Delta Import on GALSyncMA for Northwind.com
  12. Delta Import on GALSyncMA for WingTipToys.com

 

Once the initial synchronisation has completed, you will see contacts in each AD Domain’s GALSync\RemoteForest\Contacts OU corresponding to mailboxes in the other two Exchange Forests. These will have been email enabled and will show in the Exchange console and the online Global Address List.

Outlook clients that use offline address books won’t see the new contacts until the offline address book generation process has run on the Exchange servers and the updated offline address book has been downloaded by the Outlook client.

To ensure the GALSync-generated contacts remain up-to-date, the following runprofile execution order must be used from here on. This should be repeated every hour (or at an interval of your choice). Keep in mind that if anything is still pending an Export after one cycle of the following order, it will be run at the next runprofile execution, so changes might not be visible for up to two run cycle intervals.

  1. Delta Import (Staging Only) on GALSyncMA for Contoso.com
  2. Delta Import (Staging Only) on GALSyncMA for Northwind.com
  3. Delta Import (Staging Only) on GALSyncMA for WingTipToys.com
  4. Delta Synchronisation on GALSyncMA for Contoso.com
  5. Delta Synchronisation on GALSyncMA for Northwind.com
  6. Delta Synchronisation on GALSyncMA for WingTipToys.com
  7. Export on GALSyncMA for Contoso.com
  8. Export on GALSyncMA for Northwind.com
  9. Export on GALSyncMA for WingTipToys.com
  10. Delta Import on GALSyncMA for Contoso.com
  11. Delta Import on GALSyncMA for Northwind.com
  12. Delta Import on GALSyncMA for WingTipToys.com

I don’t imagine anyone would want to run the runprofiles manually every hour 😉 So below is a script that can be used to do it.

Export all the runprofiles using the Synchronisation Service Manager as vbs scripts and place them in a folder c:\scripts\runprofiles on the MIM Synchronisation Server.

Copy the below script and save it as GALSync_RunProfiles.cmd in c:\scripts

@echo off
REM This script will run the MIM RunProfiles in the correct order
REM Author nivleshc@yahoo.com

set _script_dir="c:\scripts\runprofiles\"

REM Delta Import (Stage Only)
echo ContosoGALSyncMA Delta Import -StageOnly
CSCRIPT //B %_script_dir%ContosoGALSyncMA_Delta_Import_StageOnly.vbs

echo NorthwindGALSyncMA Delta Import -StageOnly
CSCRIPT //B %_script_dir%NorthwindGALSyncMA_Delta_Import_StageOnly.vbs

echo WingTipToysGALSyncMA Delta Import -StageOnly
CSCRIPT //B %_script_dir%WingTipToysGALSyncMA_Delta_Import_StageOnly.vbs

REM Delta Sync
echo ContosoGALSyncMA Delta Sync
CSCRIPT //B %_script_dir%ContosoGALSyncMA_Delta_Sync.vbs

echo NorthwindGALSyncMA Delta Sync
CSCRIPT //B %_script_dir%NorthwindGALSyncMA_Delta_Sync.vbs

echo WingTipToysGALSyncMA Delta Sync
CSCRIPT //B %_script_dir%WingTipToysGALSyncMA_Delta_Sync.vbs

REM Export
echo ContosoGALSyncMA Export
CSCRIPT //B %_script_dir%ContosoGALSyncMA_Export.vbs

echo NorthwindGALSyncMA Export
CSCRIPT //B %_script_dir%NorthwindGALSyncMA_Export.vbs

echo WingTipToysGALSyncMA Export
CSCRIPT //B %_script_dir%WingTipToysGALSyncMA_Export.vbs

REM Delta Import
echo ContosoGALSyncMA Delta Import
CSCRIPT //B %_script_dir%ContosoGALSyncMA_Delta_Import.vbs

echo NorthwindGALSyncMA Delta Import
CSCRIPT //B %_script_dir%NorthwindGALSyncMA_Delta_Import.vbs

echo WingTipToysGALSyncMA Delta Import
CSCRIPT //B %_script_dir%WingTipToysGALSyncMA_Delta_Import.vbs

 

Create a scheduled task on the MIM Synchronisation Server to run GALSync_RunProfiles.cmd script every 1 hour (or for an interval of your choice). Use the task scheduler account that had been created during the preparation stage to run this scheduled task.
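
A hedged sketch of that scheduled task, using the ScheduledTasks module that ships with Windows Server 2012 R2 (the task name is my own; the account is the one created during preparation):

# Run GALSync_RunProfiles.cmd every hour, indefinitely
$action  = New-ScheduledTaskAction -Execute "c:\scripts\GALSync_RunProfiles.cmd"
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) `
               -RepetitionInterval (New-TimeSpan -Hours 1) `
               -RepetitionDuration ([TimeSpan]::MaxValue)

# You will be asked to supply the service account's password
Register-ScheduledTask -TaskName "MIM GALSync RunProfiles" `
    -Action $action -Trigger $trigger `
    -User "contoso\svc-mimscheduler" `
    -Password (Read-Host "svc-mimscheduler password")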

Some Gotchas

I have found that sometimes some mailboxes fail to be imported into the MIM Metaverse, reporting an mv-constraint-violation error on the msExchSafeSenderHash attribute. This error occurs because the AD attribute msExchSafeSenderHash can be much longer than the corresponding MIM Metaverse attribute allows. Since this attribute is not used to create the contacts in the other Exchange Forests, it can be dropped from the attribute flow.

Use the steps outlined in the following article to resolve this issue. https://social.technet.microsoft.com/wiki/contents/articles/10733.troubleshooting-galsync-mv-constraint-violation-msexchsafesenderhash.aspx

 

I hope this blog helps those that might be wanting to create a shared “global address book” among multiple Exchange Forests.

As mentioned previously, the above steps can also be used to create a shared “global address book” for two Exchange Forests. In that case, just remove any mention of the third AD Forest, AD Domain and Exchange Forest from the above steps.

Enjoy 😉

 

Moving an Office 365 domain to a new tenant

There are times when companies acquire other companies to increase their portfolio. When this happens, the acquired company’s IT infrastructure normally gets merged with the parent company. The opposite of this situation is also true. There are also times when a company sells off one of its acquired companies or divisions. In this scenario, the company that is being sold off must have all its IT infrastructure separated from the parent company.

In this blog I will show the steps that can be used to separate a previously acquired company’s email system from its parent company, after the child company has been sold off.

Background

To provide some context, here are some details for the parent company and the child company that is being sold off

  • The parent company is called Tail Spin Toys and owns the domain name tailspintoys.com
  • Tail Spin Toys have their own Active Directory Forest called tailspintoys.com and this is synchronised to Azure Active Directory using Azure Active Directory Connect (AADC) server.
  • Tail Spin Toys has a hybrid Exchange Online setup. Their Office 365 tenant is called tailspintoys and has the tenant address of tailspintoys.onmicrosoft.com
  • Tail Spin Toys users have their userprincipalname and primary email address in the form of firstname.lastname@tailspintoys.com
  • The previously acquired company (referred to as child company in this blog) is trading as Wing Tip Toys and owns the domain name wingtiptoys.com. This domain has been added to Tail Spin Toys Office 365 tenant as an additional domain.
  • Wing Tip Toys have their own Active Directory Forest called wingtiptoys.net and all their users, computers and servers are part of this Active Directory Forest.
  • Wing Tip Toys users have their mailboxes provisioned in Tail Spin Toys Office 365 tenant (the process will be explained in the next bullet point). Wing Tip Toys only use email services in Office 365.
  • Each Wing Tip Toys user has two identities. The first is in the wingtiptoys.net Active Directory Domain, which is used to login to their computers and to access their local resources (fileshares, printers etc).
  • The other identity is a user object created in Tail Spin Toys Active Directory Domain. This is used to provision a mailbox in Tail Spin Toys Office 365 tenant. Unfortunately, for Wing Tip Toys users, their Office 365 experience is not very elegant since their userprincipalname for Office 365 is of the form firstname.lastname@tailspintoys.com but their primary email address is of the form firstname.lastname@wingtiptoys.com. The users login to their computers using the wingtiptoys.net samaccountname. However, all users are aware of this and it doesn’t affect them too much.
  • To ensure users are not burdened with having to manage both the user object passwords, Password Change Notification System has been implemented. This replicates any password changes in the wingtiptoys.net Active Directory domain to the respective user object in tailspintoys.com Active Directory domain.

Requirement

Now that Wing Tip Toys has been sold off, all of its data must be separated from Tail Spin Toys. The following will be done for its emails.

  • A new Office 365 tenant will be provisioned for Wing Tip Toys
  • All mailbox data will be migrated to the new Office 365 tenant
  • A few weeks after all mailbox data has been migrated, the Wing Tip Toys mailboxes in the Tail Spin Toys Office 365 tenant will be disabled (or deleted)

Migration Plan

Ok, let’s get started.

  • Due to the small size of Wing Tip Toys, it has been decided to use a Cloud-Only Office 365 deployment (details on how to configure a Cloud-Only Office 365 deployment can be found here). However, a hybrid Exchange Online environment will also work with the following steps (a few extra steps will have to be added)
  • BitTitan MigrationWiz will be used to copy the mailbox data from Tail Spin Toys Office 365 tenant to Wing Tip Toys Office 365 tenant

Here are the steps

  1. Add an additional UPN suffix (wingtiptoys.com) to the Wing Tip Toys Active Directory Domain
  2. Configure all Wing Tip Toys users in the wingtiptoys.net domain to have a userprincipalname of the form firstname.lastname@wingtiptoys.com
  3. For all the Wing Tip Toys users in the Tail Spin Toys Office 365 tenant, export their email addresses (we are after the proxyaddresses). Ensure the X500 addresses are also exported.
  4. In the wingtiptoys.net Active Directory domain, for each user, find their email addresses in the export from step 3 and import them into the proxyaddresses attribute of their Active Directory user object. Import only email addresses for the domain wingtiptoys.com (if you import an email address whose domain hasn’t been added to the wingtiptoys Office 365 tenant, it will display as the default Office 365 tenant address [@wingtiptoys.onmicrosoft.com] instead of the actual email address). Also make sure the X500: addresses are imported, and check that the proxyaddress firstname.lastname@wingtiptoys.com is prefixed with an uppercase SMTP:, which makes it the primary smtp address. (A hedged PowerShell sketch of steps 1 to 4 follows this list.)
  5. Create a new Office 365 tenant for Wing Tip Toys. For simplicity, we will assume that the tenant name is wingtiptoys and the tenant address is wingtiptoys.onmicrosoft.com. Since the domain wingtiptoys.com is already attached to the Tail Spin Toys Office 365 tenant, we won’t be able to attach it to the new tenant just yet.
  6. Install Azure Active Directory Connect (AADC) Server and configure it to synchronise objects from the wingtiptoys.net Active Directory domain to the Wing Tip Toys Azure AD. For Single Sign On, we will enable AADC server for password hash synchronisation however Active Directory Federation Services (ADFS) servers can also be used.
  7. If required, configure the transport rules in Wing Tip Toys Office 365 tenant to match those in Tail Spin Toys Office 365 tenant. For example, to apply disclaimers to all outgoing emails.
  8. Export configuration of all Distribution groups in Tail Spin Toys Office 365 tenant that belong to Wing Tip Toys.
  9. Create a user in Tail Spin Toys Office 365 tenant with the upn migrationwiz@tailspintoys.onmicrosoft.com. For all Wing Tip Toys mailboxes in Tail Spin Toys Office 365 tenant, give migrationwiz@tailspintoys.onmicrosoft.com full mailbox access. This account will be used by Bittitan MigrationWiz.
  10. Create a user in Wing Tip Toys Office 365 tenant with the upn migrationwiz@wingtiptoys.onmicrosoft.com and give it Global Administrator access. This account will be used by Bittitan MigrationWiz.
  11. Provision Office 365 licenses (at least Exchange Online Plan 2) to all user objects in Wing Tip Toys Office 365 tenant that have been synchronised from the on-premise Active Directory domain. This will create mailboxes for them. At this stage, they will only have the email address firstname.lastname@wingtiptoys.onmicrosoft.com (this is because the domain wingtiptoys.com hasn’t been added to this tenant).
  12. Create an account at Bittitan and purchase licenses for MigrationWiz. One license is required per mailbox data migration, so purchase the required number of licenses (when you create an account, you are provided with 3 trial licenses; you can use these to test out the migration process).
  13. Use the steps listed at https://help.bittitan.com/hc/en-us/articles/115008106827-Office-365-to-Office-365-Migration-Guide-While-Keeping-the-Same-Domain-Name to configure Bittitan MigrationWiz. The source and destination mailboxes within MigrationWiz must be specified using the Office 365 tenant address, for example in our case, the source mailboxes will be firstname.lastname@tailspintoys.onmicrosoft.com and the destination mailboxes will be firstname.lastname@wingtiptoys.onmicrosoft.com
  14. Pre-Stage the mailbox data for the Wing Tip Toys mailboxes by using MigrationWiz to copy everything except the last 20 days of data from the Tail Spin Toys Office 365 tenant to the Wing Tip Toys Office 365 tenant mailboxes (the more data you pre-stage, the faster the cut-over will be).
  15. Add the wingtiptoys.com domain to the Wing Tip Toys Office 365 tenant. You will get a TXT record, which must be used to create a DNS entry in the wingtiptoys.com domain, to prove domain ownership. Create the DNS record and set the TTL to 300s (5min). You won’t be able to verify the domain in the Wing Tip Toys Office 365 tenant yet, as it is currently attached to the Tail Spin Toys Office 365 tenant.
  16. Change the TTL for the MX DNS entry for wingtiptoys.com to 300s (5min). Make a note of the existing MX entries.
  17. At this stage, all the preparation has been done and we are ready for the cut-over. I would suggest planning the cut-over out of office hours, preferably on a Friday evening. Send out user communications at least a week prior to cut-over so that everyone is aware of the change. In the communications, ensure you state that users will not be able to receive new emails in their mobile and Outlook clients until these have been reconfigured to use the new Office 365 tenant. Provide instructions on how users can reconfigure their Outlook and mobile clients, and also provide the Outlook Web App address (this would be similar to https://outlook.office.com/wingtiptoys.com/)
  18. Twenty minutes prior to cut-over time, change the MX for wingtiptoys.com to invalid.outlook.com. Setting the MX record to an invalid domain just ahead of the migration causes the sender’s email server to queue the emails and retry later. This will reduce the possibility of Office 365 sending a “recipient not found” Non Delivery Receipt if the email is received for a Wing Tip Toys user after the wingtiptoys.com domain has been removed from the Tail Spin Toys Office 365 tenant but hasn’t been added to the Wing Tip Toys Office 365 tenant.
  19. At cut-over time, remove the domain wingtiptoys.com from Tail Spin Toys Office 365 tenant. A message will be displayed stating that user logins and email addresses will need to be reconfigured. It will also state that the automatic domain removal process will change them to the default domain (tailspintoys.com) and that they will no longer receive emails at wingtiptoys.com. Click Remove. The time taken to remove the domain depends on the amount of mailboxes that have to be reconfigured to use the tenant email address.
  20. Add wingtiptoys.com to the Wing Tip Toys Office 365 tenant. If you receive an error stating that the domain is already attached to another Office 365 tenant, wait for a few minutes and then try again.
  21. Run a Full Synchronisation using the Azure Active Directory Connect server and wait for it to complete. Confirm that the userprincipalname and primary smtp address for the users in the Wing Tip Toys Office 365 tenant have now changed to the format firstname.lastname@wingtiptoys.com.
  22. Change the MX for wingtiptoys.com to the value shown in the Wing Tip Toys Office 365 Portal under Domain Settings.
  23. Run a full migration using MigrationWiz so that the remaining mailbox data is copied across.
  24. Create a DNS entry in the internal and external DNS server for autodiscover.wingtiptoys.com. The value for this is shown in the Wing Tip Toys Office 365 Portal, under the Domains section.
  25. Test the following scenarios
    • you can login using Outlook Web App to a mailbox in the Wing Tip Toys Office 365 tenant and access all emails.
    • you can configure Outlook client and a Mobile email client to access a mailbox in the new Wing Tip Toys Office 365 tenant using autodiscover (this depends on the autodiscover.wingtiptoys.com dns record. If there are errors, check to ensure this record has been correctly populated and is pointing to the value shown in the Office 365 portal)
    • emails from external senders are successfully received in a wingtiptoys mailbox in the Wing Tip Toys Office 365 tenant.
    • emails sent from a wingtiptoys user in the Wing Tip Toys Office 365 tenant to an external recipient are successfully received
    • emails sent between wingtiptoys users in the Wing Tip Toys Office 365 tenant are successfully received
  26. Import the Distribution groups that were exported from Tail Spin Toys Office 365 tenant
  27. The cut-over is now complete.
  28. By now, users should be able to access their emails using Outlook Web App. If they have followed the instructions, their Outlook and mobile clients should be working as well.
  29. The mailboxes belonging to the Wing Tip Toys users in the Tail Spin Toys Office 365 tenant can be kept active for at least 30 days (this will come in handy for those cases where a user reports that some emails were not copied across). After this, the mailboxes can be backed up and then deleted (If the Wing Tip Toys user mailboxes in the Tail Spin Toys Office 365 tenant had other email aliases, on which emails were being received, it would be a good idea to configure an Out of Office rule to state that all emails sent to this user should instead be sent to firstname.lastname@wingtiptoys.com)
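
As referenced in step 4 above, here is a hedged PowerShell sketch of steps 1 to 4 (the ActiveDirectory and Exchange cmdlets are real; the OU path, the csv file name and the assumption that the Exchange Alias matches the on-premises samaccountname are mine):

# Step 1 - add the additional UPN suffix to the wingtiptoys.net forest
Import-Module ActiveDirectory
Set-ADForest -Identity wingtiptoys.net -UPNSuffixes @{Add="wingtiptoys.com"}

# Step 2 - re-stamp each user's UPN as firstname.lastname@wingtiptoys.com
# (assumes UPNs are derived from GivenName and Surname)
Get-ADUser -Filter * -SearchBase "OU=Users,DC=wingtiptoys,DC=net" |
    ForEach-Object {
        $newUpn = ("{0}.{1}@wingtiptoys.com" -f $_.GivenName, $_.Surname).ToLower()
        Set-ADUser -Identity $_ -UserPrincipalName $newUpn
    }

# Step 3 - in the Tail Spin Toys tenant (Exchange Online session), export
# the proxyaddresses (including X500) of the Wing Tip Toys mailboxes
Get-Mailbox -ResultSize Unlimited |
    Where-Object { $_.EmailAddresses -like "*@wingtiptoys.com" } |
    Select-Object Alias, @{n='proxyAddresses';e={ $_.EmailAddresses -join ';' }} |
    Export-Csv .\wtt-addresses.csv -NoTypeInformation

# Step 4 - import only the @wingtiptoys.com and X500: entries into the
# on-premises AD user objects (assumes Alias matches samaccountname)
Import-Csv .\wtt-addresses.csv | ForEach-Object {
    $keep = $_.proxyAddresses -split ';' |
        Where-Object { $_ -like "*@wingtiptoys.com" -or $_ -like "X500:*" }
    Set-ADUser -Identity $_.Alias -Add @{proxyAddresses = [string[]]$keep}
}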

 

That’s it, folks. After following the above steps, you will have moved an additional Office 365 domain to a new Office 365 tenant, along with all of its mailbox data.

I hope the above helps those that are looking at getting this done.

Have a great day 😉