Use HashiCorp’s Packer to Create a Custom VM Image for a Cloud Provider

Previously on [a show that assumes you have no memory] …

… you built a Kafka VM image locally, entirely reproducible on demand. That’s nice, I really am pleased for you. But it’s akin to having a Betamax camcorder in 1988. Sure, you can make that cool homemade Spiderman movie you always wanted to, but distributing the gem amongst your friends is not without its challenges!

It would be far more useful to create the image in the format that your favourite cloud provider requires; then you can reuse that very same image across your various environments, as an atomic building block throughout your deployment pipeline. At this point I’ve strayed a little from the anecdote!

We’ll create a pre-configured Kafka image using HashiCorp’s Packer aimed at a specific cloud platform.

Monetary Cost To Build VM Image

In this article I’ll be using Google Cloud Platform as the target platform for our custom VM image – I happen to use GCP for some personal projects, hence my choice for this article. You can of course follow along using another cloud provider; just read and apply the specific docs for that particular Packer Builder.

Important: Following through this article will cost ( a little ) money

Either way, I would like to highlight that unless you are on a free trial, you are likely to be charged a small amount for the process of building the image, since it requires creating and spinning up a VM to configure, plus the storage costs of the image. It shouldn’t typically cost more than a few dollars, but I would check your provider’s pricing in the first instance.

We will delete our images at the end to prevent any ongoing cost.

From now on I’ll just assume you’re also using GCP to save writing space.

Tools and Environment

Some Terminology

The focus of this article is HashiCorp’s Packer, which will build the custom VM image for us – and specifically for GCP in this case. Packer essentially provides a consistent interface at the high level ( i.e. packer commands for scripts to hit ) regardless of which providers and builders you are using. However, rather than end up with a leaky abstraction, the config files deliberately expose the details of each implementation choice – so for example if you are targeting an Amazon instance then you must specify an ami_name, which obviously makes no sense for OpenStack, DigitalOcean etc.

There is a small amount of terminology to learn, the main ones for us being the following :-

  • Templates – are JSON files which define one or more builds by configuring the various components of Packer. Packer is able to read a template and use that information to create multiple machine images in parallel.
  • Provisioners – components of Packer that install and configure software within a running machine, prior to that machine being turned into a static image. Think Chef, Puppet, Salt etc. – but in our case, Ansible.
  • Builders – components of Packer that are able to create a machine image for a single platform, i.e. specific to the target (cloud) environment. There are builders for many platforms, including AWS EC2 (AMI), Azure Resource Manager, OpenStack etc. In our case we will be targeting Google Compute Engine.

Steps to setup the Environment

Accompanying Code

We are actually going to write very little “code” in this article, as most of the hard work was done in the last post – in the main with Ansible, and even then we reused two roles. In terms of the code on GitHub, I’ve copied last week’s project into a new repository, violating DRY but keeping each article’s code in line with its subject.


Lets Go!

Assuming you have the same code (or have downloaded it from the repository above), cd to the top-level directory of the project:

cd lastArticleCodeBase

Create Packer Template File for Our Kafka Image

Create a file in your top level directory called kafka-instance-template.json

So if you do an ls you should have …

$ ls
kafka-instance-template.json  provisioning  Vagrantfile

Google Cloud Authentication

In order to authenticate with Google Cloud you can read the documentation. My preference is to use the SDK locally …

gcloud config set project brownian-motion-driven-dev

You also will need to enable the Google Compute Engine API. If you have called the project brownian-motion-driven-dev then this link should point you there, but otherwise go to the API Dashboard for your project and enable it.
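For completeness, the SDK route sketched above amounts to something like the following (the project name is mine – substitute your own). Packer’s googlecompute builder can pick up Application Default Credentials, which `gcloud auth application-default login` provides:

```shell
# Log in interactively and point the SDK at the project
gcloud auth login
gcloud config set project brownian-motion-driven-dev

# Provide Application Default Credentials, which Packer's
# googlecompute builder will pick up automatically
gcloud auth application-default login
```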

Configure The Builder

Minimal Config

Looking at the template documentation, the only thing we need to specify is the builder – and within that we also need to specify a type of builder. Looking at the Google Compute builder documentation we see that our type is googlecompute.

The minimal content for our kafka-instance-template.json to pass packer inspect is

{
  "builders" : [{
      "type" : "googlecompute"
  }]
}

And if you do packer inspect kafka-instance-template.json it will pass. This configuration we’ve created is clearly not useful in the slightest but we’ve made a start.
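If you don’t have packer on your PATH yet, you can at least sanity-check that the template is well-formed JSON using Python’s built-in json.tool – a much weaker check than packer inspect, but a quick one:

```shell
# Recreate the minimal template ...
cat > kafka-instance-template.json <<'EOF'
{
  "builders": [{
    "type": "googlecompute"
  }]
}
EOF

# ... and check it parses: pretty-prints the file if valid,
# exits non-zero if the JSON is malformed
python3 -m json.tool kafka-instance-template.json
```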

Note: The template takes an array of builders, meaning that you can target multiple cloud images, which are built in parallel. Powerful stuff – a heterogeneous cloud infrastructure can be one way of protecting against any single cloud provider’s issues and downtime.

Working Config

For each builder there are required and optional attributes. For GCE we can see what the required attributes are, and as noted in the gotchas it is worth also specifying an ssh_username.

So as a very basic template we might have something like …

{
  "builders" : [{
      "type" : "googlecompute",
      "project_id" : "brownian-motion-driven-dev",
      "source_image_family" : "ubuntu-1604-lts",
      "zone" : "europe-west1-b",
      "ssh_username" : "ubuntu"
  }]
}

  • source_image_family value is selected from the available image families.
  • zone is basically whichever is most appropriate in your circumstances – see zones. Images you create are available across zones and regions, but since this is a one-region, one-zone demo, selecting your “closest” is likely your best choice.

Create A Basic Image on Google Compute Engine

With the configuration above this should now “work”. Before continuing, understand that the following command is where the money meter starts ticking – i.e. there will be a ( small ) monetary cost. If you run …

packer build kafka-instance-template.json

…then you should see something like …


So what has just happened here? Packer connected to Google Compute Engine and specifically your project brownian-motion-driven-dev. It basically,

  • spun up a VM based on our Ubuntu LTS version
  • waited until it was running and had an IP address
  • <did nothing> – it is at this point any provisioning config takes place. In this instance we didn’t lift a finger but later we’ll set Ansible loose.
  • stopped it
  • created an image from the disk – which we didn’t apply any changes to from the vanilla Ubuntu base, so rather pointless in this case – and stored it in our region under the name packer-1488380065 – catchy!

If you list the images available to your project you’ll see it there,

gcloud compute images list

which should spit out something like …


There’s ours at the top!

At the moment the family name is blank, and the image name itself is somewhat unmemorable to carbon-based life. Let’s remedy that. Change kafka-instance-template.json to look like …

{
  "builders" : [{
      "type" : "googlecompute",
      "project_id" : "brownian-motion-driven-dev",
      "source_image_family" : "ubuntu-1604-lts",
      "zone" : "europe-west1-b",
      "ssh_username" : "ubuntu",
      "image_name" : "kafka-{{timestamp}}",
      "image_family" : "brownian-motion-driven-dev-images",
      "image_description" : "Image is Kafka VM preconfigured for Brownian Motion Driven Development",
      "instance_name" : "kafka-{{uuid}}",
      "machine_type" : "n1-standard-1",
      "disk_type" : "pd-ssd"
  }]
}

The Google Compute builder documentation explains the options in detail. Now if you rebuild, you’ll have sensible names when you list the images out.
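As an aside, {{timestamp}} resolves to the Unix epoch time (in seconds, UTC) at the start of the build, so you can preview the style of image name you’ll get with plain date:

```shell
# "kafka-{{timestamp}}" produces names like kafka-1488380065 -
# the same as prefixing the current epoch seconds yourself
echo "kafka-$(date +%s)"
```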

Specify Provisioner to configure Apache Kafka

So far so good, but we have basically just renamed the Ubuntu image … nothing else is different. The whole point of creating your own image is to have pre-configured infrastructure atoms for your deployments.

Last time we created the VirtualBox VM with Vagrant, which in turn used Ansible. The ability to reuse that work here is a true source of happiness 🙂 – point Packer at the exact same Ansible playbook and it will build the image we created locally, except this time it will be stored on Google Compute Engine.

Once again we’ll be changing kafka-instance-template.json. In this case we want to add a provisioner. We simply specify Ansible and point it to the playbook to run. Simple as! The file should then look like …

{
  "provisioners" : [{
      "type" : "ansible",
      "playbook_file" : "./provisioning/playbook.yml"
  }],

  "builders" : [{
      "type" : "googlecompute",
      "project_id" : "brownian-motion-driven-dev",
      "source_image_family" : "ubuntu-1604-lts",
      "zone" : "europe-west1-b",
      "ssh_username" : "ubuntu",
      "image_name" : "kafka-{{timestamp}}",
      "image_family" : "brownian-motion-driven-dev-images",
      "image_description" : "Image is Kafka VM preconfigured for Brownian Motion Driven Development",
      "instance_name" : "kafka-{{uuid}}",
      "machine_type" : "n1-standard-1",
      "disk_type" : "pd-ssd"
  }]
}

But before you run packer again …

… Update Ansible Java Task to install OpenJDK 8

Before running packer again, note that I hit an issue with the Ansible task that installs Java: it can’t find the Java 7 package. This makes sense, Java 7 being end-of-life; at Ubuntu Packages, if we search for openjdk-7-jre-headless we note that Xenial ( AKA Ubuntu 16.04 LTS ) is not listed – “Trusty” is the latest version of Ubuntu that carries it. Searching for openjdk-8-jre-headless does list Xenial.

So let’s update the JRE version. Assuming you installed the Ansible roles per the last article, then …

sudo vi /etc/ansible/roles/ansible-kafka/tasks/java.yml

… and update the “Install OpenJDK” task to install JRE 8, i.e.

- name: "Install OpenJDK"
  apt: name=openjdk-8-jre-headless state=present
  sudo: yes
  when: check_java.rc != 0
  tags:
    - kafka-install
    - java

… then we should be in business.

Create Fully Configured Kafka VM For Google Cloud

Now run,

packer build kafka-instance-template.json

The output should essentially be a mix of the earlier vanilla build and the Vagrant build of the last post … i.e. you’ll see the Ansible tasks being performed.

Start the Custom Instance

Now we want to start the custom instance.

For us that means doing …

gcloud compute instances create mykafka --image-family brownian-motion-driven-dev-images --image-project brownian-motion-driven-dev

and, after a minute or two, you should see something like this …


Nice one!

Let’s Test Our Little Beastie!

Firstly connect.

$ gcloud compute ssh mykafka

Then, same as last time …

$ cd /usr/local/kafka

Let’s see if our two topics were created …

$ bin/kafka-topics.sh --list --zookeeper localhost:2181

Let’s send a couple of messages to the test-one topic …

$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-one
Howdy doodly doo! Talkie's the name, toasting's the game.
Would anyone like any toast?

And finally let’s check that the messages were sent okay to that topic …

$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-one --from-beginning
Howdy doodly doo! Talkie's the name, toasting's the game.
Would anyone like any toast?

^CProcessed a total of 2 messages



In order to save dollars, let’s delete everything that we have created.

Delete All Created Instances

On your local machine – i.e. remember to log off your VM 😉

$ gcloud compute instances list
$ gcloud compute instances delete mykafka

Note this can take some time – a few minutes at most.

Delete All Created Images

$ gcloud compute images list | grep brownian-motion-driven-dev | awk '{print $1}'
$ gcloud compute images delete `!!`
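To see what that grep/awk pipeline extracts without touching your project, here it is run against a captured sample of gcloud compute images list output (the names are illustrative):

```shell
# Sample output in the shape `gcloud compute images list` produces;
# grep keeps only our project's rows, awk pulls out the image name
sample='NAME              PROJECT                     FAMILY
kafka-1488380065  brownian-motion-driven-dev  brownian-motion-driven-dev-images'

echo "$sample" | grep brownian-motion-driven-dev | awk '{print $1}'
# → kafka-1488380065
```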

Quick Summary

We’ve got to a pretty decent spot today. If we use this code in our CD pipeline then we have a reusable Kafka image for each environment. Ideally, living the true IaC ethos, you would have tests that run any time the config above changes – covering security, functionality ( e.g. topics ), performance, networking etc. – i.e. like any other code. This should happen before the image even reaches the dev environment.

That is left for a homework exercise 😉

What’s Up Ahead?

We are rising through abstraction levels.

The next step in building our dev environment out is to plan our (very small-sized) infrastructure. It will be a very simple setup but should illustrate what we can do with the right tools.

  • First we’ll create a DNS Subzone – dev.[MY_DOMAIN] – specifically for our trusty Kafka Server so we can connect to it with a nice DNS name rather than chasing IP addresses or service lookups. We’ll be assuming you already have a domain you own for this one.
  • Secondly we’ll create a dev environment with one Kafka server, [MY_DOMAIN], and a 3-node Kubernetes cluster – empty of services for now.

And a little bit further down the road we’ll have a couple of micro-services running in the environment pinging messages over Kafka topics – i.e. getting real!
