Automated ESXi Nested Lab Installation

In this article I will explain how to automate a nested ESXi lab build. Although everything could be done with a single PowerShell script, I'll demonstrate another way of doing it, using a combination of multiple tools. This assumes that you already have a vCenter Server installed and ready to use. I've broken the whole process down into multiple workflows:

Workflow 1 : Create the ESXi template.
Tool used : Packer
Workflow 2 : Create the required number of nested ESXi hosts.
Tool used : Terraform
Workflow 3 : Create the vSphere datacenter and clusters, add hosts to the clusters, create the vDS and port groups.
Tool used : Terraform
Workflow 4 : Deploy the NSX-T Manager and NSX-T Edges on the new infrastructure.
Tool used : Ansible

All workflows are available here : Nested-ESXi

Let's review each workflow in detail:

Workflow 1: Creating ESXi Template

Packer is a great tool for creating various guest templates (Ubuntu, CentOS, Windows, etc.). I've decided to leverage it to create a template of ESXi itself. The build simply performs an installation using a kickstart file, which you will need to place somewhere accessible to the host; I've chosen NFS for this purpose. We will need the following 3 files:

ks.cfg - The kickstart file that will be used during installation
esxi-vars.json - The variables file for Packer
esxi.json - The main file that we will run

For Packer to work, it expects an IP address to be available via DHCP, so we will need to make sure DHCP is available to hand an address to our template VM. This is temporary; we will modify it once we spin up the actual nested VMs.

network --bootproto=dhcp

The second thing to mention is the communicator, which is SSH in this case. We will need to enable SSH on the template VM:

# enable & start SSH
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh

Last but not least are a couple of settings that are required for a nested environment. The setting below ensures that whenever the physical NIC MAC address changes, the vmk0 MAC address changes with it.
The reference KB article is KB 1031111.

esxcli system settings advanced set -o /Net/FollowHardwareMac -i 1

We also need to clear the UUID value from our template, as this allows DHCP to work properly. Otherwise you may get the same IP address every time the VM gets cloned.

sed -ir '/uuid/d' /etc/vmware/esx.conf
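Putting those pieces together, a minimal ks.cfg could look roughly like the sketch below. The EULA, password and install lines are illustrative placeholders; the fragments discussed above are the parts that matter for this build.

vmaccepteula
# illustrative root password and install target
rootpw VMware1!VMware1!
install --firstdisk --overwritevmfs
network --bootproto=dhcp
reboot

%firstboot --interpreter=busybox
# enable & start SSH for the Packer communicator
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
# make vmk0 follow the physical NIC MAC (KB 1031111)
esxcli system settings advanced set -o /Net/FollowHardwareMac -i 1
# clear the UUID so cloned VMs get unique DHCP leases
sed -ir '/uuid/d' /etc/vmware/esx.conf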

The esxi-vars.json file is straightforward; modify it according to your environment. As we will be using NFS to store our kickstart file, adjust the following variable to point to the full NFS export path:

"nfs_server_path" : "192.168.156.11/volume1/vmware/ks.cfg"

The esxi.json file is the actual file that we run and doesn't require any modifications. The trick we are using is to pass key sequences during installation and indicate the location of our kickstart file. Simple enough:

"boot_command": [
        "<enter>",
        "<SHIFT+O>",
        " ks=nfs://{{user `nfs_server_path`}}",
        "<enter>" ,
        ""
      ]

Finally, start the Packer build by executing the following from the Workflow 1 directory:

packer build -var-file=esxi-vars.json esxi.json

Here is how it looks:

It took around 7 minutes to install ESXi and convert it to a template. The actual time will be longer for you, as I already had the image uploaded to the datastore and Packer had copied it to its cache.

This concludes Workflow 1. Let’s move to the next one.

Workflow 2: Create nested ESXi hosts

Once our template is available, we can start cloning it and customizing settings to our needs. For demonstration purposes we are going to add two extra disks per host to enable vSAN. In total we will create 5 hosts with 4 vmnics each. As mentioned, we will use Terraform for this. Let's get started by reviewing the files in this workflow:

variables.tf : The variables file where we can assign default values and provide descriptions. If no default value is provided, the variable has to be assigned a value in a different file, explained next.
terraform.tfvars : The variable definitions file, where we assign values to our variables.
main.tf : The main file that we will run.

The variable from the first file that requires attention is listed below:

variable "vm_names" {
default = {
  "vesxi101" = 101
  "vesxi102" = 102
  "vesxi103" = 103
  "vesxi104" = 104
  "vesxi105" = 105
  }
}

Here we need to provide the VM names for our nested ESXi hosts and indicate the last octet of each IP address. This is needed for our provisioner to set the correct static IP address. Note that we still need DHCP to assign a temporary address to our hosts; once that is done, the provisioner runs a post-install script to change network settings like hostname, DNS and NTP, and to set the static IP address on the vmk0 interface.

The variables in the terraform.tfvars file are straightforward except for the following:

guest_start_ip = "172.23.10."

Assign the first three octets of the IP address to this variable. The last octet comes from the vm_names variable in the previous file, so, for example, vesxi101 ends up with 172.23.10.101.

Our last and main file, main.tf, won't require any major modifications. Let's review the provisioner block:

provisioner "remote-exec" {
    inline = ["esxcli system hostname set -H=${each.key} -d=${var.guest_domain}",
    "esxcli network ip dns server add --server=${var.guest_dns}",
    "echo server ${var.guest_ntp} > /etc/ntp.conf && /etc/init.d/ntpd start",
    "esxcli network vswitch standard uplink add --uplink-name=vmnic1 --vswitch-name=vSwitch0",
    "esxcli network ip interface ipv4 set -i vmk0 -t static -g ${var.guest_gateway} -I ${var.guest_start_ip}${each.value} -N ${var.guest_netmask} ",
    ]
}

We assign the hostname and domain name using a for_each loop, and add the DNS and NTP servers as well. A second uplink, vmnic1, is assigned to vSwitch0 to add redundancy for management traffic. Finally, we set the static IP address on the vmk0 interface.
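For context, here is a hedged sketch of how the clone resource itself might be wired, including the two extra disks added for vSAN. The data source names, CPU/memory/disk sizes and network names are illustrative and will differ from the actual repository:

resource "vsphere_virtual_machine" "esxi" {
  # one clone per entry in the vm_names map
  for_each = var.vm_names

  name             = each.key
  resource_pool_id = data.vsphere_resource_pool.pool.id
  datastore_id     = data.vsphere_datastore.ds.id
  num_cpus         = 4
  memory           = 16384
  guest_id         = data.vsphere_virtual_machine.template.guest_id

  # expose hardware-assisted virtualization to the nested hypervisor
  nested_hv_enabled = true

  # repeated four times in total, one per vmnic
  network_interface {
    network_id = data.vsphere_network.mgmt.id
  }

  # boot disk cloned from the template
  disk {
    label = "disk0"
    size  = data.vsphere_virtual_machine.template.disks.0.size
  }

  # two extra disks for the vSAN cache and capacity tiers (sizes illustrative)
  disk {
    label       = "disk1"
    size        = 20
    unit_number = 1
  }
  disk {
    label       = "disk2"
    size        = 100
    unit_number = 2
  }

  clone {
    template_uuid = data.vsphere_virtual_machine.template.id
  }

  # ...plus the remote-exec provisioner shown above
}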

Let's run Terraform from the Workflow 2 directory and check the results. I have omitted the first two commands from the screenshots.

terraform init
terraform plan
terraform apply -auto-approve


Let’s verify that our hosts have been successfully created:

So we see 5 hosts, each with 4 NICs and 3 hard disks. Let's pick a random host and check all the network settings that we told the provisioner to change:

It took around 12 minutes to spin up 5 nested ESXi hosts and modify their network settings. My NAS is not the fastest, so times may be lower on other systems. This concludes Workflow 2. Let's move to the next one.

Workflow 3: Create new vSphere infrastructure

Once our hosts are ready, it is time to add them to the new vCenter. We will create the following:

  1. One datacenter.
  2. Two clusters, for compute and management.
  3. Add the hosts to their respective clusters.
  4. Create a Virtual Distributed Switch and add 2 pNICs from each host to it.
  5. Create new port groups.

Seems very straightforward. We will use Terraform again here. There are two caveats to mention:

  1. Some resources have to be created before they can be referenced by other resources.
  2. In order to add hosts to a cluster, we need to know each host's SHA-1 thumbprint somehow.

The first caveat can be resolved with the built-in "depends_on" meta-argument, where we explicitly tell Terraform which resources depend on each other.
For the second one, we will use a small Python script that logs in to the newly created hosts, extracts the thumbprint and returns the values back to Terraform. This can be done using the "external" provider.
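As an illustration only (the variable, cluster and credential names here are assumptions and may differ from the repository), the wiring between the helper script and the host resource could look roughly like this, assuming the script returns a JSON object with a thumbprint key:

# call the helper script once per host via the external data source
data "external" "thumbprint" {
  for_each = var.host_names_mgmt

  program = ["python3", "${path.module}/Esxi-connect.py"]
  query = {
    host = each.key
  }
}

# feed the collected thumbprint into the host resource
resource "vsphere_host" "h1" {
  for_each = var.host_names_mgmt

  hostname   = each.key
  username   = var.esxi_username
  password   = var.esxi_password
  cluster    = vsphere_compute_cluster.mgmt.id
  thumbprint = data.external.thumbprint[each.key].result["thumbprint"]
}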

The set of files is the same as in Workflow 2. One additional file, Esxi-connect.py, is used to collect the SHA-1 thumbprints and doesn't need any modifications.
Let's review the variables that require attention:

variable "all_hosts" {
  default =["vesxi101.home.lab","vesxi102.home.lab","vesxi103.home.lab","vesxi104.home.lab","vesxi105.home.lab"]
}

The all_hosts variable has to contain all hosts (FQDN or IP) that will be added to vCenter.

variable "host_names_mgmt" {
default = {
  "vesxi101.home.lab" = 1
  "vesxi102.home.lab" = 2
  }
}

The "host_names_mgmt" variable has to contain all hosts (FQDN or IP) that will be part of the Management cluster.

variable "host_names_comp" {
default = {
  "vesxi103.home.lab" = 3
  "vesxi104.home.lab" = 4
  "vesxi105.home.lab" = 5
  }
}

The "host_names_comp" variable similarly has to contain all hosts (FQDN or IP) that will be part of the Compute cluster.

variable "pg" {
  default = {
   "dvs-mgmt" = 10
   "dvs-vmotion" = 20
   "dvs-vsan" = 25
   "dvs-nsx-edge-uplink1" = 30
   "dvs-nsx-edge-uplink2" = 40
  }
}

The "pg" variable has to contain the names of the port groups along with their VLAN IDs.
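A map like this is typically consumed with a for_each on the port group resource. A minimal sketch follows; the resource name is illustrative, while the dvs reference matches the VDS resource shown further below:

resource "vsphere_distributed_port_group" "pg" {
  for_each = var.pg

  name                            = each.key
  vlan_id                         = each.value
  distributed_virtual_switch_uuid = vsphere_distributed_virtual_switch.dvs.id
}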

The main.tf file doesn't require major modifications; only adjust it if the number of hosts you add differs. The block that needs to be adjusted is within the VDS resource:

resource "vsphere_distributed_virtual_switch" "dvs" {
  name          = var.vds_name
  datacenter_id = vsphere_datacenter.target_dc.moid
  max_mtu = var.vds_mtu
    depends_on = [vsphere_host.h1,vsphere_host.h2]


  uplinks         = ["uplink1", "uplink2"]
  
  host {
    host_system_id = vsphere_host.h1["vesxi101.home.lab"].id
    devices        = var.network_interfaces
  }

  # ...one host block per host, followed by the closing brace of the resource
}

Change the host_system_id parameter to indicate the FQDN/IP of each new host, or remove a host block completely if you are adding fewer hosts.

So let's run Terraform from the Workflow 3 directory and review the results:

terraform init
terraform plan
terraform apply -auto-approve

Let’s check whether everything got created successfully:

One small thing to mention: because we used a kickstart file from NFS, after host installation we can see a datastore named "remote-install-location". Call it a bonus, as we don't need to mount the NFS datastore to the hosts ourselves.

So it took around 1 minute to finish this workflow. Let’s move to the last one.

Workflow 4: Create NSX-T Manager and NSX-T Edges

So far we have spent around 21 minutes on our first 3 workflows (add maybe 4-5 minutes for the image upload to the datastore, which was not counted). Now it is time to install the NSX-T Manager and NSX-T Edges, this time leveraging Ansible. Let's review the key files here.

ansible.cfg - Configuration file for system-level settings. We use it to point to our hosts file and disable fact gathering (see the snippet after this list).
hosts.yml - The inventory file, containing variables and attributes for the hosts to be worked on; in our case, the VMs to be deployed and their parameters.
deploy-nsx.yml - The main playbook that we will run. It indicates which groups/hosts we are working on and calls the Ansible roles.
roles/…/tasks/main.yaml - The main task file that is called for execution. It contains all the parameters necessary for the VM deployments.
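A minimal sketch of what that ansible.cfg might contain (the exact contents in the repository may differ):

[defaults]
# point Ansible at the inventory file in this workflow
inventory = hosts.yml
# only gather facts when a play explicitly asks for them
gathering = explicit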

Let’s review important aspects of the files.

managers:
  hosts:
    nsxmanager101.home.lab: 
      nsx_mgr_ip : '172.23.10.61' 
edges:
  hosts:
    nsxedge101.home.lab:
      nsx_edge_ip: '172.23.10.64' 
    nsxedge102.home.lab:
      nsx_edge_ip: '172.23.10.65'

We have two groups here: managers, with only one host, and edges, with two hosts. Each host has a unique variable that is assigned only to it.

 vars:
    vcenter: 'vc01.home.lab'
    vcenter_user: 'administrator@lab.home'
    vcenter_password: 'VMware1!VMware1!'
    vcenter_datacenter: 'DC-2'
    vcenter_folder: '/Ansible'
    vcenter_cluster: 'management'
    vcenter_datastore: 'remote-install-location'
    nsx_mgmt: 'dvs-mgmt'
    nsx_edge_fp : 'dvs-trunk'
    manager_ova: "/Users/nmammadov/Downloads/OVAISO/nsx-unified-appliance-3.0.0.0.0.15946739.ova"
    edge_ova: "/Users/nmammadov/Downloads/OVAISO/nsx-edge-3.0.0.0.0.15946012.ova"
    nsx_passwd_0 : "VMware1!VMware1!"
    nsx_cli_passwd_0 : "VMware1!VMware1!"
    nsx_cli_audit_passwd_0 : "VMware1!VMware1!"
    nsx_gateway_0 : "172.23.10.252"
    nsx_netmask_0 : "255.255.255.0"
    nsx_dns1_0 : "192.168.156.11"
    nsx_domain_0 : "home.lab"
    nsx_ntp_0 : "192.168.156.11"
    nsx_isSSHEnabled : "True"

The vars block holds variables that are shared between the group members, i.e. the NSX-T Manager and Edges. Note the datastore name "remote-install-location" that we point to here; if you recall, it was created automatically.

There are two roles, and each of them deploys the respective OVA and assigns the correct settings to our VMs.
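To show where the property injection sits, here is a hedged sketch of what the deployment task in the edge role might look like, assuming the vmware_deploy_ovf module; the option values and the OVA network label are illustrative, not the repository's exact task:

- name: Deploy NSX-T Edge OVA
  vmware_deploy_ovf:
    hostname: "{{ vcenter }}"
    username: "{{ vcenter_user }}"
    password: "{{ vcenter_password }}"
    validate_certs: no
    datacenter: "{{ vcenter_datacenter }}"
    cluster: "{{ vcenter_cluster }}"
    datastore: "{{ vcenter_datastore }}"
    folder: "{{ vcenter_folder }}"
    name: "{{ inventory_hostname }}"
    ova: "{{ edge_ova }}"
    networks:
      # OVA network label mapped to our management port group
      "Network 1": "{{ nsx_mgmt }}"
    power_on: yes
    inject_ovf_env: yes
    # properties answered as shown in the excerpt below
  delegate_to: localhost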

The most important part of the main.yaml file for each role is the following section, which answers all of the OVA property questions, such as the root/admin/audit passwords as well as the network settings.

inject_ovf_env: yes
properties:
    nsx_passwd_0 : '{{ nsx_passwd_0 }}'
    nsx_cli_passwd_0 : '{{ nsx_cli_passwd_0 }}'
    nsx_cli_audit_passwd_0 : '{{ nsx_cli_audit_passwd_0 }}'
    nsx_hostname : '{{ inventory_hostname }}'
    nsx_gateway_0 : '{{ nsx_gateway_0 }}'
    nsx_ip_0 : '{{ nsx_edge_ip }}'
    nsx_netmask_0 : '{{ nsx_netmask_0 }}'
    nsx_dns1_0 : '{{ nsx_dns1_0 }}'
    nsx_domain_0 : '{{ nsx_domain_0 }}'
    nsx_ntp_0 : '{{ nsx_ntp_0 }}'
    nsx_isSSHEnabled : '{{ nsx_isSSHEnabled }}'

All we need to do is run the playbook from the Workflow 4 directory to start the deployment process:

ansible-playbook deploy-nsx.yml

It will take some time, as the overall size of the OVAs is more than 12 GB.

Let’s verify that everything got deployed:

It took me around 1 hour 40 minutes to deploy those 3 VMs. Again, I don't have the fastest NAS, so your actual time should be lower. Overall, adding up all 4 workflows, it comes to around 2 hours for the entire infrastructure deployment and the NSX-T rollout with Edges.

Cheers :)