How to Auto Deploy stateless nested ESXi hosts with NICs on trunked VLAN port groups

When using Auto Deploy to PXE boot a stateless ESXi server over a NIC that is connected to a trunked switch port (in my case trunking VLANs 0-4094), the PXE request goes out untagged, so it ends up on the native VLAN and is answered by the DHCP server on your native VLAN (in most cases VLAN 1).
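To make the behaviour concrete, here is a toy Python model of how a trunk port classifies incoming frames. It is only a sketch: the function name `effective_vlan` and the native VLAN default of 1 are my own assumptions for illustration, not anything taken from a real switch or from vSphere.

```python
from typing import Optional

NATIVE_VLAN = 1  # assumption: the common default native VLAN


def effective_vlan(frame_tag: Optional[int], native_vlan: int = NATIVE_VLAN) -> int:
    """Return the VLAN a frame is placed in when it arrives on a trunk port.

    frame_tag is the 802.1Q VLAN ID carried by the frame, or None if the
    frame is untagged (which is what a PXE ROM sends unless told otherwise).
    """
    if frame_tag is None or frame_tag == 0:
        # Untagged (or priority-tagged, VLAN ID 0) frames fall into the native VLAN.
        return native_vlan
    if not 1 <= frame_tag <= 4094:
        raise ValueError(f"invalid VLAN ID: {frame_tag}")
    return frame_tag


# The PXE request from the nested host carries no tag at all...
pxe_request_vlan = effective_vlan(None)
# ...whereas traffic explicitly tagged for the management VLAN stays there.
mgmt_traffic_vlan = effective_vlan(50)
```

In this model the PXE request is classified into VLAN 1 rather than VLAN 50, which is why the DHCP server on the native VLAN is the one that answers.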

This is almost certainly undesirable for your environment. In a physical environment you have the ability to configure the BIOS (on particular servers) to PXE boot on a specified VLAN; unfortunately that isn’t an option with a nested ESXi host, as the VM’s BIOS doesn’t have this ability.

Stateless nested host NIC configuration

Each of my nested hosts is set up like this:

 

You can see that vmnic2 and vmnic3 are both directly connected to my IP storage over 10GbE; vmnic0/vmnic1 are the adapters that we would want to attempt to PXE boot from for a nested stateless host.

As mentioned above, to PXE boot off either vmnic0 or vmnic1 as things stand, you would need a DHCP server on your native VLAN configured for PXE, and you would need to be able to route traffic from there to the VLAN that your TFTP and vCenter servers are running on. Not desirable.

The workaround

What I did to get around this was add another NIC to my nested host VMs (only for the stateless hosts; stateful ones don’t require it). This fifth NIC resides on the same VLAN that I have configured for my management vmkernel adapters (VLAN 50, 192.168.5.0/24 for reference), so the PXE request reaches the DHCP server on that VLAN instead of the one on the native VLAN. Notice the extra NIC (vmnic4) below:

 

Next, we need to configure the BIOS to boot from our newly created NIC as its first option. (NOTE: when using UEFI, you can explicitly disable all other boot options; this isn’t possible using BIOS, hence the ordering change.)

Once we have configured the boot priority, it’s time to power on the VM that you intend to use as your stateless ESXi server.

Once booted, the host will show in vCenter as being in Maintenance Mode. That is because we still need to configure the host customisation options against the host profile for the deploy rule this server booted from. So go ahead, remediate the host and specify the relevant customisation options.

Where has the host gone?!?

Your host has almost certainly just lost connectivity with vCenter; this would have happened after applying the host profile with customisations. I spent a good while scratching my head as to why this happened. Here’s what I observed:

  • Nested host boots on our newly created NIC (on the same port group that we intend to have the management vmkernel adapter on)
  • Nested host registers in vCenter and shows as being in Maintenance Mode, because host customisation still needs to be configured
  • Apply host profile with customisations, after some time this operation will fail because vCenter can no longer communicate with the host
  • Check the nested host via Console, the management IP is now different to what it booted with

What has happened is that the host has brought up the management vmkernel adapter on the same port group that we PXE booted on, but because that adapter has a different MAC address to the NIC that we PXE booted from, it gets its own lease from the DHCP server. vCenter registered this host on the IP/hostname that it first presented with after PXE booting, but the management vmkernel adapter is now listening on a different address. Quite the dilemma…
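If it helps to see why the address changes, here is a small Python sketch of a DHCP server that hands out leases keyed on MAC address. Everything in it is an assumption for illustration: `ToyDhcpServer` is a made-up class, the pool starting at .100 is arbitrary, and both MAC addresses are hypothetical rather than taken from my lab.

```python
import ipaddress


class ToyDhcpServer:
    """Very simplified DHCP server: one lease per MAC address."""

    def __init__(self, network: str, first_host: int = 100):
        hosts = list(ipaddress.ip_network(network).hosts())
        # Hand out addresses starting at .<first_host> (arbitrary choice).
        self._pool = iter(hosts[first_host - 1:])
        self._leases = {}

    def request(self, mac: str) -> ipaddress.IPv4Address:
        mac = mac.lower()
        if mac not in self._leases:
            # Unknown MAC: allocate the next free address.
            self._leases[mac] = next(self._pool)
        # Known MAC: the existing lease is returned unchanged.
        return self._leases[mac]


dhcp = ToyDhcpServer("192.168.5.0/24")

pxe_nic_mac = "00:50:56:aa:bb:04"  # hypothetical MAC of the NIC we PXE boot from
vmk_mac = "00:50:56:6f:00:01"      # hypothetical MAC of the management vmkernel adapter

boot_ip = dhcp.request(pxe_nic_mac)    # address vCenter registers the host with
mgmt_ip = dhcp.request(vmk_mac)        # address the vmkernel adapter ends up on
repeat_ip = dhcp.request(pxe_nic_mac)  # same MAC asking again

print(boot_ip, mgmt_ip, repeat_ip)
```

The two different MAC addresses get two different leases, while a repeat request from the original MAC gets the original address back. That is the whole problem in miniature: vCenter knows the host by the first address, and the management interface is answering on the second.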

The final tweak

The way around this is to do the following:

  • Edit the host profile applied to your Auto Deploy rule set
  • Under:
    • Networking configuration
      • Host virtual NIC
        • Port Group that your management vmkernel adapter will sit on, i.e. in my case “1GbE-Physical-10GbE-Virtual-vDS : 192.168.5.0-24_ESXi-Mgmt”
          • Determine how MAC address for vmknic should be decided
            • Use the MAC address from which the system was PXE booted

This ensures that the management vmkernel adapter uses the same MAC address as the NIC that the host was PXE booted from, so it doesn’t get a fresh DHCP lease, and we don’t lose connectivity with vCenter after our stateless host is booted and has the host profile applied.
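If you want to sanity check the result after the host is up, you can compare the MAC address of the PXE boot NIC with the one reported for the management vmkernel adapter. The helper below is just a sketch of that comparison in Python; the function names and the sample MAC addresses are hypothetical, and how you collect the two values from your host is up to you.

```python
import re


def normalise_mac(mac: str) -> str:
    """Return a MAC address as lower-case, colon-separated hex pairs."""
    # Drop the usual separators (colon, dash, dot) before validating.
    digits = re.sub(r"[:.\-]", "", mac.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def management_mac_matches(pxe_nic_mac: str, vmk_mac: str) -> bool:
    """True if the vmkernel adapter is using the PXE boot NIC's MAC address."""
    return normalise_mac(pxe_nic_mac) == normalise_mac(vmk_mac)


# Hypothetical values, written in two different notations on purpose.
ok = management_mac_matches("00:50:56:AA:BB:04", "00-50-56-aa-bb-04")
not_ok = management_mac_matches("00:50:56:aa:bb:04", "00:50:56:6f:00:01")
```

Normalising first avoids a false mismatch when one tool prints the address in upper case or with dashes and another uses lower case with colons.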

After ensuring that this is set in the Host Profile, you should be right to boot the nested stateless host without any issues.

For reference, this was tested in my lab against vCenter Server Appliance 6.5.0d and ESXi 6.5.0d.

Comments

  1. Ian
    December 1, 2017 / 6:51 pm

    Holy cow, this is exactly what I am playing with in my home lab and scratching my head over. Thanks for the awesome save!

    • admin
      December 1, 2017 / 11:52 pm

      You’re welcome, I was scratching my head over it for some time too! Glad I could help.
