TKG with Multiple vCenters and VMC

Just to set the tone: from the TKG side, this post is totally unsupported by VMware. In other words, don't call GSS if you decide to do this. : )

With that out of the way, let’s dive in. We have a customer who is looking to get out of the datacenter business and wants to utilize VMC as much as they can in place of their physical data centers. 

Prototyping some ideas, we decided to see if we could first deploy a vCenter Server into VMC to manage standalone hosts at remote locations (cell sites). We also wanted to host the TKG Management Cluster on VMC and run only the worker nodes of a workload cluster on the standalone hosts, which meant splitting the control plane and worker nodes across vCenter Servers.

So how did it go? Surprisingly well. Over 2 days, with help from Timmy Carr and using William Lam as my sounding board, I was able to get all but one of the above working.

The first task was getting a vCenter Server running in VMC to manage the remote ESXi hosts. On my first try at installing the vCenter Server into VMC, the UI installer greeted me with a lovely error message.

A quick search led me to a post where William had hit the same issue and was able to get around it by using the CLI installer, so I went that route with the same success. Side note: I reached out internally, so hopefully we'll get this resolved so the UI works as well.
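For reference, the CLI route uses the vcsa-deploy utility from the vcsa-cli-installer directory of the VCSA installer ISO, driven by a JSON template that describes the target and the appliance settings. A minimal sketch; the template path and filename here are illustrative:

# Run from the vcsa-cli-installer directory of the mounted VCSA ISO.
# The JSON template carries the VMC target, appliance size, and networking.
./vcsa-deploy install --accept-eula --acknowledge-ceip /tmp/embedded_vCSA_on_VC.json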

So the first step was done: vCenter Server running on VMC. Next I needed some hosts for it to manage. The easiest option for me was to create a site-to-site VPN between my home lab and VMC, and I decided on the route-based approach since I already run BGP in my lab. As luck would have it, Eric Shanks posted a blog on how he set this up the same day I was going to do it, so I had little to actually figure out on my own.

One thing I did have to do that Eric didn't cover was create a local prefix policy: I was advertising 0.0.0.0/0 to VMC, which forced all traffic from VMC to traverse the VPN and go out my Internet connection, not what I wanted. A quick prefix list of only the subnets I wanted to advertise and a BGP restart on my end cleared that up.
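If you want a concrete picture of that filter, here's a minimal sketch assuming the lab router runs FRR; the subnets, ASN, and neighbor address are all made up for illustration:

# Advertise only the lab subnets to VMC instead of 0.0.0.0/0.
vtysh -c 'configure terminal' \
      -c 'ip prefix-list VMC-OUT seq 10 permit 172.16.10.0/24' \
      -c 'ip prefix-list VMC-OUT seq 20 permit 172.16.20.0/24' \
      -c 'router bgp 65010' \
      -c 'address-family ipv4 unicast' \
      -c 'neighbor 10.73.0.1 prefix-list VMC-OUT out'

# Push the updated advertisements to the peer.
vtysh -c 'clear ip bgp 10.73.0.1 soft out'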

I stood up a few nested ESXi hosts and added them to the inventory over on the hosted vCenter Server running in VMC.

Next up was getting a TKG Management Cluster running on VMC. I was deploying TKG 1.3.1 and honestly thought this was going to be a piece of cake. After a few hours of the control plane nodes going into high-CPU loops, not joining the cluster, and multiple retries, I decided to reach out to William to see if he had ever seen this behavior, and of course he had. He provided the fix, which was also in the release notes (doh!).

Once that was sorted, the TKG Management Cluster came up with no issues.
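If you're following along, a quick sanity check of the management cluster and its providers before moving on:

❯ tanzu management-cluster get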

The next item on the list was where the real challenge started. At this point we had our TKG Management Cluster running on VMC and a second vCenter Server, hosted on VMC, managing the remote ESXi hosts.

Simply trying to use tanzu cluster create commands failed for a few different reasons. First, the management cluster has no idea about the other vCenter Server (Cloud Instance), even when you give it the info in the yaml, and we're also limited to one set of credentials (a service account) for TKG. I had deployed with the VMC default cloudadmin@vmc.local account, which obviously doesn't exist on the other vCenter Server. So at first I thought we were done; it was about midnight and my brain just wasn't working.

Timmy had the great idea of using an external identity source. Since I already had the VPN up between my lab and VMC, I added my AD domain as an identity source to both VMC and the other vCenter Server. I used the default cloudadmin role in VMC and Administrator "on-prem". Then I had to delete the existing TKG management cluster and re-create it using an account from AD. This was painless and just worked as expected.
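Concretely, the only change on the TKG side was that the vSphere credential variables in the cluster config file now pointed at the AD account instead of cloudadmin. A minimal sketch with hypothetical values (the variable names are the standard TKG 1.3 ones):

VSPHERE_SERVER: vcenter.sddc-a-1-2-3.vmwarevmc.com  # VMC vCenter
VSPHERE_USERNAME: tkg-svc@corp.example.com          # AD account known to both vCenters
VSPHERE_PASSWORD: "<password>"                      # placeholder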

Now I thought I had this in the bag, so I issued a tanzu cluster create command with the values of our on-prem vCenter Server and watched as it failed miserably. TKG was still looking in its default cloud (the VMC vCenter Server) for the objects (datacenter, folder, datastore, template, etc.) I specified in the yaml; it was flat-out ignoring the vCenter Server information I was giving it. Timmy suggested an interesting workaround, and it worked!

What we had to do was take our management cluster yaml, copy it somewhere, edit the VIP in it, and use it in a dry run to generate a full manifest yaml:

tanzu cluster create compute01 -f compute-vmc.yaml -d > onpremcompute.yaml

This gave us a yaml, onpremcompute.yaml, that was validated by TKG as it would have deployed to VMC. Now if we take that onpremcompute.yaml file and edit it, we can replace the VMC vCenter values with our on-prem values (sketched below, after the apply output) and simply run kubectl apply -f onpremcompute.yaml. This worked! Just to reiterate my opening sentence: while this did work, it is totally unsupported.

❯ kubectl apply -f onpremcompute.yaml
cluster.cluster.x-k8s.io/compute03 created
vspherecluster.infrastructure.cluster.x-k8s.io/compute03 created
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/compute03-control-plane created
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/compute03-worker created
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/compute03-control-plane created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/compute03-md-0 created
machinedeployment.cluster.x-k8s.io/compute03-md-0 created
secret/compute03-antrea-addon created
secret/compute03-vsphere-cpi-addon created
secret/compute03-vsphere-csi-addon created
secret/compute03-kapp-controller-addon created
secret/compute03-tkg-metadata-namespace-role created
secret/compute03-tkg-metadata-configmap created
secret/compute03-tkg-metadata-bom-configmap created
clusterresourceset.addons.cluster.x-k8s.io/compute03-tkg-metadata created
secret/compute03-metrics-server-addon created
machinehealthcheck.cluster.x-k8s.io/compute03 created
clusterresourceset.addons.cluster.x-k8s.io/compute03-default-storage-class created
secret/compute03-default-storage-class created
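To give a feel for the edit itself: along with the two machine templates shown further down, the dry-run output contains a VSphereCluster object whose server field has to be repointed at the on-prem vCenter. A rough sketch, assuming the v1alpha3 objects TKG 1.3 generates:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereCluster
metadata:
  name: compute03
  namespace: default
spec:
  server: vc01.vcdx71.net   # was the VMC vCenter address in the dry-run output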

I was feeling pretty good now. I took my findings back to engineering in the Telco group, and they asked that I test running the control plane of a compute cluster on VMC with only the worker nodes on the remote host. I thought, no problem, I had gotten this far!

I edited my onpremcompute.yaml, again changing values such as the VIP, but this time I also changed the control plane vCenter Server info. The yaml has two sections: one for the control plane and one for the worker nodes.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: compute03-control-plane
  namespace: default
spec:
  template:
    spec:
      datacenter: /SDDC-Datacenter
      datastore: /SDDC-Datacenter/datastore/WorkloadDatastore
      diskGiB: 40
      folder: /SDDC-Datacenter/vm/Workloads
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: sddc-net01
      numCPUs: 2
      resourcePool: /SDDC-Datacenter/host/Cluster-1/Resources/Compute-ResourcePool
      server: VMC-vCenter
      storagePolicyName: ""
      template: /SDDC-Datacenter/vm/Templates/photon-3-kube-v1.20.5+vmware.2


....

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: compute03-worker
  namespace: default
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: /RAN-DC01
      datastore: /RAN-DC01/datastore/esx01:DS01
      diskGiB: 40
      folder: /RAN-DC01/vm/k8s
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: ran01-vds01-management
      numCPUs: 2
      resourcePool: /RAN-DC01/host/RAN-Site01/esx01.vcdx71.net/Resources
      server: vc01.vcdx71.net
      storagePolicyName: ""
      template: /RAN-DC01/vm/Templates/photon-3-kube-v1.20.5+vmware.2

Notice the values are different between the control plane and worker templates. What happened when trying to deploy this is that all VMs were cloned to their correct vCenter Server, in the locations provided in the yaml. But since we can only define a single cloud provider, which I had set to the VMC vCenter Server, the nodes coming up in the on-prem vCenter Server were never acknowledged as up and were continually destroyed and re-created. So this scenario just doesn't work.
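My read on why: the generated VSphereCluster carries a single cloud-provider block, so the vSphere CPI only ever watches one vCenter. Roughly (values illustrative):

spec:
  cloudProviderConfiguration:
    virtualCenter:
      VMC-vCenter:                     # the only vCenter the CPI knows about
        datacenters: SDDC-Datacenter

Node VMs created on vc01.vcdx71.net never show up in that vCenter, so the CPI never clears the node.cloudprovider.kubernetes.io/uninitialized taint on them, the nodes never go Ready, and the MachineHealthCheck keeps reaping them.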

Not ready to give up, I wanted to validate that I could split the control plane and worker nodes if all the hosts were inside the same vCenter Server but on different standalone hosts or clusters. I created a vSphere Cluster in the on-prem vCenter Server and added a single host to it (it's only a test), then updated my yaml above so everything pointed at the same vCenter Server, just with different paths, networks, etc. This time everything came up and worked as expected.
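For clarity, the meaningful difference between the two machine templates in this working config was placement only; the server stayed the same. Roughly (the Control cluster path is illustrative):

# control plane template
server: vc01.vcdx71.net
resourcePool: /RAN-DC01/host/Control/Resources
# worker template
server: vc01.vcdx71.net
resourcePool: /RAN-DC01/host/RAN-Site01/esx01.vcdx71.net/Resources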

Looking at the resulting vCenter inventory: under RAN-Site01, compute02 has a worker node, while under the Control vSphere Cluster you can see compute02's control plane nodes. compute01 has both its control plane and worker nodes on the standalone host under RAN-Site01, and compute04 is split between RAN-Site01 and the vSphere Cluster named Control.

The last bit I wanted to check was lifecycle management (LCM). I was curious whether TKG would be able to upgrade a workload cluster running in another vCenter Server. So I deployed a v1.19 cluster using the above method, ran a tanzu cluster upgrade against it, and it just worked, no issues at all.

❯ tanzu cluster list
  NAME       NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN  
  compute01  default    running  1/1           1/1      v1.20.5+vmware.2  <none>  prod  
  compute02  default    running  3/3           1/1      v1.20.5+vmware.2  <none>  prod  
  compute04  default    running  1/1           1/1      v1.19.9+vmware.2  <none>  prod  


❯ tanzu cluster upgrade compute04
Upgrading workload cluster 'compute04' to kubernetes version 'v1.20.5+vmware.2'. Are you sure? [y/N]: y
Validating configuration...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.20.5+vmware.2...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.20.5+vmware.2...
Waiting for kubernetes version to be updated for worker nodes...
updating additional components: 'metadata/tkg' ...
updating additional components: 'addons-management/kapp-controller' ...
Cluster 'compute04' successfully upgraded to kubernetes version 'v1.20.5+vmware.2'


❯ tanzu cluster list
  NAME       NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN  
  compute01  default    running  1/1           1/1      v1.20.5+vmware.2  <none>  prod  
  compute02  default    running  3/3           1/1      v1.20.5+vmware.2  <none>  prod  
  compute04  default    running  1/1           1/1      v1.20.5+vmware.2  <none>  prod  

Notice in the Kubernetes column how compute04's version changed between the two tanzu cluster list commands.

This was quite fun to work on, as it's not something I've seen done before. Thanks to Timmy and William for jumping in and helping out!