Google Distributed Cloud Virtual for vSphere (GDCV vSphere) enables customers to deploy the same Kubernetes that powers Google Kubernetes Engine in the cloud on their own hardware, in their own data centers, with GKE Enterprise tooling to manage clusters at enterprise scale. Enterprises rely on GDCV (vSphere) to support their primary business applications, providing a scalable, secure, and highly available architecture. By integrating with their existing VMware vSphere infrastructure, GDCV makes it easy to deploy secure, consistent Kubernetes on-premises. GDCV (vSphere) integrates with VMware vSphere to meet high availability requirements, including HA admin and user clusters, auto-scaling, node repair, and now, VMware’s advanced storage framework.

Problem

Many VMware customers use aggregation functionalities such as datastore clusters to automate their virtual disk deployment. By combining many datastores into one object, they can let vSphere decide where to place a virtual disk, essentially picking the best location for a given requirement.

Storage interactions between the Kubernetes cluster and vSphere are driven through the Container Storage Interface (CSI) driver module. VMware releases its own CSI driver, vSphere CSI. While this integration lets you get started very quickly, it also comes with limitations: even the VMware-delivered driver does not support datastore clusters. Instead, VMware relies on Storage Policy Based Management (SPBM) to let administrators declare storage requirements for workloads and provide the placement logic that ensures automatic storage placement. Until now, SPBM was not supported in GDCV, making storage on these clusters harder and less intuitive for VM admins accustomed to the flexibility of SPBM for VMs.

Solution

With version 1.16, GDCV (vSphere) now supports SPBM, giving customers a consistent way to declare datastore clusters and deploy workloads. GDCV’s implementation of SPBM provides the flexibility to maintain and manage vSphere storage without touching GDCV or Kubernetes. In this way, GDCV lifecycle management fully leverages a modern storage integration on top of vSphere, allowing for higher resilience and much shorter planned maintenance windows.

This new model of storage assignment comes from integrating GDCV with VMware’s SPBM, enabled by advanced use of the VMware CSI driver. Storage Policy Based Management (SPBM) is a storage framework that provides a single unified control plane across a broad range of data services and storage solutions. The framework helps align storage with the application demands of your virtual machines. Simply put, SPBM lets you create storage policies that map VMs or applications to their required storage.

By integrating with VMware’s SPBM, creating clusters with GDCV from a storage perspective is now just a matter of referencing a particular storage policy in the clusters’ install configurations.

Storage policies can be used to combine multiple single datastores and address them as if they were one, similar to datastore clusters. However, when queried, a policy currently returns a list of all compliant datastores; it does not favor one for best placement. GDCV takes care of that for you: it analyzes all compliant datastores in a given policy and picks the optimal one for the disk placement. And it does this dynamically for every storage placement made using SPBM with GDCV.

The beauty of it, from an automation point of view, is that when anything changes from a storage capacity or maintenance perspective, all changes are made on the storage end. Operations like adding more storage capacity can now be done without changing GDCV’s configuration files.

This simplifies storage management within GDCV quite a bit.

A closer look at SPBM policies

With SPBM, a VMware administrator can build different storage policies based on the capabilities of the underlying storage array. They can assign one or many datastores to one or many policies, and then give each VM the policy that best matches its storage requirements. In practice, if we want Gold-level storage (e.g., SSD only, for production environments), we first create a policy defining Gold-level storage and add all the matching datastores to that policy. We then assign that policy to the VMs. The same goes for, say, Bronze-level storage: simply create a Bronze storage policy with Bronze-level datastores (e.g., HDD only, for dev environments) and apply it to the relevant VMs.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_S5OCK3z.max-2000x2000.jpg

How SPBM works in GDCV

To use the storage policy feature in GDCV, the VMware admin needs to set up at least one storage policy in VMware that is compatible with one or more datastores your GDCV cluster can access.

GDCV supports datastore tag-based placement. VMware admins can create specific categories and tags based on pretty much anything related to the available storage — think performance levels, cluster name, and disk types as just a few examples.

Let’s look at two key storage requirements in a cluster — the VM disk placement and the persistent volume claims (PVCs) for stateful applications — and how to use SPBM policies to manage them.

VM disk placement

Let’s go back to our Gold and Bronze examples above. We are going to define storage requirements for a GDCV User cluster. Specifically, we want to refer to two different storage policies within the User cluster configuration file:

  1. A cluster-wide storage policy – this will be the default policy for the cluster and, in our case, will be our “Bronze” policy
  2. A storage policy for specific node pools within the user cluster – our “Gold” policy

First, the “Gold” and “Bronze” tags are assigned to the different datastores available to the GDCV node VMs. In this case, “Gold” refers to SSD disks only; “Bronze” refers to HDD disks only.

To create and assign tags, follow the documentation, noting that tags can be applied to datastore clusters or to individual datastores within a cluster.

Once the tags are created, the storage policy is defined as per the official documentation.

After creating a storage policy, the datastores compatible with the policy can be reviewed — see example below:

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_YulWgFT.max-1500x1500.jpg

Now, let’s apply some storage policies to our user cluster configuration files.

Define cluster-wide policy (“Bronze” policy)

apiVersion: v1
kind: UserCluster
# A unique name for this cluster
name: "cluster-001"
...
...
# (optional) vCenter configuration
vCenter:
  storagePolicyName: "bronze"

In this user cluster config file snippet, the storage policy named “bronze” is set at the cluster level. This means the provisioned VMs in all of the node pools will use this storage policy to find compatible datastores and dynamically select the one that has sufficient capacity to use.

Define node pool policy (“Gold”)

apiVersion: v1
kind: UserCluster
# A unique name for this cluster
name: "cluster-001"
...
nodePools:
- name: nodepool-1
  cpus: 4
  memoryMB: 8192
  replicas: 5
  vsphere:
    storagePolicyName: "gold"
- name: nodepool-2
  cpus: 4
  memoryMB: 8192
  replicas: 5

In this user cluster config file snippet, a storage policy (“gold”) is set at the node-pool level. This policy will be used to provision the VMs in that node pool; all other storage provisioning will use the storage policy specified in the cluster-wide vCenter section.

Using storage policies like this abstracts the storage details away from the cluster admin. And if there is a storage problem — for example with capacity — the VMware admin can simply tag more datastores to make them available within the storage policy. The GDCV cluster admin does not need to do anything; the extra capacity made available through the policy is seamlessly incorporated by the cluster. This lessens the administrative load on the cluster admin and automates cluster storage management.

Persistent Volume Claims

A user cluster can have one or more StorageClass objects, one of which is designated as the default StorageClass. When you create the cluster following the documented install guide, a default storage class is created for you.
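
For reference, Kubernetes marks a cluster’s default StorageClass with the storageclass.kubernetes.io/is-default-class annotation. Below is a minimal sketch of what such a class can look like; the name here is illustrative, and the class GDCV actually creates for you may differ:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard  # illustrative name; inspect your cluster for the actual default
  annotations:
    # standard Kubernetes marker for the cluster's default StorageClass
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com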

Additional storage classes can be created and used instead of the default. The vSphere CSI driver allows the creation of storage classes that directly reference any existing storage policy within the vCenter instance that the GDCV user cluster runs on.

This means that volumes created by PVCs in the cluster will be distributed across the datastores that are compatible with the storage policy defined in our user cluster. These storage classes can map to VMFS, NFS, and vSAN storage policies within vSphere.

The manifest below configures a StorageClass that references a policy, “cluster-sp-fast”:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cluster-sp-fast
provisioner: csi.vsphere.vmware.com
parameters:
  storagePolicyName: cluster-sp-fast

This storage class can then be referenced in a persistent volume claim. See below:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvcsc-vmfs
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: cluster-sp-fast

Volumes with this associated claim will be automatically placed on the optimal datastore included in the “cluster-sp-fast” vSphere storage policy.
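
To use the claim, a workload references it like any other PVC. Here is a minimal Pod sketch; the image name and mount path are illustrative:

kind: Pod
apiVersion: v1
metadata:
  name: app-vmfs
spec:
  containers:
  - name: app
    image: nginx  # illustrative image
    volumeMounts:
    - name: data
      mountPath: /data  # illustrative mount path
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvcsc-vmfs  # the claim defined above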

Conclusion

In this post, we discussed the integration of GDCV with VMware’s SPBM framework. This integration is great news for GDCV admins, as it automates storage management: instead of hard links to specific datastores, storage assignment becomes dynamic and is managed from the VMware side. This means less overhead and less downtime for GDCV clusters, and more flexibility in storage management.

Learn more about Google Distributed Cloud, a product family that allows you to unleash your data with the latest in AI across edge, private data center, air-gapped, and hybrid cloud deployments. Available for enterprise and public sector, it lets you leverage Google’s best-in-class AI, security, and open source with the independence and control that you need, everywhere your customers are.