VMware vSAN on Cisco UCS Part 2 – UCS Profile

As mentioned in part 1, this is part of a larger post that I never got around to finishing because it grew to an unmanageable size. In this part 2 I will cover the UCS profile configurations I use to run vSAN. The main reason for writing it is that, despite Cisco having vSAN Ready Nodes, they lack a validated design for how to set them up. If you know of one, please let me know!

Preamble

Cisco UCS managed servers have the great advantage of being easy to configure consistently, while still making it easy to update configurations across multiple servers. I have been working with UCS Manager for close to 8 years now. Along the way I have run into a lot of problems and learned a lot about how the product works. These “recommendations” are based on my personal preferences and borrow from different best practices and configurations that I have encountered over the years.

As I primarily work with M5 configurations today, I will focus on those and add notes on some M4 issues I have encountered, along with their fixes/workarounds.

One thing I do, although it is not necessary, is to separate my clusters into separate Sub-Organizations inside UCS Manager. This gives a nice clean layout where I can make generic policies in the Root organization and specific policies under each sub-organization.

Boot Drive configuration

On M5 (and M6) I use a Storage Profile to define the OS LUN for ESXi. As all other disks are in JBOD mode, nothing needs to be done for them other than confirming JBOD mode (which is the default if you select a SAS HBA for the server). The Storage Profile consists of a Disk Group policy and a Local LUN definition.

In the Disk Group policy I usually set RAID Level to RAID 1 Mirrored and then flip Disk Group Configuration to manual. I then define disk numbers 253 and 254 as the constituents of the Disk Group, as these are always the two disks on the M.2 HW RAID controller. Everything else I leave at default.

With this Disk Group Policy in hand I create a Storage Profile, and under the Local LUNs section I create a LUN. I normally call the LUN OS and set a Size of 32 GB. Auto Deploy is set, Expand To Available is checked, and finally the Disk Group Policy is selected.

I could set a size larger than 32 GB, but since Expand To Available is enabled the LUN will automatically fill the 240 GB RAID 1 volume (or 960 GB if you choose the large boot drives).

For M4 I use a different method which will be mentioned in the Boot Policy section.

Network Policies

Next up is networking. Here I have borrowed a bit from what Cisco HyperFlex does. HyperFlex is Cisco's answer to vSAN and, to some extent, works in a similar manner.

The first thing to do is to give the QoS System Classes the correct MTU settings, so that I can use CoS Preserve on the upstream switches if need be. The table below shows the settings I use in my environments.

Priority | CoS | Packet Drop | Weight | MTU
Platinum | 5 | No | 4 | 9216
Gold | 4 | Yes | 4 | 1600
Silver | 2 | Yes | best-effort | 1600
Bronze | 1 | Yes | best-effort | 9216
Best Effort | Any | Yes | best-effort | 9216
QoS System Class

I use Platinum for vSAN storage traffic, Gold for VM guest traffic, Silver for ESXi management traffic and Bronze for vMotion interfaces. Note that both Bronze and Platinum allow MTU 9000 jumbo frames to be used inside ESXi for optimum performance. Make sure the upstream switches from your Fabric Interconnects support MTU 9216.
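A quick way to sanity-check the class mapping is to compare each traffic type's vNIC MTU against its class MTU. The sketch below is a minimal illustration that assumes vNIC MTUs of 9000 for the jumbo-frame traffic (vSAN, vMotion) and 1500 for everything else. The dictionary names and the `mtu_mismatches` helper are hypothetical and are not UCS Manager objects.

```python
# QoS System Classes from the table above: class -> MTU
CLASS_MTU = {
    "Platinum": 9216,
    "Gold": 1600,
    "Silver": 1600,
    "Bronze": 9216,
    "Best Effort": 9216,
}

# Traffic type -> (QoS class, MTU set on the vNIC and in ESXi).
# The vNIC MTUs (9000 for jumbo frames, 1500 otherwise) are assumptions.
TRAFFIC = {
    "vsan": ("Platinum", 9000),
    "guest": ("Gold", 1500),
    "esxi-mgmt": ("Silver", 1500),
    "vmotion": ("Bronze", 9000),
}


def mtu_mismatches(traffic, class_mtu):
    """Return the traffic types whose vNIC MTU does not fit in their QoS class MTU."""
    return sorted(
        name
        for name, (qos_class, mtu) in traffic.items()
        if mtu > class_mtu[qos_class]
    )


print(mtu_mismatches(TRAFFIC, CLASS_MTU))                   # []
print(mtu_mismatches({"vsan": ("Gold", 9000)}, CLASS_MTU))  # ['vsan']
```

Putting vSAN in Gold by mistake is the kind of error this catches: a 9000-byte frame does not fit in a 1600-byte class.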

From these classes I create matching QoS Policies: simply use the same name, select the priority and leave everything else at default. These policies are needed when configuring vNICs.

I usually also create a Network Control Policy that enables both CDP and LLDP receive and transmit, allows forged MACs and sets the action on uplink failure to Link Down. More on that later.

Before we start defining vNICs and LAN Connectivity Policies we need MAC addresses for the vNICs. UCS Manager lets you define your own MAC addresses within the 00:25:B5 prefix, specifying as much of the remaining octets as you want. You could simply create a single pool and have UCS Manager assign MAC addresses from it, but instead we borrow an idea from how HyperFlex designs its MAC pools.

In HyperFlex you select the 4th octet of the MAC as a prefix for a cluster, e.g. A1, so that every MAC in the cluster starts with 00:25:B5:A1. That means you can identify a cluster in your network based on the 4th octet alone. Neat!

HyperFlex then uses the 5th octet to encode the vNIC number and attached fabric, so vNIC1 gets A1 and vNIC2 gets B2. When setting it up you can therefore match the 5th octet to a function. I use A1 and B2 for ESXi management, A3 and B4 for vSAN, A5 and B6 for guest traffic, A7 and B8 for vMotion, and any additional NICs continue from there.

I create MAC pools for a minimum of 8 vNICs (2 mgmt, 2 vSAN, 2 guest and 2 vMotion), then add 2 for NFS and 2 for virtual networking if needed.
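The scheme can be sketched as a small generator that computes the first and last MAC of each vNIC's pool. Two things here are my own assumptions and do not come from HyperFlex or UCS Manager. First, the 6th octet numbers the hosts. Second, vNIC numbers above 9 use a single hex digit, so vNIC10 becomes BA. That second choice caps the scheme at 15 vNICs. The function names are hypothetical.

```python
# Cisco's prefix for UCS Manager MAC pools
UCS_PREFIX = "00:25:B5"
HEX_DIGITS = "0123456789ABCDEF"


def vnic_octet(vnic_number):
    """5th octet: fabric letter followed by the vNIC number (vNIC1 -> A1, vNIC2 -> B2).

    Odd vNICs sit on fabric A and even vNICs on fabric B. Numbers above 9 use a
    hex digit (assumption), so vNIC10 -> BA and the scheme stops at vNIC15.
    """
    if not 1 <= vnic_number <= 15:
        raise ValueError("a single hex digit only covers vNIC 1-15")
    fabric = "A" if vnic_number % 2 else "B"
    return f"{fabric}{vnic_number:X}"


def mac_pool(cluster_octet, vnic_number, hosts=256):
    """Return (first, last) MAC of one vNIC's pool; the 6th octet numbers the hosts."""
    cluster_octet = cluster_octet.upper()
    if len(cluster_octet) != 2 or any(c not in HEX_DIGITS for c in cluster_octet):
        raise ValueError("cluster prefix must be a single hex octet, e.g. A1")
    if not 1 <= hosts <= 256:
        raise ValueError("the 6th octet only has room for 256 hosts")
    prefix = f"{UCS_PREFIX}:{cluster_octet}:{vnic_octet(vnic_number)}"
    return f"{prefix}:00", f"{prefix}:{hosts - 1:02X}"


if __name__ == "__main__":
    # vNIC 1-8: mgmt, vSAN, guest and vMotion, one per fabric (names are examples)
    functions = ["esxi-mgmt", "esxi-vsan", "esxi-guest", "esxi-vmotion"]
    for i, function in enumerate(functions):
        for side, number in (("a", 2 * i + 1), ("b", 2 * i + 2)):
            first, last = mac_pool("A1", number, hosts=16)
            print(f"{function}-{side}: {first} - {last}")
```

With cluster prefix A1 and 16 hosts, the vSAN pool on fabric A runs from 00:25:B5:A1:A3:00 to 00:25:B5:A1:A3:0F.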

With the MAC pools and the policies from above in hand, I create a set of vNICs for ESXi. I prefer 2 for each function, one on fabric A and one on fabric B, without fabric failover. ESXi can easily handle the failover, and with this setup ESXi can use both links from the server if need be. If one of the links fails, I would rather see the vNIC go down and have ESXi handle the failover than have the failover be transparent to ESXi.

Each vNIC name is suffixed with the expected fabric, e.g. esxi-mgmt-a and esxi-mgmt-b. I set “-a” as the primary template in a Peer redundancy setup and “-b” as the secondary. This lets me update VLANs and configuration only on the “-a” vNIC while the configuration stays in sync with the “-b” vNIC. The Template Type is set to Updating so that changes such as additional VLANs reach all servers using this vNIC without going through every profile. The MTU needs to match the QoS policy selected and defined above. Select the matching MAC pool, set the Network Control Policy, and you are done. Then repeat for each required vNIC.
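To make the pairing rules concrete, here is a minimal data model of the template pairs. Each function gets a "-a" primary on fabric A and a "-b" secondary on fabric B, with no fabric failover and an Updating template type. The class and field names are hypothetical and are not UCS Manager object or property names. Every function name except esxi-mgmt is a made-up example, and the vNIC MTUs of 9000/1500 are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VnicTemplate:
    name: str
    fabric: str                     # "A" or "B"
    redundancy: str                 # role in the Peer redundancy pair
    qos_policy: str
    mtu: int
    fabric_failover: bool = False   # ESXi teaming handles failover instead
    template_type: str = "Updating"


def template_pair(function, qos_policy, mtu):
    """Return the -a (primary, fabric A) and -b (secondary, fabric B) templates."""
    return [
        VnicTemplate(f"{function}-a", "A", "primary", qos_policy, mtu),
        VnicTemplate(f"{function}-b", "B", "secondary", qos_policy, mtu),
    ]


# Order follows the vNIC numbering used for the MAC pools: mgmt, vSAN, guest, vMotion.
LAYOUT = [
    ("esxi-mgmt", "Silver", 1500),
    ("esxi-vsan", "Platinum", 9000),
    ("esxi-guest", "Gold", 1500),
    ("esxi-vmotion", "Bronze", 9000),
]

templates = [t for function, qos, mtu in LAYOUT for t in template_pair(function, qos, mtu)]
for number, t in enumerate(templates, start=1):
    print(f"vNIC{number}: {t.name:<15} fabric {t.fabric} {t.redundancy:<9} {t.qos_policy} MTU {t.mtu}")
```

Odd-numbered vNICs land on fabric A and even-numbered ones on fabric B. That matches the A1/B2/A3/B4 pattern of the MAC pools, and it is the placement order to enforce later in the template.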

I use the created vNICs to create a LAN Connectivity Policy, which contains all the vNICs and sets the adapter policy to VMWare (yes, Cisco capitalizes it wrong 🙁 ). And that is it for networking for now. We will use the LAN Connectivity Policy when defining the Server Profile Template.

Server Profile Policies

I need a couple of Server Policies before I can create the Server Profile Template. The first one I create is a Scrub Policy. I generally make this policy in the Root scope, as I globally want scrub disabled for all types: Disk, BIOS, FlexFlash and Persistent Memory. I generally don’t want UCS to wipe settings unless specifically instructed to do so.

Next up is a Boot Policy. For M5 I define a policy that uses Boot Mode UEFI with Secure Boot enabled. Then I add a single boot option of type Local LUN using the LUN name OS, which we defined in the Storage Profile previously.

If you are booting from an internal drive in an M4 as described in Part 1, some special options need to be set. Instead of Local LUN, select Embedded Disk and then modify the UEFI Boot Parameters option to set Boot Loader Name to “BOOTX64.EFI” and Boot Loader Path to “\EFI\BOOT\”. This is the only way I found to do UEFI Secure Boot on those drives.

I set up a Maintenance Policy for the Server Profile as well. It sets every action that might require a reboot to “User Ack”, which means I need to manually approve any host reboot caused by profile changes. I also set the “On Next Boot” option, which applies any pending changes whenever the host reboots, for example when applying ESXi updates. That makes it easy to update firmware while updating ESXi. Convenient!

Lastly I create a Host Firmware Package policy, which sets the firmware version to use in that cluster. As firmware packages can contain firmware for the SAS HBAs, I want tight control over which firmware is used. This also allows me to change the firmware level of the cluster in one step. Each host then has pending changes waiting for when I am ready to reboot it and update the firmware.

Server Profile Template

With all those policies ready, I can now create the template that each server will be instantiated from. This will be an Updating template, so that changes are applied consistently to all hosts and configuration drift is avoided.

I usually just run through the wizard and select the policies created where applicable. As we don’t have any FC in our setup, I usually don’t set up any vHBAs. These can be added later thanks to the Updating setting.

The only thing I do manually is to select the LAN Connectivity Policy to get the required vNICs for ESXi attached. Once it is added I complete the wizard and go back into the Network tab of the template to click “Modify vNIC/vHBA Placement”. I do this because the editing view is easier to work with when accessed from there than in the wizard. I then manually place the vNICs in the order I want to enforce.

Conclusion

With all that in place, there is now a profile template that can be used to produce identical ESXi hosts for vSAN. The profile even works on “compute only” nodes that don’t provide any storage to the cluster, as long as they still use the M.2 HW RAID boot module. Very nice in my opinion.

Next up in part 3 I will go over some of my ESXi configurations that I prefer in the vSAN pods I run.

VMware vSAN on Cisco UCS Part 1 – Hardware

I have had parts of this post saved as a draft for months without finishing it, because it was turning into a monster of a post when I tried to cover everything. I finally found the drive to finish it when I realized it was probably better split into multiple posts. Trying to fit hardware considerations, UCS Manager / standalone profile configurations and ESXi configurations into one single post simply did not work.

So without further ado, let’s dive into the hardware part of VMware vSAN on Cisco UCS.

Please note that these are my personal opinions and may or may not align with what you need in your datacenter solution. Most designs are tailored to a specific use case and as such cannot be copied directly from here.

Base models

Cisco has a number of certified vSAN Ready Nodes based on the M4, M5 and M6 server generations. M3 isn’t supported, as the hardware is EOL and most of the controllers available for M3 models weren’t powerful enough for vSAN workloads. The most commonly used are Cisco’s C240 M5SX 2U models, which allow for 24-26 drive bays in total. For smaller deployments the C220 M5SX is also an excellent option, with up to 10 drives in 1U.

It is technically possible to run vSAN on other types of servers like the S3260 and B200 blades, but they limit your options in terms of storage-to-compute ratio. The S3260 can provide massive amounts of storage but little compute, and the B200 is the opposite, as it only has 2 disk slots.

One thing to note: if you plan on using NVMe storage, you need to focus on M5 and M6. M5 allows for up to 4 NVMe devices in U.2 format, while M6 can support up to 24 NVMe devices. M4 only supports PCIe NVMe devices.

Boot options

Cisco has traditionally been a network boot company, so the primary local boot option on M3 and M4 is SD cards if you don’t want to waste disk slots on boot devices. On the B200 M4, with only 2 disk slots, SD cards are currently the only option, as the disk slots are needed for a caching disk and a capacity disk. All M5 and M6 models (B200 included) have a new dedicated slot for a UCS-M2-HWRAID controller. It fits 2 M.2 drives (either 240 or 960 GB) and does actual hardware RAID that ESXi supports. Do not use the UCS-MSTOR-M2 controller. It fits the same slot and also takes 2 M.2 drives, but it only supports the onboard LSI-SW RAID from the Intel chipset, which is supported by Windows and Linux but not by ESXi. The HWRAID controller is not that expensive, so just buy it 🙂

Specifically on the C240 M4, if you choose a UCSC-PCI-1C-240M4 you can insert up to two drives internally in the server, managed by the onboard controller. You won’t have RAID functionality, but it beats SD card booting by miles!

NIC

My go-to here is M5 servers with a UCSC-MLOM-C40Q-03 (VIC 1387) in combination with 6300 series Fabric Interconnects. That provides 2x40G per server, which pairs nicely with a 40 or 100G upstream network. On M6 the equivalent is the UCSC-M-V100-04 (VIC 1477).

If you are using 6400 series Fabric Interconnects and a 25G infrastructure you might want to go with UCSC-MLOM-C25Q-04 (VIC 1457) on M5 and UCSC-M-V25-04 (VIC 1467) on M6 to give 4×10/25G connections instead. Depends on your infrastructure.

On M4 it is technically possible to use the UCSC-MLOM-C40Q-03 (VIC 1387). The UCSC-MLOM-CSC-02 (VIC 1227) adapter is far more common, but it only provides 2x10G connections. If you run a pure 10G infrastructure and plan to keep doing so, I recommend adding a UCSC-PCIE-CSC-02 (VIC 1225) for another 2x10G. I see this combination primarily used with 6200 series Fabric Interconnects.

For blades the standard is the UCSB-MLOM-40G-03 (VIC 1340) for M4 and the UCSB-MLOM-40G-04 (VIC 1440) for M5 and M6. Both cards are 2x40G. They need to be paired with IOMs in the blade chassis, which can limit the speed of the vNICs presented. With the IOM 2304 and 2208 you usually get 2x20G. Consult your Cisco vendor to confirm how to get optimum speeds for your setup.

Controllers

Now for probably the most crucial part of any vSAN deployment: the controller. It matters less if you go all-NVMe or use the new ESA option in vSAN 8, but otherwise you need a SAS/SATA controller to handle your disks.

On the C240 M4 this is usually the UCSC-SAS12GHBA, or the UCSC-MRAID12G with a UCSC-MRAID12G-1GB cache module. Both are on the HCL, but the SAS HBA is preferable to the RAID controller.

On the C220 and C240 M5 the only real options for vSAN are the UCSC-SAS-M5 and UCSC-SAS-M5HD respectively. The primary difference is how many drives the controller can handle, which of course needs to be higher for the C240.

On the C240 M6 the option is the CSC-SAS-M6T (UCSC-SAS-240M6), which allows for up to 16 disks. To be honest, though, if you are going for M6 nodes you should probably go for an M6N or M6SN with an all-NVMe configuration instead.

Disks

I won’t go into much detail here, as different use cases and requirements need different numbers of disk groups and capacity devices. Your use case may vary. We primarily use 3.8 TB Enterprise Value SATA SSDs for capacity, simply because they are fast enough and readily available to us. We aim to use NVMe caching devices whenever possible. If that is not possible, we select a high-endurance, high-performance SAS SSD for caching.

One note to keep in mind: M4 only supports PCIe NVMe devices. On the C220 M5SX, two front slots can be used for NVMe, and on the C220 M5SN all 10 slots can be NVMe. On the C240 M5SX, slots 1 and 2 as well as 25 and 26 (at the rear) can be used for NVMe drives. On the C240 M5SN, bays 1-8 can be used for NVMe.

If you are retrofitting NVMe drives into existing C2x0 M5s, note that on the C220 M5 you need a CBL-NVME-220F to use the front-facing NVMe drives, if one is not already present.

On the C240 M5 I recommend a UCSC-RIS-2C-240M5 riser. It supports both 2x front and 2x rear mounted NVMe drives, provided you remember to order a CBL-NVME-240SFF and a UCSC-RNVME-240M5 to connect the front and rear slots respectively to the riser. This configuration gives you up to 4 NVMe caching devices with up to 5 SAS/SATA capacity drives per disk group, which adds up to a lot of capacity and performance.
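The "5 drives per group" figure follows from simple arithmetic, worked through below. The assumptions are 26 bays in total (24 front plus 2 rear), NVMe cache devices in bays 1, 2, 25 and 26, and one disk group per cache device. The 22 remaining front bays split over 4 groups give 5 capacity drives each, with 2 bays to spare. The code also applies the vSAN OSA limits of 5 disk groups per host and 7 capacity devices per group. `disk_group_layout` is a hypothetical helper.

```python
# vSAN OSA limits: at most 5 disk groups per host, each with one cache device
# and at most 7 capacity devices.
MAX_DISK_GROUPS = 5
MAX_CAPACITY_PER_GROUP = 7


def disk_group_layout(total_bays, cache_devices):
    """Return (capacity drives per disk group, bays left unused).

    Assumes one disk group per cache device and that every bay not holding a
    cache device can take a capacity drive.
    """
    if not 1 <= cache_devices <= MAX_DISK_GROUPS:
        raise ValueError("vSAN OSA supports 1-5 disk groups per host")
    free_bays = total_bays - cache_devices
    per_group = min(free_bays // cache_devices, MAX_CAPACITY_PER_GROUP)
    unused = free_bays - per_group * cache_devices
    return per_group, unused


# C240 M5SX: 24 front + 2 rear bays, NVMe cache in bays 1, 2, 25 and 26.
print(disk_group_layout(total_bays=26, cache_devices=4))  # (5, 2)
```

With only 2 cache devices, the same chassis hits the 7-drive cap per group and leaves 10 bays unused. That is one reason to spread capacity over more disk groups.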

Conclusion

So those are my notes on hardware. I have not touched on CPU types and memory configurations at all, as these need to match your workload. Some workloads might need a 3.0 GHz base clock and little memory, others loads of cores and memory. Pick something that matches the workload. I would recommend sticking to Xeon Gold CPUs for a good balance of performance and cores, and using a 12 DIMM configuration on M5s to get maximum memory bandwidth.

In the next article I’ll touch on the UCS Manager configurations that I use for vSAN.

vCloud Usage Meter 4.3 .local resolution issues

As part of our ongoing engagement with VMware, we are required to operate vCloud Usage Meter to measure rental license usage for reporting back to VMware. We have been running an older build for a long time now, waiting for the 4.3 release, because it can correctly measure vRealize Automation usage based on the Flex bundle Add-on model rather than per OSI.

I got the appliance deployed just before the holidays but ran into several issues that I’d like to share with you.

The first issue I ran into actually prompted me to redeploy, because the migration of configuration from the old appliance ended in a bad state. It was caused by two things: 1) I was missing a Conditional Forwarder for a domain on the DNS servers the new appliance was using, and 2) systemd-resolved is a nightmare to work with!

I’d like to focus on systemd-resolved. I really don’t like this piece of software, as it is insanely frustrating to troubleshoot. What it basically does is point /etc/resolv.conf at a local address on the server (127.0.0.53), where a daemon listens for requests. If it can answer a request it does; otherwise it passes the request onwards as normal.

But, and this is the crucial part, it handles “.local” domains differently. I cannot fully explain what it does, but .local is used by services like Bonjour and mDNS. This matters because unless you explicitly state that a .local domain needs to be resolved via actual DNS, systemd-resolved won’t do it.

To back up a bit: the new Usage Meter 4.3 appliance runs on Photon OS, which uses systemd. The older appliances use SLES, which doesn’t use systemd-resolved, and thus don’t have these issues. It took a lot of tinkering, but I got it working by following this article: https://github.com/vmware/photon/issues/987. The key was making sure that both of my required .local domains were present in the search path parameter and that the DNS servers were explicitly inserted into the 10-eth0.network config file.

I had to do both; either one alone did not work. The search path can be configured correctly at deployment if you remember it. The DNS settings must be changed after deployment but before running the migration script. Double-check DNS resolution before attempting the migration: it’ll save you headaches!
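One way to do that double-check is a tiny script run on the appliance before the migration. It assumes a Python 3 interpreter is available there. The sketch goes through the system resolver, so when /etc/resolv.conf points at 127.0.0.53 the lookups should pass through systemd-resolved, just as the migration's lookups would. The function name and the example host names are hypothetical.

```python
import socket


def unresolvable(hostnames, resolve=socket.getaddrinfo):
    """Return the host names that the system resolver cannot resolve.

    `resolve` defaults to socket.getaddrinfo, i.e. the same resolver path other
    programs on the host use; it can be swapped out for testing.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```

For example, `print(unresolvable(["vcenter01.corp.local", "vra01.corp.local"]))` should print an empty list before you start the migration. Any .local name it returns is one that systemd-resolved is not sending to your DNS servers.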

With the appliance deployed and the config migrated, I was left with two errors: that old 5.5 vCenter that hadn’t been fixed yet, and a currently unknown bug in registering a vRealize Automation 7.6 install. VMware support is investigating!

VMworld 2020 and General Announcements

Ohh, it has been a while again since I last got around to writing. Being busy with maintenance work doesn’t really make for great blog articles.

But last week I got to attend VMworld 2020! Due to the worldwide situation it was virtual this year, so for me it was two days in the home office watching a lot of great content on Kubernetes, NSX, vSAN and much more.

So many great things were announced. But the one that struck me first was the acquisition of SaltStack. This is a major move to incorporate a configuration management system into the VMware portfolio. It will certainly strengthen vRealize Automation in the future, and hopefully other parts of the ecosystem too!

Another huge announcement was Project Monterey. Although I’m still trying to wrap my head around the use cases and opportunities it presents, I like the idea very much! Being able to offload vSAN and NFV workloads to a SmartNIC is a great idea, and I hope to see it evolve in the future.

This week also saw the GA release of several new versions of VMware’s core products. These were announced previously, but I was not aware that they would be released so soon, so that is just the cherry on top!

First up is the release of vSphere 7 U1! The biggest new feature has got to be the ability to run vSphere with Tanzu, along with new scalability maximums for VMs.

Along with vSphere 7 U1 there is of course also a vSAN 7 U1 release! HCI Mesh, which lets you share the vsanDatastore natively between vSAN pods, is one of my top features. Improvements to vSAN File Services also landed. There is also a new option to run only compression on vSAN rather than both compression and deduplication. Great features! For those running 2-node clusters or stretched clusters that require a witness, a huge improvement has landed as well: a witness server can now be shared by up to 64 clusters! Very nice!

Another feature also seems to have crept in, as detailed by John Nicholson: the option to run the iSCSI feature on stretched clusters. Again, a very nice feature to have for those who need it.

The last bit of GA material that I wanted to comment on is the release of vRealize Automation 8.2. It brings much-needed improvements to the multi-tenancy of vRA, as well as improvements to Infrastructure-as-Code workflows and Kubernetes.

It can be a daunting task to keep up with all the releases from VMware but their ability to push new releases and features never ceases to amaze me!

WSL2 issues – and how to fix some of them

I had been eagerly awaiting WSL2 (Windows Subsystem for Linux 2), and when the update was released for general availability on May 28th I updated immediately.

At first I was super hyped. WSL2 and the Ubuntu 20.04 image just worked and ran smoothly and quickly. Combined with the release version of Windows Terminal, it was a real delight.

I also grabbed Docker Desktop for Windows, as it now supports WSL2 as the underlying system. And joy, it just installed and worked. I can now run Docker containers directly from my shell. Before, I had an Ubuntu VM running in VMware Workstation and connected to it via docker-machine from my WSL1 Ubuntu image. That was a hassle to get working and not a very smooth operation.

Having the option to just start Docker containers is amazing!

But then I had to get some actual work done and started VMware Workstation to boot a VM. And it failed with a Device Guard error. I followed the guides and attempted to disable Device Guard, to no avail. Then it dawned on me: WSL2 probably enables the Hyper-V role! And that is exactly what had happened.

Hyper-V and Workstation (or VirtualBox, for that matter) do not mix well. At least they didn't until VMware released Workstation 15.5.5, which fixed this exact problem, just the day after WSL2 was released. Perfect timing!

Simple fix: just update Workstation to 15.5.5 and reboot, and WSL2 and Workstation now coexist fine!

I played a bit more with WSL2 in the following days but ended up hitting some weird issues where networking would stop working in the WSL2 image. I found no real fixes. Many reports point to DNS issues and the like; just Google “WSL2 DNS not working” and look at the mountains of issues.

But I suspected something else because DNS not working was just a symptom – routing out of the WSL2 image was not working. Pinging IPs outside the image did not work. Not even the gateway IP. And if the default gateway is not working of course DNS is not working.

I found that restarting fixed the issue, so I got past it that way, but today it was back, and I was very interested in figuring out what was happening. Then I realized what the problem might be and tested it. I was connected to my work network via Cisco AnyConnect. I disconnected from the VPN and tested connectivity in WSL again, and now it worked. When I connected to the VPN again, connectivity was gone.

Okay, source found. What’s the fix? I found this thread on GitHub that mentions issues with other VPN providers, even when not connected. Looking through the comments, I found a reference to a different issue about the same problem, but regarding AnyConnect specifically.

I looked through the comments and found many suggested fixes around changing the DNS IP and other things. The fix that seems to do the trick was running the following two lines of PowerShell in an elevated shell after connecting to the VPN:

Get-NetIPInterface -InterfaceAlias "vEthernet (WSL)" | Set-NetIPInterface -InterfaceMetric 1
Get-NetAdapter | Where-Object {$_.InterfaceDescription -Match "Cisco AnyConnect"} | Set-NetIPInterface -InterfaceMetric 6000

Those two lines change the Interface Metric so that the WSL interface has a higher priority (a lower metric) than the VPN connection. As a side effect, this also fixed an issue I had with local breakout not working correctly when on the VPN.
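To see why a metric change can flip which interface carries WSL traffic, here is a simplified model of route selection. Windows picks the longest matching prefix first and then the lowest effective metric (route metric plus interface metric). The model assumes that AnyConnect installs a route exactly as specific as the WSL subnet route. Otherwise prefix length alone would decide and the metric would not matter. The subnet 172.28.0.0/20 and all metric values are illustrative assumptions, not values read from a real system.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str      # destination network, e.g. "172.28.0.0/20"
    interface: str
    metric: int      # effective metric: route metric + interface metric


def pick_route(routes, destination):
    """Simplified route selection: longest prefix first, then lowest metric."""
    addr = ipaddress.ip_address(destination)
    matches = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None
    return min(matches, key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.metric))


WSL_NET = "172.28.0.0/20"  # hypothetical WSL subnet

# Before the fix: AnyConnect's low interface metric wins.
before = [Route(WSL_NET, "vEthernet (WSL)", 256 + 5000), Route(WSL_NET, "Cisco AnyConnect", 1 + 1)]
# After the fix: WSL interface metric 1, AnyConnect interface metric 6000.
after = [Route(WSL_NET, "vEthernet (WSL)", 256 + 1), Route(WSL_NET, "Cisco AnyConnect", 1 + 6000)]

print(pick_route(before, "172.28.0.1").interface)  # Cisco AnyConnect
print(pick_route(after, "172.28.0.1").interface)   # vEthernet (WSL)
```

It also shows why the fix does not survive a reconnect if AnyConnect resets its own interface metric each time it connects.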

The downside of the fix is that it needs to be run every time you connect to the VPN. I implemented a simple PowerShell function in my profile, so I just have to open an elevated shell and type “Fix-WSLNet”.

That is all for now!

vRealize Orchestrator 8.1 (and others) announced!

I’m late to the party as usual, but I simply needed to write a quick post on this.

VMware announced a whole slew of new releases yesterday, with the primary focus on vSphere 7 and the new Kubernetes integrations it brings. I hope to find time to look more into Kubernetes on vSphere once it becomes available, as this is an area I am very interested in learning more about.

But the biggest thing for me as of right now is the announcements for vRealize Orchestrator 8.1!

I really wanted to like the new HTML5 interface that came in 7.6, but it had issues! No lie there. As I have not had time to test it in 8.0 yet, I hope that 8.1 will bring back some of the glory to vRO!

Among the features I look forward to most is the return of the “Tree-View”, which shows a hierarchical sorting and grouping of related workflows. The tag-based approach used in 7.6 and 8.0 doesn’t really appeal to me. I like being able to tag workflows, but not being able to sort and organize them in any other way is not optimal.

That said, the absolute biggest wish on my wishlist for vRO has come true! To quote the announcement:

“Multiple Scripting Languages: PowerShell, Node.js, Python. Support for multiple scripting languages have been added: PowerShell, Node.js, and Python. This makes vRealize Orchestrator more accessible and easier to use for non-JavaScript users.”

Finally PowerShell will be directly available in vRO, without a complicated setup using a Windows host and all the double-hop authentication issues that come with it. And we get Python as well! It’s almost Christmas!

I can’t and won’t go over all of yesterday's announcements. Other bloggers out there are already doing this, and I’d like to give some credit to those working hard on it. For that reason I will point you all to Eric Siebert’s list of links to articles and announcements regarding vSphere 7 and related releases.

Take a look at the list here: http://vsphere-land.com/news/vsphere-7-0-link-o-rama.html

vRealize Orchestrator VC plugin version

I keep forgetting that this is a problem, so I might as well write it down for myself and anyone else stumbling upon it.

When using vRO (in my case 7.5 or 7.6), you might find that you are unable to add a vCenter instance running vCenter 6.7. The error is not very informative:

It doesn’t really scream out what the problem is. But as I had seen the error before, I had a hunch when my colleague was configuring vRO in our vRealize Automation platform.

On the vRO VMTN forum there is a post that contains the latest release of the vRO VC plugin – https://communities.vmware.com/docs/DOC-32872

Simply download the attached zip and unpack the vmoapp file. Log in to the vRO Control Center at https://<FQDNorIP>:8283/vco-controlcenter/ and select “Manage Plug-ins”. Under “Install plug-in”, click Browse, select the vmoapp file and upload it. Accept the EULA and install. After about 2 minutes vRO will have restarted and the plugin will be updated.

vCenter instances can now be added 🙂

Updated udp_client.py for testing UDP heartbeats

A while back I stumbled on a set of KBs for testing UDP heartbeat connectivity between ESXi and vCenter, and I wrote this article describing how to do it.

Today I had to do the same and went back to these KBs to find the script. This time, however, it was on newer 6.5 U2 hosts rather than old 5.5 hosts, and as KB1029919 describes, it only applies to ESXi versions 4.0.x to 5.5.x.

Why is this important? Because between ESXi 5.5.x and 6.5 U2 the bundled Python was updated from 2 to 3. Some of you may know that Python 3 has many breaking changes compared to Python 2, and some of those affected the original udp_client.py script.

So I took the time to fix the few issues the script had and uploaded a version to GitHub here. In the Python folder there is a Python 3 compatible version of udp_client.py, and I included the original script as udp_client-v2.py for reference.

The first change was on line 25: print is now a function and has to be called with parentheses. The “%” formatting operator stays inside the call. Replacing it with “,” would print the raw format string followed by the tuple instead of the formatted message:

original:
print "\nSent %s UDP packets over port %s to %s" % (numtimes,port,host)

python 3:
print("\nSent %s UDP packets over port %s to %s" % (numtimes, port, host))

Once the syntax error was fixed, I found that socket.sendto had changed: in Python 3 it expects a bytes-like object, such as a bytearray, instead of a string. The simple fix was to introduce an int variable “datasize” set to 100 and change the “data” variable from “100” to “bytearray(datasize)”, as seen here:

original:
data = "100" 

python 3:
datasize = 100       
data = bytearray(datasize) 

After this the script works on a 6.5 U2 host, and I was able to test UDP connectivity.
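For reference, here is the sending side of such a test as a minimal, self-contained Python 3 sketch. It is not the udp_client.py from the repository; the function name is hypothetical. It defaults to UDP port 902, the port ESXi hosts use to send heartbeats to vCenter. It includes both fixes from above: the payload is a bytearray, and print is called as a function with the % formatting kept inside the call.

```python
import socket


def send_udp_heartbeats(host, port=902, count=10, datasize=100):
    """Send `count` UDP datagrams of `datasize` zero bytes to host:port.

    Returns the total number of bytes handed to the socket.
    """
    data = bytearray(datasize)  # Python 3 sendto() needs bytes-like data, not str
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sent += sock.sendto(data, (host, port))
    # print() is a function in Python 3; % formatting still works inside it
    print("\nSent %s UDP packets over port %s to %s" % (count, port, host))
    return sent
```

Calling `send_udp_heartbeats("vcenter01.example.local")` (a hypothetical host name) sends ten 100-byte datagrams. As in the KBs, you still need to capture on the vCenter side to confirm they actually arrive, because UDP gives the sender no acknowledgement.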

This also marks the first time I have my own public GitHub repository, so yay! 🙂

System logging not Configured on host

A few weeks ago I noticed a warning on some of our hosts in our HyperFlex clusters and wondered what was going on. It was only hitting Compute Only nodes in the clusters.

The warning indicates that Syslog.global.logDir is not set, as per KB2006834. But when I looked at the host via SSH, it was logging data and the config option was set, so it was working. So why the warning?

Well, it turns out to be not that complicated to fix. The admin who set up the nodes had set the option to:

[] /vmfs/volumes/<UUID>/logs/hostname

That gives it an absolute path on the host, as you would with the ScratchConfig.ConfiguredScratchLocation option. This works, but it triggers the warning as if the option were not set.

The fix is simple: change it to use the [DatastoreName] notation, like this:

[DatastoreName] logs/hostname

This immediately removed the warning and everything continued as it had before.
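If you have many hosts to fix, the rewrite itself can be automated. The sketch below turns "[] /vmfs/volumes/<volume>/path" into "[DatastoreName] path". It assumes you already have a mapping from volume UUID to datastore name, for example taken from the datastore list in vCenter. The function name and example values are hypothetical. Values already in bracket notation are returned unchanged, and an unknown volume falls back to the name found in the path.

```python
import re

# "[] /vmfs/volumes/<UUID or name>/<path>" -> groups: volume, path
_ABSOLUTE = re.compile(r"^\[\]\s*/vmfs/volumes/([^/]+)/(.*)$")


def to_datastore_notation(log_dir, datastore_names):
    """Rewrite an absolute Syslog.global.logDir value into '[DatastoreName] path'.

    datastore_names maps a volume UUID to its datastore name.
    """
    match = _ABSOLUTE.match(log_dir.strip())
    if not match:
        return log_dir
    volume, path = match.groups()
    return f"[{datastore_names.get(volume, volume)}] {path}"


print(to_datastore_notation("[] /vmfs/volumes/5f2a-uuid/logs/esx01", {"5f2a-uuid": "hx-ds01"}))
# [hx-ds01] logs/esx01
```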

Migrating VMs to an older ESXi version

Hi all, just a short post about a small task I had on my desk last week. A customer needed to migrate 3 servers from their current provider to one of our older platforms. There were a few issues to overcome. First off, I only had VPN access to the provider and access to the web UI of the ESXi 6.0 host running the VMs. So how to export them without downtime? Something like a Veeam replication was out, as I only had the VPN connection, so it had to be an export-import scenario. Clone them? That requires vCenter. Hmm.

So what did I do? Well, I asked nicely and was allowed to deploy a temporary VCSA onto the host and add the host itself to that vCenter. This allowed me to clone the 3 VMs (after removing a metric f***ton of snapshots). Then I could export the cloned VMs as OVFs to a machine in our network. I was lucky that I could do this and did not need to do the entire operation in a service window. The latest state of the machines was not needed, as it was more of a “configuration copy” than anything else. So we could move the cloned VMs while the customer’s systems kept running.

Now came the tricky part, which I did not foresee but quickly identified! I exported from a VCSA 6.7 U1 and an ESXi 6.0 host, which produces an OVF with SHA256 hashes in its manifest. Importing this into a 5.5 vCenter fails, as vCenter 5.5 does not support SHA256. OVF Tool has a nice feature where you can convert the OVF from SHA256 to SHA1 by making a new package with the following command:

ovftool --shaAlgorithm=SHA1 .\path\to\file.ovf .\path\to\destination.ova

Simple! This converts the OVF to an OVA with SHA1 instead of SHA256, and the import now works. Wait, no, it did not! The machines are hardware version vmx-11, which does not run on ESXi 5.5. What to do? The recommended approach is to use VMware Converter to convert the VM and downgrade the hardware version. I took a slightly simpler but probably also more unsupported route.

The VMX version is defined in the OVF, so it was simple to open the OVF file, locate the “VirtualSystemType” parameter and change it from “vmx-11” to “vmx-10”. This, however, breaks the SHA256 hash of the OVF file recorded in the manifest file. That is simple to fix as well, using “certutil” on Windows. The following command generates a SHA256 hash for a file:

certutil -hashfile .\path\to\file.ovf sha256

Simply replace the SHA256 hash in the manifest file with the one generated above and rerun the SHA1 conversion above; the import now works. My colleague who needed the VMs converted reported back later that day. He confirmed the machines were booting fine and running as they should, so he could continue reconfiguring them.
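The two manual steps (editing the hardware version and updating the manifest hash) can also be scripted, for example in Python instead of a text editor plus certutil. The sketch below is an assumption-laden convenience, not a supported tool. It assumes the OVF contains a single `<vssd:VirtualSystemType>vmx-11</vssd:VirtualSystemType>` element (any namespace prefix is accepted), and that the manifest uses the `SHA256(file.ovf)= <hex>` line format OVF Tool writes. It works on raw bytes so line endings are preserved, and it only writes once both edits have been validated. The function name is hypothetical. Run the ovftool SHA1 conversion afterwards as before.

```python
import hashlib
import re
from pathlib import Path


def downgrade_ovf(ovf_path, manifest_path, old="vmx-11", new="vmx-10"):
    """Change VirtualSystemType in an OVF and update its SHA256 manifest entry.

    Returns the new SHA256 hex digest of the OVF file.
    """
    ovf, mf = Path(ovf_path), Path(manifest_path)

    # <vssd:VirtualSystemType>vmx-11</vssd:VirtualSystemType> -> vmx-10
    system_type = re.compile(
        rb"(<(?:\w+:)?VirtualSystemType>)" + re.escape(old.encode())
        + rb"(</(?:\w+:)?VirtualSystemType>)"
    )
    data, changed = system_type.subn(
        lambda m: m.group(1) + new.encode() + m.group(2), ovf.read_bytes()
    )
    if changed == 0:
        raise ValueError(f"no VirtualSystemType {old} found in {ovf.name}")

    # Manifest lines look like: SHA256(file.ovf)= <hex digest>
    digest = hashlib.sha256(data).hexdigest()
    entry = re.compile(
        rb"^SHA256\(" + re.escape(ovf.name.encode()) + rb"\)= *[0-9A-Fa-f]+", re.MULTILINE
    )
    new_line = b"SHA256(" + ovf.name.encode() + b")= " + digest.encode()
    manifest, replaced = entry.subn(lambda m: new_line, mf.read_bytes())
    if replaced != 1:
        raise ValueError(f"expected one SHA256 entry for {ovf.name} in {mf.name}")

    # Only write once both edits are known to be valid.
    ovf.write_bytes(data)
    mf.write_bytes(manifest)
    return digest
```

Usage would be something like `downgrade_ovf("vm.ovf", "vm.mf")`, with hypothetical file names matching your export.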