Using tags for something interesting

Hello all

Thought I would try and share a bit of what I have been working on internally for a while now. As we are an organization that is still does not have all processes and data management in place I have been working on rewriting an Python based page I hacked together with a colleague to list information about initially virtual machines but it grew to include DNS, physical machines, NAT relations as well as relying on ARP data to resolve MAC to IP where needed. I finished version 1.0 of this new system written in Django 1.10 a few weeks back and am already working on a 1.1 release with a few feature requests.

Now the system grew out of a need to identify people and relations around virtual machines in our infrastructure and today heavily relies on Tags in the web client. We were already using tags to include VM’s in backup jobs in Veeam (something I can highly recommend doing!).

So what and how do we do it? We use 5 tag categories with the names Owner, Department, SysAdm, AppAdm and Service. These 5 tag types give a lot of information directly on the VM on who to contact and what service the VM belongs to. This information is then on a daily basis exported to the new system in the form of a CSV file at the moment. This CSV file along with a file from our NAT device, on from the datacenter routers with ARP, a list of DNS records, a CSV file from our Racktables installation for physical machines where the 5 tag categories are also defined and a list of systems with the SCOM agent installed.

The data is then imported into a data model defined in Django’s Object-relational mapping that tries to correlate some of the information from the different files. The end result is a web page where all systems and DNS records in theory are listed and can be searched, filtered, sorted etc. Where one can find a system(physical or virtual) based on the IP, DNS record (A, CNAME or MX), name, type, OS, you name it. This has some of the characteristics of a CMDB or CMS but instead of showing what it should be it is showing what it actually is at the time of the latest export. We have used this to help ourselves in the Infrastructure department as well as allowing some supports to find information to help route incidents and service requests to the correct groups in our service management system.

Below I have included a blurred screenshot from the system to show one of the views defined. A cut out from a simple system list. Note that the last 5 columns are the tag categories from vSphere/Racktables with the danish names we chose for them.

So that is an example of how we use tags in our day to day operations. We still in some cases miss the old Custom attributes (I know they are still in the API but not exposed in the web client) with the option of inputting variable information like expiry dates etc. Having to do something like that with tags would in my opinion be a mess (imagine a tag for every single date of expiry).

A bonus of doing this you can actually correlate this information with information from vRops via e.g PowerCLI. As an example one could send an email based on an alert in vRops to the addresses set via the SysAdm tags.

VMworld Europe 2016 – Day 1

Early morning day 2 of my VMworld 2016 trip seems like the time to do a short recap of yesterday.

Yesterday started with the General Session keynote where Pat Gelsinger and several others presented the view from VMware. Amongst his points I found the following things most interesting:

  • THE buzzword is Digital Transformation
  • Everyone is looking at Traditional vs Digital business
  • However only about 20% of companies are actively looking at doing this. 80% are stuck behind in traditional IT and spend time optimizing predictable processes.
  • Digital Business is the new Industrial Revolution

In 2016 – 10 years ago AWS was launched. Back there were about 29 million workloads running in IT. 2% of that was in the cloud mostly due to Salesforce. 98% was in traditional IT. Skip 5 years ahead now we have 80 million workloads and 7% in public cloud and 6% in private. Remaining 87% still in traditional perhaps virtualized IT. This year we are talking 15% public and 12% private cloud and 73% traditional of 160 million workloads. Pat’s research time have set a specific time and date for when cloud will be 50% (both public and private). That date is June 29th 2021 at 15:57 CEST. We will have about 255 million workloads by then. In 2030 50% of all workloads will be in public clouds. The hosting market is going to keep growing.

Also the devices we are connecting will keep growing. By 2021 we will have 8.7 billion laptops, phones, tablets etc connected. But looking at IoT by Q1 2019 there will be more IoT devices connected than laptops and phones etc and by 2021 18 billion IoT devices will be online.

In 2011 at VMworld in Copenhagen (please come back soon 🙂 ) the SDDC was introduced by Raghu Raghuram. Today we have it and keep expanding on it. So with today vSphere 6.5 and Virtual San 6.5 were announced for release as well as VMware Cloud Foundation as a single SDDC package and VMware Cross Cloud Services for managing your mutliple clouds.

vSphere 6.5 brings a lot of interesting new additions and updates – look here at the announcement. Some of the most interesting features from my view:

  • Native VC HA features with and Active, Passive, witness setup
  • HTML 5 web client for most deployments.
  • Better Appliance management
  • Encryption of VM data
  • And the VCSA is moving from SLES to Photon.

Updates on vCenter and hosts can be found here and here.

I got to stop by a few vendors at the Solutions exchange aswell and talk about new products:

Cohesity:

I talk to Frank Brix at the Cohesity booth who gave me a quick demo and look at their backup product. Very interesting hyper converged backup system that includes backup software for almost all need use cases and it scales linearly. Built-in deduplication and the possibility of presenting NFS/CIFS out of the deduped storage. Definitely worth a look if your are reviewing your backup infrastructure.

HDS:

Got a quick demo on Vvols and how to use it on our VSP G200 including how to move from the old VMFS to Vvols instead. Very easy and smooth process. I also got an update on the UCP platform that now allows for integration with an existing vCenter infrastructure. Very nice feature guys!

Cisco:

I went by the Cisco booth and got a great talk with Darren Williams about the Hyperflex platform and how it can be used in practice. Again a very interesting hyper-converged product with great potential.

Open Nebula:

I stopped by at OpenNebula to look at their vOneCloud product as an alternative to vRealize Automation now that VMware removed it from vCloud Suite Standard. It looks like a nice product – saw OpenNebula during my education back in 2011 I think while it was still version 1 or 2. They have a lot of great features but not totally on par with vRealize Automation – at least yet.

Veeam:

Got a quick walkthrough of the Veeam 9.5 features as well as some talk about Veeam Agent for Windows and Linux. Very nice to see them move to physical servers but there is still some ways to go before the can talk over all backup jobs.

 

Now for Day 2’s General Session!

Disabling “One or more ports are experiencing network contention” alert

From day one of deploying vRealize Operations Manager 6.0 I had a bunch of errors in our environment on distributed virtual port group ports. They listed with the error:

One or more ports are experiencing network contention

Digging into the exact ports that were showing dropped packets resulted in nothing. The VMs connected to these ports were not registering any packet drops. Odd.

It took a while before any info came out but it apparently was a bug in the 6.0 code. I started following this thread on the VMware community boards and found that I was not alone in seeing this error. In our environment the error was also only present when logging in as the admin user. vCenter admin users were not seeing it so this pointed towards a cosmetic bug.

A KB article was released about the bug and that the alert can be disabled but it does not described exactly how to disable the alert. The alert is disabled by default in the 6.0.1 release but if you installed 6.0 and upgraded to 6.0.1 and did not reset all settings (as I did not do) there error is still there.

To remove the error login to the vROPS interface and navigate to Administration then Policy and lastly Policy Library as marked in the image below:

PolicyOnce in the Policy Library view select the active policy that is triggering the alert. For me it was Default Policy. Once selected click the pencil icon to Edit the profile as show below:

EditOn the Policy Editor click the 5. step – Override Alert / Symptom Definitions. In Alert Definitions click the drop-down next to Object Type, fold out vCenter Adapter and click vSphere Distributed Port Group. To alerts will now show. Next to the “One or more ports are experiencing…” alert click the error by State and select Local with the red circle with a cross as show below.

Local DisabledI had a few issues with clicking Save after this. Do not know exactly what fixed it but I had just logged in as admin when it worked. This disables the alert! Easy.

 

 

 

Since the last time

So it’s been a while since my last post, a lot has been going on. I have been through my first “employee performance interview” (Medarbejderudviklingssamtale or MUS in Danish). It was good and a lot of things were discussed in regard to this new organization. Some steps to increase my skills were also planned and I will get back to that later.

Since last time I attended VMWorld Europe 2013 in Barcelona! It was an awesome conference as always and I got a lot of new things with me home. One of the things I did different this year as compared to the previous two years was spend a lot more time on the Solutions Exchange. I focused primarily on storage vendors as I have taken fancy in new flash accelerated or all-flash storage systems. So I think I visited every booth with just the slightest connection to storage.

I also had the chance to discuss some of the new technologies coming out of VMware and also discuss the upgrade procedure for vSphere 5.5 when running with an SSO behind a load balancer. That was really useful and insightful and provided me with most of the information I need to perform an upgrade of the SSO and Web Client in our environment to vSphere 5.5 to relieve all the AD problems we have had. I will post a blog article on this later as there are still some hick-ups in the documentation and procedure that I need to test out and receive confirmation on from VMware support.

Our consolidation process has not been moving that much. Shortly after returning from Barcelona I took part in a live migration of VMs between our data center and a remote server room across a distance of about a kilometer. Without going into details about how everything was connected suffice to say that we had a single 10Gbit Ethernet connection between our one of our data center routers and one of the server room’s routers. We also had a single FC connection between a storage array in the data center and the blade chassis in the server room. This allowed us to evacuate a single blade in the server room and move it to an identical blade chassis after this we used another blade in the server room as the “transport host”. We vMotion VMs onto it as it could see both data center and server room storage arrays. The Storage vMotioned the VMs to the data center array and finally vMotion onto the host in the data center. Then one by one we evacuated all blades in the server room and moved them to the data center. The process took about 2 days including the move of a few physical hosts as well and was all in all very successful.

We had a single error during the move which caused an unexplained HA restart. The largest of the VMs (1TB storage spread on 4 different VMDKs) was set to change format to thin provisioned during the Storage vMotion. But at some point during migration we got an unexpected error (This was the actual message from the vSphere client). 30 seconds later HA spontaneously rebooted the VM even though Virtual Machine monitoring was disabled and the host didn’t crash. Luckily the VM handled the reboot well and it occurred close to midnight with no users online.

Right now my colleagues are planning the consolidation of two other VMware installation which will most likely be done with cold migrations. The amount of VMs is small and the fiber connections and licenses of these installations will not allow us to do a live migration. They are also planning a move similar to the one I worked on which we hope to complete some time in December. I am working on a cold migration of VMware installation as well where most of the VMs will be reinstalled on a new cluster rather than migrating them.

That was a status on what we are working on. Also back to the “I will get back to this later”. During the next month I will be working on a test installation of vCloud Automation Center to experiment with it and research if this is something we can use in our organization. The initial tests will be confined to the infrastructure department but if it works out it might be scaled up.