vExpert 2016!

Yay – again this year I was awarded vExpert and I am proud to be able to keep the title for another year.

Robert Jensen has made a nice list of all the Danes that were awarded this year – you can check it out here: http://www.robert-jensen.dk/2016/02/06/danish-vexperts-2016/

It feels great to be awarded again – it really makes me want to keep contributing to the community as best I can.

The complete list of vExperts for 2016 is available here: http://blogs.vmware.com/vmtn/2016/02/vexpert-2016-award-announcement.html

 

Failed to get size of IP buffer error

Hello everyone

Just a brief post today. Back at the start of January we saw an older Server 2008 32-bit machine showing the error in the title. It would spam the alert in the event log of the server until the server became inaccessible. Not much was to be found about the error, but I did find this post from Alex575, who also saw the error in January.

As nobody had answered the post, I decided to follow it and try to work out a solution. We haven’t updated ESXi and Tools above 9359 since ESXi 5.5 U3, so I started thinking that maybe the new VMware Tools 10 package could solve the issue, as the event log entries came from the Tools service (vmsvc).

We upgraded the server’s Tools version to 10245 (version 10.0.5), and from crashing every 10 days it has as of yet not crashed (14 days and counting).

VMware Tools from Version 10 will ship outside of vSphere releases as blogged by Brian Graf here: https://blogs.vmware.com/vsphere/2015/09/vmware-tools-10-0-0-released.html

The 10.0.5 release can be downloaded here: https://my.vmware.com/group/vmware/details?downloadGroup=VMTOOLS1005&productId=491

vRealize Orchestrator 6.0.2.1 -> 7.0

Oh such end of the year content!

I set about updating our vRealize Orchestrator (vRO) appliance from 6.0.2.1 to 7.0 today to address the recently announced security issues (VMware Security Advisory ID: VMSA-2015-0008.1).

Easy update with the VAMI available but I quickly ran into this issue:

(Screenshot: FailedUpgrade)

Not very informative – so I looked at the updatecli.log file in the given location, and it only told me that the pre and post installs had failed. Again, not very informative. I looked into the vami.log file and saw that it had downloaded all the files and had created a file to mark that a reboot was required. So I thought – better try a reboot before starting the install again. At first this looked like it worked! But alas, the update later threw this error:

(Screenshot: FailedUpgrade2)

Will update post when I find solution!

Production Cluster Upgrade

During the spring of this year a few of my colleagues and I spent several months in meetings with storage solution providers and server hardware manufacturers to figure out if we should try out something new for our VMware production clusters. We had a budget for setting up a new cluster, so we wanted to look at our options for trying something other than our traditional blade solutions with a spinning disk FC array, which we have been using for years.

Some of the considerations we made regarding storage were that we wanted to start to leverage flash in some way or form to boost intense workloads. So the storage solution would need to use flash to accelerate IO. We also wanted to look at whether server side flash could accelerate our IO as well. This led us to the conclusion that we would like to avoid blades this time around. We would have more flexibility using rack servers with respect to more disk slots, PCIe expansions etc. Going with e.g. 1U servers, 16 hosts would cost us 6 additional rack units compared to 16 blades in a 10U blade chassis. Not huge in our infrastructure.
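The rack space trade-off can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes 16 hosts on both sides, one 1U rack server per host, and a single 10U chassis holding all 16 blades. The variable names are made up for the illustration.

```python
# Rack space for 16 hosts: 1U rack servers vs. one 10U blade chassis.
# Assumption: 16 hosts either way, and all 16 blades fit in one chassis.
HOSTS = 16
RACK_SERVER_HEIGHT_U = 1       # height of one rack server in rack units
BLADE_CHASSIS_HEIGHT_U = 10    # height of the whole blade chassis

rack_servers_u = HOSTS * RACK_SERVER_HEIGHT_U
extra_u = rack_servers_u - BLADE_CHASSIS_HEIGHT_U

print(f"Rack servers: {rack_servers_u}U, blade chassis: {BLADE_CHASSIS_HEIGHT_U}U")
print(f"Additional rack units for rack servers: {extra_u}U")
```

Cabling, top-of-rack switches and power are not counted here, so the real difference in a given rack may look different.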

So we talked to a bunch of different storage vendors, some new ones like Nimble Storage, Tintri and Pure Storage, and some of the old guys like Hitachi and EMC. On the server side we talked to the regulars like Dell and HP but also Hitachi and Cisco.

All in all it was a great, technically interesting spring, and by summer we were ready to make our decision. In the end we decided to go with a known storage vendor but a new product. We chose a Hitachi VSP G200, as its controller strength was on par with our existing HUS130 controllers but with smarter software and more cache. The configuration we went with was a tiered storage pool with a tier 1 layer consisting of 4 FMD 1.6TB in RAID10. This gives us 3.2TB of tier 1 storage and from the tests we have run – this tier is REALLY fast! The second and last tier is a large pool of 10K 1.2TB disks for capacity. In total we have just shy of 100TB of disk space on the array. It is set up so all new pages are written to the 10K layer, but if data is hot it is migrated to the FMD layer within 30 seconds utilising Hitachi’s Active Flash technology. This feature takes some CPU cycles from the controller, but from what we see right now this is a good trade-off. We can grow to twice the current configuration in capacity and performance, so we should be safe for the duration of the array’s life.
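For anyone wondering where the 3.2TB comes from: RAID10 mirrors every drive, so only half of the raw capacity is usable. A minimal sketch of that arithmetic, assuming plain two-way mirroring and ignoring spares, formatting overhead and anything specific to how the array carves up its pools:

```python
# Usable capacity of the tier 1 layer: 4 x 1.6TB FMD in RAID10.
# Assumption: simple two-way mirroring, no spares or formatting overhead.
DRIVES = 4
DRIVE_SIZE_TB = 1.6
MIRROR_COPIES = 2   # RAID10 keeps two copies of every block

raw_tb = DRIVES * DRIVE_SIZE_TB
usable_tb = raw_tb / MIRROR_COPIES

print(f"Raw: {raw_tb:.1f}TB, usable in RAID10: {usable_tb:.1f}TB")
```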

On the server side we chose something new to us. We went with a rack server based Cisco UCS solution: a cluster consisting of 4x C220 M4 with 2x E5-2650V3 CPUs and 384GB memory. We use a set of 10K disks in RAID1 for the ESXi OS (yes, we are very traditional and not very “Cisco UCS” like). The servers are equipped with 4x 10G in the form of a Cisco VIC 1227 MLOM and a Cisco VIC 1225 PCIe. As we were not really that hooked on setting up an SSD read cache (looking at vFlash for now) in production without trying it first, we actually got a set of additional Cisco servers for some test environments. These are identical to the above, but as some of my colleagues needed to test additional PCIe cards we went with C240 M4 instead for the additional PCIe slots. Two of these servers got a pair of 400GB SSDs to test out vFlash. If it works we are moving those SSDs to the production servers for use.

As I said, we got the servers in late summer and put them into production about 2½ months ago, and boy, we are not disappointed. Some of our workloads have experienced 20-50% improvements in performance. We ended up installing ESXi 5.5 U3a and joining our existing 5.5 infrastructure due to time constraints. We are still working on getting vSphere 6.0 ready, so hopefully that will happen in early spring next year.

We have made some interesting configurations on the Cisco UCS solution regarding the network adapters and vNIC placement, so I will throw up something later on how this was done. We also configured AD login using UserPrincipalName instead of sAMAccountName, which was not in the documentation – stay tuned for that as well. And finally – have a nice Christmas all!

vRops 6.1 – follow up

Back in September I wrote a piece when vRealize Operations Manager 6.1 was released. We were pretty excited about it because we were having a few issues with the 6.0.2 version we were running on. Among the problems we were having was vCenter SSO users suddenly not being able to log in via the “All vCenters” option on the front page, and selecting the individual vCenters to log in to gave unpredictable results (logging in to vCenter A showed vCenter B’s inventory?!). We also had issues with alerts that we could not cancel – they would just keep piling up, and about once a week I would shut the cluster down and start it again, as it allowed me to cancel the alerts if I did it at the right time, within 10-15 minutes after starting the cluster again.

However, as you could also read, we ran into an issue with the 6.1 update and were forced to roll back and update to 6.0.3, which solved all issues but the login problem. But as we were the first to try an upgrade in production, it took a while before a KB came out on the issue. I have had a to-do item to write this up for a while, so I can’t remember when the KB actually came out; however, it has not been updated for a month. The KB is 2133563 and notes that there is currently no resolution to the issue.

I recently spoke to a VMware employee who told me that the issue is in the xdb database and that the upgrade process is encountering something that either should not be in the xdb or that is missing. This causes the conversion from xdb to Cassandra to fail and the upgrade process to fail. I’m looking forward to seeing when a proper fix will come out.

We are closing in on the end of the year, so I hope to be able to finish up a few blog articles before entering the new year – on the to-do list are a few items about vRA 7 and Cisco UCS with ESXi 5.5 and 6.

PowerCLI: Datastore Cluster and Tags

I was trying to help out a colleague yesterday when I realized that a quick fix to the problem would be to tag the datastore clusters in our environment and get them based on these tags instead of trying to determine which datastore cluster to choose when deploying a VM from PowerCLI.

So I decided to do this quickly and will show what I did (code snippets are from my vSphere 6.0 lab, but it is the same on our 5.5 production).

New-TagCategory -Name "CDC" -Cardinality Single -EntityType DatastoreCluster
New-Tag -Name "DC2" -Category CDC
Get-DatastoreCluster DatastoreCluster | New-TagAssignment -Tag "DC2"

Now I hope we can agree that I have created a new TagCategory that applies to Datastore Clusters and allows for one tag per object. We have also created a tag in this category called “DC2”. Lastly we have added the tag to the datastore cluster “DatastoreCluster”. Now if I run the following I get what I would expect:

C:\> Get-DatastoreCluster DatastoreCluster | Get-TagAssignment

Tag                                      Entity
---                                      ------
CDC/DC2                                  DatastoreCluster
C:\>

But if I run this I get something that I did not expect:

C:\> Get-DatastoreCluster -Tag "DC2"
C:\>

This means that it is not working the same as for Virtual Machines with the “get-vm” cmdlet:

C:\> New-TagCategory -Name "VMTest" -Cardinality Single -EntityType VirtualMachine
Name                                     Cardinality Description
----                                     ----------- -----------
VMTest                                   Single
C:\> New-Tag -Name "Test" -Category "VMTest"
Name                           Category                       Description
----                           --------                       -----------
Test                           VMTest
C:\> Get-VM testvm01 | New-TagAssignment Test
Tag                                      Entity
---                                      ------
VMTest/Test                              testvm01
C:\> get-vm | Get-TagAssignment
Tag                                      Entity
---                                      ------
VMTest/Test                              testvm01
C:\> get-vm -Tag "Test"
Name                 PowerState Num CPUs MemoryGB
----                 ---------- -------- --------
testvm01             PoweredOff 1        4,000

So I do not know if this is the way it was meant to work, but it is definitely not what I expected!

vRealize Operations 6.1 is out!

As of midnight Danish local time vRealize Operations 6.1 is out! This is great as we have been waiting for this release to fix some issues we have been having with our environment running on 6.0.2. The last communication from VMware Technical Support a month ago was that our two remaining problems would be fixed in this release.

I’ve looked through the list of fixes but did not see them directly, so I am hoping they still made it 🙂

Release notes can be found here.

UPDATE: Upgrading the VA-OS pak file worked but applying the VA pak file failed to complete. The logs showed that it was the conversion from xDB to Cassandra that failed. VMware tech support were fast today and recommended a rollback and applying 6.0.3 instead until further diagnostics could be made on 6.1 -> apparently we were the first to submit a case on a 6.1 install 🙂

vExpert 2015!

YAY!! Can’t really get my arms down yet. I was not sure I would make the cut this year but I did! So happy.

I was on vacation last week when the announcement of vExpert 2015 second half went out. I was a bit scared when I opened the page and started scrolling, only to realize that searching would probably be easier 🙂 So I did, expecting not to find myself. But I did! So proud and humbled that it happened to me.

Now this announcement has motivated me to try and take my contributions a bit further. I will attempt to put out more content via this blog as often as possible, attend VMUGDK and try to come up with more sessions to present. This is not my strongest side but it is a side that I believe I need to improve.

Thank you VMware for granting me this title! And thank you VMUGDK for the great Danish VMware community!

Racktables: Datacenter Management

Hello All,

Today I will be doing a little write up about a piece of software that is not related to VMware. Shock! But something that is related to the infrastructure that you need to have your VMs running.

We have for a while now had our cable management and rack space management in several Excel worksheets. This is far from optimal! We ended up using Excel worksheets because, coming from a decentralized IT environment to a central one, none of the locally used tools scaled to the size or use case we needed. So for lack of anything better, a few Excel worksheets were thrown together “until a solution was found”.

These few Excel worksheets became more worksheets and those became even more worksheets. At some point you just get sick of worksheets! Trying to manage thousands of cables and ports in a worksheet and managing rack space the same way was not even remotely entertaining. We needed something more!

Racktables to the rescue!

A colleague of mine was setting up some new equipment and started wondering if there weren’t any simple free tools for handling this instead of the rack space worksheets. He stumbled upon Racktables and showed it to me. It looked promising, but as no resources were allocated to finding a replacement for the worksheets, getting traction on a new tool was close to impossible.

So as all great tools in an IT department – this started under the radar! I installed Racktables 0.20.8 and a few plugins (Link Management is a MUST) and started playing with it in the fall of 2014 and after adding a few devices and racks I showed it to my team lead. He was impressed but still no resources.

Later on my colleague had to do some documentation for a user outside of IT and decided to add all the user’s servers and network connections and used Link Management’s ObjectMap as part of the documentation. This was the first real use.

Months went by and at a few team meetings discussing documentation of our server rooms I mentioned the software and that it was ripe for use. My team leader was getting convinced slowly.

And then suddenly a few months ago he put one of our trainees on the job of moving everything from our rack space worksheets into Racktables with the goal of eliminating the worksheets. Yay! Finally some traction.

Our trainee input everything from the worksheets and then summer holidays hit. Being the young guy in the office, I have been working through the first part of the main vacation weeks. I have been double checking the inputted data in Racktables and making updates where needed.

During this time I have had a lot of talks with the architect in my team who has been working on eliminating our cable worksheets and I showed him what Racktables could do in this regard. Within a day of playing with it he was pretty much hooked and has been documenting large parts of our fibre infrastructure.

Last week and this week we have been showing it to the team responsible for mounting and connecting devices in our infrastructure, and they are also pretty impressed with what this simple tool can do.

The juicy stuff – what does it do!

So far I have been talking about how we started using this tool, but what can it actually do? From the front page of their site:

Racktables is a nifty and robust solution for datacenter and server room asset management. It helps document hardware assets, network addresses, space in racks, networks configuration and much much more! – http://racktables.org/

It has functionality for managing racks in different locations and rows and sizes. Objects that can be mounted in racks and connected to other objects. IPv4 and IPv6 address management. IP SLB and 802.1Q (VLAN) management. It can even track your virtual infrastructure and has a built in patch cable database for inventory.

From a programming perspective it is highly modular, yet implemented in quite a simple way, building on a MySQL database. It’s ~35 PHP files and ~75 tables in the database.

Almost all of the logic comes from the Dictionary, in which you can define object types, models of servers, switches, routers, software etc., and even add your own types and sub types. Attributes can be attached to object types to expand on the info of an object. It is possible to define parent-child relationships between object types and also to define port types, connector types and how they are compatible with each other.

Racktables comes with a lot of things defined in the dictionary by default and those are a great basis for starting out. You will probably soon realize that a lot of the objects you have don’t exist but with the simple setup it is VERY easy to add them.

Racktables can also be extended with plugins and there are built in integrations with Cacti and Munin if you use those tools.

There is also support for using CDP/LLDP and SNMP against switches to allow for auto-populating the objects with ports and connections – we have not used this feature.

I can only recommend this tool. Its interface looks like something that should have been long gone, but it just works – even on tablets and on any platform and browser, because no fancy Flash/Silverlight or JavaScript pops in here and there. It’s simple, it’s easy and it just works.

And if you are asking about scalability? We have added over 600 objects in almost 70 racks and have made almost 1000 links between ports, and no slowdown has been noticed so far – and as it is simple, just add more resources to your web server to handle the load 🙂

vROPS: the peculiar side

vROPS is running again in a crippled state with AD login issues, licensing issues and alert issues but at least it is showing me alerts and emailing me again.

While digging through vROPS today in a Webex with VMware Technical Support I stumbled upon an efficiency alert that I simply had to share.

In summary the image below shows me that if I somehow manage to reclaim this snapshot space I don’t think I will have any storage capacity problems for a considerable amount of time!

(Screenshot: Ridiculous)

Read again – that is almost 2.8 billion TB (or 2.8 zettabytes) of disk space, on a 400GB VM! How many snapshots would that even take to fill? By my estimates around 7 billion full snapshots that were fully written. I’m not sure that is within vSphere 5.5 configuration maximums for snapshots per VM.
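For the curious, the estimate can be reproduced in a few lines. This is a rough sketch only: it rounds the reported figure to exactly 2.8 billion TB, uses decimal units (1TB = 1000GB), and assumes every snapshot has grown to the full 400GB size of the VM.

```python
# How many fully written snapshots of a 400GB VM add up to ~2.8 billion TB?
# Assumptions: figure rounded to 2.8 billion TB, decimal units,
# each snapshot grown to the full size of the VM.
RECLAIMABLE_TB = 2_800_000_000
VM_SIZE_GB = 400
GB_PER_TB = 1000
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21

full_snapshots = RECLAIMABLE_TB * GB_PER_TB // VM_SIZE_GB
zettabytes = RECLAIMABLE_TB * BYTES_PER_TB / BYTES_PER_ZB

print(f"Full snapshots needed: {full_snapshots:,}")
print(f"Reclaimable space: {zettabytes} ZB")
```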