December 14, 2015

SSO is not initialized

Background
After upgrading vCenter from 6.0 to 6.0U1 we got the vCenter GUI back. This HTML5-based GUI lets you manage certificates and several other things that could only be configured from appliancesh before U1.

Problem
After the upgrade we saw an error message in this GUI: "SSO is not initialized". This system was running an external PSC and authentication was working as it should, so we didn't understand why the error message was there.

Solution
We had a support case going on this problem for a few weeks. We were repeatedly told to repoint our SSO until they finally told us that this error message was in fact a bug: "...this is something that we are looking to rectify as this information should not be shown when using an external PSC.
Our Engineering department are aware of this are looking to make a graphical change to this.
With regards to your environment however, I can confirm that SSO is functioning correctly and you are not experiencing an issue with SSO at this time."


November 21, 2015

Replacing a vSAN caching disk

Background
Replacing disks in vSAN can be a bit less smooth than on some traditional storage arrays. For normal disks used for storage it's quite easy, but for disks used for caching it can be a slightly different story. If you get a dead caching disk you should remove it from the config before removing it physically from the server. Otherwise you will get the problems described in this posting.

Problem
Once the disk has been replaced you will be unable to delete the disk or the disk group from either the vSphere Web Client or RVC. The reason this fails is that it can't find the disk. The disk will show up with a status of "Dead or Error" or "Absent" (depending on where you look).

"esxcli vsan storage list" will show all the other disks belonging to vsan on that server, but not the missing SSD disk.

Listing the disks in RVC with the command vsan.host_info shows that the disk is in an Absent status.


Trying to remove the disk from RVC with "vsan.host_wipe_vsan_disks -f" also fails.

Solution
A solution that did work in the end was to use partedUtil to remove the partitions of all spinning disks of this disk group. partedUtil is a very dangerous tool so if you have multiple disk groups on your host (like we had) you must make sure you're working with the correct disks. We found it best to locate the naa IDs of the failed disk group from the web client.

After removing both partitions of all the disks belonging to this disk group, the disk group was gone and we could create a new one where we were able to use our new SSD disk and all the spinning ones.
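
For reference, here is a minimal sketch of what the partition removal can look like from the ESXi shell. It assumes that every disk in the group carries exactly two partitions, numbered 1 and 2, and the helper name print_wipe_cmds is just made up for this example. It only prints the partedUtil commands so you can check the naa IDs one more time before running anything.

```shell
# Print (do not run) the partedUtil commands for every disk in a disk group.
# Assumption: each disk has two partitions, numbered 1 and 2.
print_wipe_cmds() {
    for id in "$@"; do
        dev="/vmfs/devices/disks/$id"
        echo "partedUtil getptbl $dev"    # show the partition table first
        echo "partedUtil delete $dev 1"
        echo "partedUtil delete $dev 2"
    done
}
```

Call it with the naa IDs you collected from the web client, for example print_wipe_cmds naa.AAAA naa.BBBB (placeholder IDs), compare the getptbl output with what you expect, and only then run the delete lines by hand.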

Addendum
The official way to solve this problem is to remove the disk from the pool while it's still present in the server. In our case that was not possible. The SSD disk had for some unknown reason entered "Foreign mode", which is a Dell disk controller feature. We had to enter the PERC controller BIOS settings (from POST), clear the Foreign Config and also configure the disk in the controller config in order to use it again. Because of this the disk came up with a new naa ID even though we didn't really have a failed disk.


March 23, 2015

vSphere AutoDeploy and Trend Micro Deep Security

Background
When researching online documentation to see if we could get Trend Micro Deep Security implemented in our VMware vSphere AutoDeploy environment, the only references we could find were a Japanese blog posting and a Japanese white paper. My Japanese is a bit limited, but I still found the screenshots valuable.

Overview
To get Deep Security working there are several steps that need to be completed in a given order:
  1. Manually load vShield Endpoint driver on one of the ESXi hosts
  2. Update Host Profile based on ESXi host with vShield Endpoint driver
  3. Edit Host profile in order to get it working
  4. Create new ESXi image with Image builder that includes the vShield Endpoint driver and Trend Micro Filter driver
  5. Boot ESXi hosts from new ESXi Image
  6. Remediate new Host Profile for these hosts
  7. Deploy DSVA per ESXi host

Details
1. You need to use vShield Manager to install the vShield Endpoint driver. Note that the ESXi host should not be in maintenance mode when doing this. This may sound strange, but you'll get an error message after installing it if the host was in maintenance mode.


2. Go to Host Profiles and either create a new Host Profile based on the host, or update an existing Host Profile based on the host you installed the driver on.
3. You need to edit the Host Profile. In addition to the other tasks that need to be done when a Host Profile has been updated from a host config, you now also need to make the new vShield Endpoint network work automatically. There are basically three things that need to be done: unselect the vShield Connection ID field, make sure you don't get asked for a MAC address, and set a static IP address. This address is always 169.254.1.1 and sits on an internal (host only) network on each host.


4. The following needs to be added to the VMware vSphere Image Builder script:


Add-EsxSoftwareDepot -DepotUrl "e:\vmware\drivers\vShield-Endpoint-Mux.zip"
Add-EsxSoftwareDepot -DepotUrl "e:\vmware\drivers\FilterDriver-ESX_5.0-9.5.3-2750.x86_64.zip"

Add-EsxSoftwarePackage -ImageProfile $imageprofile -SoftwarePackage epsec-mux
Add-EsxSoftwarePackage -ImageProfile $imageprofile -SoftwarePackage dvfilter-dsa
5. Activate the new image using the cmdlet Repair-DeployRuleSetCompliance
6. Remediate the host with the new Host Profile.
7. You can now see that the ESXi host has a prepared status and you can now start deploying DSVAs.

March 22, 2015

vSphere AutoDeploy and Apex 2800 cards

When reading through the Teradici documentation you can't find a single reference to either AutoDeploy or Image Builder. The good news is that it does indeed work out of the box. All you need is to add a few lines to the Image Builder config:
....
Add-EsxSoftwareDepot -DepotUrl "e:\vmware\drivers\apex2800-rel-2.4.0.35302-esxi.5.5.0.zip"
Add-EsxSoftwarePackage -ImageProfile $imageprofile -SoftwarePackage pcoip-ctrl
Add-EsxSoftwarePackage -ImageProfile $imageprofile -SoftwarePackage tera2
.... 
You can now build the image like you normally do and the driver will load if there's an APEX card in the server.



January 26, 2015

Bulk registering vSAN disks for controllers not supporting pass-through mode

When configuring vSAN the initial setup time is highly dependent on the type of disk controller you're using. Some controllers support pass-through mode and will not need the additional configuration described in this posting.

If, however, you are using a controller such as the Dell PERC H710, you will first need to set up each disk in the RAID controller's BIOS, with every disk in its own disk group where you enable write through, disable read ahead and select initialize.



After doing this you will see the individual disks within VMware vCenter under the esx host / manage / storage / storage controller / devices. The disks are however not detected correctly as the controller gives no information about the type of disks shared in these RAID 0s.

In order for vSAN to make sense of these disks you will need to create rules that specify what type of disk each one is.

Spinning disk command:
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device <device id> --option "enable_local"

SSD disk command: 
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device <device id> --option "enable_local enable_ssd"

The device id in question here is the naa lun id. Some suggest that you use the command esxcli storage core device list, but in a system with many disks I've found it easier to use the command fdisk -l and identify the disk types by looking at the disk sizes.
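
If you'd rather stick with esxcli, a small filter can make its output just as easy to scan. The following is only a sketch: it assumes the usual layout of esxcli storage core device list, with the naa id at the start of a line followed by indented "Size:", "Is Local:" and "Is SSD:" fields in that order, and the function name summarize_devices is made up for this example.

```shell
# Reduce "esxcli storage core device list" output to one line per naa device.
# Assumption: "Size", "Is Local" and "Is SSD" appear in that order per device.
summarize_devices() {
    awk '
        /^naa\./       { id = $1 }          # device header line
        /^ +Size:/     { size = $2 }        # size as reported by esxcli
        /^ +Is Local:/ { is_local = $3 }
        /^ +Is SSD:/   { print id, size " MB", "local=" is_local, "ssd=" $3 }
    '
}
```

Run it as esxcli storage core device list | summarize_devices. The same filter is handy after the reboot to check that the disks you tagged now report local=true and, for the SSDs, ssd=true.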

You can compile the list of naa lun ids for a given disk type and run the following commands:
# Tag the spinning disks as local
for i in <paste list of spinning disk naa lun ids here>
do
    esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device "$i" --option "enable_local"
done

# Tag the SSD disks as local and as SSD
for i in <paste list of ssd disk naa lun ids here>
do
    esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device "$i" --option "enable_local enable_ssd"
done




You will now need to reboot the host for the new config to become active. Repeat these steps for all of your vSAN hosts and you'll soon be able to start configuring vSAN.

November 22, 2014

vSAN and HP 5400 switches

While setting up vSAN we found several guides for Cisco switches, but none for HP. Even the HP vSAN reference architecture was using Cisco Nexus switches.

We initially saw the error message "Host cannot communicate with all other nodes in the VSAN enabled cluster", even though all vSAN enabled vmkernel interfaces could ping each other. vSAN has some special multicast requirements that need to be taken care of.

We were trying to get HP 5400 series 10GbE switches to work with vSAN.

After playing around for a bit with the switch config we came up with the following working config:
vlan 53
   name "vSAN network 1"
   tagged C1-C8
   ip address 172.16.53.1 255.255.255.0
   ip igmp
   jumbo
   exit
Within a few minutes the error messages were gone, status went to Normal with a green icon and vSAN started working nicely.


Since we had 2x 10GbE NICs dedicated to vSAN we also set up a secondary VLAN for vSAN and bound each of the VLANs to a different NIC in order to get maximum performance.
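
One thing worth checking when jumbo is enabled on the VLAN is that full-size frames actually make it between the hosts. Below is a small sketch that prints the vmkping commands for this. The interface name vmk2, the 9000 byte MTU and the peer address in the usage line are placeholders, and the function name jumbo_ping_cmds is made up. Note that this only tests the jumbo part of the config and says nothing about multicast.

```shell
# Print vmkping commands that test jumbo frames from one vmkernel interface.
# Payload = MTU - 28 (20 byte IP header + 8 byte ICMP header).
# -d sets the "don't fragment" bit so oversized frames fail instead of splitting.
jumbo_ping_cmds() {
    vmk=$1; mtu=$2; shift 2
    payload=$((mtu - 28))
    for ip in "$@"; do
        echo "vmkping -I $vmk -s $payload -d $ip"
    done
}
```

For example, jumbo_ping_cmds vmk2 9000 172.16.53.11 prints a single vmkping line with a payload size of 8972. Repeat it for the vmkernel interface on the second vSAN VLAN.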

November 18, 2014

Accessing the GK Cloud Labs from Linux

Last week I attended vSAN training in Stockholm. The requirement for attending this class was that you needed to bring your own laptop with RDP capabilities.
When attending the class I discovered that there was a bit more to this requirement. According to the class manual you had to install an ActiveX component in Internet Explorer in order to get this working.

As I'm a Linux user they did of course not provide any info on how to do it, but that's part of the game I guess. In case I couldn't figure things out I could always start a Windows VM from within VMware Workstation. They did however provide info for Apple Macintosh users. By reading through the Mac docs I found what was really going on behind the scenes. The RDP session required a proxy config and encryption.

The standard Ubuntu RDP client didn't provide support for an RDP proxy, but I found an alternate client, called FreeRDP that I installed by following this HowTo.

I could now access the labs by combining the info from the login sheet we had been given with the following command:
xfreerdp  /v:cloud.labs.globalknowledge.net /d:gklabs /u:Wxxxx-Studentx-x /p:PassWord /g:gw1.labs.globalknowledge.net /w:1920 /h:1080 -nego
The connection now worked perfectly, even though it spent some time setting up the initial connection. Looks like it was trying to verify the certificate, even with the -nego switch, which I assumed would tell it to ignore the certificate (as far as I can tell from the FreeRDP documentation, -nego actually disables protocol security negotiation, and /cert-ignore is the switch for skipping certificate checks). In practice I wasn't warned about a self-signed certificate, but the client still waited for a timeout before starting the connection.
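
If you need to reconnect several times during a class, it can be handy to wrap the command. This is just a sketch: the host names, the gklabs domain and the switches are the ones from the command above, while the function name gk_rdp_cmd and its argument order are made up. It only prints the command line so you can paste it into a terminal.

```shell
# Print the xfreerdp command line for one student account.
# Arguments: user, password, optional width and height (default 1920x1080).
gk_rdp_cmd() {
    user=$1; pass=$2; width=${3:-1920}; height=${4:-1080}
    printf '%s\n' "xfreerdp /v:cloud.labs.globalknowledge.net /d:gklabs /u:$user /p:$pass /g:gw1.labs.globalknowledge.net /w:$width /h:$height -nego"
}
```

Calling gk_rdp_cmd Wxxxx-Studentx-x PassWord 1280 720 gives the same command with a smaller window, which is useful on a laptop screen.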



All in all the training was a great experience, giving a better insight into vSAN than the HOL lab.