August 18, 2016

Configuring the HPE 6125XLG Ethernet Blade Switch for use in a VMware environment - part 2

Background
Many people use FlexFabric modules for Ethernet (+FC) connectivity in their HP blade environments. For better functionality and control we chose to use HPE 6125XLG blade switches instead, and this post documents how we configured them. It's interesting to note that the 6125XLG uses the exact same hardware as the FlexFabric-20/40 F8 module.

Problem
I've found that the documentation for the H3C line of switches is a bit confusing and sometimes wrong. Our switches use a command set known as Comware 7, while many examples are written for Comware 5.

Solution
We have configured our system with the following features:

  1. The switches are stacked and work as one big switch. See part 1 for a closer description.
  2. There are two 10GbE uplinks from each of these switches to two Cisco 6500 series switches.
  3. The trunk between the 6125XLGs and the Cisco 6500s is set up with LACP.
  4. Spanning tree between the switches is configured as RSTP.
  5. CDP has been set up between switches and servers.
  6. VMware ESXi is set up with a distributed switch using LBT + NetIOC.
  7. Logs are forwarded to logstash.
  8. SNMP has been configured (for future use).
  9. NTP is configured.

There are two 6125XLG switches in the C7000, and each blade has one NIC connected to each of these switches. The two switches have four 10GbE ports connected to each other; these are normally used for stacking (IRF) and FCoE (you dedicate a pair to each). Each switch also has 8x 10GbE SFP+ ports and 4x 40GbE QSFP+ ports. It's recommended to use original HPE GBICs, but third-party GBICs have also proven to work nicely.
Logical view


1. Stacking

When you configure IRF you have four ports to choose from. You can use either two or four of them (you can dedicate two to FCoE if you need to). In this example we're using all four ports to aggregate the switches into one large logical switch. In H3C language this is called Intelligent Resilient Framework (IRF).
 irf mac-address persistent timer
 irf auto-update enable
 undo irf link-delay
 irf member 1 priority 10
 irf member 2 priority 1

irf-port 1/1
 port group interface Ten-GigabitEthernet1/0/17
 port group interface Ten-GigabitEthernet1/0/18
 port group interface Ten-GigabitEthernet1/0/19
 port group interface Ten-GigabitEthernet1/0/20
#
irf-port 2/2
 port group interface Ten-GigabitEthernet2/0/17
 port group interface Ten-GigabitEthernet2/0/18
 port group interface Ten-GigabitEthernet2/0/19
 port group interface Ten-GigabitEthernet2/0/20
#
interface Ten-GigabitEthernet1/0/17
 description IRF
#
interface Ten-GigabitEthernet1/0/18
 description IRF
#
interface Ten-GigabitEthernet1/0/19
 description IRF
#
interface Ten-GigabitEthernet1/0/20
 description IRF
#
interface Ten-GigabitEthernet2/0/17
 description IRF
#
interface Ten-GigabitEthernet2/0/18
 description IRF
#
interface Ten-GigabitEthernet2/0/19
 description IRF
#
interface Ten-GigabitEthernet2/0/20
 description IRF
#
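To verify that the stack has formed, the standard Comware display commands can be used; something along these lines (the exact output varies between software releases):
 display irf
 display irf configuration
 display irf link
display irf shows the member IDs, roles and priorities, display irf configuration shows the configured IRF port bindings, and display irf link shows the state of the physical IRF links.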

2. Trunk (STP, LACP, 4x 10GbE, CDP)

On each of the two 6125 switches we establish a trunk facing the core Cisco switches. In our example we decided to use RSTP for spanning tree. We use CDP instead of LLDP on our external-facing interfaces.
 stp mode rstp
 stp global enable
#
interface Bridge-Aggregation1
 port link-type trunk
 port trunk permit vlan all
 link-aggregation mode dynamic


Interfaces on switch 1:
interface Ten-GigabitEthernet1/1/5
 port link-mode bridge
 description Trunk 6500
 port link-type trunk
 port trunk permit vlan all
 lldp compliance admin-status cdp txrx
 port link-aggregation group 1
#
interface Ten-GigabitEthernet1/1/6
 port link-mode bridge
 description Trunk 6500
 port link-type trunk
 port trunk permit vlan all
 lldp compliance admin-status cdp txrx
 port link-aggregation group 1
Interfaces on switch 2:
interface Ten-GigabitEthernet2/1/5
 port link-mode bridge
 description Trunk 6500
 port link-type trunk
 port trunk permit vlan all
 lldp compliance admin-status cdp txrx
 port link-aggregation group 1
#
interface Ten-GigabitEthernet2/1/6
 port link-mode bridge
 description Trunk 6500
 port link-type trunk
 port trunk permit vlan all
 lldp compliance admin-status cdp txrx
 port link-aggregation group 1
#
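Once both uplinks are connected, the state of the LACP bundle and spanning tree can be checked from the 6125XLG side; roughly like this (keywords may differ slightly between Comware releases):
 display link-aggregation verbose Bridge-Aggregation 1
 display stp brief
The first command should list both member ports as Selected if LACP negotiated correctly, and the second shows the spanning-tree state of the aggregation towards the 6500s.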

3. Interfaces facing ESXi hosts

Each of the ESXi hosts has a config for each of its NICs, one on each switch. Flow control is enabled by default on all ESXi NICs, so we also enable it on the switch. Since we are using LBT + NetIOC we are not using EtherChannel/LACP on the ESXi ports (unlike most examples provided by HPE).
interface Ten-GigabitEthernet1/0/1
 port link-mode bridge
 description xyz-esx-01
 port link-type trunk
 port trunk permit vlan all
 flow-control
 stp edged-port
 lldp compliance admin-status cdp txrx


interface Ten-GigabitEthernet2/0/1
 port link-mode bridge
 description xyz-esx-01
 port link-type trunk
 port trunk permit vlan all
 flow-control 
 stp edged-port
 lldp compliance admin-status cdp txrx
#
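Whether discovery between the switch and the ESXi hosts is working can be checked from the switch side once CDP/LLDP has converged; a minimal sketch (options vary by release):
 display lldp neighbor-information
 display interface Ten-GigabitEthernet1/0/1
The neighbor output should list the ESXi hosts on the server-facing ports, and the interface output shows, among other things, whether flow control is actually active on the port.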

4. Management (clock, syslog, SNMP, SSH, NTP)


#
 clock timezone CET add 01:00:00
 clock summer-time CETDT 02:00:00 March last Sunday 03:00:00 October last Sunday 03:00:00
#
 info-center synchronous
 info-center logbuffer size 1024
 info-center loghost 10.20.30.40 port 20514
#
 snmp-agent
 snmp-agent local-engineid 800063A280BCEAFA031F8600000001
 snmp-agent community write privatecleartextpassword
 snmp-agent community read publiccleartextpassword
 snmp-agent sys-info version all
#
 ssh server enable
#
 ntp-service enable
 ntp-service unicast-server 1.2.3.4
#
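The management pieces can be sanity-checked with a few display commands once the config is in place (a quick sketch):
 display clock
 display ntp-service status
 display logbuffer
display clock and display ntp-service status confirm the time zone and NTP synchronization, while display logbuffer shows what is being written to the local log buffer (the same messages that are forwarded to the loghost).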

Conclusion

Finding the right syntax to configure this switch was a bit challenging, as many of the examples we found didn't work right out of the box since the command set differs slightly between versions. After having overcome the initial obstacles we were able to configure the switch exactly as we needed. In the next part we will see how the distributed switch was configured and how we imported the VLAN config from Cisco.


March 18, 2016

Configuring the HPE 6125XLG Ethernet Blade Switch for use in a VMware environment - part 1

Background
In an HPE C7000 blade system a common method of accessing the network is through FlexFabric/Flex-10 modules. These modules are not fully featured switches, but they still have some switch features built in. Another alternative is to use a real switch such as the HPE 6125XLG or the Cisco Nexus B22HP FEX.

A real switch has many technical benefits over a FlexFabric system, but it has a different approach to configuration than FlexFabric (which has server admins as its main target and is often hated by people who know networking). The 6125XLG has a CLI that feels similar to IOS, though not as close as NX-OS or ProCurve. The 6125XLG is the heritage of a cooperation between 3Com and Huawei (H3C) that HP bought a few years back; these switches are therefore often referred to as H3C and the CLI as Comware. It's a blade-integrated switch with 10GbE facing the blade servers and both 10GbE (SFP+) and 40GbE (QSFP+) uplinks that can be used to connect to the network.


Problem
One problem I found while trying to configure this switch was the lack of good documentation. There is a lot of documentation available, but much of it is for Comware v5 while the 6125XLG uses Comware v7. The 6125XLG Fundamentals Configuration Guide stated that it was important to use the command "line class aux" as part of the stacking (IRF) process, but this command was not available on my switches.
[HP]line class aux
^
% Unrecognized command found at '^' position.
It turned out that the firmware that came preinstalled had a bug that prevented you from stacking the two switches without the use of an RS232 cable. The HPE forums had many helpful posts, but posting there didn't get me any answers from active users. I did, however, find a couple of blog posts that got me going even though they didn't really provide a solution.

Solution
Upgrading the firmware of both switches from Release 2306 to Release 2422P01 before trying to do anything else solved this problem. The firmware upgrade is described at length in the firmware download package. I chose to upload the firmware image to the switches using FTP. I could now stack my switches according to the Fundamentals Guide (and this HPE Support article: HP 6125G Switch Series - How to Configure Intelligent Resilient Framework (IRF)).
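For reference, the upgrade itself roughly boils down to copying the .ipe file to the switch flash (FTP in our case) and pointing the boot loader at it, run on each switch before forming the IRF stack. A minimal sketch (the file name below is just an example; use the one from the download package and follow its release notes):
 boot-loader file flash:/6125xlg-r2422p01.ipe slot 1 main
 reboot
 display version
display version afterwards confirms that the switch is actually running the new release.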



February 16, 2016

Accessing the Global Knowledge labs from Ubuntu Linux

Background
While attending training I tried accessing the labs from my BYOD computer (Bring Your Own Device). I was warned before the training that the Global Knowledge labs worked best with an OS that supported Internet Explorer: "please use an Operating System that supports the Internet Explorer Browser. We have found that Mac Books do not work well when connecting to this environment".

Problem
A while back I was able to make the labs work from my personal Linux desktop, but it seems that the labs have been changed and my old method no longer works. The connection attempt failed with errors like these:
getaddrinfo: Name or service not known
[17:22:43:165] [20836:1234650880] [INFO][com.freerdp.core.gateway.tsg] - TS Gateway Connection Success
[17:22:44:030] [20836:1234650880] [ERROR][com.freerdp.core.capabilities] - expected PDU_TYPE_DEMAND_ACTIVE 0001, got 0007
[17:22:44:030] [20836:1234650880] [ERROR][com.freerdp.core] - ERRINFO_SERVER_INSUFFICIENT_PRIVILEGES (0x00000009):The user cannot connect to the server due to insufficient access privileges.
[17:22:44:031] [20836:1234650880] [ERROR][com.freerdp.core.capabilities] - expected PDU_TYPE_DEMAND_ACTIVE 0001, got 0007
[17:22:44:047] [20836:1234650880] [ERROR][com.freerdp.core.rdp] - DisconnectProviderUltimatum: reason: 1

Solution
The solution was, however, quite simple. The Remote Labs portal has information about accessing the labs from a variety of devices. I've also got a document describing some NTLMv2 requirements. I used Firefox and logged in to the portal. When trying to connect I was offered an .rdp config file to download, and I chose to save this file in the default location.

Logging in to the portal


Launch the Remote Labs!

Save file

Now I could use this file as input to freerdp (version 1.20) and connect without problems using the command:
xfreerdp cpub-vcloud-launcher-RemoteApps-CmsRdsh.rdp /d:gklabs /u:username /p:password  -nego




December 14, 2015

SSO is not initialized

Background
After upgrading vCenter from 6.0 to 6.0U1 we got the vCenter Server Appliance management GUI back. This HTML5-based GUI lets you manipulate certificates and several other things that you could only configure from appliancesh before U1.

Problem
After the upgrade we experienced an error message within this GUI: "SSO is not initialized". This system was running an external PSC, and authentication was working as it should. We didn't quite understand why this error message was there.

Solution
We had a support case going on this problem for a few weeks. We were repeatedly told to repoint our SSO until they finally told us that this error message was in fact a bug: "...this is something that we are looking to rectify as this information should not be shown when using an external PSC.
Our Engineering department are aware of this are looking to make a graphical change to this.
With regards to your environment however, I can confirm that SSO is functioning correctly and you are not experiencing an issue with SSO at this time."


November 21, 2015

Replacing a vSAN caching disk

Background
Replacing disks in vSAN can be a bit less smooth than in some traditional storage arrays. For normal disks used for storage it's quite easy, but for disks used for caching it can be a slightly different story. If you get a dead caching disk you should remove it from the config before removing it physically from the server. Otherwise you will get the problems described in this posting.

Problem
Once the disk has been replaced you will be unable to delete the disk or the disk group, both from the vSphere Web Client and from RVC. The reason this fails is that it can't find the disk. The disk will show up with a status of "Dead or Error" or "Absent" (depending on where you look).

"esxcli vsan storage list" will show all the other disks belonging to vsan on that server, but not the missing SSD disk.

Listing the disks in RVC with the command vsan.host_info shows that the disk is in an Absent state:


Trying to use RVC with "vsan.host_wipe_vsan_disks -f" to remove the disk also fails:

Solution
A solution that did work in the end was to use partedUtil to remove the partitions on all the spinning disks in this disk group. partedUtil is a very dangerous tool, so if you have multiple disk groups on your host (like we had) you must make sure you're working with the correct disks. We found it easiest to locate the naa IDs of the failed disk group in the web client.
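For reference, the procedure per disk looked roughly like this from the ESXi shell (the naa ID below is made up; double-check every ID against the failed disk group before touching anything):
partedUtil getptbl /vmfs/devices/disks/naa.5000c50012345678
partedUtil delete /vmfs/devices/disks/naa.5000c50012345678 1
partedUtil delete /vmfs/devices/disks/naa.5000c50012345678 2
getptbl lists the partition table first so you can see what you're about to delete, and the two delete calls remove the vSAN partitions.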

After removing both partitions from all the disks belonging to this disk group, the disk group was gone and we could create a new one using our new SSD disk and all the spinning ones.

Addendum
The official way to solve this problem is to remove the disk from the pool while it's still present in the server. In our case that was not possible. The SSD disk had, for some unknown reason, entered "Foreign mode", which is a Dell disk controller feature. We had to enter the PERC controller BIOS settings (from POST), clear the foreign config, and also configure the disk in the controller config in order to use it again. Because of this the disk came up with a new naa ID even though we didn't really have a failed disk.


March 23, 2015

vSphere AutoDeploy and Trend Micro Deep Security

Background
When researching online documentation to see if we could get Trend Micro Deep Security implemented in our VMware vSphere AutoDeploy environment, the only references we could find were a Japanese blog posting and a Japanese white paper. My language abilities are a bit limited, but I still found the screenshots valuable.

Overview
To get Deep Security working there are several components that need to be handled in a given order:
  1. Manually load vShield Endpoint driver on one of the ESXi hosts
  2. Update Host Profile based on ESXi host with vShield Endpoint driver
  3. Edit Host profile in order to get it working
  4. Create new ESXi image with Image builder that includes the vShield Endpoint driver and Trend Micro Filter driver
  5. Boot ESXi hosts from new ESXi Image
  6. Remediate new Host Profile for these hosts
  7. Deploy DSVA per ESXi host
Details
1. You need to use vShield Manager to install the vShield Endpoint driver. Note that the ESXi host should not be in maintenance mode when doing this. This may sound strange, but you'll get an error message after installing it if the host was in maintenance mode.


2. Go to Host Profiles and either create a new Host Profile based on this host, or update an existing Host Profile from the host you installed the driver on.
3. You need to edit the Host Profile. In addition to the other tasks that need to be done when a Host Profile has been updated from a host config, you now also need to make this new vShield-based endpoint network work automatically. There are basically three things that need to be done: unselect a vShield Connection ID field, avoid being asked for a MAC address, and set a static IP address. This address is always 169.254.1.1 and is on an internal (host-only) network on each host.


4. The following needs to be added to the VMware vSphere Image Builder script:


Add-EsxSoftwareDepot -DepotUrl "e:\vmware\drivers\vShield-Endpoint-Mux.zip"
Add-EsxSoftwareDepot -DepotUrl "e:\vmware\drivers\FilterDriver-ESX_5.0-9.5.3-2750.x86_64.zip"

Add-EsxSoftwarePackage -ImageProfile $imageprofile -SoftwarePackage epsec-mux
Add-EsxSoftwarePackage -ImageProfile $imageprofile -SoftwarePackage dvfilter-dsa
5. Activate the new image using the cmdlet Repair-DeployRuleSetCompliance
6. Remediate the host with the new Host Profile.
7. You can now see that the ESXi host has a Prepared status, and you can start deploying DSVAs.

March 22, 2015

vSphere AutoDeploy and Apex 2800 cards

When reading through the Teradici documentation you won't find a single reference to either AutoDeploy or Image Builder. The good news is that it does indeed work out of the box. All you need to do is add a few lines to the Image Builder config:
....
Add-EsxSoftwareDepot -DepotUrl "e:\vmware\drivers\apex2800-rel-2.4.0.35302-esxi.5.5.0.zip"
Add-EsxSoftwarePackage -ImageProfile $imageprofile -SoftwarePackage pcoip-ctrl
Add-EsxSoftwarePackage -ImageProfile $imageprofile -SoftwarePackage tera2
.... 
You can now build the image like you normally do and the driver will load if there's an APEX card in the server.
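If you want to double-check that the packages actually made it into the profile before exporting it, a quick look at the VIB list will do; a minimal sketch, assuming $imageprofile holds the image profile object from the snippet above:
# list the VIBs in the profile whose names mention pcoip or tera
$imageprofile.VibList | Where-Object { $_.Name -match "pcoip|tera" }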