September 7, 2018

"Status of other host hardware objects" on HPE Gen10 servers

Problem
On HPE Gen10 servers we have observed several error messages in the hardware monitoring, and VMware vCenter reports these as a critical problem:

Device IO Module 4 NIC_Link_01P4
This problem has been reported on both vSphere 6.5 and 6.7.

Solution
Update 21Nov2018: This issue has been fixed in ILO version 1.37: https://vmoller.dk/index.php/2018/11/17/lom-warning-in-vmware-on-hpe-gen10-servers/

It turned out that the servers with this problem were running ILO versions 1.30 or 1.35, while the servers without it were running ILO 1.20. After downgrading ILO to version 1.20 the problem was resolved. Hopefully this will be fixed in a future version.


Downgrading ILO is a new feature of ILO 5: components can be organized into install sets, which can be used to roll back or patch faulty firmware.
Note that while you can upgrade ILO directly with the Update Firmware functionality, you can't downgrade it the same way. To downgrade ILO you must first upload the firmware to the ILO Repository; once it has been uploaded you can downgrade from there.

After having downgraded ILO you will need to go into the hardware status view of each ESXi host and press the "Reset sensors" button, and everything will be fine.

June 15, 2018

LLDP not available on Intel X710 running ESXi 6.5U1

Problem
While setting up ESXi 6.5U1 on new HPE DL380 Gen10 servers we could not get LLDP working and got the error message "Link Layer Discovery Protocol is not available on this physical network adapter."

It looks like the X710 card is handling LLDP in firmware, so the adapter consumes the LLDP frames itself and they never reach ESXi.


Solution
The solution to this problem is as follows:

  1. Upgrade the firmware to the version provided in Service Pack for ProLiant (SPP) 2018.03.0, which contains Intel firmware version 6.0.1.
  2. The X710 driver (i40en) needs to be on version 1.5.6, which is available in the HPE Custom Image for ESXi 6.5U1.
  3. Run the following command on each host, where the number of zeros matches the number of X710 interfaces in your system (see the check after this list):  esxcli system module parameters set -m i40en -p LLDP=0,0,0,0
  4. Reboot
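
To verify that the parameter is in place after the reboot, you can list the module parameters from the ESXi shell. A minimal check (grep is only used to trim the output):
esxcli system module parameters list -m i40en | grep LLDP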

October 24, 2017

Accept button greyed out when trying to update VCSA

Background

Updating to new minor versions of VCSA is really simple since all the update functionality is built into the web interface on port 5480 of the vCenter Server Appliance, also known as the VAMI (vCenter Appliance Management Interface).

Problem

When trying to upgrade from 6.5.0.10000 Build Number 5973321 to 6.5.0.10100 Build Number 6671409 I wasn't able to click the Accept button because it was greyed out. Clicking the EULA link several times was of no help.

Solution

Switching browsers helped. I originally tried Chrome version 61.0.3163.100. Switching to Firefox version 56 revealed the Accept button and I could proceed with the update.
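
If no browser cooperates, the same update can also be applied from the appliance shell instead of the VAMI. A minimal sketch, assuming the appliance can reach the default VMware repository (these are the vSphere 6.5 appliance shell commands as I understand them, so double-check against the release notes before using them):
software-packages stage --url
software-packages list --staged
software-packages install --staged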

September 10, 2017

Visibility of private VMware services on the public internet

Background

VMware services like ESXi hosts and vCenter are services you would normally place in your private networks: preferably not in your average internal network, but in a management network along with the other services you provide management for. VMs, on the other hand, are placed in other networks like internal networks, DMZ networks and similar.

Results

By using Shodan I was able to find 4644 (probable) vCenter Servers, i.e. servers with the vSphere Web Client on port 9443.

With the same search engine it's also easy to find computers that are hosting VMs (ESXi, Workstation, Player) by looking for computers with the VMware Authentication Daemon (providing VNC) on port 902, and the number is quite astonishing: roughly 200,000 systems.

Most of these systems are not identified by OS (only ~3k of the ~200k), but I suspect that the large majority here are hosted products rather than ESXi hosts. We can also tell from the version of the VMware Authentication Daemon that some of the systems are dated, with pre-2009 versions.
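
The searches are easy to reproduce with the Shodan CLI. A rough sketch; the exact query strings are my own guesses at the banners, so treat them as assumptions, and the counts will obviously change over time:
shodan count "VMware Authentication Daemon" port:902
shodan count port:9443 "vSphere Web Client"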

We can even search for the VMware self-signed certificate that is installed by default by most VMware services. By looking at the certificate information you're also able to get either the internal IP address or the local hostname of the service.
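
If you want to see what such a certificate reveals, a plain openssl client against the service is enough. A minimal sketch with a placeholder hostname:
openssl s_client -connect host.example.com:9443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer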

By monitoring these queries over some time I've observed that the number of reported systems changes on a semi-weekly basis by up to 20%, sometimes up and sometimes down.

By using Richard Garsthagen's tool https://github.com/AnykeyNL/vmware_scanner you can also reveal that many of these systems are very old.

Conclusion

That these systems are reachable from the internet may not seem like a big issue at the moment, as everything appears to be working as expected.

The main reason it is not recommended to expose these services is that they are the doorway to managing and controlling your entire virtual environment; all that is needed is a valid username and password. Anyone who has monitored the logs of internet-exposed systems knows that automated systems will try to log in on a regular basis.

We also know that even services that are regarded as safe, with no known security holes for many years, may still turn out to have a hole at some point, potentially giving people access without a valid username and password.

Many of the exposed systems appear to be very old, and we all know it is bad karma to leave an old, unpatched system open to the internet.

August 18, 2016

Configuring the HPE 6125XLG Ethernet Blade Switch for use in a VMware environment - part 2

Background
Many people are using FlexFabric for Ethernet (+FC) connectivity in their HP blade environments. For better functionality and control we've chosen to use HPE 6125XLG blade switches instead, and this post documents how we achieved it. It's interesting to note that the 6125XLG uses the exact same hardware as the FlexFabric-20/40 F8.

Problem
I've found that the documentation for the H3C line of switches is a bit confusing and sometimes wrong. Our switches use a command set known as Comware 7, while many examples out there are written for Comware 5.

Solution
We have configured our system with the following features:

  1. The switches are stacked and work as one big switch. See part 1 for a closer description.
  2. There are two 10GbE uplinks from each of these switches to two Cisco 6500 series switches.
  3. The trunk between the 6125XLGs and the Cisco 6500s is set up with LACP.
  4. Spanning tree between the switches is configured as RSTP.
  5. CDP has been set up between switches and servers.
  6. VMware ESXi is set up with a distributed switch using LBT+NetIOC.
  7. Logs are forwarded to Logstash.
  8. SNMP has been configured (for future use).
  9. NTP

There are two 6125XLG switches in the C7000 and each of the blades has one NIC connected to each of these switches. The two switches have four 10GbE ports connected to each other; these are normally used for stacking (IRF) and FCoE (you dedicate a pair to each). Each switch also has 8x 10GbE SFP+ ports and 4x 40GbE QSFP+ ports. It's recommended to use original HPE GBICs, but third-party GBICs have also proven to work nicely.
Logical view


1. Stacking

When you configure IRF you have four ports to choose from. You can use either two or all four (you can dedicate a pair to FCoE if you need to). In this example we're using all four ports to aggregate the two switches into one large one. In H3C terminology this is called Intelligent Resilient Framework (IRF).
 irf mac-address persistent timer
 irf auto-update enable
 undo irf link-delay
 irf member 1 priority 10
 irf member 2 priority 1

irf-port 1/1
 port group interface Ten-GigabitEthernet1/0/17
 port group interface Ten-GigabitEthernet1/0/18
 port group interface Ten-GigabitEthernet1/0/19
 port group interface Ten-GigabitEthernet1/0/20
#
irf-port 2/2
 port group interface Ten-GigabitEthernet2/0/17
 port group interface Ten-GigabitEthernet2/0/18
 port group interface Ten-GigabitEthernet2/0/19
 port group interface Ten-GigabitEthernet2/0/20
interface Ten-GigabitEthernet1/0/17
 description IRF
#
interface Ten-GigabitEthernet1/0/18
 description IRF
#
interface Ten-GigabitEthernet1/0/19
 description IRF
#
interface Ten-GigabitEthernet1/0/20
#
interface Ten-GigabitEthernet2/0/17
 description IRF
#
interface Ten-GigabitEthernet2/0/18
 description IRF
#
interface Ten-GigabitEthernet2/0/19
 description IRF
#
interface Ten-GigabitEthernet2/0/20
 description IRF
#
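
Once both members are up with this configuration, the stack can be verified from the CLI. A minimal check (Comware 7 display commands, output omitted):
display irf
display irf topology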

2. Trunk (STP, LACP, 4x 10GbE, CDP)

On each of the two 6125 switches we establish a trunk facing the core Cisco switches. In our example we decided to use RSTP for spanning tree, and we use CDP instead of LLDP on our external-facing interfaces.
 stp mode rstp
 stp global enable
#
interface Bridge-Aggregation1
 port link-type trunk
 port trunk permit vlan all
 link-aggregation mode dynamic


Interfaces on switch 1:
interface Ten-GigabitEthernet1/1/5
 port link-mode bridge
 description Trunk 6500
 port link-type trunk
 port trunk permit vlan all
 lldp compliance admin-status cdp txrx
 port link-aggregation group 1
#
interface Ten-GigabitEthernet1/1/6
 port link-mode bridge
 description Trunk 6500
 port link-type trunk
 port trunk permit vlan all
 lldp compliance admin-status cdp txrx
 port link-aggregation group 1
Interfaces on switch 2:
interface Ten-GigabitEthernet2/1/5
 port link-mode bridge
 description Trunk 6500
 port link-type trunk
 port trunk permit vlan all
 lldp compliance admin-status cdp txrx
 port link-aggregation group 1
#
interface Ten-GigabitEthernet2/1/6
 port link-mode bridge
 description Trunk 6500
 port link-type trunk
 port trunk permit vlan all
 lldp compliance admin-status cdp txrx
 port link-aggregation group 1
#
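
After the links come up, the aggregation and spanning tree state can be checked from the 6125XLG side. A minimal sketch (Comware 7 display commands, output omitted):
display link-aggregation verbose Bridge-Aggregation 1
display stp brief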

3. Interfaces facing ESXi hosts

Each of the ESXi hosts has a config for each of its NICs, one on each switch. Flow control is enabled by default on all ESXi NICs, so we also enable it on the switch. Since we are using LBT+NetIOC we are not using EtherChannel/LACP on the ESXi-facing ports (unlike most of the examples provided by HPE). A quick ESXi-side check is shown after the config below.
interface Ten-GigabitEthernet1/0/1
 port link-mode bridge
 description xyz-esx-01
 port link-type trunk
 port trunk permit vlan all
 flow-control
 stp edged-port
 lldp compliance admin-status cdp txrx


interface Ten-GigabitEthernet2/0/1
 port link-mode bridge
 description xyz-esx-01
 port link-type trunk
 port trunk permit vlan all
 flow-control 
 stp edged-port
 lldp compliance admin-status cdp txrx
#
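
From the ESXi side you can check that flow control is negotiated and that CDP information is received. A minimal sketch run in the ESXi shell; vmnic0 is a placeholder for the relevant uplink, and the exact vim-cmd syntax may differ between builds:
esxcli network nic get -n vmnic0
vim-cmd hostsvc/net/query_networkhint --pnic-name=vmnic0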

4. Management (clock, syslog, snmp,  ssh, ntp)


#
 clock timezone CET add 01:00:00
 clock summer-time CETDT 02:00:00 March last Sunday 03:00:00 October last Sunday 03:00:00
#
 info-center synchronous
 info-center logbuffer size 1024
 info-center loghost 10.20.30.40 port 20514
#
 snmp-agent
 snmp-agent local-engineid 800063A280BCEAFA031F8600000001
 snmp-agent community write privatecleartextpassword
 snmp-agent community read publiccleartextpassword
 snmp-agent sys-info version all
#
 ssh server enable
#
 ntp-service enable
 ntp-service unicast-server 1.2.3.4
#
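
These management settings can be verified with the usual display commands. A minimal sketch (Comware 7, output omitted):
display clock
display ntp-service status
display info-center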

Conclusion

Finding the right syntax to configure this switch was a bit challenging, as many of the examples we found didn't work out of the box; the command set differs slightly between versions. After overcoming the initial obstacles we were able to configure the switch exactly as we needed.


March 18, 2016

Configuring the HPE 6125XLG Ethernet Blade Switch for use in a VMware environment - part 1

Background
In an HPE C7000 blade system a common method of accessing the network is through FlexFabric/Flex-10 modules. These modules are not fully featured switches, but they do have some switch features built in. Another alternative is to use a real switch such as the HPE 6125XLG or the Cisco Nexus B22HP FEX.

A real switch has many technical benefits over a FlexFabric system, but it takes a different approach to configuration than FlexFabric (which targets server admins as its main audience and is often hated by people who know networking). The 6125XLG has a CLI with a feel similar to IOS, though not as close as NX-OS or ProCurve. The 6125XLG is the heritage of H3C, a cooperation between 3Com and Huawei that HPE bought a few years back, and its CLI is referred to as Comware. It is a blade-integrated switch with 10GbE facing the blade servers and both 10GbE (SFP+) and 40GbE (QSFP+) uplinks that can be used to connect to the network.


Problem
One problem I found while trying to configure this switch was the lack of good documentation. There is a lot of documentation available, but much of it is for Comware v5 while the 6125XLG uses Comware v7. The 6125XLG Fundamentals Configuration Guide stated that it was important to use the command "line class aux" as part of the stacking (IRF) process, but this command was not available on my switches:
[HP]line class aux
^
% Unrecognized command found at '^' position.
It turned out that the firmware that came preinstalled had a bug that prevented you from stacking the two switches without the use of an RS232 cable. The HPE forums had many helpful posts, but posting there didn't get me any answers from active users. I did, however, find a couple of blog posts that helped me get going even though they didn't really provide a solution.

Solution
Upgrading the firmware of both switches from Release 2306 to Release 2422P01 before trying to do anything else solved this problem. The firmware upgrade is described at length in the firmware download package. I chose to upload the firmware image to the switches using FTP. I could then stack my switches according to the Fundamentals Guide (and this HPE Support article: HP 6125g Switch Series - How to Configure Intelligent Resilient Framework (IRF)).
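
For reference, once the image has been uploaded, the actual flash and reboot on Comware 7 come down to a couple of commands. A minimal sketch where the .ipe file name is just a placeholder for the image in the download package:
boot-loader file flash:/6125xlg-cmw710-r2422p01.ipe slot 1 main
reboot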



February 16, 2016

Accessing the Global Knowledge labs from Ubuntu Linux

Background
While attending training I tried accessing the labs from my BYOD computer (Bring Your Own Device). I had been warned before the training that the Global Knowledge labs work best with an OS that supports Internet Explorer: "please use an Operating System that supports the Internet Explorer Browser. We have found that Mac Books do not work well when connecting to this environment".

Problem
A while back I was able to make the labs work from my personal Linux desktop, but it seems that the labs have changed and my old method no longer works. This is the output I got from FreeRDP:
getaddrinfo: Name or service not known
[17:22:43:165] [20836:1234650880] [INFO][com.freerdp.core.gateway.tsg] - TS Gateway Connection Success
[17:22:44:030] [20836:1234650880] [ERROR][com.freerdp.core.capabilities] - expected PDU_TYPE_DEMAND_ACTIVE 0001, got 0007
[17:22:44:030] [20836:1234650880] [ERROR][com.freerdp.core] - ERRINFO_SERVER_INSUFFICIENT_PRIVILEGES (0x00000009):The user cannot connect to the server due to insufficient access privileges.
[17:22:44:031] [20836:1234650880] [ERROR][com.freerdp.core.capabilities] - expected PDU_TYPE_DEMAND_ACTIVE 0001, got 0007
[17:22:44:047] [20836:1234650880] [ERROR][com.freerdp.core.rdp] - DisconnectProviderUltimatum: reason: 1

Solution
The solution was, however, quite simple. The Remote Labs portal has information about accessing the labs from a variety of devices, and I've also got a document describing some NTLMv2 requirements. I used Firefox and logged in to the portal. When trying to connect I was offered an .rdp config file to download, which I saved in the default location.

Logging in to the portal


Launch the Remote Labs!

Save file

Now I could use this file as an input to freerdp (version 1.20) and connect without problems by using the command:
xfreerdp cpub-vcloud-launcher-RemoteApps-CmsRdsh.rdp /d:gklabs /u:username /p:password  -nego