April 2, 2011

The wonderful Service Console

When I was first introduced to the Service Console back in the ESX 2.0 days (in 2003) I was delighted with what VMware had done. The Service Console is a Linux (sort of) VM, and it is what you see when you boot up an ESX server. The environment looks like Linux, but it does not see the full hardware, as it runs as a virtual machine with some special privileges that normal VMs don't have. Those privileges mean that you can also access and configure the VMware (vmkernel) environment. The MUI (management user interface) was also nice, but every statistic you saw in the MUI was derived from stats that you could find within the proc nodes of the Console OS, which was the name it was known by back then. It made sense that a GUI wasn't always as correct as the “real deal” where the raw numbers were sitting, so if you were in doubt about anything you could always double check it if you knew where to look. If you knew some basic Unix scripting you could also quickly write a script that monitored what you were looking for.
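To give an idea of how little such a monitoring script needed, here is a minimal sketch. It is written in Python rather than the shell or Perl you would typically have used on the Console OS, and it assumes a proc-style text file with one "name value" pair per line. The function names are made up for this example, and the real layout of the nodes under /proc/vmware differed from node to node, so treat the parsing as an illustration only.

```python
import time


def read_counters(path):
    """Parse a proc-style text file into {name: number}.

    Assumes lines of the form 'name value ...'; anything else is skipped.
    """
    counters = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue
            try:
                counters[fields[0].rstrip(":")] = float(fields[1])
            except ValueError:
                continue  # second field was not a number
    return counters


def deltas(previous, current):
    """Return how much each counter changed between two samples."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous
    }


def watch(path, interval=5.0, rounds=3):
    """Sample the file every `interval` seconds and print counters that moved."""
    previous = read_counters(path)
    for _ in range(rounds):
        time.sleep(interval)
        current = read_counters(path)
        for name, change in sorted(deltas(previous, current).items()):
            if change:
                print(f"{name}: {change:+g}")
        previous = current
```

On an ordinary Linux box you can try it with something like watch("/proc/meminfo"); on the old Console OS you would have pointed it at whichever proc node held the numbers you cared about.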

When ESX 3.0 was introduced things had changed a bit. The Service Console still looked the same at first glance, but when you had a closer look you could tell that things had changed beyond the change of Linux version. The values that you used to look at in the proc nodes were still there, but they didn't make much sense. It turned out that /proc/vmware wasn't the main source of statistics anymore, as VMware had created their own stats interface called VSI (VMware SysInfo). The proc nodes within 3.0 were either a leftover from 2.0 that wasn't complete for 3.0, or a half-done conversion interface from VSI. I don't know which one, but sure enough you couldn't rely on the proc nodes for anything anymore, even though they were still there.

The reasons for abandoning the proc nodes may have been valid. The proc nodes are where all Linux distros today, and many other Unix variants (AIX, Solaris, etc.), present their performance numbers. The resolution of stats within the proc nodes on Linux is always 100 Hz, even though the system timer could be running faster. The stats within the kernel could be of a higher resolution, but the numbers presented to applications are always in units of USER_HZ, which at 100 Hz means 10 ms per tick. The real system timer rate is not accessible to user applications because it should not be necessary for them to know it (there exist patches that override this). The reason for this is compatibility with normal Linux tools such as ps, top, uptime, etc. This means that the stats in the proc nodes were only accurate to 10 ms, while the stats in the VSI interface were, as far as I can tell, kept at 1000 Hz (1 ms). That wouldn't normally mean a lot for most stats, but more accurate graphs are of course for the win.
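The arithmetic behind the tick resolution is simple enough to sketch. The helper names below are made up for this example; the only real API used is os.sysconf("SC_CLK_TCK"), which returns the tick rate that the kernel reports to user space.

```python
import os


def tick_resolution_ms(user_hz):
    """Length of one reported tick in milliseconds."""
    return 1000.0 / user_hz


def ticks_to_seconds(ticks, user_hz):
    """Convert a tick counter (as found in /proc/stat) to seconds."""
    return ticks / user_hz


def reported_user_hz():
    """Tick rate exposed to applications (typically 100 on Linux)."""
    return os.sysconf("SC_CLK_TCK")
```

With a rate of 100 Hz, tick_resolution_ms(100) gives 10.0, so a counter of 250 ticks corresponds to 2.5 seconds. A 1000 Hz source would give 1 ms per tick, which is the kind of difference that shows up as smoother graphs.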

The main console tool for monitoring ESX performance is esxtop. Esxtop in version 3.x looked quite similar to the one in 2.x, but it had been rewritten on top of a different project than before. In ESX 2.x, esxtop was derived from GNU code, while in 3.x it was derived from BSD code. VMware had also extended it quite a bit since 2.x with various stats that they retrieved from VSI. The best tool for dumping values from VSI was however esxcfg-info, which dumped all available stats in a given category. In 3.5 you could also choose the formatting of the dumped stats if you wanted them in XML or Perl table formats.

When troubleshooting performance issues on ESX you would normally start with the stats in viclient to get an overview, then look at esxtop, and then turn to esxcfg-info and vscsiStats for more detailed information. esxcfg-info and vscsiStats were new tools introduced to the Service Console in ESX 3.0 and 3.5.

There was never much documentation of the statistics, either for the values in the proc nodes or for those in esxcfg-info, but it was possible to interpret what they were by comparing the stats you were looking for to the ones shown in esxtop and in the viclient. There was eventually some documentation provided by VMware on the esxtop stats (by performance guru Scott Drummonds).

The announcement of the death of the Service Console @ VMworld 2007 has been received with mixed feelings. In ESX 3 and newer you can do everything you need from the GUI and may never need to know that there's a Service Console available. VMware also introduced a console-less hypervisor (ESXi), and we can all see the advantages it brings. As the Service Console is a Red Hat based Linux install, it needs additional patching which is in reality unrelated to VMware's core business of virtualization. By having a thin hypervisor on the ESX hosts you get a smaller attack surface, and it should in theory also give you slightly better performance as you don't have the additional load of an extra Linux VM running (even though it's fairly lightly loaded unless you've filled it with agents).

ESXi (or ESX 3i as it was known upon announcement) was introduced to be the future successor of normal ESX, but without a Service Console. In ESX 3 you can do everything that you need to configure and manage your ESX environment from the GUI (viclient), but troubleshooting issues is often easier from the command line. VMware couldn't just stop shipping normal ESX with a console right away, since many third party vendors have software agents that depend on the Service Console.

To provide command line access similar to the Service Console, VMware initially introduced the Remote CLI appliance. It was based on Debian 3.1 and wasn't that bad of an idea. It provided you a command line interface where you could run resxtop and similar commands to get statistics from an ESX host, and you could also do a fair bit of other management related tasks. So it did include a version of esxtop, but both esxcfg-info and vscsiStats were missing. It also came with all of the developer tools a console oriented Linux programmer needed to interface with a VMware environment. VMware also had their own Debian repository available for this appliance, so you could add the packages you needed just like on a normal Debian distro. In the original Service Console there was no online software repository, but the ESX install CD provided extra packages that you could install.

The Remote CLI appliance's software repository was suddenly gone and VMware introduced another appliance that had similar capabilities: the VIMA (Virtual Infrastructure Management Assistant). It was an appliance that included the same Remote CLI command set, but was still different. It did not include any development tools, but it did include a method of setting up a trust between the ESX servers and the appliance so that you didn't need to authenticate for each command (vi-fastpass). It also didn't have the ability to add any packages, but as it was based on RHEL 5.2 you could take packages from RHEL, at the risk of breaking any support from VMware if you should need to call them.

In ESX4 this appliance has a new version and also a new name. It's now known as vMA 4.0 (vSphere Management Assistant) and is basically the same appliance as VIMA 1.0, but with updated packages. You now also have the ability to set up a trust to the vCenter server in order to run commands on all of your ESX servers, instead of setting up trusts to each ESX server in your environment. It still lacks both esxcfg-info and vscsiStats for deep level performance troubleshooting.

ESX4 still has the VMware proc nodes present, and I'm not 100% sure whether they are a reliable source of information or not. But I doubt VMware has done much with them, since no software relies on them anymore and the core info lives in the VSI.

A new (official) feature that was introduced with ESX4 is thin provisioning. Thin provisioned disks only store the blocks that have data on them, which is a good way of saving space while utilizing your storage subsystems better (not always performance wise, but at least capacity wise). One should however be aware that when viewing a thin provisioned disk from the local Service Console, it will always show the full size of the disk, even when only a fraction of the disk is allocated physically. To get the correct disk usage of such a VM you will have to use the viclient (GUI), so things are now beginning to be the other way around from what we were used to in the “old days”. I don't know what other features come out wrong in the console nowadays while being correct in viclient, but I wouldn't be surprised if this wasn't the only one.
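The difference between the size a disk claims to have and the space it really occupies can be illustrated with an ordinary sparse file. This is only an analogy on a regular Linux filesystem, not how VMFS or the ESX tools work, and the function names are made up for this example. It assumes the filesystem supports sparse files and that st_blocks is counted in 512-byte units, as it is on Linux.

```python
import os


def make_sparse_file(path, size):
    """Create a file of `size` bytes without writing any data blocks."""
    with open(path, "wb") as handle:
        handle.truncate(size)


def apparent_size(path):
    """Size the file claims to have (what a plain directory listing shows)."""
    return os.stat(path).st_size


def allocated_size(path):
    """Bytes actually allocated on disk (st_blocks is in 512-byte units on Linux)."""
    return os.stat(path).st_blocks * 512
```

If you create a file with make_sparse_file("disk.img", 1024 ** 3), the apparent size is the full 1 GiB, while the allocated size stays close to zero on a filesystem with sparse file support. A tool that only looks at the first number reports the full size, in the same way the console does for a thin disk.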

The death of the Service Console is indeed coming up. VMware have stated that it will die with the next release of ESX. I'm not sure if that means ESX4.5 or 5.0, but it surely will die soon enough and we already see that the focus is shifting in that direction. I hope that when that day comes, all of the functionality we are using today will be available through supported means on the new platform.
