April 14, 2011

Limits are evil

In my work as a consultant I see many different VMware environments on a daily basis, and from time to time I'm called in to troubleshoot performance issues. For troubleshooting such issues I've compiled a list of things to be aware of on vmfaq.com: "I need more performance out of my VMware environment".

This article has been the most popular article on that site for quite some time. The list is not meant to be a final solution to all problems, but rather a quick checklist that rules out the most common errors. Of the 10 steps listed there, one has turned out to be the root cause of many of the issues I've seen lately. The problem normally involves one or more servers that perform badly, and even after the local VMware admin has tried to tune the systems with more RAM and vCPUs, the performance is still bad.

Take a look at this screenshot:
As we can see, the majority of this VM's memory has been ballooned. Let's also take a look at the memory resource tab of this virtual machine:
This VM is set up with 4 gigabytes of RAM, but has a memory limit of 1000 megabytes (just below 1 gigabyte). It probably started its life with 1000 megabytes and was later given 4 gigabytes. While it had only 1000 megabytes of RAM, performance was probably as good as you can expect from a VM of that size. The fact that it was given more memory suggests that 1 gigabyte was not enough for its workload. Increasing the memory to 4 gigabytes while the limit was still set at just below 1 gigabyte would, however, not give the VM better performance. The guest OS believed it had 4 gigabytes of RAM, but in reality it had just as little as before: the hypervisor has to reclaim everything above the limit through ballooning and swapping. This means that performance after the memory increase was actually worse than before, since the guest OS is more prone to internal swapping when it believes more memory is available than there really is. VMware swap was also active, but only 23 MB.
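The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this post, 4 gigabytes is assumed to mean 4096 MB, and a limit of -1 is assumed to mean "unlimited", which is the convention vSphere uses.

```python
def unbacked_memory_mb(configured_mb: int, limit_mb: int) -> int:
    """Return how much guest-visible memory the host will not back with physical RAM.

    Assumption: limit_mb == -1 means "unlimited" (the vSphere convention).
    """
    # No limit, or a limit at or above the configured size, changes nothing.
    if limit_mb < 0 or limit_mb >= configured_mb:
        return 0
    # Everything above the limit has to be reclaimed by ballooning or swapping.
    return configured_mb - limit_mb


# The VM described above: 4 GB configured, limit left at 1000 MB.
print(unbacked_memory_mb(4096, 1000))  # 3096
# The same VM once the limit is removed.
print(unbacked_memory_mb(4096, -1))    # 0
```

In other words, roughly 3 gigabytes of what the guest OS treats as RAM is not available to it as physical memory.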

After removing the memory limit, we could see the balloon deflate quickly:

This step helped this VM's performance a lot. How about the other VMs on this system? Were other VMs affected by the same problem? Yes, there sure were:
I'm pretty sure these limits were not set on purpose, and all the affected VMs are "old" VMs that have survived an upgrade or two. Some of them started their lives as physical servers. I'm actually not sure at what point these memory limits were set, but this scenario is not unique. It is an issue I see on a regular basis on systems that have been around for a while. Some even have templates with limits configured, which makes it an issue for newly deployed VMs as well.

These VMs often have CPU limits set as well, and those should also be removed. The CPU limits I have seen have, however, affected performance to a much lesser degree than the memory limits. This is mainly because the clock speeds (GHz) of today's CPUs are quite similar to those of CPUs from five years ago.
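Checking an inventory for inherited limits can be sketched like this. It is a minimal illustration, not a finished tool: the `VmRecord` type, its field names and the sample inventory are all hypothetical, and -1 is again assumed to mean "unlimited". With pyVmomi the values would presumably be read from each VM's `config.memoryAllocation.limit` and `config.cpuAllocation.limit`; that part is left out here.

```python
from dataclasses import dataclass
from typing import List

UNLIMITED = -1  # assumption: -1 means "no limit", as in vSphere


@dataclass
class VmRecord:
    """Hypothetical, simplified view of one VM or template."""
    name: str
    memory_mb: int
    memory_limit_mb: int = UNLIMITED
    cpu_limit_mhz: int = UNLIMITED
    is_template: bool = False


def describe_limits(vm: VmRecord) -> List[str]:
    """Return a human-readable note for every limit set on this VM."""
    findings = []
    if vm.memory_limit_mb != UNLIMITED:
        note = f"memory limit {vm.memory_limit_mb} MB"
        if vm.memory_limit_mb < vm.memory_mb:
            note += f" (below configured {vm.memory_mb} MB)"
        findings.append(note)
    if vm.cpu_limit_mhz != UNLIMITED:
        findings.append(f"CPU limit {vm.cpu_limit_mhz} MHz")
    return findings


def report(vms: List[VmRecord]) -> List[str]:
    """Return one line per VM or template that has any limit configured."""
    lines = []
    for vm in vms:
        findings = describe_limits(vm)
        if findings:
            kind = "template" if vm.is_template else "VM"
            lines.append(f"{kind} {vm.name}: " + ", ".join(findings))
    return lines


# Made-up sample inventory, for illustration only.
inventory = [
    VmRecord("app01", memory_mb=4096, memory_limit_mb=1000),
    VmRecord("db01", memory_mb=8192),
    VmRecord("w2k3-template", memory_mb=2048, memory_limit_mb=512,
             cpu_limit_mhz=2000, is_template=True),
]

for line in report(inventory):
    print(line)
```

Templates are included in the report on purpose, since a limit on a template is copied to every VM deployed from it.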

The title of this post is "Limits are evil", and I think they really are evil when you don't know they are there. I can certainly see the usefulness of limits in many situations, but there is a huge difference between a setting that is deliberate and well planned and a setting you have inherited without being aware of it.

2 comments:

  1. The first screenshot seems to show there is no limit configured (limit = unlimited).

  2. You're right, Matt. I think that picture was taken right after the limit had been removed since the balloon driver is still inflated.
