Since it has been out for more than a year, and has been developed and improved tremendously during that time, I decided to finally take the plunge and buy a year’s subscription to the Cisco VIRL software. Until now, I have been using some combination of real hardware, CSR1000Vs, and IOL instances for studying and proof-of-concept testing.
My first impression of VIRL is that it is a BEAST of a VM with regard to CPU and RAM consumption. I first installed it on my 16GB MacBook Pro and allocated 8GB to it. However, it was of very limited use there because I could not load more than a few nodes. I then moved it to my ESXi server, which is definitely a more appropriate home for this software in its current state.
I knew that the CSR1000Vs were fairly RAM-hungry, but they are meant to be production routers, so that is a fair tradeoff for good performance. The IOSv nodes take up substantially less RAM, but they are still surprisingly resource-intensive, especially in CPU usage. I expected the IOSv nodes to be very similar to IOL nodes in resource usage, but unfortunately, that is not yet the case.
I can run dozens of IOL instances on my MacBook Pro and have all of them up and running in less than a minute, all in a VM with only 4GB of RAM. That is certainly not the case with IOSv. Even after tuning the VIRL VM on ESXi, the IOSv instances still take about two minutes to come up. Reloading (or doing a configure replace) on IOL takes seconds, whereas on IOSv it still takes a minute or more. I know that in the grand scheme of things a couple of minutes isn’t a big deal, especially compared to reloading an actual physical router or switch, but I was still very surprised by how large the performance and resource usage gap between IOL and IOSv is.
With all default settings, my experience of running VIRL on ESXi (after going through the lengthy install process) was better than on the MBP, but still not as good as I thought it should be. The ESXi server I installed VIRL on has two Xeon E5520 CPUs, which are quad-core Nehalem chips with eight threads each. The system also has 48GB of RAM. I have a few other VMs running that collectively use very little CPU during normal operation and about 24GB of RAM, leaving 24GB for VIRL. I allocated 20GB to VIRL and placed the VM on an SSD.
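To make the resource picture concrete, here is a rough capacity budget for the host described above, written as a minimal Python sketch. The figures come from this post; the constant and function names are hypothetical, and the calculation ignores the memory that ESXi itself and per-VM overhead consume, so the real headroom is somewhat smaller than shown.

```python
# Rough capacity budget for the ESXi host described in this post.
# All figures are taken from the text; the names are illustrative only.
SOCKETS = 2
CORES_PER_SOCKET = 4
THREADS_PER_CORE = 2      # Hyper-Threading on the Xeon E5520

HOST_RAM_GB = 48
OTHER_VMS_RAM_GB = 24     # approximate usage of the other VMs
VIRL_RAM_GB = 20          # RAM allocated to the VIRL VM


def host_summary():
    """Return core counts and RAM left over after the other VMs and VIRL."""
    physical_cores = SOCKETS * CORES_PER_SOCKET
    logical_cpus = physical_cores * THREADS_PER_CORE
    free_ram = HOST_RAM_GB - OTHER_VMS_RAM_GB
    headroom = free_ram - VIRL_RAM_GB   # ignores ESXi and per-VM overhead
    return {
        "physical_cores": physical_cores,
        "logical_cpus": logical_cpus,
        "free_ram_gb": free_ram,
        "headroom_gb": headroom,
    }


if __name__ == "__main__":
    for key, value in host_summary().items():
        print(f"{key}: {value}")
```

Running it shows 8 physical cores, 16 logical CPUs, 24GB free for VIRL, and about 4GB of nominal headroom after the 20GB allocation.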
The largest share of CPU usage comes from booting the IOSv instances (and possibly the other node types as well). On every boot, a crypto process runs and verifies the IOS image. This pegs the CPU at 100% until the process completes, and I believe it is the biggest contributor to the time an IOSv node takes to finish booting. Newer-generation CPUs may improve this considerably.
When I first started, I assigned four cores to the VIRL VM. The IOSv instances took 5-10 minutes to boot, and a configure replace took at least five minutes. That was unacceptable, especially compared to the few seconds IOL needs to do the same thing. A few web searches turned up several things to try.
The first thing I did was increase the core count to eight. Since my server has only eight physical cores, I was a little hesitant to do this because of the other VMs I am running, but I think this is a case where Hyper-Threading may help, since ESXi sees 16 logical cores. After setting the VM to eight cores, I noticed a big difference, and my other VMs did not appear to suffer. I then read about another tweak: presenting the vCPUs to the VM in a topology that matches the physical hardware. Originally, the VM was presented with eight single-core CPUs. I then tried a single eight-core CPU, which improved performance a little. Finally, I configured it as two quad-core CPUs (matching the physical server), and this is where I saw the biggest improvement in both boot time and overall responsiveness.
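For reference, here is a minimal Python sketch (names hypothetical) that enumerates the ways eight vCPUs can be split into sockets and cores and flags the layout that mirrors this host. In the vSphere client this corresponds to the number of vCPUs plus the "Cores per Socket" setting; the code only illustrates the arithmetic and does not talk to ESXi.

```python
# Enumerate socket/core layouts for a given vCPU count and flag the one
# that mirrors the physical host (2 sockets x 4 cores in this post).
HOST_SOCKETS = 2
HOST_CORES_PER_SOCKET = 4


def topologies(vcpus):
    """Return (sockets, cores_per_socket) pairs whose product equals vcpus."""
    return [(s, vcpus // s) for s in range(1, vcpus + 1) if vcpus % s == 0]


def matches_host(sockets, cores_per_socket):
    """True if the virtual layout has the same shape as the physical CPUs."""
    return sockets == HOST_SOCKETS and cores_per_socket == HOST_CORES_PER_SOCKET


if __name__ == "__main__":
    for sockets, cores in topologies(8):
        tag = "  <- mirrors host" if matches_host(sockets, cores) else ""
        print(f"{sockets} socket(s) x {cores} core(s){tag}")
```

The 8 x 1 and 1 x 8 entries are the two layouts I tried first; only the 2 x 4 layout matches the actual hardware.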
My ESXi server has eight cores running at 2.27 GHz each, and VMware sees an aggregate of 18.13 GHz. Another tweak I made was to set a CPU limit of 16 GHz on the VM, so that it can no longer take over the entire server. I also configured the VM’s memory so that it cannot overcommit; it will not use more than the 20GB I have allocated to it. In the near future, I intend to upgrade my server from 48GB to 96GB so that I can allocate 64GB to VIRL, which will be necessary once I start studying service provider topologies with XRv.
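The arithmetic behind the 16 GHz limit is easy to check. The minimal Python sketch below uses the figures from this post (the names are hypothetical); note that vSphere takes the CPU limit in MHz. The small gap between the nominal 8 x 2.27 GHz and the 18.13 GHz VMware reports presumably comes from rounding of the per-core clock (18.13 / 8 is about 2.266 GHz).

```python
# Sanity-check the CPU limit applied to the VIRL VM.
# Inputs are the figures quoted in the post; the names are illustrative only.
CORES = 8
CLOCK_GHZ = 2.27                # per-core clock as shown by VMware
REPORTED_AGGREGATE_GHZ = 18.13  # aggregate capacity VMware reports
VIRL_LIMIT_GHZ = 16.0           # limit set on the VIRL VM


def cpu_limit_report():
    """Compare the VM's CPU limit with the host's total capacity."""
    nominal = CORES * CLOCK_GHZ
    return {
        "nominal_ghz": round(nominal, 2),
        "limit_mhz": int(VIRL_LIMIT_GHZ * 1000),  # value to enter in vSphere
        "limit_share_pct": round(100 * VIRL_LIMIT_GHZ / REPORTED_AGGREGATE_GHZ, 1),
        "left_for_others_ghz": round(REPORTED_AGGREGATE_GHZ - VIRL_LIMIT_GHZ, 2),
    }


if __name__ == "__main__":
    for key, value in cpu_limit_report().items():
        print(f"{key}: {value}")
```

With these numbers, the limit caps VIRL at roughly 88% of the host’s CPU capacity and leaves a little over 2 GHz for ESXi and the other VMs. The same kind of check applies to the planned memory upgrade: 96GB minus 64GB for VIRL leaves 32GB, against the roughly 24GB the other VMs use today.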
To be clear, it still doesn’t run as well as I think it should, but it is definitely better after tweaking these settings. The Intel Xeon E5520 CPUs in my server were released in the first quarter of 2009. That was seven years ago, as of this writing. A LOT of improvements have been baked into Xeon CPUs since then, so I have no doubt that newer-generation CPUs would alleviate much of the slowness I experienced.
I read a comment claiming that passing the CCIE lab was easier than getting VIRL set up on ESXi. I assure you, that is not the case. The VIRL team has great documentation on the initial ESXi setup, and that part worked as described without any steps beyond their instructions. However, as this post demonstrates, extra tweaks are needed to tune VIRL to your system. It is not a point-and-click install, but you don’t need to study for hundreds of hours to pass the installation, either.
VIRL is quite complex and has a lot of different components. Complex software should be expected to need tuning for its environment, since the developers cannot plan a turnkey configuration for every possible environment in advance. Judging by past comments from other users, VIRL has improved dramatically over the past year, and I expect it will continue to do so, most likely with gains in both performance and ease of deployment.