One of the nicest things about migrating the data center is we get a chance to start somewhat fresh and hopefully do things correctly (or rather, more “current” from a best practices perspective). As I’ve witnessed myself, network complexity builds up over time. Temporary “band-aid” fixes unintentionally become permanent solutions. The accumulation of all these one-off problems eventually leads to many complex interdependencies that may not reveal themselves until things start to unravel, as I experienced during the move.
The new data center network is based on a leaf/spine topology with Top of Rack switches connecting to a smaller, higher-speed core. The two core switches are connected together with a 2x40G EtherChannel, and the racks where the ToRs are present have two switches, each of which are redundantly connected back to both cores with 10G links.
The new switches are Cisco Nexus series, which I had not worked with until this point (another of the many “firsts”). After working with them for awhile, and discovering which commands had changed or are now implemented differently as compared to Catalyst, I actually enjoy NX-OS and prefer the way it handles most things in the configuration. It seems more logical to me in most cases. More firsts for me included working with 10 and 40 gigabit transceivers and optics, including 4x10G fiber optic breakout cables with MPO connectors, and fiber optic shelves.
Moving to the new data center was (and still is as I write this) a process of migration. While the bulk of the move happened on a single day, much of it has been in pieces both before and after the major portion. Planning for the entire operation has been ongoing for nearly a year. The major portion consisted of moving in the servers and SANs. However, the network had to be in place to support that connectivity within the new data center, as well as still provide connectivity to the old data center while we complete the migration.
A lot of planning went into the network portion, including having all of the switches pre-configured, the network cabling pre-labeled and, where possible, pre-wired and ready to plug into the devices for instant connectivity to the rest of the network, including the old data center. I created a spreadsheet containing all of the physical interconnections, as well as device ports, switch ports, and assigned VLANs, so that we would have a reference on move day and could trace every cable to every destination if we encountered connectivity issues. We had a support person from IBM on site with us all day during the move (for our iSeries systems), and at the end of the day he told me personally that he has been a part of many data center moves, and had never seen a move go as smoothly as this one did with all the pieces falling into place. I wanted to ask him if he would put that in writing so I could put it on my resume 🙂
We decided the best way to have seamless connectivity between the old data center and the new one during the migration was with a Layer 2 Metro-E trunk from a local ISP. This let us access the new data center as if everything were still in the same room. We made the two new core switches HSRP neighbors with the old core switches, which let us shift the active gateway for the various VLANs when we were ready to do so. During different parts of the migration, we did have some traffic tromboning, but it was well within our bandwidth limits and didn’t cause any issues (including delay, since the new data center is only about 20 miles away).
However, after the majority of the systems had been moved and were in place, we did encounter a networking issue that ended up causing me to have to run across town in the middle of the night to the new data center and power-cycle a switch.
Our newest iSeries systems were placed into production during the past few months in preparation for our move. Just like how newer hypervisors replace many physical servers, the new iSeries servers do the exact same thing. A single system that used to occupy multiple racks now fits in an eighth of a single rack.
However, instead of a few 10G links, these new servers went with the old model of many 1G links (16 per server, in fact). When these servers were placed into production, they connected directly to our Catalyst 6500 core, with eight links per server going to each switch. When I configured the network to support these new servers, I was asked to make each link an 802.1q trunk, and was not asked about bonding the links together.
Unfortunately, I did not think to ask about link bonding, which ended up causing my boss and I to spend a few hours tracking down the newly-discovered issue when we migrated the servers to the new Nexus switches. This ended up being an extremely interesting problem; one that I will never forget.