My First CCIE Lab Attempt

This is the unabridged version. The abridged version is available on LinkedIn.

From the Written…

I passed the CCIE Routing & Switching v5.1 written exam in August 2017. It was a huge moment for me, and felt like a great validation of the effort I had put in to reach that point. Around the same time, my work had some Cisco Learning Credits that were about to expire, and they let me use them to purchase the full self-paced Cisco Expert Level Training (CELT, formerly 360) package, which included 25 workbook labs and 15 graded labs. I had started my CCIE training on the INE.com AAP, which I had paid for myself for a couple of years, but I personally found the labs included with CELT to be much better for preparation. Additionally, I confirmed later that the final few workbook and graded CELT labs are indeed very similar to the real thing.

Part of what stalled me when initially studying for the lab exam was the same thing that stalled me for the written exam: not knowing the true topic scope and depth. You really can’t know for sure until you take the real exam. As painful as the current $450 price of the written exam is, paying the current $1600 lab exam fee just to find out is much worse. I think Cisco (and other vendors) could do a much better job of outlining exactly what topics are going to be on the exams (and to what depth) if they’re going to charge so much to take them. The blueprint can be somewhat vague and occasionally exclusive, which makes focused studying difficult at first, though in my experience the feeling lessens as you become more familiar with the major topics.

Having access to CELT made a huge difference in my preparation because it helped narrow down both the scope and depth of topics. Both are still quite vast (as you would expect for an expert-level certification), but they no longer seemed unlimited. CELT includes introductory “lesson labs” which are mini labs that are more focused on a single technology area. While studying for the written exam, I labbed things up when I needed more clarification on a particular topic, but the reality is that I did not lab very much overall to pass the written exam. When I started studying for the lab exam, I quickly discovered that I had gained a lot of “book” knowledge, but actually implementing the technologies at the command-line is quite different.

In January 2018, I wrote about some of my thoughts and experiences studying for the lab up until that point in time. Soon after that post, I decided to release the 3500-card flashcard deck I created for myself while studying for the written exam. A couple of months later I released the CCIE R&S topic mind-map I created in an attempt to help narrow down the topic scope a little bit (so that the scope no longer seemed practically unlimited).

By this time, I had completed all of the CELT topic mini-labs along with the first five workbook labs. I was still taking things comparatively slowly and essentially “dabbling” in the work. I had not performed any graded labs yet. Since you can only perform the graded labs a single time, my reasoning was that I wanted to go through all of the self-paced workbook labs first so that I would hopefully be more prepared for the graded labs.

Toward the end of April 2018, I was presented with the opportunity to attempt the lab exam as part of the CCIE mobile labs for Cisco Live in Orlando. I knew at that moment I was nowhere near ready for the lab. But, living only 30 minutes away from where the mobile lab was being held, I also knew it would be much easier (and less stressful) to take the lab here just to see what it was like, rather than having to make the nine-hour trip to RTP in North Carolina.

Preparing, Part 1

It has been said before that actually scheduling the lab provides a huge boost in motivation to study. That certainly proved to be true with me. The first thing I did was generate a calendar to plan out the studying I wanted to accomplish before attempting the lab, which included finishing the workbook and taking all of the graded CELT labs. I knew this plan was going to make things tight, and I had to take a total of five days off of work to fit everything in. I performed CELT workbook labs during work days (so I could afford interruptions), and uninterrupted graded labs (with the exception of my cat howling) on weekends and days that I took off. I instantly went from relatively light and unfocused study to laser-focused for the next month and a half leading up to my scheduled lab date.

I took the GP01 graded lab, which is a four-hour introductory lab that you’re supposed to take when you first start studying under the CELT program. This lab only covers the core topics, and even though it was my first graded lab, I did very well on it, scoring about twice the average for that lab. It also served as my introduction to the CELT grading environment and seeing how things work (such as how the grading is performed). I felt great about how well I had done, but I reminded myself not to get overconfident, and that it was meant as just an introductory lab.

The next day I took my first full 8-hour graded lab (2hrs TS, 6hrs CFG). The first thing that surprised me was that I was able to make it through the whole day without nearly as much fatigue as I was expecting. Making it through the entire lab without being mentally exhausted in the middle gave me a lot of confidence for moving forward. I once again achieved a score that was quite a bit higher than the average score for that lab. This gave me even more confidence. However, the tides were about to turn, and the graded labs quickly became much more difficult.

As I completed more graded labs, I felt myself becoming much sharper and more capable. I felt the knowledge I gained studying for the written exam start to take hold in a much more meaningful way. For example, I knew the basic differences between DMVPN Phase 1, 2 and 3, and the basics of how IPsec works, but I had difficulty memorizing the configuration. This was remedied very quickly through the forced repetition of the graded labs. Through the configuration, it became clearer what the differences are with regard to the options available for many of the covered protocols and technologies.

The other major aspect of doing so much labbing is that you get to see the interaction of the various protocols and technologies. For instance, if OSPF neighbors are configured as point-to-multipoint, what impact will that have on reachability once you leave the OSPF routing domain? In what ways does changing the STP port-priority or interface cost affect where individual ports will be blocked?

Preparing, Part 2

As I performed the graded labs, I went over the final results and made notes along the way. This was an extremely powerful tool because it not only helped demonstrate the criteria by which the labs are graded, but it also served as a reference point of mistakes and misunderstandings to learn from. For instance, both in the graded and real labs, you are not necessarily graded on actual implementation (configuration), but rather on the results. This is because for many tasks, there are multiple ways to achieve the results. Your configuration may be 100% correct, but you still lose points because you didn’t verify the results. An example that Bruno van de Werve (the current program manager for the CCIE R&S certification) gave is adding an MD5 password to an established BGP neighbor. Sure, your configuration might be correct, but if you don’t bounce the neighbor, the neighbor will not actually be authenticated, and you’ll lose those points.

I had reached a pretty high level of confidence on the labs I had done to this point. That was about to change as I performed the final few labs. With CELT, the vast majority of the labs are what I call “CCIE v4 style”. That is, v5 topics are covered, but using a v4-style topology (generally 6 routers and 4 switches) and task arrangement. The final workbook and graded labs are a decent approximation of the real lab, which features a much larger topology, and their task arrangement is also in line with that of the real lab.

I did not perform nearly as well on these labs. I believe a large part of this was due to the change in strategy required. I mentioned the task arrangement because the v5 style labs seemed to include many more tasks for fewer points to gain. In other words, there is more to do, and more opportunities to get things wrong. If you have a four-point task with fifteen sub-tasks and you mess up on any of them, you lose all four points.

I did my very best to adjust my strategy for these tasks, which included trying to pay much more attention to the little details. The v4 style labs were generally more difficult from a technical point of view (many more esoteric topics tested and tighter interaction between protocols), but the v5 style labs were much more difficult from a practical standpoint. I frequently finished the v4 style labs an hour or more early, but that wasn’t the case for the final graded labs. My confidence had taken quite a nosedive.

By the end of the weekend prior to taking the real lab, I had completed all of the graded labs and nearly all of the workbook labs. I was at the point of being able to complete the vast majority of the labs within Notepad without even touching the CLI. I performed better than the average score on every single graded lab, but “average” is not “passing”. I felt like all of the core topics were very much drilled into me, and most of the miscellaneous topics also made more sense. For example, I am much more comfortable now with multicast than I was before I started the intensive labbing.

I was originally going into the lab with the attitude of it being just an “investigative attempt”, meaning I knew I was not ready when I scheduled the attempt. At the midpoint between scheduling the lab and the actual lab date, I was feeling pretty confident that I might have a real chance of passing. That confidence started to go downhill as I completed the final graded labs, but I still remained hopeful. Maybe the real lab wouldn’t be as difficult as the graded labs?

Lab Day Part 1: TS

During the couple of days prior to the lab, I monitored Google Maps to see what the required travel time would be from my house to the hotel where the lab was being held. Even though I know the area very well, I made sure to add some extra time to my route to account for extra traffic due to Cisco Live taking place. Traffic in the morning ended up not being an issue, and I arrived over an hour early. But, better early than late!

As the lab start time approached, everyone gathered together outside the doors of the room where the lab was being held. The proctor introduced himself and accounted for everyone. He told us he is the proctor for the CCIE labs in Tokyo. He is a very nice and patient person, as I’ll detail shortly. One person was running late, so he provided a couple of extra minutes. When that person did not show up, we went inside the room, the doors were shut, and the proctor provided some more introductory information about policies and expectations.

The mobile lab consisted of around 14 tables arranged in a large square, each containing a small workstation sitting behind two 24″ monitors, a keyboard and a mouse. The workstations run a minimal and locked-down install of Windows, with access to Notepad, Calculator, and the locked-down web browser which the lab is performed through, including access to the official Cisco documentation. The lab interface itself has been demonstrated publicly, such as this presentation from Bruno van de Werve. Clicking on the routers and switches in the diagram inside the web browser opens up a new PuTTY session to the particular device.

The lab experience was similar to what I had encountered previously in the graded CELT labs, but not quite exact, so it took a little bit of time for me to adjust to the real lab quirks after spending so much time with the practice labs. Maybe it was because I was so used to it, but I preferred the CELT interface to that of the real lab, even though they are very similar.

I proceeded through the troubleshooting section. The first thing I did was open Notepad and make a section for each of the TS tickets, including the point value and what I quickly perceived to be the issue or general topic realm based on reading the description. While I was very good at ascertaining this information during my graded labs, I discovered I was incorrect on my initial assessment for about half of the tickets once I started to work on them. I worked on the topics I felt I knew the best first, and then cycled through the remaining tickets. It was my perception that the TS section on the lab was actually quite a bit harder than what I experienced on the graded labs.

Though I did not set any specific mental hard limits on time for each ticket, I did remain mindful of the clock, made notes and moved on when I felt like I was taking too long. After I had attempted all of the tickets, I circled back around to work on them individually some more. By the time I finished my final practice labs, I was in decent shape, usually finishing 30 minutes early or more. That didn’t happen for the real lab. As I was approaching the two-hour mark, I knew that I was right on the border of the minimum cut-score, if I had even crossed that line. You know the one or two technologies you feel you’re a little weak on? Those are the topics that are going to be all over your delivery of the lab exam!

I was starting to feel a little hopeless, but because I believed I was at least close to the minimum cut-score and still somehow might have a chance, that kept me going. I should say, if I knew for certain that I had failed TS and therefore the entire lab, I would have kept going just to get the experience, but I wouldn’t have taken it as seriously from that point onward. I spent a few more minutes trying to see if I could somehow solve at least one more ticket, thinking maybe there was just something small that I hadn’t seen and would have an a-ha! moment. But of course, those don’t usually come until the drive home after the lab is over. I was at the two hour and 10 minute mark, and I knew I was not going to gain anything from the remaining 20 minutes, so I progressed into the diagnostics section.

Lab Day Part 2: DIAG

Just like with TS, I felt like DIAG was a lot harder in the real lab than in the practice labs. For DIAG, I believe this feeling was due to my limited experience with the section. CELT only provides three of them in their graded labs, and none in the workbook. Of the scenarios I received in the lab, I was confident in my answers on some, and had to make educated guesses on the others as I was running out of time. Part of the difficulty of DIAG is the inclusion of unnecessary information along with the critical information. I feel like they did a good job on my delivery of the exam in making it difficult to distinguish between the two. As I was about to leave DIAG, I felt the same as I did about TS, knowing that I must be right on the line of the minimum cut-score. Unlike with TS, I did not click to go to the next section; I let my timer run out.

The configuration section did not load properly for me. I waited about 30 seconds, and I had a blank white screen with a couple of buttons to click, one that said “End Session”. You know where this is headed. For some reason, I thought it said “End Section”, thinking I was still in DIAG and had to click that button to proceed to CFG. I clicked it, and I was immediately logged out. I went to log back in, and my login did not work. Uh-oh!

I felt the wave of adrenaline go down my back, similar to when you lock yourself out of a remote device. I approached the proctor and explained what happened. He said, “You clicked End Session?” I nodded. He gritted his teeth, slowly shook his head, and said, “I’m sorry…that’s it.” I performed some mental gymnastics trying to stay calm, and a few seconds later he told me to go back to my station and wait, and he’d contact someone to see if anything could be done. He then stood up next to me and said, “Everyone, make sure you do NOT click the ‘End Session’ button until you are ready to exit the lab completely. Once you click that button, that is the end.” How embarrassing! Now everyone in the room knew exactly what I did. It felt like a case of instant karma for my carelessness.

I returned to my station and imagined everyone looking at me in astonishment, though I knew in reality everyone was too busy concentrating and simply now knew for sure not to click that button. While I was waiting, I discovered that they made the Mouse control panel item available. I wish I had known that at the start of the lab, because the default mouse settings were much more sensitive than I was used to, and it kept throwing me off. I adjusted it more to my liking. A few minutes later, the proctor approached me and asked me to log in again. It didn’t work.

I waited a few seconds and tried again. This time it worked, and I was dropped into the configuration section. I now saw what I was expecting on my screen (a full diagram, etc.), which I did not see before this happened, so I still don’t know if there really was something wrong with my initial lab delivery. I didn’t see where my lab was being hosted before (I assumed San Jose, CA), but I saw now that my lab was being hosted in Toronto, Canada. I thanked the proctor most graciously and told him how much I appreciated his remedy. I will never know if they recovered my previous attempt and simply resumed it, or if this was a new second lab and they stitched the two of them together at the end. Regardless, I was now in the CFG section. The proctor also kindly granted me an extra 10 minutes to complete the section.

Lab Day Part 3: CFG

I started CFG as I had learned to do in the practice labs: I wrote down the major task topics for each section along with the point value for that subsection. I didn’t get too detailed or granular, I just wrote down things like “O HQ”, “E Main”, referring to OSPF in the HQ section and EIGRP in the Main section of the topology, respectively. For standalone items I just put the general topic such as “NTP” or “FHRP”. Like with the practice labs, this was just a way for me to check off items that I had attempted, finished, and verified. The topology is very large, and the tasks are not necessarily presented in the order they should be completed in, which makes having the checklist crucial to see what you’ve done and what you have left to do.

I then took about 10 minutes to read over and quickly analyze the entire lab. I was looking specifically for tasks that could be completed independently, tasks that needed to be done in a certain order, tasks that are standalone and do not depend on or affect any other tasks (stub tasks), along with a relative level of difficulty. I then started working on generating the configurations. As I had done in the practice labs, I performed as much configuration as possible in Notepad first, pasting the configs into the devices at the end of each sub-section and doing simple verifications as I went.

After an hour or so, we paused for lunch. I’ve heard many different stories about the CCIE lunch, so I was pleasantly surprised to find a catered lunch waiting for us in the next room. There was something available for just about every person’s taste, including vegetarian options. I am no stranger to eating more than I should, but I was really surprised at how many people were really loading up, especially on carb-heavy foods like cake and pie. I stuck primarily to the chicken and didn’t have any dessert, because I did not want to experience a 2pm carb crash.

We split into two different tables. We all spoke of how we felt we were doing so far, and what we were using for training to get where we are now. There were a couple of multi-CCIEs in the group who were attempting the lab for other tracks. There was also discussion of previous lab attempts at the regular lab locations. I tried to make myself look a little less foolish by explaining what had happened with me prematurely ending my lab. The proctor gave us a couple of minutes’ notice, and then we went back into the room to finish our labs.

I spent the next hour working as far as I could on the core tasks. I eventually reached a point where I knew I was going to have to put in significantly more effort to figure out how to configure the remaining core tasks correctly. There were also a couple of tasks remaining that were going to require a significant amount of configuration for very little payoff in points, but they would be required in order to complete other tasks. I took a mental break by moving onto some of the stub topics.

After doing what I could to grab some easy points, I returned to trying to finish the core tasks. I kept working as long as I could, but I eventually reached a point where I knew I was not going to be able to complete some future sections because I couldn’t figure out how to configure the prerequisite sections properly in the way they wanted. The prerequisite tasks were such that I didn’t even have any illegal options for a workaround. I had enough time, but I simply had not practiced enough with the particular technologies in order to configure things the way they needed to be, despite all of my effort over the previous month and a half. With about 20 minutes remaining (plus the extra 10 minutes I was granted), I knew that there were no additional points for me to gain in this lab, and that I had done all that I could.

I ended my session (on purpose this time), and quietly thanked the proctor before I left the lab. At this point in time, I knew I had failed the lab due to not making the overall cut-score, but I felt close to the minimum cut-score on all three sections, so I thought there was a chance I would get the dreaded PASS+PASS+PASS=FAIL score (which would still have been somewhat of a consolation prize). I walked to my car and started to head home. By this time Cisco Live was starting to ramp up, and the traffic reflected it. Even though I knew I failed, I felt like a huge weight had been lifted off of me.

Aftermath

The next morning, I received the official notice that I had failed the attempt. I did not meet the minimum cut-score on either TS or CFG, but I did pass DIAG, which surprised me since I had to guess at the end. I performed slightly worse in TS than I thought, and about what I expected in CFG. One thing about my CFG result that surprised me was that I didn’t do as well as I thought on the Layer 2 section. Layer 2 was the only section that I completely finished, and I thought I verified it too. All I can guess is there must have been something real small that I somehow missed, which unfortunately was a semi-frequent occurrence during my practice labs.

One of the things I found to be interesting was that overall, I was not as nervous during the lab as I originally thought I would be. I know this is very much due to taking all of the graded CELT labs just before the real lab. There are other factors involved as well, such as the fortune of being only 30 minutes away from the lab (a luxury I will not have next time), and not having my job depend on it (as was unfortunately the case for some of my friends). This has been entirely a personal goal, and it is difficult for me to imagine the stress of the exam being a professional requirement.

The day after was essentially the first day I had had off since I scheduled the lab toward the end of April (about 45 days between scheduling and attempt). My weekends and days I took off were the most intense, since I performed the full 8-hour graded labs on those days. The day after the lab, I relaxed and tried to start organizing my thoughts about the future. Something I did not anticipate was realizing just how worn out I was from the intense focus I had on studying, because I found it very difficult for a few days to think about anything!

I knew I just needed to give myself a little break, but a few days later I tried to get back into a routine of studying, and it just wasn’t happening. I wasn’t taking in the things I was trying to learn, and I felt like I couldn’t concentrate on anything at all. I was starting to wonder if somehow I broke my mind with this! I began to realize that when I was doing the graded labs, I had a defined path established, and it was easy to stick with that path, even though it was a lot of hard work and dedication. I still knew exactly where I was going by following it.

In studying for the CCIE, I put off a lot of other things that I want to learn at a deeper level. I was suddenly thinking about all of those things again, and I was trying to go in every direction simultaneously, which of course leads to nowhere. I decided I needed to make a new mind-map file to keep track of the things I want to learn. This was a good start to getting me back on track again mentally. Any time a new thought pops into my head regarding something to even consider learning, I can add it to this file in a hierarchical manner. This also lets me keep track of and prioritize what I feel is more important at the time.

Onward!

In the months between passing the written exam and attempting the lab, I have gone through just about every thought imaginable regarding the CCIE exam. Passing the CCIE lab exam legitimately is no joke and it requires serious dedication. I have analyzed multiple times all of the pros and cons of trying to obtain the certification from just about every perspective. Through it all, I never outright decided it was no longer worth it and that I will no longer work to obtain it, though I’ve come pretty close a few times in sheer frustration.

As it stands now, even though I did not pass this first attempt, the work I put in leading up to the attempt is something that absolutely changed my life and made me a better network engineer. There are design and implementation implications that I can now see quickly just by scanning a topology, which I was unable to see before. I more fully understand and appreciate the various options (nerd/wanker knobs) and have a much better idea of when you should and should not use them. After working through all of the labs, I have a much greater appreciation for simplicity and repeatability in network designs.

These are all things I knew about before, but the experience of actually labbing the complexity has made me extremely aware of using the simplest solution to solve the problem at hand. It is for these reasons that I am not at all regretful of the amount of effort I put into attempting the lab, even though I failed. I did feel great relief in knowing that I can now get back to my regular life for a period of time. I feel comfortable in the knowledge and experience I have gained because of this process, even if I have not yet obtained the trophy. I am grateful for how much I have learned about myself and what I am capable of, and that I discovered new tools and methods to learn things at a deep level along the way. These are things that will last me for the rest of my life, regardless of what the future holds.

For a few days after the lab, I was pondering whether what I have done so far is good enough. After all, there are many great network engineers who have no certifications whatsoever, let alone the CCIE. I’ve gained excellent experience that will carry me forward in my career. Yet I suspect there are a far greater number of senior-level network engineers who have passed at least one expert-level exam legitimately in their careers, whether or not they still maintain that certification. After weighing the decision for a few days, I purchased a voucher for my next attempt, which will occur sometime in the next 12 months.

I’m also going to move ahead with the service provider topics I want to study. In the days leading up to the lab attempt, I noticed something. Before the graded labs, I was pretty good at configuring the Layer 2 topics, and shortly into studying it became second nature. Within a couple of weeks, the same thing happened with the major IGPs. In the real lab, I had no problem rolling out configurations for Layer 2 and the IGPs. I still had to put a lot of time and consideration into configuring BGP and MPLS L3VPN. Since the service provider topics build on top of this, it occurred to me that if I study in that direction, BGP and MPLS L3VPN should eventually become second nature as well (especially BGP since it is the glue that holds everything together).

In the end, it wasn’t time or even necessarily strategy that caused me to fail the CCIE lab attempt. Having been through the entire CELT program, I also cannot say I received any tasks on the real lab for which I had no previous exposure (and there certainly wasn’t anything present that was not on the blueprint). It was my perception that the lab I received was slightly more difficult than any of the v5 style CELT labs, but not tremendously so. I need to become even better at the core topics, be better at more of the major miscellaneous topics, and I need to become even more cognizant of all the little details in the tasks. I am also very lucky to have a spouse who completely supports me in this endeavor, which really does make all the difference in the world, and I will always be grateful to her for it.

Onward to the next attempt!

Using Bridge Domain Interfaces on Cisco ASR-1K Routers

I am replacing an old Cisco 3945 router with a new ASR-1001X. The 3945, which has three gigabit Ethernet interfaces, has one connection to each of two service providers, and a single tagged link back to the network core carrying the traffic of a few different IP subnets. The ASR-1001X has six gigabit Ethernet interfaces, so when replacing the 3945 I wanted to introduce some redundancy into the network by utilizing two physical links back to the core, with each link going to a separate physical switch. This is a great use case for some kind of MLAG technology, but what if the upstream switches don’t support MLAG?

Bridge domain interfaces in IOS-XE can resolve this situation. BDIs are somewhat of a replacement for the old BVIs in classic IOS. However, BDIs are much more feature-rich and capable than BVIs, and have all kinds of extended use cases. Bridge domains are a part of the Ethernet Virtual Circuit functionality in IOS-XE, more fully described here.

For my current needs, I am going to be replacing BVI functionality with BDI. This allows for an IP address to be terminated on the router, while having both links available for failover in case one link goes down. Only one link at a time is usable due to spanning-tree, but a single link can fail with a minimum amount of downtime (on the order of a few seconds when using RSTP).

Enable STP processing and loopguard protection on the router with the following commands:

spanning-tree mode rapid-pvst
spanning-tree loopguard default

Loopguard isn’t strictly necessary, but can offer an additional layer of protection for your network.

[UPDATE: When I first wrote this post, I labbed it up in VIRL. When I went to deploy it on an actual ASR-1001X, the spanning-tree commands did not work. As I found out, this is because this functionality is not included in the ipbase license. You need advipservices or higher to follow these steps because you will need spanning-tree support to make this work. Without spanning-tree, the same MAC address is presented to both uplinks, and your upstream switch will experience MAC flapping because it sees the same MAC address on multiple ports simultaneously.]
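
As a rough sketch of checking for this before deployment, the commands below display the active license level and then set the boot level. This assumes the ASR-1001X license syntax and that you are actually entitled to the advipservices level; the change only takes effect after a reload, so treat it as an outline rather than a tested procedure.

show version | include [Ll]icense
!
configure terminal
 license boot level advipservices
end
! save the configuration and reload for the new level to take effect
write memory
reload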

The ports on the upstream switches are configured as standard 802.1Q tagged ports. The router is configured with service instances and bridge domains to classify and handle the incoming traffic. Here is an example configuration under a physical interface on the ASR-1001X:

interface g1
 no ip address
 service instance 100 ethernet
  encapsulation dot1q 100
  rewrite ingress tag pop 1 symmetric
  bridge-domain 100
 !
 service instance 200 ethernet
  encapsulation dot1q 200
  rewrite ingress tag pop 1 symmetric
  bridge-domain 200

The service instance command associates the physical interface with an Ethernet Virtual Circuit (EVC) construct in IOS-XE. The encapsulation dot1q command says that any frames received on the physical interface that carry that particular tag will be processed according to this service instance.

The rewrite ingress tag command (as configured in this example) will remove the VLAN tag before processing the frame further, since it is not necessary for this particular application of BDI. The ‘pop 1 symmetric’ portion of the command causes the router to remove the outer VLAN tag before it sends the frame to the BDI, and to re-introduce the VLAN tag as the frame moves from the BDI back to the physical interface. If you were performing QinQ, you could set the value to 2, for example.
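To make the symmetric behavior concrete, here is a toy Python model of it. This is only an illustration, not anything that runs on the router: a frame’s VLAN tags are treated as a list with the outermost tag first, and the function names are made up for this example.

```python
# Toy model of "rewrite ingress tag pop N symmetric" (illustration only).
# A frame's VLAN tags are modelled as a list, outermost tag first.

def ingress_pop(tags, count):
    """Strip `count` outer tags on the way in; return (popped, remaining)."""
    if count > len(tags):
        raise ValueError("frame carries fewer tags than the pop count")
    return tags[:count], tags[count:]

def egress_push(popped, tags):
    """Symmetric half: put the tags back on the way out.

    The router re-adds the tag(s) matched by the encapsulation command;
    this model simply reuses the ones that were popped.
    """
    return popped + tags

# Single-tagged frame on VLAN 100 with "pop 1": the BDI sees an untagged frame.
popped, inner = ingress_pop([100], 1)
print(inner)                       # []
print(egress_push(popped, inner))  # [100]

# QinQ frame (outer 100, inner 10) with "pop 2": both tags are removed, then restored.
popped, inner = ingress_pop([100, 10], 2)
print(inner)                       # []
print(egress_push(popped, inner))  # [100, 10]
```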

Finally, the bridge-domain configuration specifies the BDI to use. In my example, I matched all of the numbers in each configuration stanza to keep the configuration readable, but this is not a requirement. The three values (service instance, dot1q tag, bridge-domain) are completely independent of one another. This allows for more interesting bridging options within the realm of Ethernet Virtual Circuits.

You can use the exact same configuration on multiple interfaces, or you can specify that certain VLANs will only be processed on certain links. For example, you could configure a service instance for VLAN 300, and place it only on interface g2, and not on g1. You can additionally use per-VLAN spanning-tree values as a form of traffic engineering. For instance, you could either modify the per-VLAN spanning-tree cost on the router, or the port-priority on the upstream switch, to specify that under normal conditions, some VLANs use one link, and other VLANs use another link. Just be careful to not oversubscribe your links so that if there is a failure, all traffic can still be carried across the surviving link(s).
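If you end up with more than a couple of VLANs and links, the stanzas get repetitive. As a rough sketch, a few lines of Python can generate them from a small plan. The plan layout and the function name are my own invention for this example; the generated commands are the same ones shown above, with VLAN 300 placed only on g2.

```python
# Sketch: build service-instance stanzas per interface from a small plan.
# The plan layout and function name are hypothetical, not part of IOS-XE.

def service_instances(interface, entries):
    """entries: list of (instance_id, dot1q_tag, bridge_domain) tuples."""
    lines = [f"interface {interface}", " no ip address"]
    for instance_id, tag, bridge_domain in entries:
        lines += [
            f" service instance {instance_id} ethernet",
            f"  encapsulation dot1q {tag}",
            "  rewrite ingress tag pop 1 symmetric",
            f"  bridge-domain {bridge_domain}",
            " !",
        ]
    return "\n".join(lines)

plan = {
    "g1": [(100, 100, 100), (200, 200, 200)],
    # VLAN 300 is only carried on g2
    "g2": [(100, 100, 100), (200, 200, 200), (300, 300, 300)],
}

for interface, entries in plan.items():
    print(service_instances(interface, entries))
```

Keeping the three numbers as separate tuple fields mirrors the fact that they do not have to match.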

Finally, configure the BDIs:

interface BDI100
ip address 10.10.10.1 255.255.255.0
no shutdown

interface BDI200
ip address 10.20.20.1 255.255.255.0
no shutdown

You can use the command show spanning-tree vlan X to verify the redundant links from an STP point of view. Try pinging a few addresses in the same subnets. You can troubleshoot connectivity with show bridge-domain X and show arp. The first command will reveal if the destination MAC was learned on a particular interface (similar to show mac address-table on a switch), and show arp will reveal if the ARP process was successful for a particular IP address. I had some interesting issues during configuration on virtual equipment for a lab proof-of-concept, and these commands helped isolate where the issue was. In the virtual case, simply rebooting the virtual router solved the issue.
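If you want to gather that output for several bridge domains at once, something like the following sketch would do it. It assumes you already have an established Netmiko connection (anything with a send_command() method will work), and that the VLAN ID and bridge-domain ID match as they do in my example. The function and the stand-in class are hypothetical names for this illustration.

```python
# Sketch: collect the verification output for one bridge domain.
# `conn` is assumed to be an established Netmiko connection; the VLAN ID
# and bridge-domain ID are assumed to be the same number.

def collect_bdi_checks(conn, vlan):
    commands = [
        f"show spanning-tree vlan {vlan}",
        f"show bridge-domain {vlan}",
        "show arp",
    ]
    # Map each command to the text the device returned for it
    return {command: conn.send_command(command) for command in commands}

class DryRunConnection:
    """Stand-in used to try the function without a router."""
    def send_command(self, command):
        return f"(output of '{command}' would appear here)"

for command, output in collect_bdi_checks(DryRunConnection(), 100).items():
    print(command)
    print(output)
```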

Someone reading this might be critical of relying on STP for redundancy instead of using a modern solution like MLAG. This particular solution offers a level of redundancy that does not require MLAG. The tradeoff is a few seconds of dropped traffic if STP has to reconverge around a failed link. As with all things, the tradeoff primarily involves money, and using the resources you have available to solve business needs as best as you can. This solution still beats having a single physical link with no redundancy. Previously, if the single link failed, it would mean an immediate trip to the datacenter. With the new redundancy, a failed link still probably means a trip to the datacenter, but maybe not in the middle of the night.  😛

Automating Labs…Now With YAML and Multi-Threading!

The automation described in my last post had a couple of glaring flaws. I quickly discovered the inflexibility of using a CSV file for the data source as I started to add more variables to each device. The second flaw was that for approximately 30 devices, it took about 20 minutes to generate and push the device configurations, because each device was processed serially.

I solved the first issue by using a YAML file for the data source. I initially went with a CSV file because I had not yet developed an IP addressing scheme, and I found it easier to do that in a row-and-column format. However, as I was developing the Jinja2 template, it became apparent that the CSV file wasn’t going to cut it since each device has (or will have) customizations that won’t apply to all (or even a good portion) of the devices.

For example, I am configuring basic IS-IS routing for the service provider IGP, but the CE devices will not be running that protocol. The CE devices represent nearly half of my lab, so having IS-IS options within the CSV file seemed like a waste. This led me to think a little deeper about the information I wanted to represent for each device, and YAML’s immense flexibility seemed like the perfect fit. I would also consider using a SQLite database if I were dealing with hundreds or more devices.

The most time-consuming part of learning to work with YAML files in Python is discovering how to access your data. It’s very easy to write a YAML file, but it takes some thought and testing to get the data back out (which, like most things in life, I’m sure gets easier with more experience and exposure).

Here is an example device from my YAML file:

---
PE1:
  hostip: 192.168.196.22
  port: 32795
  interfaces:
    # Interface, IP, Mask, MPLS Enabled?
    - ['lo1', '10.255.1.1', '255.255.255.255', True]
    - ['g0/0', '10.1.81.1', '255.255.255.254', True]
    - ['g0/1', '10.3.11.1', '255.255.255.254', True]
    - ['g0/2', '10.3.12.1', '255.255.255.254', True]
    - ['g0/3', '10.3.13.1', '255.255.255.254', True]
    - ['g0/4', '10.1.71.1', '255.255.255.254', True]
    - ['g0/5', '10.1.11.1', '255.255.255.254']
  isis:
    net: 49.0001.0000.0000.0010.00
    interfaces: ['lo1', 'g0/0', 'g0/4', 'g0/5']
  bgpasn: 65000
  bgp_peers:
    # Peer IP, Peer ASN, Update Source, Next-Hop-Self
    - ['10.255.1.2', '65000', 'lo1', True]
    - ['10.255.1.3', '65000', 'lo1', True]
    - ['10.255.1.4', '65000', 'lo1', True]
    - ['10.255.1.5', '65000', 'lo1', True]
    - ['10.255.1.6', '65001']

This device is described by its management IP and port, IP interfaces, IS-IS and BGP options. If I were to configure another device and it was not going to run IS-IS, I would merely leave the isis: section out.
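Since getting the data back out was the part that took me the most testing, here is a short sketch of the access patterns. To keep it runnable without PyYAML, a trimmed-down PE1 is written out as the dictionary that the YAML parser would hand back; the "ospf" key at the end is just a made-up example of a section that is not present.

```python
# What the YAML parser would return for a trimmed-down version of PE1
# (written out as a literal so the example runs without PyYAML).
devices = {
    "PE1": {
        "hostip": "192.168.196.22",
        "port": 32795,
        "interfaces": [
            ["lo1", "10.255.1.1", "255.255.255.255", True],
            ["g0/5", "10.1.11.1", "255.255.255.254"],  # no MPLS flag given
        ],
        "isis": {
            "net": "49.0001.0000.0000.0010.00",
            "interfaces": ["lo1", "g0/5"],
        },
        "bgpasn": 65000,
    }
}

pe1 = devices["PE1"]
print(pe1["port"])              # 32795
print(pe1["isis"]["net"])       # 49.0001.0000.0000.0010.00
print(pe1["interfaces"][0][1])  # 10.255.1.1

# Optional list items need a guard in plain Python: unpack the three
# required fields and treat a missing fourth item as False.
for name, ip, mask, *flags in pe1["interfaces"]:
    mpls = flags[0] if flags else False
    print(name, ip, mask, mpls)

# Optional sections can be tested with "in" or fetched with .get()
print("isis" in pe1)    # True
print(pe1.get("ospf"))  # None
```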

After the YAML file is imported, it is processed by my Jinja2 template. Here is an example:

hostname {{ host }}
no ip domain-lookup

{%- if isis %}
router isis
 net {{ isis['net'] }}
{%- endif %}

{%- for iface in interfaces %}
interface {{ iface[0] }}
 ip address {{ iface[1] }} {{ iface[2] }}
{%- if isis %}
{%- if iface[0] in isis['interfaces'] %}
 ip router isis
{%- endif %}
{%- endif %}
{%- if iface[3] %}
 mpls ip
{%- endif %}
 no shutdown
{%- endfor %}

{%- if bgpasn %}
router bgp {{ bgpasn }}
 {%- for peer in bgp_peers %}
 neighbor {{ peer[0] }} remote-as {{ peer[1] }}
 {%- if peer[2] %}
 neighbor {{ peer[0] }} update-source {{ peer[2] }}
 {%- endif %}
 {%- if peer[3] %}
 neighbor {{ peer[0] }} next-hop-self
 {%- endif %}
 {%- endfor %}
{%- endif %}

end

All devices will be configured with a hostname and the no ip domain-lookup option. If the device is going to run IS-IS, that is configured, and if not, that section is skipped. Each specified interface is then configured with its IP address and mask. If the interface will participate in IS-IS or MPLS, that is configured. If the router will participate in BGP, that is configured as well. This Jinja2 template shows a generic device, but as I displayed in my last post, this can easily be modified for individual devices as well (if device == ‘Whatever’). This template also demonstrates examples of nested looping, which takes a little bit of time to test and work out the logic. Once it clicks, though, it is a thing of beauty!

I solved the timing issue with Python’s threading module. In my lab configuration script, the YAML file is read into a Python dictionary. Then, for each device in the YAML file, I start a thread that calls my function to generate and push that device’s configuration. The devices are processed concurrently instead of one after another, which cut the lab configuration generation and deployment from 20 minutes to less than one.

Here is my Python script to glue the YAML and Jinja2 files together:

#!/usr/bin/env python3
import yaml
import jinja2
import time
from netmiko import Netmiko
import threading

yaml_file = 'hosts.yml'
jinja_template = 'jtemp.j2'

# Generate the configurations and send it to the devices
def confgen(vars):
    # Generate configuration lines with Jinja2
    with open(jinja_template) as f:
        tfile = f.read()
    template = jinja2.Template(tfile)
    # Split the rendered text into a list of individual command lines
    cfg_list = template.render(vars).splitlines()

    # Connect directly to host via telnet on the specified port
    conn = Netmiko(host=vars['hostip'], device_type='cisco_ios_telnet', port=vars['port'])

    # Check if host is in initial config state
    conn.write_channel("\n")
    time.sleep(1)
    output = conn.read_channel()
    if 'initial configuration dialog' in output:
        conn.write_channel('no\n')
        time.sleep(1)

    # Send generated commands to host
    output = conn.enable()
    output = conn.send_config_set(cfg_list)

    # Display results
    print('-' * 80)
    print('\nConfiguration applied on ' + vars['host'] + ': \n\n' + output)
    print('-' * 80)

    # Probably a good idea
    conn.disconnect()

# Parse the YAML file
with open(yaml_file) as f:
    read_yaml = yaml.safe_load(f)  # Converts YAML file to dictionary

# Take imported YAML dictionary and start multi-threaded configuration generation
for hosts, vars in read_yaml.items():
    # Add host to vars dictionary
    host = {'host': hosts}
    vars.update(host)

    # Send vars dictionary to confgen function using multi-threading, one thread per-host
    threads = threading.Thread(target=confgen, args=(vars,))
    threads.start()

Threads = threading.Thread. I love it!
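One variation worth sketching: the script above starts one thread per host and never collects the results. The standard library’s concurrent.futures module can cap the number of simultaneous sessions and report which devices failed. This is an alternative take rather than what my script does; push_config() is a placeholder for confgen(), and the PE2 entry is invented for the example.

```python
# Sketch: fan out with a bounded thread pool, wait for every device, and
# record failures. push_config() is a stand-in for confgen().
from concurrent.futures import ThreadPoolExecutor, as_completed

def push_config(device_vars):
    # Placeholder for the real work (render template, connect, send config)
    return f"configured {device_vars['host']}"

def push_all(inventory, worker=push_config, max_workers=10):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for host, device_vars in inventory.items():
            # Add the host name to a copy of the device's variables
            futures[pool.submit(worker, {**device_vars, "host": host})] = host
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # one bad device should not stop the rest
                results[host] = f"FAILED: {exc}"
    return results

inventory = {
    "PE1": {"hostip": "192.168.196.22", "port": 32795},
    "PE2": {"hostip": "192.168.196.22", "port": 32796},  # hypothetical entry
}
for host, result in sorted(push_all(inventory).items()):
    print(host, result)
```

The max_workers value of 10 is arbitrary; for a lab of about 30 devices you could raise it to process every device at once.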

Automating Labs with Python, Jinja2, and Netmiko

Following up on my last post, I have set out to start automating certain aspects of my labs. I spent a few days going over the material from Kirk Byers’ highly recommended Python for Network Engineers course. I studied the previous version of his course a couple of years ago (covering Python 2), but this new version, which covers Python 3, is even better.

I came up with a generic topology that was purposely overengineered so that I can enable and disable links on-demand to create different logical topologies without having to interact with the physical lab topology. The lab represents a single service provider core network, multiple customer sites, and two SP-attached Internet connections. Most links will remain disabled for most lab scenarios, but are there for various cross-site, DIA and backdoor options available with this design.

To automate the baseline configuration, I created a Python script that imports the inventory from a CSV file, uses a Jinja2 template to generate the configuration for each device, and Netmiko to push the configuration to the devices. It’s kind of funny to succinctly place into a blog post something that took many hours to test and troubleshoot before coming up with the final version. The best part of gaining this kind of experience is that I can use what I have already done as a template moving forward, whether for the lab or for actual production.

The CSV file is straight-forward. The header row contains the variables for each device, such as the name, management IP, port, and interface IP addresses. Each subsequent row defines individual devices:


The Jinja2 template defines the configuration for all devices. It is populated with each device’s variables, and it can also cover device-specific configurations:

hostname {{ device }}

interface lo1
 ip address {{ lo1ip }} 255.255.255.255

{%- if ifg00ip %}
interface g0/0
 ip address {{ ifg00ip }} {{ ifg00mask }}
 no shutdown
{%- endif %}

{%- if device == 'P1' %}
int lo2
 ip address 2.2.2.2 255.255.255.255
{%- endif %}

With this example, every device is configured with the device-specific hostname. Every device is configured with a lo1 loopback address. If the device has an IP address configured for interface g0/0, the IP and mask are configured, along with making sure the interface is not shutdown. If the g0/0 IP address is not specified in the CSV file for this particular device, that configuration section is skipped. Likewise, the final section of the template will only be used if the device is ‘P1’. All other devices will skip this particular configuration section.

The Python script is the glue between the CSV file, configuration generation, and actual configuration deployment. The script imports the csv, jinja2, time and netmiko libraries. The script then defines variables for the CSV and Jinja2 files. Next, the CSV file is imported. The details of individual devices are placed into a dictionary, and each dictionary is placed into a list representing all devices. The script then generates the configuration for each device by feeding the details into the Jinja2 template. Netmiko is then used to send the output of the Jinja2 processing to the actual devices.
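As a minimal sketch of just the import step, the following reads a CSV into a list of per-device dictionaries using only the standard library. The lo1ip, ifg00ip, ifg00mask and device column names follow the template variables above; the mgmtip and port column names and all of the values are made up for illustration, and an in-memory string stands in for the real file.

```python
import csv
import io

# Stand-in for the CSV file. Column names follow the template variables;
# mgmtip/port and all values are hypothetical.
csv_text = """device,mgmtip,port,lo1ip,ifg00ip,ifg00mask
P1,192.0.2.10,32769,10.255.0.1,10.0.0.0,255.255.255.254
CE1,192.0.2.10,32770,10.255.9.1,,
"""

def load_devices(handle):
    """Return one dictionary per CSV row, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(handle)]

devices = load_devices(io.StringIO(csv_text))
for device in devices:
    # Blank cells arrive as empty strings, which a Jinja2 "if" treats as false
    print(device["device"], repr(device["ifg00ip"]))
```

With a real file you would pass open('yourfile.csv', newline='') instead of the StringIO object. Each dictionary can then be handed to the Jinja2 template’s render() call, and the empty-string cells are what make the if ifg00ip test skip that section.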

This kind of automation is perfect for the lab, because the CSV file represents certain baseline aspects that are not going to change, such as the IP addressing of the links between all of the service provider ‘P’ routers. The Jinja2 template can then be modified for different lab scenarios, depending on how much configuration you want to build into the baseline, per-scenario. The script could even be expanded so that it selects a different Jinja2 template based on a menu of possible scenarios. This same type of scripting setup could be used on a production network to set up new sites or push certain standardized configurations (such as enabling NetFlow on all devices). There are all kinds of possibilities.

Why Network Automation?

I have been wanting to get a little deeper into various technologies surrounding MPLS and BGP-based VPNs (beyond basic L3VPN, such as L2VPN, QoS, multicast, EVPN, etc.), so I assembled a virtual lab with approximately 30 routers which represent a service provider core and several “customer” sites, along with two sources of fake Internet connectivity (or more accurately, a simulated Default-Free Zone (DFZ)). After I earn a deeper understanding of topics within a single service provider core, I will expand this to inter-provider topics. Yes, I meant “earn”, since more work will be involved beyond just reading.

I was getting ready to develop an IP addressing scheme for the core network, and I realized I have a good opportunity here to get deeper into network automation. While studying for the CCIE R&S lab, I spend quite a lot of time in a text editor building configurations to review before pasting them into the device consoles. For tasks that involve repetitive configurations, copy-and-paste is my friend. You don’t (yet) have access to anything like Python or Ansible in the CCIE R&S lab to try to automate things (though I suppose you could use TCL if you really wanted to).

A good portion of setting up a large lab environment of any kind is developing and applying the baseline configuration. I’ve done this countless times over the years, and I was getting ready to do it yet again when it occurred to me that if I invest some time now to develop and use some network automation processes, the buildup and teardown of future labs will be so much quicker. I’ve dabbled with this in the past; I learned the basics of Python and have developed a few network-oriented scripts. I found I enjoy working through the logic and seeing the working results. I also developed and deployed a simple Ansible script to push out some configurations on my current production network.

I read the mostly-complete “rough cuts” version of Network Programmability and Automation by Jason Edelman, Matt Oswalt, and Scott Lowe. This is a really fantastic book, and along with Russ White and Ethan Banks’ new book, I consider it an absolute must-read for anyone wishing to progress their career in computer networking and establish a very strong set of foundational knowledge (I swear I’m not trying to name-drop, but I’ve read a LOT of networking books, and these really are toward the top of the list). When I read Network Programmability and Automation the first time, I used the knowledge as an overview of some of the things that are possible within the realm of network automation. Now I’m going through it again to develop the skills necessary to automate the deployment of my lab configurations.

One thing I believe hinders many people wanting to dig deeper into automation (myself included) is not having a use case. It’s easy enough to say that if you have to do a single task more than once, you should automate it. Automate all the things, right? There are two issues I see here: the underlying knowledge of what it is you’re trying to automate, and the ability to evaluate the “bigger picture”.

For example, within the context of networking, you could learn how to automate the deployment of complex configurations for a particular routing protocol, but what good is that going to do if you don’t fully understand what those configurations mean? Automation presents you with an opportunity to make mistakes much faster, and on a much grander scale. If you automate configurations among several devices and things end up not working as anticipated, can you troubleshoot what went wrong?

Likewise, evaluating the bigger picture helps to understand where particular forms of automation are helpful, and where you will run into diminishing returns. For example, you could automate just about every process involved in network configuration, but nearly every business is going to have exceptions that need to be handled somehow, and automation may not be the answer in those instances.

Tying both of these concepts together, I realized the opportunity to automate the things I know extremely well due to my previous knowledge and experience, such as baseline configurations involving hostnames, IP addresses, basic routing protocol configuration, etc. Because I know how all of these things work very well, I can easily automate this as well as troubleshoot potential issues if things don’t go as expected. In the bigger picture aspect, the purpose of the lab is for me to understand other topics that use the baseline configuration as a prerequisite, and therefore I am not yet ready to automate those technologies because I do not yet have a full understanding of their nuances.

In other words, the more you learn, the more you can automate. You need to develop skills on how to automate things, but if you automate things you do not understand, you are setting yourself up for future frustration. Don’t let this discourage you from learning and getting deeper into both automation and classical network engineering skills. Increasingly, the two go hand-in-hand, but you can certainly end up in a chicken-or-the-egg scenario. My advice is to “earn” the networking skills first, and automate them second.