Using Bridge Domain Interfaces on Cisco ASR-1K Routers

I am replacing an old Cisco 3945 router with a new ASR-1001X. The 3945, which has three gigabit Ethernet interfaces, has one connection to each of two service providers, and a single tagged link back to the network core carrying the traffic of a few different IP subnets. The ASR-1001X has six gigabit Ethernet interfaces, so when replacing the 3945 I wanted to introduce some redundancy into the network by using two physical links back to the core, with each link going to a separate physical switch. This is a great use case for some kind of MLAG technology, but what if the upstream switches don’t support MLAG?

Bridge domain interfaces in IOS-XE can resolve this situation. BDIs are somewhat of a replacement for the old BVIs in classic IOS. However, BDIs are much more feature-rich and capable than BVIs, and have all kinds of extended use cases. Bridge domains are a part of the Ethernet Virtual Circuit functionality in IOS-XE, more fully described here.

For my current needs, I am going to be replacing BVI functionality with BDI. This allows for an IP address to be terminated on the router, while having both links available for failover in case one link goes down. Only one link at a time is usable due to spanning-tree, but a single link can fail with a minimum amount of downtime (on the order of a few seconds when using RSTP).

Enable STP processing and loopguard protection on the router with the following commands:

spanning-tree mode rapid-pvst
spanning-tree loopguard default

Loopguard isn’t strictly necessary, but can offer an additional layer of protection for your network.

[UPDATE: When I first wrote this post, I labbed it up in VIRL. When I went to deploy it on an actual ASR-1001X, the spanning-tree commands did not work. As I found out, this is because this functionality is not included in the ipbase license. You need advipservices or higher to follow these steps because you will need spanning-tree support to make this work. Without spanning-tree, the same MAC address is presented to both uplinks, and your upstream switch will experience MAC flapping because it sees the same MAC address on multiple ports simultaneously.]

The ports on the upstream switches are configured as standard 802.1Q tagged ports. The router is configured with service instances and bridge domains to classify and handle the incoming traffic. Here is an example configuration under a physical interface on the ASR-1001X:

interface g1
 no ip address
 service instance 100 ethernet
  encapsulation dot1q 100
  rewrite ingress tag pop 1 symmetric
  bridge-domain 100
 !
 service instance 200 ethernet
  encapsulation dot1q 200
  rewrite ingress tag pop 1 symmetric
  bridge-domain 200

The service instance command associates the physical interface with an Ethernet Virtual Circuit (EVC) construct in IOS-XE. The encapsulation dot1q command says that any frames received on the physical interface that carry that particular tag will be processed according to this service instance.

The rewrite ingress tag command (as configured in this example) will remove the VLAN tag before processing the frame further, since it is not necessary for this particular application of BDI. The ‘pop 1 symmetric’ portion of the command causes the router to remove the outer VLAN tag before it sends the frame to the BDI, and to re-introduce the VLAN tag as the frame moves from the BDI back to the physical interface. If you were performing QinQ, you could set the value to 2, for example.

Finally, the bridge-domain configuration specifies the BDI to use. In my example, I matched all of the numbers in each configuration stanza as a best practice for good configuration readability, but this is not a requirement. Each of the three values (service instance, dot1q tag, bridge-domain) is completely independent of the others. This is to allow for more interesting bridging options within the realm of Ethernet Virtual Circuits.

You can use the exact same configuration on multiple interfaces, or you can specify that certain VLANs will only be processed on certain links. For example, you could configure a service instance for VLAN 300, and place it only on interface g2, and not on g1. You can additionally use per-VLAN spanning-tree values as a form of traffic engineering. For instance, you could either modify the per-VLAN spanning-tree cost on the router, or the port-priority on the upstream switch, to specify that under normal conditions, some VLANs use one link, and other VLANs use another link. Just be careful to not oversubscribe your links so that if there is a failure, all traffic can still be carried across the surviving link(s).
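As a rough illustration of how the three numbers relate, here is a small Python sketch (not part of my actual deployment) that renders service instance stanzas from a per-interface plan. The function names, the plan dictionary, and the choice of 300 for all three values on g2 are assumptions made for the example; the generated lines simply mirror the configuration shown above.

```python
# Hypothetical helper: renders EVC service instance stanzas as text.
# The plan below is illustrative only; adjust the IDs to your own network.

def service_instance(instance_id, vlan_tag, bridge_domain):
    """Return the config lines for one service instance.

    The three numbers are independent of each other; matching them is
    only a readability convention.
    """
    return [
        f" service instance {instance_id} ethernet",
        f"  encapsulation dot1q {vlan_tag}",
        "  rewrite ingress tag pop 1 symmetric",
        f"  bridge-domain {bridge_domain}",
    ]


def render_interface(name, instances):
    """Return the full stanza for one physical interface."""
    lines = [f"interface {name}", " no ip address"]
    for instance_id, vlan_tag, bridge_domain in instances:
        lines += service_instance(instance_id, vlan_tag, bridge_domain)
        lines.append(" !")
    return lines


# (service instance, dot1q tag, bridge-domain) tuples per interface.
plan = {
    "g1": [(100, 100, 100), (200, 200, 200)],
    "g2": [(100, 100, 100), (200, 200, 200), (300, 300, 300)],
}

config = {name: render_interface(name, inst) for name, inst in plan.items()}

for name in plan:
    print("\n".join(config[name]))
```

In this plan VLAN 300 exists only on g2, so it has no redundant path; the other two VLANs appear on both links.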

Finally, configure the BDIs:

interface BDI100
 ip address 10.10.10.1 255.255.255.0
 no shutdown

interface BDI200
 ip address 10.20.20.1 255.255.255.0
 no shutdown

You can use the command show spanning-tree vlan X to verify the redundant links from an STP point of view. Try pinging a few addresses in the same subnets. You can troubleshoot connectivity with show bridge-domain X and show arp. The first command will reveal if the destination MAC was learned on a particular interface (similar to show mac address-table on a switch), and show arp will reveal if the ARP process was successful for a particular IP address. I had some interesting issues during configuration on virtual equipment for a lab proof-of-concept, and these commands helped isolate where the issue was. In the virtual case, simply rebooting the virtual router solved the issue.

Someone reading this might be critical of relying on STP for redundancy instead of using a modern solution like MLAG. This particular solution offers a level of redundancy that does not require MLAG. The tradeoff is a few seconds of dropped traffic if STP has to reconverge around a failed link. As with all things, the tradeoff primarily involves money, and using the resources you have available to solve business needs as best as you can. This solution still beats having a single physical link with no redundancy. Previously, if the single link failed, it would mean an immediate trip to the datacenter. With the new redundancy, a failed link still probably means a trip to the datacenter, but maybe not in the middle of the night.  😛

Automating Labs…Now With YAML and Multi-Threading!

The automation described in my last post had a couple of glaring flaws. I quickly discovered the inflexibility of using a CSV file for the data source as I started to add more variables to each device. The second flaw was that for approximately 30 devices, it took about 20 minutes to generate and push the device configurations, because each device was processed serially.

I solved the first issue by using a YAML file for the data source. I initially went with a CSV file because I had not yet developed an IP addressing scheme, and I found it easier to do that in a row-and-column format. However, as I was developing the Jinja2 template, it became apparent that the CSV file wasn’t going to cut it since each device has (or will have) customizations that won’t apply to all (or even a good portion) of the devices.

For example, I am configuring basic IS-IS routing for the service provider IGP, but the CE devices will not be running that protocol. The CE devices represent nearly half of my lab, so having IS-IS options within the CSV file seemed like a waste. This led me to think a little deeper about the information I wanted to represent for each device, and YAML’s immense flexibility seemed like the perfect fit. I would also consider using a SQLite database if I were dealing with hundreds or more devices.

The most time-consuming part of learning to work with YAML files in Python is discovering how to access your data. It’s very easy to write a YAML file, but it takes some thought and testing to get the data back out (which, like most things in life, I’m sure gets easier with more experience and exposure).

Here is an example device from my YAML file:

---
PE1:
  hostip: 192.168.196.22
  port: 32795
  interfaces:
    # Interface, IP, Mask, MPLS Enabled?
    - ['lo1', '10.255.1.1', '255.255.255.255', True]
    - ['g0/0', '10.1.81.1', '255.255.255.254', True]
    - ['g0/1', '10.3.11.1', '255.255.255.254', True]
    - ['g0/2', '10.3.12.1', '255.255.255.254', True]
    - ['g0/3', '10.3.13.1', '255.255.255.254', True]
    - ['g0/4', '10.1.71.1', '255.255.255.254', True]
    - ['g0/5', '10.1.11.1', '255.255.255.254']
  isis:
    net: 49.0001.0000.0000.0010.00
    interfaces: ['lo1', 'g0/0', 'g0/4', 'g0/5']
  bgpasn: 65000
  bgp_peers:
    # Peer IP, Peer ASN, Update Source, Next-Hop-Self
    - ['10.255.1.2', '65000', 'lo1', True]
    - ['10.255.1.3', '65000', 'lo1', True]
    - ['10.255.1.4', '65000', 'lo1', True]
    - ['10.255.1.5', '65000', 'lo1', True]
    - ['10.255.1.6', '65001']

This device is described by its management IP and port, IP interfaces, IS-IS and BGP options. If I were to configure another device and it was not going to run IS-IS, I would merely leave the isis: section out.
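Since getting the data back out is the part that takes practice, here is a minimal sketch of what the parsed result looks like. I am assuming the file is loaded with PyYAML's yaml.safe_load; to keep the snippet free of third-party imports, the dictionary below is written out by hand as the structure that load should produce for a trimmed-down PE1 (fewer interfaces than the real entry, and no BGP peers).

```python
# Hand-written stand-in for what yaml.safe_load() would return for a
# trimmed-down version of the PE1 entry (assumption: PyYAML is the loader).
devices = {
    "PE1": {
        "hostip": "192.168.196.22",
        "port": 32795,
        "interfaces": [
            ["lo1", "10.255.1.1", "255.255.255.255", True],
            ["g0/0", "10.1.81.1", "255.255.255.254", True],
            ["g0/5", "10.1.11.1", "255.255.255.254"],  # no MPLS flag given
        ],
        "isis": {
            "net": "49.0001.0000.0000.0010.00",
            "interfaces": ["lo1", "g0/0", "g0/5"],
        },
        "bgpasn": 65000,
    }
}

pe1 = devices["PE1"]

# Scalars are plain dictionary lookups.
mgmt = (pe1["hostip"], pe1["port"])

# Each interface is a list, so its fields are reached by position.
interface_names = [iface[0] for iface in pe1["interfaces"]]

# The MPLS flag is optional; treat a missing fourth element as False.
mpls_interfaces = [
    iface[0] for iface in pe1["interfaces"] if len(iface) > 3 and iface[3]
]

# Optional sections: .get() returns None when a device omits "isis:".
isis = pe1.get("isis")
isis_net = isis["net"] if isis else None

print(mgmt, interface_names, mpls_interfaces, isis_net)
```

Positional lists keep the YAML compact, but the comment line in the file is the only record of what each position means. A list of small dictionaries would be more verbose and harder to misread.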

After the YAML file is imported, it is processed by my Jinja2 template. Here is an example:

hostname {{ host }}
no ip domain-lookup

{%- if isis %}
router isis
 net {{ isis['net'] }}
{%- endif %}

{%- for iface in interfaces %}
interface {{ iface[0] }}
 ip address {{ iface[1] }} {{ iface[2] }}
{%- if isis %}
{%- if iface[0] in isis['interfaces'] %}
 ip router isis
{%- endif %}
{%- endif %}
{%- if iface[3] %}
 mpls ip
{%- endif %}
 no shutdown
{%- endfor %}

{%- if bgpasn %}
router bgp {{ bgpasn }}
 {%- for peer in bgp_peers %}
 neighbor {{ peer[0] }} remote-as {{ peer[1] }}
 {%- if peer[2] %}
 neighbor {{ peer[0] }} update-source {{ peer[2] }}
 {%- endif %}
 {%- if peer[3] %}
 neighbor {{ peer[0] }} next-hop-self
 {%- endif %}
 {%- endfor %}
{%- endif %}

end

All devices will be configured with a hostname and the no ip domain-lookup option. If the device is going to run IS-IS, that is configured, and if not, that section is skipped. Each specified interface is then configured with its IP address and mask. If the interface will participate in IS-IS or MPLS, that is configured. If the router will participate in BGP, that is configured as well. This Jinja2 template shows a generic device, but as I displayed in my last post, this can easily be modified for individual devices as well (if device == ‘Whatever’). This template also demonstrates examples of nested looping, which takes a little bit of time to test and work out the logic. Once it clicks, though, it is a thing of beauty!
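If the nested conditions are hard to follow in template syntax, it can help to restate them in plain Python. The function below is only an illustrative equivalent of the IS-IS and interface portions of the template, not something my script uses; render_interfaces and the trimmed sample device are made up for the example.

```python
def render_interfaces(device):
    """Mirror the template's IS-IS block and interface loop in plain Python."""
    lines = []
    isis = device.get("isis")  # None when the device does not run IS-IS

    if isis:
        lines += ["router isis", f" net {isis['net']}"]

    for iface in device["interfaces"]:
        name, ip, mask = iface[0], iface[1], iface[2]
        lines.append(f"interface {name}")
        lines.append(f" ip address {ip} {mask}")
        # Nested check: only relevant when the isis section exists.
        if isis and name in isis["interfaces"]:
            lines.append(" ip router isis")
        # The MPLS flag is optional, so guard against short lists.
        if len(iface) > 3 and iface[3]:
            lines.append(" mpls ip")
        lines.append(" no shutdown")
    return lines


sample = {
    "interfaces": [
        ["lo1", "10.255.1.1", "255.255.255.255", True],
        ["g0/5", "10.1.11.1", "255.255.255.254"],
    ],
    "isis": {"net": "49.0001.0000.0000.0010.00", "interfaces": ["lo1", "g0/5"]},
}

rendered = render_interfaces(sample)
print("\n".join(rendered))
```

Calling the same function on a device without an isis key skips both the router isis block and the per-interface ip router isis lines, which is the behavior the two nested if statements in the template provide.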

I solved the timing issue when I discovered Python’s threading library. In my lab configuration script, the YAML file is read into a Python dictionary. Then, for each device represented in the YAML file, I pass its variables to a new thread, which calls my function to generate and push the configuration. The devices are effectively processed simultaneously, which cut the lab configuration generation and deployment from 20 minutes to less than one.

Here is my Python script to glue the YAML and Jinja2 files together:

#!/usr/bin/env python3
import yaml
import jinja2
import time
from netmiko import Netmiko
import threading

yaml_file = 'hosts.yml'
jinja_template = 'jtemp.j2'

# Generate the configurations and send it to the devices
def confgen(vars):
    # Generate configuration lines with Jinja2
    with open(jinja_template) as f:
        tfile = f.read()
    template = jinja2.Template(tfile)
    cfg_list = template.render(vars).splitlines()  # One command per list item

    # Connect directly to host via telnet on the specified port
    conn = Netmiko(host=vars['hostip'], device_type='cisco_ios_telnet', port=vars['port'])

    # Check if host is in initial config state
    conn.write_channel("\n")
    time.sleep(1)
    output = conn.read_channel()
    if 'initial configuration dialog' in output:
        conn.write_channel('no\n')
        time.sleep(1)

    # Send generated commands to host
    output = conn.enable()
    output = conn.send_config_set(cfg_list)

    # Display results
    print('-' * 80)
    print('\nConfiguration applied on ' + vars['host'] + ': \n\n' + output)
    print('-' * 80)

    # Probably a good idea
    conn.disconnect()

# Parse the YAML file
with open(yaml_file) as f:
    read_yaml = yaml.safe_load(f)  # Converts YAML file to dictionary

# Take imported YAML dictionary and start multi-threaded configuration generation
for hosts, vars in read_yaml.items():
    # Add host to vars dictionary
    host = {'host': hosts}
    vars.update(host)

    # Send vars dictionary to confgen function using multi-threading, one thread per-host
    threads = threading.Thread(target=confgen, args=(vars,))
    threads.start()

Threads = threading.Thread. I love it!
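One refinement worth considering is keeping a reference to every thread and joining them, so the script can report when all devices are finished. Here is a minimal sketch of that pattern; fake_push is a stand-in that just sleeps instead of calling Netmiko, and the three hosts and their addresses are invented for the example.

```python
import threading
import time

results = {}
results_lock = threading.Lock()


def fake_push(device):
    """Stand-in for confgen(): pretend the push takes a little while."""
    time.sleep(0.2)
    with results_lock:  # protect the shared dictionary
        results[device["host"]] = "done"


# Hypothetical inventory; the real one would come from the YAML file.
inventory = {
    "PE1": {"hostip": "192.0.2.1"},
    "PE2": {"hostip": "192.0.2.2"},
    "CE1": {"hostip": "192.0.2.3"},
}

start = time.monotonic()
threads = []
for name, device in inventory.items():
    device["host"] = name
    thread = threading.Thread(target=fake_push, args=(device,))
    thread.start()
    threads.append(thread)

# Wait for every device before reporting.
for thread in threads:
    thread.join()

elapsed = time.monotonic() - start
print(f"{len(results)} devices finished in {elapsed:.2f}s")
```

Because the sleeps overlap, the total time should be close to one sleep rather than three, which is the same effect that took my lab from 20 minutes down to under one.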

Automating Labs with Python, Jinja2, and Netmiko

Following up on my last post, I have set out to start automating certain aspects of my labs. I spent a few days going over the material from Kirk Byers’ highly recommended Python for Network Engineers course. I studied the previous version of his course a couple of years ago (covering Python2), but this new version, which covers Python3, is even better.

I came up with a generic topology that was purposely overengineered so that I can enable and disable links on-demand to create different logical topologies without having to interact with the physical lab topology. The lab represents a single service provider core network, multiple customer sites, and two SP-attached Internet connections. Most links will remain disabled for most lab scenarios, but are there for various cross-site, DIA and backdoor options available with this design.

To automate the baseline configuration, I created a Python script that imports the inventory from a CSV file, uses a Jinja2 template to generate the configuration for each device, and Netmiko to push the configuration to the devices. It’s kind of funny to succinctly place into a blog post something that took many hours to test and troubleshoot before coming up with the final version. The best part of gaining this kind of experience is that I can use what I have already done as a template moving forward, whether for the lab or for actual production.

The CSV file is straight-forward. The header row contains the variables for each device, such as the name, management IP, port, and interface IP addresses. Each subsequent row defines individual devices:


The Jinja2 template defines configurations for all devices, which gets populated with the individual variables, and covers device-specific configurations:

 

hostname {{ device }}

interface lo1
 ip address {{ lo1ip }} 255.255.255.255

{%- if ifg00ip %}
interface g0/0
 ip address {{ ifg00ip }} {{ ifg00mask }}
 no shutdown
{%- endif %}

{%- if device == 'P1' %}
int lo2
 ip address 2.2.2.2 255.255.255.255
{%- endif %}

With this example, every device is configured with the device-specific hostname. Every device is configured with a lo1 loopback address. If the device has an IP address configured for interface g0/0, the IP and mask are configured, and the interface is brought up with no shutdown. If the g0/0 IP address is not specified in the CSV file for this particular device, that configuration section is skipped. Likewise, the final section of the template will only be used if the device is ‘P1’. All other devices will skip this particular configuration section.

The Python script is the glue between the CSV file, configuration generation, and actual configuration deployment. The script imports the csv, jinja2, time and netmiko libraries. The script then defines variables for the CSV and Jinja2 files. Next, the CSV file is imported. The details of individual devices are placed into a dictionary, and each dictionary is placed into a list representing all devices. The script then generates the configuration for each device by feeding the details into the Jinja2 template. Netmiko is then used to send the output of the Jinja2 processing to the actual devices.
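Here is a rough, stripped-down sketch of the import step just described. The inline CSV text, the hostip and port column names, and all of the addresses are placeholders I made up for this example (the other column names match the template variables above), and the Jinja2 and Netmiko steps are left out so that it runs with the standard library alone.

```python
import csv
import io

# Placeholder inventory; a real script would open the CSV file instead.
csv_text = """device,hostip,port,lo1ip,ifg00ip,ifg00mask
P1,192.0.2.10,32769,10.255.0.1,10.0.12.1,255.255.255.254
CE1,192.0.2.10,32770,10.255.9.1,,
"""

# One dictionary per row, collected into a list representing all devices.
devices = list(csv.DictReader(io.StringIO(csv_text)))

for device in devices:
    # An empty cell comes back as '', which a template "if" treats as false.
    has_g00 = bool(device["ifg00ip"])
    print(device["device"], device["lo1ip"], "g0/0 configured:", has_g00)
```

Every value read this way is a string, including the port, so a numeric field would need int() wherever a number is required.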

This kind of automation is perfect for the lab, because the CSV file represents certain baseline aspects that are not going to change, such as the IP addressing of the links between all of the service provider ‘P’ routers. The Jinja2 template can then be modified for different lab scenarios, depending on how much configuration you want to build into the baseline, per-scenario. The script could even be expanded so that it selects a different Jinja2 template based on a menu of possible scenarios. This same type of scripting setup could be used on a production network to set up new sites or push certain standardized configurations (such as enabling NetFlow on all devices). There are all kinds of possibilities.


Why Network Automation?

I have been wanting to get a little deeper into various technologies surrounding MPLS and BGP-based VPNs (beyond basic L3VPN, such as L2VPN, QoS, multicast, EVPN, etc.), so I assembled a virtual lab with approximately 30 routers which represent a service provider core and several “customer” sites, along with two sources of fake Internet connectivity (or more accurately, a simulated Default-Free Zone (DFZ)). After I earn a deeper understanding of topics within a single service provider core, I will expand this to inter-provider topics. Yes, I meant “earn”, since more work will be involved beyond just reading.

I was getting ready to develop an IP addressing scheme for the core network, and I realized I have a good opportunity here to get deeper into network automation. While studying for the CCIE R&S lab, I spent quite a lot of time in a text editor building configurations to review before pasting them into the device consoles. For tasks that involve repetitive configurations, copy-and-paste is my friend. You don’t (yet) have access to anything like Python or Ansible in the CCIE R&S lab to try to automate things (though I suppose you could use TCL if you really wanted to).

A good portion of setting up a large lab environment of any kind is developing and applying the baseline configuration. I’ve done this countless times over the years, and I was getting ready to do it yet again when it occurred to me that if I invest some time now to develop and use some network automation processes, the buildup and teardown of future labs will be so much quicker. I’ve dabbled with this in the past; I learned the basics of Python and have developed a few network-oriented scripts. I found I enjoy working through the logic and seeing the working results. I also developed and deployed a simple Ansible script to push out some configurations on my current production network.

I read the mostly-complete “rough cuts” version of Network Programmability and Automation by Jason Edelman, Matt Oswalt, and Scott Lowe. This is a really fantastic book, and along with Russ White and Ethan Banks’ new book, I consider it an absolute must-read for anyone wishing to progress their career in computer networking and establish a very strong set of foundational knowledge (I swear I’m not trying to name-drop, but I’ve read a LOT of networking books, and these really are toward the top of the list). When I read Network Programmability and Automation the first time, I used the knowledge as an overview of some of the things that are possible within the realm of network automation. Now I’m going through it again to develop the skills necessary to automate the deployment of my lab configurations.

One thing I believe hinders many people wanting to dig deeper into automation (myself included) is not having a use case. It’s easy enough to say that if you have to do a single task more than once, you should automate it. Automate all the things, right? There are two issues I see here: the underlying knowledge of what it is you’re trying to automate, and the ability to evaluate the “bigger picture”.

For example, within the context of networking, you could learn how to automate the deployment of complex configurations for a particular routing protocol, but what good is that going to do if you don’t fully understand what those configurations mean? Automation presents you with an opportunity to make mistakes much faster, and on a much more grand scale. If you automate configurations among several devices and things end up not working as anticipated, can you troubleshoot what went wrong?

Likewise, evaluating the bigger picture helps to understand where particular forms of automation are helpful, and where you will run into diminishing returns. For example, you could automate just about every process involved in network configuration, but nearly every business is going to have exceptions that need to be handled somehow, and automation may not be the answer in those instances.

Tying both of these concepts together, I realized the opportunity to automate the things I know extremely well due to my previous knowledge and experience, such as baseline configurations involving hostnames, IP addresses, basic routing protocol configuration, etc. Because I know how all of these things work very well, I can easily automate this as well as troubleshoot potential issues if things don’t go as expected. In the bigger picture aspect, the purpose of the lab is for me to understand other topics that use the baseline configuration as a prerequisite, and therefore I am not yet ready to automate those technologies because I do not yet have a full understanding of their nuances.

In other words, the more you learn, the more you can automate. You need to develop skills on how to automate things, but if you automate things you do not understand, you are setting yourself up for future frustration. Don’t let this discourage you from learning and getting deeper into both automation and classical network engineering skills. Increasingly, the two go hand-in-hand, but you can certainly end up in a chicken-or-the-egg scenario. My advice is to “earn” the networking skills first, and automate them second.

Mind Map for CCIE & CCNP Routing & Switching

I created a mindmap of topics that are covered on the current Cisco CCIE RSv5 lab exam to help myself study, and I thought my work might be useful to the general network community as well. I included CCNP R&S in the title, because there’s a lot of overlapping information that I think most people pursuing the CCNP might find useful as well. I have covered a lot of topics by way of configuration examples that I remember struggling with a little bit when I was studying for the CCNP (wow, it’s been five years already!).

This document contains a hierarchy of topics with their associated Cisco IOS configuration syntax. These commands should work on most versions of classic IOS and IOS-XE, versions 15 and later. I tried to be as comprehensive as possible with regard to the covered topics referenced against the current CCIE R&S blueprint, however it is next to impossible to truly cover every configuration aspect within a single document, mostly because any given topic set (and more) may be covered in any specific delivery of the lab exam. In other words, you won’t really know until you get there.

This document provides the configuration syntax for nearly all topics covered. In many cases, examples and explanations are also provided, but not for every single topic (you still need to do your homework first!). Likewise, verification commands are generally not included here, because that would have easily doubled or tripled the size. For any topic you wish to know more about, I highly recommend looking at the official Cisco documentation. Most topics have both explanations as well as configuration examples there. Much of what is contained in this document was sourced from the official Cisco documentation.

This document is NOT:

  • A hand-holding guide through all CCIE topics
  • Any sort of answer key for any sort of specific lab scenarios
  • A comprehensive guide to every possible topic you might encounter on the lab exam
  • A replacement for pretty much any other form of studying

That being said, this document can serve as a good supplemental quick-reference to the vast majority of topics on both the CCNP and CCIE Routing & Switching exams. CCIE topics are limited to the lab blueprint only (I didn’t cover IS-IS, for example).

The original version of this document will always be available here. It was created with the excellent and highly recommended MindNode software for the Mac. I have included the original MindNode file here, and several other formats as well. The original MindNode file lets you expand and collapse branches of the tree as desired.

Some browser plugins may have trouble viewing some of these files. If a file does not display properly in your browser, try downloading the file and opening it with a different application.

Don’t forget about my 3500 CCIE flashcard deck, and my blueprint documentation reference guide. If you found this to be useful, please let me know on Twitter or LinkedIn. Thanks!