QoS in Action

Quality of Service is a value-added network infrastructure service that is still very important within the scope of private networks. Some might argue that QoS is not as important as it once was now that we are seeing more SD-WAN deployments that use the general Internet for transport, because the Internet has no inherent QoS. Additionally, many private networks do not use QoS at all, and their operators essentially just “hope for the best” as all the different types of traffic traverse the various links. This may be due to a lack of awareness or training on the part of the operators, or it may simply be that the business does not see enough value in it.

One of the ideas behind an SD-WAN deployment is that since the Internet does not offer QoS, you can attempt to work around this when using the Internet for transport by having multiple connections, ideally from different service providers, and monitoring the end-to-end quality of the links through metrics such as bandwidth utilization, delay, and jitter. A good SD-WAN solution monitors the links and can be configured, for example, to send voice and other delay-sensitive traffic over the link that is least congested and/or has the lowest delay and jitter, while sending bulk data over a different link.
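
To make that idea a bit more concrete, here is a minimal sketch (in Python) of the kind of per-class path selection an SD-WAN edge might perform based on measured link metrics. This is not any vendor’s actual implementation; the link names, metric values, and selection rules are made up purely for illustration:

    # Minimal sketch of metric-based path selection, as an SD-WAN edge might do it.
    # Link names, metric values, and selection rules are hypothetical.
    links = {
        "isp_a": {"delay_ms": 35, "jitter_ms": 4, "utilization": 0.80},
        "isp_b": {"delay_ms": 60, "jitter_ms": 25, "utilization": 0.30},
    }

    def pick_link(traffic_class):
        if traffic_class == "voice":
            # Delay-sensitive traffic: prefer the lowest combined delay + jitter.
            return min(links, key=lambda l: links[l]["delay_ms"] + links[l]["jitter_ms"])
        # Bulk data: prefer the least-utilized link.
        return min(links, key=lambda l: links[l]["utilization"])

    print(pick_link("voice"))  # -> isp_a (lowest delay and jitter)
    print(pick_link("bulk"))   # -> isp_b (least congested)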

Even if you are using the general Internet for your transport, QoS may still be important if you consistently use all or most of your available bandwidth. You can’t control how your data will flow across the Internet after it leaves your private network, but you can control every aspect of your data until it reaches your private edge. One of the major benefits of QoS is being able to queue and schedule your traffic based on how it has been classified and marked.

At a high level, you implement QoS by first classifying your traffic. This can be as simple as two classes, such as delay-sensitive traffic and everything else. The most common model uses four classes, and there is also a standardized eight-class model. Most networking equipment that supports QoS lets you get even more granular if you wish. You determine classes based on characteristics such as the type of treatment required or the relative importance of the traffic. You can also simply classify traffic based on its source or destination (such as all traffic to or from a particular server).
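
As a rough illustration of classification, here is what a toy classifier might look like in Python, with a delay-sensitive class, a source-based class, and a default. The match criteria (the RTP port range and the server address) are only examples of the kinds of characteristics you might match on, not a recommended policy:

    # Toy classifier: delay-sensitive traffic, traffic from one particular
    # server, and everything else. Real classifiers can also match on DSCP,
    # ACLs, application signatures, and so on.
    def classify(packet):
        # UDP ports 16384-32767 are commonly used for RTP (voice) media.
        if packet.get("protocol") == "udp" and 16384 <= packet.get("dst_port", 0) <= 32767:
            return "delay-sensitive"
        if packet.get("src_ip") == "10.0.0.25":  # hypothetical update server
            return "bulk-updates"
        return "default"

    print(classify({"protocol": "udp", "dst_port": 20000}))      # delay-sensitive
    print(classify({"protocol": "tcp", "src_ip": "10.0.0.25"}))  # bulk-updates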

After classifying traffic, actions can be taken on the different traffic classes, such as marking or specialized treatment. Classified traffic is often marked using CoS at Layer 2 (such as Ethernet) and DSCP at Layer 3 (IP). CoS is considered a local marking because it only survives the local Layer 2 segment, whereas DSCP can be carried across the entire IP network. For example, traffic coming from an IP phone may be marked as CoS 5 by the switch the phone is connected to. Then, when the traffic crosses the first-hop router (which could very well be the same switch), the Layer 2 CoS marking may be mapped to DSCP “EF” at Layer 3. The DSCP marking may be ignored at various points in the network, but it remains inside the packet header unless a network device purposely changes it. With QoS marked in the IP header, any device along the path that processes IP packets can examine the header and possibly take action, such as offering that particular packet different treatment.
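
That CoS-to-DSCP translation is essentially a small lookup table applied at the Layer 2/Layer 3 boundary. The sketch below uses CoS 5 to EF (decimal 46), the typical voice mapping described above; the other rows follow a common convention, but the actual default map varies by platform and is usually tunable:

    # Illustrative CoS (802.1p) to DSCP mapping at the L2/L3 boundary.
    # CoS 5 -> 46 (EF) is the usual voice mapping; the other rows are a common
    # convention rather than a universal default.
    COS_TO_DSCP = {0: 0, 1: 8, 2: 16, 3: 24, 4: 32, 5: 46, 6: 48, 7: 56}

    def cos_to_dscp(cos_value):
        return COS_TO_DSCP.get(cos_value, 0)

    print(cos_to_dscp(5))  # 46, i.e. DSCP "EF" (Expedited Forwarding)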

The ultimate purpose of classifying and marking traffic is queuing/scheduling, which is the process of determining which traffic is sent first. Network interfaces normally use FIFO (first-in, first-out) scheduling when the link is not congested. However, when the link is congested, traffic that has been classified and marked as more important can be scheduled to be sent first.
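
Here is a minimal sketch of that idea: a scheduler that always drains a strict-priority queue before the default queue. Real schedulers add bandwidth guarantees, weights, and policing of the priority queue; the two-queue design and the EF-only match below are simplifications for illustration:

    from collections import deque

    # Two queues: a strict-priority queue for voice (DSCP EF) and a default FIFO.
    priority_q = deque()
    default_q = deque()

    def enqueue(packet):
        (priority_q if packet["dscp"] == 46 else default_q).append(packet)

    def dequeue():
        # The priority queue is always drained first; this ordering is what
        # matters once the link is congested and packets start backing up.
        if priority_q:
            return priority_q.popleft()
        return default_q.popleft() if default_q else None

    enqueue({"dscp": 0, "data": "bulk transfer"})
    enqueue({"dscp": 46, "data": "voice"})
    print(dequeue()["data"])  # voice goes out first, even though it arrived second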

When using the Internet for transport, you can’t control the treatment of your most important data once it leaves your network, but you can make sure at your Internet edge that the most important traffic gets sent out before any other traffic does. This is one of the main reasons why QoS is as important as it ever was, even with SD-WAN solutions that use the Internet for transport.

QoS scheduling is also important when data moves from a higher-speed link to a lower-speed link. For example, a company’s data center will almost always have much more WAN-facing bandwidth than a branch-office WAN link. QoS scheduling once again ensures that higher-priority traffic makes it to the branch WAN link first. In MPLS L3VPN environments, for example, the service provider can offer QoS capabilities as a service (usually for an extra fee). If your data center has a 1 Gbps pipe toward your MPLS WAN but your branch office is on a 1.5 Mbps T1, subscribing to the service provider’s QoS service can ensure that when a large file is blasted out to the branch office, the VoIP traffic still receives preferential treatment because it is scheduled first as it leaves the service provider’s router on the other end of the T1.

Another aspect of QoS is policing and shaping. A service provider will often use policing to create “sub-rate” links. For example, the SP may provide you with a physical Gigabit Ethernet link while you pay for only 200 Mbps of service. The SP uses policing to turn the gigabit link into an effective 200 Mbps link by dropping any traffic that exceeds the 200 Mbps mark. Policing is typically applied on ingress to a network, whereas shaping is typically applied on egress. Shaping works by temporarily buffering excess traffic and then transmitting it when possible, which helps avoid dropping the traffic.
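
Both behaviors are usually described in terms of a token bucket, and the essential difference is what happens to traffic that exceeds the configured rate: the policer drops it, while the shaper buffers it. The sketch below is deliberately simplified (arbitrary rate and burst numbers, whole packets instead of bits per second) just to show that difference:

    # Token-bucket sketch: a policer drops excess traffic, a shaper buffers it.
    class TokenBucket:
        def __init__(self, rate_per_tick, burst):
            self.rate, self.burst, self.tokens = rate_per_tick, burst, burst

        def tick(self):
            # Refill up to the burst size each interval.
            self.tokens = min(self.burst, self.tokens + self.rate)

        def conforms(self, cost=1):
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

    def police(bucket, packet):
        return "transmit" if bucket.conforms() else "drop"  # excess is dropped

    def shape(bucket, packet, backlog):
        if bucket.conforms():
            return "transmit"
        backlog.append(packet)  # excess is delayed, not dropped
        return "buffered"

    policer, shaper, backlog = TokenBucket(2, 2), TokenBucket(2, 2), []
    print([police(policer, p) for p in range(4)])         # ['transmit', 'transmit', 'drop', 'drop']
    print([shape(shaper, p, backlog) for p in range(4)])  # ['transmit', 'transmit', 'buffered', 'buffered']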

Policing can also be very useful within your private network to prevent a source of traffic from overwhelming a particular destination. For example, if you have a server in your data center that provides some kind of updates to the computers in your network (such as a WSUS server), you could use granular policing to prevent it from overwhelming the slower branch-office links during regular business hours, while still offering the full available capacity after hours.
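
As a rough sketch, that kind of time-based policing decision could look like the following. The rates and the business-hours window are made-up values; the point is simply that the rate allowed toward the branch changes with the time of day:

    from datetime import datetime

    # Hypothetical policer rate toward a slow branch link: tighter during
    # business hours so update traffic cannot saturate it, full rate after hours.
    def update_traffic_rate_bps(now=None):
        hour = (now or datetime.now()).hour
        if 8 <= hour < 18:     # business hours
            return 512000      # roughly 512 kbps toward the branch
        return 1500000         # after hours: allow the full T1

    print(update_traffic_rate_bps(datetime(2016, 5, 2, 10)))  # 512000
    print(update_traffic_rate_bps(datetime(2016, 5, 2, 22)))  # 1500000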

As important as QoS is, I find it pretty amazing that it is not covered at all in the current Cisco CCNP R&S curriculum. It is covered at earlier levels in Cisco’s Collaboration, Wireless, and Service Provider tracks, but the R&S track does not mention QoS at all until the CCIE level (as of this writing).

Getting into QoS can seem very daunting at first. Like most technologies (or sub-technologies), there’s a new lexicon to learn, and not everything is obvious right away. When I first started exploring QoS while reading the CCIE OCG a couple of years ago, it did seem a bit overwhelming. Even though I could follow along and understand the material as I read it, I wasn’t able to retain much of it afterward because, at that point, I had never experienced it for myself. Working at my current job has changed that, fortunately.

Like so many things, witnessing it in action (especially in production) and repeated exposure to books and documentation have helped solidify the major concepts of QoS for me. Experience is great, and it really reinforces the things you learn when you study. But I am still a firm believer that you need to obtain the knowledge first (at least in the general sense) and then build the experience afterward. If I had not taken it upon myself to move past the CCNP R&S curriculum and explore the content within the scope of the CCIE, there are several things I would not even know about: QoS, the service provider side of MPLS, and working with VRFs. These things represent tools in a toolbox, and knowing what tools you have to work with is key to solving business problems and making you a success.

Update: I have since found out that QoS is indeed introduced on the current (2015) CCNA exam. This is excellent news, and I would expect it to be covered somewhere in the next revision of the CCNP R&S.