And the Journey Continues…

I mentioned earlier this Summer that I would be reviewing the CCIE Routing and Switching (RS) v5.1 OCG books, although I had no intention of following through with the RS v5.1 path. That still holds true today. However – I am going to sit the written in a few weeks because I have a voucher for it, so why not? Having never sat it before, I am not very confident in a pass. But a pass now would only earn me the right to book the lab before February 24, 2020, when the new Enterprise Infrastructure v1.0 lab rolls out. There are plenty of RS candidates who have been studying far longer than me vying for a seat to earn their number before the v5.1 lab is deprecated. Even if I pass the written, it would be an immense challenge to study up for the current lab and try to pass it in the next 4 months. I do still want to continue down this path though.

I’m going to try to gamify this a bit to encourage myself to study more. Basically – I want to update this blog every Sunday night detailing what I studied the previous week and what I intend to review over the following week. This will help track time invested, increase my buy-in, and hopefully encourage others along their own journey (CCIE or otherwise).

So far I have reviewed the following materials (time estimates are rough guesses, but I tried to be as accurate as possible):

  1. CCIE RS v5.1 OCG Volume 1 – 40 hours
  2. CCIE RS v5.1 OCG Volume 2 – 35 hours
  3. Misc blogs, etc. – 5 hours

It is Monday as I write this, and I intend to spend the next week or so completing a Coursera course called “Learning How to Learn: Powerful mental tools to help you master tough subjects”. This is a free course that was brought to my attention while reviewing Tim McConnaughy’s CCIE journey over at carpe-dmvpn.com. Tim was a great speaker at Cisco Live US 2019, presenting an introductory IP Multicast session (BRKIPM-1261). Tim says that developing learning techniques helped his studying immensely, and I need to develop techniques to help retain what I am reading. Up next is building a lab; I’ll post more on that as it happens.

Cisco Catalyst 9500 Stackwise Virtual Link Requires 10Gb Links

I ran into a frustrating and not well documented requirement of the Catalyst 9500s the other day while setting up a pair of C9500-16X switches in a StackWise Virtual (SWV) configuration. For the uninitiated, StackWise Virtual is similar to Virtual Switching System (VSS): it allows two physical switches to run as one logical switch. This is helpful at the distribution or collapsed core/distribution layer when you want multiple non-blocking L2 links to access layer switches. The active switch in the SWV pair processes and sends all control traffic, while the standby forwards traffic in the data path and can take over as the active switch in the event of a switchover (failure of the active switch, forced switchover during a code upgrade, etc.).

In production, the StackWise Virtual Links (SVL) that connect the two switches would most likely be a pair of 10Gb links. I was trying to get a pair of C9500-16X switches configured temporarily with 1Gb links and ran into a confusing issue. I ensured that both switches were on the same license and code version, and applied the correct SWV configuration. The configuration is really dead simple – just define a stackwise-virtual domain number and then define the stackwise-virtual links. Optionally, a dual-active detection (DAD) link can be defined. Switch number and priority are set the same way they are in a StackWise-480 stack (physical stacking cables).
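
For reference, the core of the configuration looked roughly like this (the interface numbers match the output shown below, and a reload is required afterwards for SWV mode to take effect):

! SVL members: Te1/0/15-16, DAD link: Te1/0/14 (from my lab)
Switch(config)# stackwise-virtual
Switch(config-stackwise-virtual)# domain 1
Switch(config-stackwise-virtual)# exit
Switch(config)# interface range TenGigabitEthernet1/0/15 - 16
Switch(config-if-range)# stackwise-virtual link 1
Switch(config-if-range)# exit
Switch(config)# interface TenGigabitEthernet1/0/14
Switch(config-if)# stackwise-virtual dual-active-detection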

When both switches were booted, the DAD link and both SVL links flashed on both switches during the 120-second neighbor discovery period. Whichever switch booted first (even if by a fraction of a second) would eventually fully boot, then a strange sequence of events would occur:

1) Both SVL links would become err-disabled due to link flap.

*Jul 19 16:51:20.566: %PM-4-ERR_DISABLE: link-flap error detected on Te1/0/16, putting Te1/0/16 in err-disable state
*Jul 19 16:51:21.008: %PM-4-ERR_DISABLE: link-flap error detected on Te1/0/15, putting Te1/0/15 in err-disable state

2) The dual-active detection link would stay up.

Switch#sh stackwise-virtual dual-active-detection 
Dual-Active-Detection Configuration:
-------------------------------------
Switch  Dad port                      Status
------  --------                      ------
1       TenGigabitEthernet1/0/14      up

3) All other links not related to SWV would become err-disabled.

Switch# sh int status err

Port      Name               Status       Reason               Err-disabled Vlans
Te1/0/1                      err-disabled dual-active-recovery
Te1/0/2                      err-disabled dual-active-recovery
Te1/0/3                      err-disabled dual-active-recovery
Te1/0/4                      err-disabled dual-active-recovery
Te1/0/5                      err-disabled dual-active-recovery
Te1/0/6                      err-disabled dual-active-recovery
Te1/0/7                      err-disabled dual-active-recovery
Te1/0/8                      err-disabled dual-active-recovery
Te1/0/9                      err-disabled dual-active-recovery
Te1/0/10                     err-disabled dual-active-recovery
Te1/0/11                     err-disabled dual-active-recovery
Te1/0/12                     err-disabled dual-active-recovery
Te1/0/13                     err-disabled dual-active-recovery

4) No neighbors were detected on the SVL ports (which makes sense, since the links are err-disabled).

Switch#sh stackwise-virtual neighbors 
Stackwise Virtual Link(SVL) Neighbors Information:
--------------------------------------------------
Switch  SVL  Local Port                  Remote Port
------  ---  ----------                  -----------
1            TenGigabitEthernet1/0/15
             TenGigabitEthernet1/0/16

I tried all sorts of things to get this working and to verify my equipment wasn’t faulty, with no success. Finally, I realized I was using GLC-SX-MMD SFPs for all of these links. I swapped them for SFP-10G-SR optics, and everything worked fine after both switches were rebooted. Cisco explicitly states in their docs that network module ports are not capable of being SVL links, but I have not seen anywhere that they list 10Gb links as a requirement. I have also heard that future software releases may support SVL on network module ports. However, as of 16.9.3, that configuration will not function.
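
Once the links came up on the 10Gb optics, the SWV state was easy to confirm with commands like:

Switch# show stackwise-virtual
Switch# show stackwise-virtual link
Switch# show stackwise-virtual neighbors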

Cisco Live US 2019

I was fortunate to go to San Diego in 2019 for the Cisco Live US event. This was my first time attending Cisco Live, my only previous conference experience being Aruba’s Atmosphere conferences, which I thought were very well done. Cisco Live reminded me of that but on a larger scale. Where Atmosphere capped attendance at 3k attendees (in 2017 and 2018), Cisco Live had almost 30k attendees in 2019. However, the show was still easy to navigate, and I found that it wasn’t as sprawling as I feared it might be.

Technical seminars were available on Sunday June 9th for an added cost and I attended an all-day talk on the Catalyst 9K portfolio. It was very well done and I learned a lot about the Catalyst 9400 (access chassis), 9500 (core/distribution fixed), and 9600 (core/distribution chassis) models that I didn’t know previously.

Given the scope of the rest of the conference, it was nice to bow out of a session that didn’t meet expectations and hang around the DevNet workshops while speakers gave live demos on topics like configuring IOS XE with Ansible, YANG models, NETCONF/RESTCONF, Git, and more. That and the announcement of the new DevNet certifications had me excited that Cisco is really on the right track for the future. Speaking of those certification changes…

I used the free testing opportunity to take the ARCH exam, as I had been studying for it off and on since December when I passed the CCDA. Sure enough, I earned a passing score, so now I have a CCDP, at least until February 2020 when the Cisco certifications get restructured. Related to that, I have a few CCIE RS written exam books that are basically useless now. I understand they aren’t really useless; the knowledge is still valid and good. It’s just that I won’t be taking the CCIE written due to the examination changes. However, I have decided to tackle these two volumes over the next two months to see if I want to continue towards the new CCIE Enterprise Infrastructure track. If I can maintain interest and pace with these books over the busy Summer, that will be a good indicator of whether or not I should continue down this path.

CCDA Anki Flash Card Deck

A while ago I promised to upload my CCDA Anki flash card deck if I ever made one. I did – and so here it is.

There are 88 cards, mostly random facts that I pulled from the Official Cert Guide that I figured might be on the exam. I passed the exam last December with a pretty decent score – not sure if the flash cards helped, but they certainly didn’t hurt. The DESGN exam wasn’t that difficult to pass on a first attempt, provided you are already familiar with most of the subjects on at least some level.

Catalyst 6807 eFSU Upgrade

I recently did a few eFSU upgrades on VSS pairs of Catalyst 6807 switches, each with a single SUP6T per chassis and a few C6800-32P10G-XL linecards. An eFSU upgrade occurs on the standby supervisor first, which boots up into standby hot mode and is capable of an SSO switchover. This means second(s) of downtime, which is what we’re looking for. These upgrades involve changing code within the same train and are very picky with regard to which pre- and post-upgrade versions are supported. The compatibility matrix for eFSU upgrades can be found in the Cisco config guides…

https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/15-5SY/release_notes/release_notes_15_5_SY.html

https://www.cisco.com/c/dam/en/us/td/docs/switches/lan/catalyst6500/ios/SX_SY_EFSU_Compatibility_Matrix.xlsx

…the second link is the .xlsx file, with code trains as worksheets. Find the right code train and ensure that there is a C in the cell for your upgrade path. If that is not the case, you will likely need to perform several intermediate upgrades that are each compatible. If that is not possible, you’re stuck doing a Fast Software Upgrade – which doesn’t seem to be so fast at all, with downtime in the minutes rather than second(s). This is because the standby supervisor comes back after the upgrade in RPR mode rather than SSO mode. I’ll be doing this type of upgrade in the future, but can’t really speak much on it now.
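
From memory, the skeleton of the process is the standard four ISSU steps below; check the config guide for the exact loadversion arguments on a VSS pair, since the slot numbers and image paths here are just placeholders:

Switch# issu loadversion <active-slot> <active-image> <standby-slot> <standby-image>
Switch# issu runversion
Switch# issu acceptversion
Switch# issu commitversion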

My experience with eFSU on the C6807 in the 15.5 code train was actually very smooth. I was worried about split-brain, dual-active nightmares, but it was a mostly pleasant experience. The only issue I encountered was when the supervisor in one chassis failed to boot following the issu loadversion command. The total time to boot the new code on the supervisor and all linecards is roughly 20 minutes, and after 20 minutes the supervisor had a red status LED and the management NIC was lit up green. All other LEDs were dead. Not good. The active supervisor logged that the ISSU had failed and it was aborting. I was unsure what version of code the standby would come up on if we power cycled it, or if it would even boot at all! I decided to pull the line cards, remove the links from the standby supervisor, and power cycle it. This way it would be completely detached from the network.

The standby supervisor came up on the new code! That was good, but the ISSU process had been aborted on the active supervisor. I wasn’t sure if they would sync up correctly in this state, and I didn’t want to take any chances. So I changed the boot variable on the standby (offline) supervisor to match the active supervisor’s code and reloaded it. When it came back online on the old code, I was confident that it could be powered down, the VSL links on the supervisor could be cabled back up, and the switch could be brought back online. Only when it came back online in standby hot status with SSO redundancy mode did I insert the old line cards and watch them come back online. They had not been upgraded previously, so I knew they would be fine.

I proceeded to perform the eFSU change one more time, and it went off without a hitch. I was still able to hit my change window, with less than a second of network outage.

Detecting Rogue DHCP Servers Using an ASA

DHCP is a protocol used nearly everywhere to configure network parameters of host devices automatically. A DHCP server located on a LAN segment (or elsewhere and reached via DHCP relay) will typically send parameters for the following:

Client IP Address
Default Gateway
Subnet Mask
DNS Servers

A rogue DHCP server will configure hosts with bad information for various reasons. It may simply be a residential router connected to the network backwards. Or it could be a villain intentionally redirecting host traffic through a man-in-the-middle to gather information, disrupt critical business traffic, or for other nefarious reasons. Rogue DHCP servers cause problems that are frustrating at best and a critical security hazard at worst.

The good news is that they are fairly easy to detect with a packet capture if you are connected to the LAN in question. However, sometimes I need to shut these down from a distance, and that’s where the packet capture functionality of a Cisco ASA comes in handy.
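
If you do have a host sitting on the segment, something as simple as tcpdump filtering on the DHCP server port will surface any offers along with the source MACs (the interface name is just an example):

tcpdump -n -e -i eth0 udp port 67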

First, log in to the ASA and get to the Privileged EXEC (#) prompt. Run the following command:

Cisco-ASA# capture $CAPTURE-NAME interface $IF-NAME match udp any eq bootps any

Note:
$CAPTURE-NAME = whatever you want to call this packet capture.
$IF-NAME = the interface that you want to capture packets through.

This will match traffic from any source address with a UDP source port of 67 (bootps), which is the port a DHCP server sends from.

Let this run for a bit and try to generate some DHCP traffic (bounce a port if you can).
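
If you have access to a switch on that segment, bouncing an access port is the quickest way to force a client to re-DHCP (the interface here is just an example):

Switch(config)# interface GigabitEthernet1/0/10
Switch(config-if)# shutdown
Switch(config-if)# no shutdown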

Use the command ‘show capture’ to see currently active captures and see if any data has been captured.

Cisco-ASA# show capture

capture TEST-CAPTURE type raw-data interface INSIDE [Capturing – 2058 bytes]

  match udp any eq bootps any

Now that we know data has been captured, let’s take a closer look.

Cisco-ASA# show capture TEST-CAPTURE

7 packets captured

1: 3:45:22.542727 802.1Q vlan#10 P0 0.0.0.0.68 > 255.255.255.255.67: udp 300

2: 3:45:22.565934 802.1Q vlan#10 P0 192.168.1.1.67 > 255.255.255.255.68: udp 548

3: 3:45:22.728508 802.1Q vlan#10 P0 0.0.0.0.68 > 255.255.255.255.67: udp 300

4: 3:45:22.786443 802.1Q vlan#10 P6 172.16.31.1.67 > 255.255.255.255.68: udp 300

5: 3:45:22.786519 802.1Q vlan#10 P6 172.16.31.1.67 > 255.255.255.255.68: udp 300

6: 3:45:23.823504 802.1Q vlan#10 P0 0.0.0.0.68 > 255.255.255.255.67: udp 300

7: 3:45:23.888702 802.1Q vlan#10 P6 172.16.31.1.67 > 255.255.255.255.68: udp 300

7 packets shown

In this example, 172.16.31.1 is a valid DHCP server. 192.168.1.1, however, is not! That addressing scheme sounds like an out-of-the-box consumer-grade Netgear router. We can get more information by appending the word ‘detail’ to the previous command.

Cisco-ASA# show capture TEST-CAPTURE detail

7 packets captured

1: 3:45:22.542727 e0c7.677a.a13c ffff.ffff.ffff 0x8100 Length: 346

802.1Q vlan#10 P0 0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] udp 300 [tos 0x18] (ttl 255, id 10635)

2: 3:45:22.565934 b039.56ab.32ea ffff.ffff.ffff 0x8100 Length: 594

802.1Q vlan#10 P0 192.168.1.1.67 > 255.255.255.255.68: [udp sum ok] udp 548 (ttl 64, id 0)

3: 3:45:22.728508 e0c7.677a.a13c ffff.ffff.ffff 0x8100 Length: 346

802.1Q vlan#10 P0 0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] udp 300 [tos 0x18] (ttl 255, id 10636)

4: 3:45:22.786443 40a6.7713.afb1 ffff.ffff.ffff 0x8100 Length: 346

802.1Q vlan#10 P6 172.16.31.1.67 > 255.255.255.255.68: [udp sum ok] udp 300 [ttl 1] (id 47344)

5: 3:45:22.786519 40a6.7713.afb1 ffff.ffff.ffff 0x8100 Length: 346

802.1Q vlan#10 P6 172.16.31.1.67 > 255.255.255.255.68: [udp sum ok] udp 300 [ttl 1] (id 47345)

6: 3:45:23.823504 e0c7.677a.a13c ffff.ffff.ffff 0x8100 Length: 346

802.1Q vlan#10 P0 0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] udp 300 [tos 0x18] (ttl 255, id 10637)

7: 3:45:23.888702 40a6.7713.afb1 ffff.ffff.ffff 0x8100 Length: 346

802.1Q vlan#10 P6 172.16.31.1.67 > 255.255.255.255.68: [udp sum ok] udp 300 [ttl 1] (id 47348)

7 packets shown

This shows us that the MAC address of the 192.168.1.1 host (aka the villainous rogue DHCP server) is b039.56ab.32ea. We can enter the OUI (the first three octets of the MAC address) into a tool like the Wireshark OUI Lookup Tool and discover that this is a Netgear device. Tracking it down on the LAN is easy and outside the scope of this post.
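
When you’re finished, remove the capture so it doesn’t keep consuming memory on the ASA:

Cisco-ASA# no capture TEST-CAPTURE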

Of course, none of this would be required if you have properly configured features like DHCP snooping or 802.1X port authentication; a rough sketch of a DHCP snooping config is below. I hope this helps you save some time troubleshooting in the future.
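
As a rough sketch, DHCP snooping on a Catalyst access switch looks something like this (the VLAN and uplink interface are examples; only the uplink toward the legitimate DHCP server should be trusted):

Switch(config)# ip dhcp snooping
Switch(config)# ip dhcp snooping vlan 10
Switch(config)# interface TenGigabitEthernet1/0/1
Switch(config-if)# ip dhcp snooping trust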

PPPoE Is Not Supported with Failover Enabled (ASA) – Seriously

Cisco states that PPPoE is only supported on the 5500-X ASA models in single, routed mode without failover (see here). When a telco tech came and replaced a DSL modem with a transparent bridge several hundred miles from my office, they gave me little choice but to challenge Cisco on that statement. I configured the VPDN group, tied it in with the outside interface, and everything was working like it was before. Or was it?
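
For context, the ASA PPPoE configuration is roughly the following (the group name, username, and password are placeholders; the interface matches the error message later in this post):

Cisco-ASA(config)# vpdn group ISP request dialout pppoe
Cisco-ASA(config)# vpdn group ISP localname MYUSER
Cisco-ASA(config)# vpdn group ISP ppp authentication pap
Cisco-ASA(config)# vpdn username MYUSER password MYPASS
Cisco-ASA(config)# interface GigabitEthernet1/1
Cisco-ASA(config-if)# pppoe client vpdn group ISP
Cisco-ASA(config-if)# ip address pppoe setroute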

First lesson learned – the ASA will allow you to configure PPPoE on an interface when in a failover pair. There are no warning lights, no flashing alarms, it will just let you do it. And it might even work, like it did for me.

Or you might end up in a strange situation where the standby firewall has gone through the PPPoE stages and now has a session up. In my case, this was identified with the following command:

‘show vpdn session state pppoe’ (or something like that)

My active firewall was stuck at the PADI_SENT stage for a long time. My standby firewall was at the SESSION_UP stage and had an active Internet connection over the DSL line. This is perplexing, to say the least. I fixed it by shutting the switchport that connected to the WAN interface on my standby firewall. Then I stripped the VPDN configuration from the outside interface on the active firewall and re-applied it. After confirming that the state was SESSION_UP on the active firewall, I brought the standby firewall’s port back up. The end result was that the DSL connection was up on the active firewall and the standby firewall was failover ready. I thought I had out-smarted Cisco; I should have known better.

Lesson two – an ASA will disable failover if it boots up and has PPPoE attached to an interface.

Of course these lessons always happen at inconvenient times and/or places. I learned this one while upgrading the ASA code. I had done it dozens of times and expected this time to be no different. I copied the new code over to the firewall pair, verified the SHA-512 hash, changed the boot variables, wrote memory, and reloaded the standby unit. I watched the clock tick for several minutes. After 10 minutes, with the active unit still reporting that the standby unit had failed, I knew something had gone wrong.

I got in touch with someone on-site who confirmed that both firewalls had green active lights. Fantastic. I had them immediately power down the secondary unit until I could get eyes on the console output during a boot sequence.

The firewall boot looked normal – strange. I logged in and found out that failover was OFF! This can cause some serious issues that are very hard to detect and troubleshoot (especially from hundreds of miles away). One example: any device that uses the firewall as its default gateway now has two devices responding to its ARP requests. This is a race condition, and any device that caches an ARP reply from the secondary/should-be-standby-but-is-active unit is in for a bad time. Good luck getting outside of your subnet; hopefully that MAC isn’t cached for too long.

When I tried to enable failover, I got the message:
PPPOE Client cannot be enabled on interface, Gi1/1(outside)
failover is not compatible with above configurations,
user must manually remove or fix them as instructed before failover can be enabled.

The fix for this was to remove the ‘ip address’ and ‘pppoe client’ commands from the interface mentioned in the error message, then try again. I ended up removing those commands from both firewalls before finally enabling failover on the secondary unit, which did sync and settle into standby.
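
Roughly, that cleanup looked like this (the interface and group name are from my example above):

Cisco-ASA(config)# interface GigabitEthernet1/1
Cisco-ASA(config-if)# no ip address
Cisco-ASA(config-if)# no pppoe client vpdn group ISP
Cisco-ASA(config-if)# exit
Cisco-ASA(config)# failover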

TL;DR – don’t configure PPPoE on an active/standby pair of ASAs.

Access Control Policies Won’t Apply to New Firepower Devices

I recently ran into a fairly simple issue during a Firepower installation. I registered some 5506-X Firepower modules to a Firepower Management Center (FMC) before I had any licensing applied to the FMC. I am using the classic licensing model, so this process may be different under smart licensing. After linking the control licenses with the FMC ID in the Cisco portal, I added the licenses to the FMC. The licenses I applied were showing as in use; however, the devices that appeared in the FMC under ‘Device Management’ had no access control policy attached to them.

I remember setting access control policies when adding these devices, so I knew something was up. Click on the pencil icon on the right side of the device in question to edit that device, then click on the Device tab to show general device information. If there is now adequate licensing for the device, you should be able to select different licensing schemes depending on what you have purchased and are using. The basic Control license gives you ‘Control’ and ‘Protection’. Once those are both selected, your device will appear under the Device Management screen as having an access control policy, which can now be deployed.