Constraints Are Good

I used to think of constraints as a bad thing in general: constraints limit possibility. I’ve since reconsidered. When faced with a problem, constraints narrow the field of solutions and can be used to make better choices. Perhaps this is just a glass-half-full way of looking at it. Just something to think about.

A Tough Week for Studying

I came down with a cold a few days ago and have had a hard time studying, mostly due to lack of energy. I’m about three-quarters of the way through the Coursera “Learning How to Learn” course and am finding it very helpful. Typically I plow through reading and video material cover-to-cover and hope to recall bits and pieces whenever I need them. That technique has served me well in the past, but I know it won’t suffice for the CCIE. Once the course is completed, I’ll post a few useful tips that will transfer over to CCIE studies.

In line with the above thoughts, I’ve been running in place reviewing some Anki flashcards that someone else posted online. What I’m discovering is that the cards only reinforce what I already know, along with what I know I should be studying more. The cards by themselves are NOT helping me learn what I don’t know. For instance, Performance Routing is something I understand at a surface level only. Rather than beating flashcards into my head, I should be going through the Cisco config guides and reviewing others’ blogs to learn the technology at a deeper level. At any rate, that is a lesson learned, and I will adjust my study habits accordingly. My learning will slow until I get my hands on a server for labbing, which should really help make some of these ideas more concrete in my head.

And the Journey Continues…

I mentioned earlier this Summer that I would be reviewing the CCIE Routing and Switching (RS) v5.1 OCG books, although I had no intention of following through with the RS v5.1 path. That still holds true today. However – I am going to sit the written in a few weeks because I have a voucher for it, so why not? Having never sat it before, I am not very confident in a pass. A pass now just earns me the right to book the lab before February 24, 2020, when the new Enterprise Infrastructure v1.0 lab rolls out. There are plenty of RS candidates who have been studying far longer than me and are vying for a seat to earn their number before the v5.1 lab is deprecated. Even if I pass the written, it would be an immense challenge to study up for the current lab and try to pass it in the next four months. I do still want to continue down this path, though.

I’m going to try to gamify this a bit to encourage myself to study more. Basically – I want to update this blog every Sunday night, detailing what I studied the previous week and what I intend to review over the following week. This will help track time invested, increase my buy-in, and hopefully encourage others along their journey (CCIE or otherwise).

So far I have reviewed the following materials (time estimates are rough guesses, but I’m trying to be as accurate as possible):

  1. CCIE RS v5.1 OCG Volume 1 – 40 hours
  2. CCIE RS v5.1 OCG Volume 2 – 35 hours
  3. Misc blogs, etc. – 5 hours

It is Monday as I write this, and I intend to spend the next week or so completing a Coursera course called “Learning How to Learn: Powerful mental tools to help you master tough subjects”. This is a free course and was brought to my attention while reviewing Tim McConnaughy’s CCIE journey over at carpe-dmvpn.com. Tim was a great speaker at Cisco Live US 2019, presenting an introduction to IP Multicast session (BRKIPM-1261). Tim says that developing learning techniques helped his studies immensely, and I need to develop techniques to help retain what I am reading. Up soon: building a lab. I’ll post more on that as it happens.

Cisco Catalyst 9500 StackWise Virtual Link Requires 10Gb Links

I ran into a frustrating and not-well-documented limitation of the Catalyst 9500s the other day while setting up a pair of C9500-16X switches in a StackWise Virtual (SWV) configuration. For the uninitiated, StackWise Virtual is similar to Virtual Switching System (VSS): it allows two physical switches to run as one logical switch. This is helpful at the distribution or collapsed core/distribution layer when you want multiple non-blocking L2 links to access layer switches. The active switch in the SWV pair processes and sends all control traffic, while the standby forwards traffic in the data path and can take over as the active switch in the event of a switchover (failure of the active switch, forced switchover during a code upgrade, etc.).

In production, the StackWise Virtual Links (SVL) that connect the two switches would most likely be a pair of 10Gb links. I was trying to get a pair of C9500-16X switches configured with 1Gb links temporarily and ran into a confusing issue. I ensured that both switches were on the same license and code version and applied the correct SWV configuration. The configuration is really dead simple – just define a stackwise-virtual domain number and then define the stackwise-virtual links. Optionally, a dual-active detection (DAD) link can be defined. Switch number and priority are defined just as they are in a StackWise-480 stack (physical stacking cables).
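For reference, the relevant configuration looks roughly like the sketch below. This is from memory against 16.9 code, so treat it as a sketch rather than gospel: the domain number is arbitrary (it just has to match on both switches), the interface numbers are the ones from my setup, and a reload is required before the SVL configuration takes effect.

! StackWise Virtual domain
stackwise-virtual
 domain 1
!
! SVL member links
interface TenGigabitEthernet1/0/15
 stackwise-virtual link 1
interface TenGigabitEthernet1/0/16
 stackwise-virtual link 1
!
! optional dual-active detection link
interface TenGigabitEthernet1/0/14
 stackwise-virtual dual-active-detection
!
! switch number and priority are set from exec mode, e.g. "switch 1 priority 15"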

When both switches were booted, the DAD link and the two SVL links would flash on both switches during the 120-second neighbor discovery period. Whichever switch booted first (even if by a fraction of a second) would eventually fully boot, and then a strange sequence of events would occur:

1) Both SVL links would become err-disabled due to link flap.

*Jul 19 16:51:20.566: %PM-4-ERR_DISABLE: link-flap error detected on Te1/0/16, putting Te1/0/16 in err-disable state
*Jul 19 16:51:21.008: %PM-4-ERR_DISABLE: link-flap error detected on Te1/0/15, putting Te1/0/15 in err-disable state

2) The dual-active detection link would stay up.

Switch#sh stackwise-virtual dual-active-detection 
Dual-Active-Detection Configuration:
-------------------------------------
Switch  Dad port                    Status
------  ------------------------    ------
1       TenGigabitEthernet1/0/14    up

3) All other links not related to SWV would become err-disabled (reason: dual-active-recovery).

Switch# sh int status err

Port      Name               Status       Reason               Err-disabled Vlans
Te1/0/1                      err-disabled dual-active-recovery
Te1/0/2                      err-disabled dual-active-recovery
Te1/0/3                      err-disabled dual-active-recovery
Te1/0/4                      err-disabled dual-active-recovery
Te1/0/5                      err-disabled dual-active-recovery
Te1/0/6                      err-disabled dual-active-recovery
Te1/0/7                      err-disabled dual-active-recovery
Te1/0/8                      err-disabled dual-active-recovery
Te1/0/9                      err-disabled dual-active-recovery
Te1/0/10                     err-disabled dual-active-recovery
Te1/0/11                     err-disabled dual-active-recovery
Te1/0/12                     err-disabled dual-active-recovery
Te1/0/13                     err-disabled dual-active-recovery

4) No neighbors would be detected on the SVL ports (which makes sense – the links are err-disabled).

Switch#sh stackwise-virtual neighbors 
Stackwise Virtual Link(SVL) Neighbors Information:
--------------------------------------------------
Switch  SVL  Local Port                  Remote Port
------  ---  --------------------        -----------
1            TenGigabitEthernet1/0/15
             TenGigabitEthernet1/0/16

I tried all sorts of things to get this working and to verify that my equipment wasn’t faulty, with no success. Finally I realized I was using GLC-SX-MMD SFPs for all of these links. I swapped them for SFP-10G-SR optics, and after both switches were rebooted everything worked fine. Cisco explicitly states in the documentation that network module ports cannot be used as SVL links, but I have not seen anywhere that 10Gb links are listed as a requirement. I have also heard that future software releases may support SVL on network module ports; however, as of 16.9.3, that configuration does not function.

Cisco Live US 2019

I was fortunate to go to San Diego in 2019 for the Cisco Live US event. This was my first time attending Cisco Live, with my only previous conference experience being Aruba’s Atmosphere conferences – which I thought were very well done. Cisco Live reminded me of that, but on a larger scale. Where Atmosphere capped attendance at around 3,000 attendees (in 2017 and 2018), Cisco Live had almost 30,000 attendees in 2019. Even so, the show was easy to navigate, and I found that it wasn’t as spread out as I feared it might be.

Technical seminars were available on Sunday, June 9th for an added cost, and I attended an all-day talk on the Catalyst 9K portfolio. It was very well done, and I learned a lot that I didn’t know previously about the Catalyst 9400 (access chassis), 9500 (fixed core/distribution), and 9600 (core/distribution chassis) models.

Given the scope of the rest of the conference, it was nice to be able to bow out of a session that didn’t meet expectations and hang around the DevNet workshops, where speakers gave live demos on topics like configuring IOS XE with Ansible, YANG models, NETCONF/RESTCONF, Git, and more. That, along with the announcement of the new DevNet certifications, had me excited that Cisco is really on the right track for the future. Speaking of those certification changes…

I used the free testing opportunity to take the ARCH exam, as I had been studying for it off and on since December when I passed the CCDA. Sure enough, I earned a passing score, so now I have a CCDP – at least until February 2020, when the Cisco certifications get restructured. Related to that, I have a few CCIE RS written exam books that are basically useless now. I understand they aren’t really useless; the knowledge is still valid and good. It’s just that I won’t be taking the CCIE written due to the examination changes. However, I have decided to tackle these two volumes over the next two months to see if I want to continue toward the new CCIE Enterprise Infrastructure track. If I can maintain interest and pace with these books over the busy Summer, that will be a good indicator of whether or not I should continue down this path.

CCDA Anki Flash Card Deck

A while ago I promised to upload my CCDA Anki flash card deck if I ever made one. I did – and so here it is.

There are 88 cards, mostly random facts that I pulled from the Official Cert Guide and figured might be on the exam. I passed the exam last December with a pretty decent score – I’m not sure whether the flash cards helped, but they certainly didn’t hurt. The DESGN exam wasn’t that difficult to pass on a first attempt, provided you are already familiar with most of the subjects at some level.

Catalyst 6807 eFSU Upgrade

I recently did a few eFSU upgrades on VSS pairs of Catalyst 6807 switches, each chassis with a single SUP6T and a few C6800-32P10G-XL linecards. An eFSU upgrade occurs on the standby supervisor first, which boots into standby hot mode and is capable of an SSO switchover. This means second(s) of downtime, which is what we’re looking for. These upgrades involve changing code within the same train and are very picky about which pre- and post-upgrade versions are supported. The compatibility matrix for eFSU upgrades can be found in the Cisco config guides…

https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/15-5SY/release_notes/release_notes_15_5_SY.html

https://www.cisco.com/c/dam/en/us/td/docs/switches/lan/catalyst6500/ios/SX_SY_EFSU_Compatibility_Matrix.xlsx

…the second link is the .xlsx file, with code trains as worksheets. Find the right code train and make sure there is a C in the cell for your from/to version pair. If there isn’t, you will likely need to perform several intermediate upgrades that are each compatible. If that is not possible, you’re stuck doing a Fast Software Upgrade – which doesn’t seem to be fast at all, with downtime measured in minutes rather than second(s). This is because the standby supervisor comes back after the upgrade in RPR mode rather than SSO mode. I’ll be doing that type of upgrade in the future, but I can’t really speak to it yet.
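For context, the eFSU itself is the standard ISSU four-step sequence run from the active supervisor. The sketch below is roughly what that looks like; the image name is a placeholder, and the exact loadversion arguments (active and standby slot plus image path) vary by chassis, so double-check the syntax in the config guide before running it.

! copy the target image to both supervisors first
copy ftp: disk0:
copy ftp: slavedisk0:

! step 1 – load the new image on the standby sup and wait for it to reach standby hot
issu loadversion <active-slot> disk0:<new-image> <standby-slot> slavedisk0:<new-image>

! step 2 – SSO switchover; the upgraded supervisor becomes active
issu runversion

! step 3 – stop the automatic rollback timer
issu acceptversion

! step 4 – upgrade the new standby and complete the eFSU
issu commitversion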

My experience with eFSU on the C6807 in the 15.5 code train was actually very smooth. I was worried about split-brain, dual-active nightmares, but it was mostly uneventful. The only issue I encountered was when the supervisor in one chassis failed to boot following the issu loadversion command. The total time to boot the new code on the supervisor and all linecards is roughly 20 minutes, but after 20 minutes this supervisor had a red status LED and only the management NIC was lit green. All other LEDs were dead. Not good. The active supervisor logged that the ISSU had failed and that it was aborting. I was unsure what version of code the standby would come up on if we power cycled it, or whether it would even boot at all! I decided to pull the line cards, remove the links from the standby supervisor, and power cycle it. This way it would be completely detached from the network.

The standby supervisor came up on the new code! That was good, but the ISSU process had been aborted on the active supervisor. I wasn’t sure whether the two would sync up correctly in this state – and I didn’t want to take any chances. So I changed the boot variable on the standby (offline) supervisor to match the active supervisor’s code and reloaded it. Once it came back up on the old code, I was confident that it could be powered down, the supervisor (which carries the VSL links) could be cabled back in, and the switch could be brought back online. Only when it came back online in standby hot status with SSO redundancy mode did I insert the old line cards and watch them come back up. They had not been upgraded previously, so I knew they would be fine.
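For anyone in the same spot, rolling the boot variable back is the usual boot system dance. A rough sketch, run on the detached supervisor, with a placeholder image name:

configure terminal
 no boot system
 boot system flash disk0:<old-image>
 end
write memory
! confirm BOOT now points at the old image before reloading
show bootvar
reload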

I proceeded to perform the eFSU one more time, and it went off without a hitch. I was still able to hit my change window, with less than a second of network outage.

Studying for CCDA

I’ve decided to start studying for the CCDA. I haven’t taken any tests in a while, and I’d like to start down the road to renewing my CCNP R&S (which, admittedly, won’t expire for another 26 months). I plan to knock out the CCDA before November is over and the CCDP by the end of February 2019.

My normal study patterns are loose, and I don’t usually set deadlines, yet I’m still generally able to reach my goals that way. This time, however, I’m setting weekly goals for material to cover and will work very hard to stay on track. Part of the reason I’m doing this is to find out how well I can stick to a schedule, as I might entertain studying for the CCIE R&S after this. I’m not 100% committed to that path, though, as I learn more about network automation and spend a good chunk of my free time on other skills that I think are useful and valuable.

I may post some of my CCDA notes going forward. I haven’t seen too many online resources for the CCDA, and I hope they help someone else who is studying.

This Guy Logged PuTTY to a Network Share and You Won’t Believe What Happened Next

It turns out that logging PuTTY sessions to a network share is, or at least can be, a bad idea. I learned this after getting a new laptop at a new job. I had several workstations and wanted to log all of my PuTTY sessions to the same folder – I figured this would help with troubleshooting at some point down the line. What I didn’t realize was that it would cripple me for a few days while I worked and tried to figure out why my sessions were so slow.

It didn’t help that I had just bought a sketchy USB-to-serial cable and was questioning the drivers I had installed. PuTTY sessions would often hang before I could even log in to a device. How frustrating! I realized that I had the most issues when I was connected to a wireless network – a wired connection was typically no trouble at all. At some point in troubleshooting almost any strange issue, I turn to Wireshark. Wireshark can be a great tool when you have a defined problem scope, which I didn’t quite have here – but I was on the verge of something.

I saw a lot of chatty SMB traffic going back and forth between my laptop and a file server, and the destination folder matched my PuTTY log folder. Suddenly it hit me: the added latency of the wireless connection, combined with the chatty nature of the SMB traffic, made all of the SSH sessions from my laptop mostly unusable. So obvious in hindsight – why hadn’t I thought of it sooner? As soon as I changed the logging path to a local drive, my sessions sped up dramatically.
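If you ever want to check for the same thing, a basic Wireshark display filter along these lines will make the pattern obvious (the server address is just a placeholder):

(smb || smb2) && ip.addr == 192.0.2.10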

This is a strange problem that I suspect won’t help anyone specifically – but I do hope it’s at least mildly interesting.

No Safe Changes

Settling into a new job, I was working on what I thought was a routine change: set up a spare switch in a temporary location with a basic config. Easy enough, right? The time came for me to configure a port on the upstream device. The device in question was a legacy Catalyst 65xx – a big chassis switch that I had read about but never had any hands-on experience with. The port I was going to use had a dozen lines of configuration already applied – mostly related to queuing. My first instinct was to issue a ‘default interface <slot/port>’ command and start from scratch. This is almost always the right thing to do, as it ensures no confusing stale configuration remains (I’m looking at you, ‘switchport access vlan # / switchport mode trunk’).
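As an aside, this is the kind of stale leftover I mean – a hypothetical port that was once an access port and later converted to a trunk, with both lines still sitting in the running config (interface and VLAN numbers are made up):

! leftover config on a hypothetical port: once an access port, later made a trunk
interface GigabitEthernet2/1
 switchport access vlan 42
 switchport mode trunk
!
! resetting the port wipes all of the above in one shot
Switch(config)# default interface GigabitEthernet2/1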

Leaning over, I asked my co-worker if the ‘default interface’ command worked on these things. After being assured that it would be fine, I held my breath and pressed ‘enter’.

I was greeted with several lines of output related to quality of service (QoS) being set to default values on a range of interfaces. Crap! Had I just wiped out the configuration for an entire line card?

No – it turns out the architecture of these switches is such that queuing must be configured identically on specific groups of ports. I forget if it was all 48 ports on the card, or 16, or whatever, but the point is that I made a simple change and there were unintended consequences. At least Cisco was kind enough to leave me a message about it. And it didn’t bring the network down.

This was a reminder that even the most mundane, routine, everyday changes can go sideways when you least suspect it.