Cisco Catalyst 9500 StackWise Virtual Link Requires 10Gb Links

I ran into a frustrating and poorly documented behavior of the Catalyst 9500 the other day while setting up a pair of C9500-16X switches in a StackWise Virtual (SWV) configuration. For the uninitiated, StackWise Virtual is similar to Virtual Switching System (VSS): it allows two physical switches to run as one logical switch. This is helpful at the distribution or core/distribution layer when you want multiple non-blocking L2 links to access layer switches. The active switch in the SWV pair processes and sends all control traffic, while the standby switch forwards traffic in the data path and can take over as the active switch in the event of a switchover (failure of the active switch, forced switchover due to a code upgrade, etc.).

In production, the StackWise Virtual Links (SVL) that connect the two switches would most likely be a pair of 10Gb links. I was trying to get a pair of C9500-16X switches configured with 1Gb links temporarily and ran into a confusing issue. I ensured that both switches were on the same license and code version, and applied the correct SWV configuration. The configuration is really dead simple: just define a stackwise-virtual domain number and then define the stackwise-virtual links. Optionally, a dual-active detection (DAD) link can be defined. Switch number and priority are defined as they are in StackWise-480 (physical stacking cables).
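As a rough sketch of what that configuration looks like, the fragment below uses the interface numbers that appear in the output later in this post (Te1/0/15 and Te1/0/16 as SVL, Te1/0/14 as DAD). The domain number, the SVL link number, and the priority value are assumptions chosen for illustration, not values taken from the original setup.

```
! Illustrative sketch only: domain 1 and link 1 are assumed values
stackwise-virtual
 domain 1
!
! StackWise Virtual Links (SVL)
interface TenGigabitEthernet1/0/15
 stackwise-virtual link 1
!
interface TenGigabitEthernet1/0/16
 stackwise-virtual link 1
!
! Optional dual-active detection (DAD) link
interface TenGigabitEthernet1/0/14
 stackwise-virtual dual-active-detection
!
! Switch number and priority are set from privileged EXEC, for example:
!   switch 1 priority 15
```

The StackWise Virtual settings only take effect after the configuration is saved and the switch is reloaded, so the same steps are repeated on the second switch before both are rebooted.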

When both switches were booted, the DAD link and both SVL links flashed on both switches during the 120-second neighbor discovery period. Whichever switch booted first (even if by a fraction of a second) would eventually fully boot, and then a strange sequence of events would occur:

1) Both SVL links would become err-disabled due to link flap.

*Jul 19 16:51:20.566: %PM-4-ERR_DISABLE: link-flap error detected on Te1/0/16, putting Te1/0/16 in err-disable state
*Jul 19 16:51:21.008: %PM-4-ERR_DISABLE: link-flap error detected on Te1/0/15, putting Te1/0/15 in err-disable state

2) The dual-active detection link would stay up.

Switch#sh stackwise-virtual dual-active-detection 
Dual-Active-Detection Configuration:
-------------------------------------
Switch  Dad port                  Status
------  ------------              ---------
1       TenGigabitEthernet1/0/14  up

3) All other links not related to SWV would become err-disabled.

Switch# sh int status err

Port      Name               Status       Reason               Err-disabled Vlans
Te1/0/1                      err-disabled dual-active-recovery
Te1/0/2                      err-disabled dual-active-recovery
Te1/0/3                      err-disabled dual-active-recovery
Te1/0/4                      err-disabled dual-active-recovery
Te1/0/5                      err-disabled dual-active-recovery
Te1/0/6                      err-disabled dual-active-recovery
Te1/0/7                      err-disabled dual-active-recovery
Te1/0/8                      err-disabled dual-active-recovery
Te1/0/9                      err-disabled dual-active-recovery
Te1/0/10                     err-disabled dual-active-recovery
Te1/0/11                     err-disabled dual-active-recovery
Te1/0/12                     err-disabled dual-active-recovery
Te1/0/13                     err-disabled dual-active-recovery

4) No neighbors detected on SVL ports (makes sense, link is err-disabled).

Switch#sh stackwise-virtual neighbors 
Stackwise Virtual Link(SVL) Neighbors Information:
--------------------------------------------------
Switch  SVL  Local Port                  Remote Port
------  ---  ----------                  -----------
1    TenGigabitEthernet1/0/15  
     TenGigabitEthernet1/0/16  

I tried all sorts of things to get this working and to verify that my equipment wasn’t faulty, with no success. Finally, I realized I was using GLC-SX-MMD (1Gb) SFPs for all of these links. I replaced them with SFP-10G-SR optics, and everything worked fine after both switches were rebooted. Cisco explicitly states in their docs that network module ports cannot be used as SVL links, but I have not seen where they list 10Gb links as a requirement. I have also heard that future software releases may support SVL on network module ports. However, as of 16.9.3, that configuration will not function.
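For anyone hitting the same symptoms, a few standard show commands would likely have pointed at the optics sooner. This is only a sketch of the checks, and output is omitted because it varies by platform and release.

```
! List installed transceivers and their PIDs (e.g. GLC-SX-MMD vs. SFP-10G-SR)
show inventory
! Check per-port speed and err-disabled state
show interfaces status
! After the reload, confirm the state of the SVL ports
show stackwise-virtual link
```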