Multiprotocol Label Switching (MPLS) is a technology for delivering IP services
Packets are forwarded based on labels – MPLS-enabled routers do not look into the IP header to forward packets
MPLS is often described as OSI layer 2.5 – label information is inserted between the Data Link and Network layer headers; this is sometimes called the shim header
MPLS works over most “Layer 2 technologies” such as ATM, FR, PPP, POS, Ethernet
Network infrastructure convergence – an MPLS-enabled network can carry different kinds of traffic (IPv4, IPv6, Layer 2 frames) across a single network infrastructure
No need to have BGP enabled on all routers – very important for scaling networks – because MPLS forwarding is done via labels, core routers do not need to keep all destination IP prefixes in their routing tables
– Allows use of overlapping IPv4 address space – Allows optimal traffic flow
Traffic engineering – The preferred path is the least-cost path determined by the IGP – The basic idea is to use links in the network infrastructure efficiently – MPLS needs to provide a mechanism to divert traffic to links other than the preferred path
Main building blocks of MPLS:
Label – 20-bit value carried in a 32-bit shim header (label stack entry) inserted between Layer 2 and Layer 3
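The shim header layout can be made concrete with a small sketch. It assumes the label stack entry layout from RFC 3032 (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL); the function names are made up for illustration.

```python
import struct

def pack_label_entry(label: int, tc: int = 0, bottom: bool = True, ttl: int = 255) -> bytes:
    """Pack one 32-bit MPLS label stack entry (layout assumed per RFC 3032)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8 or not 0 <= ttl < 256:
        raise ValueError("tc must fit in 3 bits and ttl in 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)  # network byte order, 4 bytes

def unpack_label_entry(data: bytes) -> dict:
    """Split a 4-byte label stack entry back into its fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }

# Label value 3 is the implicit null label used for PHP signaling.
entry = pack_label_entry(label=3, tc=0, bottom=True, ttl=64)
print(entry.hex(), unpack_label_entry(entry))
```

Only 20 of the 32 bits carry the label value; the remaining bits carry QoS marking, the stack marker and the TTL.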
LSR – Label Switch Router (e.g. PE, P)
LSP – Label Switched Path
IGP – Interior Gateway Protocol
LDP – Label Distribution Protocol
LIB, LFIB – Label Information Base, Label Forwarding Information Base
MP-BGP, RSVP – Protocols for MPLS VPN and MPLS TE
The egress LSR does not always perform label disposition – PHP (Penultimate Hop Popping) is signaled via the implicit null label (LDP advertises MPLS label value 3)
Penultimate Hop Popping (PHP) is a feature in MPLS (Multiprotocol Label Switching) where the second-to-last router (the penultimate hop) removes (or pops) the MPLS label before forwarding the packet to the final router. This improves efficiency and reduces workload on the last router.
Assigning and distributing MPLS labels
Each LSR needs to run an IGP to learn IP prefixes (e.g. neighbor loopbacks, BGP next hops)
Each LSR then forms an “LDP neighborship” with each of its directly connected LSRs
Once LDP neighborship is formed, each LSR uses LDP to “assign labels to IP prefixes” it knows about – each LSR does this independently and advertises its labels to its LDP neighbors
LDP is standards based – originally specified in RFC 3036, since replaced by RFC 5036
LDP uses UDP for neighbor discovery (hello messages to port 646 and destination IP 224.0.0.2)
LDP uses TCP (port 646 and destination IP of its LDP peer) for the rest of the messages (label advertisement, label withdrawal, session maintenance, session teardown)
Forwarding MPLS packets – which label to use?
RIB stores IP prefixes, LIB stores MPLS labels
LFIB is built from both the RIB and the LIB and is used to forward MPLS-labeled packets
Example for the LSR in the bottom picture:
– RIB has 1.1.1.1/32 learned via IGP over the e0/0 interface
– LIB has label “L” for prefix 1.1.1.1/32 learned from its LDP peer
– LFIB has: “to forward a packet to 1.1.1.1/32, use label L and send the packet to the LDP peer next hop over the e0/0 interface”
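The relationship between RIB, LIB and LFIB can be sketched with plain dictionaries. This is a simplified illustration, not how any particular router stores its tables; all addresses, label values and the function name are hypothetical.

```python
# Hypothetical tables for one LSR.
# RIB: best route per prefix, learned via the IGP.
rib = {"1.1.1.1/32": {"next_hop": "10.0.0.2", "interface": "e0/0"}}

# LIB: our own (local) label plus every label advertised by LDP peers.
lib = {"1.1.1.1/32": {"local": 17, "remote": {"10.0.0.2": 24, "10.0.1.2": 31}}}

def build_lfib(rib: dict, lib: dict) -> dict:
    """Keep only the label learned from the peer that is the RIB next hop."""
    lfib = {}
    for prefix, route in rib.items():
        bindings = lib.get(prefix)
        if bindings is None:
            continue  # no labels known for this prefix
        out_label = bindings["remote"].get(route["next_hop"])
        if out_label is None:
            continue  # the next hop has not advertised a label
        lfib[bindings["local"]] = {
            "prefix": prefix,
            "out_label": out_label,
            "next_hop": route["next_hop"],
            "interface": route["interface"],
        }
    return lfib

print(build_lfib(rib, lib))
```

The label from peer 10.0.1.2 stays in the LIB but is not used for forwarding, because that peer is not the IGP next hop for the prefix.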
Label stacking
Labeling alone does not make packet forwarding faster
Label stacking is the primary use of MPLS: it enables MPLS L2 and L3 VPNs, traffic engineering and other services
Most used examples of label stacking:
– 2 labels for MPLS VPN – the bottom label indicates which VPN the packet belongs to, the outer label is used by core LSRs for packet forwarding
– 3 labels for MPLS TE – the topmost label indicates which TE tunnel the packet is forwarded into
Use of MPLS to build Layer 3 VPN
An MPLS VPN is a set of sites that communicate with each other – these sites can be connected to the MPLS infrastructure at various PE routers
Each site is identified by its own VRF (Virtual Routing and Forwarding) instance; by default, communication between VRFs is not allowed
Each PE router assigns a distinct MPLS label for each VRF whose routes it advertises to other PE routers – this label is not assigned by LDP, but by MP-BGP
RD (Route Distinguisher) is attached to each IP prefix exchanged in a VPN to make it unique – RD + prefix = VPN prefix
RD allows overlapping IP addresses to be used among VPNs
RD length is 64 bits and is written in the format X:Y, where X is usually an Autonomous System Number or an IP address – usually one RD is assigned per customer
RT (Route Target) governs which VPN prefixes are imported into or exported from a particular VPN
Route Targets
In MPLS Layer 3 VPNs, a Route Target (RT) is a BGP extended community attribute used to control which VPN routes are imported and exported between PE (Provider Edge) routers.
In an MPLS VPN network:
Multiple customers share the same provider backbone.
Each customer has a separate routing table called a VRF (Virtual Routing and Forwarding).
Routes must be kept isolated between customers.
The Route Target ensures that:
Only the correct VPN routes are shared between the correct VRFs.
Customer A’s routes are not accidentally sent to Customer B.
Each VRF has:
Export Route Target defined
Import Route Target defined
A PE router learns a route from a customer.
It adds a Route Target (RT) to that route.
The route is advertised via MP-BGP to other PE routers.
Other PE routers check whether the route’s RT matches one of their import RTs:
If yes → the route is installed in the VRF
If no → the route is ignored
Customer A has two sites:
Site 1 connected to PE1
Site 2 connected to PE2
Both VRFs are configured with:
Export RT: 100:1
Import RT: 100:1
Result: PE1 exports routes with RT 100:1, PE2 imports routes with RT 100:1, Both sites can communicate. If another customer uses RT 200:1, their routes stay completely separate.
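The import check in the example above can be sketched in a few lines. This is only a model of the matching rule (a route is imported when it carries at least one RT from the VRF's import list); the prefixes, RD values and the function name are made up.

```python
def import_routes(advertised: list, import_rts: list) -> list:
    """Return (RD, prefix) pairs a VRF installs: routes sharing an RT with its import list."""
    wanted = set(import_rts)
    return [(r["rd"], r["prefix"]) for r in advertised if wanted & set(r["rts"])]

# VPN routes as seen by PE2 over MP-BGP (hypothetical values).
advertised = [
    {"prefix": "10.1.1.0/24", "rd": "100:1", "rts": ["100:1"]},  # Customer A, exported by PE1
    {"prefix": "10.1.1.0/24", "rd": "200:1", "rts": ["200:1"]},  # other customer, same prefix
]

print(import_routes(advertised, ["100:1"]))           # Customer A VRF on PE2
print(import_routes(advertised, ["100:1", "200:1"]))  # an extranet-style VRF importing both
```

Both customers use the same IPv4 prefix; the RD keeps the two VPN prefixes distinct in MP-BGP, while the RT alone decides which VRF installs them.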
To bring an L3 VPN to life, both the RD and the RT must be exchanged – this is done by MP-BGP
The two functions are separated: the RD makes VPN prefixes unique, while the RT controls which VRFs import and export them
MPLS Layer 3 VPN Intranet for customer in VPN RED
MPLS Layer 3 VPN Intranet for customer in VPN GREEN
MPLS Layer 3 VPN Intranet for customer in VPN BLUE
MPLS Layer 3 VPN Extranet between customer VPN RED and VPN BLUE
Using RT you create Intranet or Extranet – Intranet – different sites of “same” VPN can communicate – Extranet – different sites of “different” VPNs can communicate
Exchanging RD, RT and VPN label over MPLS network
-Each PE router forms an iBGP session with the other PE routers
-Over these iBGP sessions, PE routers exchange VPN prefixes
-Each VPN prefix is exchanged with its associated RT and VPN label – the RT is for importing routes into the VRF RIB, the VPN label is for actual packet forwarding
Packet forwarding with MPLS Layer 3 VPN
-IGP label is assigned by LDP -VPN label is assigned by MP-BGP
1.) PE1 receives an IP packet on the VRF interface assigned to site 1 of VPN BLUE.
2.) PE1 looks up the VPN and IGP labels, imposes both labels as a label stack on the IP packet and forwards it into the MPLS network. The IGP label is chosen based on the iBGP next hop, which is the IP address of PE2.
3.) The P1 router swaps the IGP label based on its LFIB table.
4.) P2 removes the IGP label due to PHP, but does not touch the VPN label.
5.) The PE2 router receives the IP packet with the VPN label, which it uses to select the correct outgoing VPN site.
6.) PE2 then strips off the VPN label and makes a lookup in the VRF RIB of that VPN site to get the outgoing interface for the received packet.
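The steps above can be traced with a toy label stack. The label values are hypothetical; in a real network the IGP labels come from LDP and the VPN label from MP-BGP.

```python
# Hypothetical label values.
VPN_LABEL = 100     # advertised by PE2 via MP-BGP for VPN BLUE
IGP_LABEL_P1 = 20   # label P1 advertised for PE2's loopback
IGP_LABEL_P2 = 30   # label P2 advertised for PE2's loopback

def walk_lsp() -> list:
    """Return the label stack (top label first) as the packet leaves each router."""
    trace = []
    stack = [IGP_LABEL_P1, VPN_LABEL]   # PE1: impose VPN label, then IGP label on top
    trace.append(("PE1", list(stack)))
    stack[0] = IGP_LABEL_P2             # P1: swap the top (IGP) label
    trace.append(("P1", list(stack)))
    stack.pop(0)                        # P2: PHP pops the IGP label only
    trace.append(("P2", list(stack)))
    stack.pop(0)                        # PE2: strip the VPN label, then do the VRF lookup
    trace.append(("PE2", list(stack)))
    return trace

for router, labels in walk_lsp():
    print(router, labels)
```

The VPN label is never examined by P1 or P2; only the top label matters to the core.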
Exchanging routing information between CE and PE routers – Static routing – RIP – EIGRP – OSPF – IS-IS – eBGP
Protocol: This field is 8 bits in length. It indicates the upper-layer protocol. The Internet Assigned Numbers Authority (IANA) is responsible for assigning IP protocol values. Table 1-2 shows some key protocol numbers. You can find a full list
Header Checksum: This field is 16 bits in length. The checksum covers only the IP header, not the data portion of the packet. It is verified and recomputed at each point where the IP header is processed: at every router along the path (because fields such as TTL change) and at the end hosts.
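As an illustration, the header checksum is the one's-complement of the one's-complement sum of the header's 16-bit words (RFC 791). The sketch below uses a made-up 20-byte sample header; the function name is hypothetical.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Compute the IPv4 header checksum; the checksum field must be zero on input."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sample header (192.168.0.1 -> 192.168.0.199, UDP, TTL 64), checksum field zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_checksum(header)))

# A router decrements the TTL (byte 8), so the checksum must be recomputed.
forwarded = header[:8] + bytes([header[8] - 1]) + header[9:]
print(hex(ipv4_checksum(forwarded)))
```

Running the same function over a header that already contains its correct checksum yields 0, which is how a receiver verifies it.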
Padding: This field is variable in length. It ensures that the IP header ends on a 32-bit boundary.
Header Length: This field is 4 bits in length. It indicates the length of the header in 32-bit words (4 bytes) so that the beginning of the data can be found in the IP header. The minimum value for a valid header is 5 (0101) for five 32-bit words.
Total Length: This field is 16 bits in length. It represents the length of the datagram, or packet, in bytes, including the header and data. The maximum length of an IP packet can be 2^16 − 1 = 65,535 bytes. Routers use this field to determine whether fragmentation is necessary by comparing the total length with the outgoing MTU.
Identification: This field is 16 bits in length. It is a unique identifier that denotes fragments for reassembly into an original IP packet.
Flags: This field is 3 bits in length. It indicates whether the packet can be fragmented and whether more fragments follow. Bit 0 is reserved and set to 0. Bit 1 indicates May Fragment (0) or Do Not Fragment (1). Bit 2 indicates Last Fragment (0) or More Fragments to Follow (1).
Fragment Offset: This field is 13 bits in length. It indicates, in units of 8 bytes, where in the original packet this fragment belongs. The first fragment has an offset of 0.
ToS (Type of Service): This field is 8 bits in length. Quality of service (QoS) parameters such as IP precedence and DSCP are found in this field. (These concepts are explained later in this chapter.)
The ToS field of the IP header is used to specify QoS parameters. Routers and Layer 3 switches look at the ToS field to apply policies, such as priority, to IP packets based on the markings. An example is a router prioritizing time-sensitive IP packets over regular data traffic such as web or email, which is not time sensitive.
DSCP
DSCP has 2^6 = 64 levels of classification, which is significantly more than the eight levels of the IP precedence bits
DSCP is backward compatible with IP precedence
Defines three sets of PHBs: Class Selector (CS), Assured Forwarding (AF), and Expedited Forwarding (EF).
CS PHB set is for DSCP values that are compatible with IP precedence bits
The AF PHB set is used for queuing and congestion avoidance.
The EF PHB set is used for premium service
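The relationship between the ToS byte, DSCP and IP precedence can be shown with simple bit arithmetic. This sketch assumes the standard layout (DSCP in the upper six bits, ECN in the lower two) and the usual code point formulas; the function names are made up.

```python
def split_tos(tos: int) -> dict:
    """Split the 8-bit ToS byte into DSCP, ECN and the legacy IP precedence value."""
    dscp = tos >> 2                  # upper 6 bits
    return {"dscp": dscp, "ecn": tos & 0x3, "precedence": dscp >> 3}

def cs_dscp(precedence: int) -> int:
    """Class Selector CSx: precedence bits followed by three zero bits."""
    return precedence << 3

def af_dscp(af_class: int, drop: int) -> int:
    """Assured Forwarding AFxy: DSCP = 8x + 2y (classes 1-4, drop precedence 1-3)."""
    return 8 * af_class + 2 * drop

EF = 46  # Expedited Forwarding code point

print(split_tos(0xB8))   # ToS byte of an EF-marked packet
print(cs_dscp(3), af_dscp(4, 1))
```

Because the top three DSCP bits are the old precedence bits, a device that only understands IP precedence still sees EF traffic as precedence 5.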
IPv4 Fragmentation
Although the maximum length of an IP packet is 65,535 bytes, most of the common lower-layer protocols do not support such large MTUs. For example, the MTU for Ethernet is 1500 bytes (1518 bytes is the maximum size of the whole untagged Ethernet frame, including header and FCS). When the IP layer receives a packet to send, it first queries the outgoing interface to get its MTU. If the packet’s size is greater than the interface’s MTU, the layer fragments the packet.
When a packet is fragmented, it is not reassembled until it reaches the destination IP layer. The destination IP layer performs the reassembly
Any router in the path can fragment a packet, and any router further along the path can fragment an already fragmented packet again. The more fragments a packet is split into, the more likely it is that one of them is lost, which makes the whole packet unrecoverable at the destination.
Each fragment receives its own IP header, carrying the same Identification value as the original packet, and is routed independently of other packets. Routers and Layer 3 switches in the path do not reassemble the fragments. The destination host performs the reassembly and places the fragments in the correct order by looking at the Identification and Fragment Offset fields.
If one or more fragments are lost, the entire packet must be retransmitted. Retransmission is the responsibility of a higher-layer protocol (such as TCP). Also, you can set the Flags field in the IP header to Do Not Fragment; in this case, the packet is discarded if the outgoing MTU is smaller than the packet. This is a full drop of the packet, similar to an ACL drop.
IPv4 Addressing
Classes A, B, and C are unicast IP addresses, meaning that the destination is a single host. IP Class D addresses are multicast addresses, which are sent to multiple hosts
Class A addresses range from 1.0.0.0 to 126.0.0.0. Networks 0 and 127 are reserved. For example, 127.0.0.1 is reserved for the local host or host loopback.
Class B addresses range from 128 (10000000) to 191 (10111111) in the first byte. Network numbers assigned to companies or other organizations are from 128.0.0.0 to 191.255.0.0
As with Class A addresses, having a segment with more than 65,000 hosts broadcasting will surely not work; you resolve this issue with subnetting.
Class C addresses range from 192 (11000000) to 223 (11011111) in the first byte. Network numbers assigned to companies are from 192.0.0.0 to 223.255.255.0.
254 IP addresses for host assignment per Class C network
Class D addresses range from 224 (11100000) to 239 (11101111) in the first byte. Network numbers assigned to multicast groups range from 224.0.0.1 to 239.255.255.255
These addresses do not have a host or network part. Some multicast addresses are already assigned; for example, routers running EIGRP use 224.0.0.10
Class E addresses range from 240 (11110000) to 254 (11111110) in the first byte. These addresses are reserved for experimental networks. Network 255 is reserved for the broadcast address, such as 255.255.255.255
Networks 0.0.0.0 and 127.0.0.0 are reserved as special-use addresses
Large organizations can use network 10.0.0.0/8 to assign address space throughout the enterprise. Midsize organizations can use one of the Class B private networks 172.16.0.0/16 through 172.31.0.0/16 for IP addresses. The smaller Class C addresses, which begin with 192.168, can be used by corporations and are commonly used in home routers.
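The class ranges and the private blocks listed above can be checked programmatically. This is a small sketch using Python's standard ipaddress module; the helper names are made up, and the private check deliberately covers only the three RFC 1918 blocks.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def address_class(addr: str) -> str:
    """Return the classful category based on the first octet."""
    first = int(ipaddress.ip_address(addr)) >> 24
    if first < 128:
        return "A"   # 0 and 127 are reserved within this range
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    if first < 240:
        return "D"   # multicast
    return "E"       # experimental

def is_rfc1918(addr: str) -> bool:
    """True if the address falls in one of the three private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)

for a in ("10.1.1.1", "172.31.0.1", "172.32.0.1", "192.168.1.10", "224.0.0.10"):
    print(a, address_class(a), is_rfc1918(a))
```

Note that 172.16.0.0/16 through 172.31.0.0/16 together form the single block 172.16.0.0/12, which is why 172.32.0.1 is not private.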
NAT
NAT often performs a many-to-one translation, usually from many private addresses to one public address. This process is called Port Address Translation (PAT) because different port numbers identify the translations.
It is called port-based translation because source ports are translated as well: a source port used by one inside host may be used by another inside host at the same time, so the second host’s traffic is translated to a different source port on the public side.
The router or firewall performing the translation keeps track of translations in a translation table. A translation record is just like a connection-table entry and times out if the connection becomes idle. Some applications send packets at intervals to keep the NAT entry alive in the absence of data traffic.
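A toy translation table shows how two inside hosts using the same source port end up with different ports on the public side. This is a simplified sketch: it ignores protocols, destination addresses and idle timeouts, and the class name, addresses and port-selection policy are assumptions, not the behavior of any specific device.

```python
class PatTable:
    """Toy PAT table: maps (inside IP, inside port) to a port on one public IP."""

    def __init__(self, public_ip: str, first_dynamic_port: int = 1024):
        self.public_ip = public_ip
        self.next_port = first_dynamic_port
        self.table = {}  # (inside_ip, inside_port) -> public_port

    def translate(self, inside_ip: str, inside_port: int) -> tuple:
        key = (inside_ip, inside_port)
        if key not in self.table:
            used = set(self.table.values())
            port = inside_port               # keep the source port when it is free
            while port in used:              # otherwise take the next free port
                if self.next_port > 65535:
                    raise RuntimeError("no free ports left on the public address")
                port = self.next_port
                self.next_port += 1
            self.table[key] = port
        return self.public_ip, self.table[key]

pat = PatTable("203.0.113.1")
print(pat.translate("10.0.0.1", 50000))  # first host keeps its port
print(pat.translate("10.0.0.2", 50000))  # same port already in use, so it is rewritten
```

Return traffic is matched against the same table in reverse, which is why an entry has to exist (and not have timed out) for inbound packets to reach the inside host.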
source addresses for outgoing IP packets are converted to globally unique IP addresses
NAT has several forms
Static NAT: A host is manually (statically) assigned an external address. This makes the host reachable from the outside world for connections coming from outside to inside, and the host also uses that static address when going from inside to outside.
Dynamic NAT: Dynamically maps a private IP address to a registered IP address from a pool (group) of registered addresses. There are two types of dynamic NAT:
Overloading: Maps multiple unregistered or private IP addresses to a single registered IP address by using different ports. This is also known as PAT or single-address NAT. The number of simultaneous PAT translations per registered address is limited by the 16-bit port range (at most 65,535 ports).
Overlapping: Overlapping networks result when you have overlapping subnets in two different locations, for example when two companies merge. These two networks need to communicate, preferably without having to readdress all their devices.
Inside local address: The real IP address of the device that resides in the internal network. This address is used in the stub domain.
Inside global address: The translated IP address of the device that resides in the internal network. This address is used in the public network.
Outside global address: The real IP address of a device that resides in the Internet, outside the stub domain.
Outside local address: The translated IP address of the device that resides in the Internet. This address is used inside the stub domain.
Different types of NAT
Static NAT: Commonly used to assign a network device with an internal private IP address a unique public address so that it can be accessed from the Internet.
Dynamic NAT: Dynamically maps an unregistered or private IP address to a registered IP address from a pool (group) of registered addresses.
PAT: Maps multiple unregistered or private IP addresses to a single registered IP address by using different ports.
IPv4 Address Subnets
Multicast addresses do not use subnet masks
IP Address Subnet Design Example
The development of an IP address plan or IP address subnet design is an important concept for a network designer. You should be capable of creating an IP address plan based on many factors, including the following:
-Number of locations -Number of devices per location -IP addressing requirements for each individual location or building -Number of devices to be supported in each comms room -Site requirements, including VoIP devices, wireless LAN, and video
Subnetting for a small company. Suppose the company has 200 hosts and is assigned the Class C network 195.10.1.0/24. The 200 hosts need to be in six different LANs.
You can subnet the Class C network using the mask 255.255.255.224
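The result of applying the 255.255.255.224 mask can be listed with Python's standard ipaddress module. This is only a sketch of the arithmetic; how the 200 hosts are spread over the LANs is not shown.

```python
import ipaddress

network = ipaddress.ip_network("195.10.1.0/24")

# 255.255.255.224 is a /27: three borrowed bits give 2^3 = 8 subnets.
subnets = list(network.subnets(new_prefix=27))

for subnet in subnets:
    usable = subnet.num_addresses - 2   # minus network and broadcast addresses
    print(subnet, subnet.netmask, "usable hosts:", usable)
```

Each /27 offers 30 usable host addresses, so six LANs fit within the eight available subnets and two remain spare.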
Deriving number of networks from default networks
Variable-length subnet masking (VLSM) is a process used to divide a network into subnets of various sizes to prevent wasting IP addresses. If a Class C network uses 255.255.255.240 as a subnet mask, 16 subnets are available, each with 14 IP addresses
Class B network 130.20.0.0/16: using a /20 mask produces 16 subnetworks.
The loopback address is a single IP address with a 32-bit mask. In the previous example, network 130.20.75.0/24 could provide 256 loopback addresses for network devices, starting with 130.20.75.0/32 and ending with 130.20.75.255/32.
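The counts quoted above follow from two small formulas: borrowing n bits yields 2^n subnets, and a prefix of length p leaves 2^(32 − p) − 2 usable host addresses. A short sketch (helper names are made up):

```python
import ipaddress

def subnet_count(old_prefix: int, new_prefix: int) -> int:
    """Number of subnets obtained by extending the mask from old_prefix to new_prefix."""
    return 2 ** (new_prefix - old_prefix)

def usable_hosts(prefix: int) -> int:
    """Usable host addresses in a subnet (network and broadcast excluded)."""
    return 2 ** (32 - prefix) - 2

print(subnet_count(24, 28), usable_hosts(28))   # Class C with 255.255.255.240
print(subnet_count(16, 20))                     # 130.20.0.0/16 with a /20 mask

# Every address of a /24 can serve as a /32 loopback.
loopbacks = list(ipaddress.ip_network("130.20.75.0/24").subnets(new_prefix=32))
print(len(loopbacks), loopbacks[0], loopbacks[-1])
```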
Global companies divide this address space into continental regions for the Americas, Europe/Middle East, Africa, and Asia/Pacific. An example is shown in Table 1-25, where the address space has been divided into four major blocks:
10.0.0.0 to 10.63.0.0 is reserved.
10.64.0.0 to 10.127.0.0 is for the Americas. 10.128.0.0 to 10.191.0.0 is for Europe, Middle East, and Africa. 10.192.0.0 to 10.254.0.0 is for Asia Pacific.
Subnets are assigned for data, voice, wireless, and management VLANs. Table 1-26 shows an example. The large site is allocated network 10.64.16.0/20. The first four /24 subnets are assigned for data VLANs, the second four /24 subnets are assigned for voice VLANs, and the third four /24 subnets are assigned for wireless VLANs. Other subnets are used for router and switch interfaces, point-to-point links, and network management devices.
When assigning subnets for a site or perhaps a floor of a building, do not assign subnets that are too small. You want to assign subnets that allow for growth
For example, if a floor has a requirement for 50 users, do you assign a /26 subnet (which allows 62 addressable nodes)? Or do you assign a /25 subnet, which allows up to 126 nodes?
Assigning a subnet that is too large will prevent you from having other subnets for IPT and video conferencing.
The company might make an acquisition of another company. Although a new address design would be the cleanest solution, the recommendation is to avoid re-addressing of networks. Here are some other options:
If you use 10.0.0.0/8 as your network, use the other private IP addresses for the additions.
Use NAT as a workaround.
Performing Route Summarization
As a network designer, you will want to allocate IPv4 address space to allow for route summarization. Large networks can grow quickly from 500 routes to 1000 and higher. Route summarization reduces the size of the routing table
Planning for a Hierarchical IP Address Network
When designing IPv4 addressing for a companywide network, recommended practice dictates that you allocate contiguous address blocks to regions of the network. Hierarchical IPv4 addressing enables summarization, which makes the network easier to manage and troubleshoot.
Network subnets cannot be aggregated because /24 subnets from many different networks are deployed in different areas of the network. For example, subnets under 10.10.0.0/16 are deployed in Asia (10.10.4.0/24), the Americas (10.10.6.0/24), and Europe (10.10.8.0/24). The same occurs with networks 10.70.0.0/16 and 10.128.0.0/16. This lack of summarization in the network increases the size of the routing table, making it less efficient. It also makes it harder for network engineers to troubleshoot because it is not obvious in which part of the world a particular subnet is located.
Network That Is Not Summarized
By contrast, Figure 1-6 shows a network that allocates a high-level block to each region:
10.0.0.0/18 for Asia Pacific networks
10.64.0.0/18 for Americas networks
10.128.0.0/18 for European/Middle East networks
This solution provides for summarization of regional networks at area borders and improves control over the growth of the routing table.
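The effect of contiguous allocation on summarization can be demonstrated with the standard ipaddress module. The site subnets below are hypothetical; the scattered ones are taken from the non-summarized example earlier in this section.

```python
import ipaddress

americas = ipaddress.ip_network("10.64.0.0/18")

# Hypothetical contiguous site subnets allocated from the Americas block.
site_subnets = [ipaddress.ip_network(f"10.64.{i}.0/24") for i in range(4)]
summary = list(ipaddress.collapse_addresses(site_subnets))
print(summary)                                           # one /22 covers all four
print(all(s.subnet_of(americas) for s in site_subnets))  # all inside the regional block

# Subnets scattered across regions cannot be merged into one route.
scattered = [ipaddress.ip_network(n) for n in ("10.10.4.0/24", "10.10.6.0/24", "10.10.8.0/24")]
print(list(ipaddress.collapse_addresses(scattered)))     # still three separate routes
```

With regional blocks, a border router can advertise the single regional prefix instead of every individual subnet behind it.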
Here are some examples of standards:
Use .1 or .254 (in the last octet) as the default gateway of the subnet.
Match the VLAN ID number with the third octet of an IP address. (For example, the IP subnet 10.10.150.0/25 is assigned to VLAN 150.)
Reserve .1 to .15 of a subnet for static assignments and .16 to .239 for the DHCP pool.
Allocate /24 subnets for user devices (such as laptops and PCs).
Allocate a parallel /24 subnet for VoIP devices (IP phones).
Allocate subnets for access control systems and video conferencing systems.
Reserve subnets for future use.
Use /30 subnets for point-to-point links.
Use /32 for loopback addresses.
Allocate subnets for remote access and network management.
Case Study: IP Address Subnet Allocation
Consider a company that has users in several buildings in a campus network. Building A has four floors, and building B has two floors.
The buildings’ Layer 3 switches will be connected via a dual-fiber link between switch A and switch B. Both switches will connect to the WAN router R1. Assume that you have been allocated network 10.10.0.0/17 for this campus and that IP phones will be used.
Notice that the VLAN number matches the third octet of the IP subnet. The second floor is assigned VLAN 12 and IP subnet 10.10.12.0/24. For building B, VLAN numbers in the 20s are used, with floor 1 having a VLAN of 21 assigned with IP subnet 10.10.21.0/24.
VLANs for IP telephony (IPT) are similar to data VLANs, with the correlation of using numbers in the 100s. For example, floor 1 of building A uses VLAN 11 for data and VLAN 111 for voice, and the corresponding IP subnets are 10.10.11.0/24 (data) and 10.10.111.0/24 (voice). This is repeated for all floors.
This solution uses /30 subnets for point-to-point links from the 10.10.2.0/24 subnet. Loopback addresses are taken from the 10.10.1.0/24 network starting with 10.10.1.1/32 for the WAN router. Subnet 10.10.3.0/24 is reserved for the building access control system.
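The numbering convention of the case study can be expressed as a small generator. This is a sketch of the scheme as described here (building A uses VLANs in the 10s, building B in the 20s, voice VLAN = data VLAN + 100); the function name and dictionary layout are made up.

```python
import ipaddress

CAMPUS = ipaddress.ip_network("10.10.0.0/17")

def floor_plan(building_base: int, floor: int) -> dict:
    """Return (VLAN, subnet) for data and voice; the third octet matches the VLAN ID."""
    data_vlan = building_base + floor
    voice_vlan = 100 + data_vlan
    plan = {}
    for name, vlan in (("data", data_vlan), ("voice", voice_vlan)):
        subnet = ipaddress.ip_network(f"10.10.{vlan}.0/24")
        if not subnet.subnet_of(CAMPUS):
            raise ValueError(f"{subnet} is outside the campus block {CAMPUS}")
        plan[name] = (vlan, str(subnet))
    return plan

print(floor_plan(10, 1))   # building A, floor 1
print(floor_plan(20, 1))   # building B, floor 1

# /30 point-to-point links carved out of 10.10.2.0/24.
p2p_links = list(ipaddress.ip_network("10.10.2.0/24").subnets(new_prefix=30))
print(len(p2p_links), p2p_links[0], p2p_links[1])
```

The check against the /17 matters: the campus block only covers third octets 0 to 127, so the voice VLAN numbers in the 100s still fit, but anything above 127 would not.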
BOOTP and DHCP
The BOOTP server port is UDP port 67. The client port is UDP port 68.
DHCP is an extension of BOOTP, so the basic behavior is the same, with enhancements in DHCP. BOOTP requires that you build a MAC address–to–IP address table on the server: you must obtain every device’s MAC address, which is a time-consuming effort.
That is why DHCP was introduced with a “lease” function for any client MAC address. DHCP not only provides a network address but also delivers configuration parameters to hosts.
An IP address is assigned as follows:
Step 1. The client sends a DHCPDISCOVER message to the local network using a 255.255.255.255 broadcast.
Step 2. DHCP relay agents (routers and switches) can forward the DHCPDISCOVER message to the DHCP server in another subnet.
Step 3. The server sends a DHCPOFFER message to respond to the client, offering IP address, lease expiration, and other DHCP option information.
Step 4. Using DHCPREQUEST, the client can request additional options or an extension on its lease of an IP address. This message also confirms that the client is accepting the DHCP offer.
Step 5. The server sends a DHCPACK (acknowledgment) message that confirms the lease and contains all the pertinent IP configuration parameters.
Step 6. If the server is out of addresses or determines that the client request is invalid, it sends a DHCPNAK message to the client.
ARP
When an ARP response is received, it is cached in the ARP table, which lists IP addresses with their MAC addresses
An ARP request is a broadcast and contains the sender’s IP and MAC address and the target IP address. Because the request already carries the sender’s MAC address, the ARP response can be sent as a unicast
All nodes in the broadcast domain receive the ARP request and process it.
ARP request is always a broadcast and ARP response is always a unicast
Hold time means holding on to a neighbor’s information as long as the hold timer has not reached 0; the moment it reaches 0, everything related to that neighbor is dropped and other neighbors are told to withdraw it
Although the maximum length of an IPv4 datagram is 65535, most transmission links enforce a smaller maximum packet length limit, called an MTU. The MTU size can even differ from link to link
IPv4 fragmentation breaks a datagram into pieces that are reassembled later: the datagram can be fragmented by network devices, but it is reassembled only on the end station
Fields in the IPv4 header that are significant for fragmentation are the “do not fragment” (DF) bit, the “more fragments” (MF) bit and the fragment offset field
The DF bit determines whether or not a packet is “allowed” to be fragmented. In the above figure the DF bit is not set, which is why the IP packet was fragmented rather than discarded when fragmentation was needed.
The Identification field identifies the original packet, which helps the receiver make sure that it reassembles fragments of the same packet
offset
The fragment offset is 13 bits and indicates where a fragment belongs in the original IPv4 datagram. This value is in units of 8 bytes. It works like a puzzle: the offset shows where each piece fits to make the IPv4 packet whole again.
The second fragment has an offset of 185 (185 x 8 = 1480); the data portion of this fragment starts 1480 bytes into the original IPv4 datagram,
The third fragment has an offset of 370 (370 x 8 = 2960); the data portion of this fragment starts 2960 bytes into the original IPv4 datagram.
The fourth fragment has an offset of 555 (555 x 8 = 4440), which means that the data portion of this fragment starts 4440 bytes into the original IPv4 datagram.
It is only when the last fragment is received that the size of the original IPv4 datagram can be determined.
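The offsets listed above can be reproduced with a short calculation. The sketch assumes a 20-byte IPv4 header without options and, as an assumption for illustration, an original datagram of 5140 bytes sent over a link with MTU 1500; the function name is made up.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Return (fragment length, offset in 8-byte units, more-fragments flag) tuples."""
    payload = total_length - header_len
    per_fragment = (mtu - header_len) // 8 * 8   # data per fragment: a multiple of 8 bytes
    fragments = []
    sent = 0
    while sent < payload:
        size = min(per_fragment, payload - sent)
        more = sent + size < payload             # MF is cleared only on the last fragment
        fragments.append((size + header_len, sent // 8, more))
        sent += size
    return fragments

for length, offset, more in fragment(5140, 1500):
    print(f"length={length} offset={offset} (byte {offset * 8}) MF={int(more)}")
```

Only the last fragment (MF = 0) lets the receiver compute the original size: its offset times 8 plus its own data length.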
Issues with IPv4 Fragmentation
IPv4 fragmentation results in a small increase in CPU and memory overhead to fragment an IPv4 datagram. This is true for the sender and for a router in the path between a sender and a receiver.
The creation of fragments involves the creation of fragment headers and copies the original datagram into the fragments.
Fragmentation causes more overhead for the receiver when reassembling the fragments because the receiver must allocate memory for the arriving fragments and coalesce them back into one datagram after all of the fragments are received.
Reassembly on a host is not considered a problem because the host has the time and memory resources to devote to this task.
Reassembly, however, is inefficient on a router or firewall whose primary job is to forward packets as quickly as possible.
A router is not designed to hold on to packets for any length of time.
A router that does the reassembly chooses the largest buffer available (18K), because it has no way to determine the size of the original IPv4 packet until the last fragment is received.
Another fragmentation issue involves how dropped fragments are handled.
If one fragment of an IPv4 datagram is dropped, then the entire original IPv4 datagram must be resent, and it is fragmented again.
This is seen with Network File System (NFS). NFS has a read and write block size of 8192.
Therefore, a NFS IPv4/UDP datagram is approximately 8500 bytes (which includes NFS, UDP, and IPv4 headers).
A sending station connected to an Ethernet (MTU 1500) has to fragment the 8500-byte datagram into six pieces: five 1500-byte fragments and one 1100-byte fragment.
If any of the six fragments are dropped because of a congested link, the complete original datagram has to be retransmitted. This results in six more fragments to be created.
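If this link drops one in six packets, the odds are low that any NFS data gets across this link, because at least one of the six fragments of each original datagram is likely to be dropped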
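The cost of fragmentation under loss can be estimated with a rough model. It assumes each fragment is lost independently with the same probability, which is a simplification; the function name is made up.

```python
def delivery_probability(loss_rate: float, fragments: int) -> float:
    """Probability that every fragment of a datagram arrives (independent losses assumed)."""
    return (1 - loss_rate) ** fragments

single = delivery_probability(1 / 6, 1)      # unfragmented packet
fragmented = delivery_probability(1 / 6, 6)  # 8500-byte NFS datagram in six fragments

print(f"one packet:    {single:.3f}")
print(f"six fragments: {fragmented:.3f}")
```

Under this model only about one datagram in three arrives complete. If the drops are instead evenly spaced (exactly every sixth packet), every six-fragment datagram loses one fragment and no datagram gets through at all.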
Firewalls that filter or manipulate packets based on Layer 4 (L4) through Layer 7 (L7) information have trouble processing IPv4 fragments correctly
If the IPv4 fragments arrive out of order, a firewall can block the non-initial fragments because they do not carry the Layer 4 information needed to match the packet filter.
Firewalls nowadays should perform virtual reassembly: the fragments are not actually merged into one forwarded packet, but are reassembled locally in memory so that the firewall can inspect the whole packet
PMTUD
TCP MSS addresses fragmentation at the two endpoints of a TCP connection, but it does not handle the case where a link with a smaller MTU lies in the middle of the path between these two endpoints, and it does not apply to UDP traffic.
PMTUD is a mechanism to dynamically determine the true lowest MTU (Maximum Transmission Unit) on the path between a sender and a receiver
If PMTUD is enabled on a host, all TCP and UDP packets from the host have the DF bit set,
so that intermediate routers do not fragment them. If fragmentation would be needed, the network device drops the packet but lets the sender know that fragmentation is needed
PMTUD Steps
A host sends an IPv4 packet (or a TCP/UDP segment) with the DF bit set.
That packet traverses the network toward its destination. At some point there may be a link with smaller MTU than the packet size.
When a router along the path encounters a packet that it cannot forward without fragmentation (because the packet size > the outgoing link’s MTU) and the packet has the DF bit set, then:
The router drops the packet.
The router sends an ICMP “Destination Unreachable – fragmentation needed and DF set” (Type 3, Code 4) message back to the sender. This ICMP message includes the MTU of the next‐hop link in the “unused” field if the router supports it (per RFC 1191). If intermediate routers don’t support including the MTU in the ICMP message or the host ignores the message, then the path MTU may not be found correctly
The sender receives that ICMP message and then reduces its packet size (or the MSS for TCP) for that destination, using the newly discovered path MTU value.
The host updates its send size and retries with the smaller size, and the packet now goes through successfully. The host records the MTU value for the destination by creating a host (/32) entry in its routing table with this MTU value.
Because the path can change for same destination on internetwork, PMTUD is an ongoing process: if things change, new ICMP messages may cause further reductions.
For PMTUD to work properly, the ICMP “fragmentation needed” messages must actually reach the sender. If those ICMP messages are blocked by firewalls, routers, or filtered, PMTUD will fail silently
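The steps above can be simulated in a few lines. This is a simplified model, assuming every router reports its next-hop MTU in the ICMP message (per RFC 1191) and that no ICMP messages are filtered; the link MTUs and the function name are made up.

```python
def discover_path_mtu(link_mtus: list, initial_size: int) -> tuple:
    """Simulate PMTUD: send with DF set and shrink to the MTU reported by each ICMP error."""
    size = initial_size
    attempts = []
    while True:
        attempts.append(size)
        # First link along the path that cannot carry the packet without fragmenting it.
        blocked = next((mtu for mtu in link_mtus if mtu < size), None)
        if blocked is None:
            return size, attempts   # packet reached the destination
        size = blocked              # next-hop MTU from the ICMP Type 3, Code 4 message

path_mtu, attempts = discover_path_mtu([1500, 1434, 1400], initial_size=1500)
print(path_mtu, attempts)
```

Each ICMP message reveals only the next bottleneck, so a path with several progressively smaller links needs several round trips before the sender settles on the final value.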
On Cisco routers the command tunnel path-mtu-discovery (when applied to the tunnel interface) allows the router to participate in PMTUD for encapsulated traffic, to copy the DF bit from the inner to the outer packet, and to dynamically adjust the tunnel MTU
With Cisco routers and switches we can perform extended ping to determine the biggest size possible through the path
ping
Protocol [ip]:
Target IP address: 172.31.176.164
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Ingress ping [n]:
Source address or interface:
DSCP Value [0]:
Type of service [0]:
Set DF bit in IP header? [no]: y
Validate reply data? [no]:
Data pattern [0x0000ABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: V
Loose, Strict, Record, Timestamp, Verbose[V]:
Sweep range of sizes [n]: y
Sweep min size [36]: 1400
Sweep max size [20000]: 1600
Sweep interval [1]:
Type escape sequence to abort.
Sending 1005, [1400..1600]-byte ICMP Echos to 172.31.176.164, timeout is 2 seconds:
Packet sent with the DF bit set
Reply to request 0 (7 ms) (size 1400)
Reply to request 1 (10 ms) (size 1401)
Reply to request 2 (8 ms) (size 1402)
Reply to request 3 (7 ms) (size 1403)
Reply to request 4 (4 ms) (size 1404)
Reply to request 5 (4 ms) (size 1405)
Reply to request 6 (3 ms) (size 1406)
Reply to request 7 (4 ms) (size 1407)
Reply to request 8 (4 ms) (size 1408)
Reply to request 9 (4 ms) (size 1409)
Reply to request 10 (5 ms) (size 1410)
Reply to request 11 (6 ms) (size 1411)
Reply to request 12 (3 ms) (size 1412)
Reply to request 13 (4 ms) (size 1413)
Reply to request 14 (3 ms) (size 1414)
Reply to request 15 (3 ms) (size 1415)
Reply to request 16 (5 ms) (size 1416)
Reply to request 17 (3 ms) (size 1417)
Reply to request 18 (3 ms) (size 1418)
Reply to request 19 (3 ms) (size 1419)
Reply to request 20 (5 ms) (size 1420)
Reply to request 21 (7 ms) (size 1421)
Reply to request 22 (3 ms) (size 1422)
Reply to request 23 (3 ms) (size 1423)
Reply to request 24 (4 ms) (size 1424)
Reply to request 25 (6 ms) (size 1425)
Reply to request 26 (4 ms) (size 1426)
Reply to request 27 (3 ms) (size 1427)
Reply to request 28 (4 ms) (size 1428)
Reply to request 29 (3 ms) (size 1429)
Reply to request 30 (4 ms) (size 1430)
Reply to request 31 (4 ms) (size 1431)
Reply to request 32 (3 ms) (size 1432)
Reply to request 33 (3 ms) (size 1433)
Reply to request 34 (4 ms) (size 1434)
Unreachable from 172.31.203.21, maximum MTU 1434 (size 1435)
Request 36 timed out (size 1436)
Request 37 timed out (size 1437)
Request 38 timed out (size 1438)
Request 39 timed out (size 1439)
Request 40 timed out (size 1440)
Request 41 timed out (size 1441)
Unreachable from 172.31.203.21, maximum MTU 1434 (size 1442)
Request 43 timed out (size 1443)
Unreachable from 172.31.203.21, maximum MTU 1434 (size 1444)
Request 45 timed out (size 1445)
Unreachable from 172.31.203.21, maximum MTU 1434 (size 1446)
Request 47 timed out (size 1447)
Unreachable from 172.31.203.21, maximum MTU 1434 (size 1448)
Request 49 timed out (size 1449)
Unreachable from 172.31.203.21, maximum MTU 1434 (size 1450)
Request 51 timed out (size 1451)
Success rate is 67 percent (35/52), round-trip min/avg/max = 3/4/10 ms
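Reading a long sweep by eye is error prone, so the result can be summarized with a small script. The following is only a sketch in Python: the function name summarize_sweep and the regular expressions are assumptions based on the line formats visible in the output above, not a Cisco-provided tool.

```python
import re

# Line formats assumed from the extended ping output shown above.
REPLY_RE = re.compile(r"^Reply to request \d+ \(\d+ ms\) \(size (\d+)\)")
UNREACH_RE = re.compile(r"^Unreachable from (\S+), maximum MTU (\d+)")


def summarize_sweep(output: str):
    """Return (largest size that got a reply, set of MTU values reported by routers)."""
    largest_ok = None
    reported_mtus = set()
    for raw in output.splitlines():
        line = raw.strip()
        reply = REPLY_RE.match(line)
        if reply:
            size = int(reply.group(1))
            largest_ok = size if largest_ok is None else max(largest_ok, size)
            continue
        unreachable = UNREACH_RE.match(line)
        if unreachable:
            reported_mtus.add(int(unreachable.group(2)))
    return largest_ok, reported_mtus


# Hypothetical excerpt in the same format as the sweep above.
sample = """\
Reply to request 33 (3 ms) (size 1433)
Reply to request 34 (4 ms) (size 1434)
Unreachable from 172.31.203.21, maximum MTU 1434 (size 1435)
Request 36 timed out (size 1436)
"""
print(summarize_sweep(sample))
```

When the largest size that got a reply matches the MTU reported in the Unreachable lines, as it does in the sweep above (1434), the two sources of evidence agree on the path MTU.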
The same test is possible from Windows, although Windows ping does not sweep sizes automatically, so each size has to be tried by hand:
ping 8.8.8.8 -f -l 1500
-f → Sets the DF (Don’t Fragment) bit.
-l <size> → Sets the ICMP payload size in bytes.
If no device or firewall in the path filters the ICMP packets returning from the remote device, then both the CLI and a packet capture should show:
Packet needs to be fragmented but DF set.
So, if ping -f -l works at 1472 bytes but fails at 1473, then the actual path MTU is 1472 + 20 (IPv4 header) + 8 (ICMP header) = 1500 bytes.
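The Windows -l value counts only the ICMP payload, so the IPv4 and ICMP headers have to be added back to get the path MTU. The sketch below shows that arithmetic and a binary search for the largest payload, since Windows does not sweep automatically. The probe callable is hypothetical: it stands in for "run ping -f -l <size> and report whether a reply came back" and is not implemented here. The 20-byte figure assumes an IPv4 header without options.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def path_mtu_from_payload(payload_bytes: int) -> int:
    """Convert the largest working ping payload into a path MTU."""
    return payload_bytes + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES


def largest_passing_payload(probe, low: int, high: int) -> int:
    """Binary-search the largest payload for which probe(payload) is True.

    Assumes probe(low) succeeds and that success is monotonic in size.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


print(path_mtu_from_payload(1472))  # 1500

# Stand-in probe simulating a path whose MTU is 1434 bytes.
fake_probe = lambda payload: path_mtu_from_payload(payload) <= 1434
print(largest_passing_payload(fake_probe, 1300, 1600))  # 1406
```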
IPv6 address is made up of two parts. The first 64 bits usually represent the subnet prefix, and the last 64 bits usually represent the address assigned to interface.
2001:db8:a:a::/64 is the subnet, or prefix. A network interface can have the address 2001:db8:a:a::1, where the last 64 bits are ::1. Hosts on this network can have ::10, ::20, and so on, and all devices in this network are configured with the default gateway 2001:db8:a:a::1.
In this example the host has the link-local address fe80::a00:27ff:fe5d:6d6 and the global unicast address 2001:db8:a:a::10 (statically configured). Notice the %11 at the end of the link-local address. This is the interface identification number (zone ID), and it is needed so that the system knows which interface to send the packets out of; keep in mind that you can have multiple interfaces on the same device with the same link-local address assigned to them.
EUI-64
EUI-64 helps end devices auto-configure unique addresses in the IPv6 world: because the interface portion of an IPv6 address is 64 bits long, a device can automatically build its own global unicast and link-local addresses.
EUI-64 takes the client’s 48-bit MAC address, splits it in half, and inserts the hex value FFFE in the middle. In addition, it takes the seventh bit from the left and flips it: if it is a 1, it becomes a 0, and if it is a 0, it becomes a 1.
Looking at the host bits in the address, 0a00:27ff:fe5d:06d6, we can see this is an EUI-64 address because it has FFFE in the middle.
For example, take the MAC address 08-00-27-5D-06-D6. Split it in half and add FFFE in the middle to get 08-00-27-FF-FE-5D-06-D6.
08 in hex is 00001000 in binary. The seventh bit from the left is a 0, so make it a 1. Now you have 00001010, which converts to hex 0A, giving the interface ID 0A00:27FF:FE5D:06D6 used in the address fe80::a00:27ff:fe5d:6d6.
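The same steps can be sketched in Python to check the worked example. The function names eui64_interface_id and link_local_from_mac are made up for illustration; only the standard ipaddress module is used.

```python
import ipaddress


def eui64_interface_id(mac: str) -> str:
    """Build the 64-bit EUI-64 interface ID from a 48-bit MAC address."""
    digits = mac.replace("-", "").replace(":", "").replace(".", "")
    if len(digits) != 12:
        raise ValueError("expected a 48-bit MAC address")
    octets = bytearray.fromhex(digits)
    octets[0] ^= 0x02  # flip the seventh bit from the left of the first octet
    eui = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])  # insert FFFE in the middle
    return ":".join(eui[i:i + 2].hex() for i in range(0, 8, 2))


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Prepend fe80::/64 to the EUI-64 interface ID."""
    return ipaddress.IPv6Address("fe80::" + eui64_interface_id(mac))


print(eui64_interface_id("08-00-27-5D-06-D6"))   # 0a00:27ff:fe5d:06d6
print(link_local_from_mac("08-00-27-5D-06-D6"))  # fe80::a00:27ff:fe5d:6d6
```

Flipping the seventh bit is the same as XOR with 0x02 on the first octet, which is why 08 becomes 0a.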
By default, routers use EUI-64 when generating the interface portion of the link-local address of an interface. If you want to use EUI-64 for a statically configured global unicast address, use the eui-64 keyword at the end of the ipv6 address command.
When a Windows PC or a router interface is enabled for SLAAC, it sends a Router Solicitation (RS) message to the all-routers multicast address (ff02::2) to ask whether any routers are on the local link. A router then replies with a Router Advertisement (RA) that identifies the following:
The network prefix(es) used on that link (e.g., 2001:db8:1:1::/64), Flags indicating whether to use SLAAC or DHCPv6, The router’s lifetime as a default gateway, And other configuration details.
The PC uses the prefix from the RA and combines it with its own interface identifier (often based on MAC address or a random value) to form a full IPv6 global unicast address.
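How the host joins the advertised prefix and its own interface identifier can be sketched with Python's ipaddress module. This is an illustration only: slaac_address is a hypothetical helper name, and the interface ID used in the example is the EUI-64 value from the earlier MAC address example rather than anything taken from a real RA.

```python
import ipaddress


def slaac_address(prefix: str, interface_id: int) -> ipaddress.IPv6Address:
    """Combine a /64 prefix from an RA with a 64-bit interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC expects a /64 prefix")
    if not 0 <= interface_id < 2**64:
        raise ValueError("interface ID must fit in 64 bits")
    # Upper 64 bits come from the prefix, lower 64 bits from the interface ID.
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


address = slaac_address("2001:db8:1:1::/64", 0x0A0027FFFE5D06D6)
print(address)  # 2001:db8:1:1:a00:27ff:fe5d:6d6
```

A host that uses a random interface identifier instead of EUI-64 would pass a random 64-bit value as interface_id; the combining step is the same.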
The RA’s source address (the router’s link-local address, usually starting with fe80::) is used by the host as the next hop (default gateway).
In IPv6, all routers must have a link-local address on each interface, and hosts use that address as the default gateway.
To verify an IPv6 address generated by SLAAC on a router interface, use the show ipv6 interface command. Note, however, that a router interface uses SLAAC like this only if IPv6 unicast routing is not enabled on the router, so the router is acting as an end device; that is why the next-hop router’s link-local address is listed as its default router.
By default, RAs are generated only if: 1. The router interface is enabled for IPv6. 2. IPv6 unicast routing is enabled. 3. RAs are not being suppressed on the interface. Also verify with the show ipv6 interface command that the router interface has a /64 prefix, because SLAAC works only if the router is using a /64 prefix.
In addition, if you have more than one router on a subnet generating RAs, which can happen with redundant gateways, the clients learn about multiple default gateways from the RAs as shown below
Although a device is able to determine its IPv6 address, prefix, and default gateway using SLAAC, there is not much else it can obtain. In a modern network, devices may also need information such as Network Time Protocol (NTP) server information, domain name information, and DNS server information.
To provide that, use a DHCPv6 server.
Cisco routers and switches can act as DHCPv6 servers, but for an interface to hand out IPv6 addresses from a configured pool, we must enable the interface command “ipv6 dhcp server [pool-name]”.
If you are troubleshooting an issue where clients are not receiving IPv6 addressing information or where they are receiving wrong IPv6 addressing information from a router or multilayer switch acting as a DHCPv6 server, check the interface and make sure it was associated with the correct pool.
Stateless DHCPv6
Stateless DHCPv6 is a combination of SLAAC and DHCPv6. With stateless DHCPv6, clients use a router’s RA to automatically determine the IPv6 address, prefix, and default gateway. Included in the RA is a flag that tells the client to get other non-addressing information, such as the address of a DNS server, from a DHCPv6 server.
To accomplish this, ensure that the ipv6 nd other-config-flag interface configuration command is enabled. This ensures that the RA informs the client that it must contact a DHCPv6 server for other information.
DHCPv6 Operation
DHCPv6 has a four-step negotiation process, like IPv4. However, DHCPv6 uses the following messages:
Redundancy requires a second link between switches, but that creates a loop. This is where spanning tree steps in and disables one side of the link (one interface) to remove the loop.
One indication of a loop is that a MAC address shows up behind different ports where it should not. Layer 2 frames have no TTL mechanism, so looped frames keep circulating and grind network equipment to a halt.
STP works by first making switches aware of one another: they send and receive BPDUs instead of staying silent.
STP selects one switch in the network as a root switch and a tree is built from this root switch’s perspective by simply stretching STP network down from that root switch
STP has multiple versions:
802.1D, which is the original specification
Per-VLAN Spanning Tree (PVST)
Per-VLAN Spanning Tree Plus (PVST+)
———————————————
802.1W Rapid Spanning Tree Protocol (RSTP)
802.1S Multiple Spanning Tree Protocol (MST)
Cisco switches can operate in PVST+, RSTP, and MST modes. All three of these modes are backward compatible with 802.1D.
Original version of STP only ensures Loop free topology in one VLAN
802.1D Port States
Disabled: The port is in an administratively off position (that is, shut down).
Blocking: The switch port is enabled but the port is not forwarding any traffic to ensure that a loop is not created. The switch does not modify the MAC address table.
Special: Port can only receive BPDUs
Listening: The switch port has transitioned from a blocking state. The port can now send or receive BPDUs, but it still cannot forward any other network traffic. The duration of the state correlates to the STP forwarding time.
Special: Port can send and receive BPDUs
Learning: The switch port can add MAC entries in MAC address table from network traffic that it receives. The switch still does not forward any other network traffic besides BPDUs. The duration of the state correlates to the STP forwarding time. The next port state is forwarding.
Special: Port can send and receive BPDU but can also do mac learning on port (learn is in the name)
Forwarding: The switch port can forward all network traffic and can update the MAC address table as expected. This is the final state for a switch port to forward network traffic.
Special: only forwarding actually forwards traffic (forward is in the name)
Broken: The switch has detected a configuration or an operational problem on a port that can have major effects. The port discards packets as long as the problem continues to exist.
If timers are left to defaults 802.1D takes about 30 seconds for a port to transition from Blocking to Forwarding state
802.1D Port Types
Root port (RP): A network port that connects to the root bridge or an upstream switch that leads to root switch in the spanning-tree topology. There should be only one root port per VLAN on a switch.
Designated port (DP): A network port that receives and forwards BPDU frames to other switches. Designated ports provide connectivity to downstream devices and switches, that is, they lead away from the root. There should be only one active designated port on a link.
Blocking port: A network port that is not forwarding traffic because of STP calculations.
Several key terms are related to STP:
Root bridge: The root bridge has all of its ports in a forwarding (non-blocking) state. This switch is considered the top of the spanning tree for all path calculations by other switches. All ports on the root bridge are categorized as designated ports.
Bridge protocol data unit (BPDU): This network packet is used for network switches to identify each other and notify of changes in the topology. A BPDU uses the destination MAC address 01:80:c2:00:00:00. There are two types of BPDUs:
Configuration BPDU: This BPDU is used to identify the root bridge, root ports, designated ports, and blocking ports. The configuration BPDU consists of the following fields: – STP type – root path cost – root bridge identifier – local bridge identifier – max age – hello time – forward delay
Topology change notification (TCN) BPDU: This BPDU is used to communicate changes in the Layer 2 topology to other switches. It is explained in greater detail later in the chapter.
Root path cost: This is the combined cost toward the root switch.
System priority: This 4-bit value indicates the desire for a switch to be root bridge. The default value is 32,768.
System ID extension: This 12-bit value indicates the VLAN that the BPDU belongs to (12 bits because a VLAN ID is 12 bits). BPDUs are generated per VLAN, so a BPDU belongs to only one VLAN. The system priority (the value that decides the root election) and the system ID extension (the VLAN) are combined as part of the switch’s bridge identification.
Root bridge identifier: Root bridge’s system MAC address + system ID extension + system priority of the root bridge
Local bridge identifier: System MAC address + system ID extension + system priority of the local bridge.
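How the three pieces combine into one comparable bridge identifier can be sketched as follows. The helper names priority_field and bridge_id are made up for illustration, and the packing shown (16-bit priority field followed by the 48-bit MAC address) is an assumption based on the 4-bit priority and 12-bit system ID extension described above. The MAC addresses are the ones that appear in the show spanning-tree outputs later in these notes.

```python
def priority_field(system_priority: int, vlan_id: int) -> int:
    """Combine system priority (upper 4 bits) and system ID extension (lower 12 bits)."""
    if system_priority % 4096 or not 0 <= system_priority <= 61440:
        raise ValueError("system priority must be a multiple of 4096 between 0 and 61440")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    return system_priority | vlan_id


def bridge_id(system_priority: int, vlan_id: int, mac: str) -> int:
    """Pack the priority field and system MAC into one integer; lower is better."""
    mac_int = int(mac.replace(".", "").replace(":", "").replace("-", ""), 16)
    return (priority_field(system_priority, vlan_id) << 48) | mac_int


print(priority_field(32768, 1))   # 32769
print(priority_field(32768, 99))  # 32867
print(bridge_id(32768, 1, "0062.ec9d.c500") < bridge_id(32768, 1, "0081.c4ff.8b00"))  # True
```

This is why a switch with the default priority shows 32769 for VLAN 1 and 32778 for VLAN 10: the VLAN ID is simply added to 32768.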
Max age: This is the maximum length of time that a bridge port stores its BPDU information. The default value is 20 seconds (10x the default hello time) but can be configured with the command spanning-tree vlan vlan-id max-age maxage. If a switch loses contact with the BPDU’s source, it keeps the BPDU information on the interface until the max age timer counts down. The max age timer counts down only for an indirect failure, not when the local interface itself goes down.
Hello time: This is the time interval that a BPDU is advertised out of a port. The default value is 2 seconds, but the value can be configured to 1 to 10 seconds with the command spanning-tree vlan vlan-id hello-time hello-time.
Forward delay (also called forwarding delay): This is the amount of time that a port stays in each of the listening and learning states (where it does not forward traffic). The default value is 15 seconds, but it can be changed to a value of 4 to 30 seconds with the command spanning-tree vlan vlan-id forward-time forward-time.
STP cost is assigned on interface and root path cost is calculated by adding cumulative cost to reach root
Long mode and short mode
The original default costs only distinguish link speeds up to 20 Gbps: anything faster also gets a cost of 1. As networking has advanced and links of 10 Gbps and faster have become common, that is no longer enough.
Another method, called long mode, uses a 32-bit value and a reference speed of 20 Tbps.
The original method, known as short mode, has been the default for most switches, but platforms have been transitioning to long mode depending on the specific platform and OS version.
Link Speed    Short-Mode STP Cost    Long-Mode STP Cost
10 Mbps       100                    2,000,000
100 Mbps      19                     200,000
1 Gbps        4                      20,000
10 Gbps       2                      2,000
20 Gbps       1                      1,000
100 Gbps      1                      200
1 Tbps        1                      20
10 Tbps       1                      2
Devices can be configured with the long-mode interface cost with the command spanning-tree pathcost method long. The entire Layer 2 topology should use the same setting for every device in the environment to ensure a consistent topology. Before you enable this setting in an environment, it is important to conduct an audit to ensure that the setting will work.
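The long-mode values in the table follow directly from the 20 Tbps reference speed, while the short-mode values are a fixed lookup. A minimal sketch in Python, assuming plain integer division (real platforms may round differently for speeds that are not in the table); the function names are illustrative only.

```python
LONG_MODE_REFERENCE_BPS = 20 * 10**12  # 20 Tbps reference speed

# Short-mode costs from the table above, keyed by link speed in bits per second.
SHORT_MODE_COST = {
    10 * 10**6: 100,
    100 * 10**6: 19,
    10**9: 4,
    10 * 10**9: 2,
    20 * 10**9: 1,
}


def long_mode_cost(speed_bps: int) -> int:
    """Long-mode cost as reference speed divided by link speed, never below 1."""
    if speed_bps <= 0:
        raise ValueError("speed must be positive")
    return max(1, LONG_MODE_REFERENCE_BPS // speed_bps)


def short_mode_cost(speed_bps: int) -> int:
    """Short-mode cost; every speed above 20 Gbps collapses to 1."""
    if speed_bps > 20 * 10**9:
        return 1
    return SHORT_MODE_COST[speed_bps]


for speed in (10**9, 10 * 10**9, 100 * 10**9):
    print(speed, short_mode_cost(speed), long_mode_cost(speed))
```

The loop shows the problem long mode solves: in short mode 100 Gbps costs 1, the same as 20 Gbps, whereas long mode still separates them (200 versus 1,000).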
1. Elect Root Bridge, starts with I am root
As a switch boots it wants to find the root bridge, and it starts by assuming that it is itself the root: it uses its local bridge identifier as the root bridge identifier. It then listens for BPDUs from neighbors on all of its ports. If a neighbor’s configuration BPDU is inferior to its own BPDU, the switch ignores that BPDU. If a neighbor’s configuration BPDU is superior to its own, the switch updates its BPDUs to include the new root bridge identifier and a new root path cost. This process continues until all switches in the topology have identified the root bridge.
STP favours the switch with the lowest priority in the bridge ID. If the priority is the same, the switch with the lower system MAC address wins. Generally, older switches have lower MAC addresses and are therefore preferred, so the priority should be configured deliberately for optimal placement of the root bridge.
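The "I am root" process can be sketched as a small simulation: every switch starts out believing in itself and adopts any better root identifier it hears from a neighbor. This is a simplified model in Python, not how a switch implements STP; the function names and data layout are made up for illustration. The bridge IDs are the ones shown in the show spanning-tree outputs in these notes.

```python
def mac_to_int(mac: str) -> int:
    return int(mac.replace(".", ""), 16)


def converge_root(bridge_ids: dict, links: list) -> dict:
    """bridge_ids maps name -> (priority, mac); links is a list of (a, b) pairs.

    Returns, per switch, the (priority, mac_as_int) it ends up believing is root.
    """
    # Every switch starts by assuming that it is the root.
    believed = {name: (prio, mac_to_int(mac)) for name, (prio, mac) in bridge_ids.items()}
    changed = True
    while changed:
        changed = False
        for a, b in links:
            best = min(believed[a], believed[b])  # lower priority wins, then lower MAC
            for switch in (a, b):
                if believed[switch] != best:
                    believed[switch] = best  # superior BPDU accepted
                    changed = True
    return believed


bridges = {
    "SW1": (32769, "0062.ec9d.c500"),
    "SW2": (32769, "0081.c4ff.8b00"),
    "SW3": (32769, "189c.5d11.9980"),
}
triangle = [("SW1", "SW2"), ("SW1", "SW3"), ("SW2", "SW3")]
result = converge_root(bridges, triangle)
print({name: hex(mac) for name, (_, mac) in result.items()})
```

With equal priorities, all three switches settle on SW1 because it has the lowest MAC address. Lowering the priority on another switch would move the root there, which is the deliberate placement the notes recommend.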
show spanning-tree root to display the root bridge
SW1# show spanning-tree root
Root Hello Max Fwd
Vlan Root ID Cost Time Age Dly Root Port
---------------- -------------------- --------- ----- --- --- ------------
VLAN0001 32769 0062.ec9d.c500 0 2 20 15
VLAN0010 32778 0062.ec9d.c500 0 2 20 15
VLAN0020 32788 0062.ec9d.c500 0 2 20 15
VLAN0099 32867 0062.ec9d.c500 0 2 20 15
This command gives a snapshot of the root for all VLANs. Different VLANs can have different root switches; a single root for all VLANs is not mandatory.
When a switch generates the BPDUs, the root path cost includes only the calculated metric to the root and does not include the cost of the port that the BPDU is advertised out of
The receiving switch adds the port cost of the interface on which the BPDU was received to the root path cost in the BPDU; the result is what that switch considers its cost to reach the root.
The root path cost is always zero on the root bridge
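The accumulation can be written out as simple arithmetic. This sketch assumes 1 Gbps links in short mode (cost 4); the function and constant names are illustrative only.

```python
SHORT_MODE_1G_COST = 4  # short-mode cost of a 1 Gbps port


def root_path_cost(advertised_cost: int, receiving_port_cost: int) -> int:
    """Cost in the received BPDU plus the cost of the port it arrived on."""
    return advertised_cost + receiving_port_cost


root_advertises = 0  # the root bridge always advertises a root path cost of zero
direct = root_path_cost(root_advertises, SHORT_MODE_1G_COST)
via_neighbor = root_path_cost(direct, SHORT_MODE_1G_COST)
print(direct, via_neighbor)  # 4 8
```

A switch one hop from the root sees cost 4; if it relays that BPDU to another switch over a second 1 Gbps link, that switch computes 8 for the indirect path.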
cost on those links is 4 because of 1 gig links (short mode)
SW2# show spanning-tree root
Root Hello Max Fwd
Vlan Root ID Cost Time Age Dly Root Port
---------------- -------------------- --------- ----- --- --- ------------
VLAN0001 32769 0062.ec9d.c500 4 2 20 15 Gi1/0/1
VLAN0010 32778 0062.ec9d.c500 4 2 20 15 Gi1/0/1
VLAN0020 32788 0062.ec9d.c500 4 2 20 15 Gi1/0/1
VLAN0099 32867 0062.ec9d.c500 4 2 20 15 Gi1/0/1
SW3# show spanning-tree root
Root Hello Max Fwd
Vlan Root ID Cost Time Age Dly Root Port
---------------- -------------------- --------- ----- --- --- ------------
VLAN0001 32769 0062.ec9d.c500 4 2 20 15 Gi1/0/1
VLAN0010 32778 0062.ec9d.c500 4 2 20 15 Gi1/0/1
VLAN0020 32788 0062.ec9d.c500 4 2 20 15 Gi1/0/1
VLAN0099 32867 0062.ec9d.c500 4 2 20 15 Gi1/0/1
Locating Root “Ports”
After the switches have identified the root bridge, they must determine their root port (RP).
Only the root bridge continues to advertise configuration BPDUs out all of its ports. The switch compares the BPDU information received on its port to identify the RP.
The RP is selected using the following logic, moving to the next step only when there is a tie. This process is interface centric because we are selecting a root “port”:
The interface associated to lowest path cost is more preferred.
The interface associated to the lowest system priority of the “advertising switch” is preferred next.
The interface associated to the lowest system MAC address of the advertising switch is preferred next.
When multiple links are associated to the same switch, the lowest port priority from the advertising switch is preferred.
When multiple links are associated to the same switch, the lower port number from the advertising switch is preferred.
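Because each step is consulted only on a tie, the whole list behaves like a tuple comparison. The sketch below models that in Python; the ReceivedBPDU type and select_root_port function are hypothetical names, and the sample values are an assumption built from the SW3 outputs in these notes (cost 4 directly to SW1, cost 8 through SW2).

```python
from typing import NamedTuple


class ReceivedBPDU(NamedTuple):
    local_port: str
    root_path_cost: int        # advertised cost plus the local port cost
    sender_priority: int       # system priority of the advertising switch
    sender_mac: str            # system MAC of the advertising switch
    sender_port_priority: int  # port priority on the advertising switch
    sender_port_number: int    # port number on the advertising switch


def select_root_port(candidates) -> str:
    """Pick the root port; later fields matter only if earlier ones tie."""
    best = min(
        candidates,
        key=lambda c: (
            c.root_path_cost,
            c.sender_priority,
            int(c.sender_mac.replace(".", ""), 16),
            c.sender_port_priority,
            c.sender_port_number,
        ),
    )
    return best.local_port


sw3_candidates = [
    ReceivedBPDU("Gi1/0/1", 4, 32769, "0062.ec9d.c500", 128, 3),  # from SW1
    ReceivedBPDU("Gi1/0/2", 8, 32769, "0081.c4ff.8b00", 128, 3),  # from SW2
]
print(select_root_port(sw3_candidates))  # Gi1/0/1
```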
Locating Blocked / Designated Switch “Ports”
At this point the root for the VLAN has been elected and the root ports have been selected. Next, the designated and blocking ports on links between two non-root switches need to be decided.
On such a link, one of the two switches’ ports must be set to a blocking state to prevent a forwarding loop.
The interface is a designated port and must not be considered an RP.
The switch with the lower path cost to the root bridge forwards packets, and the one with the higher path cost blocks. If they tie, they move on to the next step.
The system priority of the local switch is compared to the system priority of the remote switch. The local port is moved to a blocking state if the remote system priority is lower than that of the local switch. If they tie, they move on to the next step.
The system MAC address of the local switch is compared to the system MAC address of the remote switch. The local designated port is moved to a blocking state if the remote system MAC address is lower than that of the local switch.
When multiple links are associated to the same switch, the lowest port priority from the advertising switch is preferred.
When multiple links are associated to the same switch, the lower port number from the advertising switch is preferred.
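For a link between two non-root switches, the first three comparisons decide which side keeps the designated port. A minimal sketch, with a made-up function name and the SW2 and SW3 values taken from the outputs in these notes:

```python
def designated_switch(a: tuple, b: tuple) -> str:
    """Each argument is (name, root_path_cost, system_priority, system_mac).

    Returns the switch whose port stays designated; the other side blocks.
    """
    def key(switch):
        _, cost, priority, mac = switch
        return (cost, priority, int(mac.replace(".", ""), 16))

    return min(a, b, key=key)[0]


sw2 = ("SW2", 4, 32769, "0081.c4ff.8b00")
sw3 = ("SW3", 4, 32769, "189c.5d11.9980")
print(designated_switch(sw2, sw3))  # SW2
```

Cost and priority tie here, so the lower system MAC address decides: SW2 keeps its designated port and SW3 blocks its side of the link.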
SW1# show spanning-tree vlan 1
VLAN0001
Spanning tree enabled protocol rstp
! This section displays the relevant information for the STP root bridge
Root ID Priority 32769
Address 0062.ec9d.c500
This bridge is the root
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
! This section displays the relevant information for the Local STP bridge
Bridge ID Priority 32769 (priority 32768 sys-id-ext 1)
Address 0062.ec9d.c500
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 300 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/2 Desg FWD 4 128.2 P2p
Gi1/0/3 Desg FWD 4 128.3 P2p
Gi1/0/14 Desg FWD 4 128.14 P2p Edge
If the Type field includes *TYPE_Inc -, this indicates a port configuration mismatch between this switch and the switch it is connected to. It is seen when the port mode is mixed between the switches (access on one side, trunk on the other).
These port types are expected on Catalyst switches:
P2p
P2p is point-to-point link only, i.e.:
The port connects directly to a switch or router device on full-duplex Ethernet link
Why it matters in STP:
STP can converge faster on point-to-point links
Rapid STP (RSTP) can move these ports to forwarding almost immediately when safe
P2p Edge
A point-to-point link
AND an edge port (connected to an end device)
This is essentially PortFast
What STP assumes:
No risk of loops
The device is not a switch
The port can go to Forwarding immediately
Typical devices on P2p Edge ports:
PCs
Servers
Printers
IP phones
Ports that are blocked show the BLK state. An alternate port is the alternate path to reach the root in the event that the root port (here Gi1/0/1) fails.
All the ports on SW2 are in a forwarding state, but port Gi1/0/2 on SW3 is in a blocking (BLK) state. SW3’s Gi1/0/2 port has also been designated as an alternate port to reach the root in the event that the Gi1/0/1 connection fails.
The reason that SW3’s Gi1/0/2 port rather than SW2’s Gi1/0/3 port was placed into a blocking state is that SW2’s system MAC address (0081.c4ff.8b00) is lower than SW3’s system MAC address (189c.5d11.9980).
SW2# show spanning-tree vlan 1
VLAN0001
Spanning tree enabled protocol rstp
Root ID Priority 32769
Address 0062.ec9d.c500
Cost 4
Port 1 (GigabitEthernet1/0/1)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 32769 (priority 32768 sys-id-ext 1)
Address 0081.c4ff.8b00
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 300 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1 Root FWD 4 128.1 P2p
Gi1/0/3 Desg FWD 4 128.3 P2p
Gi1/0/4 Desg FWD 4 128.4 P2p
SW3# show spanning-tree vlan 1
VLAN0001
Spanning tree enabled protocol rstp
! This section displays the relevant information for the STP root bridge
Root ID Priority 32769
Address 0062.ec9d.c500
Cost 4
Port 1 (GigabitEthernet1/0/1)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
! This section displays the relevant information for the Local STP bridge
Bridge ID Priority 32769 (priority 32768 sys-id-ext 1)
Address 189c.5d11.9980
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 300 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1 Root FWD 4 128.1 P2p
Gi1/0/2 Altn BLK 4 128.2 P2p
Gi1/0/5 Desg FWD 4 128.5 P2p
show spanning-tree interface interface-id [detail] shows STP state for only the specified interface. The detail keyword provides 1. port cost 2. port priority 3. number of transitions 4. link type 5. count of BPDUs sent or received for every VLAN supported on that interface.
show spanning-tree vlan x shows where that vlan spans to on current switch
SW3# show spanning-tree interface gi1/0/1 detail
! Output omitted for brevity
Port 1 (GigabitEthernet1/0/1) of VLAN0001 is root forwarding
Port path cost 4, Port priority 128, Port Identifier 128.1.
Designated root has priority 32769, address 0062.ec9d.c500
Designated bridge has priority 32769, address 0062.ec9d.c500
Designated port id is 128.3, designated path cost 0
Timers: message age 16, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default
BPDU: sent 15, received 45908
Port 1 (GigabitEthernet1/0/1) of VLAN0010 is root forwarding
Port path cost 4, Port priority 128, Port Identifier 128.1.
Designated root has priority 32778, address 0062.ec9d.c500
Designated bridge has priority 32778, address 0062.ec9d.c500
Designated port id is 128.3, designated path cost 0
Timers: message age 15, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default
BPDU: sent 15, received 22957
..
STP Topology Changes
Configuration BPDUs always flow from the root bridge toward the edge switches. However, changes in the topology (for example, switch failure, link failure, or links becoming active) have an impact on “all” the switches in the Layer 2 topology.
The switch that detects a fault sends a topology change notification (TCN) BPDU toward the root bridge, out its RP. If an upstream switch receives the TCN, it sends out an acknowledgment and forwards the TCN out its RP to the root bridge.
By default, a switch ages out MAC entries after 300 seconds (5 minutes). When STP detects a topology change (link up/down, port role change), the switch temporarily reduces the MAC aging time.
Upon receipt of the TCN, the root bridge creates a new configuration BPDU with the Topology Change flag set, and it is then flooded to all the switches. When a switch receives a configuration BPDU with the Topology Change flag set, all switches change their MAC address timer to the forwarding delay timer (with a default of 15 seconds). This flushes out MAC addresses for devices that have not communicated in that 15-second window but maintains MAC addresses for devices that are actively communicating.
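The effect of shortening the aging timer can be illustrated with a toy MAC table. This is a sketch only: the age_out function, the MAC addresses, and the timestamps are invented for the example and do not model a real switch.

```python
def age_out(last_seen: dict, now: float, aging_time: float) -> dict:
    """Keep only the entries heard from within the aging time."""
    return {mac: seen for mac, seen in last_seen.items() if now - seen <= aging_time}


# Hypothetical table: MAC address -> time (in seconds) the host was last heard.
mac_table = {
    "0000.aaaa.0001": 995.0,  # heard 5 seconds ago
    "0000.aaaa.0002": 900.0,  # heard 100 seconds ago
    "0000.aaaa.0003": 750.0,  # heard 250 seconds ago
}
now = 1000.0

print(sorted(age_out(mac_table, now, 300)))  # normal aging: all three entries stay
print(sorted(age_out(mac_table, now, 15)))   # topology change: only the active host stays
```

With the timer at the forward delay, only hosts that spoke in the last 15 seconds survive; traffic to the others is flooded as unknown unicast until they are relearned.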
However, a side effect of flushing the MAC address table is that it temporarily increases unknown unicast flooding while the table is rebuilt. Remember that this can impact hosts because of their CSMA/CD behavior. The MAC address timer is reset to normal (300 seconds) after two configuration BPDUs have been seen, which the switch takes as a sign that the topology is stable again.
Because TCNs are generated on a per-VLAN basis, the side effect is that the MAC entry aging time for that VLAN is reduced, which causes unknown unicast flooding while the switch relearns MAC addresses on that VLAN. As the number of hosts (without PortFast) increases, TCN generation becomes more likely and more hosts are impacted by the flooding. Topology changes should be checked as part of the troubleshooting process. Enabling PortFast on host-facing ports stops those ports from generating TCNs and so reduces the overall number of TCNs.
Topology changes are seen with the command show spanning-tree [vlan vlan-id] detail on a switch. The output of this command shows the topology change count and time since the last change has occurred.
A sudden or continuous increase in TCNs indicates a potential problem and should be investigated further for flapping ports or events on a connected switch.
SW1# show spanning-tree vlan 10 detail
VLAN0010 is executing the rstp compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 10, address 0062.ec9d.c500
Configured hello time 2, max age 20, forward delay 15, transmit hold-count 6
We are the root of the spanning tree
Topology change flag not set, detected flag not set
Number of topology changes 42 last change occurred 01:02:09 ago
from GigabitEthernet1/0/2
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 0, topology change 0, notification 0, aging 300
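When checking many VLANs or switches, the topology change counters can be pulled out of saved output with a short script. The sketch below is an assumption built on the exact wording of the output above; other platforms or software versions may format these lines differently, and topology_change_summary is a made-up name.

```python
import re

# Pattern assumed from the "show spanning-tree vlan 10 detail" output above.
TOPOLOGY_RE = re.compile(
    r"Number of topology changes (\d+) last change occurred (\S+) ago\s+from (\S+)"
)


def topology_change_summary(output: str):
    """Return (change count, time since last change, source interface) or None."""
    match = TOPOLOGY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


sample_detail = """\
Topology change flag not set, detected flag not set
Number of topology changes 42 last change occurred 01:02:09 ago
from GigabitEthernet1/0/2
"""
print(topology_change_summary(sample_detail))
```

Collecting the count at two points in time and comparing them shows whether TCNs are still increasing, and the interface name gives the port to trace next.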
Determining why TCNs are occurring involves finding a port that is flapping and does not have PortFast enabled. If that port connects to another switch, continue tracing on that switch in the same VLAN.
Direct Link Failures of blocking segment- traffic impact
When a port goes down STP process is aware of that “direct link” failure
In the scenario below, the link between SW2 and SW3 goes down. SW2’s Gi1/0/3 is a designated port and SW3’s Gi1/0/2 is blocking. This link going down does not impact traffic, because both switches already transmit traffic through SW1: since the direct link between SW2 and SW3 is blocked, SW2 learns all the MAC addresses behind SW3 via SW1, and SW3 learns all the MAC addresses behind SW2 via SW1.
Blocked ports neither send nor receive data, and they do not send BPDUs; they only receive BPDUs. Switches also do not learn MAC addresses on blocked ports.
A designated port can send and receive data, but in this case SW2 will never forward known unicast traffic out of Gi1/0/3, because no MAC address has been learned through that port: where traffic goes out is dictated by MAC address learning.
Do not forget the TCN generated when the P2p port goes down: both SW2 and SW3 advertise a TCN toward the root switch, which results in the switches in the Layer 2 topology flushing their MAC address tables.
Direct Link Failures – Loss of root – traffic impact 30 seconds for 802.1D
In the second scenario, the link between SW1 and SW3 fails. Traffic between SW1 and SW3 is affected, and so is traffic between SW2 and SW3: because of the blocked segment between SW2 and SW3, all traffic between them goes via SW1 (SW2 -> SW1 -> SW3 and SW3 -> SW1 -> SW2). With the link between SW1 and SW3 down, the Layer 2 network has to reconverge with the help of STP.
– SW1 detects a link failure on its Gi1/0/3 interface. – SW3 detects a link failure on its Gi1/0/1 interface and SW3 does not use max age timer on its Gi1/0/1
1. TCNs should go from all switches to the root, but in this scenario there is no way to send them yet, so the switch waits:
– Normally, SW1 would send a TCN out its root port, but it is the root bridge, so it does not. SW1 waits for a TCN from the non-root switches.
– At this point, SW3 would attempt to send a TCN toward the root switch to notify it of the topology change. However, its root port is down, and its only other port connected to this Layer 2 network is in blocking mode, so SW3 waits and sends the TCN once that port has come out of blocking mode.
2. Affected interfaces remove their best BPDU (root / root port) and activate the alternate port, because BPDUs from the root are still arriving on another (blocking) port:
– SW3 removes the best BPDU on its Gi1/0/1 interface (the best BPDU is always the one received on the root port) without waiting for the max age timer, because the interface is now in a down state.
– SW2 has been receiving BPDUs from SW1 all along and relaying them to SW3.
– Because the root port was lost, SW3 must look for a new root port.
– SW3 never lost sight of the root, because it kept receiving BPDUs on Gi1/0/2 while that port was blocked.
– Because those BPDUs show that the root is reachable over Gi1/0/2, SW3 transitions that port to listening and then to learning.
3. The TCN can now reach the root:
– Once SW3 brings its Gi1/0/2 port to the forwarding state, it sends the TCN toward the root out of Gi1/0/2.
– SW1 advertises a configuration BPDU with the Topology Change flag set out of all its ports. It keeps the flag set for the topology change period (commonly Max Age + Forward Delay = 35 seconds by default).
– This BPDU is received and relayed by all switches in the environment: SW2 receives it and relays it to SW3.
4. Non-root switches reduce their MAC address age timer to the forward delay:
– These switches reduce the MAC address age timer to the forward delay timer to flush out older MAC entries.
– If other switches were connected to SW1, they would also receive a configuration BPDU with the Topology Change flag set, for all the VLANs on the trunk port. These BPDUs have an impact on all switches in the same Layer 2 domain.
The total convergence time for SW3 is 30 seconds: 15 seconds for the listening state and 15 seconds for the learning state before SW3’s Gi1/0/2 can be made the RP.
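The timer arithmetic can be written down explicitly. The sketch below uses the default 802.1D timers from these notes; the function names are illustrative, and the second function reflects the general rule that a failure which is not detected on a local interface first has to wait for the max age timer before the port starts listening.

```python
def direct_failure_convergence(forward_delay: int = 15) -> int:
    """Listening plus learning, each lasting one forward delay."""
    return 2 * forward_delay


def indirect_failure_convergence(max_age: int = 20, forward_delay: int = 15) -> int:
    """Max age expiry first, then listening and learning."""
    return max_age + 2 * forward_delay


print(direct_failure_convergence())    # 30
print(indirect_failure_convergence())  # 50
```

Lowering the forward delay to its minimum of 4 seconds would cut the direct case to 8 seconds, at the cost of less time to detect loops before forwarding.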
Direct Link Failure Scenario 3
In the third scenario, the link between SW1 and SW2 fails
Network traffic from SW1 or SW3 toward SW2 is impacted because SW3’s Gi1/0/2 port is in a blocking state.
SW1 detects a link failure on its Gi1/0/2 interface. SW2 detects a link failure on its Gi1/0/1 interface, so SW2 does not need to wait for the Max Age timer on Gi1/0/1.
1. Switches try to send a TCN toward the root, but no path is available yet, so the affected switch waits:
– Normally SW1 would send a TCN out of its root port, but SW1 is the root bridge and has no root port, so it does not. SW1 would send a TCN only if it were not the root bridge.
– SW2 would send a TCN toward the root bridge to report the topology change, but its root port is down. SW2 waits until its path to the root is restored and then sends the TCN.
2. The affected interface discards its best BPDU, and no replacement BPDU from the root arrives on another interface, because SW2’s remaining port is a designated port and the adjacent SW3 port is blocking:
– SW2 removes the best BPDU stored on Gi1/0/1 (the best BPDU is always the one received on the root port) without waiting for the Max Age timer, because the interface is now down.
– Having lost its root port, SW2 must select a new one.
– SW2’s port facing SW3 is a designated port, and the SW3 port at the other end is blocking. A blocking port receives BPDUs but does not send them, so SW2 has lost all visibility of the root.
3. SW2 declares itself root because the remote port is blocking, then receives a superior BPDU and loses the root election:
– SW2 declares itself the root bridge, generates its own BPDUs, and sends them to SW3.
– SW3 receives SW2’s inferior BPDUs and discards them, because it is still receiving superior BPDUs from SW1.
– Because SW2’s BPDUs are not accepted, the BPDU stored on SW3’s Gi1/0/2 is never refreshed and expires when the Max Age timer runs out. SW3 then transitions Gi1/0/2 from blocking to listening and can forward the next configuration BPDU it receives from SW1 to SW2.
– SW2 receives SW1’s configuration BPDU via SW3 and recognizes it as superior. It marks its Gi1/0/3 interface as the root port and transitions it to the listening state.
4. The TCN can now reach the root:
– Once SW2 brings its new root port Gi1/0/3 to the forwarding state, it sends the TCN toward the root out of Gi1/0/3.
– SW1 advertises a configuration BPDU with the Topology Change flag set out of all its ports. It keeps the flag set for the topology change period (commonly Max Age + Forward Delay = 35 seconds by default).
– This BPDU is received and relayed by every switch in the environment; SW3 receives it and relays it to SW2.
5. Non-root switches reduce their MAC address age timer to the forward delay:
– These switches reduce the MAC address age timer to the forward delay timer to flush out older MAC entries.
– If other switches were connected to SW1, they would also receive a configuration BPDU with the Topology Change flag set, for every VLAN on the trunk port. These BPDUs affect all switches in the same Layer 2 domain.
The total convergence time for SW2 is 50 seconds: 20 seconds for the Max Age timer on SW3, 15 seconds for the listening state on SW2, and 15 seconds for the learning state.
Indirect Failures
In some scenarios, such as signalling carried over a WAN, a switch does not see a direct interface failure: the interface stays up, but BPDUs stop arriving. This is where the hello and Max Age timers come in.
– An event occurs that impairs or corrupts data on the link. SW1 and SW3 still report a link up condition.
– SW3 stops receiving configuration BPDUs on its RP. SW3’s Max Age timer expires, and SW3 removes the best BPDU.
– Having lost its path to the root, SW3 must find the next-best path (lowest cost to the root), which is through Gi1/0/2, currently a blocking port.
– SW3 transitions Gi1/0/2 from blocking to listening state.
– SW2 continues to advertise SW1’s configuration BPDUs toward SW3.
– SW3 receives SW1’s configuration BPDU via SW2 on its Gi1/0/2 interface. This port is now marked as the RP.
The total time for reconvergence on SW3 is 50 seconds: 20 seconds for the Max Age timer on SW3, 15 seconds for the listening state on SW3, and 15 seconds for the learning state on SW3.
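The 30-second and 50-second totals in these scenarios follow from the same arithmetic. A minimal sketch, assuming the default 802.1D timers quoted above; the function and parameter names are made up for illustration:

```python
# Default 802.1D timers in seconds, as quoted in the scenarios above.
MAX_AGE = 20
FORWARD_DELAY = 15  # spent once in listening and once in learning


def convergence_time(waits_for_max_age, max_age=MAX_AGE,
                     forward_delay=FORWARD_DELAY):
    """Return the seconds until the replacement port starts forwarding.

    waits_for_max_age is True when a stored BPDU has to age out before
    the port may leave the blocking state.
    """
    aging = max_age if waits_for_max_age else 0
    listening = forward_delay
    learning = forward_delay
    return aging + listening + learning


# Scenario 2: SW3 sees its own link go down, so nothing has to age out.
print(convergence_time(waits_for_max_age=False))  # 30
# Scenario 3 and the indirect failure: Max Age must expire first.
print(convergence_time(waits_for_max_age=True))   # 50
```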
Rapid Spanning Tree Protocol
Although 802.1D did a decent job of preventing Layer 2 forwarding loops, it was not designed to support multiple VLANs. It also could not meet traffic engineering requirements such as blocking one link for half of the VLANs and another link for the other half, which load-balances traffic and uses both uplinks equally.
Cisco created per-VLAN versions such as PVST and PVST+, which are Cisco proprietary,
but standard versions that are compatible with other vendors, such as RSTP and MST, should be used in production.
RSTP (802.1W) Port States
RSTP reduces the number of port states to three:
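Discarding: The switch port is enabled but does not forward any traffic, so that a loop cannot form (the equivalent of blocking). This state combines the traditional STP states disabled, blocking, and listening.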
Learning: The switch port modifies the MAC address table with any network traffic it receives. The switch still does not forward any other network traffic besides BPDUs.
Forwarding: The switch port forwards all network traffic and updates the MAC address table as expected. This is the final state for a switch port to forward network traffic.
RSTP relies on a handshake with the switch connected at the other end of the link. If the handshake does not occur, the other device is assumed not to support RSTP, and for backward compatibility the port falls back to regular 802.1D behavior.
RSTP (802.1W) Port Roles
RSTP defines the following port roles:
Root port (RP): A network port that connects to the root switch or an upstream switch in the spanning-tree topology. There should be only one root port per VLAN on a switch.
Designated port (DP): A network port that receives and forwards frames to other switches. Designated ports provide connectivity to downstream devices and switches. There should be only one active designated port on a link. A designated port forwards traffic away from the root.
Alternate port: A network port that provides alternate connectivity toward the root switch “through a different switch”. It does not forward traffic. If the main (active) path to the root switch fails, the alternate port can take over.
Backup port: These are very rare, because a backup port appears only when a switch connects with two links to a hub or shared segment. One link to the hub becomes the designated port, and the second link becomes the backup port, which is kept blocked to prevent loops.
RSTP (802.1W) Port Types
RSTP defines three types of ports that are used for building the STP topology:
Edge port: A port at the edge of the network where hosts connect to the Layer 2 topology with one interface and “cannot form a loop”. These ports directly correlate to ports that have the STP portfast feature enabled.
Non-Edge port: A port that has received a BPDU.
Point-to-point port: Any port that connects to another RSTP switch with full duplex. “Full-duplex links do not permit more than two devices on a network segment, so determining whether a link is full duplex is the fastest way to check the feasibility of being connected to a switch”.
Multi-access Layer 2 devices such as hubs can connect only at half duplex. If a port can connect only via half duplex, it must operate under traditional 802.1D forwarding states.
Building the RSTP Topology
With RSTP, switches exchange handshakes with other RSTP switches to move through the port states, which is faster than waiting for timers to expire.
When two switches first connect, they establish a bidirectional handshake across the shared link to identify the root bridge.
This is straightforward for an environment with only two switches; however, large environments require greater logic
RSTP uses a synchronization process to add a switch to the RSTP topology. The synchronization process starts when two switches (such as SW1 and SW2) are first connected. The process proceeds as follows:
– As the first two switches connect to each other, they verify that they are connected with a point-to-point link by checking the full-duplex status.
– They establish a handshake with each other to advertise a proposal (in configuration BPDUs) that their interface should be the DP for that segment.
– There can be only one DP per segment, so each switch identifies whether it is the superior or inferior switch, using the same logic as in 802.1D for the system identifier (that is, the lowest priority and then the lowest MAC address). Using the MAC addresses from the figure, SW1 (0062.ec9d.c500) is the superior switch to SW2 (0081.c4ff.8b00).
– The inferior switch (SW2) recognizes that it is inferior and marks its local port (Gi1/0/1) as the RP. At that same time, it moves all non-edge ports to a discarding state. At this point in time, the switch has stopped all local switching for non-edge ports.
– The inferior switch (SW2) sends an agreement (configuration BPDU) to the root bridge (SW1), which signifies to the root bridge that synchronization is occurring on that switch.
– The inferior switch (SW2) moves its RP (Gi1/0/1) to a forwarding state. The superior switch moves its DP (Gi1/0/2) to a forwarding state too.
– The inferior switch (SW2) repeats the process for any downstream switches connected to it.
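The superior/inferior decision in the steps above can be sketched as a tuple comparison. This is only an illustration: it assumes both switches still use the default priority of 32769 for VLAN 1, and the helper names are made up for this note.

```python
def bridge_id(priority, mac):
    """Build a comparable bridge ID: (priority, MAC as an integer)."""
    return (priority, int(mac.replace(".", ""), 16))


def superior(a, b):
    """Return the superior of two bridge IDs; the lower value wins."""
    return min(a, b)


# MAC addresses from the example; the priority is an assumed default.
sw1 = bridge_id(32769, "0062.ec9d.c500")
sw2 = bridge_id(32769, "0081.c4ff.8b00")

# Priorities tie, so the lower MAC address decides.
print(superior(sw1, sw2) == sw1)  # True
```

Because the priority is the first element of the tuple, lowering it on a switch wins the comparison regardless of the MAC address.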
RSTP Convergence
The RSTP convergence process can occur quickly. RSTP ages out the port information after it has not received hellos in three consecutive cycles. Using default timers, the Max Age would take 20 seconds, but RSTP requires only 6 seconds. And thanks to the new synchronization, ports can transition from discarding to forwarding in an extremely low amount of time.
If a downstream switch fails to acknowledge the proposal, the RSTP switch must default to 802.1D behaviors to prevent a forwarding loop.
STP Topology Tuning
A properly designed network places the root bridge on a specific switch and influences which ports should be designated ports (forwarding state) and which ports should be alternate ports (that is, discarding state) based on hardware platform and topology.
Ideally, the root bridge is placed on a core switch, and a “secondary” root bridge is designated. Root bridge placement is accomplished by “lowering” the system priority on the root bridge to the lowest value possible, setting the secondary root bridge to a value slightly higher than that of the root bridge, and (ideally) increasing the system priority on all other switches unless you plan to keep them at the default priority. Raising the priority on non-root switches and lowering it on the root and secondary root switches ensures that a new, unconfigured switch connected to the topology does not take over as root. The priority is set with either of the following commands:
spanning-tree vlan vlan-id priority priority: The priority is a value between 0 and 61,440, in increments of 4096.
spanning-tree vlan vlan-id root {primary | secondary} [diameter diameter]: This command executes a script that sets the priority numerically, along with the potential for timers if the diameter keyword is used. The primary keyword sets the priority to 24,576, and the secondary keyword sets the priority to 28,672.
If a different switch has a priority of 24,576 (or lower) and is more preferred when the command spanning-tree vlan vlan-id root {primary | secondary} is executed, the script has logic to lower the priority further in an attempt to make the local switch the root bridge. This is possible because the BPDU carries the current root’s bridge ID, which contains the root’s priority and MAC address.
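The priority values that appear in show spanning-tree output are the configured priority plus the VLAN ID (the sys-id-ext). A small sketch of that arithmetic, with hypothetical helper names:

```python
PRIORITY_STEP = 4096
MAX_PRIORITY = 61440


def is_valid_priority(priority):
    """True for 0 through 61,440 in increments of 4096."""
    return 0 <= priority <= MAX_PRIORITY and priority % PRIORITY_STEP == 0


def advertised_priority(priority, vlan_id):
    """Return the configured priority plus the sys-id-ext (the VLAN ID)."""
    if not is_valid_priority(priority):
        raise ValueError("priority must be 0-61440 in increments of 4096")
    return priority + vlan_id


print(advertised_priority(32768, 1))   # 32769, default for VLAN 1
print(advertised_priority(24576, 1))   # 24577, root primary
print(advertised_priority(28672, 1))   # 28673, root secondary
print(advertised_priority(32768, 10))  # 32778, default for VLAN 10
```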
The optional diameter keyword tunes Spanning Tree Protocol (STP) convergence by modifying the timers; its value should be the number of Layer 2 hops between the root bridge and the switch that is farthest from it. Timers are configured in one place only, on the root bridge, because the root bridge carries them throughout the topology in its bridge protocol data units (BPDUs).
All the other switches learn those timer values automatically from the root bridge’s BPDUs, so there is no need to configure timers on every switch. When other switches receive the root’s BPDUs: – They adopt the root’s timer values – They propagate those same values further downstream
The root bridge generates the “authoritative” BPDUs
These BPDUs include:
Hello time
Max age
Forward delay (used for the listening and learning states)
! Verification of SW1 Priority before modifying the priority
SW1# show spanning-tree vlan 1
VLAN0001
Spanning tree enabled protocol rstp
Root ID Priority 32769
Address 0062.ec9d.c500
This bridge is the root
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 32769 (priority 32768 sys-id-ext 1)
Address 0062.ec9d.c500
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 300 sec
! Configuring the SW1 priority as primary root for VLAN 1
SW1(config)# spanning-tree vlan 1 root primary
! Verification of SW1 Priority after modifying the priority
SW1# show spanning-tree vlan 1
VLAN0001
Spanning tree enabled protocol rstp
Root ID Priority 24577 <<<
Address 0062.ec9d.c500
This bridge is the root
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 24577 (priority 24576 sys-id-ext 1) <<<
Address 0062.ec9d.c500
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 300 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/2 Desg FWD 4 128.2 P2p
Gi1/0/3 Desg FWD 4 128.3 P2p
Gi1/0/14 Desg FWD 4 128.14 P2p
! Configuring the SW2 priority as secondary root for VLAN 1
SW2(config)# spanning-tree vlan 1 root secondary
SW2# show spanning-tree vlan 1
VLAN0001
Spanning tree enabled protocol rstp
Root ID Priority 24577 <<<
Address 0062.ec9d.c500
Cost 4
Port 1 (GigabitEthernet1/0/1)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 28673 (priority 28672 sys-id-ext 1) <<<
Address 0081.c4ff.8b00
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 300 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1 Root FWD 4 128.1 P2p
Gi1/0/3 Desg FWD 4 128.3 P2p
Gi1/0/4 Desg FWD 4 128.4 P2p
The best way to prevent erroneous devices from taking over the STP root role is to set the priority to 0 for the primary root switch and to 4096 for the secondary root switch. “In addition, root guard should be used”
Modifying STP Root Port and Blocked Switch Port Locations
The way the root path cost is calculated determines where a cost change has to be applied: the receiving switch adds the port cost of the interface on which the BPDU was received to the root path cost carried in the BPDU.
SW1 advertises its BPDUs to SW3 with a root path cost of 0. SW3 receives the BPDU and adds its STP port cost of 4 to the root path cost in the BPDU (0), resulting in a value of 4. SW3 then advertises the BPDU toward SW5 with a root path cost of 4, to which SW5 then adds its STP port cost of 4. SW5 therefore reports a root path cost of 8 to reach the root bridge via SW3.
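The accumulation described above can be sketched as a running sum over the receiving ports along the path. The function name is made up for illustration:

```python
def root_path_costs(receiving_port_costs):
    """Return the root path cost seen at each hop away from the root.

    receiving_port_costs holds the STP cost of the port on which each
    switch along the path receives the BPDU.
    """
    total = 0  # the root bridge advertises a root path cost of 0
    costs = []
    for port_cost in receiving_port_costs:
        total += port_cost  # the receiver adds its own port cost
        costs.append(total)
    return costs


# SW1 -> SW3 -> SW5, each receiving port with a cost of 4.
print(root_path_costs([4, 4]))  # [4, 8]
```

The cost is always added on the receiving side, which is why a cost change has to be made on the port that faces the root.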
SW1# show spanning-tree vlan 1
! Output omitted for brevity
VLAN0001
Root ID Priority 32769
Address 0062.ec9d.c500
This bridge is the root
..
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/2 Desg FWD 4 128.2 P2p
Gi1/0/3 Desg FWD 4 128.3 P2p
SW3# show spanning-tree vlan 1
! Output omitted for brevity
VLAN0001
Root ID Priority 32769
Address 0062.ec9d.c500
Cost 4
Port 1 (GigabitEthernet1/0/1)
..
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1 Root FWD 4 128.1 P2p
Gi1/0/2 Altn BLK 4 128.2 P2p
Gi1/0/5 Desg FWD 4 128.5 P2p
SW5# show spanning-tree vlan 1
! Output omitted for brevity
VLAN0001
Root ID Priority 32769
Address 0062.ec9d.c500
Cost 8
Port 3 (GigabitEthernet1/0/3)
..
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/3 Root FWD 4 128.3 P2p
Gi1/0/4 Altn BLK 4 128.4 P2p
Gi1/0/5 Altn BLK 4 128.5 P2p
You can lower the cost of a path that currently runs through an alternate port so that the port becomes designated, or raise the cost on a designated port to turn it into a blocking port. The interface command spanning-tree cost modifies the cost for all VLANs unless the optional vlan keyword is used to specify a VLAN.
SW3# conf t
SW3(config)# interface gi1/0/1
SW3(config-if)# spanning-tree cost 1
SW3# show spanning-tree vlan 1
! Output omitted for brevity
VLAN0001
Root ID Priority 32769
Address 0062.ec9d.c500
Cost 1
Port 1 (GigabitEthernet1/0/1)
Bridge ID Priority 32769 (priority 32768 sys-id-ext 1)
Address 189c.5d11.9980
..
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1 Root FWD 1 128.1 P2p
Gi1/0/2 Desg FWD 4 128.2 P2p
Gi1/0/5 Desg FWD 4 128.5 P2p
SW2# show spanning-tree vlan 1
! Output omitted for brevity
VLAN0001
Root ID Priority 32769
Address 0062.ec9d.c500
Cost 4
Port 1 (GigabitEthernet1/0/1)
Bridge ID Priority 32769 (priority 32768 sys-id-ext 1)
Address 0081.c4ff.8b00
..
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1 Root FWD 4 128.1 P2p
Gi1/0/3 Altn BLK 4 128.3 P2p
Gi1/0/4 Desg FWD 4 128.4 P2p
Modifying STP Port Priority
STP port priority determines which port becomes the alternate port when multiple links connect the same two switches. Because the system ID and port cost are identical on both links, the next check is the port priority, followed by the port number. “Both the port priority and port number are controlled by the upstream switch”, because it is closer to the root bridge.
You can modify the port priority on SW4’s Gi1/0/6 (toward SW5’s Gi1/0/5 interface) with the command spanning-tree [vlan vlan-id] port-priority priority. The optional vlan keyword allows you to change the priority on a VLAN-by-VLAN basis.
SW4# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
SW4(config)# interface gi1/0/6
SW4(config-if)# spanning-tree port-priority 64
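The tie-break order can be sketched as a tuple comparison on the values advertised by the upstream switch. Everything except the SW4 Gi1/0/6 to SW5 Gi1/0/5 link and the priority of 64 is an assumption made for illustration: the second link (SW4 Gi1/0/5 to SW5 Gi1/0/4), the root path cost of 8, and the SW4 bridge ID are hypothetical.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    """One BPDU heard on a local port (values are illustrative)."""
    local_port: str
    root_path_cost: int
    upstream_bridge_id: tuple  # (priority, MAC as an integer)
    upstream_port_priority: int
    upstream_port_number: int


def pick_preferred_port(offers):
    """Return the local port whose offer wins; lowest value wins each check."""
    best = min(
        offers,
        key=lambda o: (o.root_path_cost, o.upstream_bridge_id,
                       o.upstream_port_priority, o.upstream_port_number),
    )
    return best.local_port


SW4 = (32769, 0x000000000004)  # hypothetical bridge ID for SW4

# Default port priority of 128 on both SW4 ports: lower port number wins.
before = [Offer("Gi1/0/4", 8, SW4, 128, 5), Offer("Gi1/0/5", 8, SW4, 128, 6)]
print(pick_preferred_port(before))  # Gi1/0/4

# After "spanning-tree port-priority 64" on SW4's Gi1/0/6.
after = [Offer("Gi1/0/4", 8, SW4, 128, 5), Offer("Gi1/0/5", 8, SW4, 64, 6)]
print(pick_preferred_port(after))   # Gi1/0/5
```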
Additional STP Protection Mechanisms
The following scenarios are common for Layer 2 forwarding loops:
STP disabled on a switch
A misconfigured load balancer that transmits traffic out multiple ports with the same MAC address
A misconfigured virtual switch that bridges two physical ports (Virtual switches typically do not participate in STP.)
End users using a dumb network switch or hub
Catalyst switches detect a MAC address that is flapping between interfaces and report it via syslog, including the MAC address of the host, the VLAN, and the ports between which the MAC address is flapping.
12:40:30.044: %SW_MATM-4-MACFLAP_NOTIF: Host 70df.2f22.b8c7 in vlan 1 is flapping
between port Gi1/0/3 and port Gi1/0/2
Root Guard
Root guard prevents a configured port from becoming a “root port”. It “is configured on designated ports” facing switches that should never become root.
Root guard prevents a downstream switch (often misconfigured or rogue) from becoming the root bridge in a topology.
When root guard is configured, a port that receives a “superior BPDU” is placed in a root inconsistent state for the affected interface or VLAN.
A port in the root inconsistent state cannot forward traffic. Root guard does not block the port permanently; it blocks only while superior BPDUs are being received.
In effect the port says: “I received a superior BPDU on this port, but I’m not allowed to accept it as the root path.”
How it recovers
Once the superior BPDU stops, the port: – Automatically leaves root inconsistent – Returns to normal forwarding (no manual reset needed)
! configure on designated port that is facing "down stream"
spanning-tree guard root
Root guard should be configured on SW2’s Gi1/0/4 port toward SW4 and on SW3’s Gi1/0/5 port toward SW5. This configuration prevents SW4 and SW5 from becoming root, but still allows SW2 to maintain connectivity to SW1 via SW3 if the link between SW2 and SW1 goes down. If the link between SW2 and SW3 also goes down, SW2 loses connectivity to the root even though an alternate path via SW4 exists, because root guard blocks that path.
Root Guard protects you from an “unexpected root” on that port, but the trade-off is that it can also kill an otherwise-valid backup path.
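The block-and-recover behavior of root guard can be sketched as a single comparison. This is a simplified model, not how the switch implements it; the rogue bridge ID below is hypothetical.

```python
def root_guard_state(current_root, received_root=None):
    """Return the state of a root-guard port.

    Bridge IDs are (priority, MAC as integer) tuples; lower is superior.
    received_root is None when no superior BPDU is arriving any more.
    """
    if received_root is not None and received_root < current_root:
        return "root-inconsistent"  # superior BPDU: port stops forwarding
    return "forwarding"             # recovers on its own, no manual reset


root = (24577, 0x0062EC9DC500)   # SW1 after "root primary"
rogue = (1, 0x00000000ABCD)      # hypothetical switch with priority 0, VLAN 1

print(root_guard_state(root, rogue))  # root-inconsistent
print(root_guard_state(root, None))   # forwarding
```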
STP Portfast
Portfast, as the name suggests, brings a port up faster by skipping the learning state (and the listening state if RSTP is not in use).
Portfast also stops a TCN from being generated when the port goes down.
Portfast is configured only on host-facing access ports.
Portfast allows traffic to be forwarded immediately, which is useful for ports used for DHCP and PXE boot.
If a BPDU is received on a portfast-enabled port, the portfast “functionality” is removed from the port and it progresses through the learning state (and the listening state if RSTP is not in use).
SW1# show spanning-tree interface gi1/0/13 detail
Port 13 (GigabitEthernet1/0/13) of VLAN0010 is designated forwarding
Port path cost 4, Port priority 128, Port Identifier 128.13.
Designated root has priority 32778, address 0062.ec9d.c500
Designated bridge has priority 32778, address 0062.ec9d.c500
Designated port id is 128.13, designated path cost 0
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
The port is in the portfast mode <<<
Link type is point-to-point by default
BPDU: sent 23103, received 0
SW2# conf t
Enter configuration commands, one per line. End with CNTL/Z.
SW2(config)# spanning-tree portfast default
%Warning: this command enables portfast by default on all interfaces. You
should now disable portfast explicitly on switched ports leading to hubs,
switches and bridges as they may create temporary bridging loops.
SW2(config)# interface gi1/0/8
SW2(config-if)# spanning-tree portfast disable
BPDU Guard
Remember that a guard stands outside to stop things coming in, not going out, so BPDU guard always acts on BPDUs that are received on a port.
BPDU guard is a safety mechanism that places ports configured with STP portfast into an ErrDisabled state upon receipt of a BPDU. An ErrDisabled port is “disabled”, in a shutdown-like state.
BPDU guard ensures that a loop cannot be created accidentally when a switch is connected to such a port. Configuring portfast alone is not enough: when a BPDU is received, the switch removes portfast functionality from the port even though portfast still appears in the configuration, and you have to check the output of the show spanning-tree interface detail command to see the operational state.
BPDU guard is typically configured with all host-facing ports that are enabled with portfast.
! BPDU guard is enabled globally on all STP portfast ports
spanning-tree portfast bpduguard default
! but can be disabled on specific port if enabled globally
spanning-tree bpduguard disable
! enabling on a single port
spanning-tree bpduguard enable
SW1# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
SW1(config)# spanning-tree portfast bpduguard default
SW1(config)# interface gi1/0/8
SW1(config-if)# spanning-tree bpduguard disable
SW1# show spanning-tree interface gi1/0/7 detail
Port 7 (GigabitEthernet1/0/7) of VLAN0010 is designated forwarding
Port path cost 4, Port priority 128, Port Identifier 128.7.
Designated root has priority 32778, address 0062.ec9d.c500
Designated bridge has priority 32778, address 0062.ec9d.c500
Designated port id is 128.7, designated path cost 0
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
The port is in the portfast mode
Link type is point-to-point by default
Bpdu guard is enabled by default <<<
BPDU: sent 23386, received 0
SW1# show spanning-tree interface gi1/0/8 detail
Port 8 (GigabitEthernet1/0/8) of VLAN0010 is designated forwarding
Port path cost 4, Port priority 128, Port Identifier 128.8.
Designated root has priority 32778, address 0062.ec9d.c500
Designated bridge has priority 32778, address 0062.ec9d.c500
Designated port id is 128.8, designated path cost 0
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
The port is in the portfast mode by default
Link type is point-to-point by default
BPDU: sent 23388, received 0
Syslog messages are generated when a BPDU is received on a BPDU guard–enabled port. The port is then placed into an ErrDisabled state, as shown with the command show interfaces status.
12:47:02.069: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port GigabitEthernet1/0/2 with BPDU Guard enabled. Disabling port.
12:47:02.076: %PM-4-ERR_DISABLE: bpduguard error detected on Gi1/0/2, putting Gi1/0/2 in err-disable state
12:47:03.079: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/2, changed state to down
12:47:04.082: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/2, changed state to down
SW1# show interfaces status
Port Name Status Vlan Duplex Speed Type
Gi1/0/1 notconnect 1 auto auto 10/100/1000BaseTX
Gi1/0/2 SW2 Gi1/0/1 err-disabled 1 auto auto 10/100/1000BaseTX <<<
Gi1/0/3 SW3 Gi1/0/1 connected trunk a-full a-1000 10/100/1000BaseTX
By default, ports that are put in the ErrDisabled state because of BPDU guard do not restore themselves automatically. The reason is that administrators should be notified when a switch is connected to an access port that is meant only for hosts.
The Error Recovery service can reactivate ports that were shut down for a specific problem, which reduces manual work. It is enabled with the command errdisable recovery cause bpduguard, and the interval is configured with errdisable recovery interval time-seconds. The interval controls how long a port stays in the ErrDisabled state before the switch re-enables it automatically.
SW1# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
SW1(config)# errdisable recovery cause bpduguard
SW1# show errdisable recovery
! Output omitted for brevity
ErrDisable Reason Timer Status
----------------- --------------
arp-inspection Disabled
bpduguard Enabled
..
Recovery command: "clear Disabled
Timer interval: 300 seconds
Interfaces that will be enabled at the next timeout:
Interface Errdisable reason Time left(sec)
--------- ----------------- --------------
Gi1/0/2 bpduguard 295
! Syslog output from BPDU recovery. The port will be recovered, and then
! triggered again because the port is still receiving BPDUs.
SW1#
01:02:08.122: %PM-4-ERR_RECOVER: Attempting to recover from bpduguard err-disable
state on Gi1/0/2
01:02:10.699: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port Gigabit
Ethernet1/0/2 with BPDU Guard enabled. Disabling port.
01:02:10.699: %PM-4-ERR_DISABLE: bpduguard error detected on Gi1/0/2, putting
Gi1/0/2 in err-disable state
The Error Recovery service operates every 300 seconds (5 minutes). This can be changed to a value of 30 to 86,400 seconds with the global configuration command errdisable recovery interval time.
BPDU Filter
BPDU filter stops a port from sending BPDUs and from processing the BPDUs it receives.
BPDU filter blocks BPDUs from being transmitted out a port. BPDU filter means Don’t participate in STP on this port. BPDU filter can be enabled globally or on a specific interface. The global BPDU filter configuration uses the command spanning-tree portfast bpdufilter default. The interface-specific BPDU filter is enabled with the interface configuration command spanning-tree bpdufilter enable.
If BPDU filter is enabled on a portfast enabled port, the behavior changes depending on the configuration:
If BPDU filter is enabled globally using command spanning-tree portfast bpdufilter default
Cisco does not blindly stop sending BPDUs forever on all interfaces. Instead, it does a “safety probe”: the port initially sends about 10 to 12 BPDUs to ask “Is there another switch out there?”
If no BPDU is received back
The port assumes it’s an end device
BPDU filtering kicks in
STP is effectively disabled on that port
—————————————
If a BPDU is received
switch thinks there is another switch connected
STP logic turns back on for that port
Now because there is a switch connected and a BPDU is received
Switch must decide which switch is superior:
to decide which port will be designated and which port will be blocking on that segment
Global BPDU filter is “safe-ish”:
It allows PortFast convenience
But auto-recovers STP if a switch is accidentally plugged in
Enabling BPDU filter at the interface level is dangerous unless you know the topology and know what you are doing:
interface gi1/0/1
spanning-tree bpdufilter enable
– No safety check
– No listening
– STP is completely disabled, no sending of BPDUs and no receiving of BPDUs
– Easy way to create a loop
Be careful with the deployment of BPDU filter because it could cause problems. Most network designs do not require BPDU filter, which adds an unnecessary level of complexity and also introduces risk.
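The difference between the global and the interface-level BPDU filter can be summarized in a small decision function. This is a simplified model of the behavior described above, not of the switch implementation; the function name and dictionary keys are invented for this note.

```python
def bpdu_filter_port(scope, bpdu_received):
    """Describe a portfast port with BPDU filter enabled.

    scope is "global" (spanning-tree portfast bpdufilter default) or
    "interface" (spanning-tree bpdufilter enable).
    """
    if scope == "interface":
        # No probe and no fallback: BPDUs are neither sent nor processed.
        return {"sends_bpdus": False, "processes_bpdus": False,
                "stp_active": False}
    if scope == "global":
        if bpdu_received:
            # A switch answered, so normal STP resumes on the port.
            return {"sends_bpdus": True, "processes_bpdus": True,
                    "stp_active": True}
        # The initial probe went unanswered, so filtering kicks in.
        return {"sends_bpdus": False, "processes_bpdus": True,
                "stp_active": False}
    raise ValueError("scope must be 'global' or 'interface'")


print(bpdu_filter_port("global", bpdu_received=True)["stp_active"])     # True
print(bpdu_filter_port("interface", bpdu_received=True)["stp_active"])  # False
```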
The following output shows the BPDU counters after BPDU filter is enabled on the Gi1/0/2 interface, which prohibits any BPDUs from being sent or received:
! SW1 was enabled with BPDU filter only on port Gi1/0/2
SW1# show spanning-tree interface gi1/0/2 detail | in BPDU|Bpdu|Ethernet
Port 2 (GigabitEthernet1/0/2) of VLAN0001 is designated forwarding
Bpdu filter is enabled
BPDU: sent 113, received 84 <<<
SW1# show spanning-tree interface gi1/0/2 detail | in BPDU|Bpdu|Ethernet
Port 2 (GigabitEthernet1/0/2) of VLAN0001 is designated forwarding
Bpdu filter is enabled
BPDU: sent 113, received 84 <<< same
! SW2 was enabled with BPDU filter globally
SW2# show spanning-tree interface gi1/0/2 detail | in BPDU|Bpdu|Ethernet
Port 1 (GigabitEthernet1/0/2) of VLAN0001 is designated forwarding
BPDU: sent 56, received 5
SW2# show spanning-tree interface gi1/0/2 detail | in BPDU|Bpdu|Ethernet
Port 1 (GigabitEthernet1/0/2) of VLAN0001 is designated forwarding
BPDU: sent 58, received 5 <<< probes sent
Problems with Unidirectional Links
Fiber-optic cables consist of strands of glass or plastic, with one strand that transmits and one strand that receives; the order is reversed on the remote side. Networks that rely on fiber optics can encounter unidirectional traffic if one strand breaks: one side keeps sending and the other side keeps receiving, but there is no return traffic.
If tx is bad and rx is good, the interface still shows as up, but BPDUs cannot be transmitted. The downstream switch eventually times out its existing root port and selects a different port as the root port. Traffic is then received on the new root port of the remote switch and also forwarded out of the still-working transmit strand of its former root port, creating a forwarding loop.
A couple solutions can resolve this scenario:
STP loop guard
Unidirectional Link Detection
STP Loop Guard
STP loop guard prevents alternate ports (candidate root ports) and root ports from becoming designated ports when they stop receiving BPDUs. Loop guard places the port in a “loop inconsistent” state for as long as BPDUs are not being received on it. When BPDUs arrive on that interface again, the port recovers and begins to transition through the STP states again.
Loop guard is enabled globally by using the command spanning-tree loopguard default, or it can be enabled on an interface basis with the interface command spanning-tree guard loop. It is important to note that loop guard should not be enabled on portfast-enabled ports (because it directly conflicts with the root/alternate port logic).
SW2# config t
SW2(config)# interface gi1/0/1
SW2(config-if)# spanning-tree guard loop
! Placing BPDU filter on SW2’s RP (Gi1/0/1) triggers loop guard.
SW2(config-if)# interface gi1/0/1
SW2(config-if)# spanning-tree bpdufilter enable
01:42:35.051: %SPANTREE-2-LOOPGUARD_BLOCK: Loop guard blocking port Gigabit
Ethernet1/0/1 on VLAN0001
SW2# show spanning-tree vlan 1 | b Interface
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------
Gi1/0/1 Root BKN*4 128.1 P2p *LOOP_Inc
Gi1/0/3 Root FWD 4 128.3 P2p
Gi1/0/4 Desg FWD 4 128.4 P2p
A port in an inconsistent state does not forward any traffic.
Inconsistent ports are viewed with the command show spanning-tree inconsistentports:
SW2# show spanning-tree inconsistentports
Name Interface Inconsistency
-------------------- ------------------------ ------------------
VLAN0001 GigabitEthernet1/0/1 Loop Inconsistent
VLAN0010 GigabitEthernet1/0/1 Loop Inconsistent
VLAN0020 GigabitEthernet1/0/1 Loop Inconsistent
VLAN0099 GigabitEthernet1/0/1 Loop Inconsistent
Number of inconsistent ports (segments) in the system : 4
Unidirectional Link Detection
Unidirectional Link Detection (UDLD) allows for the bidirectional monitoring of fiber-optic cables.
UDLD operates by transmitting UDLD packets to a neighbor device that includes the system ID and port ID of the interface transmitting the UDLD packet. The receiving device then repeats that information, including its system ID and port ID, back to the originating device. The process continues indefinitely.
UDLD must be enabled on the remote switch as well. After it is configured, the status of the UDLD neighborship can be verified with the command show udld neighbors, which lists neighbor information because, as with CDP, the system ID is exchanged. You can view more detailed information with the command show udld interface-id.
UDLD operates in two different modes:
Normal: In normal mode, if a frame is not acknowledged, the link is considered undetermined and the port remains active, so this mode offers little protection against a unidirectional link.
Aggressive: In aggressive mode, when a frame is not acknowledged, the switch sends another eight packets in 1-second intervals. If those packets are not acknowledged, the port is placed into an error state.
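The difference between the two modes can be sketched as a small Python model. This is only an illustration of the behavior described above, not Cisco's implementation: the function names, the state strings, and the list-of-booleans input are assumptions made for the example, and the eight retries come from the description of aggressive mode.

```python
from itertools import islice


def udld_normal_outcome(acked):
    """Normal mode: an unacknowledged frame leaves the port active."""
    return "bidirectional" if acked else "undetermined (port stays active)"


def udld_aggressive_outcome(probe_acks, retries=8):
    """Aggressive mode, after one unacknowledged frame.

    probe_acks holds one boolean per retry probe (sent at 1-second
    intervals); True means the neighbor echoed that probe.
    Only the first `retries` probes are considered.
    """
    for acked in islice(probe_acks, retries):
        if acked:
            return "bidirectional"
    return "error"


print(udld_normal_outcome(False))
print(udld_aggressive_outcome([False, False, True]))
print(udld_aggressive_outcome([False] * 8))
```

The second call recovers because the third probe is echoed; the third call exhausts all eight probes and the port ends up in the error state.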
UDLD is enabled globally with the command udld enable [aggressive]. This command enables UDLD on any small form-factor pluggable (SFP)–based port. UDLD can be disabled on a specific SFP port with the interface configuration command udld port disable. UDLD recovery can be enabled with the command udld recovery [interval time], where the optional interval keyword allows for the timer to be modified from the default value of 5 minutes. UDLD can be enabled on a port-by-port basis with the interface configuration command udld port [aggressive], where the optional aggressive keyword places the ports in UDLD aggressive mode.
SW1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
SW1(config)# udld enable
SW1# show udld neighbors
Port Device Name Device ID Port ID Neighbor State
---- ----------- --------- ------- --------------
Te1/1/3 081C4FF8B0 1 Te1/1/3 Bidirectional <<<
SW1# show udld Te1/1/3
Interface Te1/1/3
---
Port enable administrative configuration setting: Follows device default
Port enable operational state: Enabled
Current bidirectional state: Bidirectional
Current operational state: Advertisement - Single neighbor detected
Message interval: 15000 ms
Time out interval: 5000 ms
Port fast-hello configuration setting: Disabled
Port fast-hello interval: 0 ms
Port fast-hello operational state: Disabled
Neighbor fast-hello configuration setting: Disabled
Neighbor fast-hello interval: Unknown
Entry 1
---
Expiration time: 41300 ms
Cache Device index: 1
Current neighbor state: Bidirectional
Device ID: 081C4FF8B0
Port ID: Te1/1/3
Neighbor echo 1 device: 062EC9DC50
Neighbor echo 1 port: Te1/1/3
TLV Message interval: 15 sec
No TLV fast-hello interval
TLV Time out interval: 5
TLV CDP Device name: SW2
In modern networks there is usually less reliance on Layer 2 and spanning tree, and there is no need to load balance VLANs: modern networks use either port-channels or Layer 3 networking down to the access layer. MST is used to fulfil the requirement of stopping loops in case something is connected by mistake.
With per-VLAN spanning tree, 4 different VLANs mean 4 different topologies and 4 different STP instances. If the number of VLANs increases to 10, the switch CPU needs to maintain 10 different STP instances and 10 different topologies.
Not only that, the switch must listen for the BPDUs of every VLAN, and topology changes can cause TCNs and configuration BPDUs with the topology change flag set.
MST provides a blended approach by mapping one or multiple VLANs onto a single STP tree, called an MST instance (MSTI).
VLANs 1 and 2 correlate to one MSTI, VLAN 3 to a second MSTI, and VLAN 4 to a third MSTI.
A grouping of MST switches with the same high-level configuration is known as an MST region. An MST region appears as a single virtual switch to external switches as part of a compatibility mechanism.
How the MST topology is perceived outside of the MST region: everything inside the MST region looks like one virtual switch to the outside world.
Above we can see that SW3 is blocking its port to the root, which is not normal. With regular STP that port would become the root port rather than discarding, and the blocking port would instead be on the SW2 – SW3 segment.
Switches inside the MST region calculate STP internally. To outside switches they pretend to be a single switch.
MST Instances (MSTIs)
MST uses a special STP instance called the internal spanning tree (IST), which is always the first instance, instance 0. The IST runs on all switch port interfaces for switches in the MST region, regardless of the VLANs associated with the ports. Additional information about other MSTIs is included (nested) in the IST BPDU that is transmitted throughout the MST region. That single IST BPDU carries information for all MSTIs running
This enables the MST to advertise only one set of BPDUs, minimizing STP traffic regardless of the number of instances while providing the necessary information to calculate the STP for other MSTIs.
The number of MST instances varies by platform, but a platform should support at least 16 instances, allowing 15 different topologies. The IST is always instance 0, so instances 1 to 15 can support other VLANs.
There is not a special name for instances 1 to 15; they are simply known as MSTIs.
MST Configuration
SW1(config)# spanning-tree mode mst
! change mode to MST
SW1(config)# spanning-tree mst 0 root primary
! The primary keyword sets the priority to 24,576, and
! the secondary keyword sets the priority to 28,672
SW1(config)# spanning-tree mst 1 root primary
SW1(config)# spanning-tree mst 2 root primary
! or set the system priority manually instead of root
! primary or root secondary keywords
! spanning-tree mst 2 priority 16384
SW1(config)# spanning-tree mst configuration
! enter MST configuration submode
SW1(config-mst)# name ENTERPRISE_CORE
! define MST region name, it must match on all switches
! in the region; by default, the region name is an empty
! string
SW1(config-mst)# revision 2
! this MST revision number must match on all switches
! in an MST region
! Associate vlans to MST instances, by default all vlans
! are associated to MST 0 instance, for varying topologies
! assign vlans to different instances
SW1(config-mst)# instance 1 vlan 10,20
SW1(config-mst)# instance 2 vlan 99
The command show spanning-tree mst configuration provides a quick verification of the MST configuration on a switch
Notice that MST instance 0 contains all the VLANs except for VLANs 10, 20, and 99, regardless of whether those VLANs are configured on the switch
MST instance 1 contains VLAN 10 and 20, and MST instance 2 contains only VLAN 99.
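The "everything not explicitly assigned stays in instance 0" rule can be reproduced with a short Python sketch. The helper names compress_ranges and mst_vlan_map are made up for this example; the instance assignments are the ones configured above, and the VLAN range 1 to 4094 is assumed.

```python
def compress_ranges(vlans):
    """Collapse VLAN IDs into 'a-b,c' range notation."""
    parts = []
    start = prev = None
    for v in sorted(vlans):
        if start is None:
            start = prev = v
        elif v == prev + 1:
            prev = v
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = v
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def mst_vlan_map(instances, max_vlan=4094):
    """Return {instance: range string}; unassigned VLANs stay in instance 0."""
    assigned = {v for vlans in instances.values() for v in vlans}
    result = {0: [v for v in range(1, max_vlan + 1) if v not in assigned]}
    for inst, vlans in instances.items():
        result[inst] = sorted(vlans)
    return {inst: compress_ranges(v) for inst, v in result.items()}


mapping = mst_vlan_map({1: [10, 20], 2: [99]})
for inst, vlans in mapping.items():
    print(f"MST{inst} vlans mapped: {vlans}")
```

Instance 0 comes out as 1-9,11-19,21-98,100-4094, whether or not those VLANs are actually created on the switch.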
The relevant spanning tree information can be obtained with the command show spanning-tree. However, the VLAN numbers are not shown and the MST instance is provided instead. In addition, the priority value for a switch is the MST instance plus the switch priority (not the vlan number + switch priority)
SW1# show spanning-tree
! Output omitted for brevity
! Spanning Tree information for Instance 0 (All VLANs but 10,20, and 99)
MST0
Spanning tree enabled protocol mstp
Root ID Priority 24576
Address 0062.ec9d.c500
This bridge is the root
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 24576 (priority 24576 sys-id-ext 0)
Address 0062.ec9d.c500
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/2 Desg FWD 20000 128.2 P2p
Gi1/0/3 Desg FWD 20000 128.3 P2p
! Spanning Tree information for Instance 1 (VLANs 10 and 20)
MST1
Spanning tree enabled protocol mstp
Root ID Priority 24577
Address 0062.ec9d.c500
This bridge is the root
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 24577 (priority 24576 sys-id-ext 1)
Address 0062.ec9d.c500
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/2 Desg FWD 20000 128.2 P2p
Gi1/0/3 Desg FWD 20000 128.3 P2p
! Spanning Tree information for Instance 2 (VLAN 99)
! The priority is 24576 + 2 (the instance number), not 24576 + 99 (the VLAN number)
MST2
Spanning tree enabled protocol mstp
Root ID Priority 24578
Address 0062.ec9d.c500
This bridge is the root
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 24578 (priority 24576 sys-id-ext 2)
Address 0062.ec9d.c500
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/2 Desg FWD 20000 128.2 P2p
Gi1/0/3 Desg FWD 20000 128.3 P2p
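The priority values in the output above follow simple arithmetic: configured priority plus system ID extension. A minimal Python sketch, where the function name bridge_priority is an assumption for illustration; the multiple-of-4096 rule and the 12-bit extension are standard STP behavior:

```python
def bridge_priority(configured_priority, sys_id_ext):
    """Bridge priority = configured priority + system ID extension.

    Under MST the extension is the instance number; under PVST+ it is
    the VLAN ID.
    """
    if configured_priority % 4096 != 0 or not 0 <= configured_priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 from 0 to 61440")
    if not 0 <= sys_id_ext <= 4095:
        raise ValueError("system ID extension is a 12-bit value")
    return configured_priority + sys_id_ext


for instance in (0, 1, 2):
    print(f"MST{instance}: {bridge_priority(24576, instance)}")

# What the same priority would look like per VLAN under PVST+:
print(f"PVST+ VLAN 99: {bridge_priority(24576, 99)}")
```

This prints 24576, 24577, and 24578 for the three MST instances, and 24675 for VLAN 99 under PVST+.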
A consolidated view of the MST topology table is displayed with the command show spanning-tree mst [instance-number]. The optional instance-number can be included to restrict the output to a specific instance.
SW1# show spanning-tree mst
! Output omitted for brevity
##### MST0 vlans mapped: 1-9,11-19,21-98,100-4094
Bridge address 0062.ec9d.c500 priority 24576 (24576 sysid 0)
Root this switch for the CIST
Operational hello time 2 , forward delay 15, max age 20, txholdcount 6
Configured hello time 2 , forward delay 15, max age 20, max hops 20
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- ------------------------
Gi1/0/2 Desg FWD 20000 128.2 P2p
Gi1/0/3 Desg FWD 20000 128.3 P2p
##### MST1 vlans mapped: 10,20
Bridge address 0062.ec9d.c500 priority 24577 (24576 sysid 1)
Root this switch for MST1
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- ------------------------
Gi1/0/2 Desg FWD 20000 128.2 P2p
Gi1/0/3 Desg FWD 20000 128.3 P2p
##### MST2 vlans mapped: 99
Bridge address 0062.ec9d.c500 priority 24578 (24576 sysid 2)
Root this switch for MST2
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- ------------------------
Gi1/0/2 Desg FWD 20000 128.2 P2p
Gi1/0/3 Desg FWD 20000 128.3 P2p
SW2# show spanning-tree mst interface gigabitEthernet 1/0/1
GigabitEthernet1/0/1 of MST0 is root forwarding
Edge port: no (default) port guard : none (default)
Link type: point-to-point (auto) bpdu filter: disable (default)
Boundary : internal bpdu guard : disable (default)
Bpdus sent 17, received 217
Instance Role Sts Cost Prio.Nbr Vlans mapped
-------- ---- --- --------- -------- -------------------------------
0 Root FWD 20000 128.1 1-9,11-19,21-98,100-4094
1 Root FWD 20000 128.1 10,20
2 Root FWD 20000 128.1 99
MST Tuning
MST supports tuning of port cost and port priority. The interface configuration command spanning-tree mst instance-number cost cost sets the interface cost.
SW3# show spanning-tree mst 0
! Output omitted for brevity
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------
Gi1/0/1 Root FWD 20000 128.1 P2p
Gi1/0/2 Altn BLK 20000 128.2 P2p
Gi1/0/5 Desg FWD 20000 128.5 P2p
SW3# configure term
Enter configuration commands, one per line. End with CNTL/Z.
SW3(config)# interface gi1/0/1
SW3(config-if)# spanning-tree mst 0 cost 1
SW3# show spanning-tree mst 0
! Output omitted for brevity
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- ---------------------
Gi1/0/1 Root FWD 1 128.1 P2p
Gi1/0/2 Desg FWD 20000 128.2 P2p
Gi1/0/5 Desg FWD 20000 128.5 P2p
The interface configuration command spanning-tree mst instance-number port-priority priority sets the interface priority.
SW4# configure term
Enter configuration commands, one per line. End with CNTL/Z.
SW4(config)# interface gi1/0/5
SW4(config-if)# spanning-tree mst 0 port-priority 64
Network engineers should be aware of two common misconfigurations within the MST region:
VLAN assignment to the IST
Trunk link pruning
VLAN Assignment to the IST
Remember that the IST operates across all links in the MST region, regardless of the VLAN assigned to the actual port.
SW1 and SW2 have two network links between them carrying VLAN 10 and VLAN 20. Gi1/0/1 and Gi1/0/2 are not trunks; they are access ports with VLANs 10 and 20 assigned. VLAN 10 is assigned to the IST, and VLAN 20 is assigned to MSTI 1.
Looking at the above diagram, it appears that traffic from PC 1 on VLAN 10 will traverse Gi1/0/2, but the traffic will actually be blocked. This can be corrected using one of the following:
– port priority
– move VLAN 10 to MSTI 1, so that the switches build a topology based on the links in use by that MSTI
– allow the VLANs on all interfaces by configuring both Gi1/0/1 and Gi1/0/2 as trunks on SW1 and SW2
The IST (Instance 0) runs over all physical links inside the MST region, regardless of VLAN assignment.
When the IST topology is calculated:
SW1 is the root bridge
All SW1 ports = Designated Ports (DPs)
SW2 must block one of its links to prevent a loop
The IST sees:
Two parallel physical links
Same cost
Same root
So one must block, even if:
One link is “for VLAN 10”
The other is “for VLAN 20”
To IST, they’re just two paths to same switch
Trunk Link Pruning
A network engineer made a mistake and pruned VLANs on the trunk links from SW1 to SW2 and from SW1 to SW3 to help load balance traffic.
Shortly after implementing the change, users attached to SW1 and SW3 cannot talk to the servers on SW2. The reason is that although the VLANs on the trunk links have changed, the MSTI topology has not.
If VLAN 10 is pruned on one trunk and VLAN 20 on a different trunk, the MST topology stays the same, but the VLAN forwarding paths no longer match it.
So the rules for pruning VLANs with MST are as follows:
– Never prune VLANs inconsistently if they belong to the same MST instance (MSTI).
– On any given trunk link, either allow all VLANs in an MSTI, or prune all of them together.
When configuring trunk pruning in MST:
Think in MSTIs, not individual VLANs
Prune per MST instance, not per VLAN
If VLANs share an MSTI → they must travel together
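The pruning rules above lend themselves to a simple consistency check. The following Python sketch is hypothetical (the function name and data layout are assumptions): it flags any MSTI whose VLANs are only partially allowed on a trunk.

```python
def pruning_violations(msti_vlans, allowed_on_trunk):
    """Return the MSTIs whose VLANs are only partially allowed on a trunk.

    An MSTI is fine when all of its VLANs are allowed or all are pruned.
    """
    allowed = set(allowed_on_trunk)
    bad = []
    for msti, vlans in msti_vlans.items():
        carried = set(vlans) & allowed
        if carried and carried != set(vlans):
            bad.append(msti)
    return sorted(bad)


msti_vlans = {1: [10, 20], 2: [99]}

# VLAN 20 pruned but VLAN 10 allowed: MSTI 1 is split across trunks.
print(pruning_violations(msti_vlans, [10, 99]))
# MSTI 1 fully allowed, MSTI 2 fully pruned: consistent.
print(pruning_violations(msti_vlans, [10, 20]))
```

The first call reports MSTI 1 as a violation; the second returns an empty list.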
MST Region Boundary
Externally, an MST region must look like one spanning-tree instance. This is non-negotiable; it is how MST scales. A PVST+ switch, by contrast, expects every VLAN to have its own spanning tree.
So a PVST+ switch sends:
A BPDU for VLAN 1
A BPDU for VLAN 10
A BPDU for VLAN 20
etc.
MST cannot accept per-VLAN information, so it must ignore VLAN-specific topology from outside. MST has to ask: if I can only believe ONE BPDU from outside, which one do I choose? The answer is VLAN 1.
Not because VLAN 1 is special logically, but because:
VLAN 1 always exists
VLAN 1 cannot be deleted
VLAN 1 is guaranteed to be present end-to-end
So VLAN 1 becomes the anchor VLAN.
The IST (Instance 0) is:
“The single spanning tree that also represents the MST region to the outside world.”
When an MST switch hears PVST+ BPDUs:
It hears many BPDUs (VLAN 1, 10, 20…)
It must pick exactly one
It picks VLAN 1
That BPDU becomes the IST’s view of the outside world
But what about the other VLANs? There are two directions to consider: PVST+ > MST and MST > PVST+.
For MST > PVST+, PVST+ expects a BPDU per VLAN.
So MST does this trick:
Take the IST BPDU
Copy it
Send it back as:
“VLAN 10 BPDU”
“VLAN 20 BPDU”
etc.
This is PVST Simulation.
The PVST simulation mechanism sends out PVST+ (and also includes RPVST) BPDUs (one for each VLAN), using the information from the IST.
For PVST+ > MST no such trick is needed, because VLAN 1's BPDU supplies everything that the BPDU-dependent functions rely on. It contains:
– STP type
– root path cost
– root bridge identifier
– local bridge identifier
– max age
– hello time
– forward delay
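Both directions can be modeled in a few lines of Python. This is a toy model under stated assumptions: the Bpdu class, its reduced field set, and both function names are invented for illustration and do not reflect the real BPDU frame format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bpdu:
    vlan: int
    root_id: str
    root_path_cost: int
    bridge_id: str


def pvst_simulation(ist_bpdu, vlans):
    """MST > PVST+: replicate the IST BPDU once per VLAN."""
    return [replace(ist_bpdu, vlan=v) for v in vlans]


def ist_view_from_pvst(bpdus):
    """PVST+ > MST: only the VLAN 1 BPDU feeds the IST."""
    for bpdu in bpdus:
        if bpdu.vlan == 1:
            return bpdu
    return None


ist = Bpdu(vlan=1, root_id="24576.0062.ec9d.c500",
           root_path_cost=0, bridge_id="24576.0062.ec9d.c500")

sent = pvst_simulation(ist, [1, 10, 20])
print([b.vlan for b in sent])
print(ist_view_from_pvst(sent))
```

Every replicated BPDU carries the same root and cost information as the IST BPDU; only the VLAN tag differs.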
The mental model that usually makes it click
Think of MST like a company spokesperson:
Inside the company: many departments (MSTIs)
Outside the company: one voice
VLAN 1 is the spokesperson’s microphone
An MST region boundary is any port that connects to a switch that is in a different MST region or that receives 802.1D or 802.1W BPDUs.
There are two design considerations when integrating an MST region with a PVST+/RPVST environment: The MST region is the root bridge, or the MST region is not a root bridge for any VLAN. These scenarios are explained in the following sections.
MST Region as the Root Bridge
In this scenario, the IST instance is the root bridge for all VLANs. SW1 and SW2 advertise multiple superior BPDUs for each VLAN toward SW3, which is operating as a PVST+ switch. SW3 is responsible for blocking ports.
Making the MST region the root bridge ensures that blocking does not take place in the MST region (the virtual switch); avoiding blocking on the MST side is the goal.
MST Region Not a Root Bridge for Any VLAN
In this scenario, the MST region boundary ports can only block or forward for “all VLANs” together. Remember that only the VLAN 1 PVST BPDU is used for the IST and that the IST BPDU is a one-to-many translation of IST BPDUs to all PVST BPDUs. There is not an option to load balance traffic because the IST instance must remain consistent.
If an MST switch detects a better BPDU for a specific VLAN on a boundary port, the switch will use BPDU guard to block this port. The port will then be placed into a root inconsistent state. Although this may isolate downstream switches, it is done to ensure a loop-free topology; this is called the PVST simulation check.
DMVPN provides full-mesh connectivity that behaves like a broadcast network type over a WAN transport by using multipoint GRE (mGRE). As a result, sites on spokes get direct spoke-to-spoke communication that is additionally secured with IPsec encryption. DMVPN is popular because of its ease of configuration and scalability.
Before we get into DMVPN, we need to know GRE well.
With DMVPN, spokes have to register with the hub, just as a SIP phone registers with a SIP server.
Generic Routing Encapsulation (GRE) Tunnels
GRE provides connectivity not just for IP but also for legacy and nowadays nonroutable protocols such as DECnet, Systems Network Architecture (SNA), and IPX.
Running routing protocols over VPNs was a big issue because VPNs are point to point, so networks had to be designed around point-to-point topologies, whereas routing protocols function well over broadcast-like topologies. mGRE resolves that problem.
An additional header is added when packets travel over the GRE tunnel.
GRE tunnels support IPv4 or IPv6 addresses as an overlay or transport network.
GRE creates a virtual network or overlay network over a real physical underlay network
In the routing tables of participating routers R11 and R31, 10.1.1.0/24 is behind 192.168.100.11 and 10.3.3.0/24 is behind 192.168.100.31. The transport (WAN) side routing table does not contain the 192.168.100.0/24 overlay network, which is why those stub networks are accessible when the tunnels are up and not accessible when the tunnels are down.
interface Tunnel100
! create tunnel interface
bandwidth 4000
! Virtual interfaces do not have the concept of latency
! and need to have a reference bandwidth configured so that
! routing protocols that use bandwidth for best-path calculation
! can make intelligent decisions
! measured and configured in kilobits per second
! Bandwidth is also used for quality of service (QoS) configuration
! on the interface
ip address 192.168.100.11 255.255.255.0
! GRE tunnel needs IP as it is just like any other interface
! this is overlay IP
ip mtu 1400
! reduce the mtu for tunnel interface
! exact added size differs based on tunnel type and encryption used
! min 24 bytes to 77 bytes
keepalive 5 3
! The default timer is 10 seconds and three retries
! Tunnel interfaces are GRE point-to-point (P2P) by default,
! and the line protocol enters an up state when the router detects
! that a route to the tunnel destination exists in the routing
! table. If the tunnel destination is not in the routing table,
! the tunnel interface (line protocol) enters a down state.
! What if there is a problem on remote end and remote router is down
! By default, GRE tunnels stay “up” as long as the interface is configured
! and tunnel destination is in routing table
! Tunnel keepalives ensure that bidirectional communication exists
! between tunnel endpoints to keep the line protocol up
tunnel source GigabitEthernet0/1
! tunnel's source interface is used for encapsulation and decapsulation
! tunnel source also accepts an IP address
! tunnel source can be physical or loopback interface
tunnel destination 172.16.31.1
! tunnel's destination is where GRE sends packets or terminates tunnel
! for mGRE this is not defined but dynamically provided
Tunnel Type                        Tunnel Header Size
---------------------------------  ------------------
GRE without IPsec                  24 bytes
DES/3DES IPsec (transport mode)    18–25 bytes
DES/3DES IPsec (tunnel mode)       38–45 bytes
GRE/DMVPN + DES/3DES               42–49 bytes
GRE/DMVPN + AES + SHA-1            62–77 bytes
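The header sizes above explain why ip mtu 1400 is a comfortable choice. A short Python sketch of the arithmetic, assuming a 1500-byte transport MTU and option-free 20-byte IPv4 and TCP headers; the function names are made up for the example:

```python
IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options


def payload_mtu(transport_mtu, tunnel_overhead):
    """Largest inner packet that fits once the tunnel header is added."""
    return transport_mtu - tunnel_overhead


def tcp_mss(ip_mtu):
    """Largest TCP payload that fits in one packet of size ip_mtu."""
    return ip_mtu - IP_HEADER - TCP_HEADER


print(payload_mtu(1500, 24))  # plain GRE
print(payload_mtu(1500, 77))  # worst case: GRE/DMVPN + AES + SHA-1
print(tcp_mss(1400))          # MSS that pairs with ip mtu 1400
```

Plain GRE leaves 1476 bytes and the worst case leaves 1423 bytes, so 1400 fits under every row of the table, and the matching TCP MSS is 1360.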
GRE Sample Configuration
R11
interface Tunnel100
bandwidth 4000
ip address 192.168.100.11 255.255.255.0
ip mtu 1400
keepalive 5 3
tunnel source GigabitEthernet0/1
tunnel destination 172.16.31.1
!
router eigrp GRE-OVERLAY
address-family ipv4 unicast autonomous-system 100
topology base
exit-af-topology
network 10.0.0.0
network 192.168.100.0
exit-address-family
R31
interface Tunnel100
bandwidth 4000
ip address 192.168.100.31 255.255.255.0
ip mtu 1400
keepalive 5 3
tunnel source GigabitEthernet0/1
tunnel destination 172.16.11.1
!
router eigrp GRE-OVERLAY
address-family ipv4 unicast autonomous-system 100
topology base
exit-af-topology
network 10.0.0.0
network 192.168.100.0
exit-address-family
R11# show interface tunnel 100
! Output omitted for brevity
Tunnel100 is up, line protocol is up
Hardware is Tunnel
Internet address is 192.168.100.1/24
MTU 17916 bytes, BW 400 Kbit/sec, DLY 50000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation TUNNEL, loopback not set
Keepalive set (5 sec), retries 3
Tunnel source 172.16.11.1 (GigabitEthernet0/1), destination 172.16.31.1
Tunnel Subblocks:
src-track:
Tunnel100 source tracking subblock associated with GigabitEthernet0/1
Set of tunnels with source GigabitEthernet0/1, 1 member (includes
iterators), on interface <OK>
Tunnel protocol/transport GRE/IP
Key disabled, sequencing disabled
Checksumming of packets disabled
Tunnel TTL 255, Fast tunneling enabled
Tunnel transport MTU 1476 bytes
Tunnel transmit bandwidth 8000 (kbps)
Tunnel receive bandwidth 8000 (kbps)
Last input 00:00:02, output 00:00:02, output hang never
R11# show ip route
! Output omitted for brevity
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
C 10.1.1.0/24 is directly connected, GigabitEthernet0/2
D 10.3.3.0/24 [90/38912000] via 192.168.100.31, 00:03:35, Tunnel100 <<<
172.16.0.0/16 is variably subnetted, 3 subnets, 2 masks
C 172.16.11.0/30 is directly connected, GigabitEthernet0/1
R 172.16.31.0/30 [120/1] via 172.16.11.2, 00:00:03, GigabitEthernet0/1
192.168.100.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.100.0/24 is directly connected, Tunnel100 <<<
Verifying that 10.3.3.3 is reachable via Tunnel 100 (192.168.100.0/24):
R11# traceroute 10.3.3.3 source 10.1.1.1
Tracing the route to 10.3.3.3
1 192.168.100.31 1 msec * 0 msec
Notice that from R11’s perspective, the network is only one hop away. The traceroute does not display all the hops in the underlay
In the same fashion, the packet’s time to live (TTL) is encapsulated as part of the payload. The original TTL decreases by only one for the GRE tunnel, regardless of the number of hops in the transport network.
Route recursion issue in GRE
Route recursion happens when a router tries to resolve the underlay next hop of a GRE tunnel destination using the tunnel itself, creating a logical loop. To prevent this, do not advertise the underlay networks through the GRE peering.
This scenario can occur when a routing protocol is enabled on all interfaces without care (regardless of the passive-interface default command), because that includes the GRE tunnel destination's subnet in the routing protocol.
The route to the tunnel destination must be reachable via a physical interface. If the route to the tunnel destination disappears, GRE goes down.
Sequence of events to failure
Step 1: Normal Operation
Tunnel destination is reachable via the physical interface
GRE tunnel comes UP
IGP advertises routes over the tunnel
Step 2: IGP Learns a “Better” Route
IGP learns the tunnel destination IP via the GRE tunnel
This route has:
Lower metric
Or preferred administrative distance
Step 3: Recursive Dependency
Router now thinks: “To reach the GRE destination, use the tunnel”
But the tunnel itself requires reachability to that destination
Tunnel depends on itself
What Happens Next?
GRE tunnel goes DOWN
IGP adjacency over tunnel goes DOWN
Physical-path route reappears
Tunnel comes UP
Loop repeats
Result:
Tunnel flapping
IGP instability
High CPU
Intermittent packet loss
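The recursive dependency can be detected with a simple route lookup. The Python sketch below is a simplified model, not how a router implements it: routes are (prefix, administrative distance, exit interface) tuples, the longest prefix wins, and the lowest administrative distance breaks ties. The prefixes reuse the GRE example (RIP at distance 120 in the underlay, EIGRP at 90 in the overlay); the function names are assumptions.

```python
import ipaddress


def best_route(routes, destination):
    """Pick the longest matching prefix, then the lowest administrative distance."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    return min(candidates,
               key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, r[1]))


def is_recursive(routes, tunnel_name, tunnel_destination):
    """True when the tunnel destination resolves through the tunnel itself."""
    route = best_route(routes, tunnel_destination)
    return route is not None and route[2] == tunnel_name


healthy = [
    ("172.16.31.0/30", 120, "GigabitEthernet0/1"),  # underlay, learned via RIP
    ("10.3.3.0/24", 90, "Tunnel100"),               # overlay, learned via EIGRP
]
# The underlay subnet leaks into EIGRP and is learned over the tunnel.
broken = healthy + [("172.16.31.0/30", 90, "Tunnel100")]

print(is_recursive(healthy, "Tunnel100", "172.16.31.1"))
print(is_recursive(broken, "Tunnel100", "172.16.31.1"))
```

The healthy table resolves the tunnel destination through the physical interface; once the leaked EIGRP route with the better distance appears, the destination resolves through Tunnel100 and the check returns True.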
Next Hop Resolution Protocol (NHRP)
NHC refers to a DMVPN spoke; NHS refers to a DMVPN hub.
NHRP is just like ARP but for non-broadcast multi-access (NBMA) WAN networks such as Frame Relay and ATM networks
NHRP is a client/server protocol that allows devices to register themselves. NHRP next-hop servers (NHSs) are responsible for registering addresses or networks, and replying to any queries received by next-hop clients (NHCs).
An NHC can reach the NHS and ask for the underlay and overlay IP addresses for a specific network.
NHCs are statically configured with the IP addresses of the hubs (NHSs) so that they can register their overlay (tunnel IP) and NBMA (underlay) IP addresses with the hubs
NHRP Message Types
Message Type: Description
Registration: Registration NHRP messages are sent by the NHC (spoke) toward the NHS (hub). The NHC (spoke) also specifies the amount of time that the registration should be maintained by the NHS (hub).
Resolution: Resolution NHRP messages provide address resolution for a remote spoke. The resolution reply provides the underlay and overlay IP addresses for a remote network.
Redirect: Redirect NHRP messages allow the hub to notify a spoke that a specific network can be reached over a more optimal path (a spoke-to-spoke tunnel). They are an essential component of DMVPN Phase 3 spoke-to-spoke operation.
Purge: Purge NHRP messages are sent to remove a cached NHRP entry and notify routers of a change. A purge is typically sent by a hub to a spoke to indicate that the mapping for an address/network that it answered is not valid anymore.
Error: Error messages are used to notify the sender of an NHRP packet that an error has occurred.
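The registration, resolution, and purge messages can be pictured as operations on a mapping table held by the hub. The Python class below is a toy model under stated assumptions: the class name, method names, and the 600-second hold time are invented for illustration, and the addresses reuse the GRE example.

```python
class NextHopServer:
    """Toy NHS (hub): maps overlay (tunnel) IPs to NBMA (underlay) IPs."""

    def __init__(self):
        self.cache = {}

    def register(self, overlay_ip, nbma_ip, holdtime):
        """Registration: a spoke announces its overlay and NBMA addresses."""
        self.cache[overlay_ip] = {"nbma": nbma_ip, "holdtime": holdtime}

    def resolve(self, overlay_ip):
        """Resolution: return the NBMA address for an overlay address."""
        entry = self.cache.get(overlay_ip)
        return entry["nbma"] if entry else None

    def purge(self, overlay_ip):
        """Purge: drop a mapping; True if an entry was removed."""
        return self.cache.pop(overlay_ip, None) is not None


hub = NextHopServer()
hub.register("192.168.100.31", "172.16.31.1", holdtime=600)

print(hub.resolve("192.168.100.31"))
print(hub.purge("192.168.100.31"))
print(hub.resolve("192.168.100.31"))
```

After the purge, the same resolution request returns nothing, which is why a spoke holding a stale cached entry must be told that the mapping is no longer valid.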
Dynamic Multipoint VPN (DMVPN)
Zero-touch provisioning: It is considered a zero-touch technology because no configuration is needed on the DMVPN hub routers as new spokes are added to the DMVPN network
Spoke-to-spoke tunnels: DMVPN provides full-mesh connectivity. Dynamic spoke-to-spoke tunnels are created as needed and torn down when no longer needed. There is no packet loss while building dynamic on-demand spoke-to-spoke tunnels “after the initial spoke-to-hub tunnels are established”.
Multiprotocol support: DMVPN can use IPv4, IPv6, and MPLS as either the overlay or underlay network protocol.
Multicast support: DMVPN allows multicast traffic to flow on the tunnel interfaces.
Adaptable connectivity: DMVPN routers can establish connectivity behind Network Address Translation (NAT). Spoke routers can use dynamic IP addressing such as Dynamic Host Configuration Protocol (DHCP).
A spoke site initiates a persistent VPN connection to the hub router. Network traffic between spoke sites does not have to travel through the hubs. DMVPN then dynamically builds a VPN tunnel between spoke sites on an as-needed basis. This allows network traffic, such as voice over IP (VoIP), to take a direct path, which reduces delay and jitter without consuming bandwidth at the hub site.
DMVPN was released in three phases, each phase built on the previous one with additional functions. DMVPN spokes can use DHCP or static addressing for the transport and overlay networks.
Next-hop preservation
interface Tunnel0
ip summary-address eigrp 100 10.1.0.0 255.255.0.0
Summarization is configured on the hub router in a DMVPN design to keep routing tables small: many sites each advertise many subnets, and without a summary the hub re-advertises all of them to every spoke.
The problem is that when a summary is configured, the next hop changes to the summarizing router. This is normal for any summarization, but in DMVPN it turns spoke-to-spoke communication into spoke-to-hub-to-spoke communication.
NHRP shortcut
A dynamically created, more-specific route that is triggered by the hub (Phase 3) and installed by NHRP. It changes the next hop from the hub to the destination spoke, allowing direct spoke-to-spoke forwarding.
That creates a shortcut tunnel between spokes.
NHRP shortcuts are:
Dynamic → created on demand
More specific → overrides a summary route
Installed in the routing table → not just a cache
Changes the next hop → from hub → spoke
Enables direct tunnels → spoke-to-spoke
hence Phase 2 + summarisation = hub-and-spoke forwarding only
Phase 1: Spoke-to-Hub
DMVPN Phase 1 is the first DMVPN implementation. VPN tunnels are created only between spoke and hub sites. Traffic between spokes must traverse the hub to reach any other spoke.
Phase 2: Spoke-to-Spoke
DMVPN Phase 2 allows spoke-to-spoke communication, but it does not support spoke-to-spoke communication between different DMVPN networks (multilevel hierarchical DMVPN).
Phase 2 spoke-to-spoke communication breaks when the hub summarizes routes, because spokes do not know which spoke owns which subnet and cannot build an NHRP shortcut, so traffic must go spoke → hub → spoke. Spoke-to-spoke tunnels are still technically possible but are never used.
The same thing happens in hierarchical DMVPN, because regional hubs summarize routes upward and the global hub only sees large summary routes. Even if the local region's hub does not summarize, the remote region's routes are summarized, so spoke-to-spoke communication between regions breaks in DMVPN Phase 2.
Phase 3 fixes exactly this problem.
Phase 3: Hierarchical Tree Spoke-to-Spoke
DMVPN Phase 3 fixes the above problem and refines spoke-to-spoke connectivity by adding two NHRP functions: the redirect sent by the hub and the shortcut installed on the spoke.
NHRP installs the shortcut route and saves it in the NHRP cache. The shortcut route is:
More specific than the summary
Overrides the hub route
Spoke A now builds a direct GRE/IPsec tunnel to Spoke B and data packets now go directly from spoke to spoke
So the summary route still exists for scale, but NHRP dynamically injects more-specific routes, and more-specific routes override summaries.
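The "more specific overrides the summary" behavior is ordinary longest-prefix matching, sketched below in Python. The summary 10.1.0.0/16 is the one from the summary-address example; the 10.1.3.0/24 subnet, the next-hop labels, and the function name are hypothetical.

```python
import ipaddress


def next_hop(rib, destination):
    """Longest-prefix match over a {prefix: next_hop} routing table."""
    dest = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(prefix), hop)
               for prefix, hop in rib.items()
               if dest in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Spoke A only knows the hub's summary route.
rib = {"10.1.0.0/16": "hub"}
print(next_hop(rib, "10.1.3.10"))

# NHRP installs a more-specific shortcut route toward spoke B.
rib["10.1.3.0/24"] = "spoke-B"
print(next_hop(rib, "10.1.3.10"))
print(next_hop(rib, "10.1.7.10"))
```

Traffic to 10.1.3.10 first follows the summary to the hub and then switches to spoke-B once the shortcut is present, while destinations without a shortcut (10.1.7.10) keep using the hub.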
Difference in Phase 2 and Phase 3 DMVPN with multilevel hierarchical topologies
Connectivity between DMVPN tunnels 20 and 30 is established by DMVPN tunnel 10. All three DMVPN tunnels use the same DMVPN tunnel ID, even though they use different tunnel interfaces.
For Phase 2 DMVPN tunnels, traffic from R5 must flow to the hub R2, where it is sent to R3 and then back down to R6
For Phase 3 DMVPN tunnels, a spoke-to-spoke tunnel is established between R5 and R6, and the two routers can communicate directly.
Each DMVPN phase has its own specific configuration. Intermixing DMVPN phases on the same tunnel network is not recommended. If you need to support multiple DMVPN phases for a migration, a second DMVPN network (subnet and tunnel interface) should be used.
DMVPN Configuration
DMVPN Hub Configuration
R11-Hub
interface Tunnel100
bandwidth 4000
! Virtual interfaces do not have the concept of latency
! and need to have a reference bandwidth configured so that
! routing protocols that use bandwidth for best-path calculation
! can make intelligent decisions
! bandwidth is measured and configured in kilobits per second (kbps)
! Bandwidth is also used for quality of service (QoS) configuration
! on the interface
ip address 192.168.100.11 255.255.255.0
! allocate an overlay IP address
ip mtu 1400
! set the IP MTU to 1400, a typical value for DMVPN that accounts for
! the additional encapsulation overhead
ip nhrp map multicast dynamic
! Enable multicast support for NHRP on the hub.
! Just as it does for unicast addresses, NHRP can map the overlay IP
! to the underlay (NBMA) IP for multicast traffic. To support multicast,
! or routing protocols that use multicast, enable this on DMVPN hub
! routers
ip nhrp network-id 100
! Enable NHRP on the tunnel and assign a network ID.
! The NHRP network ID is not used in any negotiation, but
! it is recommended that the NHRP network ID match on all
! routers participating in the same DMVPN network.
! The local router uses it to identify the DMVPN cloud,
! because multiple tunnel interfaces can belong to the same
! DMVPN cloud
ip nhrp redirect
! Enable the NHRP redirect function (Phase 3) on the DMVPN network
ip tcp adjust-mss 1360
! Influences the TCP MSS negotiated in the three-way handshake
! for TCP sessions crossing the tunnel (the TCP header is visible
! even when the payload is TLS). The typical value is 1360: the
! 1400-byte IP MTU minus 20 bytes for IP and 20 bytes for the TCP header
tunnel source GigabitEthernet0/1
! this can be a logical interface such as a loopback
! QoS problems can occur with the use of loopback interfaces
! when there are multiple paths in the forwarding table to the
! decapsulating router. The same problems occur automatically
! with port channels, which are not recommended at the time of
! this writing.
tunnel mode gre multipoint
! configure tunnel as mGRE tunnel
tunnel key 100
! Optionally use a tunnel key when multiple tunnel interfaces
! use the same source interface. Tunnel keys, if configured, must
! match for a DMVPN tunnel to be established between two routers.
! the tunnel key adds 4 bytes to the DMVPN header. The tunnel key
! is configured with the command tunnel key 0-4294967295
! If the tunnel key is defined on the hub router, it must be defined
! on all the spoke routers.
Note that mGRE tunnels do not support keepalives. A keepalive is only logically possible when there is a single endpoint on the other end, but an mGRE tunnel has multiple endpoints.
There is no technical correlation between the NHRP network ID and the tunnel interface number; however, keeping them the same helps from an operational support standpoint.
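The relationship between ip mtu 1400, ip tcp adjust-mss 1360, and the 4-byte tunnel key in the hub configuration above is simple arithmetic. The sketch below assumes IPv4 and TCP headers without options and a 1500-byte transport MTU. The transport MTU and the helper names are assumptions for illustration, and IPsec overhead is left out because it varies with the transform set.

```python
IPV4_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20    # bytes, TCP header without options
GRE_HEADER = 4     # bytes, base GRE header
GRE_KEY = 4        # bytes, added when "tunnel key" is configured

def tcp_mss(ip_mtu):
    """Largest TCP payload that fits in one packet of size ip_mtu."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER

def gre_overhead(with_key=True):
    """Outer IPv4 header plus GRE header (and optional key)."""
    return IPV4_HEADER + GRE_HEADER + (GRE_KEY if with_key else 0)

def headroom(transport_mtu, ip_mtu, with_key=True):
    """Bytes left for other encapsulation (for example IPsec)."""
    return transport_mtu - ip_mtu - gre_overhead(with_key)

print(tcp_mss(1400))          # 1360, matches ip tcp adjust-mss 1360
print(gre_overhead())         # 28 bytes with a tunnel key
print(headroom(1500, 1400))   # 72 bytes remain on an assumed 1500-byte transport
```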
DMVPN Spoke Configuration for DMVPN Phase 1 (Point-to-Point)
The configuration of DMVPN Phase 1 spokes is similar to the configuration for a hub router, except for two differences:
You do not use an mGRE tunnel. Instead, you specify the tunnel destination (because all spoke traffic goes to the hub)
The NHRP mapping points to at least one active NHS
R31-Spoke (Single NHRP Command Configuration)
interface Tunnel100
bandwidth 4000
! Virtual interfaces do not have the concept of latency
! and need to have a reference bandwidth configured so that
! routing protocols that use bandwidth for best-path calculation
! can make intelligent decisions
! bandwidth is measured and configured in kilobits per second (kbps)
! Bandwidth is also used for quality of service (QoS) configuration
! on the interface
ip address 192.168.100.31 255.255.255.0
! assign overlay IP address to the Spoke
ip mtu 1400
ip nhrp network-id 100
ip nhrp nhs 192.168.100.11 nbma 172.16.11.1 multicast
! define the DMVPN HUB or NHS, more can be added
! multicast keyword provides multicast mapping functions
! in NHRP and is required to support the following routing
! protocols: RIP, EIGRP, and Open Shortest Path First (OSPF)
ip tcp adjust-mss 1360
tunnel source GigabitEthernet0/1
tunnel destination 172.16.11.1
! tunnel destination is DMVPN HUB underlay address
tunnel key 100
R41-Spoke (Multiple NHRP Commands Configuration)
! NHS with MAP commands
interface Tunnel100
bandwidth 4000
ip address 192.168.100.41 255.255.255.0
ip mtu 1400
ip nhrp map 192.168.100.11 172.16.11.1
ip nhrp map multicast 172.16.11.1
ip nhrp network-id 100
ip nhrp nhs 192.168.100.11
ip tcp adjust-mss 1360
tunnel source GigabitEthernet0/1
tunnel destination 172.16.11.1
tunnel key 100
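The two spoke styles above (R31 with a single ip nhrp nhs command, R41 with separate map commands) describe the same NHS mapping. The sketch below expands the single-command form into the multi-command form, using only the commands shown in the two configurations. The function name expand_nhs_command is hypothetical.

```python
def expand_nhs_command(nhs_overlay, nbma, multicast=True):
    """Return the multi-command equivalent of
    'ip nhrp nhs <overlay> nbma <nbma> [multicast]'."""
    lines = [f"ip nhrp map {nhs_overlay} {nbma}"]
    if multicast:
        lines.append(f"ip nhrp map multicast {nbma}")
    lines.append(f"ip nhrp nhs {nhs_overlay}")
    return lines

for line in expand_nhs_command("192.168.100.11", "172.16.11.1"):
    print(line)
```

The printed lines match the three NHRP commands in the R41 configuration.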
Viewing DMVPN Tunnel Status
Tunnel states, in order of establishment:
INTF: The line protocol of the DMVPN tunnel is down.
IKE: DMVPN tunnels configured with IPsec have not yet established an IKE session.
IPsec: An IKE session has been established, but an IPsec security association (SA) has not yet been established.
NHRP: The DMVPN spoke router has not yet successfully registered.
Up: The DMVPN spoke router has registered with the DMVPN hub and received an ACK (positive registration reply) from the hub.
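Because the states are listed in order of establishment, a tunnel stuck in one state has already passed the earlier ones. A minimal sketch of that reasoning follows. The enum and helper names are assumptions, and the IKE and IPsec stages only apply when the tunnel is configured with IPsec.

```python
from enum import IntEnum

class TunnelState(IntEnum):
    INTF = 0    # line protocol of the tunnel is down
    IKE = 1     # no IKE session yet
    IPSEC = 2   # IKE is up, no IPsec SA yet
    NHRP = 3    # spoke has not registered yet
    UP = 4      # registered and acknowledged by the hub

def stages_passed(state):
    """Names of the stages that must already have succeeded."""
    return [s.name for s in TunnelState if s < state]

print(stages_passed(TunnelState.NHRP))   # ['INTF', 'IKE', 'IPSEC']
```

A tunnel reported in the NHRP state therefore points at registration (NHS reachability, NHRP settings) rather than at the interface or the crypto session.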
R31-Spoke# show dmvpn
! Output omitted for brevity
Interface: Tunnel100, IPv4 NHRP Details
Type:Spoke, NHRP Peers:1,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 172.16.11.1 192.168.100.11 UP 00:05:26 S >>> static because NHS was defined
R41-Spoke# show dmvpn
! Output omitted for brevity
Interface: Tunnel100, IPv4 NHRP Details
Type:Spoke, NHRP Peers:1,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 172.16.11.1 192.168.100.11 UP 00:05:26 S >>> static because NHS was defined
R11-Hub# show dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
T1 - Route Installed, T2 - Nexthop-override
C - CTS Capable
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================
Interface: Tunnel100, IPv4 NHRP Details
Type:Hub, NHRP Peers:2,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 172.16.31.1 192.168.100.31 UP 00:05:26 D
1 172.16.41.1 192.168.100.41 UP 00:05:26 D
>>> D ! Dynamic because HUB learned spoke
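The peer rows of the show dmvpn output above have a fixed column order, so they can be pulled into records with a small parser. This is a sketch for the basic (non-detail) rows only. The Peer type and parse_peers function are hypothetical helpers, not part of any Cisco tooling.

```python
import re
from typing import NamedTuple

class Peer(NamedTuple):
    entries: int
    nbma: str
    tunnel: str
    state: str
    updown: str
    attrb: str

# "# Ent", NBMA address, tunnel address, state, up/down time, attributes
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)

def parse_peers(text):
    """Return one Peer per data row; headers and rulers are skipped."""
    peers = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            n, nbma, tunnel, state, updown, attrb = m.groups()
            peers.append(Peer(int(n), nbma, tunnel, state, updown, attrb))
    return peers

sample = """\
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 172.16.31.1 192.168.100.31 UP 00:05:26 D
1 172.16.41.1 192.168.100.41 UP 00:05:26 D
"""
for peer in parse_peers(sample):
    kind = "dynamic (learned spoke)" if peer.attrb == "D" else "static (configured NHS)"
    print(peer.tunnel, peer.state, kind)
```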
With the detail keyword:
R11-Hub# show dmvpn detail
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
T1 - Route Installed, T2 - Nexthop-override
C - CTS Capable
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================
Interface Tunnel100 is up/up, Addr. is 192.168.100.11, VRF ""
Tunnel Src./Dest. addr: 172.16.11.1/MGRE, Tunnel VRF ""
Protocol/Transport: "multi-GRE/IP", Protect ""
Interface State Control: Disabled
nhrp event-publisher : Disabled
Type:Hub, Total NBMA Peers (v4/v6): 2
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb Target Network
----- --------------- --------------- ----- -------- ----- -----------------
1 172.16.31.1 192.168.100.31 UP 00:01:05 D 192.168.100.31/32
1 172.16.41.1 192.168.100.41 UP 00:01:06 D 192.168.100.41/32
R31-Spoke# show dmvpn detail
! Output omitted for brevity
Interface Tunnel100 is up/up, Addr. is 192.168.100.31, VRF ""
Tunnel Src./Dest. addr: 172.16.31.1/172.16.11.1, Tunnel VRF ""
Protocol/Transport: "GRE/IP", Protect ""
Interface State Control: Disabled
nhrp event-publisher : Disabled
IPv4 NHS:
192.168.100.11 RE NBMA Address: 172.16.11.1 priority = 0 cluster = 0
Type:Spoke, Total NBMA Peers (v4/v6): 1
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb Target Ne
----- --------------- --------------- ----- -------- ----- ------------
1 172.16.11.1 192.168.100.11 UP 00:00:28 S 192.168.100
R41-Spoke# show dmvpn detail
! Output omitted for brevity
Interface Tunnel100 is up/up, Addr. is 192.168.100.41, VRF ""
Tunnel Src./Dest. addr: 172.16.41.1/172.16.11.1, Tunnel VRF " "
Protocol/Transport: "GRE/IP", Protect ""
Interface State Control: Disabled
nhrp event-publisher : Disabled
IPv4 NHS:
192.168.100.11 RE NBMA Address: 172.16.11.1 priority = 0 cluster = 0
Type:Spoke, Total NBMA Peers (v4/v6): 1
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb Target Network
----- --------------- --------------- ----- -------- ----- -----------------
1 172.16.11.1 192.168.100.11 UP 00:02:00 S 192.168.100.11/32
Viewing the NHRP Cache
The NHRP cache is very similar to an ARP cache. It contains the information returned by the hub: network entries with the overlay and underlay IP addresses of the spokes, the interface each entry was received on, and the expiry time (dynamic entries expire).
NHRP Mapping Entry
Description
static
An entry created statically on a DMVPN interface, this is seen on DMVPN Spokes
dynamic
An entry created dynamically. This is seen on DMVPN Hub
incomplete
On a Cisco router, incomplete means the router knows it needs a mapping, but the resolution process has not finished yet. This is just like an “Incomplete” ARP entry.
NHRP (Next Hop Resolution Protocol) is commonly used in DMVPN to map a tunnel IP address to an NBMA (physical/WAN) IP address. Routers cache these mappings in the NHRP table.
An NHRP entry marked incomplete indicates that the router has initiated an NHRP resolution request but has not yet received a valid reply. So:
The router does not yet know the NBMA address
The mapping cannot be used for forwarding traffic
The entry is temporary
This is usually seen on the hub when a request was sent and no reply was received, for example when the destination spoke is down, not registered, or incorrectly configured. It also happens when NHRP replies are blocked by an ACL, firewall, or NAT.
Router# show ip nhrp
10.10.10.2/32 via 10.10.10.2
Tunnel0 created 00:00:12, incomplete
An incomplete entry prevents repetitive NHRP requests for the same entry. Eventually this will time out and permit another NHRP resolution request for the same network.
A healthy entry eventually changes to Dynamic or Static
local
Just like a local ARP entry, this means that the overlay IP and underlay IP belong to an interface on the router itself. Cisco routers automatically install a local NHRP entry so that the router can correctly identify itself as an NHRP participant.
R1# show ip nhrp
10.0.0.1/32 via 10.0.0.1
Tunnel0 created 00:12:33, expire never
Type: local, Flags: authoritative
(no-socket)
Mapping entries that do not have associated IPsec sockets and where encryption is not triggered.
NBMA address
Nonbroadcast multi-access address, or the transport IP address where the entry was received.
NHRP message flags specify attributes of an NHRP cache entry
NHRP Message Flag
Description
used
Indicates that this NHRP mapping entry was used to forward data packets within the past “60” seconds.
implicit
Indicates that the NHRP mapping entry was learned implicitly. Examples of such entries are the source mapping information gleaned from an NHRP resolution request received by the local router or from an NHRP resolution packet forwarded “through” the router.
unique
Indicates that this remote NHRP mapping entry must be unique and that it cannot be overwritten with an entry that has the same tunnel IP address but a different NBMA address.
router
Indicates that this NHRP mapping entry is from a remote “router” that provides access to a network or “host” behind the remote router.
rib
NHRP has injected a host route into the IP routing table. This route is not learned via a routing protocol (EIGRP/OSPF/BGP), but directly installed by NHRP.
show ip nhrp
10.10.10.2/32 via 172.16.1.2 Flags: unique, dynamic, rib
This rib flag means this entry is installed in routing table
show ip route 10.10.10.2
Routing entry for 10.10.10.2/32 Known via "nhrp", distance 250, metric 0
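Why is AD 250 important? The high administrative distance makes sure that routing protocols win when the same prefix is known from both sources, which prevents NHRP from overriding real routing decisions. NHRP routes are fallback / shortcut routes, but when they are the longest (most specific) match for a destination they are still preferred, because the most specific route always wins.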
When will you see the rib flag set? You will see rib when:
DMVPN Phase 2 or 3 is active
NHRP resolution succeeds
The spoke learns another spoke’s NBMA address
Traffic triggers a shortcut
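How administrative distance and prefix length interact can be sketched with a small selection function. It models only the generic rules (longest prefix first, then lowest AD for an identical prefix) and does not model NHRP's next-hop override. The function name and the candidate routes are assumptions for illustration.

```python
import ipaddress

def select_route(candidates, destination):
    """Pick the forwarding route: longest prefix wins, AD breaks ties."""
    dest = ipaddress.ip_address(destination)
    matching = [
        c for c in candidates
        if dest in ipaddress.ip_network(c["prefix"])
    ]
    if not matching:
        return None
    return max(
        matching,
        key=lambda c: (ipaddress.ip_network(c["prefix"]).prefixlen, -c["ad"]),
    )

# Same prefix from two sources: the routing protocol (AD 90) beats NHRP (AD 250).
same_prefix = [
    {"prefix": "10.4.4.0/24", "source": "eigrp", "ad": 90},
    {"prefix": "10.4.4.0/24", "source": "nhrp", "ad": 250},
]
print(select_route(same_prefix, "10.4.4.1")["source"])

# Summary from the routing protocol, more-specific route from NHRP: NHRP is used.
summary_case = [
    {"prefix": "10.0.0.0/8", "source": "eigrp", "ad": 90},
    {"prefix": "10.4.4.0/24", "source": "nhrp", "ad": 250},
]
print(select_route(summary_case, "10.4.4.1")["source"])
```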
nho
nho stands for next-hop override. When nho is set (together with rib), NHRP has found a route for the same prefix in the routing table that was installed by another protocol and has overridden that route’s next hop with the shortcut next hop, so traffic goes to the remote spoke instead of the hub.
Without the override: traffic between spokes keeps following the routing protocol’s next hop (the hub), so it is forced through the hub and no spoke-to-spoke shortcut is used for that prefix.
With the override (normal Phase 3 behavior): the spoke uses the real NBMA address of the destination spoke learned through NHRP, and traffic flows over the direct GRE/IPsec tunnel.
nhop
The nhop flag indicates that this entry is a valid next hop for forwarding traffic.
R11-Hub# show ip nhrp
192.168.100.31/32 via 192.168.100.31
Tunnel100 created 23:04:04, expire 01:37:26
Type: dynamic, Flags: unique registered used nhop
NBMA address: 172.16.31.1
192.168.100.41/32 via 192.168.100.41
Tunnel100 created 23:04:00, expire 01:37:42
Type: dynamic, Flags: unique registered used nhop
NBMA address: 172.16.41.1
R31-Spoke# show ip nhrp
192.168.100.11/32 via 192.168.100.11
Tunnel100 created 23:02:53, never expire
Type: static, Flags:
NBMA address: 172.16.11.1
R41-Spoke# show ip nhrp
192.168.100.11/32 via 192.168.100.11
Tunnel100 created 23:02:53, never expire
Type: static, Flags:
NBMA address: 172.16.11.1
The show ip nhrp brief command gives a condensed view of the cache. Some information, such as the used and nhop NHRP message flags, is not shown with the brief keyword.
R11-Hub# show ip nhrp brief
****************************************************************************
NOTE: Link-Local, No-socket and Incomplete entries are not displayed
****************************************************************************
Legend: Type --> S - Static, D - Dynamic
Flags --> u - unique, r - registered, e - temporary, c - claimed
a - authoritative, t - route
============================================================================
Intf NextHop Address NBMA Address
Target Network T/Flag
-------- ------------------------------------------- ------ ----------------
Tu100 192.168.100.31 172.16.31.1
192.168.100.31/32 D/ur
Tu100 192.168.100.41 172.16.41.1
192.168.100.41/32 D/ur
R31-Spoke# show ip nhrp brief
! Output omitted for brevity
Intf NextHop Address NBMA Address
Target Network T/Flag
-------- ------------------------------------------- ------ ----------------
Tu100 192.168.100.11 172.16.11.1
192.168.100.11/32 S/
R41-Spoke# show ip nhrp brief
! Output omitted for brevity
Intf NextHop Address NBMA Address
Target Network T/Flag
-------- ------------------------------------------- ------ ----------------
Tu100 192.168.100.11 172.16.11.1
192.168.100.11/32 S/
The optional detail keyword provides a list of routers that submitted NHRP resolution requests and their request IDs.
Routing Table
Notice that the next-hop address between spoke routers is 192.168.100.11 (R11).
R11-Hub# show ip route
! Output omitted for brevity
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
Gateway of last resort is 172.16.11.2 to network 0.0.0.0
S* 0.0.0.0/0 [1/0] via 172.16.11.2
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
C 10.1.1.0/24 is directly connected, GigabitEthernet0/2
D 10.3.3.0/24 [90/27392000] via 192.168.100.31, 23:03:53, Tunnel100
D 10.4.4.0/24 [90/27392000] via 192.168.100.41, 23:03:28, Tunnel100
172.16.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 172.16.11.0/30 is directly connected, GigabitEthernet0/1
192.168.100.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.100.0/24 is directly connected, Tunnel100
R31-Spoke# show ip route
! Output omitted for brevity
Gateway of last resort is 172.16.31.2 to network 0.0.0.0
S* 0.0.0.0/0 [1/0] via 172.16.31.2
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
D 10.1.1.0/24 [90/26885120] via 192.168.100.11, 23:04:48, Tunnel100
C 10.3.3.0/24 is directly connected, GigabitEthernet0/2
D 10.4.4.0/24 [90/52992000] via 192.168.100.11, 23:04:23, Tunnel100
172.16.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 172.16.31.0/30 is directly connected, GigabitEthernet0/1
192.168.100.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.100.0/24 is directly connected, Tunnel100
R41-Spoke# show ip route
! Output omitted for brevity
Gateway of last resort is 172.16.41.2 to network 0.0.0.0
S* 0.0.0.0/0 [1/0] via 172.16.41.2
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
D 10.1.1.0/24 [90/26885120] via 192.168.100.11, 23:05:01, Tunnel100
D 10.3.3.0/24 [90/52992000] via 192.168.100.11, 23:05:01, Tunnel100
C 10.4.4.0/24 is directly connected, GigabitEthernet0/2
172.16.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 172.16.41.0/24 is directly connected, GigabitEthernet0/1
192.168.100.0/24 is variably subnetted, 2 subnets, 2 masks
C 192.168.100.0/24 is directly connected, Tunnel100
Traceroute
Traceroute shows that data from R31 to R41 will go through R11.
DMVPN Configuration for Phase 3 DMVPN (Multipoint)
Phase 3 DMVPN configuration for the hub router adds the interface parameter command ip nhrp redirect on the hub router
This command checks the flow of packets on the tunnel interface and sends a redirect message to the source spoke router when it detects that the hub router is being used as a transit router. It does this by detecting hairpinning.
Hairpinning means that traffic is received and sent out an interface in the same cloud (identified by the NHRP network ID). For instance, hairpinning occurs when packets come in and go out the same tunnel interface.
The Phase 3 DMVPN configuration for spoke routers uses the mGRE tunnel interface and uses the command ip nhrp shortcut on the tunnel interface.
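The hairpinning check described above can be sketched as a comparison of NHRP network IDs on the ingress and egress interfaces. This is a simplified model. The function name, the dictionary layout, and the Tunnel20/Tunnel30 example are assumptions for illustration.

```python
def needs_redirect(interfaces, in_if, out_if, redirect_enabled=True):
    """True when a transit packet enters and leaves the same DMVPN cloud.

    The cloud is identified by the NHRP network ID, so two different
    tunnel interfaces with the same ID still count as hairpinning.
    """
    in_id = interfaces[in_if].get("nhrp_network_id")
    out_id = interfaces[out_if].get("nhrp_network_id")
    return redirect_enabled and in_id is not None and in_id == out_id

interfaces = {
    "Tunnel100": {"nhrp_network_id": 100},
    "Tunnel20": {"nhrp_network_id": 1},     # hypothetical hierarchical hub
    "Tunnel30": {"nhrp_network_id": 1},
    "GigabitEthernet0/2": {},               # LAN interface, no NHRP
}

print(needs_redirect(interfaces, "Tunnel100", "Tunnel100"))           # spoke to spoke via hub
print(needs_redirect(interfaces, "Tunnel20", "Tunnel30"))             # same cloud, different interfaces
print(needs_redirect(interfaces, "Tunnel100", "GigabitEthernet0/2"))  # traffic to the hub LAN
```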
R11-Hub
interface Tunnel100
bandwidth 4000
ip address 192.168.100.11 255.255.255.0
ip mtu 1400
ip nhrp map multicast dynamic
ip nhrp network-id 100
ip nhrp redirect <<<
ip tcp adjust-mss 1360
tunnel source GigabitEthernet0/1
tunnel mode gre multipoint
tunnel key 100
R31-Spoke
interface Tunnel100
bandwidth 4000
ip address 192.168.100.31 255.255.255.0
ip mtu 1400
ip nhrp network-id 100
ip nhrp nhs 192.168.100.11 nbma 172.16.11.1 multicast
ip nhrp shortcut <<<
ip tcp adjust-mss 1360
tunnel source GigabitEthernet0/1
tunnel mode gre multipoint
tunnel key 100
R41-Spoke
interface Tunnel100
bandwidth 4000
ip address 192.168.100.41 255.255.255.0
ip mtu 1400
ip nhrp network-id 100
ip nhrp nhs 192.168.100.11 nbma 172.16.11.1 multicast
ip nhrp shortcut <<<
ip tcp adjust-mss 1360
tunnel source GigabitEthernet0/1
tunnel mode gre multipoint
tunnel key 100
IP NHRP Authentication
NHRP includes an authentication capability, but this authentication is weak because the password is stored in plaintext. Most network administrators use NHRP authentication as a method to ensure that two different tunnels do not accidentally form. You enable NHRP authentication by using the interface parameter command ip nhrp authentication password.
Unique IP NHRP Registration
When a spoke registers with the hub, it sets the unique flag, which forces NHRP to keep the mapping between the overlay (protocol) address and the NBMA address exactly as it was at registration time. If an NHC (spoke) attempts to register with the NHS using a different NBMA address while the previous entry has not yet expired, the registration fails.
Let’s demonstrate this concept by disabling the DMVPN tunnel interface, changing the IP address on the transport interface, and reenabling the DMVPN tunnel interface. Notice that the DMVPN hub denies the NHRP registration because the protocol address is registered to a different NBMA address.
R31-Spoke(config)# interface tunnel 100
R31-Spoke(config-if)# shutdown
00:17:48.910: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 192.168.100.11
(Tunnel100) is down: interface down
00:17:50.910: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel100,
changed state to down
00:17:50.910: %LINK-5-CHANGED: Interface Tunnel100, changed state to
administratively down
R31-Spoke(config-if)# interface GigabitEthernet0/1
R31-Spoke(config-if)# ip address 172.16.31.31 255.255.255.0
R31-Spoke(config-if)# interface tunnel 100
R31-Spoke(config-if)# no shutdown
00:18:21.011: %NHRP-3-PAKREPLY: Receive Registration Reply packet with error -
unique address registered already(14)
00:18:22.010: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel100, changed
state to up
This can cause problems for sites with transport interfaces that connect using DHCP, where they could be assigned different IP addresses before the NHRP cache times out. If a router loses connectivity and is assigned a different IP address, because of its age, it cannot register with the NHS router until that router’s entry is flushed from the NHRP cache.
The interface parameter command ip nhrp registration no-unique stops routers from placing the unique NHRP message flag in registration request packets sent to the NHS. This allows clients to reconnect to the NHS even if the NBMA address changes. This should be enabled on all DHCP-enabled spoke interfaces. However, placing this on all spoke tunnel interfaces keeps the configuration consistent for all tunnel interfaces and simplifies verification of settings from an operational perspective.
The NHC (spoke) has to register with this flag for the change to take effect on the NHS. This either happens during the normal NHRP expiration timers or can be accelerated by resetting the tunnel interface on the spoke before the transport IP address changes.
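The registration behavior described above can be modeled with a small cache on the NHS side. This is a simplified sketch: the Nhs class and its methods are hypothetical, timers are reduced to an explicit expire call, and the error text is borrowed from the log message shown earlier.

```python
class Nhs:
    """Toy NHS cache keyed by overlay (protocol) address."""

    def __init__(self):
        self.cache = {}   # overlay IP -> {"nbma": str, "unique": bool}

    def register(self, overlay, nbma, unique=True):
        entry = self.cache.get(overlay)
        if entry and entry["unique"] and entry["nbma"] != nbma:
            return "error: unique address registered already"
        self.cache[overlay] = {"nbma": nbma, "unique": unique}
        return "ack"

    def expire(self, overlay):
        self.cache.pop(overlay, None)

nhs = Nhs()
print(nhs.register("192.168.100.31", "172.16.31.1"))    # first registration
print(nhs.register("192.168.100.31", "172.16.31.31"))   # new NBMA address, rejected
nhs.expire("192.168.100.31")                            # cache entry times out
print(nhs.register("192.168.100.31", "172.16.31.31", unique=False))
print(nhs.register("192.168.100.31", "172.16.31.99", unique=False))  # accepted
```

The last two calls mirror the effect of ip nhrp registration no-unique: once the spoke has registered without the unique flag, a later NBMA change is accepted.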
Spoke-to-Spoke Communication
In DMVPN Phase 1, the spoke devices rely on the configured tunnel destination to identify where to send the encapsulated packets. Phase 3 DMVPN uses mGRE tunnels and therefore relies on NHRP redirect and resolution request messages to identify the NBMA addresses for any destination networks.
R31 initiates a traceroute to R41. Notice that the first packet travels across R11 (hub), but by the time a second stream of packets is sent, the spoke-to-spoke tunnel has been initialized so that traffic flows directly between R31 and R41 on the transport and overlay networks.
! Initial Packet Flow
R31-Spoke# traceroute 10.4.4.1 source 10.3.3.1
Tracing the route to 10.4.4.1
1 192.168.100.11 5 msec 1 msec 0 msec <- This is the Hub Router (R11-Hub)
2 192.168.100.41 5 msec * 1 msec
! Packetflow after Spoke-to-Spoke Tunnel is Established
R31-Spoke# traceroute 10.4.4.1 source 10.3.3.1
Tracing the route to 10.4.4.1
1 192.168.100.41 1 msec * 0 msec
Forming Spoke-to-Spoke Tunnels
Step 1. R31 performs a route lookup for 10.4.4.1 and finds the entry 10.4.4.0/24 with the next-hop IP address 192.168.100.11 through hub. R31 encapsulates the packet destined for 10.4.4.1 and forwards it to R11 out the tunnel 100 interface.
Step 2. R11 receives the packet from R31 and performs a route lookup for the packet destined for 10.4.4.1. R11 locates the 10.4.4.0/24 network with the next-hop IP address 192.168.100.41. R11 checks the NHRP cache and locates the entry for the 192.168.100.41/32 address. R11 forwards the packet to R41, using the NBMA IP address 172.16.41.1, found in the NHRP cache.
The packet is then forwarded out the same tunnel interface (same network id / DMVPN cloud) and hub detects this as hairpinning.
R11 has ip nhrp redirect configured on the tunnel interface and recognizes that the packet received from R31 hairpinned out of the tunnel interface. R11 sends an NHRP redirect to R31, indicating the packet source 10.3.3.1 and destination 10.4.4.1. The NHRP redirect indicates to R31 that the traffic is using a suboptimal path.
Step 3. R31 receives the NHRP redirect and sends an NHRP resolution request to R11 for the 10.4.4.1 address. Inside the NHRP resolution request, R31 provides its protocol (tunnel IP) address, 192.168.100.31, and source NBMA address, 172.16.31.1. Meanwhile, R41 performs a route lookup for the return traffic to 10.3.3.1 and finds the entry 10.3.3.0/24 with the next-hop IP address 192.168.100.11. R41 encapsulates the packet destined for 10.3.3.1 and forwards it to R11 out the tunnel 100 interface.
Step 4. R11 receives the packet from R41 and performs a route lookup for the packet destined for 10.3.3.1. R11 locates the 10.3.3.0/24 network with the next-hop IP address 192.168.100.31. R11 checks the NHRP cache and locates an entry for 192.168.100.31/32. R11 forwards the packet to R31, using the NBMA IP address 172.16.31.1, found in the NHRP cache. The packet is then forwarded out the same tunnel interface. R11 has ip nhrp redirect configured on the tunnel interface and recognizes that the packet received from R41 hairpinned out the tunnel interface. R11 sends an NHRP redirect to R41, indicating the packet source 10.4.4.1 and destination 10.3.3.1. The NHRP redirect indicates to R41 that the traffic is using a suboptimal path. R11 forwards R31’s NHRP resolution requests for the 10.4.4.1 address.
Step 5. R41 sends an NHRP resolution request to R11 for the 10.3.3.1 address and provides its protocol (tunnel IP) address, 192.168.100.41, and source NBMA address, 172.16.41.1. R41 sends an NHRP resolution reply directly to R31, using the source information from R31’s NHRP resolution request. The NHRP resolution reply contains the original source information in R31’s NHRP resolution request as a method of verification and contains the client protocol address of 192.168.100.41 and the client NBMA address 172.16.41.1. (If IPsec protection is configured, the IPsec tunnel is set up before the NHRP reply is sent.)
Note
The NHRP reply is for the entire subnet rather than the specified host address.
Step 6. R11 forwards R41’s NHRP resolution requests for the 192.168.100.31 and 10.3.3.1 entries.
Step 7. R31 sends an NHRP resolution reply directly to R41, using the source information from R41’s NHRP resolution request. The NHRP resolution reply contains the original source information in R41’s NHRP resolution request as a method of verification and contains the client protocol address 192.168.100.31 and the client NBMA address 172.16.31.1. (Again, if IPsec protection is configured, the tunnel is set up before the NHRP reply is sent back in the other direction.)
A spoke-to-spoke DMVPN tunnel is established in both directions after step 7 is complete. This allows traffic to flow across the spoke-to-spoke tunnel instead of traversing the hub router.
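The end state of steps 1 through 7 can be sketched as a toy model of the spoke NHRP caches. It does not model the packets themselves, only what each router knows afterward. The function names and data layout are assumptions, and the entry types are simplified labels.

```python
def new_caches(hub, spokes):
    """Start state: each spoke only has the static hub (NHS) mapping."""
    return {
        s["name"]: {hub["overlay"] + "/32": ("static", hub["overlay"], hub["nbma"])}
        for s in spokes
    }

def resolve(caches, requester, responder):
    """Apply one resolution request/reply pair (relayed by the hub)."""
    mapping = (responder["overlay"], responder["nbma"])
    cache = caches[requester["name"]]
    # The reply covers the responder's whole subnet, not just one host.
    cache[responder["lan"]] = ("dynamic",) + mapping
    # The requester also learns how to reach the responder's tunnel IP.
    cache[responder["overlay"] + "/32"] = ("dynamic",) + mapping
    # The responder keeps a local entry for the subnet it answered for.
    caches[responder["name"]][responder["lan"]] = ("local",) + mapping

r11 = {"name": "R11", "overlay": "192.168.100.11", "nbma": "172.16.11.1"}
r31 = {"name": "R31", "overlay": "192.168.100.31", "nbma": "172.16.31.1", "lan": "10.3.3.0/24"}
r41 = {"name": "R41", "overlay": "192.168.100.41", "nbma": "172.16.41.1", "lan": "10.4.4.0/24"}

caches = new_caches(r11, [r31, r41])
resolve(caches, r31, r41)   # steps 3 to 5: R31 asks, R41 replies directly
resolve(caches, r41, r31)   # steps 5 to 7: R41 asks, R31 replies directly

for name, cache in caches.items():
    for target in sorted(cache):
        print(name, target, cache[target])
```

Each spoke ends up with four entries: its own subnet, the remote subnet, the remote tunnel address, and the static hub entry.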
The following output shows the status of DMVPN tunnels on R31 and R41, where there are two new spoke-to-spoke tunnels. The DLX entries represent the local (no-socket) routes. The original tunnel to R11 remains a static tunnel.
R31-Spoke# show dmvpn detail
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
T1 - Route Installed, T2 - Nexthop-override
C - CTS Capable
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
============================================================================
Interface Tunnel100 is up/up, Addr. is 192.168.100.31, VRF ""
Tunnel Src./Dest. addr: 172.16.31.1/MGRE, Tunnel VRF ""
Protocol/Transport: "multi-GRE/IP", Protect ""
Interface State Control: Disabled
nhrp event-publisher : Disabled
IPv4 NHS:
192.168.100.11 RE NBMA Address: 172.16.11.1 priority = 0 cluster = 0
Type:Spoke, Total NBMA Peers (v4/v6): 3
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb Target Network
----- --------------- --------------- ----- -------- ----- -----------------
1 172.16.31.1 192.168.100.31 UP 00:00:10 DLX 10.3.3.0/24
2 172.16.41.1 192.168.100.41 UP 00:00:10 DT2 10.4.4.0/24
172.16.41.1 192.168.100.41 UP 00:00:10 DT1 192.168.100.41/32
1 172.16.11.1 192.168.100.11 UP 00:00:51 S 192.168.100.11/32
R41-Spoke# show dmvpn detail
! Output omitted for brevity
IPv4 NHS:
192.168.100.11 RE NBMA Address: 172.16.11.1 priority = 0 cluster = 0
Type:Spoke, Total NBMA Peers (v4/v6): 3
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb Target Network
----- --------------- --------------- ----- -------- ----- -----------------
2 172.16.31.1 192.168.100.31 UP 00:00:34 DT2 10.3.3.0/24
172.16.31.1 192.168.100.31 UP 00:00:34 DT1 192.168.100.31/32
1 172.16.41.1 192.168.100.41 UP 00:00:34 DLX 10.4.4.0/24
1 172.16.11.1 192.168.100.11 UP 00:01:15 S 192.168.100.11/32
Use show ip nhrp detail to view the NHRP cache for R31 and R41. Notice the NHRP mapping flags router, rib, nho, and nhop. The flag combination rib nho indicates that the router has found an identical route in the routing table that belongs to a different protocol. NHRP has overridden the other protocol’s next-hop entry for the network by installing a next-hop shortcut in the routing table. The flag combination rib nhop indicates that the router has an explicit method to reach the tunnel IP address using an NBMA address and has an associated route installed in the routing table.
NHRP Mapping with Spoke-to-Hub Traffic
The following output uses the optional detail keyword for viewing the NHRP cache information. The 10.3.3.0/24 entry on R31 and the 10.4.4.0/24 entry on R41 display a list of devices to which the router responded to resolution request packets and the request ID that they received.
R31-Spoke# show ip nhrp detail
10.3.3.0/24 via 192.168.100.31
Tunnel100 created 00:01:44, expire 01:58:15
Type: dynamic, Flags: router unique local
NBMA address: 172.16.31.1
Preference: 255
(no-socket)
Requester: 192.168.100.41 Request ID: 3
10.4.4.0/24 via 192.168.100.41
Tunnel100 created 00:01:44, expire 01:58:15
Type: dynamic, Flags: router rib nho
NBMA address: 172.16.41.1
Preference: 255
192.168.100.11/32 via 192.168.100.11
Tunnel100 created 10:43:18, never expire
Type: static, Flags: used
NBMA address: 172.16.11.1
Preference: 255
192.168.100.41/32 via 192.168.100.41
Tunnel100 created 00:01:45, expire 01:58:15
Type: dynamic, Flags: router used nhop rib
NBMA address: 172.16.41.1
Preference: 255
R41-Spoke# show ip nhrp detail
10.3.3.0/24 via 192.168.100.31
Tunnel100 created 00:02:04, expire 01:57:55
Type: dynamic, Flags: router rib nho
NBMA address: 172.16.31.1
Preference: 255
10.4.4.0/24 via 192.168.100.41
Tunnel100 created 00:02:04, expire 01:57:55
Type: dynamic, Flags: router unique local
NBMA address: 172.16.41.1
Preference: 255
(no-socket)
Requester: 192.168.100.31 Request ID: 3
192.168.100.11/32 via 192.168.100.11
Tunnel100 created 10:43:42, never expire
Type: static, Flags: used
NBMA address: 172.16.11.1
Preference: 255
192.168.100.31/32 via 192.168.100.31
Tunnel100 created 00:02:04, expire 01:57:55
Type: dynamic, Flags: router used nhop rib
NBMA address: 172.16.31.1
Preference: 255
DMVPN 2
DMVPN (Dynamic Multipoint Virtual Private Network) is a hub-and-spoke technology for site-to-site connectivity. Its great advantages are scalability and direct spoke-to-spoke communication.
In DMVPN, we configure the tunnel interfaces as multipoint interfaces so that we can talk to multiple routers using the same tunnel interface, which reduces the configuration and scales better than point-to-point tunnels.
Note that there is transport (underlay) IP addressing.
On top of the WAN (transport) there is an overlay network: a multipoint GRE tunnel that acts like a broadcast (multi-access) network. We can tell this by looking at the Tunnel 1 addressing.
The default tunnel-type on Cisco routers is a GRE point-to-point. GRE is about as simple as a protocol gets.
EIGRP is an (advanced) distance vector routing protocol. It was initially a Cisco proprietary protocol, but it was later released to the Internet Engineering Task Force (IETF) and published as informational RFC 7868.
EIGRP uses the diffusing update algorithm (DUAL) to learn loop-free paths. DUAL also keeps loop-free backup paths for fast convergence.
A lot of older protocols used hop count for path selection but that does not take into account link speed and total delay, EIGRP adds logic to the route-selection algorithm to use factors other than hop count alone
EIGRP uses ASN per process (ASN/Process)
Routers within the same domain must use the same metric calculation formula and exchange routes only with members of the same autonomous system (AS). If routes need to be exchanged between two different EIGRP ASNs / processes, the router in the middle needs to redistribute between the two ASNs / processes.
For example, R3, which is attached to two different ASNs through two different processes, does not transfer routes learned from one autonomous system into the other unless redistribution is configured.
Current implementations of EIGRP support only IPv4 and IPv6.
EIGRP Terminology
Successor route
The route with the lowest path metric to reach a destination. The successor route for R1 to reach 10.4.4.0/24 on R4 is R1→R3→R4.
Successor
The first next-hop router for the successor route. R1’s successor for 10.4.4.0/24 is R3.
Feasible distance (FD)
The metric value for the lowest-metric path to reach a destination. The feasible distance is calculated locally using the EIGRP path metric formula.
The FD calculated by R1 for the 10.4.4.0/24 destination network is 3328 (that is, 256 + 256 + 2816).
Reported distance (RD)
Distance reported by a router to reach a destination. The reported distance value is the feasible distance of the advertising router.
R3 advertises the 10.4.4.0/24 destination network to R1 and R2 with an RD of 3072 (2816 + 256). R4 advertises the 10.4.4.0/24 destination network to R1, R2, and R3 with an RD of 2816.
Feasibility condition
For a route to be considered a backup route, the RD received for that route must be less than the FD calculated locally. This logic guarantees a loop-free path.
Feasible successor
Installed in the topology table only. Acts as a loop-free backup path.
A route that satisfies the feasibility condition is maintained as a backup route. The feasibility condition ensures that the backup route is loop free.
The route R1→R4 is the feasible successor because the RD of 2816 is lower than the FD of 3328 for the R1→R3→R4 path.
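The successor and feasible successor selection described above can be sketched in a few lines. This is a minimal illustration, not how IOS implements DUAL; the function name and the dictionary layout are made up for this example, and the numbers are the R1 values from these notes.

```python
def classify_paths(paths):
    """Pick the successor and feasible successors for one prefix.

    paths maps a next-hop name to a dict with:
      "metric": locally calculated metric through that next hop
      "rd":     reported distance advertised by that next hop
    """
    successor = min(paths, key=lambda nh: paths[nh]["metric"])
    fd = paths[successor]["metric"]  # feasible distance = lowest metric
    feasible = sorted(
        nh for nh, p in paths.items()
        if nh != successor and p["rd"] < fd  # feasibility condition: RD < FD
    )
    return successor, fd, feasible


# R1's two paths to 10.4.4.0/24 (values from the notes).
r1_paths = {
    "R3": {"metric": 3328, "rd": 3072},  # R1->R3->R4
    "R4": {"metric": 5376, "rd": 2816},  # R1->R4 direct
}
print(classify_paths(r1_paths))  # ('R3', 3328, ['R4'])
```

R4 qualifies as a feasible successor because its RD (2816) is below the FD (3328), even though the full metric through R4 (5376) is higher.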
Topology Table
EIGRP maintains a topology table.
The topology table contains all the network prefixes advertised within an EIGRP autonomous system, including backup paths. For each prefix it holds not just the metric but also the hop count.
Values used to calculate the metric: BDRLM (Bandwidth, Delay, Reliability, Load, MTU)
show ip eigrp topology ! shows successor and feasible successor
!
show ip eigrp topology [all-links]
! shows successors and feasible successors; the all-links keyword also shows the paths that did not pass the feasibility condition
Prefix 10.4.4.0/24 has a cost (FD) of 3328 for the best path, which is the successor route. The successor route's next-hop router is called the successor.
The second path, the feasible successor, has an RD of 2816, which is lower than the FD of the successor route, so it passes the feasibility condition and is installed in the topology table.
The 10.4.4.0/24 route is passive (P), which means the topology is stable. During a topology change, routes go into an active (A) state when computing a new path.
EIGRP Neighbors
EIGRP neighbors exchange the entire routing table when forming an adjacency. After that, they advertise only incremental updates as topology changes occur within the network; there are no periodic updates.
Inter-Router Communication
EIGRP uses IP protocol number 88. It uses multicast packets where possible to reduce bandwidth consumed on the links, and it uses unicast packets when necessary.
EIGRP uses Reliable Transport Protocol (RTP), instead of TCP, to ensure that packets are delivered.
A sequence number is included in each EIGRP packet. The sequence value zero does not require a response from the receiving EIGRP router; all other values require an ACK packet that includes the original sequence number.
All update, query, and reply packets are deemed reliable; hello and ACK packets do not require acknowledgment.
If the originating router does not receive an ACK packet from the neighbor before the retransmit timeout expires, it notifies the non-acknowledging router to stop processing its multicast packets.
When possible, communication between routers is done with multicast, using the group address 224.0.0.10 (MAC address 01:00:5e:00:00:0a).
Opcode Value | Packet Type | Function
1 | Update | Used to transmit routing and reachability information with other EIGRP neighbors
2 | Request | Used to get specific information from one or more neighbors
3 | Query | Sent out to search for another path during convergence
4 | Reply | Sent in response to a query packet
5 | Hello | Used for discovery of EIGRP neighbors and for detecting when a neighbor is no longer available
Forming EIGRP Neighbors
Hello messages are exchanged to become neighbors
The following parameters must match for the two routers to become neighbors:
Metric formula K values
Primary subnet matches
Autonomous system number (ASN) matches
Authentication parameters
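The matching rules above can be expressed as a small check. This is a sketch under stated assumptions: the `HelloView` class and `adjacency_blockers` function are hypothetical names for illustration, and real routers compare these values from hello packets rather than from a Python object.

```python
from dataclasses import dataclass
from ipaddress import ip_interface
from typing import Optional, Tuple


@dataclass(frozen=True)
class HelloView:
    """What one router presents on a link (illustrative model only)."""
    address: str                                  # primary address, CIDR form
    asn: int
    k_values: Tuple[int, ...] = (1, 0, 1, 0, 0)   # K1..K5 defaults
    auth_key: Optional[str] = None                # None = authentication not set


def adjacency_blockers(a, b):
    """Return the list of parameters that prevent a and b from becoming neighbors."""
    blockers = []
    if a.k_values != b.k_values:
        blockers.append("K values")
    if ip_interface(a.address).network != ip_interface(b.address).network:
        blockers.append("primary subnet")
    if a.asn != b.asn:
        blockers.append("ASN")
    if a.auth_key != b.auth_key:
        blockers.append("authentication")
    return blockers


r1 = HelloView("10.12.1.1/24", 100)
r2 = HelloView("10.12.1.2/24", 100)
print(adjacency_blockers(r1, r2))  # [] -> nothing blocks the adjacency
```

An empty list means all four parameters agree; any entry in the list names a mismatch to fix first.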
EIGRP Configuration Modes
EIGRP has two configuration modes: classic mode and named mode.
EIGRP Named Mode
EIGRP named mode provides a hierarchical configuration and stores settings in three subsections:
Address Family: This submode contains settings that are relevant to the global EIGRP AS operations, such as selection of network interfaces, EIGRP K values, logging settings, and stub settings.
Interface: This submode contains settings that are relevant to the interface, such as hello advertisement interval, split-horizon, authentication, and summary route advertisements. In actuality, there are two methods of the EIGRP interface section’s configuration. Commands can be assigned to a specific interface or to a default interface, in which case those settings are placed on all EIGRP-enabled interfaces. If there is a conflict between the default interface and a specific interface, the specific interface takes priority over the default interface.
Topology: This submode contains settings regarding the EIGRP topology database and how routes are presented to the router’s RIB. This section also contains route redistribution and administrative distance settings.
EIGRP named configuration makes it possible to run multiple instances under the same EIGRP process
Step 1. Initialize the EIGRP process by using the command router eigrp process-name. (If a number is used for process-name, the number does not correlate to the autonomous system number.)
Step 2. Initialize the EIGRP instance for the appropriate address family with the command address-family {IPv4 | IPv6} {unicast | vrf vrf-name} autonomous-system as-number.
Step 3. Enable EIGRP on interfaces by using the command network network wildcard-mask.
EIGRP Network Statement
Network statement enrolls interfaces in EIGRP and sends hellos on those interfaces
If the wildcard mask is omitted, any interfaces that fall under the classful boundary are added to EIGRP. Secondary networks are not added; if we want secondary networks in EIGRP, they need to be redistributed.
router eigrp 1
network 10.0.0.10 0.0.0.0
network 10.0.0.0 0.0.0.255
network 10.0.0.0 0.255.255.255
network 0.0.0.0 255.255.255.255 ! enable on all interfaces
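How a network statement with a wildcard mask selects interfaces can be checked with a short sketch. The function name is hypothetical and the interface addresses in the usage lines are invented for illustration; the logic is plain wildcard matching, where a 1 bit in the wildcard means "do not care".

```python
from ipaddress import IPv4Address


def matches_network_statement(interface_ip, network, wildcard):
    """True if interface_ip is covered by 'network <network> <wildcard>'."""
    ip = int(IPv4Address(interface_ip))
    net = int(IPv4Address(network))
    care = ~int(IPv4Address(wildcard)) & 0xFFFFFFFF  # bits that must match
    return (ip & care) == (net & care)


# The four statements from the configuration above, tested against sample addresses.
print(matches_network_statement("10.0.0.10", "10.0.0.10", "0.0.0.0"))          # True
print(matches_network_statement("10.0.5.1", "10.0.0.0", "0.0.0.255"))          # False
print(matches_network_statement("10.0.5.1", "10.0.0.0", "0.255.255.255"))      # True
print(matches_network_statement("192.168.1.1", "0.0.0.0", "255.255.255.255"))  # True
```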
show ip eigrp interfaces [{interface-id [detail] | detail}]
R1# show ip eigrp interfaces
EIGRP-IPv4 Interfaces for AS(100)
Xmit Queue PeerQ Mean Pacing Time Multicast Pending
Interface Peers Un/Reliable Un/Reliable SRTT Un/Reliable Flow Timer Routes
Gi0/2 0 0/0 0/0 0 0/0 0 0
Gi0/1 1 0/0 0/0 10 0/0 50 0
Lo0 0 0/0 0/0 0 0/0 0 0
R2# show ip eigrp interfaces gi0/1 detail
EIGRP-IPv4 VR(EIGRP-NAMED) Address-Family Interfaces for AS(100)
Xmit Queue PeerQ Mean Pacing Time Multicast Pending
Interface Peers Un/Reliable Un/Reliable SRTT Un/Reliable Flow Timer Routes
Gi0/1 1 0/0 0/0 1583 0/0 7912 0
Hello-interval is 5, Hold-time is 15
Split-horizon is enabled
Next xmit serial <none>
Packetized sent/expedited: 2/0
Hello's sent/expedited: 186/2
Un/reliable mcasts: 0/2 Un/reliable ucasts: 2/2
Mcast exceptions: 0 CR packets: 0 ACKs suppressed: 0
Retransmissions sent: 1 Out-of-sequence rcvd: 0
Topology-ids on interface - 0
Authentication mode is not set
Topologies advertised on this interface: base
Topologies not advertised on this interface:
Field explanation
Xmit Queue – Un/Reliable
Number of unreliable/reliable packets remaining in the transmit queue. The value zero is an indication of a stable network.
Mean SRTT
Average time, in milliseconds, for a packet to be sent to a neighbor and a reply to be received from it.
Pending Routes
Number of routes in the transmit queue that need to be sent.
R1# show ip eigrp neighbors
EIGRP-IPv4 Neighbors for AS(100)
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
0 10.12.1.2 Gi0/1 13 00:18:31 10 100 0 3
Field explanation
RTO
Timeout for retransmission (waiting for ACK)
Q Cnt
Number of packets (update/query/reply) in queue for sending
Seq Num
Sequence number that was last “received” from this router
show ip route eigrp
R1# show ip route eigrp
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
a - application route
+ - replicated route, % - next hop override, p - overrides from PfR
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
D 10.22.22.0/24 [90/3072] via 10.12.1.2, 00:19:25, GigabitEthernet0/1
192.168.2.0/32 is subnetted, 1 subnets
D 192.168.2.2 [90/2848] via 10.12.1.2, 00:19:25, GigabitEthernet0/1
R2# show ip route eigrp
! Output omitted for brevity
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
D 10.11.11.0/24 [90/15360] via 10.12.1.1, 00:20:34, GigabitEthernet0/1
192.168.1.0/32 is subnetted, 1 subnets
D 192.168.1.1 [90/2570240] via 10.12.1.1, 00:20:34, GigabitEthernet0/1
EIGRP routes have an administrative distance (AD) of 90 and are indicated in the routing table with a D. External EIGRP routes have an AD of 170 and are indicated in the routing table with D EX.
The metrics for R2’s routes are different from the metrics for R1’s routes. This is because R1’s classic EIGRP mode uses classic metrics, while R2’s named mode uses wide metrics by default.
Router ID
The router ID (RID) is a 32-bit number that uniquely identifies an EIGRP router and is used as a loop-prevention mechanism. The RID can be set dynamically, which is the default, or manually.
The algorithm for dynamically choosing the EIGRP RID uses the highest IPv4 address of any up loopback interfaces. If there are not any up loopback interfaces, the highest IPv4 address of any active up physical interfaces becomes the RID when the EIGRP process initializes.
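The dynamic selection order can be sketched as follows. The function and the tuple layout are hypothetical, and treating any interface whose name starts with "lo" as a loopback is an assumption of this example; the addresses are R1's from the outputs in these notes.

```python
from ipaddress import IPv4Address


def select_router_id(interfaces, configured_rid=None):
    """interfaces: iterable of (name, ipv4_address, is_up) tuples."""
    if configured_rid is not None:
        return configured_rid  # a manually set RID wins
    up = [(name, IPv4Address(ip)) for name, ip, is_up in interfaces if is_up]
    loopbacks = [ip for name, ip in up if name.lower().startswith("lo")]
    # Highest up loopback address first, else highest up physical address.
    candidates = loopbacks or [ip for _, ip in up]
    return str(max(candidates)) if candidates else None


r1_interfaces = [
    ("Lo0", "192.168.1.1", True),
    ("Gi0/1", "10.12.1.1", True),
    ("Gi0/2", "10.11.11.1", True),
]
print(select_router_id(r1_interfaces))  # 192.168.1.1
```

Comparing `IPv4Address` objects rather than strings matters here, because string comparison would rank "9.0.0.1" above "10.0.0.1".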
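Some network topologies must advertise a network segment into EIGRP but need to prevent neighbors from forming adjacencies on that segment, for example, when advertising access layer networks in a campus topology. Making the interface passive achieves this because EIGRP stops sending hellos on the interface and stops processing received hellos.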
R1# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)# router eigrp 100
R1(config-router)# passive-interface gi0/2
R1(config)# router eigrp 100
R1(config-router)# passive-interface default
04:22:52.031: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.12.1.2
(GigabitEthernet0/1) is down: interface passive
R1(config-router)# no passive-interface gi0/1
*May 10 04:22:56.179: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.12.1.2
(GigabitEthernet0/1) is up: new adjacency
For a named mode configuration, you place the passive-interface state on af-interface default for all EIGRP interfaces or on a specific interface with af-interface interface-id.
R2# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)# router eigrp EIGRP-NAMED
R2(config-router)# address-family ipv4 unicast autonomous-system 100
R2(config-router-af)# af-interface gi0/2
R2(config-router-af-interface)# passive-interface
R2# show run | section router eigrp
router eigrp EIGRP-NAMED
!
address-family ipv4 unicast autonomous-system 100
!
af-interface default
passive-interface
exit-af-interface
!
af-interface GigabitEthernet0/1
no passive-interface
exit-af-interface
!
topology base
exit-af-topology
network 0.0.0.0
exit-address-family
A passive interface does not appear in the output of the command show ip eigrp interfaces even though it was enabled, but it appears as passive under the show ip protocols command. Connected networks for passive interfaces are still added to the EIGRP topology table so that they are advertised to neighbors.
The show ip protocols command also shows the K values set for EIGRP, the RID, and information such as interfaces enabled for EIGRP, passive interfaces, and neighbors.
R1# show ip protocols
! Output omitted for brevity
Routing Protocol is "eigrp 100"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
Default networks flagged in outgoing updates
Default networks accepted from incoming updates
EIGRP-IPv4 Protocol for AS(100)
Metric weight K1=1, K2=0, K3=1, K4=0, K5=0
Soft SIA disabled
NSF-aware route hold timer is 240
Router-ID: 192.168.1.1
Topology : 0 (base)
Active Timer: 3 min
Distance: internal 90 external 170
Maximum path: 4
Maximum hopcount 100
Maximum metric variance 1
Automatic Summarization: disabled
Maximum path: 4
Routing for Networks:
10.11.11.1/32
10.12.1.1/32
192.168.1.1/32
Passive Interface(s):
GigabitEthernet0/2
Loopback0
Routing Information Sources:
Gateway Distance Last Update
10.12.1.2 90 00:21:35
Distance: internal 90 external 170
Authentication
A hash is the output of a one-way function and cannot be reversed or decrypted. The password on an EIGRP router is hashed and sent with each EIGRP packet. When the packet is received, the neighbor hashes its own password and compares the result with the received hash. If both hashes match, the packet is accepted; if they do not match, the EIGRP packet is discarded.
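The hash-and-compare idea can be illustrated with a short sketch. This is a simplified stand-in, assuming an MD5 digest computed over the shared key followed by the packet bytes; it does not reproduce the actual on-wire EIGRP digest layout, and the function names and sample key strings are hypothetical.

```python
import hashlib
import hmac


def packet_digest(key_string, packet_bytes):
    """Hash the shared key together with the packet (simplified model)."""
    return hashlib.md5(key_string.encode() + packet_bytes).digest()


def accept_packet(local_key, packet_bytes, received_digest):
    """Receiver recomputes the digest with its own key and compares."""
    expected = packet_digest(local_key, packet_bytes)
    return hmac.compare_digest(expected, received_digest)


hello = b"eigrp-hello"
sent = packet_digest("CISCO", hello)          # sender's key string
print(accept_packet("CISCO", hello, sent))    # True: keys match, packet accepted
print(accept_packet("WRONG", hello, sent))    # False: packet discarded
```

The password itself never has to be recovered from the digest; both sides only need to arrive at the same value.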
Keychain Configuration
Keychain creation is accomplished with the following steps:
Step 1. Create the keychain by using the command key chain key-chain-name.
Step 2. Identify the key sequence by using the command key key-number, where key-number can be anything from 0 to 2147483647.
Step 3. Specify the preshared password by using the command key-string password.
In classic configuration, authentication must be enabled on the interface.
R1# show ip eigrp interface detail
EIGRP-IPv4 Interfaces for AS(100)
Xmit Queue PeerQ Mean Pacing Time Multicast Pending
Interface Peers Un/Reliable Un/Reliable SRTT Un/Reliable Flow Timer Routes
Gi0/1 0 0/0 0/0 0 0/0 50 0
Hello-interval is 5, Hold-time is 15
Split-horizon is enabled
Next xmit serial <none>
Packetized sent/expedited: 10/1
Hello's sent/expedited: 673/12
Un/reliable mcasts: 0/9 Un/reliable ucasts: 6/19
Mcast exceptions: 0 CR packets: 0 ACKs suppressed: 0
Retransmissions sent: 16 Out-of-sequence rcvd: 1
Topology-ids on interface - 0
Authentication mode is md5, key-chain is "EIGRPKEY"
Path Metric Calculation
Metric calculation uses bandwidth and delay by default but can include interface load and reliability, too
A common misconception is that the K values directly apply to bandwidth, load, delay, or reliability; this is not accurate. For example, K1 and K2 both reference bandwidth (BW).
BW represents the slowest link in the path in Kbps
Delay is the total measure of delay in the path, measured in tens of microseconds (μs).
By default, K1 and K3 each have a value of 1, and K2, K4, and K5 are all set to 0.
The EIGRP update packet includes path attributes associated with each prefix. The EIGRP path attributes can include hop count, cumulative delay, minimum bandwidth link speed, and RD. The attributes are updated at each hop along the way.
Notice that the hop count increments, minimum bandwidth decreases, total delay increases, and the RD changes with each EIGRP update.
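The notes refer to the path metric formula without showing it. The sketch below assumes the commonly published classic formula, metric = 256 * ((K1 * BW + K2 * BW / (256 - load) + K3 * delay) * K5 / (K4 + reliability)), where BW is 10^7 divided by the minimum bandwidth in Kbps, delay is the total delay in tens of microseconds, and the K5 term is skipped when K5 is 0. The function name and the use of integer division are choices made for this example, so results for links where 10^7 / bandwidth is not a whole number may differ slightly from published tables.

```python
def classic_metric(min_bw_kbps, total_delay_usec, load=1, reliability=255,
                   k=(1, 0, 1, 0, 0)):
    """Classic EIGRP composite metric (sketch, integer arithmetic)."""
    k1, k2, k3, k4, k5 = k
    scaled_bw = 10**7 // min_bw_kbps        # slowest link in the path
    scaled_delay = total_delay_usec // 10   # tens of microseconds
    metric = k1 * scaled_bw + (k2 * scaled_bw) // (256 - load) + k3 * scaled_delay
    if k5 != 0:                             # reliability term only when K5 is set
        metric = metric * k5 // (k4 + reliability)
    return 256 * metric


# R1's two paths to 10.4.4.0/24: 1 Gbps minimum bandwidth, 30 and 110 usec delay.
print(classic_metric(1_000_000, 30))    # 3328
print(classic_metric(1_000_000, 110))   # 5376
```

With default K values this reduces to 256 * (10^7 / bandwidth + delay / 10), which is why a single GigabitEthernet link (10 μs delay) yields 2816.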
Default EIGRP Interface Metrics for Classic Metrics
Interface Type | Link Speed (Kbps) | Delay | Metric
Serial | 64 | 20,000 μs | 40,512,000
T1 | 1544 | 20,000 μs | 2,170,031
Ethernet | 10,000 | 1000 μs | 281,600
FastEthernet | 100,000 | 100 μs | 28,160
GigabitEthernet | 1,000,000 | 10 μs | 2816
TenGigabitEthernet | 10,000,000 | 10 μs | 512
R1# show ip eigrp topology 10.4.4.0/24
! Output omitted for brevity
EIGRP-IPv4 Topology Entry for AS(100)/ID(10.14.1.1) for 10.4.4.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 3328
Descriptor Blocks:
10.13.1.3 (GigabitEthernet0/1), from 10.13.1.3, Send flag is 0x0
Composite metric is (3328/3072), route is Internal
Vector metric:
Minimum bandwidth is 1000000 Kbit
Total delay is 30 microseconds
Reliability is 252/255
Load is 1/255
Minimum MTU is 1500
Hop count is 2
Originating router is 10.34.1.4
10.14.1.4 (GigabitEthernet0/2), from 10.14.1.4, Send flag is 0x0
Composite metric is (5376/2816), route is Internal
Vector metric:
Minimum bandwidth is 1000000 Kbit
Total delay is 110 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 1
Originating router is 10.34.1.4
Wide Metrics
With classic metrics, there is no differentiation between an 11 Gbps interface and a 20 Gbps interface.
EIGRP includes support for a second set of metrics, known as wide metrics, that addresses the issue of scalability with higher-capacity interfaces.
The interface delay varies from router to router, depending on the following logic:
If the interface’s delay was specifically set, the value is converted to picoseconds. Interface delay is always configured in tens of microseconds and is multiplied by 10^7 for picosecond conversion.
If the interface’s bandwidth was specifically set, the interface delay is configured using the classic default delay, converted to picoseconds. The configured bandwidth is not considered when determining the interface delay. If delay was configured, this step is ignored.
If the interface supports speeds of 1 Gbps or less and does not contain bandwidth or delay configuration, the delay is the classic default delay, converted to picoseconds.
If the interface supports speeds over 1 Gbps and does not contain bandwidth or delay configuration, the interface delay is calculated by 10^13 / interface bandwidth.
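The four rules above can be written as one function. This is a sketch under assumptions: the function and parameter names are hypothetical, the caller supplies the classic default delay in microseconds, and the interface bandwidth is taken in Kbps.

```python
def wide_interface_delay_ps(speed_kbps, classic_default_delay_usec,
                            configured_delay_tens_usec=None,
                            bandwidth_configured=False):
    """Interface delay in picoseconds for wide metrics (sketch of the rules)."""
    if configured_delay_tens_usec is not None:
        # Rule 1: configured delay is in tens of microseconds; x 10^7 gives ps.
        return configured_delay_tens_usec * 10**7
    if bandwidth_configured or speed_kbps <= 1_000_000:
        # Rules 2 and 3: classic default delay (usec) converted to ps.
        return classic_default_delay_usec * 10**6
    # Rule 4: faster than 1 Gbps with no bandwidth or delay configured.
    return 10**13 // speed_kbps


print(wide_interface_delay_ps(1_000_000, 10))     # 10000000 (GigabitEthernet)
print(wide_interface_delay_ps(10_000_000, 10))    # 1000000 (TenGigabitEthernet)
print(wide_interface_delay_ps(1_000_000, 10, configured_delay_tens_usec=100))  # 1000000000
```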
R1# show ip protocols | include AS|K
EIGRP-IPv4 Protocol for AS(100)
Metric weight K1=1, K2=0, K3=1, K4=0, K5=0
R2# show ip protocols | include AS|K
EIGRP-IPv4 VR(EIGRP-NAMED) Address-Family Protocol for AS(100)
Metric weight K1=1, K2=0, K3=1, K4=0, K5=0 K6=0 <<<
The presence of K6 indicates that named mode EIGRP (wide metrics) is in use.
Metric Backward Compatibility
EIGRP wide metrics were designed with backward compatibility in mind. EIGRP wide metrics set K1 and K3 to a value of 1 and set K2, K4, K5, and K6 to 0, which allows backward compatibility because the K value metrics match with classic metrics. As long as K1 through K5 are the same and K6 is not set, the two metric styles allow adjacency between routers.
Using a mixture of classic metric and wide metric devices could lead to suboptimal routing, so it is best to keep all devices operating with the same metric style.
Why set delay and not bandwidth
Bandwidth modification with the interface parameter command bandwidth bandwidth has a similar effect on the metric calculation formula but can impact other routing protocols, such as OSPF, at the same time. Modifying the interface delay only impacts EIGRP.
R1# show interfaces gigabitEthernet 0/1 | i DLY
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
R2# show interfaces gigabitEthernet 0/1 | i DLY
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
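The interface delay can be checked with show interface interface-id. The output displays the EIGRP interface delay in microseconds. The delay command itself takes a value in tens of microseconds, so delay 100 results in DLY 1000 usec.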
R1# configure terminal
R1(config)# interface gi0/1
R1(config-if)# delay 100
R1(config-if)# do show interface Gigabit0/1 | i DLY
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 1000 usec,
Custom K Values
K values for the path metric formula are set with the command metric weights TOS K1 K2 K3 K4 K5 [K6] under the EIGRP process. TOS always has a value of 0, and K6 is used for named mode configurations.
To ensure consistent routing logic in an EIGRP autonomous system, the K values must match between EIGRP neighbors to form an adjacency and exchange routes. The K values are included as part of the EIGRP hello packet.
Load Balancing
EIGRP allows multiple successor routes (with the same metric) to be installed into the RIB; this is called equal-cost multipathing (ECMP). The default maximum ECMP setting is four routes.
R1# show run | section router eigrp
router eigrp 100
maximum-paths 6
network 0.0.0.0
R2# show run | section router eigrp
router eigrp EIGRP-NAMED
!
address-family ipv4 unicast autonomous-system 100
!
topology base
maximum-paths 6
exit-af-topology
network 0.0.0.0
eigrp router-id 192.168.2.2
exit-address-family
Unequal Cost Load Balancing
EIGRP supports unequal-cost load balancing, which allows installation of both successor routes and feasible successors into the EIGRP RIB. To use unequal-cost load balancing, change EIGRP’s variance multiplier.
The variance value is the feasible distance (FD) for a route multiplied by the EIGRP variance multiplier. Any feasible successor with a metric below the EIGRP variance value is installed into the RIB, up to the maximum number of ECMP routes.
There is a way to find the exact variance multiplier to use:
Dividing the feasible successor metric by the successor route metric provides the variance multiplier.
The variance multiplier is a whole number, so any remainder should always be rounded up.
For example, the minimum EIGRP variance multiplier can be calculated so that the direct path from R1 to R4 can be installed into the RIB. The FD for the successor route is 3328, and the metric for the feasible successor is 5376. The formula gives 5376 / 3328, or about 1.6, which is always rounded up to the nearest whole number to provide an EIGRP variance multiplier of 2.
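The rounding rule can be checked with a couple of lines. The function names are hypothetical; the numbers are the R1 values from the notes, and the "below the variance value" test follows the wording used above.

```python
def minimum_variance(successor_fd, feasible_successor_metric):
    """Smallest whole-number multiplier: divide and round any remainder up."""
    return -(-feasible_successor_metric // successor_fd)  # integer ceiling


def installs_with_variance(successor_fd, path_metric, variance):
    """True if the path metric is below FD x variance."""
    return path_metric < successor_fd * variance


print(minimum_variance(3328, 5376))            # 2
print(installs_with_variance(3328, 5376, 1))   # False: default variance of 1
print(installs_with_variance(3328, 5376, 2))   # True: 5376 < 6656
```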
R1# show ip route eigrp | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 10 subnets, 2 masks
D 10.4.4.0/24 [90/5376] via 10.14.1.4, 00:00:03, GigabitEthernet0/2
[90/3328] via 10.13.1.3, 00:00:03, GigabitEthernet0/1
R1# show ip route 10.4.4.0
Routing entry for 10.4.4.0/24
Known via "eigrp 100", distance 90, metric 3328, type internal
Redistributing via eigrp 100
Last update from 10.13.1.3 on GigabitEthernet0/1, 00:00:35 ago
Routing Descriptor Blocks:
* 10.14.1.4, from 10.14.1.4, 00:00:35 ago, via GigabitEthernet0/2
Route metric is 5376, traffic share count is 149
Total delay is 110 microseconds, minimum bandwidth is 1000000 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 1
10.13.1.3, from 10.13.1.3, 00:00:35 ago, via GigabitEthernet0/1
Route metric is 3328, traffic share count is 240
Total delay is 30 microseconds, minimum bandwidth is 1000000 Kbit
Reliability 254/255, minimum MTU 1500 bytes
Loading 1/255, Hops 2
Traffic share count is a ratio used for load sharing. Here the counts are 240 and 149, which means traffic is load-balanced unequally.
So traffic is split roughly as:
~62% via 10.13.1.3
~38% via 10.14.1.4
The better path always gets more traffic.
To get equal traffic share counts, the metrics must be equal.
Once variance is configured, traffic sharing is automatic.
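The percentages follow directly from the traffic share counts in the show ip route output. A small sketch of the arithmetic (the function name is made up for this example):

```python
def traffic_split(share_counts):
    """Convert traffic share counts into percentages of total traffic."""
    total = sum(share_counts.values())
    return {nh: round(100 * count / total, 1) for nh, count in share_counts.items()}


shares = {"10.13.1.3": 240, "10.14.1.4": 149}
print(traffic_split(shares))  # {'10.13.1.3': 61.7, '10.14.1.4': 38.3}
```

The ratio 240:149 (about 1.61) is close to the inverse ratio of the two metrics, 5376:3328 (about 1.62), which is consistent with the better path receiving proportionally more traffic.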
In the above, only vbond authenticates to vmanage; everything else, including vsmart and all wan edges, authenticates to vbond.
All are assigned certificates from vmanage, but they all authenticate to vbond, except vbond itself, which has to authenticate with vmanage as there is nothing else.
Step 1. First we install vmanage and add vbond to vmanage. vmanage then issues a certificate to vbond. vmanage and vbond then perform mutual certificate-based authentication and establish a management channel, indicated by the grey arrow.
Step 2. Then we add vsmart to vmanage, and vmanage issues a certificate to the vsmart. After step 2, the vsmart information is uploaded to vbond (so vsmart can first authenticate to vbond). vsmart then contacts and authenticates with vbond. After authentication, vsmart has a management channel with vbond and vmanage.
At this stage, if we have more vsmarts, they will learn about other vsmarts from vbond
Step 3. Either vmanage syncs with your smart account and downloads the list of devices, or we use the serial file method, which is the offline method of importing devices.
Once the device list has been uploaded to vmanage, vmanage uploads it to all controllers (vbond and vsmart). At this point all the controllers are aware of all the wan edge devices that will join.
Step 4. When the wan edge device comes up, it gets a DHCP IP address and contacts ZTP on a pre-defined URL. ZTP in this case is Cisco’s online server that has all the generated licenses; it redirects the wan edge to the organisation’s vbond. The wan edge authenticates with vbond, and vbond informs the wan edge how to get to vmanage and vsmart.
Step 5. wan edge will go and authenticate with vmanage and establish the management channel
Step 6. wan edge will go and authenticate with vsmart and establish the OMP channel
Step 7. wan edge will establish the IPSec tunnel with other wan edge routers
TLOC = System IP + Color + Encapsulation protocol
There are 3 kinds of routes: OMP routes, TLOC routes, and service routes.
TLOC is maintained using BFD. If a TLOC goes down, all routes associated with that TLOC are removed, just like routes through a failed next-hop interface.
BFD does more than a reachability check: it also measures loss (no response at all), delay (delayed response), and jitter (variation in delay), together called path quality. These path quality metrics are then used in application-aware routing.
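The TLOC definition and the "TLOC down removes its routes" behaviour can be modelled with a small data structure. This is only an illustration: the class, the function, the system IPs, the colors, and the prefixes below are all hypothetical, and real OMP route handling on vsmart and wan edges is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tloc:
    """TLOC = System IP + Color + Encapsulation protocol."""
    system_ip: str
    color: str
    encap: str


def withdraw_tloc(routes, down):
    """Drop every (prefix, tloc) route whose next-hop TLOC has gone down."""
    return [(prefix, tloc) for prefix, tloc in routes if tloc != down]


mpls = Tloc("1.1.255.21", "mpls", "ipsec")
inet = Tloc("1.1.255.21", "biz-internet", "ipsec")
routes = [("10.1.1.0/24", mpls), ("10.1.1.0/24", inet), ("10.1.2.0/24", mpls)]

remaining = withdraw_tloc(routes, mpls)
print(remaining)  # only the route via the biz-internet TLOC is left
```

Because the three fields together identify the TLOC, the same system IP with a different color counts as a separate next hop and survives the withdrawal.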
If there is a second vsmart, wan edge will have another omp peering with that vsmart
VPN number is tagged in the IPSec header so other router can land that traffic in same VPN
Configuration is pushed not only to wan edge devices but also to the vsmart. vsmart is also considered a managed device, like a wan edge router, since it needs to be added to vmanage and configured through templates just like a wan edge device. Once a template is applied, devices go into something called vmanage mode, and then we cannot configure them from the CLI (initially you can configure devices from the CLI, but once managed by vmanage you cannot).
Device template > feature template
As can be seen above, there are four policy types:
Centralized control policy (vsmart)
Centralized data policy (wan edge)
Localized control policy (wan edge)
Localized data policy (wan edge)
Centralized control policy is used to create different types of topologies
Centralized data policy is like a route-map applied on an interface: it affects the data packets directly. We can match packets based on the packet header or on the application (which relies on deep packet inspection) and take actions such as dropping packets, QoS classification, policing, changing the next hop, and so on. However, it is pushed by vsmart and lives in wan edge memory; it does not get added to the device's local configuration. Remember this from the keyword centralized: anything that is centralized has its policies in the wan edge’s memory and not in the wan edge config.
Localized control policy is configured on the service side only, so if OSPF or BGP is running on the LAN side of the wan edge, a localized control policy is needed.
Localized data policy is very similar to centralized data policy; the only difference is that its configuration is pushed to, and becomes part of, the wan edge configuration and is applied per interface.
When connecting the vbond device to a switch, make sure it is connected using ge0/0 instead of eth0. This will save you a lot of troubleshooting time when standing up vBond.
In the File Explorer, navigate to: Control Panel\System and Security\Administrative Tools
Double-click Services. This same task can be completed by entering services.msc in the Windows Run dialog (Windows Key + R).
In the Services list, right-click on Windows Time and click Stop. Note: The Windows Time service may already be stopped. In this case, skip this step and go to the next step to Update the Windows Registry
Update the Windows Registry to Create a Local NTP Service
Launch Windows Run (Windows Key + R). Enter regedit and click OK.
Navigate to the registry key: Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Parameters
If you do not see LocalNTP REG_DWORD in the list, create it using the following steps. Right-click in the Registry Editor, select New, select DWORD and enter LocalNTP (note that this name is case sensitive).
Double-click LocalNTP, change the Value data to 1, select a Base of Hexadecimal , and click OK. Do not close the Registry Editor because it is used in the following steps.
Update the Windows Registry to Configure the Time Provider
Navigate to the registry key: Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders
Select NtpServer, double-click Enabled, change the Value data to 1, select a Base of Hexadecimal, and click OK.
Do not close the Registry Editor because it is used in the following steps.
Update the Windows Registry to Configure the Announce Flags
Navigate to the registry key: Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config
Double-click AnnounceFlags, change the Value data to 5, select a Base of Hexadecimal, and click OK.
Close the Registry Editor.
Start the Local Windows NTP Time Service
In the File Explorer, navigate to: Control Panel\System and Security\Administrative Tools
Double-click Services.
In the Services list, right-click on Windows Time and configure the following settings:
Startup type: Automatic
Service Status: Start
OK
Finally, enable UDP port 123 on the Windows firewall for incoming connections.
In Search, find Firewall in Windows Defender…
Go to Incoming rules.
In the right column, select New rule…
Select the rule Port.
Enter UDP port 123 and click Next.
Select Allow connection and click Next.
Select all domains.
Enter the rule name, e.g. Local NTP server, and click Finish.
The local NTP Time Server configuration is now complete. You now can synchronize the time of other computers and devices on your local network.
Now visit http://[serverFQDN]/certsrv, for example http://win-vq08g6u98gf.or2.sys.cisco/certsrv/
SDWAN Standup
default username and password for vbond is admin/admin
vManage requires second hard disk in vCenter
We should know this if we are deploying for onprem environment
it needs to be 100G minimum
Make sure it is the master
During setup we can see the additional disk we added
vManage system configuration
Assign the second hard drive to vmanage, if this has not been done already
cd /opt/unetlab/addons/qemu/vtmgmt-20.9.1
/opt/qemu/bin/qemu-img create -f qcow2 virtiob.qcow2 100G
/opt/unetlab/wrappers/unl_wrapper -a fixpermissions
show version
conf t
system
host-name vManage
system-ip 1.1.255.11
clock timezone Europe/London
site-id 255
organization-name or2.sys.cisco
vbond vbond.or.sys.cisco
ntp server ntp.or2.sys.cisco
! it is important to have ntp server and
! have all controllers and devices with same
! time because we are doing a lot of certificate
! based authentication
! vbond is the only controller that you
! define on all SDWAN devices, whether
! controllers or vedge. If you have 2 vBonds
! then it is good to use an FQDN, otherwise an IP
! address is ok; the reason is that on controllers
! we cannot define two different vbond IP addresses
! always commit the configuration
commit
show running-config
vmanage(config-ntp)# do show running-config ! to see the committed configuration
system
host-name vmanage
admin-tech-on-failure
no vrrp-advt-with-phymac
aaa
auth-order local radius tacacs
usergroup basic
task system read
task interface read
!
usergroup global
!
usergroup netadmin
!
usergroup operator
task system read
task interface read
task policy read
task routing read
task security read
!
usergroup resource_group_admin
task system read
task interface read
!
usergroup resource_group_basic
task system read
task interface read
!
usergroup resource_group_operator
! check configuration of a section while in that section
vmanage(config)# system
vmanage(config-system)# show configuration ! to show uncommitted configuration but only under system section
system
host-name vManage
system-ip 1.1.255.11
site-id 255
organization-name or2.sys.cisco
clock timezone Europe/London
vbond vbond.or.sys.cisco
ntp
server ntp.or2.sys.cisco
version 4
exit
!
vSmart system configuration
conf t
system
host-name vSmart
system-ip 1.1.255.13
clock timezone Europe/London
site-id 255
organization-name or2.sys.cisco
vbond vbond.or.sys.cisco
ntp server ntp.or2.sys.cisco
vBond system configuration
conf t
system
host-name vBond
system-ip 1.1.255.12
clock timezone Europe/London
site-id 255
organization-name or2.sys.cisco
vbond vbond.or.sys.cisco local
! this local keyword converts the vedge to vbond role
ntp server ntp.or2.sys.cisco
DNS server on Windows Server
Then create DNS A records
Add A record for vbond in or.sys.cisco domain
If a second vBond needs to be added, add another A record for “vbond” exactly as above but with a different IP. Multiple vBonds (vBond redundancy) are supported through DNS round robin, which is the default
Add an A record for ntp in the or2.sys.cisco domain (note the “2”)
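Once the A records exist, it can be useful to confirm from a management host that the name resolves to every vBond address. The following is a minimal Python sketch, not part of the lab itself; the function name resolve_ipv4 is hypothetical, and it assumes the host running it uses the lab DNS server.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses for a host name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr for IPv4 is (address, port).
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4("vbond.or.sys.cisco") on a host that uses the lab DNS server should return one address per A record; with round robin the order of the raw answers may rotate, which is why the sketch sorts them.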
Configure the (WAN , Transport) interface of SDWAN controllers
These interfaces are configured under VPN 0. Admins use them to access the GUI, and the controllers use them for outbound communication with the edge routers: NETCONF (vManage), OMP peering (vSmart) and onboarding (vBond)
There is no such thing as a LAN interface for these controllers
Cisco cEdge devices have no VPN 0; the transport side uses the global routing table (the default, non-VRF routing table) instead
Configure vManage Transport Interface (vpn0)
conf t
vpn 0
interface eth0
ip address 1.1.0.11/24
no shutdown
no tunnel-interface
! we keep the tunnel interface down for now; it carries the overlay (fabric) and is only needed once basic connectivity is up
! while within the vpn0 configure default route
ip route 0.0.0.0/0 1.1.0.1
dns 172.16.32.11
! configure this DNS server if vManage can reach the internet, so that device serial numbers can be synced automatically ("Sync Smart Account" button) instead of importing a serial number file offline ("Upload WAN Edge List" button)
The interface IP and the system IP cannot be the same; if they match, the commit is rejected:
vManage(config)# commit
Aborted: ‘vpn 0 interface eth0 ip address’: Interface eth0 with address 1.1.0.11/24 & System IP 1.1.0.11 cannot be same in vpn 0
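The same rule can be checked offline before a configuration is committed. This is a minimal Python sketch under the assumption that addresses are written as in these notes; the function name conflicts_with_system_ip is hypothetical and not part of any Cisco tool.

```python
import ipaddress


def conflicts_with_system_ip(interface_cidr: str, system_ip: str) -> bool:
    """Return True when the interface address equals the system IP."""
    interface = ipaddress.ip_interface(interface_cidr)
    return interface.ip == ipaddress.ip_address(system_ip)


# Values taken from the notes: eth0 in vpn 0 versus two candidate system IPs.
print(conflicts_with_system_ip("1.1.0.11/24", "1.1.0.11"))    # the rejected combination
print(conflicts_with_system_ip("1.1.0.11/24", "1.1.255.11"))  # the combination used in this lab
```

Comparing parsed address objects rather than strings avoids false mismatches caused by the prefix length or formatting.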
Configure vsmart Transport Interface (vpn0)
conf t
vpn 0
interface eth0
ip address 1.1.0.13/24
no shutdown
no tunnel-interface
! we keep the tunnel interface down for now; it carries the overlay (fabric)
! while within the vpn0 configure default route
ip route 0.0.0.0/0 1.1.0.1
dns 172.16.32.11
Configure vbond Transport Interface (vpn0)
conf t
vpn 0
interface ge0/0
! Option 1: we need to keep this tunnel interface down for vbond's own onboarding to work
no tunnel-interface
! or
! Option 2: bring up tunnel interface but allow some services on it
vpn 0
interface ge0/0
tunnel-interface
allow-service sshd
allow-service dns
allow-service ntp
! allowed services apply in both directions:
! for example, NTP will be outbound but SSH will be inbound
!--------------------------------
vpn 0
interface ge0/0
no tunnel-interface
ip address 1.1.0.12/24
no shutdown
! while within the vpn0 router, configure default route
ip route 0.0.0.0/0 1.1.0.1
dns 172.16.32.11
ping vbond.or.sys.cisco
Download CA Certificate
Download in Base64 format
Rename this to root_ca
Access the vManage GUI using its IP address, not the FQDN; with the FQDN the page just spins and returns to the login screen
Login as admin/Cisco123@
There is only one vmanage that is why we only see one on top summary
Upload root CA to all controllers’ trust store
WinSCP SFTP to the vManage
Drag the root_ca file to the /home/admin folder
Do same for vSmart and vBond
Before adding the certificate, make sure the basic system configuration from the earlier steps is in place
Install root CA certificate chain in Trust store of Controllers
The controllers are configured, but one very important piece is still missing:
even though we configured the organization name on the command line, it is not picked up automatically, so click Edit to configure it
Add Organisation Name
or2.sys.cisco
Add vBond FQDN
vbond.or.sys.cisco
Controller Certificate Authorization mode
This is much simpler method as it uses Cisco’s Pre-installed Certificates
This root certificate can be the same one that was added to the “trust store” earlier; this option asks us for the root CA that will be used to authenticate devices
This tells the other controllers (vBond and vSmart) to authenticate using this certificate
vManage now knows the IP addresses of the controllers (similar to authorization or whitelisting), but they are not onboarded yet. Before they can join the fabric, each one needs a certificate signed by the CA, which is done using each controller's CSR
Install certificates on vSmart and vBond through vManage
Generate CSR per controller from vmanage
Click on the three dots next to vManage > Generate CSR; even vManage itself needs a certificate
A CSR is generated for vSmart and vBond and then signed by our Windows Server CA, so vManage can trust the certificate when it is presented. Once the certificates are “issued” to vBond and vSmart through vManage, certificate-based mutual authentication takes place before the controllers are added to the fabric in vManage
Click on download
The same process is required for vManage, because vManage also needs a certificate of its own
Copy and paste it to certsrv
Repeat the same CSR generation process for vSmart and vBond
Install Certificates
Follow the same steps to install certificates on the other controllers
In the screenshot above, “Site ID” and “System IP” are still missing. This has to do with the tunnel interface: the site ID and system IP are exchanged over the fabric, so we need to bring up the tunnel interface with allowed services that are safe over the WAN or internet, such as HTTP and ICMP. Allowed services apply in both directions; for example, NTP will be outbound but SSH will be inbound
vManage tunnel interface
vpn 0
interface eth0
tunnel-interface ! DTLS tunnel
allow-service all ! use "all" only in a lab; restrict services in production
allow-service sshd
allow-service ntp
allow-service dns
allow-service https
vSmart tunnel interface
vpn 0
interface eth0
tunnel-interface ! DTLS tunnel
allow-service all ! use "all" only in a lab; restrict services in production
allow-service sshd
allow-service ntp
allow-service dns
allow-service https
vBond tunnel interface
vpn 0
interface ge0/0
tunnel-interface ! DTLS tunnel
encapsulation ipsec ! this is also required in case of vbond
allow-service all
after bringing up the tunnel interface we can see that system IP, hostname and site ID are present
we have successfully onboarded the controllers
vManage commands
show runn
conf t
system
show configuration
commit
show certificate root-ca-cert ! to see installed root-ca cert
show ntp associations
show run vpn 0
show control local-properties
vbond commands
show orchestrator connections
There is one DTLS connection to vManage per vManage CPU core
show orchestrator valid-vsmarts
The first entry is vManage and the other one is vSmart
vsmart commands
show control connections
Web certificate for vmanage
We will get the CSR
it needs to be signed by CA
For the certificate to take effect, we need to reboot vManage (Maintenance > Device Reboot)
Edge device onboarding
Virtual cEdge devices have no chassis numbers of their own. To obtain chassis numbers we need to go to software.cisco.com and define the vBond address (an FQDN is best, for flexibility in the serial file) and the organization name on the portal.
The process is different for hardware edge devices. For virtual devices we specify how many devices we want chassis numbers generated for. For hardware routers we have to enter each router's serial number, PID and certificate serial number into the portal
Extracted serialFile_new.viptela
The file is in a compressed binary format: it starts with the byte sequence \x1f\x8b\x08, which is the standard signature of a GZIP-compressed file.
The decompressed data starts with the text viptela_serial_file, followed by binary padding and tar header fields such as ustar, so the GZIP stream wraps a tar archive rather than a plain text file.
The archive contains the following files:
viptela_serial_file – likely the main serial/license file.
viptela_serial_file.sig – likely a digital signature file to verify authenticity.
cisco_cert.cer – a Cisco certificate, probably used for validation.
The viptela_serial_file contains JSON data representing serial information for Cisco SD-WAN devices. Summary of its contents:
Key information:
Version: 1.1
Organization: or2.sys.cisco
Overlay Network: SD-WAN-3 - 388033
vBond Controller: vbond.or.sys.cisco (DTLS port 12346)
Timestamp: 2022-02-25 00:06:49
Chassis and serial numbers:
The file lists multiple devices, each with:
Chassis ID (example: 4567A82E-54D1-FA17-E1A4-302781B96194)
SKU: (e.g., VEDGE-CLOUD-DNA, C8000V, CSR1KV)
HWPID: Hardware Product ID
Serial Number: matches the chassis ID in each case
No SUDI (Secure Unique Device Identifier) certificates included (fields are empty).
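The inspection described above (GZIP signature, tar archive inside, JSON in the main member) can be repeated with the Python standard library. This is a minimal sketch under those assumptions; the function names are hypothetical, and the JSON field names are deliberately not hard-coded because the notes do not list them.

```python
import io
import json
import tarfile


def read_serial_archive(data: bytes) -> dict:
    """Return {member_name: raw_bytes} for a gzip-compressed tar archive."""
    # A GZIP stream starts with the magic bytes 0x1f 0x8b.
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a GZIP stream")
    members = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                members[member.name] = archive.extractfile(member).read()
    return members


def top_level_keys(members: dict, name: str = "viptela_serial_file") -> list:
    """Decode the named member as JSON and return its sorted top-level keys."""
    return sorted(json.loads(members[name]))
```

For the file in these notes, the archive bytes would come from reading serialFile_new.viptela in binary mode; the .sig and .cer members are returned as raw bytes and are not verified by this sketch.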
Create new DNS zone
The .viptela serial file contains the organization name or2.sys.cisco, but its vBond profile has the vBond FQDN vbond.or.sys.cisco, so we will create another DNS zone
Right-click on “Forward Lookup Zones” (or Reverse Lookup Zones if needed) ➔ New Zone…
In the New Zone Wizard:
Select Primary zone (if this is the main copy).
Choose whether to store the zone in Active Directory (recommended).
Select the replication scope – all DNS servers in the forest.
Enter your zone name or.sys.cisco
Choose how to handle dynamic updates – secure and non-secure.
Complete the wizard.
Generate .viptela serial file
The PID and serial numbers of these devices are empty when you first create the Smart Account and Virtual Account; once devices have been assigned chassis numbers and associated with the organization, they show up as green and “provisioned”
This section is where we define the vBond info (FQDN or IP) and the organization info
You define the PID of the device, the quantity of devices and the vBond profile; this allowance will be added to our .viptela serial file
After submitting, wait for the devices to reach the provisioned status
Once all devices are provisioned, click on Controller Profiles
Select the controller version
Upload .viptela serial file
Once the file is uploaded, vManage pushes it to all other controllers
If we go to Devices now,
we see the available devices; this serial file has some C8000v and vEdge devices
Get rid of this annoying error message
Login to vmanage CLI
vManage# request nms configuration-db update-admin-user
Enter current user name:neo4j
Enter current user password:password
Enter new user name:admin
Enter new user password:C0mplex30
configuration-db
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Successfully updated configuration database admin user
Successfully restarted NMS application server
Successfully restarted NMS data collection agent
vManage# Setting up watches.
Watches established.
When a Catalyst 8000V router is powered on for the first time, it boots up in AUTONOMOUS mode, as seen in the output below.
%BOOT-5-OPMODE_LOG: R0/0: binos: System booted in AUTONOMOUS mode
The router asks if you would like to enter the initial config dialog. We answer no. Just provide enable password and save configuration to NVRAM
% Please answer 'yes' or 'no'.
Would you like to enter the initial configuration dialog? [yes/no]: no
The enable secret is a password used to protect
access to privileged EXEC and configuration modes.
This password, after entered, becomes encrypted in
the configuration.
-------------------------------------------------
secret should be of minimum 10 characters and maximum 32 characters with
at least 1 upper case, 1 lower case, 1 digit and
should not contain [cisco]
-------------------------------------------------
Enter enable secret: ************
Confirm enable secret: ************
The following configuration command script was created:
enable secret 9 $9$uYATfwi9sBtruU$A4/FPncLMnru9Oo4oQjaF89yHqrCXDJBp**********
!
end
[0] Go to the IOS command prompt without saving this config.
[1] Return back to the setup without saving this config.
[2] Save this configuration to nvram and exit.
Enter your selection [2]: 2
Building configuration...
Use the enabled mode 'configure' command to modify this configuration.
Guestshell destroyed successfully
Press RETURN to get started!
Install the root CA certificate on the edge router, just as on the controllers, so that it can validate the certificate presented by the remote device during mutual authentication
You should already have the root CA certificate on vBond, named root_ca.cer
The easiest way to install the root certificate on a Catalyst 8000v router is by creating a local file directly on the router using TCLSH, as shown in the following example.
In the highlighted section, you should paste the root_ca.cer using the “cat root_ca.cer” command in vshell mode from vBond.
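If the certificate has to be pasted into several routers, the tclsh command can be generated from the Base64 text instead of being assembled by hand. The following Python sketch is only an illustration; the function name tclsh_write_file_command is hypothetical, and the target path bootflash:root_ca.cer is the one used in these notes.

```python
def tclsh_write_file_command(pem_text: str, target: str = "bootflash:root_ca.cer") -> str:
    """Wrap Base64 (PEM) certificate text in a tclsh 'puts [open ...]' command."""
    body = pem_text.strip()
    if "-----BEGIN CERTIFICATE-----" not in body or "-----END CERTIFICATE-----" not in body:
        raise ValueError("input does not look like a Base64 (PEM) certificate")
    # Doubled braces in the f-string produce the literal Tcl braces
    # that delimit the text written to the file.
    return f'puts [open "{target}" w+] {{\n{body}\n}}'
```

The returned string is meant to be pasted after entering tclsh on the router and followed by exit, matching the manual procedure.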
Now, it is time to reboot the router in CONTROLLER mode, which is required for SD-WAN. The router will notify you that a bootstrap configuration isn’t available, but we will continue anyway.
Router# controller-mode enable
Enabling controller mode will erase the nvram filesystem, remove all configuration files, and reload the box!
Ensure the BOOT variable points to a valid image
Continue? [confirm]
% Warning: Bootstrap config file needed for Day-0 boot is missing
Do you want to abort? (yes/[no]): no
Mode change success
After the reboot, the router will boot up in CONTROLLER mode, as shown in the output below.
Oct 22 16:30:59.812: %BOOT-5-OPMODE_LOG: R0/0: binos: System booted in CONTROLLER mode
The last step is to install the root certificate using the following command.
cEdge# request platform software sdwan root-cert-chain install bootflash:root_ca.cer
Uploading root-ca-cert-chain via VPN 0
Copying ... /bootflash/ROOTCA.pem via VPN 0
Updating the root certificate chain..
Successfully installed the root certificate chain
If everything has gone smoothly, you should see our Enterprise CA Root certificate installed on the router.
cEdge# show sdwan certificate root-ca-cert | in network
Issuer: C=US, ST=NY, L=NY, O=networkacademy-io, CN=root.certificate
Subject: C=US, ST=NY, L=NY, O=networkacademy-io, CN=root.certificate
Now we need vManage to issue a certificate to the edge router
Pick one C8000v device from the device list, click on the three dots and click on “Generate Bootstrap Configuration”
We have to configure basic IP addressing, a default route and the system configuration. We will also configure a DNS name for vBond, as recommended by Cisco.
config-transaction
hostname R1-cEdge
!
int GigabitEthernet1
ip address 1.1.1.1 255.255.255.0
no shut
!
ip route 0.0.0.0 0.0.0.0 1.1.1.250
!
system
system-ip 172.16.0.11
site-id 1
ip host vbond.or.sys.cisco 1.1.0.12 ! cisco recommends adding this host entry
organization-name or2.sys.cisco
vbond vbond.or.sys.cisco
commit
You should be able to ping the controllers at this point. If there is no IP connectivity between the WAN edge router and the controllers, there is no point in continuing; troubleshoot the problem first.
sdwan
int GigabitEthernet1
tunnel-interface
color biz-internet
encapsulation ipsec
!
interface Tunnel 1 ! the tunnel number should match the physical interface number
ip unnumbered GigabitEthernet1
tunnel source GigabitEthernet1
tunnel mode sdwan
!
int GigabitEthernet2
tunnel-interface
color mpls restrict
encapsulation ipsec
!
interface Tunnel 2
ip unnumbered GigabitEthernet2
tunnel source GigabitEthernet2
tunnel mode sdwan
Router is now ready to join overlay fabric
Before the cEdge router can join the SD-WAN fabric, it must have a device certificate signed and installed by vManage
This is the common rule for both controllers and edge devices: anything that needs to join the fabric requires a certificate issued through vManage and mutual authentication
Once this is done, you should see in the logs that vManage logs into the cEdge using NETCONF over SSH, generates a CSR, then signs it and installs a device certificate. The cEdge router should then establish an OMP peering with vSmart and start receiving TLOCs and OMP routes.
R1-cEdge#
*Jul 21 20:27:09.257: %SYS-5-CONFIG_P: Configured programmatically by process iosp_dmiauthd_conn_100001_vty_100001 from consol6
*Jul 21 20:27:09.523: %SYS-5-CONFIG_P: Configured programmatically by process iosp_dmiauthd_conn_100001_vty_100001 from console as admin on vty42946
*Jul 21 20:27:09.503: %DMI-5-CONFIG_I: R0/0: dmiauthd: Configured from NETCONF/RESTCONF by admin, transaction-id 558pong
*Jul 21 20:27:17.068: %SYS-5-CONFIG_P: Configured programmatically by process iosp_dmiauthd_conn_100001_vty_100001 from console as admin on vty4294l
*Jul 21 20:28:03.534: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:36606 for netconf over s:
*Jul 21 20:28:29.847: %Cisco-SDWAN-R1-cEdge-action_notifier-6-INFO-1400002: Notification: 7/21/2025 20:28:29 security-install-rcc severity-level:mi1
*Jul 21 20:28:30.030: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:36688 for netconf over s:
*Jul 21 20:28:43.152: %Cisco-SDWAN-R1-cEdge-action_notifier-6-INFO-1400002: Notification: 7/21/2025 20:28:43 security-install-certificate severity-1
*Jul 21 20:29:25.117: %Cisco-SDWAN-Router-OMPD-3-ERRO-400002: vSmart peer 1.1.255.13 state changed to Init
*Jul 21 20:29:25.343: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:36822 for netconf over ss
*Jul 21 20:29:27.205: %Cisco-SDWAN-Router-OMPD-6-INFO-400002: vSmart peer 1.1.255.13 state changed to Handshake
*Jul 21 20:29:27.218: %Cisco-SDWAN-Router-OMPD-5-NTCE-400002: vSmart peer 1.1.255.13 state changed to Up
*Jul 21 20:29:27.218: %Cisco-SDWAN-Router-OMPD-6-INFO-400005: Number of vSmarts connected : 1
*Jul 21 20:29:41.535: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:36882 for netconf over s:
*Jul 21 20:30:01.736: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:36928 for netconf over s:
*Jul 21 20:30:23.576: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:37006 for netconf over s:
*Jul 21 20:30:33.557: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:37052 for netconf over s:
*Jul 21 20:30:43.535: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:37078 for netconf over s:
*Jul 21 20:30:48.611: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:37108 for netconf over s:
R1-cEdge#show sdwan control connections
PEER PEER
PEER PEER PEER SITE DOMAIN PEER PRIV PEER PUB
TYPE PROT SYSTEM IP ID ID PRIVATE IP PORT PUBLIC IP PORT ORGANIZA
----------------------------------------------------------------------------------------------------------------------------------------------------
vsmart dtls 1.1.255.13 255 1 1.1.0.13 12446 1.1.0.13 12446 or2.sys.
vbond dtls 0.0.0.0 0 0 1.1.0.12 12346 1.1.0.12 12346 or2.sys.
vmanage dtls 1.1.255.11 255 0 1.1.0.11 12846 1.1.0.11 12846 or2.sys.
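A quick way to confirm that all three controller types are present is to parse the rows of this table. The Python sketch below assumes the column order visible in the output above (the right-hand columns are cut off in these notes, so state is not parsed); the function name parse_control_peers is hypothetical.

```python
CONTROLLER_TYPES = {"vsmart", "vbond", "vmanage"}


def parse_control_peers(output: str) -> list:
    """Extract one dict per controller row from 'show sdwan control connections'."""
    peers = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the peer type; headers and rulers are skipped.
        if len(fields) >= 9 and fields[0] in CONTROLLER_TYPES:
            peers.append({
                "type": fields[0],
                "protocol": fields[1],
                "system_ip": fields[2],
                "site_id": int(fields[3]),
                "domain_id": int(fields[4]),
                "private_ip": fields[5],
                "private_port": int(fields[6]),
                "public_ip": fields[7],
                "public_port": int(fields[8]),
            })
    return peers


# Rows copied from the output in these notes.
SAMPLE = """\
vsmart dtls 1.1.255.13 255 1 1.1.0.13 12446 1.1.0.13 12446 or2.sys.
vbond dtls 0.0.0.0 0 0 1.1.0.12 12346 1.1.0.12 12346 or2.sys.
vmanage dtls 1.1.255.11 255 0 1.1.0.11 12846 1.1.0.11 12846 or2.sys.
"""

peers = parse_control_peers(SAMPLE)
missing = CONTROLLER_TYPES - {peer["type"] for peer in peers}
print("missing controller types:", sorted(missing))
```

An empty list of missing types only shows that a row exists for each controller; the connection state still has to be read from the full, untruncated output.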
CE-01#show sdwan control local-properties
personality vedge
sp-organization-name or2.sys.cisco
organization-name or2.sys.cisco
root-ca-chain-status Installed
certificate-status Installed
certificate-validity Valid
certificate-not-valid-before Jul 7 05:58:30 2025 GMT
certificate-not-valid-after Jul 5 05:58:30 2035 GMT
enterprise-cert-status Not-Applicable
enterprise-cert-validity Not Applicable
enterprise-cert-not-valid-before Not Applicable
enterprise-cert-not-valid-after Not Applicable
dns-name vbond.or.sys.cisco
site-id 250
domain-id 1
protocol dtls
tls-port 0
system-ip 192.168.254.1
chassis-num/unique-id C8K-A1AD735C-C4D2-CE60-6D88-01686AD4ED52
serial-num 588AA845
subject-serial-num N/A
enterprise-serial-num No certificate installed
token Invalid
keygen-interval 1:00:00:00
retry-interval 0:00:00:16
no-activity-exp-interval 0:00:00:20
dns-cache-ttl 0:00:02:00
port-hopped TRUE
time-since-last-port-hop 0:00:30:51
embargo-check success
number-vbond-peers 1
INDEX IP PORT
-----------------------------------------------------
0 172.16.101.14 12346
number-active-wan-interfaces 1
NAT TYPE: E -- indicates End-point independent mapping
A -- indicates Address-port dependent mapping
N -- indicates Not learned
Note: Requires minimum two vbonds to learn the NAT type
PUBLIC PUBLIC PRIVATE PRIVATE PRIVATE MAX RESTRICT/ LAM
INTERFACE IPv4 PORT IPv4 IPv6 PORT VS/VM COLOR STATE CNTRL CONTROL/ LR/LB CON
STUN F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GigabitEthernet1 172.16.101.200 12366 172.16.101.200 :: 12366 1/1 biz-internet up 2 no/yes/no No/
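Because authentication is certificate based, a device whose clock falls outside the certificate-not-valid-before and certificate-not-valid-after window shown above will fail validation, which is why NTP matters. A minimal Python sketch of that check follows; the function names are hypothetical, and the timestamp format is assumed from the two fields in this output (month names are parsed using the default C/English locale).

```python
from datetime import datetime, timezone

# Format of the validity fields as printed by the device, e.g.
# "Jul 7 05:58:30 2025 GMT" (assumed from the output above).
CERT_TIME_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_cert_time(text: str) -> datetime:
    """Parse a certificate validity timestamp into an aware UTC datetime."""
    parsed = datetime.strptime(text.strip(), CERT_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def clock_within_validity(now: datetime, not_before: str, not_after: str) -> bool:
    """Return True when 'now' lies inside the certificate validity window."""
    return parse_cert_time(not_before) <= now <= parse_cert_time(not_after)
```

Passing datetime.now(timezone.utc) as the first argument compares the local clock of the machine running the script, not the router's clock, so it is only a rough sanity check.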
show run ! still works
show sdwan running-config
vbond command: show orchestrator valid-vedges
Platform Console
The last step when running a Catalyst 8000V in a virtual EVE-NG environment is to change the console method after attaching a device template.
Depending on your lab, you will most likely end up attaching a device template to the 8000V edge routers. What typically happens is that you lose console access to the device. This happens because, by default, the device boots up configured with the following command.
platform console serial
However, after you attach a template, vManage changes the console method to
platform console virtual
The “virtual” option defines that the 8000V router is accessed through the virtual VGA console of the hypervisor. To change the console method back to “serial,” you must configure a CLI add-on feature template and add it to the respective device template the router is attached to.
Changing IP address on WAN side of the edge device
I changed the IP address on the WAN transport interface of R1-cEdge from 1.1.1.1 to 1.1.1.2. The router re-established its connections to the controllers and all control connections came up; I did not have to edit or change addresses on any of the controllers, which is good.
ISR 4000 Conversion and Standup as SDWAN router for CCIE Hardware router provisioning exam topic
R2-cEdge standup over MPLS (apparently no reachability to controllers)
Most MPLS setups have no internet access unless you pay for it, in which case the provider can supply a default route from the MPLS network. The provider side will have an INET-R1 router that routes traffic for 0.0.0.0/0 towards the internet cloud, so controllers on the internet can be reached via the MPLS network
Traffic for MPLS prefixes is routed towards the MPLS router, and traffic needing internet connectivity is routed to the internet
Add basic configuration on R2-cEdge
system
system-ip 172.16.0.12
site-id 1
organization-name or2.sys.cisco
vbond vbond.or.sys.cisco
hostname R2-cEdge
username admin privilege 15 secret 5 $1$dYK8$TukpN4hzNpia/JRlBkEjG.
ip host vbond.or.sys.cisco 1.1.0.12
ip route 0.0.0.0 0.0.0.0 10.0.1.1
interface GigabitEthernet2
ip address 10.0.1.2 255.255.255.252
no shutdown
no mop enabled
no mop sysid
negotiation auto
exit
interface Tunnel2
no shutdown
ip unnumbered GigabitEthernet2
tunnel source GigabitEthernet2
tunnel mode sdwan
exit
sdwan
interface GigabitEthernet2
tunnel-interface
encapsulation ipsec
color mpls
allow-service all
exit
exit
commit
Untick Validate. With Validate ticked, the device status is set to valid directly, skipping the invalid and staging states; untick it if you do not want to bring the device into production straight away
Every time a new device is added to the WAN Edge list, either by syncing from the Smart Account or from a .viptela serial file, we need to “Send to Controllers” and then verify on vBond that the new device has been added
This section applies to onboarding non-Cisco (Viptela-based) vEdge devices. The setting lets vManage issue the certificate, i.e. the vEdge uses vManage as the CA; we will keep the default setting, vManage signed
configure BR2-vEdge
conf t
system
system-ip 172.16.0.102
site-id 102
organization-name or2.sys.cisco
vbond 1.1.0.12
host-name BR2-vEdge
commit
vpn 0
int ge0/0
ip address 1.1.1.102/24
no shut
tunnel-interface
allow-service all
no shut
exit
ip route 0.0.0.0/0 1.1.1.250
commit
Make sure that we can ping the intermediate hops and the controllers
SFTP to the vEdge router and drag root_ca.cer into it; with SFTP you do not have to worry about getting a Linux shell prompt on the CLI
Enter vshell and make sure that the root CA certificate is present after the previous transfer
BR2-vEdge# request root-cert-chain install /home/admin/root_ca.cer
Uploading root-ca-cert-chain via VPN 0
Copying ... /home/admin/root_ca.cer via VPN 0
Updating the root certificate chain..
Successfully installed the root certificate chain
Verify the root ca cert installation
BR2-vEdge# show certificate root-ca-cert | inc Subject
Subject: DC=cisco, DC=sys, DC=or2, CN=or2-WIN-VQ08G6U98GF-CA
Subject Public Key Info:
X509v3 Subject Key Identifier:
The device certificate is not installed yet, so we want vManage to install it
For that we need to obtain the chassis number and token from vManage for one of the devices of type vEdge
request vedge-cloud activate chassis-number 4567a82e-54d1-fa17-e1a4-302781b96194 token eca16978e13744e2ac2edda6e33c9373
show certificate installed
show control connections
Sort by state to see the installed edges; the latest vEdge will be among them
The color is set to default, so we will set it to biz-internet
Control connections to vsmart and vmanage
BR3-cEdge
tclsh
puts [open "bootflash:root_ca.cer" w+] {
-----BEGIN CERTIFICATE-----
MIIDnzCCAoegAwIBAgIQYJ1ACvIQRIlBAEITkoGNuzANBgkqhkiG9w0BAQsFADBi
MRUwEwYKCZImiZPyLGQBGRYFY2lzY28xEzARBgoJkiaJk/IsZAEZFgNzeXMxEzAR
BgoJkiaJk/IsZAEZFgNvcjIxHzAdBgNVBAMTFm9yMi1XSU4tVlEwOEc2VTk4R0Yt
Q0EwHhcNMjUwNzA2MjE1MjA1WhcNMzAwNzA2MjIwMjA1WjBiMRUwEwYKCZImiZPy
LGQBGRYFY2lzY28xEzARBgoJkiaJk/IsZAEZFgNzeXMxEzARBgoJkiaJk/IsZAEZ
FgNvcjIxHzAdBgNVBAMTFm9yMi1XSU4tVlEwOEc2VTk4R0YtQ0EwggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQCr6cjaoJz3vzgHlQ1hzhuy5WfIL/Ao0isM
ltIaGL+Z+9WftM1hNh10YECbxR71+lIpQKyBQTXQz8Of4nycxHjoI3dQdUvEYb8H
fysDXh4lYjQ60x82e5c7f1KPbD+AOhC31Zw1dgReMlPIuaa9LK903+z0FRnuCHaI
EG/Z9uCmv3JC22NgL69hscZc+NUGymMy1iBPN8G4EBkgqNVZ+zlRf/adW0JxEdc6
Sy53bp586/fXziRTW++jgdnhvfpn+VJ+BdG88/rEgMl7PUQE95lq4dih7qx0+OXu
ihFwQQvFxvi3dyqWWc0C1RKHPHtYQFz8rRuBJrR+uzgc0lVhrNHdAgMBAAGjUTBP
MAsGA1UdDwQEAwIBhjAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBQ/bI8yZeKD
fgjmmeWorjGo25t5hzAQBgkrBgEEAYI3FQEEAwIBADANBgkqhkiG9w0BAQsFAAOC
AQEAdtt6aiABkDDg/mAlcZfFPHcqmEEvQaMPeBaUqvfZKNrFVO8GMb9kingZJ62n
K05x5wE3tHy3jBmAl6eHZ/nUjXS11C06NwZMHpcDhty5BcDN08oEYdLF24upisNA
aRLOBhyEtKI9VKLAWfMkpWYEd/dqgVWs67GjAFT0Osgva9QHbz24iT6/c09jbZMt
41opmxacw8FFZcHMH9Afv1fIW9PwscrdlgjSSHR4XQLyDbyuDGsolzeh9PUVyPOd
f+/LYkLwH9jVcHlxl4Oy7MHRPtcbG9T3+vQGLjSAXu3Ybrl2R9Tn/sz5lYs44EEB
mqCxT00LxB3et6jAxJlEyE5vCw==
-----END CERTIFICATE-----
}
exit
controller-mode enable
request platform software sdwan root-cert-chain install bootflash:root_ca.cer
hostname BR3-cEdge
system
system-ip 172.16.0.103
site-id 103
organization-name or2.sys.cisco
vbond vbond.or.sys.cisco
ip host vbond.or.sys.cisco 1.1.0.12
ip route 0.0.0.0 0.0.0.0 1.1.1.250
interface GigabitEthernet1
ip address 1.1.1.103 255.255.255.0
no shutdown
no mop enabled
no mop sysid
negotiation auto
interface Tunnel1
no shutdown
ip unnumbered GigabitEthernet1
tunnel source GigabitEthernet1
tunnel mode sdwan
exit
sdwan
interface GigabitEthernet1
tunnel-interface
encapsulation ipsec
color biz-internet
allow-service all
exit
exit
commit
request platform software sdwan vedge_cloud activate chassis-number C8K-FF74B9C0-47EC-6B46-6F06-B63A33303C0F token 3d4817593d9e42d19092a8a7804051aa
Filter Onboarded Nodes
Type “In Sync” in the filter at the top (however it is typed, e.g. “in ync”) and the onboarded nodes will show
Web Interface
Control Status tells us about the down control connections
Site health shows us status of all the IPSec VPN tunnels between sites
This tells us that BR1-cEdge has 3 tunnels up out of 4; one is down because BR2-vEdge is down
R1-cEdge and R2-cEdge have 2 tunnels each, since both are part of the same site and they have tunnels to 2 internet-based branch sites (out of 3, as site 102 is down)
If we click on number 4 we see
then navigate to “Tunnels” and you will see all the tunnels from one router to remote routers
Navigate to “Real Time” > Device Options: Tunnel BFD Statistics
Inventory and CPU, memory and hardware health
If we click on 4 we see this
Onboarding a cEdge “CSR1K” because the vEdge is crashing, maybe because of version 20.6.1; download a newer version
First stop the PnP service so that the SD-WAN software packages can install
pnpa service discovery stop
Once the PnP service has been stopped, we tell the router to install all underlying SD-WAN packages if necessary. Depending on the CSR1k software image, this may not be necessary.
request platform software sdwan software reset
The last step is to verify the software image using the following command and
check that the SD-WAN software is ACTIVE and CONFIRMED, as highlighted below.
show sdwan soft
VERSION ACTIVE DEFAULT PREVIOUS CONFIRMED TIMESTAMP
---------------------------------------------------------------------------------
16.12.4.0.4480 true true false user 2022-04-03T08:20:13-00:00
Total Space:388M Used Space:87M Available Space:297M
In newer CSR1000v versions the steps above are not needed and we can directly run
controller-mode enable
Once the router loads up with the SD-WAN software, we can go ahead and configure the minimal configuration required to join the SD-WAN overlay fabric. Notice that when the cEdge router runs in Controller mode (basically SD-WAN mode), we enter the configuration mode using the “config-transaction” command instead of the well-known “configure terminal” or simply “conf t”.
Notice something very important – the Tunnel keyword in the “interface Tunnel” command should always be with a capital T. It is not like in a regular Cisco IOS where you can create a new tunnel using the “interface tunnel 1” command.
config-transaction
hostname cEdge
!
int GigabitEthernet1
ip address 39.3.1.1 255.255.255.0
no shut
!
int GigabitEthernet2
ip address 10.10.1.1 255.255.255.0
no shut
!
ip route 0.0.0.0 0.0.0.0 39.3.1.254
ip route 0.0.0.0 0.0.0.0 10.10.1.254
ip host vbond.networkacademy.io 10.1.1.10
!
system
system-ip 1.1.1.1
site-id 1
organization-name "networkacademy-io"
vbond vbond.networkacademy.io
commit
sdwan
int GigabitEthernet1
tunnel-interface
color biz-internet
encapsulation ipsec
!
int GigabitEthernet2
tunnel-interface
color mpls restrict
encapsulation ipsec
!
interface Tunnel 1
ip unnumbered GigabitEthernet1
tunnel source GigabitEthernet1
tunnel mode sdwan
!
interface Tunnel 2
ip unnumbered GigabitEthernet2
tunnel source GigabitEthernet2
tunnel mode sdwan
commit
Install the root CA certificate through tclsh; the same steps as for the C8000v can be followed
Template Configuration on Edges and Controllers
The highlighted config is what we need to configure through the template. Each “section” will need a “feature template”.
Remember that we need to configure system, VPN 0 (routing table for transport) and interface feature templates
BR1-1-cEdge#show sdwan running-config
system
system-ip 172.16.0.101
site-id 101
admin-tech-on-failure
organization-name or2.sys.cisco
vbond vbond.or.sys.cisco
!
memory free low-watermark processor 68484
no service tcp-small-servers
no service udp-small-servers
platform console serial
platform qfp utilization monitor load 80
platform punt-keepalive disable-kernel-core
hostname BR1-1-cEdge
username admin privilege 15 secret 5 $1$3/FD$EA4V.gZeQ6hMyUG2ct/ax.
no ip finger
no ip rcmd rcp-enable
no ip rcmd rsh-enable
no ip dhcp use class
ip host vbond.or.sys.cisco 1.1.0.12
ip route 0.0.0.0 0.0.0.0 1.1.1.250
ip ssh version 2
no ip http server
ip http secure-server
ip nat settings central-policy
ip nat settings gatekeeper-size 1024
interface GigabitEthernet1
no shutdown
ip address 1.1.1.101 255.255.255.0
no mop enabled
no mop sysid
negotiation auto
exit
interface GigabitEthernet2
no shutdown
no mop enabled
no mop sysid
negotiation auto
exit
interface GigabitEthernet3
no shutdown
no mop enabled
no mop sysid
negotiation auto
exit
interface GigabitEthernet4
no shutdown
no mop enabled
no mop sysid
negotiation auto
exit
interface Tunnel1
no shutdown
ip unnumbered GigabitEthernet1
tunnel source GigabitEthernet1
tunnel mode sdwan
exit
aaa authentication enable default enable
aaa authentication login default local
aaa authorization console
aaa authorization exec default local
login on-success log
line aux 0
!
line con 0
stopbits 1
!
line vty 0 4
!
line vty 5 80
!
sdwan
interface GigabitEthernet1
tunnel-interface
encapsulation ipsec
color biz-internet
allow-service all
no allow-service bgp
allow-service dhcp
allow-service dns
allow-service icmp
no allow-service sshd
no allow-service netconf
no allow-service ntp
no allow-service ospf
no allow-service stun
allow-service https
no allow-service snmp
no allow-service bfd
exit
exit
appqoe
no tcpopt enable
no dreopt enable
!
omp
no shutdown
graceful-restart
no as-dot-notation
address-family ipv4
advertise connected
advertise static
!
address-family ipv6
advertise connected
advertise static
!
!
!
licensing config enable false
licensing config privacy hostname false
licensing config privacy version false
licensing config utility utility-enable false
security
ipsec
integrity-type ip-udp-esp esp
!
!
sslproxy
no enable
rsa-key-modulus 2048
certificate-lifetime 730
eckey-type P256
ca-tp-label PROXY-SIGNING-CA
settings expired-certificate drop
settings untrusted-certificate drop
settings unknown-status drop
settings certificate-revocation-check none
settings unsupported-protocol-versions drop
settings unsupported-cipher-suites drop
settings failure-mode close
settings minimum-tls-ver TLSv1
dual-side optimization enable
!
Device Specific means the value is asked from us at the time we attach the template to a device. Global means that all the devices attached to this template inherit the same static value.
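The difference between the two variable scopes can be pictured with a small sketch. This is not how vManage stores templates; the dictionary, the key list and the function name are made up for illustration, and only the values are taken from the lab configs in these notes.

```python
# Hypothetical model of a "system" feature template (illustration only).
# Global values are fixed in the template and shared by every attached device.
GLOBAL_VALUES = {
    "organization-name": "or2.sys.cisco",
    "vbond": "vbond.or.sys.cisco",
}
# Device-specific values must be supplied when the template is attached.
DEVICE_SPECIFIC_KEYS = ["system-ip", "site-id"]


def render_system_block(device_values):
    """Combine per-device values with the shared global values."""
    missing = [k for k in DEVICE_SPECIFIC_KEYS if k not in device_values]
    if missing:
        raise ValueError(f"missing device-specific values: {missing}")
    lines = ["system"]
    for key in DEVICE_SPECIFIC_KEYS:
        lines.append(f" {key} {device_values[key]}")
    for key, value in GLOBAL_VALUES.items():
        lines.append(f" {key} {value}")
    return "\n".join(lines)


br1 = render_system_block({"system-ip": "172.16.0.101", "site-id": 101})
br3 = render_system_block({"system-ip": "172.16.0.103", "site-id": 103})
print(br3)
```

Both rendered blocks share the organization name and vBond address, while system-ip and site-id differ per device.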
Each section of the running-config will require a feature template
Enhance ECMP Keying, when turned on, also considers the source and destination ports when calculating the ECMP hash
DNS and static IPv4 routes come under the GRT (global routing table, VPN 0)
If device models are different, then each device model needs its own feature template because of the difference in interface names > Cisco VPN Interface Ethernet template
If this color has no reachability to the controllers (such as an MPLS connection), then set Maximum Control Connections to 0. Caution: setting Maximum Control Connections to 0 on MPLS-only sites caused loss of control connections to all controllers, and that loss caused a rollback, because MPLS was the only connection to the site.
Maximum Control Connections set to 0 lets a site have no connections to the controllers (not just vManage, but vSmart and vBond as well) from that color, while still having “data tunnels” on that color
Exclude Controller Group List: the group of controllers that you do not want the edge to connect to. This is important when we do not want the edge to connect to vSmarts in far regions.
vManage Connection Preference: the default is 5. The link with the higher preference is used to connect to vManage when we have 2 transports, because only one vManage connection is established.
Port hop
By default, WAN Edge devices (vEdge, C8000V) form control connections with the controllers (vBond, vSmart, vManage) using DTLS, starting from UDP 12346. TLS over TCP can be used instead towards vSmart and vManage. So normally, traffic keeps using the same base port.
When Port Hop is enabled, the “WAN Edge” will not stick to just a single fixed port. Instead, it will cycle through a set of ports if a connection attempt fails.
DTLS (UDP):
Starts with UDP/12346 (plus the configured port offset, if any).
If blocked, it will try the next ports in steps of 20: 12366, 12386, 12406, 12426, and then wrap back to 12346.
It keeps retrying until a connection is established.
TLS (TCP):
The edge uses a random TCP source port towards the controller, and the controllers listen on TCP 23456 and above (not on TCP/443).
Again, it retries until success.
This makes control connections much more resilient in restrictive or dynamic network environments where firewalls inspect and rate-limit traffic
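As a rough illustration of the DTLS hop sequence, the sketch below assumes the hop set described in Cisco's documentation: base port 12346, five ports spaced 20 apart, shifted by an optional port offset of 0 to 19. The public ports seen in the TLOC outputs in these notes (12366, 12386, 12406, 12426) are consistent with that spacing. The function names are made up for this example.

```python
BASE_PORT = 12346  # default DTLS base port
HOP_STEP = 20      # assumed spacing between hop ports
HOP_COUNT = 5      # assumed number of ports in the cycle


def port_hop_sequence(port_offset=0):
    """Return the list of UDP ports an edge would cycle through."""
    if not 0 <= port_offset <= 19:
        raise ValueError("port offset is expected to be between 0 and 19")
    return [BASE_PORT + port_offset + HOP_STEP * i for i in range(HOP_COUNT)]


def next_port(current_port, port_offset=0):
    """Return the port tried after current_port, wrapping at the end."""
    sequence = port_hop_sequence(port_offset)
    position = sequence.index(current_port)
    return sequence[(position + 1) % len(sequence)]


print(port_hop_sequence())
print(next_port(12426))
```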
Sometimes port hop can be an issue
Look at the control connections on the router: they have been up for 4 minutes and 12 seconds. They will retrigger after completing 5 minutes.
NDNA_c8000v#sh sdwan control connections
                                                 PEER              PEER                                                  CONTROLLER
PEER    PEER PEER        SITE DOMAIN PEER        PRIV  PEER        PUB                                                   GROUP
TYPE    PROT SYSTEM IP   ID   ID     PRIVATE IP  PORT  PUBLIC IP   PORT  ORGANIZATION LOCAL COLOR PROXY STATE UPTIME     ID
---------------------------------------------------------------------------------------------------------------------------
vsmart  dtls 10.10.10.11 1    1      10.10.3.5   12646 17.23.12.11 12646 NDNA-111     gold        No    up    0:00:04:12 0
vsmart  dtls 10.10.10.12 2    1      10.10.3.15  12646 17.23.12.25 12646 NDNA-111     gold        No    up    0:00:04:12 0
vmanage dtls 10.10.10.10 1    0      10.10.3.12  13046 17.23.12.88 13046 NDNA-111     gold        No    up    0:00:04:12 0
Checked again about a minute later: the uptime now shows 8 seconds, which means the connections have bounced again.
NDNA_c8000v#sh sdwan control connections
                                                 PEER              PEER                                                  CONTROLLER
PEER    PEER PEER        SITE DOMAIN PEER        PRIV  PEER        PUB                                                   GROUP
TYPE    PROT SYSTEM IP   ID   ID     PRIVATE IP  PORT  PUBLIC IP   PORT  ORGANIZATION LOCAL COLOR PROXY STATE UPTIME     ID
---------------------------------------------------------------------------------------------------------------------------
vsmart  dtls 10.10.10.11 1    1      10.10.3.5   12646 17.23.12.11 12646 NDNA-111     gold        No    up    0:00:00:08 0
vsmart  dtls 10.10.10.12 2    1      10.10.3.15  12646 17.23.12.25 12646 NDNA-111     gold        No    up    0:00:00:08 0
vmanage dtls 10.10.10.10 1    0      10.10.3.12  13046 17.23.12.88 13046 NDNA-111     gold        No    up    0:00:00:08 0
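To compare two snapshots like the ones above without reading the UPTIME column by eye, a small sketch can convert the values to seconds. It assumes the column is formatted as days:hours:minutes:seconds; the helper names are hypothetical.

```python
def uptime_seconds(uptime):
    """Convert an UPTIME value such as '0:00:04:12' to seconds."""
    days, hours, minutes, seconds = (int(part) for part in uptime.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def has_bounced(earlier_uptime, later_uptime):
    """A connection bounced if its uptime went down between two checks."""
    return uptime_seconds(later_uptime) < uptime_seconds(earlier_uptime)


print(uptime_seconds("0:00:04:12"))
print(has_bounced("0:00:04:12", "0:00:00:08"))
```

An uptime that never gets past about five minutes across repeated checks points to the periodic retrigger described above.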
For troubleshooting, move the router to CLI mode. First check the mode in which the router is working: if we see what is marked in red below, a template is attached to the router, which means the router is in vManage mode (managed by the controller).
Move the router from vManage mode to CLI mode in order to do packet captures on the router, although capturing through vManage with data stream enabled is the recommended way. Once moved, run the commands below to capture packets on the interface with the source and destination IPs shown:
!
ip access-list extended CAP-Filter
10 permit ip host 10.10.1.23 host 17.23.12.88
20 permit ip host 17.23.12.88 host 10.10.1.23
exit
monitor capture CAP access-list CAP-Filter interface GigabitEthernet1 both buffer circular size 25
monitor capture CAP limit pps 1000000
monitor capture CAP access-list CAP-Filter both buffer circular size 25
monitor capture CAP start
monitor capture CAP stop
!
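The capture filter above needs one ACL entry per direction, which is easy to get wrong when typing it by hand. A minimal sketch that builds both entries from two host addresses (the function name is hypothetical; the addresses are the ones used above):

```python
import ipaddress


def capture_acl(name, host_a, host_b):
    """Build a bidirectional extended ACL for a packet capture filter."""
    # ip_address() raises ValueError on a malformed address.
    a = ipaddress.ip_address(host_a)
    b = ipaddress.ip_address(host_b)
    return [
        f"ip access-list extended {name}",
        f" 10 permit ip host {a} host {b}",
        f" 20 permit ip host {b} host {a}",
    ]


acl_lines = capture_acl("CAP-Filter", "10.10.1.23", "17.23.12.88")
print("\n".join(acl_lines))
```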
Now run below commands to get debugs
NDNA_c8000v# debug platform software sdwan vdaemon all high
NDNA_c8000v# monitor logging process vdaemon internal
Once you run the above commands, you will see logs related to the interfaces. In the debug logs you will see the TLOC being disabled ... why?
2024/04/19 17:47:59.779970993 {vdaemon_R0-0}{255}: [event] [18342]: (debug): Disabling tloc GigabitEthernet1.
2024/04/19 17:47:59.780001093 {vdaemon_R0-0}{255}: [misc] [18342]: (ERR): Delta preference value added to TLOC pref.
2024/04/19 17:47:59.780003193 {vdaemon_R0-0}{255}: [misc] [18342]: (ERR): Sending TLOC: ifname:GigabitEthernet3 color:gold spi:18915 smarts:2 manages:1 state:DOWN LR encap:0 LR hold time:7000 bw:0, down-bw 0 range: 0-0,adapt period 0 up-bw range 0-0 up_fia 0 capability:0x3f
Check the interface for port-hop and you will see that port-hop is enabled. Now disable port hop (“no port-hop” under the tunnel-interface) and the control connections will be stable.
interface GigabitEthernet1
tunnel-interface
encapsulation ipsec weight 1
no border
color gold restrict
no last-resort-circuit
no low-bandwidth-link
no vbond-as-stun-server
vmanage-connection-preference 5
port-hop
Check the control connections after disabling port-hop on the interface: they have been up for 19 minutes and are stable.
NDNA_c8000v#sh sdwan control connections
                                                 PEER              PEER                                                  CONTROLLER
PEER    PEER PEER        SITE DOMAIN PEER        PRIV  PEER        PUB                                                   GROUP
TYPE    PROT SYSTEM IP   ID   ID     PRIVATE IP  PORT  PUBLIC IP   PORT  ORGANIZATION LOCAL COLOR PROXY STATE UPTIME     ID
---------------------------------------------------------------------------------------------------------------------------
vsmart  dtls 10.10.10.11 1    1      10.10.3.5   12646 17.23.12.11 12646 NDNA-111     gold        No    up    0:00:19:02 0
vsmart  dtls 10.10.10.12 2    1      10.10.3.15  12646 17.23.12.25 12646 NDNA-111     gold        No    up    0:00:19:02 0
vmanage dtls 10.10.10.10 1    0      10.10.3.12  13046 17.23.12.88 13046 NDNA-111     gold        No    up    0:00:19:02 0
Now we can copy the template and also change its device model
Once you have changed the device model, make sure that the interface names match, for example that the template does not say GigabitEthernet1 where the device has GigabitEthernet0/0/0; if the name is different, change it inside the template as well
On hardware models we also need a template for the management Gig0 interface to satisfy the device template requirement on hardware platforms, otherwise deployment fails. For the management Gig0 interface the same “Cisco VPN Interface Ethernet” template is used; take the interface name from “show ip int brief”.
Now create the device template
This template is device specific + transport connectivity type specific
In case we have another transport interface, we can add it from the plus icon next to the type of interface
In case we have to attach the mgmt interface to avoid deployment errors on a hardware device
Now we need to attach the device template to a device, a C8000v that has internet-only connectivity. You do that from the template itself.
Fill in the variables with the following information from the running-config of the edge device
Deployment failed and it rolled back to restore connectivity to vManage, as the edge lost connectivity to vManage and the other controllers
When I checked the template, the default route was missing from feature template FT_C8000V_GRT
After successful deployment I was not able to log in, so a new AAA policy was attached
Now I can log in
Whenever a change is made to templates, the change needs to be pushed to the devices. While making those changes there is an option to download the CSV, make bulk changes, and upload the CSV back. This is very useful when you have a large number of devices.
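A bulk change on the downloaded CSV can be scripted instead of edited by hand. The sketch below is only an illustration: the column names (hostname, site_id, ntp_server) and values are invented, and the CSV exported by vManage has its own column names that would need to be used instead.

```python
import csv
import io


def set_column(csv_text, column, new_value):
    """Return csv_text with `column` set to `new_value` on every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if column not in fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = new_value
        writer.writerow(row)
    return out.getvalue()


# Hypothetical export with one row per attached device.
sample = (
    "hostname,site_id,ntp_server\n"
    "BR1-1-cEdge,101,10.0.0.1\n"
    "BR3-cEdge,103,10.0.0.1\n"
)
updated = set_column(sample, "ntp_server", "10.0.0.2")
print(updated)
```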
When making changes there is an option in the bottom left corner: Configure Device Rollback Timer
NTP Feature Template common for all edges
Login Banner Feature Template
Newlines in the banner text should be replaced with \n so it can be pasted into this box
************************************************************\n* *\n* WARNING: Authorized Access Only! *\n* *\n* This system is for the use of authorized users only. *\n* Any unauthorized access or use is prohibited and *\n* may be subject to criminal and civil penalties. *\n* *\n* All activities on this system are monitored. *\n* *\n************************************************************
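A small sketch of that conversion, assuming the banner is first written as normal multi-line text (the function name is hypothetical):

```python
def banner_for_template(banner_text):
    """Join the banner lines with a literal backslash-n sequence."""
    return "\\n".join(banner_text.splitlines())


banner = (
    "* WARNING: Authorized Access Only! *\n"
    "* All activities on this system are monitored. *"
)
single_line = banner_for_template(banner)
print(single_line)
```

The result is one line with no real line breaks, which can be pasted into the banner field as-is.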
Local Disk Logging Feature Template
Log messages are kept in /var/log for troubleshooting
In case the syslog server is inside the datacenter and not reachable over the WAN transport, we have to change the VPN number below from 0 to the service-side VPN / VRF number of the local site / datacenter in which the syslog server lives
SNMP Feature Template
Templates on Controllers
Remember that we need to configure system, vpn 0 (routing table for transport) and interface feature templates
But when the device type is vManage or vSmart, the available template types are reduced
With vManage and vSmart selected we can have a common feature template for system and VPN
vEdge Cloud is applied on vBond
We are more limited in terms of templates when we select vEdge Cloud, vManage and vSmart
Let's configure a template for vManage
SDWAN – GUI
Transport Health shows statistics between transports (color to color); it is displayed by loss by default
These are BFD stats telling us that from BR3-cEdge (branch 3) to BR2-cEdge (branch 2) there is an average loss of 28.539 %. This is per connection, as compared to the color-to-color stats shown in “Transport Health”.
It is displayed by loss by default
Monitor > Geography shows the geographical location of our sites / edges. For now it shows as blank because we have not assigned any coordinates.
Monitor > Network shows all network devices and all of their information such as names, states, system ip, reachability, site id, bfd tunnels, control DTLS sessions, version, up since, device groups etc
Clicking on one of the devices takes us into the device
we can see hardware inventory, power supply and fan info – reboot menu – CPU and memory
Hardware Inventory
DPI Applications: when traffic passes through the router, the discovered applications show here. Nothing is showing because no traffic is passing through the router.
Interface shows all stats on interfaces
This is good place to check the admin / operational status of the interfaces
WAN Throughput, Flows and Top Talkers: these are TCP optimisation features and are only available on hardware routers
It says “WAN Throughput is not applicable for C8000v”
It says “TCP Optimization Flows are not applicable for C8000v”
It says “Top Talkers is not applicable for C8000v”
WAN > TLOC
WAN > Tunnels
Control Connections
Events
Troubleshooting
Tunnel Health
Good for troubleshooting per tunnel health
Per tunnel health check for loss, latency and octets or bytes
App route visualisation
DPI is Deep Packet Inspection
This shows application stats from site to site. Previously we saw per-tunnel health; this option lets us check beyond the tunnels and see application stats after the tunnel.
No filter option will show us stats for all traffic
Troubleshooting
It shows us the onboarding steps:
1. Edge was authorized by vBond
2. Software image update
3. Router configuration
4. Control plane connectivity established
5. Data plane connectivity established
This helps in case the edge is behind a firewall and the firewall is blocking control plane connectivity
We also have options for ping and traceroute
Simulate Flows allows us to see how our applications will route and which TLOCs they will pass through
Real Time shows us any information that we can also see on the command line
Some troubleshooting options, such as Packet Capture, Speed Test and Debug Logs, are not available as we don't have Data Stream enabled
We can add an alarm notification email as well
Now we move to events. Alarms are like generated syslogs, and events are more detailed.
Audit log
We can see all the audit trail for who did what and what was pushed to which device
This comes in very handy as we can see ACL logs not just for one device but for the whole system
Once a device is managed by vManage, local CLI changes cannot be made on the device. Nodes which are PnP or ZTP enrolled are already in vManage mode, as a template is applied at the time of enrolment and the devices are vManage managed. Devices that are manually enrolled are CLI managed.
One reason to convert from vManage mode to CLI mode is to quickly test a command and then return the device to vManage mode. You don't even have to revert the changes made in CLI: as we change back to vManage mode, the changes made in CLI disappear because the configuration from the template is applied. If the test was successful, make that change part of the template.
Device status must always be “In Sync”
To see if device communication is working with vmanage, pull its running configuration, if it works then we know that netconf over DTLS (control connection) is working between vmanage and edge
Template log is where we can see the changes
Decommission WAN edge – removes the edge device and puts chassis number back in the controllers so new virtual device can be assigned that chassis / token
Similar options for controllers
Templates Device Templates Feature Templates
Centralized Policy
Localized Policy
Security section where we can configure Zone Based Firewall etc
This is great when you want to quickly check something in SSH
Rediscover network: if there is a difference in configuration between vManage and the edge device, rediscover the edge device to sync all those changes
Generate Admin Tech is for support. Reset Interface bounces the port.
BR1-1-cEdge#
*Aug 24 02:18:57.042: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:47394 for netconf over ssh. External groups:
BR1-1-cEdge#
*Aug 24 02:19:00.560: %Cisco-SDWAN-RP_0-VDAEMON-3-ERRO-500012: Device does not have an active connection to a vSmart controller
*Aug 24 02:19:04.058: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:47412 for netconf over ssh. External groups:
BR1-1-cEdge#
*Aug 24 02:19:13.388: %DMI-5-AUTH_PASSED: R0/0: dmiauthd: User 'vmanage-admin' authenticated successfully from 1.1.255.11:47468 for netconf over ssh. External groups:
BR1-1-cEdge#
Request port hop color: manually makes the TLOC (Transport Locator) of the selected color hop to the next port in its port-hop sequence, so the device re-initiates its connections on that color from a new port. The color and WAN interface stay the same; only the port changes. This is mostly used for troubleshooting, for example when a connection on that color cannot be established or a NAT entry has gone stale.
Reset locked user is used to unlock admin
once a vmanage is switched from single tenant to multitenant then it cannot go back to single tenant
OMP , TLOCs and IPSec VPN
vSmart# show omp peers
R -> routes received
I -> routes installed
S -> routes sent
                         DOMAIN    OVERLAY   SITE
PEER             TYPE    ID        ID        ID        STATE    UPTIME           R/I/S
--------------------------------------------------------------------------------------
172.16.0.11      vedge   1         1         1         up       0:04:15:03       0/0/0
172.16.0.12      vedge   1         1         1         up       0:03:16:13       0/0/0
172.16.0.101     vedge   1         1         101       up       0:02:45:09       0/0/0
172.16.0.102     vedge   1         1         102       up       0:04:15:21       0/0/0
172.16.0.103     vedge   1         1         103       up       0:04:14:59       0/0/0
172.16.0.111     vedge   1         1         101       up       0:02:45:31       0/0/0
R1-cEdge#show sdwan omp peers
R -> routes received
I -> routes installed
S -> routes sent
                         DOMAIN    OVERLAY   SITE
PEER             TYPE    ID        ID        ID        STATE    UPTIME           R/I/S
--------------------------------------------------------------------------------------
1.1.255.13       vsmart  1         1         255       up       0:04:17:40       0/0/0
All the TLOCs known by the router. A system IP that appears twice means that router has two transports / colors.
R1-cEdge#show sdwan omp tlocs
---------------------------------------------------
tloc entries for 172.16.0.11
biz-internet
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 0.0.0.0
status C,Red,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 284
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 1.1.1.2
public-port 12366
private-ip 1.1.1.2
private-port 12366
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up << BFD status should be up
domain-id not set
site-id 1
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 3
gen-id 0x80000001
carrier default
restrict 0
on-demand 0
groups [ 0 ]
bandwidth 0
bandwidth-dmin 0
bandwidth-down 0
bandwidth-dmax 0
adapt-qos-period 0
adapt-qos-up 0
qos-group default-group
border not set
extended-ipsec-anti-replay not set
unknown-attr-len not set
---------------------------------------------------
tloc entries for 172.16.0.12
mpls
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.255.13
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 287
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 10.0.1.2
public-port 12406
private-ip 10.0.1.2
private-port 12406
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status down
domain-id not set
site-id 1
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 3
gen-id 0x80000000
carrier default
restrict 0
on-demand 0
groups [ 0 ]
bandwidth 0
bandwidth-dmin 0
bandwidth-down 0
bandwidth-dmax 0
adapt-qos-period 0
adapt-qos-up 0
qos-group default-group
border not set
extended-ipsec-anti-replay not set
unknown-attr-len not set
---------------------------------------------------
tloc entries for 172.16.0.101
biz-internet
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.255.13
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 285
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 1.1.1.101
public-port 12386
private-ip 1.1.1.101
private-port 12386
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up
domain-id not set
site-id 101
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 3
gen-id 0x80000000
carrier default
restrict 0
on-demand 0
groups [ 0 ]
bandwidth 0
bandwidth-dmin 0
bandwidth-down 0
bandwidth-dmax 0
adapt-qos-period 0
adapt-qos-up 0
qos-group default-group
border not set
extended-ipsec-anti-replay not set
unknown-attr-len not set
---------------------------------------------------
tloc entries for 172.16.0.102
mpls
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.255.13
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 262
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 10.0.102.2
public-port 12426
private-ip 10.0.102.2
private-port 12426
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up
domain-id not set
site-id 102
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 3
gen-id 0x80000000
carrier default
restrict 0
on-demand 0
groups [ 0 ]
bandwidth 0
bandwidth-dmin 0
bandwidth-down 0
bandwidth-dmax 0
adapt-qos-period 0
adapt-qos-up 0
qos-group default-group
border not set
extended-ipsec-anti-replay not set
unknown-attr-len not set
---------------------------------------------------
tloc entries for 172.16.0.102
biz-internet
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.255.13
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 280
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 1.1.1.102
public-port 12366
private-ip 1.1.1.102
private-port 12366
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up
domain-id not set
site-id 102
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 3
gen-id 0x80000000
carrier default
restrict 0
on-demand 0
groups [ 0 ]
bandwidth 0
bandwidth-dmin 0
bandwidth-down 0
bandwidth-dmax 0
adapt-qos-period 0
adapt-qos-up 0
qos-group default-group
border not set
extended-ipsec-anti-replay not set
unknown-attr-len not set
---------------------------------------------------
tloc entries for 172.16.0.103
mpls
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.255.13
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 265
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 10.0.103.2
public-port 12366
private-ip 10.0.103.2
private-port 12366
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up
domain-id not set
site-id 103
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 3
gen-id 0x80000000
carrier default
restrict 0
on-demand 0
groups [ 0 ]
bandwidth 0
bandwidth-dmin 0
bandwidth-down 0
bandwidth-dmax 0
adapt-qos-period 0
adapt-qos-up 0
qos-group default-group
border not set
extended-ipsec-anti-replay not set
unknown-attr-len not set
---------------------------------------------------
tloc entries for 172.16.0.103
biz-internet
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.255.13
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 286
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 1.1.1.103
public-port 12426
private-ip 1.1.1.103
private-port 12426
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up
domain-id not set
site-id 103
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 3
gen-id 0x80000000
carrier default
restrict 0
on-demand 0
groups [ 0 ]
bandwidth 0
bandwidth-dmin 0
bandwidth-down 0
bandwidth-dmax 0
adapt-qos-period 0
adapt-qos-up 0
qos-group default-group
border not set
extended-ipsec-anti-replay not set
unknown-attr-len not set
---------------------------------------------------
tloc entries for 172.16.0.111
mpls
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.255.13
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 266
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 10.0.101.2
public-port 12406
private-ip 10.0.101.2
private-port 12406
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up
domain-id not set
site-id 101
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 3
gen-id 0x80000000
carrier default
restrict 0
on-demand 0
groups [ 0 ]
bandwidth 0
bandwidth-dmin 0
bandwidth-down 0
bandwidth-dmax 0
adapt-qos-period 0
adapt-qos-up 0
qos-group default-group
border not set
extended-ipsec-anti-replay not set
unknown-attr-len not set
Interval for BFD session is 1000 msec
R1-cEdge#show sdwan bfd sessions
SOURCE TLOC REMOTE TLOC DST PUBLIC DST PUBLIC DETECT TX
SYSTEM IP SITE ID STATE COLOR COLOR SOURCE IP IP PORT ENCAP MULTIPLIER INTERVAL(msec UPTIME TRANSITIONS
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
172.16.0.101 101 up biz-internet biz-internet 1.1.1.2 1.1.1.101 12386 ipsec 7 1000 10 0:23:51:18 0
172.16.0.102 102 up biz-internet biz-internet 1.1.1.2 1.1.1.102 12386 ipsec 7 1000 10 0:00:48:47 2
172.16.0.103 103 up biz-internet biz-internet 1.1.1.2 1.1.1.103 12426 ipsec 7 1000 10 0:03:48:08 0
172.16.0.111 101 up biz-internet mpls 1.1.1.2 10.0.101.2 12406 ipsec 7 1000 10 0:23:14:16 1
172.16.0.102 102 up biz-internet mpls 1.1.1.2 10.0.102.2 12426 ipsec 7 1000 10 0:13:47:48 0
172.16.0.103 103 up biz-internet mpls 1.1.1.2 10.0.103.2 12366 ipsec 7 1000 10 0:12:48:23 0
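With a TX interval of 1000 msec and a detect multiplier of 7, as in the output above, a tunnel is declared down after roughly 7 seconds without BFD replies. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def bfd_detection_time_ms(tx_interval_ms, multiplier):
    """Time without BFD replies before the session is declared down."""
    return tx_interval_ms * multiplier


# Values from the "show sdwan bfd sessions" output above.
detection_ms = bfd_detection_time_ms(tx_interval_ms=1000, multiplier=7)
print(detection_ms / 1000, "seconds")
```

Lowering either value shortens failover time at the cost of more sensitivity to brief loss on the transport.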
on ipsec outbound connections destination IP will be of remote routers
on the ipsec inbound connections, source IP will be of the remote routers
R1-cEdge#show sdwan ipsec inbound-connections
SOURCE SOURCE DEST DEST REMOTE REMOTE LOCAL LOCAL NEGOTIATED
IP PORT IP PORT TLOC ADDRESS TLOC COLOR TLOC ADDRESS TLOC COLOR ENCRYPTION ALGORITHM TC SPIs
--------------------------------------------------------------------------------------------------------------------------------------------------
1.1.1.101 12386 1.1.1.2 12366 172.16.0.101 biz-internet 172.16.0.11 biz-internet AES-GCM-256 8
10.0.102.2 12426 1.1.1.2 12366 172.16.0.102 mpls 172.16.0.11 biz-internet AES-GCM-256 8
1.1.1.102 12386 1.1.1.2 12366 172.16.0.102 biz-internet 172.16.0.11 biz-internet AES-GCM-256 8
10.0.103.2 12366 1.1.1.2 12366 172.16.0.103 mpls 172.16.0.11 biz-internet AES-GCM-256 8
1.1.1.103 12426 1.1.1.2 12366 172.16.0.103 biz-internet 172.16.0.11 biz-internet AES-GCM-256 8
10.0.101.2 12406 1.1.1.2 12366 172.16.0.111 mpls 172.16.0.11 biz-internet AES-GCM-256
Service Side VPN , Site Local LAN
Setup VPN 10 VRF
This is to redistribute connected routes in OMP
This is to redistribute static routes in OMP
These IPv4 routes are for pointing at the LAN side networks
ECMP Keying can only be turned on in VPN 0
Create the following new VPN Interface Ethernet feature templates:
Physical interface GIG3 with IP address variable (so sites without a dot1q switch, such as Branch 1, can operate)
Physical interface GIG3 without IP address, so sites like Branch 2 and Branch 3 can run a trunk interface on the router towards a dot1q switch
dot1q interface GIG3.10 for VLAN 10 with IP address and a reduced MTU of 1496 to compensate for the VLAN header on the trunk
reduce the MTU to 1496
For dot1q interfaces we need the physical interface without IP under VPN 0 and the dot1q interface under the service VPN
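The 1496 figure comes from subtracting the 4-byte 802.1Q tag from the standard 1500-byte Ethernet MTU. A tiny sketch of that calculation (the names are illustrative):

```python
ETHERNET_MTU = 1500   # standard Ethernet MTU in bytes
DOT1Q_TAG_BYTES = 4   # size of one 802.1Q VLAN tag


def subinterface_mtu(parent_mtu=ETHERNET_MTU, vlan_tags=1):
    """MTU left for the sub-interface after the VLAN tag(s) are accounted for."""
    return parent_mtu - DOT1Q_TAG_BYTES * vlan_tags


print(subinterface_mtu())
```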
VPCS> ip 172.17.3.10 /25 172.17.3.1
Checking for duplicate address...
VPCS : 172.17.3.10 255.255.255.128 gateway 172.17.3.1
VPCS> ping 172.17.3.1
172.17.3.1 icmp_seq=1 timeout
84 bytes from 172.17.3.1 icmp_seq=2 ttl=255 time=1.393 ms
84 bytes from 172.17.3.1 icmp_seq=3 ttl=255 time=1.470 ms
84 bytes from 172.17.3.1 icmp_seq=4 ttl=255 time=1.429 ms
84 bytes from 172.17.3.1 icmp_seq=5 ttl=255 time=1.350 ms
VPCS> ping 172.17.3.10
172.17.3.10 icmp_seq=1 ttl=64 time=0.001 ms
172.17.3.10 icmp_seq=2 ttl=64 time=0.001 ms
172.17.3.10 icmp_seq=3 ttl=64 time=0.001 ms
172.17.3.10 icmp_seq=4 ttl=64 time=0.001 ms
172.17.3.10 icmp_seq=5 ttl=64 time=0.001 ms
VPCS> ping 172.17.2.1
84 bytes from 172.17.2.1 icmp_seq=1 ttl=254 time=6.716 ms
84 bytes from 172.17.2.1 icmp_seq=2 ttl=254 time=3.531 ms
84 bytes from 172.17.2.1 icmp_seq=3 ttl=254 time=2.678 ms
84 bytes from 172.17.2.1 icmp_seq=4 ttl=254 time=3.613 ms
84 bytes from 172.17.2.1 icmp_seq=5 ttl=254 time=3.625 ms
VPCS> ping 172.17.2.10
84 bytes from 172.17.2.10 icmp_seq=1 ttl=62 time=5.851 ms
84 bytes from 172.17.2.10 icmp_seq=2 ttl=62 time=2.274 ms
84 bytes from 172.17.2.10 icmp_seq=3 ttl=62 time=3.498 ms
84 bytes from 172.17.2.10 icmp_seq=4 ttl=62 time=3.398 ms
84 bytes from 172.17.2.10 icmp_seq=5 ttl=62 time=3.495 ms
VPCS>
VPCS> set pcname BR3-CLIENT
BR3-CLIENT>
BR3-CLIENT> save
Saving startup configuration to startup.vpc
. done
BR1-CLIENT> trace 172.17.2.10
trace to 172.17.2.10, 8 hops max, press Ctrl+C to stop
1 *172.17.1.1 0.351 ms 0.194 ms
2 *1.1.1.102 1.300 ms 1.543 ms
3 *172.17.2.10 5.741 ms (ICMP type:3, code:3, Destination port unreachable)
BR3-cEdge#show sdwan omp routes
Generating output, this might take time, please wait ...
Code:
C -> chosen
I -> installed
Red -> redistributed
Rej -> rejected
L -> looped
R -> resolved
S -> stale
Ext -> extranet
Inv -> invalid
Stg -> staged
IA -> On-demand inactive
U -> TLOC unresolved
PATH ATTRIBUTE
VPN PREFIX FROM PEER ID LABEL STATUS TYPE TLOC IP COLOR ENCAP PREFERENCE
--------------------------------------------------------------------------------------------------------------------------------------
10 172.17.2.0/25 1.1.255.13 9 1003 C,I,R installed 172.16.0.102 mpls ipsec -
1.1.255.13 10 1003 C,I,R installed 172.16.0.102 biz-internet ipsec -
10 172.17.3.0/25 0.0.0.0 66 1003 C,Red,R installed 172.16.0.103 mpls ipsec -
0.0.0.0 68 1003 C,Red,R installed 172.16.0.103 biz-internet ipsec -
BR3-cEdge#show sdwan omp routes 172.17.2.0/25 detail
---------------------------------------------------
omp route entries for vpn 10 route 172.17.2.0/25
---------------------------------------------------
RECEIVED FROM:
peer 1.1.255.13
path-id 9
label 1003
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
originator 172.16.0.102
type installed
tloc 172.16.0.102, mpls, ipsec
ultimate-tloc not set
domain-id not set
overlay-id 1
site-id 102
preference not set
tag not set
origin-proto connected
origin-metric 0
as-path not set
community not set
unknown-attr-len not set
RECEIVED FROM:
peer 1.1.255.13
path-id 10
label 1003
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
originator 172.16.0.102
type installed
tloc 172.16.0.102, biz-internet, ipsec
ultimate-tloc not set
domain-id not set
overlay-id 1
site-id 102
preference not set
tag not set
origin-proto connected
origin-metric 0
as-path not set
community not set
unknown-attr-len not set
BR3-cEdge#routing-context vrf 10
BR3-cEdge%10#ping 172.17.3.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.17.3.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
BR3-cEdge%10#ping 172.17.3.10
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.17.3.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms
BR3-cEdge%10#ping 172.17.2.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.17.2.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/3 ms
BR3-cEdge%10#ping 172.17.2.10
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.17.2.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 2/3/6 ms
BR3-cEdge%10#exit
Bandwidth tier is the bandwidth a license allows on an edge device, ranging from 50Mbps to 20Gbps aggregate. Aggregate means the upload and download bandwidth of all WAN interfaces combined. For example, with 2 ISP circuits of 100Mbps each, the WAN aggregate is 400Mbps (200Mbps for one circuit and 200Mbps for the other), so we need a Tier 1 license, which offers 400Mbps of aggregate bandwidth
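The aggregate arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation described in these notes; the function name is made up, and it assumes symmetric circuits where upload and download speed are equal.

```python
def aggregate_bandwidth_mbps(circuit_speeds_mbps):
    """Aggregate = upload + download, summed over every WAN circuit.

    Assumes each circuit is symmetric (same speed in both directions).
    """
    return sum(2 * speed for speed in circuit_speeds_mbps)


# Two 100 Mbps circuits: 200 Mbps each, 400 Mbps in total
print(aggregate_bandwidth_mbps([100, 100]))  # 400
```

The result is the number to compare against the aggregate figure of the license tier.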
Then come the DNA packages: Essentials, Advantage and Premier. Essentials covers most of the SDWAN features needed, and Cisco has recently moved some features down from Advantage into Essentials in order to stay competitive
HSEC is something we need to keep an eye out for. Higher-end routers come with a higher HSEC tier, but it is still good to verify what is on the device
For larger environments it is good to get a Cisco Enterprise Agreement, as we can get a better deal on hundreds of edge devices
Recommended resources for vManage and controller numbers / sizing
This starts by defining how many edge devices are in the deployment; based on that number, the guide suggests the vCPUs / RAM and any additional VMs needed
Less than 1500 edge nodes will need 1 vManage, anything above 1500 edge nodes will require 3x vManage VMs
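As a quick sketch, the sizing rule above can be written as a one-line check. The function name is hypothetical, and the notes do not say what happens at exactly 1500 nodes, so treating 1500 as needing the cluster is an assumption here.

```python
def vmanage_vms_needed(edge_nodes):
    """Return the number of vManage VMs for a given edge node count.

    Assumption: exactly 1500 nodes is treated like "above 1500".
    """
    return 1 if edge_nodes < 1500 else 3


print(vmanage_vms_needed(400))   # 1
print(vmanage_vms_needed(2000))  # 3
```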
“All services” is a vManage persona called COMPUTE_AND_DATA, which runs all services. A vManage with just the COMPUTE persona runs only the vManage application, configuration and messaging, with no data statistics, while a vManage with the DATA persona stores statistics and data
Download software from cisco.com
we will select ova for ESXi VM
From version 20.8 onwards vManage requires a minimum of 500GB
For newer versions of vManage the controller type should be SCSI and not IDE
Make sure that the organization name matches exactly what is listed in the Cisco smart account, otherwise there will be sync issues
BFD polling
Default BFD polling is 1000 msec or 1 sec
OMP parameters
If you ever have to make changes in OMP, such as increasing the ECMP limit, perform them here
OMP timers
Shows what kind of routes are injected into OMP by default
Create loopback on MPLS routers and then advertise it on Transport side using BGP
Loopback interface
MPLS interface
Make sure color is set under tunnel section
Also make sure that Allow Service All is enabled. Without it BGP did not come up and I spent a long time troubleshooting; when testing telnet to port 179 I realised the SDWAN router was not sending a TCP response back to the switch
BGP Configuration
MPLS#show run
Building configuration...
Current configuration : 2669 bytes
!
! Last configuration change at 02:42:24 UTC Mon Mar 9 2026
!
version 17.12
service timestamps debug datetime msec
service timestamps log datetime msec
!
hostname MPLS
!
boot-start-marker
boot-end-marker
!
!
no aaa new-model
!
!
!
!
!
!
!
!
!
!
!
!
!
ip audit notify log
ip audit po max-events 100
ip cef
login on-success log
no ipv6 cef
!
!
!
!
!
!
!
vtp version 1
multilink bundle-name authenticated
!
!
!
!
memory free low-watermark processor 80589
!
!
spanning-tree mode rapid-pvst
spanning-tree extend system-id
!
!
vlan internal allocation policy ascending
!
!
!
!
!
interface Ethernet0/0
description INTERNET SW
no switchport
ip address 172.31.255.253 255.255.255.252
ip ospf 1 area 1
!
interface Ethernet0/1
no switchport
ip address 172.31.255.249 255.255.255.252
ip ospf 1 area 1
!
interface Ethernet0/2
no switchport
ip address 172.31.255.245 255.255.255.252
ip ospf 1 area 1
!
interface Ethernet0/3
no switchport
ip address 172.31.255.241 255.255.255.252
ip ospf 1 area 1
!
interface Ethernet1/0
no switchport
ip address 172.31.255.237 255.255.255.252
ip ospf 1 area 1
!
interface Ethernet1/1
!
interface Ethernet1/2
!
interface Ethernet1/3
!
router ospf 1
router-id 172.31.255.254
redistribute bgp 10
passive-interface default
no passive-interface Ethernet0/0
!
router bgp 10
template peer-policy CE
send-community both
exit-peer-policy
!
template peer-session CE
ebgp-multihop 5
timers 5 10
exit-peer-session
!
bgp log-neighbor-changes
neighbor 172.31.255.238 remote-as 65104
neighbor 172.31.255.238 inherit peer-session CE
neighbor 172.31.255.242 remote-as 65103
neighbor 172.31.255.242 inherit peer-session CE
neighbor 172.31.255.246 remote-as 65102
neighbor 172.31.255.246 inherit peer-session CE
neighbor 172.31.255.250 remote-as 65102
neighbor 172.31.255.250 inherit peer-session CE
!
address-family ipv4
network 172.31.255.236 mask 255.255.255.252
network 172.31.255.240 mask 255.255.255.252
network 172.31.255.244 mask 255.255.255.252
network 172.31.255.248 mask 255.255.255.252
network 172.31.255.252 mask 255.255.255.252
neighbor 172.31.255.238 activate
neighbor 172.31.255.238 inherit peer-policy CE
neighbor 172.31.255.242 activate
neighbor 172.31.255.242 inherit peer-policy CE
neighbor 172.31.255.246 activate
neighbor 172.31.255.246 inherit peer-policy CE
neighbor 172.31.255.250 activate
neighbor 172.31.255.250 inherit peer-policy CE
exit-address-family
!
ip forward-protocol nd
!
!
ip http server
ip http secure-server
ip ssh bulk-mode 131072
!
!
!
!
!
!
control-plane
!
!
!
line con 0
logging synchronous
line aux 0
line vty 0 4
login
transport input ssh
!
!
end
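A quick way to sanity-check the BGP network statements and neighbor addresses against the interface addressing is Python's ipaddress module. This is only a sketch; the helper name is made up, and it assumes every link is a /30 with exactly two usable hosts, as in the config above.

```python
import ipaddress

# Interface addresses taken from the MPLS router config above (all /30)
interfaces = {
    "Ethernet0/0": "172.31.255.253/30",
    "Ethernet0/1": "172.31.255.249/30",
    "Ethernet0/2": "172.31.255.245/30",
    "Ethernet0/3": "172.31.255.241/30",
    "Ethernet1/0": "172.31.255.237/30",
}


def network_and_peer(cidr):
    """Return (network address, the other usable host) for a /30 interface."""
    iface = ipaddress.ip_interface(cidr)
    peer = next(h for h in iface.network.hosts() if h != iface.ip)
    return str(iface.network.network_address), str(peer)


for name, cidr in interfaces.items():
    network, peer = network_and_peer(cidr)
    print(f"{name}: network {network} mask {iface_mask}" if False else
          f"{name}: network {network}, expected peer {peer}")
```

For Ethernet0/1 this gives network 172.31.255.248 and peer 172.31.255.250, which lines up with the network statement and the neighbor address in the BGP section.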
Trunking configuration
This is the GIG3 template without the IP variable: no IP address, so we can configure trunking
This is the GIG3.100 interface that will be the trunking sub-interface
Reduce the MTU on this interface by 4 bytes to 1496 to accommodate the VLAN tag
Now edit the device template
GIG3_NOIP will be assigned to VPN 0 transport VPN
And GIG3.100 will be assigned to the VPN 100 service VPN
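The 1496 figure is just the standard Ethernet MTU minus one 802.1Q tag. A tiny sketch of that arithmetic, with made-up constant and function names:

```python
ETHERNET_MTU = 1500   # default MTU on the physical interface
DOT1Q_TAG_BYTES = 4   # size of one 802.1Q VLAN tag


def subinterface_mtu(physical_mtu=ETHERNET_MTU, vlan_tags=1):
    """MTU left for the sub-interface after the VLAN tag(s) are accounted for."""
    return physical_mtu - vlan_tags * DOT1Q_TAG_BYTES


print(subinterface_mtu())  # 1496
```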
VRRP configuration
Static route
Make sure that the VPN supports redistribution of both connected and “static”. If static is not enabled, the static route will exist only on that specific router and the rest of the routers or sites will not learn it via OMP
Also make sure that the static route is marked as an optional row
SW-1002#show ip int brief
Interface IP-Address OK? Method Status Protocol
Ethernet0/0 unassigned YES unset up up
Ethernet0/1 unassigned YES unset up up
Ethernet0/2 unassigned YES unset down down
Ethernet0/3 unassigned YES unset down down
Ethernet1/0 unassigned YES unset down down
Ethernet1/1 unassigned YES unset down down
Ethernet1/2 unassigned YES unset up up
Ethernet1/3 unassigned YES unset up up
Vlan100 172.16.2.11 YES manual up up
Vlan200 172.16.4.1 YES manual down down <<<
The VLAN 200 SVI was down and not coming up because no access port was assigned to VLAN 200
so I allowed VLAN 200 on the uplinks to the C8000 edge routers to bring the VLAN 200 interface up
SW-1002#show ip int brief
Interface IP-Address OK? Method Status Protocol
Ethernet0/0 unassigned YES unset up up
Ethernet0/1 unassigned YES unset up up
Ethernet0/2 unassigned YES unset down down
Ethernet0/3 unassigned YES unset down down
Ethernet1/0 unassigned YES unset down down
Ethernet1/1 unassigned YES unset down down
Ethernet1/2 unassigned YES unset up up
Ethernet1/3 unassigned YES unset up up
Vlan100 172.16.2.11 YES manual up up
Vlan200 172.16.4.1 YES manual up up <<<
C801-1002-DUAL#
ip route vrf 100 172.16.4.0 255.255.254.0 172.16.2.11
CSR-1004-MPLS#show sdwan omp route
Generating output, this might take time, please wait ...
Code:
C -> chosen
I -> installed
Red -> redistributed
Rej -> rejected
L -> looped
R -> resolved
S -> stale
Ext -> extranet
Inv -> invalid
Stg -> staged
IA -> On-demand inactive
U -> TLOC unresolved
PATH ATTRIBUTE
VPN PREFIX FROM PEER ID LABEL STATUS TYPE TLOC IP COLOR ENCAP PREFERENCE
--------------------------------------------------------------------------------------------------------------------------------------
100 172.16.0.0/23 22.22.22.22 6 1003 C,I,R installed 13.13.13.13 biz-internet ipsec -
100 172.16.2.0/23 22.22.22.22 7 1004 C,I,R installed 12.12.12.12 mpls ipsec -
22.22.22.22 8 1004 C,I,R installed 12.12.12.12 biz-internet ipsec -
22.22.22.22 19 1004 C,I,R installed 11.11.11.11 mpls ipsec -
22.22.22.22 20 1004 C,I,R installed 11.11.11.11 biz-internet ipsec -
100 172.16.4.0/23 >>> 22.22.22.22 19 1004 C,I,R installed 11.11.11.11 mpls ipsec -
>>> 22.22.22.22 20 1004 C,I,R installed 11.11.11.11 biz-internet ipsec -
>>> 22.22.22.22 27 1004 C,I,R installed 12.12.12.12 mpls ipsec -
>>> 22.22.22.22 29 1004 C,I,R installed 12.12.12.12 biz-internet ipsec -
100 172.16.8.0/23 0.0.0.0 66 1003 C,Red,R installed 16.16.16.16 mpls ipsec -
C801-1002-DUAL#show ip route vrf 100
Routing Table: 100
Gateway of last resort is not set
172.16.0.0/16 is variably subnetted, 5 subnets, 2 masks
m 172.16.0.0/23 [251/0] via 13.13.13.13, 03:50:11, Sdwan-system-intf
C 172.16.2.0/23 is directly connected, GigabitEthernet3.100
L 172.16.2.2/32 is directly connected, GigabitEthernet3.100
S 172.16.4.0/23 [1/0] via 172.16.2.11
m 172.16.8.0/23 [251/0] via 16.16.16.16, 03:50:11, Sdwan-system-intf
C801-1002-DUAL#
EIGRP Serviceside configuration
We will have to redistribute OMP routes into EIGRP in order to make sure that internal switch SW1 can ping remote site switches and remote destinations / subnets
We need to have EIGRP enabled on service side LAN interfaces and also on the loopback
one network for physical interface
another network for loopback interface
Now we need to specify the interface in the GUI; this is what applies no passive-interface to it
Now we need to enable authentication
The rest of the configuration, such as hello time and hold time, is left at defaults
Authentication
Attach EIGRP template to VPN
Hello and hold time can be seen, along with the other EIGRP configuration being added
Neighborship on the router will be in the VRF
But the other remote sites are not learning EIGRP routes, because we redistributed OMP into EIGRP but not EIGRP into OMP
Now we are receiving EIGRP routes in OMP
router eigrp 1
network 172.16.2.1 0.0.0.0
network 172.16.3.1 0.0.0.0
network 172.16.16.1 0.0.0.0
redistribute connected
redistribute static route-map STATIC2EIGRP
passive-interface default
no passive-interface GigabitEthernet1/0/2
no passive-interface GigabitEthernet1/0/5
eigrp router-id 172.16.0.1
interface GigabitEthernet1/0/2
no switchport
ip address 172.16.2.1 255.255.255.252
ip authentication mode eigrp 1 md5
ip authentication key-chain eigrp 1 KEY_EIGRP
OSPF Serviceside configuration
Neighborship was not coming up so I had to add this in CLI template
interface GigabitEthernet3.100
ip ospf mtu-ignore
no logging console
platform console serial
SDWAN OSPF pushed configuration
router ospf 100 vrf 100
auto-cost reference-bandwidth 100
compatible rfc1583
distance ospf intra-area 110 inter-area 110 external 110
no local-rib-criteria
router-id 11.11.11.11
timers throttle spf 200 1000 10000
interface GigabitEthernet3.100
ip ospf 100 area 0
ip ospf authentication message-digest
ip ospf dead-interval 40
ip ospf hello-interval 10
ip ospf message-digest-key 1 md5 0 cisco
ip ospf network broadcast
ip ospf priority 1
ip ospf retransmit-interval 5
interface GigabitEthernet3.100 ! <<< coming from CLI template
ip ospf mtu-ignore
Switch OSPF configuration
router ospf 1
router-id 172.16.2.11
no auto-cost
area 0 authentication message-digest
! redistribute connected
passive-interface default
no passive-interface Vlan100
network 172.16.2.11 0.0.0.0 area 0
network 172.16.10.1 0.0.0.0 area 0
interface Vlan100
ip address 172.16.2.11 255.255.254.0
ip ospf authentication message-digest
ip ospf message-digest-key 1 md5 cisco
ip ospf mtu-ignore
Troubleshooting OMP route flow
This is a much faster way of troubleshooting routes than logging into each device CLI. It is also a quicker way of finding out whether a route is blocked by an inbound or outbound policy
See if the local router advertised it to vsmart or not
We can use a filter to limit the results
Now we go to vsmart
Check if vsmart received it
Check if vsmart advertised it to the other edges
Let's go to the end router
Check if it received the route
Always pay attention to the status column to see whether received routes have been installed. A route may not be installed because its TLOC is down or because it is less preferred. C,I,R means Chosen, Installed, Resolved
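When scanning a lot of OMP output, the status column can be decoded with a small lookup. This is a sketch, not a Cisco tool: the code table is copied from the legend printed by show sdwan omp routes, while the function names are made up.

```python
# Legend as printed at the top of "show sdwan omp routes"
OMP_STATUS_CODES = {
    "C": "chosen", "I": "installed", "Red": "redistributed",
    "Rej": "rejected", "L": "looped", "R": "resolved", "S": "stale",
    "Ext": "extranet", "Inv": "invalid", "Stg": "staged",
    "IA": "on-demand inactive", "U": "TLOC unresolved",
}


def decode_status(status):
    """Expand a status string such as 'C,I,R' into readable words."""
    return [OMP_STATUS_CODES[code] for code in status.split(",")]


def is_chosen_installed_resolved(status):
    """True only when C, I and R are all present in the status string."""
    return {"C", "I", "R"} <= set(status.split(","))


print(decode_status("C,I,R"))                    # ['chosen', 'installed', 'resolved']
print(is_chosen_installed_resolved("C,Red,R"))   # False
```

A status of C,Red,R is what a locally redistributed prefix shows, so a False result there is expected and not a fault.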
BFD configuration for transport facing IOS-XE peerings
Here we are talking about IOS-XE BFD and not the BFD that runs over the overlay tunnels. This BFD runs over the router interfaces to quickly detect link failure. When we tie this BFD to a routing protocol, the protocol can react to a change much faster than its default timers allow. BFD support started in version 17.3
CSR-1001-INET#show version
Cisco IOS XE Software, Version 17.03.05
This BFD can be tied to BGP, EIGRP and OSPF. It can be applied to physical interfaces, SVIs or sub-interfaces. It works on both the service and transport side, so we can use BFD on the BGP peering with the MPLS router to provide fast failure detection
As of 20.8 this is not supported in a feature template, so we need to use a CLI template
A test was carried out: on the MPLS PE router, the interface facing the edge node (the one carrying the BGP peering) was shut. Because this is not a direct connection, the edge node's own interface stayed up, so it could still ping its own interface IP but could not reach the next-hop IP of the MPLS router. The BGP neighborship should have gone down, but it still showed as up until the hold time of 180 seconds expired, and traffic was blackholed during that time. This is where BFD comes in
bfd-template single-hop BFD
interval min-tx 1000 min-rx 1000 multiplier 3
! BFD type single hop is used to monitor directly connected devices
! with single hop Neighbor must be directly connected
! Send BFD packets every 1 sec
! Expect to receive BFD packets every 1 sec
! If 3 packets are missed, the neighbor is declared down
interface GigabitEthernet1
bfd template BFD
! BFD will be applied on this interface
! but any protocol "originating" from this interface can use this BFD session
router bgp 10
neighbor 172.31.255.250 fall-over bfd
! telling BGP to use bfd result of the BGP interface
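To put numbers on the difference, BFD detection time is roughly the receive interval multiplied by the multiplier. The sketch below compares that with the 180 second BGP hold time from the test above. It assumes both ends use the same template values; the names are made up for illustration.

```python
def bfd_detection_time_ms(min_rx_ms, multiplier):
    """Time until the neighbor is declared down after packets stop arriving."""
    return min_rx_ms * multiplier


BGP_HOLD_TIME_MS = 180 * 1000  # hold time observed in the test above

detect_ms = bfd_detection_time_ms(min_rx_ms=1000, multiplier=3)
print(detect_ms)                      # 3000
print(BGP_HOLD_TIME_MS // detect_ms)  # 60
```

So with this template the peering drops after about 3 seconds instead of 180, which is 60 times faster.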
IOS-XE configuration
bfd-template single-hop BFD
interval min-tx 1000 min-rx 1000 multiplier 3
interface Ethernet0/1
description MPLS CE
bfd template BFD
!
interface Ethernet0/2
description MPLS CE
bfd template BFD
!
interface Ethernet0/3
description MPLS CE
bfd template BFD
!
interface Ethernet1/0
description MPLS CE
bfd template BFD
router bgp 10
template peer-policy CE
send-community both
exit-peer-policy
!
template peer-session CE
ebgp-multihop 5
timers 5 10
fall-over bfd <<<
show bfd summary
show bfd interface
show bfd neighbors
SDWAN CLI template configuration
BFD is attached to physical interface and not tunnel interface, because tunnel interface already has SDWAN version of BFD running
We could have an INET switch span the internet VLAN between the 2 edge routers, but the issue is that the ISP only provides one internet IP address to use
TLOC extension allows a router to use one of the colors (WAN transports) of another router and build IPSEC / BFD over it. All we need is a router-to-router connection, and there are a few options:
Back-to-back connections per transport, for example 1 back-to-back link on Gig4 for Internet and 1 back-to-back link on Gig5 for MPLS
Only one back-to-back connection, with sub-interfaces per transport
The least preferred option, in case you don't have any spare interfaces, is to create sub-interfaces on the LAN side of the router and use those as the TLOC extension
We are also not allowed to configure a TLOC extension on a tunnel interface, which is why we need either dedicated interfaces / sub-interfaces or sub-interfaces on the LAN interface
Notice that red marks tunnels and green marks the TLOC extension. Once a transport is extended via TLOC extension (green dot) and terminates on another router (red dot), that red dot becomes the tunnel interface / color
One thing to take care of on MPLS is that we need to advertise the TLOC subnet into the MPLS network. On the internet side we don't have to advertise the private TLOC subnet; instead everything is NATed behind the internet interface
How to install a ThousandEyes Enterprise Agent on a Cisco Catalyst 9000-series switch with Docker
The Cisco IOS XE 16.12.1 release introduced native Docker container hosting, with containers hosted on internal flash (in case of no SSD)
Containers connect to the management interface’s network using an internal bridge, and connect to the data ports using another separate internal bridge
Downloading Docker Image
Download the Docker image from the ThousandEyes dashboard and copy it to your Cisco switch using SCP, FTP, TFTP, or USB storage.
If the switch has internet access, download the image directly onto the switch. Download the package from the ThousandEyes downloads site.
Log in to the ThousandEyes platform using a login belonging to the account group that will be associated with the appliance.
Go to Network & App Synthetics > Agent Settings and click Add New Enterprise Agent.
Download the .tar file with the ThousandEyes appliance for Catalyst 9000-series switches.
Use SCP, FTP, TFTP, or USB storage to copy the signed Docker image to the switch’s flash: directory.
Your application should now be installed. You can check on it by running the following:
catalyst#sh app-hosting list
App id State
thousandeyes_enterprise_agent DEPLOYED
Configuring the Docker Container
Configure a single virtual network interface card (vNIC) for the appliance. The Docker container supports either static IP assignment (guest IP address) or a dynamic IP address
Verify that the front panel data port is running, with Layer-2 VLAN allowed from uplink:
Setup run options
Next, set up the required Docker run options to specify the account token. If you want to specify a hostname other than the switch’s name, do this here as well:
Once inside the agent shell, you can refer to the agent log for any further troubleshooting:
# tail /var/log/agent/te-agent.log
If connection or DNS resolution errors are found in the log file, your agent cannot connect to the ThousandEyes platform. Check your app-vnic configuration and make sure the agent IP can reach the internet.
Redistribution is always an import feature: when redistribution is configured under a routing protocol, that protocol imports prefixes from the protocol named in the redistribute “xxx” command. Only routes that are selected as best paths and installed in the global routing table (RIB) are eligible for redistribution from the source protocol. This stops backup paths or longer routes from being redistributed: you don't want EIGRP’s feasible successors (NOT in RIB), only the successor (installed in RIB). Similarly, OSPF may know multiple paths, but you only want the best (shortest) path from OSPF
A route must exist in the RIB in order for it to be redistributed into the destination protocol. In essence, this provides a safety mechanism by ensuring that the route is deemed reachable by the redistributing router.
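The RIB rule can be modelled as a simple filter. This is a toy model and not how IOS is implemented; the Path class, its fields and the sample next hops are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str
    protocol: str
    next_hop: str
    in_rib: bool  # True only for the best path that was installed


def eligible_for_redistribution(paths, source_protocol):
    """Keep only paths of the source protocol that are installed in the RIB."""
    return [p for p in paths if p.protocol == source_protocol and p.in_rib]


paths = [
    Path("10.13.1.0/24", "eigrp", "10.45.1.4", in_rib=True),   # successor
    Path("10.13.1.0/24", "eigrp", "10.46.1.6", in_rib=False),  # feasible successor
]

for path in eligible_for_redistribution(paths, "eigrp"):
    print(path.prefix, "via", path.next_hop)
```

Only the successor is printed; the feasible successor stays in the EIGRP topology table and is never handed to the destination protocol.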
OSPF from RIB is mentioned in the path information
show ip route
O 10.13.1.0/24 [110/3] via 10.45.1.4, 00:04:27, GigabitEthernet0/0
show ip eigrp topology 10.13.1.0/24
! Output omitted for brevity
EIGRP-IPv4 Topology Entry for AS(100)/ID(10.56.1.5) for 10.13.1.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 2560000256
Descriptor Blocks:
10.45.1.4, from Redistributed, Send flag is 0x0
External data:
AS number of route is 1
External protocol is OSPF, external metric is 3
When redistributing from a source protocol with a higher AD into a destination protocol with a lower AD, the route shown in the routing table on the redistributing router is still that of the source protocol. Redistributing a route into a protocol with a lower AD does not transfer ownership of the route
Using a route map allows for the filtering or modification of route attributes during the injection (catch and change)
Redistribution Sources:
Static – Static routes that are present in RIB
Connected – Interfaces that are in up state only
EIGRP – Any routes in EIGRP, including EIGRP-enabled connected networks.
OSPF – Any routes in the OSPF link-state database (LSDB), including OSPF-enabled interfaces.
BGP – Any routes in the Border Gateway Protocol (BGP) Loc-RIB table learned externally. Internal BGP (iBGP) routes are not redistributed by default and require the command bgp redistribute-internal for redistribution into Interior Gateway Protocol (IGP) routing protocols.
Redistribution Is Not Transitive
When redistributing between two or more routing protocols on a single router, redistribution is not transitive. In other words, when a router redistributes protocol 1 into protocol 2, and protocol 2 redistributes into protocol 3, the routes from protocol 1 are not redistributed into protocol 3. Only routes native to protocol 2 are injected into protocol 3; they do not include the routes that came from protocol 1
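The non-transitive behaviour follows from the fact that redistribution reads the RIB, where each route is still owned by its original protocol. A toy model in Python, where P1, P2, P3 and the prefixes are placeholders and the function name is made up:

```python
def advertised_by(rib, redistributions):
    """Return {protocol: prefixes it advertises} on a single router.

    rib maps prefix -> owning protocol. Each (source, dest) pair copies only
    the routes that the RIB attributes to source, never routes that source
    itself received through redistribution.
    """
    adverts = {}
    for prefix, owner in rib.items():
        adverts.setdefault(owner, set()).add(prefix)
    for source, dest in redistributions:
        native = {p for p, owner in rib.items() if owner == source}
        adverts.setdefault(dest, set()).update(native)
    return adverts


rib = {"10.1.0.0/16": "P1", "10.2.0.0/16": "P2", "10.3.0.0/16": "P3"}
adverts = advertised_by(rib, [("P1", "P2"), ("P2", "P3")])

print(sorted(adverts["P2"]))  # ['10.1.0.0/16', '10.2.0.0/16']
print(sorted(adverts["P3"]))  # ['10.2.0.0/16', '10.3.0.0/16']
```

P3 never sees 10.1.0.0/16; getting it there would need an explicit redistribution of P1 into P3.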
Seed Metrics
Seed means the default metric to start with. A redistributed route must be given a metric that the destination protocol understands, so that the destination protocol can calculate the best path for the redistributed routes. A seed metric is applied at the time of redistribution; the following are the default seed metrics per destination protocol
Protocol – Default Seed Metric
EIGRP – Infinity. Routes set with infinity are not installed into the EIGRP topology table.
OSPF – All routes are Type 2 external. Routes sourced from BGP use a seed metric of 1, and all other protocols use a seed metric of 20.
BGP – Origin is set to incomplete, the multi-exit discriminator (MED) is set to the IGP metric, and the weight is set to 32,768.
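The default seed metrics can be captured in a small lookup for quick reference. This is just a restatement of the table in code; the function name and the igp_metric parameter are made up for illustration.

```python
import math


def default_seed_metric(destination, source, igp_metric=None):
    """Default seed metric applied by the destination protocol."""
    destination, source = destination.lower(), source.lower()
    if destination == "eigrp":
        return math.inf  # not installed until a metric is defined
    if destination == "ospf":
        return 1 if source == "bgp" else 20
    if destination == "bgp":
        return igp_metric  # MED is copied from the IGP metric
    raise ValueError(f"unknown destination protocol: {destination}")


print(default_seed_metric("ospf", "bgp"))    # 1
print(default_seed_metric("ospf", "eigrp"))  # 20
print(default_seed_metric("eigrp", "ospf"))  # inf
```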
Protocol specific redistribution behavior
Every routing protocol has a unique redistribution behavior.
redistribute connected
redistribute static
redistribute eigrp as-number
redistribute ospf process-id
redistribute ospf process-id match internal << this is match without Route map
redistribute ospf process-id match external 1 << this is match without Route map
redistribute ospf process-id match external 2 << this is match without Route map
redistribute bgp as-number
redistribute xxx route-map route-map-name
Route map “match” options
Redistribute connected route-map RM -> match interface Gixxxx
Matching an interface in a route map applied to redistribute “connected”
Matches the interface on which the connected network (for example 10.1.1.0/24) exists
Because redistribute connected imports all connected interfaces on the router, matching on interface lets us selectively introduce only a few router interfaces instead of all of them
redistribute static route-map RM -> match interface Gixxxx
Matching an interface in a route map applied to redistribute “static”
ip route 10.2.2.0 255.255.255.0 GigabitEthernet0/2
match interface matches: the outgoing interface defined in the static route
This works only if the static route explicitly references an interface. It will NOT match if the static route points to a next-hop IP only, so in practice this is hardly ever used
Routes learned via a routing protocol (OSPF, EIGRP, RIP, etc.)
redistribute ospf route-map RM -> match interface Gixxxx
match interface matches: only routes learned from an OSPF neighbor on that interface
match route-type external [type-1 | type-2]
match route-type internal
match route-type local
match route-type nssa-external [type-1 | type-2]
Selects prefixes based on routing protocol characteristics:
external: External BGP, EIGRP, or OSPF
internal: Internal EIGRP or intra-area/inter-area OSPF routes
local: Locally generated BGP routes
nssa-external: NSSA external (Type 7 LSAs)
Route map set actions
set Action – Description
set as-path prepend {as-number-pattern | last-as 1-10} – Prepends the AS_Path for the network prefix with the pattern specified or uses multiple iterations from the neighboring autonomous system.
set ip next-hop {ip-address | peer-address | self} – Sets the next-hop IP address for any matching prefix. BGP dynamic manipulation requires the peer-address or self keywords.
set local-preference 0-4294967295 – Sets the BGP PA local preference.
set metric {+value | -value | value} (value parameters are 0-4294967295) – Modifies the existing metric or sets the metric for a route.
set origin {igp | incomplete} – Sets the BGP PA origin.
set weight 0-65535 – Sets the BGP PA weight.
Connected Networks
A common scenario in “service provider” networks involves the need for external Border Gateway Protocol (eBGP) peering or transit subnet to exist in the routing table of internal BGP (iBGP) routers within the autonomous system. Instead of enabling the IGP routing protocol on the external interface so that the network is installed into the routing topology, the networks could be redistributed into the Interior Gateway Protocol (IGP). Choosing not to enable a routing protocol on that link removes security concerns within the IGP.
By default, BGP redistributes only eBGP routes into IGP protocols
BGP’s default behavior requires that a route have an AS_Path to redistribute into an IGP, which means only eBGP routes are redistributed and not iBGP routes. iBGP routes are excluded because of the common assumption that the IGP routing topology already has those internal routes
BGP is designed to handle a large routing table, whereas IGPs are not. To redistribute BGP into an IGP on a router with a larger BGP table (for example, the Internet table with 800,000+ routes), you use selective route redistribution. Otherwise, the IGP can become unstable in the routing domain, which can lead to packet loss.
You can change BGP behavior so that all BGP routes are redistributed by using the BGP configuration command bgp redistribute-internal. To enable the iBGP route 192.168.3.3/32 to redistribute into OSPF, the bgp redistribute-internal command is required on R2.
Redistributing iBGP routes into an IGP could result in routing loops. A more logical solution is to advertise the network into the IGP
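The default filter and the effect of bgp redistribute-internal can be sketched as below. This is a toy model only: the function name is made up, 192.168.3.3/32 is the iBGP route mentioned above, and 192.168.4.4/32 is included here purely as a stand-in eBGP route.

```python
def bgp_routes_redistributed(bgp_table, redistribute_internal=False):
    """Return the prefixes BGP hands to the IGP.

    bgp_table is a list of (prefix, 'ebgp' or 'ibgp') tuples.
    redistribute_internal models the 'bgp redistribute-internal' command.
    """
    return [prefix for prefix, learned_via in bgp_table
            if learned_via == "ebgp" or redistribute_internal]


bgp_table = [("192.168.3.3/32", "ibgp"), ("192.168.4.4/32", "ebgp")]

print(bgp_routes_redistributed(bgp_table))
# ['192.168.4.4/32']
print(bgp_routes_redistributed(bgp_table, redistribute_internal=True))
# ['192.168.3.3/32', '192.168.4.4/32']
```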
EIGRP Behaviour
When EIGRP redistributes a route into itself, that route is given an AD of 170, is classed as an external EIGRP route and uses a default seed metric of infinity.
The default seed metric of infinity (effectively “unreachable”) prevents the route from being installed unless you manually define a metric
The default path metric can be changed from infinity to specific values for bandwidth, load, delay, reliability, and maximum transmission unit (MTU), thereby allowing for the installation into the EIGRP topology table. Routers can set the default metric with the address family configuration command
default-metric bandwidth delay reliability load mtu
!BDRLM
The metric can also be set within a route map or at the time of redistribution with the command
EIGRP to EIGRP redistribution (EIGRP AS X into EIGRP AS Y):
EIGRP does carry over the original EIGRP metric components (bandwidth, delay, reliability, load, MTU)
BUT EIGRP still treats them as external routes in the receiving AS
The routes become EIGRP external (D EX) with:
AD = 170
External tag
Original metric preserved
Example config:
R2 mutually redistributes between OSPF and EIGRP
R3 mutually redistributes between BGP and EIGRP
R1 is advertising the Loopback 0 address 192.168.1.1/32
R4 is advertising the Loopback 0 address 192.168.4.4/32
R2 uses the default-metric configuration command (available in both classic and named mode configurations)
You can also override the EIGRP seed metric with the route map command set metric bandwidth delay reliability load mtu, which sets the metric on a prefix-by-prefix basis during redistribution
R2# show ip eigrp topology
EIGRP-IPv4 Topology Table for AS(100)/ID(192.168.2.2)
Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
r - reply Status, s - sia Status
P 10.34.1.0/24, 1 successors, FD is 3072
via 10.23.1.3 (3072/2816), GigabitEthernet0/1
P 192.168.4.4/32, 1 successors, FD is 3072, tag is 65200
via 10.23.1.3 (3072/2816), GigabitEthernet0/1
P 10.12.1.0/24, 1 successors, FD is 2816
via Redistributed (2816/0)
P 192.168.1.1/32, 1 successors, FD is 2816
via Redistributed (2816/0)
P 10.23.1.0/24, 1 successors, FD is 2816
via Connected, GigabitEthernet0/1
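The FD of 2816 on the redistributed routes matches what the classic EIGRP composite metric gives for a seed of 1,000,000 kbps bandwidth and a delay of 1 (in tens of microseconds). The sketch below assumes default K values (K1 = K3 = 1, the rest 0) and assumes R2 was configured with default-metric 1000000 1 255 1 1500; the actual values used on R2 are not shown in these notes, so treat this as a plausibility check only.

```python
def eigrp_classic_metric(min_bandwidth_kbps, total_delay_tens_usec):
    """Classic EIGRP composite metric with default K values.

    metric = 256 * (10^7 / lowest bandwidth in kbps + cumulative delay)
    where delay is expressed in tens of microseconds.
    """
    return 256 * (10**7 // min_bandwidth_kbps + total_delay_tens_usec)


# Seed metric as redistributed on R2 (assumed default-metric values)
print(eigrp_classic_metric(1_000_000, 1))  # 2816

# One more Gigabit hop adds 10 microseconds of delay (1 unit)
print(eigrp_classic_metric(1_000_000, 2))  # 3072
```

The second value lines up with the (3072/2816) entries learned via 10.23.1.3 in the topology table.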
The redistributed routes are shown in the routing table with D EX and an AD of 170
R2# show ip route | begin Gateway
! Output omitted for brevity
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
C 10.12.1.0/24 is directly connected, GigabitEthernet0/0
C 10.23.1.0/24 is directly connected, GigabitEthernet0/1
D EX 10.34.1.0/24 [170/3072] via 10.23.1.3, 00:07:43, GigabitEthernet0/1
O 192.168.1.1 [110/2] via 10.12.1.1, 00:29:22, GigabitEthernet0/0
D EX 192.168.4.4 [170/3072] via 10.23.1.3, 00:08:49, GigabitEthernet0/1
R3# show ip route | begin Gateway
! Output omitted for brevity
D EX 10.12.1.0/24 [170/15360] via 10.23.1.2, 00:22:27, GigabitEthernet0/1
C 10.23.1.0/24 is directly connected, GigabitEthernet0/1
C 10.34.1.0/24 is directly connected, GigabitEthernet0/0
D EX 192.168.1.1 [170/15360] via 10.23.1.2, 00:22:27, GigabitEthernet0/1
B 192.168.4.4 [20/0] via 10.34.1.4, 00:13:21
EIGRP-to-EIGRP Redistribution
Redistributing routes between EIGRP autonomous systems preserves the path metrics during redistribution but still classes them as EIGRP external routes
R2 mutually redistributes routes between AS 10 and AS 20.
R3 mutually redistributes routes between AS 20 and AS 30.
R1 advertises the Loopback 0 interface (192.168.1.1/32) into EIGRP AS 10.
R4 advertises the Loopback 0 interface (192.168.4.4/32) into EIGRP AS 30.
The default seed metrics do not need to be set because they are maintained between EIGRP ASs. R2 is using classic configuration mode, and R3 is using EIGRP named configuration mode.
R1# show ip route eigrp | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
D EX 10.23.1.0/24 [170/3072] via 10.12.1.2, 00:09:07, GigabitEthernet0/0
D EX 10.34.1.0/24 [170/3328] via 10.12.1.2, 00:05:48, GigabitEthernet0/0
192.168.4.0/32 is subnetted, 1 subnets
D EX 192.168.4.4 [170/131328] via 10.12.1.2, 00:05:48, GigabitEthernet0/0
R4# show ip route eigrp | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
D EX 10.12.1.0/24 [170/3328] via 10.34.1.3, 00:07:31, GigabitEthernet0/0
D EX 10.23.1.0/24 [170/3072] via 10.34.1.3, 00:07:31, GigabitEthernet0/0
192.168.1.0/32 is subnetted, 1 subnets
D EX 192.168.1.1 [170/131328] via 10.34.1.3, 00:07:31, GigabitEthernet0/0
The following output shows the EIGRP topology table for the route 192.168.4.4/32 in AS 10 and AS 20. The EIGRP path metrics for bandwidth, reliability, load, and delay are the same between the autonomous systems. Notice that the feasible distance (131,072) is the same for both autonomous systems, but the reported distance (RD) is 0 for AS 10 and 130,816 for AS 20. The RD was reset when the route was redistributed into AS 10.
R2# show ip eigrp topology 192.168.4.4/32
! Output omitted for brevity
EIGRP-IPv4 Topology Entry for AS(10)/ID(192.168.2.2) for 192.168.4.4/32
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 131072
Descriptor Blocks:
10.23.1.3, from Redistributed, Send flag is 0x0
Composite metric is (131072/0), route is External
Vector metric:
Minimum bandwidth is 1000000 Kbit
Total delay is 5020 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 2
Originating router is 192.168.2.2
External data:
AS number of route is 20
External protocol is EIGRP, external metric is 131072
Administrator tag is 0 (0x00000000)
EIGRP-IPv4 Topology Entry for AS(20)/ID(192.168.2.2) for 192.168.4.4/32
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 131072
Descriptor Blocks:
10.23.1.3 (GigabitEthernet0/1), from 10.23.1.3, Send flag is 0x0
Composite metric is (131072/130816), route is External
Vector metric:
Minimum bandwidth is 1000000 Kbit
Total delay is 5020 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 2
Originating router is 192.168.3.3
External data:
AS number of route is 30
External protocol is EIGRP, external metric is 2570240
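The composite values in this output can be reproduced by hand. The sketch below assumes default K values, Gigabit Ethernet links with 10 microseconds of delay each, and a loopback delay of 5000 microseconds; the per-hop list is a hypothetical reconstruction of the path from R4's loopback toward R1, and composite is an illustrative helper name.

```python
def composite(min_bw_kbps: int, total_delay_usec: int) -> int:
    """Classic EIGRP metric with default K values: 256 * (10^7/BW + delay/10)."""
    return 256 * (10**7 // min_bw_kbps + total_delay_usec // 10)


LOOPBACK_DELAY = 5000  # microseconds, assumed delay of R4's Loopback 0
GIGE_DELAY = 10        # microseconds, assumed delay of each Gigabit Ethernet link
MIN_BW = 1_000_000     # Kbps, matches "Minimum bandwidth" in the output

delay = LOOPBACK_DELAY
for router in ("R3", "R2", "R1"):
    delay += GIGE_DELAY  # delay keeps accumulating, even across AS boundaries
    print(router, delay, composite(MIN_BW, delay))
```

Under these assumptions R3 arrives at 130816 (the RD in the AS 20 entry), R2 at 131072 (the FD in both entries), and R1 at 131328 (the metric in R1's routing table), which illustrates that the vector metrics survive each redistribution.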
OSPF Behaviour
The AD is set to 110 for intra-area, inter-area, and external OSPF routes. External OSPF routes are classified as Type 1 or Type 2, with Type 2 as the default setting. The seed metric is 1 for BGP-sourced routes and 20 for all other protocols
The exception is that if OSPF redistributes from another OSPF process, the path metric is transferred. The main differences between Type 1 and Type 2 external OSPF routes follow:
Type 1 routes are preferred over Type 2 routes.
The Type 1 metric equals the redistribution metric plus the total path metric to the autonomous system boundary router (ASBR). In other words, as the LSA propagates away from the originating ASBR, the metric increases.
The Type 2 metric equals only the redistribution metric. The metric is the same for the router next to the ASBR as for the router 30 hops away from the originating ASBR. If two Type 2 paths have exactly the same metric, the lower forwarding cost is preferred. This is the default external metric type used by OSPF.
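The three rules above can be sketched in a few lines of Python. This is a simplified model, not the full OSPF path selection algorithm; the ExternalPath class and best_path function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExternalPath:
    metric_type: int    # 1 or 2
    redist_metric: int  # metric set at the ASBR during redistribution
    cost_to_asbr: int   # internal cost from this router to the ASBR (forwarding cost)

    def route_metric(self) -> int:
        # Type 1 adds the internal cost; Type 2 reports only the redistribution metric.
        if self.metric_type == 1:
            return self.redist_metric + self.cost_to_asbr
        return self.redist_metric


def best_path(paths):
    # Order: Type 1 before Type 2, then lower metric, then lower forwarding cost.
    return min(paths, key=lambda p: (p.metric_type, p.route_metric(), p.cost_to_asbr))


near = ExternalPath(metric_type=2, redist_metric=20, cost_to_asbr=1)
far = ExternalPath(metric_type=2, redist_metric=20, cost_to_asbr=30)
print(near.route_metric(), far.route_metric())  # 20 20
print(best_path([far, near]) is near)           # True
```

Both Type 2 paths report a metric of 20, so the forwarding cost breaks the tie. A Type 1 path with the same values as far would report 50 yet still be preferred over either Type 2 path.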
For redistribution into OSPF, you use the command redistribute source-protocol [subnets] [metric metric] [metric-type {1 | 2}] [tag 0-4294967295] [route-map route-map-name].
If the optional subnets keyword is not included, only the classful networks are redistributed.
The optional tag keyword allows for a 32-bit route tag to be included on each redistributed route.
The metric and metric-type keywords can be set during redistribution.
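The defaults described above (seed metric of 20, or 1 for BGP; Type 2; classful-only without subnets) can be modeled roughly as follows. This is a sketch under stated assumptions: "classful" is taken to mean a prefix length equal to the class A/B/C default mask, the OSPF-to-OSPF metric transfer is not modeled, and redistribute_into_ospf is a hypothetical function, not an IOS API.

```python
import ipaddress


def classful_prefix_len(network: ipaddress.IPv4Network) -> int:
    first_octet = int(network.network_address) >> 24
    if first_octet < 128:
        return 8   # class A
    if first_octet < 192:
        return 16  # class B
    return 24      # class C (classes D and E are not handled in this sketch)


def redistribute_into_ospf(prefixes, source_protocol, subnets=False,
                           metric=None, metric_type=2):
    if metric is None:
        metric = 1 if source_protocol == "bgp" else 20  # default seed metric
    result = []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        if not subnets and net.prefixlen != classful_prefix_len(net):
            continue  # subnetted routes are skipped without the subnets keyword
        result.append((str(net), f"E{metric_type}", metric))
    return result


routes = ["10.12.1.0/24", "192.168.4.0/24", "192.168.1.1/32"]
print(redistribute_into_ospf(routes, "eigrp"))
print(redistribute_into_ospf(routes, "bgp", subnets=True))
```

Without subnets, only 192.168.4.0/24 survives in this model because it is the only prefix whose mask matches its class default.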
R2 mutually redistributes EIGRP into OSPF.
R3 mutually redistributes RIP into OSPF.
R1 is advertising the Loopback 0 interface 192.168.1.1/32.
R4 is advertising the Loopback 0 interface 192.168.4.4/32.
R3# show ip ospf database external
! Output omitted for brevity
OSPF Router with ID (192.168.3.3) (Process ID 2)
Type-5 AS External Link States
Link State ID: 10.12.1.0 (External Network Number )
Advertising Router: 192.168.2.2
Network Mask: /24
Metric Type: 2 (Larger than any link state path)
Metric: 20
Link State ID: 10.34.1.0 (External Network Number )
Advertising Router: 192.168.3.3
Network Mask: /24
Metric Type: 2 (Larger than any link state path)
Metric: 20
Link State ID: 192.168.1.1 (External Network Number )
Advertising Router: 10.23.1.2
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
Metric: 20
Link State ID: 192.168.4.4 (External Network Number )
Advertising Router: 192.168.3.3
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
Metric: 20
R2# show ip route | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
C 10.12.1.0/24 is directly connected, GigabitEthernet0/0
C 10.23.1.0/24 is directly connected, GigabitEthernet0/1
O E2 10.34.1.0/24 [110/20] via 10.23.1.3, 00:04:44, GigabitEthernet0/1
192.168.1.0/32 is subnetted, 1 subnets
D 192.168.1.1 [90/130816] via 10.12.1.1, 00:03:56, GigabitEthernet0/0
192.168.2.0/32 is subnetted, 1 subnets
C 192.168.2.2 is directly connected, Loopback0
O E2 192.168.4.0/24 [110/20] via 10.23.1.3, 00:04:42, GigabitEthernet0/1
R3# show ip route | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O E2 10.12.1.0/24 [110/20] via 10.23.1.2, 00:05:41, GigabitEthernet0/1
C 10.23.1.0/24 is directly connected, GigabitEthernet0/1
C 10.34.1.0/24 is directly connected, GigabitEthernet0/0
192.168.1.0/32 is subnetted, 1 subnets
O E2 192.168.1.1 [110/20] via 10.23.1.2, 00:05:41, GigabitEthernet0/1
192.168.3.0/32 is subnetted, 1 subnets
C 192.168.3.3 is directly connected, Loopback0
R 192.168.4.0/24 [120/1] via 10.34.1.4, 00:00:00, GigabitEthernet0/0
OSPF-to-OSPF Redistribution
Redistributing routes between OSPF processes preserves the path metric during redistribution, independent of the metric type, but it results in the loss of path information: the Type 1, Type 2, and Type 3 LSAs are not propagated through route redistribution, and only the metrics are maintained.
R2 redistributes routes between OSPF process 1 and OSPF process 2.
R3 redistributes routes between OSPF process 2 and OSPF process 3.
R2 and R3 set the metric type to 1 during redistribution so that the path metric increments.
R1 advertises the Loopback 0 interface 192.168.1.1/32 into OSPF process 1.
R4 advertises the Loopback 0 interface 192.168.4.4/32 into OSPF process 3.
R1# show ip route ospf | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
O E1 10.23.1.0/24 [110/2] via 10.12.1.2, 00:00:21, GigabitEthernet0/0
O E1 10.34.1.0/24 [110/3] via 10.12.1.2, 00:00:21, GigabitEthernet0/0
192.168.4.0/32 is subnetted, 1 subnets
O E1 192.168.4.4 [110/4] via 10.12.1.2, 00:00:21, GigabitEthernet0/0
R4# show ip route ospf | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
O E1 10.12.1.0/24 [110/3] via 10.34.1.3, 00:01:36, GigabitEthernet0/0
O E1 10.23.1.0/24 [110/2] via 10.34.1.3, 00:01:46, GigabitEthernet0/0
192.168.1.0/32 is subnetted, 1 subnets
O E1 192.168.1.1 [110/4] via 10.34.1.3, 02:38:49, GigabitEthernet0/0
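The metric of 4 that R1 shows for 192.168.4.4 can be traced hop by hop. The sketch below assumes every Gigabit Ethernet link and R4's loopback have an OSPF cost of 1; those costs and the helper name e1_metric are assumptions, not values confirmed by the configuration.

```python
LINK_COST = 1      # assumed OSPF cost of each Gigabit Ethernet link
LOOPBACK_COST = 1  # assumed OSPF cost of R4's Loopback 0


def e1_metric(redist_metric: int, cost_to_asbr: int) -> int:
    """Type 1 external metric: redistribution metric plus cost to the ASBR."""
    return redist_metric + cost_to_asbr


# R3's metric to 192.168.4.4 inside process 3.
r3_metric = LINK_COST + LOOPBACK_COST
# R3 redistributes into process 2 and carries the metric; R2 adds its cost to R3.
r2_metric = e1_metric(r3_metric, LINK_COST)
# R2 redistributes into process 1 and carries the metric; R1 adds its cost to R2.
r1_metric = e1_metric(r2_metric, LINK_COST)
print(r3_metric, r2_metric, r1_metric)  # 2 3 4
```

The same arithmetic gives 3 for 10.34.1.0/24 and 2 for 10.23.1.0/24 on R1, matching its routing table.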
OSPF Forwarding Address
OSPF Type 5 LSAs include a field known as the forwarding address that optimizes forwarding traffic when the source uses a shared network segment
OSPF is enabled on all the links in Area 0 except for network 10.123.1.0/24.
R1 forms an eBGP session with R2 (the ASBR), which then redistributes the AS 100 route 192.168.1.1/32 into the OSPF domain.
R3 has direct connectivity to R1 but does not establish a BGP session with R1.
The ASBR is 10.123.1.2, which is the IP address that all OSPF routers forward packets to in order to reach the 192.168.1.1/32 network.
Notice that the forwarding address is the default value 0.0.0.0
R3# show ip ospf database external
! Output omitted for brevity
Type-5 AS External Link States
Routing Bit Set on this LSA in topology Base with MTID 0
LS Type: AS External Link
Link State ID: 192.168.1.1 (External Network Number )
Advertising Router: 10.123.1.2
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
Metric: 1
Forward Address: 0.0.0.0
Network traffic from R3 (and R5) takes the suboptimal route R3→R5→R4→R2→R1. The optimal route would use the directly connected 10.123.1.0/24 network.
When the forwarding address is 0.0.0.0, all routers forward packets to the ASBR, introducing the potential for suboptimal routing.
The OSPF forwarding address changes from 0.0.0.0 “to the next-hop IP address in the source routing protocol” when:
OSPF is enabled on the ASBR’s interface that points to the next-hop IP address.
That interface is not set to passive.
That interface is a broadcast or nonbroadcast OSPF network type.
When the forwarding address is set to a value besides 0.0.0.0, the OSPF routers forward traffic only to the forwarding address.
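The three conditions listed above amount to a simple check on the ASBR's interface toward the next hop. A minimal sketch, with a hypothetical function name and string labels for the network types:

```python
def forwarding_address(next_hop: str, ospf_enabled: bool, passive: bool,
                       network_type: str) -> str:
    """Return the forwarding address an ASBR would place in the Type 5 LSA."""
    conditions_met = (
        ospf_enabled                 # OSPF runs on the interface toward the next hop
        and not passive              # the interface is not passive
        and network_type in ("broadcast", "non-broadcast")
    )
    return next_hop if conditions_met else "0.0.0.0"


print(forwarding_address("10.123.1.1", False, False, "broadcast"))  # 0.0.0.0
print(forwarding_address("10.123.1.1", True, False, "broadcast"))   # 10.123.1.1
```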
OSPF has been enabled on R2’s and R3’s Ethernet interfaces connected to the 10.123.1.0/24 network. The interface is Ethernet, which defaults to the broadcast OSPF network type, so all conditions have been met.
The following output shows the Type 5 LSA for the 192.168.1.1/32 network. Now that OSPF has been enabled on R2’s 10.123.1.2 interface and the interface is a broadcast network type, the forwarding address has changed from 0.0.0.0 to 10.123.1.1.
R3# show ip ospf database external
! Output omitted for brevity
Type-5 AS External Link States
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 192.168.1.1 (External Network Number )
Advertising Router: 10.123.1.2
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
Metric: 1
Forward Address: 10.123.1.1
The following traceroute verifies that traffic from R3 and R5 now takes the optimal path to R1 because the forwarding address has changed to 10.123.1.1.
R3# trace 192.168.1.1
Tracing the route to 192.168.1.1
1 10.123.1.1 0 msec * 1 msec
If the Type 5 LSA forwarding address is not the default value, the address must be reachable through an intra-area or inter-area OSPF route. If the route does not exist, the LSA is ignored and is not installed into the RIB.
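The reachability rule for a non-default forwarding address can be sketched as a lookup against the router's internal OSPF routes. The function name and the list-of-prefixes representation are assumptions made for illustration.

```python
import ipaddress


def lsa_usable(forward_address: str, ospf_internal_routes) -> bool:
    """Check whether a Type 5 LSA can be installed, given intra/inter-area prefixes."""
    if forward_address == "0.0.0.0":
        return True  # default value: traffic is simply sent toward the ASBR
    fa = ipaddress.ip_address(forward_address)
    return any(fa in ipaddress.ip_network(route) for route in ospf_internal_routes)


print(lsa_usable("10.123.1.1", ["10.123.1.0/24", "10.24.1.0/24"]))  # True
print(lsa_usable("10.123.1.1", ["10.24.1.0/24"]))                   # False
```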
The OSPF forwarding address optimizes forwarding toward the destination network, but return traffic is unaffected. Outbound traffic from R3 or R5 still exits at R3’s Gi0/0 interface, but return traffic is sent directly to R2.
BGP Behaviour
Redistributing routes into BGP does not require a seed metric because BGP is a path vector protocol. Redistributed routes have the following BGP attributes set.
The origin is set to incomplete.
The next-hop address is set to the next-hop IP address from the source protocol.
The weight is set to 32,768.
The MED is set to the path metric of the source protocol.
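The four attributes above can be summarized as a small mapping from an IGP route to a BGP path. This is an illustrative sketch; the function and dictionary keys are hypothetical names, and the sample values are taken from R2's route to 192.168.1.1/32.

```python
def redistribute_into_bgp(prefix: str, igp_next_hop: str, igp_metric: int) -> dict:
    """Build the BGP attributes a locally redistributed route would carry."""
    return {
        "prefix": prefix,
        "origin": "incomplete",    # displayed as "?" in the BGP table
        "next_hop": igp_next_hop,  # next hop learned from the source protocol
        "weight": 32768,           # locally originated routes get weight 32,768
        "med": igp_metric,         # MED copies the source protocol's path metric
    }


print(redistribute_into_bgp("192.168.1.1/32", "10.12.1.1", 2))
```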
R2 mutually redistributes between OSPF and BGP R3 mutually redistributes between EIGRP AS 100 and BGP R1 is advertising the Loopback 0 interface 192.168.1.1/32 R4 is advertising the Loopback 0 interface 192.168.4.4/32
Notice that R2 and R3 have used the command bgp redistribute-internal, which allows for any iBGP learned prefixes to be redistributed into OSPF or EIGRP
Verification: notice that the metric is carried over from the IGP metric during redistribution.
R2# show bgp ipv4 unicast | begin Network
Network Next Hop Metric LocPrf Weight Path
*> 10.12.1.0/24 0.0.0.0 0 32768 ?
* i 10.23.1.0/24 10.23.1.3 0 100 0 i
*> 0.0.0.0 0 32768 i
*>i 10.34.1.0/24 10.23.1.3 0 100 0 ?
*> 192.168.1.1/32 10.12.1.1 2 32768 ?
*>i 192.168.4.4/32 10.34.1.4 130816 100 0 ?
Detailed BGP path information for the redistributed routes follows. The origin is incomplete, and the BGP metric matches the IGP metric.
R2# show bgp ipv4 unicast 192.168.1.1
! Output omitted for brevity
BGP routing table entry for 192.168.1.1/32, version 3
Paths: (1 available, best #1, table default)
Local
10.12.1.1 from 0.0.0.0 (192.168.2.2)
Origin incomplete, metric 2, localpref 100, weight 32768, valid, sourced, best
R3# show bgp ipv4 unicast 192.168.4.4
BGP routing table entry for 192.168.4.4/32, version 3
Paths: (1 available, best #1, table default)
Local
10.34.1.4 from 0.0.0.0 (10.34.1.3)
Origin incomplete, metric 130816, localpref 100, weight 32768, valid, sourced,
best
Redistribution of routes from OSPF to BGP does not include OSPF external routes by default. match external [1 | 2] is required to redistribute OSPF external routes.
Redistribution and Redundancy
Highly available network designs use multiple points of redistribution to ensure redundancy, which increases the probability of route feedback. Route feedback can cause suboptimal routing or routing loops, but it can be resolved with the techniques explained in this chapter and in Chapter 12, “Advanced BGP.”
Because networks are built with redundancy, there are usually two redistribution points in the network, and the following issues may arise:
Suboptimal routing – slow connectivity
Routing loops – Total loss of service
Suboptimal routing
Whenever redistribution takes place, network visibility is lost and the seed metric is used as a starting point. This is not an issue when there is only one point of redistribution in the network, but with two or more points of redistribution it can cause suboptimal routing to destinations learned via redistribution.
Reading the topology from left to right, the better path to reach 192.168.2.0/24 is via R2, because the path via R1 crosses R1’s 10 Mbps link, which is the slowest in the topology.
When you perform redistribution on R1 and R2 (Internal Routers) into EIGRP, EIGRP does not know that the 10 Mbps link or the 1 Gbps link exists in the OSPF domain. To avoid suboptimal routing, set a lower seed metric on R2 and a higher seed metric on R1.
Same Seed Metric
If the seed metrics defined on R1 and R2 are the same, then inside the EIGRP AS, after the seed metric and the cost of the links (the 1 Gbps link and the 100 Mbps links) are added in the distance vector calculation, the route to 192.168.2.0/24 through R1 wins, and traffic is then routed over the 10 Mbps link.
You can recognize this issue in a topological diagram and also by using the traceroute command
You can solve this issue by setting a lower seed metric on R2 and a higher seed metric on R1.
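The effect of equal versus tuned seed metrics can be shown with a deliberately simplified additive model (seed metric plus internal cost). It is not the real EIGRP formula, and all the numbers below are hypothetical.

```python
def pick_exit(seed_metrics: dict, internal_cost: dict) -> str:
    """Return the redistribution point with the lowest seed + internal cost."""
    return min(seed_metrics, key=lambda r: seed_metrics[r] + internal_cost[r])


# Hypothetical cost from an internal EIGRP router to each redistribution point.
internal_cost = {"R1": 10, "R2": 20}

same_seed = {"R1": 100, "R2": 100}   # identical seed metrics on both routers
tuned_seed = {"R1": 200, "R2": 100}  # higher on R1, lower on R2

print(pick_exit(same_seed, internal_cost))   # R1
print(pick_exit(tuned_seed, internal_cost))  # R2
```

With identical seeds, only the cost inside the EIGRP domain decides the exit, so R1 wins even though its onward path is slower. Raising R1's seed shifts the choice to R2.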
In reverse when EIGRP routes (10.1.1.0/24) are redistributed into OSPF, the redistributed routes have a default seed metric of 20 and are classified as E2 routes;
Due to E2 routes, the metric remains as 20 throughout the OSPF domain, whenever E2 are used we need to keep in mind that routes