⊹ 91. CCIE Multicast ⊹

Multicast – Suresh

Multicast traffic is sent using UDP

The idea behind the OIL (outgoing interface list), its OIFs (outgoing interfaces) and the incoming interface is that routers in the path should forward the stream only if there are interested receivers downstream. If no one has joined the multicast group on a given path, the routers should not send traffic that way.

When a switch sees Broadcast MAC address of FF:FF:FF:FF:FF:FF, it knows the frame is a broadcast and floods it out of all ports in the same VLAN, “except the port it was received on”.

Multicast handling

If routers see destination address is a multicast address, routers treat it as multicast traffic and not like unicast traffic

Similarly if switches look at the ethernet frame and detect it to be multicast mac address then they treat it differently

Multicast IP address is never used as source address and it is always used as destination address, source IP will be the unicast IP address of the sender.

Similarly, the destination MAC at Layer 2 will be a multicast MAC address starting with “01:00:5E”

Multicast ranges

224.0.0.0 to 224.0.0.255 – Reserved for local network control traffic and TTL of 1
232.0.0.0 to 232.255.255.255 – Reserved for Source Specific Multicast (SSM).
239.0.0.0 to 239.255.255.255 – These addresses are meant to be used inside an organization

Multicast Forwarding with PIM-DM first

PIM-SM has a mandatory requirement for an RP, so to keep things simple we will start learning with PIM-DM, even though we never deploy PIM-DM due to its high control plane footprint

Source starts sending traffic to a multicast IP address
Any number of receivers can choose to subscribe to that group

You can see that r1 forwards the multicast traffic toward r3 because there is an interested receiver behind it.

Multicast Reverse Path Forwarding (RPF)

RPF is required to prevent loops and duplicate packets.

RPF looks up the source IP address of the multicast packet in the unicast routing table and asks: did this packet arrive on the interface I would use to reach the source? If the answer is yes, the RPF check passes and the multicast traffic is accepted.

If the packet arrived on any other interface, the RPF check fails and the packet is dropped.
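
The RPF check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the routing table, the third interface name and the source address 10.1.0.10 are made up for the example, and a real router does this lookup against its actual unicast RIB.

```python
import ipaddress

# Hypothetical unicast routing table: prefix -> interface used to reach it.
ROUTES = {
    "10.1.0.0/24": "Ethernet0/1",
    "10.1.1.0/24": "Ethernet0/2",
    "0.0.0.0/0": "Ethernet0/3",
}


def rpf_interface(source_ip, routes=ROUTES):
    """Return the interface the unicast table would use to reach source_ip."""
    addr = ipaddress.ip_address(source_ip)
    best = None
    for prefix, interface in routes.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            # Longest matching prefix wins, as in a normal unicast lookup.
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, interface)
    return best[1] if best else None


def rpf_check(source_ip, arrival_interface, routes=ROUTES):
    """True when the packet arrived on the interface that leads back to its source."""
    return rpf_interface(source_ip, routes) == arrival_interface


print(rpf_check("10.1.0.10", "Ethernet0/1"))  # arrived on the RPF interface: accepted
print(rpf_check("10.1.0.10", "Ethernet0/2"))  # arrived on another interface: dropped
```

The check never looks at the multicast group address, only at the source address and the arrival interface.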

IGMP – Internet Group Management Protocol

IGMP plays a key role in multicast on the LAN side of the LHR and FHR. It is carried directly in IP, but its messages never leave the local Layer 2 segment
Switches have to perform IGMP snooping

Part of IGMP also runs on host
Host uses IGMP to signal their interest in multicast traffic
Host sends IGMP Membership Report, also known as an IGMP join to the multicast group address

If PIM neighborship is established on the LHR, what happens next depends on the mode: PIM-DM starts forwarding traffic right away, while in PIM-SM the LHR sends a join toward the RP

IGMPv1 is the original version. It allows hosts to join a multicast group but does not support leaving a group explicitly. Routers rely on timeouts to figure out when receivers are no longer interested

IGMPv2 improves on this by adding an explicit leave message.

IGMPv3 adds support for Source Specific Multicast. With IGMPv3, receivers can specify not only the multicast group they want to join, but also which source they want to receive traffic from

IGMPv2

IGMP messages are carried inside IP packets using IP protocol number 2. IP is used because the router initiates the exchange with General Membership Queries and Group-Specific Queries, and the Membership Report (IGMP join) from the end host also has to travel inside IP

They are always sent with a TTL of 1

IGMPv2 General Membership query

As soon as PIM is enabled on router interface, IGMP is also enabled automatically
Router immediately starts sending IGMP General Membership Queries with source as the interface IP and destination is 224.0.0.1 (All hosts)

The router periodically sends IGMP General Membership Queries to check if receivers are still interested. As long as reports continue to arrive, the router keeps forwarding the multicast stream.

In a General Membership Query the group address field is set to 0.0.0.0

As a result, every host still interested in at least one multicast group prepares an IGMP Membership Report (IGMP join), and each host delays it by a random timer between 0 and the max response time

The host whose timer expires first responds to the General Membership Query with an IGMP Membership Report sent to the group, for example 239.1.1.1. The other hosts listening for 239.1.1.1 also see that report, so they suppress their own membership report for that group and do not send it

This report suppression mechanism keeps IGMP traffic low. Otherwise hundreds of hosts could respond to every query and burden the router’s CPU

IGMPv2 Join or IGMPv2 Membership Report

Host sends the Membership Report to the multicast group address and not to 224.0.0.2 or 224.0.0.13

IGMP Leave Message

When a host no longer wants to receive a multicast stream, it sends an IGMP Leave Group message

The source IP of this message is the unicast IP of the host, and the destination IP is the all routers multicast address 224.0.0.2.

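The router sends two group-specific queries, one second apart (the last member query count of 2 and the last member query response interval of 1000 ms shown by show ip igmp interface). If no membership report arrives before the response interval of the last query runs out, the router removes the multicast group from that interface.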

IGMP Snooping

The switch gleans (snoops) the IGMP exchange between the router and the hosts to build a mapping of multicast groups to ports. This mechanism exists because of the way a switch forwards frames using source-based learning: once a MAC address is learned, frames to that destination are no longer flooded but sent out only the port recorded in the MAC table

But a multicast MAC address is never used as a source address, so its entry is never built
By default, the switch has no idea which ports actually have interested receivers, so the safest option is to flood multicast traffic out of all ports in the VLAN. This is very inefficient because it subjects all connected hosts to the multicast traffic.

When IGMP snooping is enabled, the switch watches for IGMP membership reports from receivers and notes which ports those reports came in on

From this, the switch builds a table that maps multicast groups to specific switch ports

When multicast traffic starts flowing, the switch no longer floods it everywhere. Instead, it forwards the multicast frames only out of the ports where it has seen joins for that multicast group. Ports with no interested receivers do not get the traffic

Switch also learns which port is connected to the routers by listening on IGMP and PIM messages

The switch also listens for IGMP leave messages. When a receiver leaves a group, the switch updates its table and stops forwarding multicast traffic to that port
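
A minimal Python sketch of the table the switch builds, assuming hypothetical port names (g0/0 to g0/3) and a simplified leave that removes the port immediately, without the verification query a real switch normally sends first. Forwarding to router ports is included because the switch learns them as described above.

```python
from collections import defaultdict


class SnoopingTable:
    """Toy model of an IGMP snooping table: multicast group -> interested ports."""

    def __init__(self):
        self.group_ports = defaultdict(set)
        self.router_ports = set()

    def saw_query(self, port):
        # A query (or PIM hello) on a port means a multicast router lives there.
        self.router_ports.add(port)

    def saw_report(self, group, port):
        self.group_ports[group].add(port)

    def saw_leave(self, group, port):
        # Simplification: behaves like immediate leave, no verification query.
        self.group_ports[group].discard(port)
        if not self.group_ports[group]:
            del self.group_ports[group]

    def egress_ports(self, group, ingress_port):
        """Ports a frame for this group is sent out of, never back out the ingress port."""
        ports = self.group_ports.get(group, set()) | self.router_ports
        return sorted(ports - {ingress_port})


table = SnoopingTable()
table.saw_query("g0/3")                 # router port
table.saw_report("239.1.1.1", "g0/0")   # receiver joins
table.saw_report("239.1.1.1", "g0/2")   # another receiver joins
print(table.egress_ports("239.1.1.1", ingress_port="g0/3"))  # ['g0/0', 'g0/2']
table.saw_leave("239.1.1.1", "g0/0")
print(table.egress_ports("239.1.1.1", ingress_port="g0/3"))  # ['g0/2']
```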

IGMP behaviour deviation from standard

Remember earlier we talked about how, when a router sends an IGMP query, only one receiver replies due to the report suppression mechanism. This behaviour creates a challenge for the switch.

With this approach, sometimes the switch would not know which ports actually have active receivers for the multicast group, and it would have no way to build an accurate multicast forwarding table.

The switch changes this behaviour: it forwards the first IGMP membership report toward the router, but it does not flood that report to other hosts. The receivers on other ports never see a report to suppress, so their delay timers expire and they send their own reports. The switch sees these reports locally and learns that there are multiple interested ports for the same multicast group even though only one report was forwarded upstream to the router.

Similarly, when the switch receives an IGMP Leave message on a port, it forwards the leave to the router only when the leaving host is actually the last host in the group

For example, if two receivers are joined to the same multicast group and one of them sends a leave, the switch does not forward that leave to the router. Only when the last receiver leaves does the switch forward the leave message upstream.

Also worth mentioning that when the switch receives IGMP leave message on a port, it does not immediately assume that there are no receivers left on that port. It sends an IGMP query out of that same port to check if there are any other interested receivers. This is important in cases where multiple hosts exist behind a single port or when that port connects to another switch.

If you enable IGMP immediate leave, the switch skips this verification step and removes the port from the multicast group as soon as it sees the leave message.

Using Cisco routers as hosts for Multicast send and Multicast receive

no ip routing 
ip default-gateway x.x.x.x

Basic PIM-DM configuration

no ip pim autorp
!
ip multicast-routing
!
interface Ethernet0/1
 description r1 -> sender
 ip address 10.1.0.1 255.255.255.0
 ip pim dense-mode
!
interface Ethernet0/2
 description r1 -> [receiver_01,receiver_02]
 ip address 10.1.1.1 255.255.255.0
 ip pim dense-mode
! 

As soon as we enable PIM, IGMPv2 is automatically enabled on those interfaces.
The router immediately starts sending IGMP General Membership Query messages out of the interfaces, effectively asking, ’Is there any interested receiver on this segment?’

In the output below you can check:
-IGMP is enabled
-Timers such as the 60 second query interval
-10 second max response time
-IGMP querier router
-Multicast designated router
-R1 is the only router on the segments

r1#show ip igmp interface
!
Ethernet0/1 is up, line protocol is up
 Internet address is 10.1.0.1/24
 IGMP is enabled on interface
 Current IGMP host version is 2
 Current IGMP router version is 2
 IGMP query interval is 60 seconds
 IGMP configured query interval is 60 seconds
 IGMP robustness-variable is 2
 IGMP querier timeout is 120 seconds
 IGMP configured querier timeout is 120 seconds
 IGMP max query response time is 10 seconds
 Last member query count is 2
 Last member query response interval is 1000 ms
 Inbound IGMP access group is not set
 IGMP activity: 0 joins, 0 leaves
 Multicast routing is enabled on interface 
 Multicast TTL threshold is 0
 Multicast designated router (DR) is 10.1.0.1 (this system)
 IGMP querying router is 10.1.0.1 (this system)
 No multicast groups joined by this system
!
Ethernet0/2 is up, line protocol is up
 Internet address is 10.1.1.1/24
 IGMP is enabled on interface
 Current IGMP host version is 2
 Current IGMP router version is 2
 IGMP query interval is 60 seconds
 IGMP configured query interval is 60 seconds
 IGMP robustness-variable is 2
 IGMP querier timeout is 120 seconds
 IGMP configured querier timeout is 120 seconds
 IGMP max query response time is 10 seconds
 Last member query count is 2
 Last member query response interval is 1000 ms
 Inbound IGMP access group is not set
 IGMP activity: 0 joins, 0 leaves
 Multicast routing is enabled on interface
 Multicast TTL threshold is 0
 Multicast designated router (DR) is 10.1.1.1 (this system)
 IGMP querying router is 10.1.1.1 (this system)
 No multicast groups joined by this system 

IGMP Snooping Switch Configuration

Debugs

debug ip igmp



Multicast


It allows a source to send packets to a group of destination hosts (receivers) in an efficient manner

IGMP operates on the Layer 2 segment (on receivers’ side), between the hosts and their router
PIM operates on Layer 3 (Routed network)

Multicast starts from the source and then branches out to the receivers, top to bottom

Multicast Source sits behind a router called First-Hop router or FHR

Router that has receivers connected is called Last-Hop router or LHR
on its LAN side it will have IGMP enabled, and also a role called IGMP querier will be active on that LAN side

Between these 2 routers, PIM will operate

Switch will run IGMP snooping in order to snoop the IGMP messages

Without Multicast

the network link between R1 and R2 needs 50 Mbps of bandwidth

Stream of data is sent to special addresses called group addresses

Local network control block 224.0.0.0 to 224.0.0.255
Addresses in the local network control block are used for “control traffic” which is not forwarded outside of a local broadcast domain.

Examples of this type of multicast control traffic are:
1. all hosts in this subnet (224.0.0.1)
2. all routers in this subnet (224.0.0.2)
3. all PIM routers (224.0.0.13)
Control traffic sent to this range has a TTL of 1, so the packet expires as soon as it reaches the next-hop router. You might think this means packets for the 224.0.0.0 range cannot propagate through the network. Each packet does expire at the next router, but the “control” message is still delivered throughout the network, router by router, because every router sends its own packets with a TTL of 1 to its neighbors

224.0.0.1 – all hosts in this subnet (all hosts listen on this address)
224.0.0.2 – all routers in this subnet
224.0.0.5 – all OSPF routers (AllSPFRouters)
224.0.0.6 – all OSPF DRs (AllDRouters)
224.0.0.10 – all EIGRP routers
224.0.0.13 – all PIM routers
224.0.0.18 – VRRP
224.0.0.22 – IGMPv3
224.0.0.102 – HSRPv2 and GLBP

Internetwork control block (224.0.1.0/24):
Addresses in the internetwork control block are used for “control traffic” that may be forwarded through the Internet. Examples include Network Time Protocol (NTP) (224.0.1.1), Cisco-RP-Announce (224.0.1.39), and Cisco-RP-Discovery (224.0.1.40).

224.0.1.1 – NTP
224.0.1.39 – Cisco-RP-Announce (Auto-RP)
224.0.1.40 – Cisco-RP-Discovery (Auto-RP)

Source Specific Multicast (SSM) block
232.0.0.0 to 232.255.255.255 (232.0.0.0/8)
This is the default range used by SSM. SSM is a PIM extension
SSM forwards traffic only from the sources that receivers have explicitly chosen, for example when receivers enter the sender address in the software. Used for one-to-many applications.

Administratively scoped block 239.0.0.0 to 239.255.255.255
This range is like the private 10.0.0.0/8 range: it can be used for multicasting internally in an organisation’s network and is used for normal (non-SSM) multicasting
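
As a quick way to remember the four blocks listed above, here is a small Python sketch that classifies a group address. The block list only contains the ranges named in these notes, and the function name and labels are made up for the example.

```python
import ipaddress

# Only the blocks discussed in these notes; other assigned ranges are not listed.
BLOCKS = [
    (ipaddress.ip_network("224.0.0.0/24"), "local network control"),
    (ipaddress.ip_network("224.0.1.0/24"), "internetwork control"),
    (ipaddress.ip_network("232.0.0.0/8"), "source specific multicast (SSM)"),
    (ipaddress.ip_network("239.0.0.0/8"), "administratively scoped"),
]


def classify_group(group_ip):
    """Return the name of the block a multicast group address falls into."""
    addr = ipaddress.ip_address(group_ip)
    if not addr.is_multicast:
        raise ValueError(f"{group_ip} is not a multicast address")
    for network, name in BLOCKS:
        if addr in network:
            return name
    return "other multicast range"


for group in ("224.0.0.13", "224.0.1.39", "232.1.1.1", "239.1.1.1"):
    print(group, "->", classify_group(group))
```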

Multicast & Layer 2

In order for multicast packets to be delivered to end hosts, their NIC needs to listen to “Multicast Group’s MAC address”

The first 24 bits of a multicast MAC address always start with “01:00:5E”
This is very much like “OLOOSE”

The “01” in the first octet sets the Individual / Group bit, the lowest bit of the first byte (group means multicast group)

The remaining 23 bits of the MAC address come from the lower 23 bits of the IPv4 multicast address.

“an example of mapping the multicast IP address 239.255.1.1 into the multicast MAC address 01:00:5E:7F:01:01”

The lower 23 bits of the IP address (top row) are transferred down into the last 23 bits of the MAC address. The first 25 bits are fixed: the 24 bits of 01:00:5E plus one more bit that is always 0 (total 24 + 1 = 25 bits)

Several multicast group IP addresses can map to a single MAC address

The first 9 bits of the multicast IP address are not copied into the multicast MAC address. The first 4 of those bits are always 1110, so 5 variable bits are lost and 32 (2^5) multicast IP addresses correspond to each MAC address. Because of this overlap, a machine subscribed to one multicast address can also receive frames meant for another multicast IP address

To keep it all simple just imagine that due to 01:00:5E (0LOOSE) only last 23 bits are copied from IP address
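
The mapping can be checked with a short Python sketch. The function name is made up for the example; the logic is just the rule above: keep the low 23 bits of the group address and place them behind the fixed 01:00:5E prefix.

```python
import ipaddress


def multicast_mac(group_ip):
    """Map an IPv4 multicast group address to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group_ip)
    if not addr.is_multicast:
        raise ValueError(f"{group_ip} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # only the last 23 bits are copied
    mac = 0x01005E000000 | low23          # fixed 25-bit prefix: 01:00:5E + one 0 bit
    return ":".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))


print(multicast_mac("239.255.1.1"))  # 01:00:5E:7F:01:01
print(multicast_mac("224.127.1.1"))  # 01:00:5E:7F:01:01 (same MAC, different group)
```

The second call shows the overlap: both groups differ only in the 5 bits that are not copied, so they land on the same MAC address.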

When a receiver wants to receive a specific multicast feed, it sends an IGMP join using the multicast IP or group address
The receiver programs its interface to accept the multicast MAC group address that correlates to the group address

IGMPv2

Receivers use IGMP to join multicast groups and leave multicast groups,
When a receiver wants to receive multicast traffic from a source, it sends an IGMP join to its router. If the router does not have IGMP enabled on the interface, the request is ignored.

Most common IGMP version is IGMPv2
IGMPv3 is used by SSM

IGMPv2 uses an IP “packet” that travels to the router with a TTL of 1.
A router that decrements the TTL from 1 to 0 does not forward or route that packet any further; the packet is discarded.

Type Field

Version 2 membership report
also known as IGMP join , remember M in IGMP for Membership
used by receivers to join a multicast group
or to respond to a local router’s membership query message

Version 1 membership report
is used by receivers for backward compatibility with IGMPv1

Version 2 leave group
is used by receivers to indicate they want to stop receiving multicast traffic for a group they joined.

General membership query is periodically sent to the all-hosts group address 224.0.0.1 to see whether there are any receivers in the attached subnet. It sets the group address field to 0.0.0.0 (and not to a specific group address).

Group-specific query is sent in response to a leave group message, to the group address the receiver asked to leave. It is a test by the local router to see whether there are any more receivers on the LAN or whether the leaving host was the last receiver.

Upstream after receiving IGMP join message from LAN

Once the local router receives an IGMP join message on its LAN side, it sends a PIM join message upstream toward the source to request the multicast stream

When the local router starts receiving the multicast stream, it forwards it downstream to the subnet where the receiver that requested it resides.

Router then starts periodically sending general membership query messages into the subnet, to the all-hosts group address 224.0.0.1, to see whether any members are in the attached subnet. The general query message contains a max response time field that is set to 10 seconds by default

In response to this query, “receivers” set an internal random timer between 0 and 10 seconds (which can change if the max response time is using a non-default value). When the timer expires, receivers send membership reports for each group they belong to. If a receiver receives another receiver’s report for one of the groups it belongs to while it has a timer running, it stops its timer for the specified group and does not send a report; this is meant to suppress duplicate reports from everybody
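
Report suppression can be sketched in Python as follows. This is a simplified model under assumptions: all hosts share one segment and hear each other’s reports (no snooping switch in between), and the host names and timer values are made up for the example.

```python
import random


def random_timers(hosts, max_response_time=10.0, rng=random):
    """Give every host a random delay between 0 and the max response time."""
    return {host: rng.uniform(0, max_response_time) for host in hosts}


def elect_reporters(timers_by_group):
    """For each group, only the host whose timer expires first sends a report.

    timers_by_group: {group: {host: timer_in_seconds}}
    Returns {group: reporting_host}; every other host suppresses its report.
    """
    reporters = {}
    for group, timers in timers_by_group.items():
        if timers:
            reporters[group] = min(timers, key=timers.get)
    return reporters


timers = {
    "239.1.1.1": {"receiver_01": 7.2, "receiver_02": 3.1},
    "239.2.2.2": {"receiver_01": 1.5},
}
print(elect_reporters(timers))
# {'239.1.1.1': 'receiver_02', '239.2.2.2': 'receiver_01'}
```

A host that belongs to several groups runs one timer per group, which is why receiver_01 is suppressed for 239.1.1.1 but still reports for 239.2.2.2.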

When the leave group message is received by the router, it follows with a group-specific membership query to the group multicast address to determine whether there are any receivers interested in the group remaining in the subnet. If there are none, the router removes the IGMP state for that group.

IGMP querier election (if there is more than one IGMP router on segment)

If there is more than one router in a LAN segment, an IGMP querier election takes place to determine which router will be the querier.

IGMPv2 routers send general membership “query” messages destined to the 224.0.0.1 multicast address

When an IGMPv2 router receives such a “query” message, it knows there is another router on the network, because hosts only send “reports” and never “queries”

The router with the lowest interface IP address (Layer 3) in the LAN subnet is elected as the IGMP querier.

All the non-querier routers (routers that did not have the lowest IP and lost) start a timer that resets each time they receive a membership query from the querier router.

If the querier router stops sending membership queries for some reason (for instance, if it is powered down), a new querier election takes place. A non-querier router waits twice the query interval (the query interval is 60 seconds by default, so 120 seconds), and if it has heard no queries from the IGMP querier in that time, it triggers IGMP querier election.
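
The election rule and the timeout can be written as a small Python sketch. The router addresses are hypothetical, and the timeout follows the rule in these notes (twice the query interval), which matches the 120 second querier timeout in the show ip igmp interface output.

```python
import ipaddress


def elect_querier(router_ips):
    """The router with the numerically lowest interface IP becomes the querier."""
    return str(min(ipaddress.IPv4Address(ip) for ip in router_ips))


def querier_timeout(query_interval=60):
    """Seconds a non-querier waits without hearing a query before re-electing."""
    return 2 * query_interval


print(elect_querier(["10.1.1.2", "10.1.1.10", "10.1.1.1"]))  # 10.1.1.1
print(querier_timeout())                                     # 120
```

The addresses are compared as numbers, not as text: 10.1.1.2 is lower than 10.1.1.10 even though a plain string comparison would order them the other way.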

IGMPv3

In IGMPv2, when a receiver sends a membership report to join a multicast group, it does not specify which source it would like to receive multicast traffic from. IGMPv3 is an extension of IGMPv2 that adds support for multicast source filtering

It gives receivers the capability to pick the source they wish to accept multicast traffic from. Several senders may transmit to the same group, such as 239.0.0.12, and the receiver can choose to receive only from the sender it prefers

IGMPv3 is designed to coexist with IGMPv1 and IGMPv2

IGMPv3 sources can be mentioned by receivers in following ways:

Include mode: In this mode, the receiver announces membership to a multicast group address and provides a list of source addresses (the include list) from which it wants to receive traffic.

Exclude mode: In this mode, the receiver announces membership to a multicast group address and provides a list of source addresses (the exclude list) from which it does not want to receive traffic. The receiver then receives traffic only from sources whose IP addresses are not listed on the exclude list. To receive traffic from all sources, which is the behavior of IGMPv2, a receiver uses exclude mode membership with an empty exclude list.
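
The two modes boil down to a simple membership test, sketched here in Python. The function name and the source addresses are made up for the example.

```python
def wants_source(mode, source_list, source_ip):
    """Decide whether a receiver accepts traffic from source_ip for a group."""
    if mode == "include":
        return source_ip in source_list       # only listed sources are accepted
    if mode == "exclude":
        return source_ip not in source_list   # everything except listed sources
    raise ValueError(f"unknown filter mode: {mode}")


print(wants_source("include", {"10.1.0.10"}, "10.1.0.10"))  # True
print(wants_source("include", {"10.1.0.10"}, "10.1.0.20"))  # False
print(wants_source("exclude", set(), "10.1.0.20"))          # True, IGMPv2-like behaviour
```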

IGMPv3 is used to provide source filtering for Source Specific Multicast (SSM).

IGMP Snooping

To optimize forwarding and remove flooding, switches need a method of sending traffic only to interested receivers.

A multicast MAC address is never used as a source MAC address

Because a multicast MAC address is never seen as a source MAC address, it is never learned (source-based learning). A multicast frame entering the switch is therefore treated like an unknown destination and flooded out all ports, just like a broadcast

IGMP snooping works by examining IGMP joins sent by receivers and maintaining a table of IGMP groups and interfaces. When the switch receives a multicast frame destined for a multicast group, it forwards the frame only out the ports where IGMP joins were received for that specific multicast group

In the example, a source is sending traffic to 239.255.1.1 (01:00:5E:7F:01:01). Switch 1 receives this traffic and forwards it out only the g0/0 and g0/2 interfaces because those are the only ports that received IGMP joins for that group.

Even with IGMP snooping enabled, some multicast groups are still flooded on all ports (for example, 224.0.0.0/24 reserved addresses Local Network Control Block).

If IGMP snooping is not enabled, a static multicast entry can be manually programmed into the MAC address table, but this is not a scalable solution because it cannot react dynamically to changes; for this reason, it is not a recommended approach.

Protocol Independent Multicast

PIM uses routing table already built that is why it is called Protocol Independent

PIM works at Layer 3

The two basic types of multicast distribution trees are
source trees, also known as shortest path trees (SPTs), and shared trees (not necessarily the shortest path)

Source Trees or Shortest Path Tree SPT

A source tree or SPT or (S,G) is a multicast distribution tree where the source is the root of the tree, and branches form a distribution tree through the network towards receivers. When this tree is built, it has the shortest path through from the source to the leaves of the tree; for this reason, it is also referred to as a shortest path tree (SPT).

Forwarding state of the SPT is known by the notation (S,G), pronounced “S comma G,” where S is the sender of the multicast stream (server), and G is the multicast group address
Notice that this state is per specific source (sending server) per group. If a new server or sender appears, a new (S,G) has to be built

Shared Trees or Rendezvous Point Trees RPT

A shared tree or (*,G) is a multicast distribution tree where the root of the shared tree is not the source but a router designated as the rendezvous point (RP). For this reason, shared trees are also referred to as rendezvous point trees (RPTs)

Multicast traffic is forwarded to RP even if source is next to receivers because of shared tree

shared tree is referred to by the notation (*,G), pronounced “star comma G.”
notice that it is “any” source per group

The RP keeps a record of all the senders (while R1 only has a record of its own sender). It is also responsible for receiving all multicast streams, which are then forwarded out of the RP

One of the benefits of shared trees over source trees is that they require fewer multicast entries (for example, S,G and *,G). For instance, as more sources are introduced into the network, sending traffic to the same multicast group, the number of multicast entries for R3 and R4 always remains the same: (*,239.1.1.1)

The major drawback of shared trees is that the receivers receive traffic from all the sources sending traffic to the same multicast group. Even though the receivers’ applications can filter out the unwanted traffic, this situation still generates a lot of unwanted network traffic, wasting bandwidth. In addition, because shared trees can allow all sources in an IP multicast group, there is a potential network security issue because unintended sources could send unwanted packets to receivers.

PIM Terminology

This diagram should be read from top or source to down, all the roles are from top to bottom such as First-hop Router

Reverse Path Forwarding (RPF) interface
The interface with the lowest-cost path to the IP address of the source (SPT) or the RP. If multiple interfaces have the same cost, the interface with the highest IP (Layer 3) address is chosen as the tiebreaker

Also known as the incoming interface (IIF), because this is where incoming multicast traffic arrives. It is the only type of interface that can accept multicast traffic coming from the source, which is the same as the RPF interface. An example of this type of interface is Te0/0/1 on R3 because the shortest path to the source is known through this interface.

Another example of this type of interface is Te0/1/2 on R5 because it is the shortest path to the source. Another example is Te1/1/1 on R7 because the shortest path to the source was determined to be through R4.

RPF neighbor
The PIM neighbor or PIM enabled router on the RPF interface, if R7 is using the RPT shared tree, the RPF neighbor would be R3, which is the lowest-cost path to the RP. If it is using the SPT, R4 would be its RPF neighbor because it offers the lowest cost to the source.

A PIM join always travels upstream toward the root of the tree: the source for an SPT, or the RP for a shared tree

Downstream interface
Any interface that is used to forward multicast traffic down the tree, also known as an outgoing interface (OIF). An example of a downstream interface is R1’s Te0/0/0 interface, which forwards multicast traffic to R3’s Te0/0/1 interface.

Outgoing interface (OIF)
Any interface that is used to forward multicast traffic down the tree, also known as the downstream interface.

Outgoing interface list (OIL)
A group of OIFs that are forwarding multicast traffic to the same group. An example of this is R1’s Te0/0/0 and Te0/0/1 interfaces sending multicast traffic downstream to R3 and R4 for the same multicast group.

Last-hop router (LHR)
A router that is directly attached to the receivers, also known as a leaf router. It is responsible for sending PIM joins upstream toward the RP or to the source.

First-hop router (FHR)
A router that is directly attached to the source, also known as a root router. It is responsible for sending register messages to the RP.

Multicast Routing Information Base (MRIB)
A topology table that is also known as the multicast route table (mroute). It is built based on information from the unicast routing table and PIM. MRIB contains the
source S, group G,
incoming interfaces (IIF),
outgoing interfaces (OIFs),
and RPF neighbor information

for each multicast route as well as other multicast-related information.
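
To tie the terminology together, here is a hedged Python sketch of one multicast route entry and how it drives forwarding. It is a toy model, not how IOS stores its mroute table: the class, the source address, the incoming interface Te0/0/2 and the RPF neighbor value are assumptions for the example, while Te0/0/0 and Te0/0/1 are the OIFs on R1 mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class MrouteEntry:
    """One (S,G) or (*,G) entry: where traffic must arrive and where it is sent."""
    source: str                      # "*" for a shared-tree (*,G) entry
    group: str
    incoming_interface: str          # the RPF interface (IIF)
    rpf_neighbor: str
    outgoing_interfaces: list = field(default_factory=list)  # the OIL

    def forward(self, arrival_interface):
        """Return the OIFs a packet is replicated to, or [] if it fails RPF."""
        if arrival_interface != self.incoming_interface:
            return []
        return [oif for oif in self.outgoing_interfaces if oif != arrival_interface]


entry = MrouteEntry(
    source="10.1.0.10",              # hypothetical sender address
    group="239.1.1.1",
    incoming_interface="Te0/0/2",    # hypothetical IIF on R1
    rpf_neighbor="0.0.0.0",          # assumed: source is directly connected
    outgoing_interfaces=["Te0/0/0", "Te0/0/1"],
)
print(entry.forward("Te0/0/2"))  # ['Te0/0/0', 'Te0/0/1']
print(entry.forward("Te0/0/0"))  # [] because the packet arrived on a non-RPF interface
```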

Multicast Forwarding Information Base (MFIB)
A forwarding table that uses the MRIB to program multicast forwarding information in hardware for faster forwarding.

There are currently five PIM operating modes:

  • PIM Dense Mode (PIM-DM)
  • PIM Sparse Mode (PIM-SM)
  • PIM Sparse Dense Mode
  • PIM Source Specific Multicast (PIM-SSM)
  • PIM Bidirectional Mode (Bidir-PIM)

PIM-DM and PIM-SM are also commonly referred to as any-source multicast (ASM)

All PIM control messages use the IP protocol number 103
they are either unicast (higher TTL)
or multicast, with a TTL of 1 to the all PIM routers address 224.0.0.13

PIM Hello and Neighborship

PIM hello messages are sent by default every 30 seconds out each PIM-enabled interface, to the all PIM routers address 224.0.0.13, to learn about the neighboring PIM routers on each interface

In both PIM-SM and PIM-DM, PIM neighborship is created the same way:
when directly connected routers running PIM exchange PIM Hello messages on an interface.

The difference between Sparse Mode and Dense Mode is how multicast forwarding works afterward—not how neighbors form.

When is a PIM neighborship created?

A PIM neighborship forms when:

1. PIM is enabled on an interface
2. Another router on the same subnet also has PIM enabled
3. They exchange PIM Hello packets

These Hellos are sent to:

IPv4: 224.0.0.13
IPv6: FF02::D

Once received, routers become PIM neighbors (adjacent).

No RP, no source, no receiver required to form neighborship
Just PIM enabled + Hello exchange

PIM-SM neighborship (Sparse Mode)

In PIM-SM, neighbors are required to build multicast trees using Join messages.

What happens after neighborship forms?

Depending on router role:

LHR (Last-Hop Router) sends Join toward RP
FHR (First-Hop Router) sends Register to RP
Intermediate routers forward Join/Prune

So flow is:

Enable PIM → Hello exchange → neighborship forms → Join/Register messages start

Key idea:
Sparse Mode builds trees only where receivers exist

PIM-DM neighborship (Dense Mode)

In PIM-DM, neighborship still forms via Hello messages, but forwarding behavior differs.

After adjacency:

Routers immediately:

1. Flood multicast traffic everywhere
2. Then send Prune messages where receivers don’t exist
3. Later send Graft if receivers appear again

So flow is:

Enable PIM → Hello exchange → neighborship forms → Flood traffic → Prune unwanted paths

Key idea:
Dense Mode assumes receivers are everywhere first

Hello messages are also the mechanism used to elect a designated router (DR)

PIM Dense Mode

PIM Dense Mode (PIM-DM): dense means that there are multicast receivers in every subnet of the network, in other words, the multicast receivers of a multicast group are densely populated across the network.

For PIM-DM, the multicast tree is built by flooding traffic out every interface from the source to every Dense Mode router in the network (forced feed)

As each router receives traffic for the multicast group, it must decide whether it already has active receivers wanting to receive the multicast traffic. If so, the router remains quiet and lets the multicast flow continue. If no receivers have requested the multicast stream for the multicast group on the LHR, the router sends a prune message toward the source.

That branch of the tree is then pruned off or goes offline so that the unnecessary traffic does not continue.

Initial Flooding in PIM-DM

As each router receives the multicast traffic from its upstream neighbor via its RPF interface, it forwards the multicast traffic to all its PIM-DM neighbors. This results in some traffic arriving via a non-RPF interface, as in the case of R3 receiving traffic from R2 on its non-RPF interface. Packets arriving via the non-RPF interface are discarded because they are duplicate traffic, and a prune message is prepared

Each router uses Reverse Path Forwarding (RPF) to decide which incoming interface is the correct one for multicast traffic from a particular source.

R3 checks its unicast routing table to see “Which interface would I use to reach the source?”
Route to the source (through R1) is the best path.

So, the interface from R1 is the RPF interface.
The interface from R2 is non-RPF.

That means:
R3 accepts multicast packets coming from R1 (RPF interface)
R3 drops multicast packets received from R2 (non-RPF interface)

These non-RPF multicast flows are normal for the initial flooding of multicast traffic and are corrected by the normal PIM-DM pruning mechanism.

Pruning after Initial Flooding in PIM-DM

Prunes are sent out the RPF interface when the router has no downstream members that need the multicast traffic, as is the case for R4, which has no interested receivers.
They are also sent out non-RPF interfaces to stop the flow of multicast traffic that is arriving on a non-RPF interface, as in the case of R3.

After all unnecessary links have been pruned off, the resulting topology is an SPT from the source to the receiver.

This (S,G) state remains until the source stops transmitting. S is the source IP address and G is the group IP address. The entry also holds the incoming (RPF) interface and the OIL, which lists the outgoing interfaces (OIFs).

In PIM-DM, prunes expire after three minutes.

This causes the multicast traffic to be reflooded to all routers just as was done during the initial flooding. This periodic (every three minutes) flood and prune behavior is normal and must be taken into account when a network is designed to use PIM-DM.
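The three-minute cycle can be made concrete with a small timing sketch. It assumes, for simplicity, that a router re-sends its prune immediately after every reflood, so that expiries are spaced exactly 180 seconds apart; the function names are made up for this illustration.

```python
PRUNE_HOLD_SECONDS = 180  # three minutes, as stated above

def is_pruned(pruned_at, now):
    """True while the prune state is still active."""
    return 0 <= now - pruned_at < PRUNE_HOLD_SECONDS

def reflood_times(first_prune_at, observe_until):
    """Times at which prune state expires and traffic is reflooded.

    Assumption: the prune is re-sent immediately after each reflood.
    """
    return list(range(first_prune_at + PRUNE_HOLD_SECONDS,
                      observe_until + 1,
                      PRUNE_HOLD_SECONDS))

print(is_pruned(pruned_at=0, now=179))  # True
print(is_pruned(pruned_at=0, now=180))  # False
print(reflood_times(0, 600))            # [180, 360, 540]
```

In ten minutes a pruned branch is reflooded three times, which is the overhead to account for when sizing a PIM-DM network.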

PIM-DM is applicable to small networks where there are active receivers on every subnet of the network. Because this is rarely the case, PIM-DM is not widely deployed and not recommended for production environments.

PIM Sparse Mode

PIM Sparse Mode (PIM-SM) was designed for networks with multicast application receivers scattered throughout the network, that is, when the receivers of a multicast group are sparsely populated across the network. However, PIM-SM also works well in densely populated networks. It assumes that no receivers are interested in multicast traffic unless they explicitly request it, which is the opposite of PIM-DM.

Just like PIM-DM, PIM-SM uses the unicast routing table to perform RPF checks, and it does not care which routing protocol (including static routes) populates the unicast routing table; therefore, it is protocol independent.

PIM Shared and Source Path Trees

PIM-SM uses an explicit join model in which receivers send an IGMP join to their locally connected router, also known as the last-hop router (LHR). This join causes the LHR to send a PIM join in the direction of the root of the tree. The root is the RP in the case of a shared tree (RPT), or the first-hop router (FHR), where the source transmitting the multicast stream is connected, in the case of an SPT.

A multicast forwarding state is created as the result of these explicit joins

Multicast source sends multicast traffic to the FHR. The FHR then sends this multicast traffic to the RP, which makes the multicast source known to the RP

Receiver sends an IGMP join to the LHR to join the multicast group. The LHR then sends a PIM join (*,G) to the RP, and this forms a shared tree from the RP to the LHR. This (*,G) PIM join travels hop-by-hop to the RP, building (*,G) state on every router it passes through.

In essence, two trees are created: an SPT from the FHR to the RP (S,G) and a shared tree from the RP to the LHR (*,G).

Multicast traffic then flows from the source to the RP, and from the RP to the LHR and finally to the receiver.

Remember (and as the diagram shows) that the (S,G) and (*,G) join messages always travel in the reverse direction of the multicast traffic flow:
the (*,G) join goes from the LHR to the RP, and the (S,G) join goes from the RP to the FHR.

Receiver A, attached to the LHR, joins multicast group G using an IGMP join. The LHR knows the IP address of the RP for group G (there can be a different RP per group) and sends a (*,G) PIM join for this group to the RP.

When the source for group G sends a packet, the FHR attached to this source creates a unidirectional PIM register tunnel interface that encapsulates the multicast data received from the source in a special PIM-SM message called the register message. The encapsulated multicast data is then unicast to the RP through the PIM register tunnel. The multicast packet has to be encapsulated in a unicast packet to the RP so that it is not multicast through the network beyond the FHR. The PIM register tunnel carries traffic one way only: the multicast stream from the FHR, unicast inside the tunnel to the RP.

When the RP receives a register message, it decapsulates the multicast data packet inside the register message, and if there is no active shared tree because there are no interested receivers, the RP unicasts a register stop message directly to the registering FHR, without traversing the PIM register tunnel, instructing it to stop sending the register messages.

If there is an active shared tree for the group, the RP forwards the multicast packet down the shared tree, and it sends an (S,G) join back toward the source network S to create an (S,G) SPT. If there are multiple hops (routers) between the RP and the source, this results in (S,G) state being created in all the routers along the SPT. There will also be a (*,G) in R1 and in all of the routers between the FHR and the RP. So how can (*,G) and (S,G) coexist on the same router?

(*,G): The “shared tree” state — means “any source for group G.”
It’s built towards the Rendezvous Point (RP) in both forward (Between LHR and RP) and reverse direction (between FHR and RP)?
Used before the specific source is known or joined.

(S,G): The “source tree” state — means “specific source S for group G.”
It’s built directly toward the multicast source, here between the RP and the FHR.

1. Initial IGMP Join

Receiver A (host) on R3 sends an IGMP Join for group G.
R3 (the Last-Hop Router, LHR) sends a PIM Join (*,G) upstream — towards the RP (R2).

So:

  • R3 and R2 now have (*,G) state entries.
  • This builds the shared tree (R3 → R2 → R1).

2. Source Starts Sending

When the multicast source (at R1) begins transmitting for group G:

  • R1 (the First-Hop Router, FHR) registers with the RP using a PIM Register message.
  • The RP learns that source S exists for group G.

3. Receiver Activity Triggers (S,G) Join

When the RP receives traffic from S, it knows there are active receivers (due to the earlier (*,G) join from R3).
So, the RP sends a PIM (S,G) Join back toward the source network — i.e., towards R1.

This creates:

  • An (S,G) entry in R2 and R1 (routers along that source tree path).
  • But — crucially — the (*,G) state still remains in R1 (and the path between R1 ↔ R2).

So Why Does (*,G) Exist in R1?

Even though R1 is the first-hop router (directly connected to the source), it forms a (*,G) state because:

  1. It’s part of the shared tree path from the RP to the source (built earlier when RP didn’t yet know the source existed).
    The shared tree extends from RP → R1, so all routers on that path (including R1) must maintain (*,G) state. So this (*,G) is created on all routers around RP in 360 degrees.
  2. Transitions are gradual, not instantaneous — both trees (shared and source) coexist temporarily while the network optimizes to the SPT.

As soon as the SPT is built from the source router to the RP, multicast traffic begins to flow natively from the source S to the RP instead of being encapsulated inside the unicast PIM register tunnel. Once the RP begins receiving data natively from source S, it sends a register stop message to the source’s FHR to inform it that it can stop sending the unicast register messages inside the tunnel. At this point, multicast traffic from the source is flowing down the SPT to the RP and, from there, down the shared tree (RPT) to the receiver. The register stop message’s only function is to make the FHR stop sending register messages for the specific group; it does not stop multicast operation.

The PIM register tunnel from the FHR to the RP remains in an active up/up state even when there are no active multicast streams, and it remains active as long as there is a valid RPF path for the RP.
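The RP's reaction to a register message, as described above, comes down to two questions: is there an active shared tree for the group, and is the data already arriving natively over the SPT? The sketch below models only that decision. The function name and the action strings are invented for illustration and do not correspond to any real PIM implementation.

```python
def rp_handle_register(has_shared_tree, receiving_natively):
    """Return the actions the RP takes when a register message arrives.

    has_shared_tree:    True if receivers have joined (*,G) through this RP.
    receiving_natively: True if (S,G) traffic already arrives over the SPT.
    """
    if not has_shared_tree:
        # No interested receivers: tell the FHR to stop registering.
        return ["send register-stop to FHR"]
    if receiving_natively:
        # SPT is up, so the encapsulated copy is no longer needed.
        return ["send register-stop to FHR"]
    # Receivers exist but the SPT is not built yet.
    return [
        "forward decapsulated packet down the shared tree",
        "send (S,G) join toward the source",
    ]

print(rp_handle_register(has_shared_tree=False, receiving_natively=False))
print(rp_handle_register(has_shared_tree=True, receiving_natively=False))
print(rp_handle_register(has_shared_tree=True, receiving_natively=True))
```

In both register-stop cases only the register messages stop; the multicast flow itself continues whenever receivers exist.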

PIM SPT Switchover

PIM-SM allows the LHR to switch from the RPT (shared tree) to an SPT for a specific source

In Cisco routers, this is the default behavior, and it happens immediately after the first multicast packet is received from the RP via the RPT on the LHR, even if the shortest path to the source is through the RP.

When the LHR receives the first multicast packet from the RP, it becomes aware of the IP address of the multicast source. At this point the LHR sends an (S,G) PIM join toward the source IP address (not the RP address), following the unicast routing table. That can result in the PIM join going out of a different interface (a shorter route) than the interface through which the RP is reachable.

This PIM Join going from LHR to FHR creates (S,G) on all routers in the path

When the LHR receives a multicast packet from the source through the SPT, if the SPT RPF interface differs from the RPT RPF interface, the LHR will start receiving duplicate multicast traffic from the source; at this moment, it will switch the RPF interface to be the SPT RPF interface and send an (S,G) PIM prune message to the RP to shut off the duplicate multicast traffic coming through the RPT.

The shortest path to the source is between R1 and R3; if that link were shut down or not present, the shortest path would be through the RP, in which case an SPT switchover would still take place, even though the path used by the SPT is the same as the RPT.

The PIM SPT switchover mechanism can be disabled for all groups or for specific groups.

If the RP has no other interfaces that are interested in the multicast traffic, it sends a PIM prune message in the direction of the FHR. If there are any routers between the RP and the FHR, this prune message would travel hop-by-hop until it reaches the FHR.

What if an SPT switchover takes place and the LHR’s RPF incoming interface toward the new source is the same as toward the RP? Does the receiver then receive duplicate streams, one from the source following the SPT switchover and one from the RP?

No — the receiver should not continue to receive duplicate streams after the SPT switchover, even if the RPF incoming interface toward the new source is the same as toward the RP.

LHR:

Builds an (S,G) entry

Joins directly toward the source

Sends an (S,G) prune toward the RP. That prune stops the RP path from forwarding traffic downstream.

Designated Routers

When multiple PIM-SM routers exist on a LAN segment, PIM hello messages are used to elect a designated router (DR) to avoid sending duplicate multicast traffic into the LAN (LHR) or to the RP (FHR). The “designated” router is the one router on the LAN that receives or sends the traffic, so that a second router does not duplicate multicast on the network.

By default, the DR priority value of all PIM routers is 1, and it can be changed to force a particular router to become the DR during the DR election process, where a higher DR priority is preferred. If a router in the subnet does not support the DR priority option or if all routers have the same DR priority, the highest IP address in the subnet is used as a tiebreaker.
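The election rule above can be sketched as follows. This is a minimal illustration under stated assumptions: each neighbor is a hypothetical `(ip, priority)` pair, and a priority of `None` stands for a router that does not support the DR priority option.

```python
import ipaddress

def elect_dr(routers):
    """Pick the DR from a list of (ip, dr_priority) pairs.

    dr_priority is None for a router without the DR priority option.
    """
    def ip_key(router):
        return ipaddress.ip_address(router[0])

    if any(priority is None for _, priority in routers):
        # Priority cannot be compared fairly, so only the IP address decides.
        return max(routers, key=ip_key)[0]
    # Highest priority wins; the highest IP address breaks ties.
    return max(routers, key=lambda r: (r[1], ip_key(r)))[0]

print(elect_dr([("10.0.0.1", 1), ("10.0.0.2", 1)]))      # 10.0.0.2
print(elect_dr([("10.0.0.1", 10), ("10.0.0.2", 1)]))     # 10.0.0.1
print(elect_dr([("10.0.0.1", 10), ("10.0.0.2", None)]))  # 10.0.0.2
```

Addresses are compared numerically through `ipaddress`, so 10.0.0.10 correctly beats 10.0.0.9, which a plain string comparison would get wrong.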

On an FHR, the designated router is responsible for encapsulating in unicast register messages any multicast packets originated by a source that are destined to the RP.

On an LHR, the designated router is responsible for sending PIM join and prune messages toward the RP to inform it about host group membership, and it is also responsible for performing a PIM SPT switchover.

Without DRs, all LHRs on the same LAN segment would be capable of sending PIM joins upstream, which could result in duplicate multicast traffic arriving on the LAN.

On the source side, if multiple FHRs exist on the LAN, they all send register messages to the RP at the same time.

The default DR hold time is 3.5 times the PIM hello interval. With the default hello interval of 30 seconds, the DR hold time is 105 seconds. If no hellos are received within this interval, a new DR is elected. To reduce DR failover time, the hello interval can be reduced, with a trade-off of more control plane traffic and higher CPU utilization on the router.

Reverse Path Forwarding

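Reverse Path Forwarding is a method routers use when multicast traffic arrives on an interface: the router checks the packet’s source address against the unicast routing table to see whether this is the interface it would use to reach the source. If it is, the interface is the RPF interface; if not, it is a non-RPF interface.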

This is used to prevent loops and also avoid duplicated multicast traffic

If a router receives a multicast packet on an interface it uses to send unicast packets to the source, the packet has arrived on the RPF interface.

If the packet arrives on the RPF interface, the router forwards the packet out the interfaces present in the outgoing interface list (OIL) of the multicast routing table entry.

If the packet does not arrive on the RPF interface, the packet is discarded to prevent loops.

RPF check is performed differently for RPT and SPT

If a PIM router has an (S,G) entry present in the multicast routing table (an SPT state), the router performs the RPF check against the IP address of the “source” for the multicast packet.

If a PIM router has no explicit source-tree state, this is considered a shared-tree state. The router performs the RPF check on the address of the RP, which is known when members join the group.
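The difference between the two cases is only which address the RPF lookup is run against. A minimal sketch, assuming the multicast routing table is modeled as a hypothetical set of `(source, group)` pairs and the RP address is already known:

```python
def rpf_check_address(sg_entries, source, group, rp_address):
    """Return the address the RPF check is performed against."""
    if (source, group) in sg_entries:
        return source      # (S,G) state exists: check against the source
    return rp_address      # shared-tree state only: check against the RP

# Hypothetical state: one (S,G) entry, RP at 10.255.0.1.
sg_entries = {("10.1.1.10", "239.1.1.1")}

print(rpf_check_address(sg_entries, "10.1.1.10", "239.1.1.1", "10.255.0.1"))  # 10.1.1.10
print(rpf_check_address(sg_entries, "10.2.2.20", "239.1.1.1", "10.255.0.1"))  # 10.255.0.1
```

The returned address is then looked up in the unicast routing table to find the RPF interface, exactly as for any other RPF check.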

PIM Assert, forwarder role

The PIM assert mechanism is used to stop duplicate flows into a LAN. Was that not the function of the DR? Yes, the DR does its best from a control plane perspective to prevent duplicate flows: after the DR election, only the DR sends PIM joins, so traffic is pulled toward that router only. In some cases (discussed below), however, duplicate multicast can still arrive from two routers on the same LAN. When that happens, the condition is detected and remediated using the PIM assert mechanism.

In the figure above, both R2 and R3 receive traffic on their (one and only) RPF interface. As these routers forward multicast traffic onto the LAN, the traffic sent by R2 arrives on R3’s OIF (because they share the LAN) and the traffic sent by R3 arrives on R2’s OIF. This should not happen, so it triggers the PIM assert mechanism on both routers.

In other words, they detect a multicast packet for a specific (S,G) coming into their OIF that is also OIF for the same (S,G) (this OIF cannot be also IIF for same group)

R2 and R3 both send PIM assert messages into the LAN. Each assert message carries the following values, which the routers use to determine which router should forward the multicast traffic to that network segment:

  • Administrative distance to source
  • Metric or cost to the source (a unicast metric, since the source is a unicast address)
  • Highest IP address (tie-breaker)

Each router compares its own values with the received values. Preference is given to the PIM message with the lowest AD to the source. If a tie exists, the lowest route metric for the protocol wins; and as a final tiebreaker, the highest IP address is used.
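The comparison order can be written as a single sort key. The sketch below is illustrative only: the router records are hypothetical dictionaries, the metrics are invented, and the AD values 110 and 90 are the Cisco default administrative distances for OSPF and internal EIGRP.

```python
import ipaddress

def assert_winner(candidates):
    """Return the name of the router that wins the PIM assert.

    Order of comparison: lowest AD, then lowest metric, then highest IP.
    """
    def rank(c):
        # Negate the IP so that the highest address sorts first under min().
        return (c["ad"], c["metric"], -int(ipaddress.ip_address(c["ip"])))
    return min(candidates, key=rank)["name"]

# Hypothetical values: R2 reaches the source via OSPF, R3 via EIGRP.
r2 = {"name": "R2", "ip": "10.0.0.2", "ad": 110, "metric": 20}
r3 = {"name": "R3", "ip": "10.0.0.3", "ad": 90, "metric": 3072}

print(assert_winner([r2, r3]))  # R3
```

R3 wins on AD alone, even though its metric is numerically larger. Metrics are compared only when the ADs tie, since metrics from different routing protocols are not comparable.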

The losing router prunes its interface just as if it had received a prune on this interface, and the winning router is the PIM forwarder for the LAN.

The prune times out after three minutes on the losing router and causes it to begin forwarding on the interface again. This triggers the assert process to repeat. If the winning router were to go offline, the loser would take over the job of forwarding on to this LAN segment after its prune timed out. Remember that anything relying on Prune messages will only last 3 minutes as Prune messages expire in 3 mins

The PIM forwarder concept applies to PIM-DM and PIM-SM. It is commonly used by PIM-DM but rarely required by PIM-SM because duplicate packets can end up in a LAN only if there is some sort of routing inconsistency.

PIM-SM would not send duplicate flows into the LAN as PIM-DM would because of the way PIM-SM operates.

Corner case for PIM-SM to send duplicated Multicast in LAN

PIM-SM forwards duplicate multicast into a LAN only when there is a routing inconsistency.

R1 is the RP

R2 and R4 are running OSPF, and R3 and R5 are running EIGRP. This is the inconsistency in this network: two different routing domains on the same LAN.

R4 learns about the RP (R1) through R2, and R5 learns about the RP through R3

When R4 sends a PIM join message upstream toward the RP, it sends the message to the all PIM routers address 224.0.0.13, so both R2 and R3 receive it. In PIM-SM, however, the PIM join message includes the IP address of the upstream neighbor, also known as the RPF neighbor (there is only one: the PIM neighbor on the RPF interface).

R4’s RPF neighbor is R2, and R5’s RPF neighbor is R3

Receiver A and Receiver B join the same group

Receiver A’s IGMP join causes R4 to send a PIM join that reaches both R2 and R3 (because they are on the same LAN). Only R2 sends a PIM join to R1, because the PIM join from R4 names R2 as its RPF neighbor. R3 does not, because the join was not meant for it.

Similarly, the IGMP join from Receiver B triggers R5 to send a PIM join that reaches both R2 and R3. Because this join names R3 as the RPF neighbor, R3 is the one that sends a PIM join to R1.
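The filtering on the upstream neighbor field can be sketched like this. The LAN addresses are hypothetical, and the function models only the decision "is this join addressed to me?", not real PIM packet handling.

```python
def routers_acting_on_join(upstream_neighbor, lan_routers):
    """Return the routers that act on a join received on the shared LAN.

    Every router on the LAN hears the join (it is sent to 224.0.0.13),
    but only the one named as upstream neighbor processes it.
    """
    return [name for name, ip in lan_routers.items() if ip == upstream_neighbor]

# Hypothetical LAN addresses of the two upstream routers.
lan_routers = {"R2": "10.0.0.2", "R3": "10.0.0.3"}

print(routers_acting_on_join("10.0.0.2", lan_routers))  # join from R4: ['R2']
print(routers_acting_on_join("10.0.0.3", lan_routers))  # join from R5: ['R3']
```

Because R4 and R5 name different upstream neighbors, both R2 and R3 end up joining toward the RP, which is what sets up the duplicate streams.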

At this point, traffic starts flowing downstream from R1 into R2 and R3, and duplicate packets are then sent out into the LAN and to the receivers.

At this point, the PIM assert mechanism kicks in, R3 is elected as the PIM forwarder, and R2’s OIF interface is pruned, as illustrated in the topology on the right side.

Rendezvous Points

In PIM-SM, it is mandatory to choose one or more routers to operate as rendezvous points (RPs). An RP is a single common root placed at a chosen point of a shared distribution tree, as described earlier. The RP address can be either configured statically in each router in the multicast domain or learned dynamically through Auto-RP or a PIM bootstrap router (BSR), as described in the following sections.

Note

BSR and Auto-RP were not designed to work together and may introduce unnecessary complexities when deployed in the same network. The recommendation is not to use them concurrently.

Static RP

It is possible to statically configure RP for a multicast group range by configuring the address of the RP on every router in the multicast domain. Configuring static RPs is relatively simple and can be achieved with one or two lines of configuration on each router. If the network does not have many different RPs defined or if the RPs do not change very often, this could be the simplest method for defining RPs. It can also be an attractive option if the network is small.

However, static configuration can increase administrative overhead in a large and complex network. Every router must have the same RP address. This means changing the RP address requires reconfiguring every router. If several RPs are active for different groups, information about which RP is handling which multicast group must be known by all routers. To ensure this information is complete, multiple configuration commands may be required. If a manually configured RP fails, there is no failover procedure for another router to take over the function performed by the failed RP, and this method by itself does not provide any kind of load splitting.

Auto-RP

Auto-RP is a Cisco proprietary mechanism that automates the distribution of group-to-RP mappings in a PIM network. Auto-RP has the following benefits:

  • It is easy to use multiple RPs within a network to serve different group ranges.
  • It allows load splitting among different RPs.
  • It simplifies RP placement according to the locations of group participants.
  • It prevents inconsistent manual static RP configurations that might cause connectivity problems.
  • Multiple RPs can be used to serve different group ranges or to serve as backups for each other.

The Auto-RP mechanism operates using two basic components: candidate RPs (C-RPs) and RP mapping agents (MAs).

Candidate RPs

A C-RP advertises its willingness to be an RP via RP announcement messages. These messages are sent every RP announce interval, which is 60 seconds by default, to the reserved well-known multicast group 224.0.1.39 (Cisco-RP-Announce). The RP announcements contain the default group range 224.0.0.0/4, the C-RP’s address, and the hold time, which is three times the RP announce interval. If there are multiple C-RPs, the C-RP with the highest IP address is preferred.

Note

The RP announcement can be configured to announce specific multicast groups instead of the default group range 224.0.0.0/4. This allows for having multiple RPs in the network serving different multicast groups, which is useful for RP design.

RP Mapping Agents

RP MAs join group 224.0.1.39 to receive the RP announcements. They store the information contained in the announcements in a group-to-RP mapping cache, along with hold times. If multiple RPs advertise the same group range, the C-RP with the highest IP address is elected.
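A mapping agent's cache and election can be sketched as a dictionary keyed by group range. This is a simplified illustration: the announcement tuples and addresses are hypothetical, and hold-time expiry is reduced to the arithmetic (three times the 60-second announce interval).

```python
import ipaddress

ANNOUNCE_INTERVAL = 60               # seconds, default stated above
HOLD_TIME = 3 * ANNOUNCE_INTERVAL    # hold time carried in the announcement

def build_mapping_cache(announcements):
    """Elect one RP per group range from (c_rp_ip, group_range) announcements.

    When several C-RPs announce the same range, the highest IP address wins.
    """
    cache = {}
    for c_rp, group_range in announcements:
        current = cache.get(group_range)
        if current is None or ipaddress.ip_address(c_rp) > ipaddress.ip_address(current):
            cache[group_range] = c_rp
    return cache

announcements = [
    ("10.0.0.1", "224.0.0.0/4"),
    ("10.0.0.2", "224.0.0.0/4"),
    ("10.0.0.9", "239.1.0.0/16"),
]

print(build_mapping_cache(announcements))
# {'224.0.0.0/4': '10.0.0.2', '239.1.0.0/16': '10.0.0.9'}
print(HOLD_TIME)  # 180
```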

The RP MAs advertise the RP mappings to another well-known multicast group address, 224.0.1.40 (Cisco-RP-Discovery). These messages are advertised by default every 60 seconds or when changes are detected. The MA announcements contain the elected RPs and the group-to-RP mappings. All PIM-enabled routers join 224.0.1.40 and store the RP mappings in their private cache.

Multiple RP MAs can be configured in the same network to provide redundancy in case of failure. There is no election mechanism between them, and they act independently of each other; they all advertise identical group-to-RP mapping information to all routers in the PIM domain.

Figure 13-22 illustrates the Auto-RP mechanism, where the MA periodically receives the C-RP Cisco RP announcements to build a group-to-RP mapping cache and then periodically multicasts this information to all PIM routers in the network using Cisco RP discovery messages.

Figure 13-22 Auto-RP Mechanism

With Auto-RP, all routers automatically learn the RP information, which makes it easier to administer and update RP information. Auto-RP permits backup RPs to be configured, thus enabling an RP failover mechanism.

PIM Bootstrap Router

The bootstrap router (BSR) mechanism, described in RFC 5059, is a nonproprietary mechanism that provides a fault-tolerant, automated RP discovery and distribution mechanism.

PIM uses the BSR to discover and announce RP set information for each group to all the routers in a PIM domain. This is the same function accomplished by Auto-RP, but BSR is implemented in a different way and is not compatible with Auto-RP. BSR is an IETF standard that is part of the PIM Version 2 specification, which is defined in RFC 4601.

The RP set is a group-to-RP mapping that contains the following components:

  • Multicast group range
  • RP priority
  • RP address
  • Hash mask length
  • SM/Bidir flag

Bootstrap messages (BSMs) originate on the BSR, and they are flooded hop-by-hop by intermediate routers. When a Bootstrap message is forwarded, it is forwarded out of every PIM-enabled interface that has PIM neighbors (including the one over which the message was received). Bootstrap messages use the all PIM routers address 224.0.0.13 with a TTL of 1.

To avoid a single point of failure, multiple candidate BSRs (C-BSRs) can be deployed in a PIM domain. All C-BSRs participate in the BSR election process by sending PIM Bootstrap messages containing their BSR priority out all interfaces.

The C-BSR with the highest priority is elected as the BSR and sends Bootstrap messages to all PIM routers in the PIM domain. If the BSR priorities are equal or if the BSR priority is not configured, the C-BSR with the highest IP address is elected as the BSR.

Candidate RPs

A router that is configured as a candidate RP (C-RP) receives the Bootstrap messages, which contain the IP address of the currently active BSR. Because it knows the IP address of the BSR, the C-RP can unicast candidate RP advertisement (C-RP-Adv) messages directly to it. A C-RP-Adv message carries a list of group address and group mask field pairs. This enables a C-RP to specify the group ranges for which it is willing to be the RP.

The active BSR stores all incoming C-RP advertisements in its group-to-RP mapping cache. The BSR then sends the entire list of C-RPs from its group-to-RP mapping cache in Bootstrap messages every 60 seconds by default to all PIM routers in the entire network. As the routers receive copies of these Bootstrap messages, they update the information in their local group-to-RP mapping caches, and this allows them to have full visibility into the IP addresses of all C-RPs in the network.

Unlike with Auto-RP, where the mapping agent elects the active RP for a group range and announces the election results to the network, the BSR does not elect the active RP for a group. Instead, it leaves this task to each individual router in the network.

Each router in the network uses a well-known hashing algorithm to elect the currently active RP for a particular group range. Because each router is running the same algorithm against the same list of C-RPs, they will all select the same RP for a particular group range. C-RPs with a lower priority are preferred. If the priorities are the same, the C-RP with the highest IP address is elected as the RP for the particular group range.
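Because every router runs the same selection over the same RP set, the result is deterministic. The sketch below covers only the two criteria named in the text (lowest priority, then highest IP address) and deliberately omits the hash computation itself. The RP-set entries are hypothetical dictionaries.

```python
import ipaddress

def select_rp(rp_set, group):
    """Select the RP for `group` from a list of C-RP entries.

    Simplification: only priority and IP address are compared;
    the hash step of the real algorithm is not modeled here.
    """
    g = ipaddress.ip_address(group)
    candidates = [c for c in rp_set if g in ipaddress.ip_network(c["range"])]
    if not candidates:
        return None
    # Lowest priority value first, then highest IP address.
    best = min(candidates,
               key=lambda c: (c["priority"], -int(ipaddress.ip_address(c["address"]))))
    return best["address"]

rp_set = [
    {"address": "10.0.0.1", "range": "224.0.0.0/4", "priority": 0},
    {"address": "10.0.0.2", "range": "224.0.0.0/4", "priority": 0},
    {"address": "10.0.0.3", "range": "224.0.0.0/4", "priority": 10},
]

print(select_rp(rp_set, "239.1.1.1"))  # 10.0.0.2
```

Any router given the same `rp_set` returns the same address, which is why the BSR itself does not need to run an election.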

Figure 13-23 illustrates the BSR mechanism, where the elected BSR receives candidate RP advertisement messages from all candidate RPs in the domain, and it then sends Bootstrap messages with RP set information out all PIM-enabled interfaces, which are flooded hop-by-hop to all routers in the network.

Figure 13-23 BSR Mechanism

Multicast Listener Discovery (MLD)

Multicast Listener Discovery (MLD) is the IPv6 equivalent of IGMP (Internet Group Management Protocol) used in IPv4. It allows IPv6 hosts to signal their interest in receiving multicast traffic and enables routers to learn which multicast groups have active receivers on a link.

It operates between hosts and LHR, not between routers themselves.

MLD supports
PIM-SM (Protocol Independent Multicast – Sparse Mode)
PIM-SSM (Source-Specific Multicast)

Application → Multicast Group Join
        ↓
Host sends MLD Report
        ↓
Router updates membership table
        ↓
Router builds multicast forwarding tree (via PIM)

Step 1: Router Sends Query
Router periodically transmits: MLD General Query

Step 2: Host Responds with Report
Interested hosts reply: MLD Report: Join FF3E::1

MLD Snooping
Switches can inspect MLD messages using MLD snooping.

MLD Packet Characteristics:

  • Encapsulated inside ICMPv6
  • Sent with hop limit = 1
  • Operate only on the local link
  • Use link-local scope addresses

Multicast commands

Using Cisco routers as hosts for Multicast send and Multicast receive

no ip routing 
ip default-gateway x.x.x.x

Basic PIM-DM configuration

no ip pim autorp

more…

coming soon
