IPv6 multihoming on MikroTik RouterOS: active/standby failover between two ISPs

TL;DR: I wanted IPv6 failover between two ISPs on a MikroTik RB5009. Announcing two prefixes via SLAAC technically works, but failover is slow because Router Advertisement lifetimes don't expire on demand, and LAN clients can't see upstream routing decisions anyway. As an interim workaround, a PPPoE down script that bounces the DHCPv6-PD client forces RouterOS to deprecate the dead prefix immediately, which gets failover down to about a second. The cleaner approaches are conditional Router Advertisements (RFC 8475) or Source Address Dependent Routing with RFC 8028–aware hosts (RFC 8678). Neither is a native turnkey feature on consumer/SMB gear, but both are reachable on RouterOS through scripting and policy routing, with caveats.

The setup

I recently bought a MikroTik RB5009UPr to replace my primary FTTH ISP's router and optics. It also replaced a PoE switch, letting me reclaim some desk space. It really is a sweet bit of kit.

I then had a second fiber line from another ISP (ISP2) installed, which sadly only does IPv4. My primary ISP1 hands me a single /64 via DHCPv6-PD. Yes, just a /64. RFC 6177 says end sites should get "significantly more than a single /64" without pinning a specific minimum, so a single /64 isn't a violation of any number the RFC sets, but it's clearly against the spirit of the document. That's a fight for another day. The router advertises that prefix on the LAN via Router Advertisements, and clients build their addresses with SLAAC. That part works well.

The problem

I had the bright idea of also announcing a Hurricane Electric /48 tunnel prefix to the LAN, because having only one IPv6 route when I have two IPv4 uplinks just won't do.

This worked, except failover to the HE.net tunnel route took 5 to 10 minutes. Going back to ISP1's /64 was instant.

I had set a default IPv6 gateway in RouterOS 7.14 with a distance of 2 for the HE.net tunnel. (RouterOS distance is administrative distance, a preference between route sources, not a routing metric in the OSPF/IS-IS sense, but the effect for picking between two default routes is the same.) The trouble is that LAN clients receive both prefixes simultaneously and are completely unaware of any routing decisions made upstream. They pick a source address per RFC 6724 and then send the packet to whichever default gateway looks best, which may or may not match the prefix they sourced from.

A side clue came from the Linux client itself: its IPv6 default route carried a metric of 20100 (the routing table is reproduced below). That isn't a quirk of two-prefix configurations directly. It's NetworkManager's connectivity-check penalty: NM periodically runs an HTTP probe against a configured URL to detect whether a connection has full Internet access, and if the probe fails or comes back "limited" it adds 20000 to that connection's route metric to demote it without removing it. The base metric for the first Ethernet connection is 100, so a failing connectivity check on an Ethernet interface shows up as exactly 20100. The +20000 constant is documented in the upstream NetworkManager source: see man/NetworkManager.conf.xml in the freedesktop.org repository, which states "default-route of devices without global connectivity get a penalty of +20000 to the route-metric." (NM's connectivity check is enabled by default on Ubuntu and Fedora via a distro-shipped config snippet, typically under /usr/lib/NetworkManager/conf.d/ and overridable from /etc/NetworkManager/conf.d/; on other distros you may need to enable it explicitly.)

In my case the probe was failing intermittently because RFC 6724 source-address selection on the client was deterministic but essentially arbitrary across destinations: with two unrelated /64s on the same interface (one from ISP1 directly, one carved out of the HE /48), neither matches a typical destination longer than the other, so the algorithm falls through to lower-priority tiebreakers and effectively picks unpredictably. Probes sourced from the "wrong" prefix took an asymmetric path or got dropped, the connectivity check came back "limited," and NM applied the penalty. The 20100 metric was NM telling me, in its own language, that multi-prefix IPv6 on the LAN was broken in a way related to the same RFC 6724 mess the rest of this post is about. Here's what the resulting default route looked like:

Kernel IPv6 routing table
Destination  Next Hop  Flag  Met    Ref  Use  If
[::]/0       _gateway  UG    20100  23   0    enp106s0f3u2

Why failover was slow (and fail-back was instant)

The asymmetry has a clean root cause and it's worth spelling out, because the workaround below makes no sense without it.

Router Advertisements carry two lifetimes for each prefix: preferred lifetime and valid lifetime. When ISP1 goes down, RouterOS doesn't immediately tell anybody. The previously announced prefix sits in clients' address tables until its lifetime ticks down, or until a new RA arrives that explicitly deprecates it (preferred lifetime = 0). Meanwhile clients keep cheerfully sourcing packets from the dead prefix and dropping them into a black hole. Eventually they give up and try the HE prefix. That's your 5 to 10 minutes.

Fail-back is instant because the moment ISP1 returns, RouterOS sends a fresh RA and clients pick up the prefix immediately. There's no analogous "tell everyone right now" event on the way down unless something forces it.
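
If you want to see the lifetimes your router is actually putting into its RAs, the ND prefix menu shows them. A minimal sketch, assuming stock RouterOS 7 menu paths; the values you see depend on your own configuration and on what the DHCPv6-PD client installed:

# Prefixes currently announced in RAs, with their valid-lifetime and preferred-lifetime
/ipv6 nd prefix print detail
# Defaults applied to announced prefixes that don't set their own lifetimes
/ipv6 nd prefix default print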

Down the RFC rabbit hole

I asked the boffins in #ipv6 on Libera.chat, and rm pointed me at a stack of RFCs. The problem is largely unimplemented in consumer gear, and several of the relevant RFCs open by admitting how hard it is. RFC 8678's abstract gets right to the point: "Connecting an enterprise site to multiple ISPs over IPv6 using provider-assigned addresses is difficult without the use of some form of Network Address Translation (NAT). Much has been written on this topic over the last 10 to 15 years, but it still remains a problem without a clearly defined or widely implemented solution." That's the abstract. The first sentence. Of an RFC.

  1. RFC 5220, Problem Statement for Default Address Selection in Multi-Prefix Environments: Operational Issues of RFC 3484 Default Rules (July 2008)
  2. RFC 6724, Default Address Selection for IPv6 (September 2012). The successor to RFC 3484. Modern stacks implement this; it still doesn't fully solve multihoming.
  3. RFC 7157, IPv6 Multihoming without Network Address Translation (March 2014)
  4. RFC 8028, First-Hop Router Selection by Hosts in a Multi-Prefix Network (November 2016). Arguably the most directly relevant: it tells hosts to send a packet sourced from prefix P via the router that advertised P. Without 8028-aware hosts, even perfect router-side configuration can't fully save you.
  5. RFC 8475, Using Conditional Router Advertisements for Enterprise Multihoming (October 2018)
  6. RFC 8678, Enterprise Multihoming Using Provider-Assigned IPv6 Addresses without Network Prefix Translation: Requirements and Solutions (December 2019)
  7. RFC 8801, Provisioning Domains (PvD) for IPv6 (July 2020). The newer direction: let hosts distinguish "this prefix belongs to that uplink, with these properties."

I don't recall any IPsec RFC opening with an admission of difficulty quite that frank. These multihoming RFCs, on the other hand, have gems like (from RFC 7157):

The aforementioned document proposes a solution to this problem by introducing a new routing functionality (Source Address Dependent Routing) to solve the uplink selection issue.

SADR, or Source Address Dependent Routing, means the router picks an upstream based on which source address the client used: a packet sourced from the ISP1 prefix exits via ISP1, one sourced from the HE prefix exits via HE. Combined with RFC 8028 on the host side, this is what RFC 8678 actually recommends for the enterprise case. Neat in theory, painful in practice, and not something RouterOS does out of the box. The more I read about how complex the proposed approaches are, the sadder I got.
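
To make the idea concrete, router-side SADR on RouterOS 7 might look roughly like the following. This is an untested sketch, not a working config: the table names via-isp1 and via-he, the interface name he-tunnel, and the 2001:db8:: documentation prefixes are all hypothetical, and it assumes /routing rule accepts IPv6 source prefixes on your RouterOS version.

# Hypothetical names and documentation prefixes; untested
/routing table add name=via-isp1 fib
/routing table add name=via-he fib
# One default route per uplink, each in its own table
/ipv6 route add dst-address=::/0 gateway=pppoe-out1 routing-table=via-isp1
/ipv6 route add dst-address=::/0 gateway=he-tunnel routing-table=via-he
# Pick the table from the packet's source prefix
/routing rule add src-address=2001:db8:1::/64 action=lookup table=via-isp1
/routing rule add src-address=2001:db8:2::/64 action=lookup table=via-he

Because each table holds only a default route, these rules would also catch traffic between the two LAN prefixes, so a real config would likely need an earlier rule that sends local destinations to the main table.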

Pinning the HE tunnel to ISP2

The HE tunnel is protocol-41 over IPv4 (6in4 per RFC 4213, with manually configured endpoints, not 6to4 per RFC 3056 or its deprecated anycast relay scheme from RFC 3068), so it has to ride one of my IPv4 uplinks. RouterOS confusingly calls its 6in4 implementation /interface 6to4; that's inherited naming, not the actual protocol. I pin the tunnel to ISP2 with a /32 static route so the failure domains are independent at the router level: if ISP1 dies, HE keeps IPv6 alive over ISP2; if ISP2 dies, ISP1's native IPv6 is still announced on the LAN. (See "What this doesn't cover" below for the wrinkle on the ISP2-down case.)

/ip route add dst-address=216.218.xxx.xxx/32 gateway=<ISP2-nexthop> \
    distance=1 comment="pin HE tunnel endpoint via ISP2"

That 216.218.xxx.xxx is the IPv4 address of the specific Hurricane Electric tunnel server you were assigned when you created your tunnel. HE has tunnel servers around the world (Fremont, Ashburn, Frankfurt, Singapore, Tokyo, and so on), each with its own IPv4 address, and they don't all live in the same block; mine happens to sit in 216.218.0.0/16. Look up your assigned server under "Server IPv4 Address" on tunnelbroker.net, or read it off the local 6to4 interface with /interface 6to4 print detail; the remote-address field is what you want. Pin a /32 to that address, not to a whole HE block such as 216.218.0.0/16, since you want the route applied only to your specific tunnel endpoint and not to all of HE.
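
For orientation, a 6in4 tunnel interface of this kind might be defined roughly as follows. The interface name he-tunnel and the ISP2 address placeholder are illustrative assumptions, not values from this setup; the remote-address is the same tunnel server address that the /32 route pins.

# Hypothetical interface name; placeholders for both endpoint addresses
/interface 6to4 add name=he-tunnel mtu=1480 \
    local-address=<ISP2-public-IPv4> remote-address=216.218.xxx.xxx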

(Note that LTE/5G/Starlink generally aren't viable as a third uplink for this purpose, since they almost always sit behind CGNAT and won't forward protocol 41 even if you wanted them to.)

The workaround for slow failover

Pinning the tunnel fixes the failure-domain problem but not the slow-RA-deprecation problem. For that, I have a PPPoE down script (thanks @CGGXANNX for the original idea) that bounces the DHCPv6-PD client whenever ISP1's PPPoE link goes down:

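# Find the DHCPv6-PD client bound to ISP1's PPPoE interface.
:local id [/ipv6 dhcp-client find where interface="pppoe-out1"]

:if ([:len $id] = 0) do={
    # Nothing to bounce; log it and let the script end.
    :log warning "ppp-dhcpv6-down: no client on pppoe-out1"
} else={
    :do {
        # Disable and re-enable the client so RouterOS withdraws the delegated prefix.
        /ipv6 dhcp-client set $id disabled=yes
        :delay 250ms
        /ipv6 dhcp-client set $id disabled=no
    } on-error={
        # Never leave the client disabled if a step above fails.
        :log error "ppp-dhcpv6-down: bounce failed, forcing re-enable"
        /ipv6 dhcp-client set $id disabled=no
    }
}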

This works because tearing down the DHCPv6-PD client causes RouterOS to immediately emit a Router Advertisement carrying the previously delegated prefix with preferred-lifetime=0. That's the explicit deprecation event that RFC 4862 §5.5.3 and §5.5.4 define between them: clients stop using the prefix as a source for new connections, but the prefix stays valid (the valid-lifetime field is left at its normal multi-hour value), so existing long-lived TCP sessions aren't ripped out from under applications. RouterOS also appears to repeat the RA a few times shortly after the state change, in the spirit of the initial-advertisement burst in RFC 4861 §6.2.4 (MAX_INITIAL_RTR_ADVERTISEMENTS, defined in §10), so clients reliably hear at least one. Net effect: clients see the deprecation within roughly a second of the PPPoE down event, stop sourcing from the dead prefix, and fall back to the HE prefix on the next packet.

I confirmed this with a packet capture on the LAN side. After running the bounce script, the first RA carrying preferred-lifetime=0 for the deprecated prefix appeared on the wire about 700 ms after the disable, followed by three more in the next five seconds. The valid-lifetime field on those RAs was still ~86400, exactly as RFC 4862 prescribes. If you want to reproduce, sniff ICMPv6 type 134 on the LAN bridge with /tool sniffer across a forced bounce and look at the Prefix Information option.

Saving the script in /system script isn't enough. You have to call it from the PPP profile's on-down handler:

/ppp profile set default-airtel on-down="/system/script/run ppp-dhcpv6-down"

Substitute your own PPP profile name, and check the current value with /ppp profile print detail.

With this in place, LAN clients see the deprecation RA within about a second of the PPPoE down event and fail over on the next outgoing packet, instead of waiting 5 to 10 minutes for prefix lifetimes to time out.

What this doesn't cover: ISP2 going down

One thing the current setup handles asymmetrically is ISP2 going down. The PPPoE down script deprecates the ISP1 prefix when ISP1's link drops, so failover from native to HE is fast. But there's no analogous handler for the HE tunnel side: if ISP2 dies, the tunnel goes down at the router immediately, but LAN clients still hold their addresses from the HE prefix with the original lifetimes, and RFC 6724 source-address selection will keep picking them for a meaningful fraction of new connections. Those packets get black-holed exactly the way ISP1's prefix did before the workaround, and clients don't fully consolidate onto ISP1's native /64 until the HE prefix lifetime expires. Same 5 to 10 minute pain, just in the other direction.

The fix is similar in spirit but the mechanism is genuinely different. On the ISP1 side, the script bounces the DHCPv6-PD client, which is what received the prefix from upstream, and RouterOS reacts by withdrawing it and emitting deprecation RAs. On the HE side there's no DHCPv6-PD client to bounce; the HE /48 is statically configured, with one /64 announced via a manual /ipv6 nd prefix entry. To get the same deprecation-RA effect when the tunnel dies, you'd disable that /ipv6 nd prefix entry directly (or remove the address), triggered by either a /tool netwatch probing the HE tunnel server or the 6to4 interface's own state change. I haven't wired this up yet, so treat it as a known gap in the current implementation rather than a solved problem.
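
A rough, untested sketch of what such a handler could look like. It assumes the HE /64 is announced by an /ipv6 nd prefix entry carrying the hypothetical comment "he-lan". Instead of disabling the entry, it sets preferred-lifetime to zero, on the assumption that RouterOS then advertises the prefix as deprecated; the 4h value restored on recovery is a placeholder for whatever lifetime you normally use.

# Hypothetical: probe the HE tunnel server over IPv4 (reached via the pinned /32 route)
/tool netwatch add host=216.218.xxx.xxx interval=10s \
    down-script="/ipv6 nd prefix set [find where comment=\"he-lan\"] preferred-lifetime=0s" \
    up-script="/ipv6 nd prefix set [find where comment=\"he-lan\"] preferred-lifetime=4h"

One thing to verify before trusting it: if the pinned /32 route goes inactive when ISP2's next hop disappears, the probe may fall back to the default route via ISP1 and keep reporting the host as up.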

Cleaner alternatives

Three approaches the RFCs describe, with notes on what's possible on RouterOS today:

  1. Conditional Router Advertisements (RFC 8475). Only advertise the HE.net prefix while ISP1 is down, so clients never see two prefixes simultaneously. RouterOS has no native RFC 8475 feature, but the building blocks are there: /ipv6 nd prefix entries can be enabled and disabled from a script triggered by /tool netwatch or interface state changes. This is essentially the workaround above completed in both directions, and it's the most realistic next step on this hardware.
  2. SADR plus RFC 8028–aware hosts (RFC 8678). Router-side SADR should be achievable on RouterOS in principle via policy routing (/routing rule plus per-uplink routing tables and IPv6 mangle marks), though I haven't actually built it. The host side is the harder blocker: RFC 8028 first-hop selection is implemented well on macOS, partially on Linux, and barely on Windows. Even with a perfect router, mixed-OS LANs will still misbehave.
  3. NPTv6 / NAT66 (RFC 6296). Stateless prefix translation, or its stateful cousin NAT66. RouterOS 7 actually does have stateful NAT66 via /ipv6 firewall nat (srcnat, dstnat, and masquerade work the same as on IPv4), which gets you a working IPv6 path with IPv4-style semantics at the cost of end-to-end transparency and per-flow connection state. True stateless NPTv6 per RFC 6296 (algorithmic 1:1 prefix mapping, no state) is a distinct feature that RouterOS does not appear to expose as such; if you go this route on a MikroTik you'd be doing stateful masquerade, not NPTv6 in the strict sense. If you actually want stateless RFC 6296 behavior, that's where Linux nftables, VyOS, or Cisco/Juniper come in.
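
As an illustration of the first option, conditional RAs keyed on ISP1's state could be hung off the same PPP profile hooks as the workaround above. This is an untested simplification of RFC 8475: the HE prefix stays announced but deprecated (preferred-lifetime=0) while ISP1 is up, rather than being withdrawn entirely. The "he-lan" comment on the /ipv6 nd prefix entry and the 4h lifetime are hypothetical.

# Hypothetical: prefer the HE prefix only while ISP1's PPPoE link is down
/ppp profile set default-airtel \
    on-down="/system/script/run ppp-dhcpv6-down; /ipv6 nd prefix set [find where comment=\"he-lan\"] preferred-lifetime=4h" \
    on-up="/ipv6 nd prefix set [find where comment=\"he-lan\"] preferred-lifetime=0s"

Note that this replaces the profile's existing on-down value, which is why the bounce script is still called first.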

Caveats

This is failover, not true multihoming. Real multihoming would do per-flow ISP selection or load balancing; I'm doing active/standby.

Flow continuity across failover is also worse than IPv4 + NAT. Because the client's source address actually changes when it switches prefixes, anything that cached the old address (long-lived TCP, mDNS records, PCP mappings) breaks. This is the structural cost of doing IPv6 multihoming without NAT, and there's no clever script that fixes it.

PMTU is a footgun across the two paths. Native ISP1 IPv6 has a 1500-byte MTU; the HE 6in4 tunnel is 1480 (1500 minus the 20-byte outer IPv4 header). Cached PMTU entries and broken PMTUD anywhere along the path can make a perfectly working failover look like "the new path is dead" when actually it's just stuck on a too-large segment. Worth being aware of when debugging.
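
A common mitigation for TCP is to clamp the MSS on traffic crossing the tunnel, so that segments fit even when PMTUD is broken. With a 1480-byte tunnel MTU the largest safe MSS is 1480 - 40 (IPv6 header) - 20 (TCP header) = 1420. A sketch, assuming the tunnel interface is named he-tunnel (hypothetical); it does nothing for UDP or other non-TCP traffic:

# Clamp TCP MSS on SYNs forwarded out of or in from the tunnel
/ipv6 firewall mangle add chain=forward out-interface=he-tunnel protocol=tcp tcp-flags=syn \
    action=change-mss new-mss=clamp-to-pmtu passthrough=yes comment="clamp MSS on HE tunnel"
/ipv6 firewall mangle add chain=forward in-interface=he-tunnel protocol=tcp tcp-flags=syn \
    action=change-mss new-mss=clamp-to-pmtu passthrough=yes comment="clamp MSS on HE tunnel"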

I haven't looked at RA preference per RFC 4191 yet, and probably should. RouterOS exposes it as ra-preference (high/medium/low) on the /ipv6 nd entry, and RFC 4191-capable clients use it as a hint when choosing between default routers. Note that it is a per-router preference (and, via Route Information Options, a per-route one), not a per-prefix one, so with a single router announcing both prefixes it can't nudge clients toward ISP1's prefix on its own; it would only help if the two prefixes were announced by separate routers. Either way, it doesn't solve the cached-deprecated-prefix problem on link loss.

What's next

Three things on the to-do list. First, wire up the symmetric handler for the HE-tunnel-down case described above, so failover is fast in both directions. Second, implement RFC 8475–style conditional RAs on RouterOS, withdrawing the HE.net prefix entirely while ISP1 is healthy and only announcing it on failover, which removes the dual-prefix problem at the source. Third, put a proper measurement on the failover window with packet captures at the client, not just the router. If anyone has done either of the first two cleanly on ROS 7, I'd love to hear how.
