Arista VxLAN Using BGP EVPN

EVPN for Overlay

EVPN is a standards-based (RFC 7432) extension to BGP that provides a control plane for VXLAN and delivers Layer 2 and Layer 3 VPN services, much like MPLS L2VPNs and L3VPNs.

The Ethernet VPN (EVPN) control plane was created to resolve the scalability issues that networks faced with flood-and-learn VXLAN, whether it used multicast or head-end replication (HER). Every time a new VLAN is introduced, a mapping is required on all the leaf switches, which makes flood-and-learn VXLAN difficult to manage. To overcome this scalability issue (and more), a new address family called EVPN was introduced under Multiprotocol BGP.

When configuring VXLAN, BGP EVPN acts as the overlay control plane and exchanges NLRI (Network Layer Reachability Information), mainly using the following two route types.

  • Type 2 – Host MAC and IP addresses (MAC-VRF)
  • Type 5 – IP Prefix information (IP-VRF)

If you are wondering whether there are more route types than 2 and 5, the answer is yes. I will not go into the details of each one, but here is the full list, each shown with its route-type keyword.

  • auto-discovery – Ethernet auto-discovery (A-D) route (type 1)
  • ethernet-segment – Ethernet segment route (type 4)
  • imet – Inclusive multicast Ethernet tag route (type 3)
  • ip-prefix – IP prefix route (type 5)
  • mac-ip – MAC/IP advertisement route (type 2)
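
For quick reference, the keywords and type numbers in the list above can be kept in a small lookup table. This is only an illustrative Python sketch; the dictionary and helper names are my own and are not part of EOS.

```python
# EVPN route types, using the keywords and numbers from the list above.
EVPN_ROUTE_TYPES = {
    1: "auto-discovery",    # Ethernet auto-discovery (A-D) route
    2: "mac-ip",            # MAC/IP advertisement route
    3: "imet",              # Inclusive multicast Ethernet tag route
    4: "ethernet-segment",  # Ethernet segment route
    5: "ip-prefix",         # IP prefix route
}


def route_type_keyword(number: int) -> str:
    """Return the keyword for a numeric route type (hypothetical helper)."""
    try:
        return EVPN_ROUTE_TYPES[number]
    except KeyError:
        raise ValueError(f"route type {number} is not in the list") from None


# Print the list ordered by type number instead of by keyword.
for number in sorted(EVPN_ROUTE_TYPES):
    print(f"type {number}: {route_type_keyword(number)}")
```

The two types used in this lab are then simply route_type_keyword(2) and route_type_keyword(5).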

For this lab we will start with type-2 routes, where we will see how Layer 2 MAC addresses are learned through VXLAN. Further down I will show type-5 routes using an IP-VRF.

Network Topology

[Figure: Arista BGP EVPN topology diagram]

Now we configure BGP EVPN for the overlay. To do this, we configure a peer group called evpn and activate it under the new BGP address family, which is also called evpn.

For this wiki page, I will only share the configuration of Spine-1 and Leaf-1/2.

Configuration on Spine-1

Configuration on Leaf-1 & Leaf-2

The first part of this configuration is covered on the page "Arista VxLAN with Cloudvision Exchange (CVX)".

Verification

After the configuration, we verify that all our EVPN neighbors are up.

Spine-1#show bgp evpn summary 
BGP summary information for VRF default
Router identifier 10.0.250.1, local AS number 65000
Neighbor Status Codes: m - Under maintenance
  Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc
  10.0.250.11      4 65001            188       193    0    0 02:22:14 Estab   2      2
  10.0.250.12      4 65001            162       173    0    0 02:02:43 Estab   2      2
  10.0.250.13      4 65002            138       149    0    0 01:44:40 Estab   1      1
  10.0.250.14      4 65002            134       144    0    0 01:38:44 Estab   1      1
  10.0.250.15      4 65003            148       143    0    0 01:44:18 Estab   4      4
  10.0.250.16      4 65003            132       131    0    0 01:37:06 Estab   4      4

Spine-2#show bgp evpn summary 
BGP summary information for VRF default
Router identifier 10.0.250.2, local AS number 65000
Neighbor Status Codes: m - Under maintenance
  Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc
  10.0.250.11      4 65001            199       200    0    0 02:29:49 Estab   2      2
  10.0.250.12      4 65001            185       182    0    0 02:17:32 Estab   2      2
  10.0.250.13      4 65002            160       161    0    0 01:59:29 Estab   1      1
  10.0.250.14      4 65002            156       155    0    0 01:53:33 Estab   1      1
  10.0.250.15      4 65003            166       156    0    0 01:59:07 Estab   4      4
  10.0.250.16      4 65003            138       148    0    0 01:51:55 Estab   4      4
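
With six neighbors per spine it is easy to miss one that is not established. The following is a minimal Python sketch that checks the Spine-1 output above; it assumes the 11-column layout shown here, and the function and variable names are my own. Rows for neighbors that are not established may be formatted differently, so treat it as a starting point only.

```python
# Header and neighbor rows copied from the Spine-1 output above.
SPINE1_SUMMARY = """\
  Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc
  10.0.250.11      4 65001            188       193    0    0 02:22:14 Estab   2      2
  10.0.250.12      4 65001            162       173    0    0 02:02:43 Estab   2      2
  10.0.250.13      4 65002            138       149    0    0 01:44:40 Estab   1      1
  10.0.250.14      4 65002            134       144    0    0 01:38:44 Estab   1      1
  10.0.250.15      4 65003            148       143    0    0 01:44:18 Estab   4      4
  10.0.250.16      4 65003            132       131    0    0 01:37:06 Estab   4      4
"""


def parse_summary(text):
    """Return one dict per neighbor row of the summary table."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Neighbor rows have 11 columns and a numeric BGP version in column 2;
        # the header row fails the second check and is skipped.
        if len(fields) != 11 or not fields[1].isdigit():
            continue
        peers.append({
            "neighbor": fields[0],
            "asn": int(fields[2]),
            "state": fields[8],
            "pfx_rcd": int(fields[9]),
            "pfx_acc": int(fields[10]),
        })
    return peers


peers = parse_summary(SPINE1_SUMMARY)
not_established = [p["neighbor"] for p in peers if p["state"] != "Estab"]
print(f"{len(peers)} neighbors, not established: {not_established or 'none'}")
```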

Now let's ping from Host-1 to its gateway IP 10.40.40.1

Host-1#ping 10.40.40.1
PING 10.40.40.1 (10.40.40.1) 72(100) bytes of data.
80 bytes from 10.40.40.1: icmp_seq=1 ttl=64 time=72.9 ms
80 bytes from 10.40.40.1: icmp_seq=2 ttl=64 time=101 ms
80 bytes from 10.40.40.1: icmp_seq=3 ttl=64 time=103 ms
80 bytes from 10.40.40.1: icmp_seq=4 ttl=64 time=98.4 ms
80 bytes from 10.40.40.1: icmp_seq=5 ttl=64 time=103 ms

--- 10.40.40.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 50ms
rtt min/avg/max/mdev = 72.934/95.943/103.632/11.658 ms, pipe 5, ipg/ewma 12.667/84.851 ms

Similarly, Host-2 and Host-3 can ping their respective gateways configured on Leaf-3/4 and Leaf-5/6. If you are wondering how each host can ping the same gateway IP (10.40.40.1) on its own leaf pair: I am using Virtual ARP (VARP), which lets every leaf pair present the same gateway IP address. This design is commonly known as an Anycast Gateway.
Host-2#ping 10.40.40.1
PING 10.40.40.1 (10.40.40.1) 72(100) bytes of data.
80 bytes from 10.40.40.1: icmp_seq=1 ttl=64 time=292 ms
80 bytes from 10.40.40.1: icmp_seq=2 ttl=64 time=312 ms
80 bytes from 10.40.40.1: icmp_seq=3 ttl=64 time=362 ms
80 bytes from 10.40.40.1: icmp_seq=4 ttl=64 time=359 ms
80 bytes from 10.40.40.1: icmp_seq=5 ttl=64 time=415 ms

--- 10.40.40.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 48ms
rtt min/avg/max/mdev = 292.130/348.444/415.906/43.277 ms, pipe 5, ipg/ewma 12.054/323.383 ms

!-- The IP configured on SVI 40.

Host-2#show interfaces vlan 40
Vlan40 is up, line protocol is up (connected)
  Hardware is Vlan, address is 0c8b.f1ee.9bed (bia 0c8b.f1ee.9bed)
  Internet address is 10.40.40.20/24
  Broadcast address is 255.255.255.255
  IP MTU 1500 bytes (default)
  Up 2 hours, 4 minutes, 58 seconds

Similarly on Host-3
Host-3#ping 10.40.40.1
PING 10.40.40.1 (10.40.40.1) 72(100) bytes of data.
80 bytes from 10.40.40.1: icmp_seq=1 ttl=64 time=293 ms
80 bytes from 10.40.40.1: icmp_seq=2 ttl=64 time=288 ms
80 bytes from 10.40.40.1: icmp_seq=3 ttl=64 time=348 ms
80 bytes from 10.40.40.1: icmp_seq=4 ttl=64 time=351 ms
80 bytes from 10.40.40.1: icmp_seq=5 ttl=64 time=378 ms

--- 10.40.40.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 67ms
rtt min/avg/max/mdev = 288.829/332.258/378.967/35.111 ms, pipe 5, ipg/ewma 16.810/315.576 ms

Host-3#show interfaces vlan 40
Vlan40 is up, line protocol is up (connected)
  Hardware is Vlan, address is 0c8b.f17f.f1ac (bia 0c8b.f17f.f1ac)
  Internet address is 10.40.40.30/24
  Broadcast address is 255.255.255.255
  IP MTU 1500 bytes (default)
  Up 2 hours, 6 minutes, 29 seconds

Now let's ping from Host-1 to Host-2 and Host-3
Host-1#ping 10.40.40.20
PING 10.40.40.20 (10.40.40.20) 72(100) bytes of data.
80 bytes from 10.40.40.20: icmp_seq=1 ttl=64 time=105 ms
80 bytes from 10.40.40.20: icmp_seq=2 ttl=64 time=106 ms
80 bytes from 10.40.40.20: icmp_seq=3 ttl=64 time=113 ms
80 bytes from 10.40.40.20: icmp_seq=4 ttl=64 time=110 ms
80 bytes from 10.40.40.20: icmp_seq=5 ttl=64 time=111 ms

--- 10.40.40.20 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 42ms
rtt min/avg/max/mdev = 105.392/109.425/113.594/3.209 ms, pipe 5, ipg/ewma 10.568/107.564 ms

Host-1#ping 10.40.40.30
PING 10.40.40.30 (10.40.40.30) 72(100) bytes of data.
80 bytes from 10.40.40.30: icmp_seq=1 ttl=64 time=63.2 ms
80 bytes from 10.40.40.30: icmp_seq=2 ttl=64 time=54.1 ms
80 bytes from 10.40.40.30: icmp_seq=3 ttl=64 time=86.8 ms
80 bytes from 10.40.40.30: icmp_seq=4 ttl=64 time=83.2 ms
80 bytes from 10.40.40.30: icmp_seq=5 ttl=64 time=83.7 ms

--- 10.40.40.30 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 49ms
rtt min/avg/max/mdev = 54.169/74.259/86.868/13.069 ms, pipe 5, ipg/ewma 12.397/69.516 ms

As you can see, from Host-1 I can ping Host-2 and Host-3, which are in the same subnet but connected to different leaf switches.

To check the learned VTEPs:

Leaf-1#show vxlan vtep
Remote VTEPS for Vxlan1:

VTEP              Tunnel Type(s) 
----------------- -------------- 
10.0.255.12       unicast        
10.0.255.13       unicast        

Total number of remote VTEPS:  2

!-- Now let us check the vxlan address table.
Leaf-1#show vxlan address-table 
          Vxlan Mac Address Table
----------------------------------------------------------------------

VLAN  Mac Address     Type     Prt  VTEP             Moves   Last Move
----  -----------     ----     ---  ----             -----   ---------
  40  0c8b.f17f.f1ac  EVPN     Vx1  10.0.255.13      1       0:07:29 ago
  40  0c8b.f1ee.9bed  EVPN     Vx1  10.0.255.12      1       0:03:52 ago
1006  0c8b.f106.7416  EVPN     Vx1  10.0.255.13      1       2:12:43 ago
1006  0c8b.f1bd.db19  EVPN     Vx1  10.0.255.13      1       2:19:54 ago
Total Remote Mac Addresses for this criterion: 4

!-- From the above output we can see that the remote MAC addresses in VLAN 40 were learned through EVPN, each behind its remote VTEP. VLAN 1006 is dynamically assigned for Management_VRF.
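
To make the table easier to read, the entries can be grouped by VTEP and matched against the host MAC addresses seen in the "show interfaces vlan 40" outputs earlier. This is a rough Python sketch over the rows above; the parsing assumes this exact column layout and the names are my own.

```python
from collections import defaultdict

# Rows copied from "show vxlan address-table" on Leaf-1 above.
ADDRESS_TABLE = """\
VLAN  Mac Address     Type     Prt  VTEP             Moves   Last Move
----  -----------     ----     ---  ----             -----   ---------
  40  0c8b.f17f.f1ac  EVPN     Vx1  10.0.255.13      1       0:07:29 ago
  40  0c8b.f1ee.9bed  EVPN     Vx1  10.0.255.12      1       0:03:52 ago
1006  0c8b.f106.7416  EVPN     Vx1  10.0.255.13      1       2:12:43 ago
1006  0c8b.f1bd.db19  EVPN     Vx1  10.0.255.13      1       2:19:54 ago
"""

# SVI MAC addresses from the "show interfaces vlan 40" outputs of Host-2 and Host-3.
HOST_MACS = {"0c8b.f1ee.9bed": "Host-2", "0c8b.f17f.f1ac": "Host-3"}


def macs_by_vtep(text):
    """Group (vlan, mac) pairs by the remote VTEP they were learned behind."""
    table = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric VLAN ID; header and separator rows do not.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        vlan, mac, vtep = int(fields[0]), fields[1], fields[4]
        table[vtep].append((vlan, mac))
    return dict(table)


grouped = macs_by_vtep(ADDRESS_TABLE)
for vtep, entries in sorted(grouped.items()):
    for vlan, mac in entries:
        owner = HOST_MACS.get(mac, "not a host SVI")
        print(f"VTEP {vtep}  VLAN {vlan}  {mac}  {owner}")
```

Host-2 shows up behind 10.0.255.12 and Host-3 behind 10.0.255.13, which matches the two remote VTEPs listed by show vxlan vtep.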

Leaf-1#show vxlan vni 
VNI to VLAN Mapping for Vxlan1
VNI          VLAN       Source       Interface             802.1Q Tag 
------------ ---------- ------------ --------------------- ---------- 
110040       40         static       Port-Channel121       40         
                                     Vxlan1                40         

VNI to dynamic VLAN Mapping for Vxlan1
VNI          VLAN       VRF                  Source        
------------ ---------- -------------------- ------------  
101144       1006       Management_VRF       evpn

The VXLAN header includes a 24-bit field called the VXLAN Network Identifier (VNI), which allows for roughly 16 million Layer 2 domains. The VNI-to-VLAN mapping comes from the configuration applied on all the leaf switches under interface Vxlan1.

interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 40 vni 110040               <------ L2VNI
   vxlan vrf Management_VRF vni 101144          <------ For L3VNI or type 5 routes
   vxlan learn-restrict any

On Host-1 I also have an SVI for VLAN 50 with the IP 10.50.50.11.

Host-1#show ip int br
                                                                        Address 
Interface       IP Address          Status      Protocol         MTU    Owner   
--------------- ------------------- ----------- ------------- --------- ------- 
Management1     unassigned          down        down            1500            
Vlan40          10.40.40.10/24      up          up              1500            
Vlan50          10.50.50.11/24      up          up              1500

On Host-3 I have an SVI for VLAN 60 with the IP 10.60.60.33.

Host-3#show ip int br
                                                                        Address 
Interface       IP Address          Status      Protocol         MTU    Owner   
--------------- ------------------- ----------- ------------- --------- ------- 
Management1     unassigned          down        down            1500            
Vlan40          10.40.40.30/24      up          up              1500            
Vlan60          10.60.60.33/24      up          up              1500

Now I will ping from Host-1 vlan 50 to Host-3 vlan 60.

Host-1#ping 10.60.60.33 source 10.50.50.11
PING 10.60.60.33 (10.60.60.33) from 10.50.50.11 : 72(100) bytes of data.
80 bytes from 10.60.60.33: icmp_seq=1 ttl=62 time=45.1 ms
80 bytes from 10.60.60.33: icmp_seq=2 ttl=62 time=53.8 ms
80 bytes from 10.60.60.33: icmp_seq=3 ttl=62 time=65.8 ms
80 bytes from 10.60.60.33: icmp_seq=4 ttl=62 time=45.8 ms
80 bytes from 10.60.60.33: icmp_seq=5 ttl=62 time=31.9 ms

--- 10.60.60.33 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 130ms
rtt min/avg/max/mdev = 31.960/48.531/65.801/11.136 ms, pipe 3, ipg/ewma 32.709/46.309 ms

So how does this work?
I have an L3VNI configured, as shown in the Vxlan1 configuration above. It allows hosts in different VLANs to reach each other by mapping a VXLAN Network Identifier (VNI) to a VRF context; in this scenario the VRF is named Management_VRF. Routing through an L3VNI in this way is known as Symmetric IRB.

Symmetric vs Asymmetric IRB

Let me try to explain Symmetric Integrated Routing & Bridging (IRB). In the previous example, VLAN 50 is configured on Leaf-1 and Leaf-2 and VLAN 60 is configured on Leaf-5 and Leaf-6, but both are placed in Management_VRF, so they can communicate with each other. The ingress VTEP routes the packet into the VRF and sends it across the overlay with the L3VNI, and the egress VTEP routes it again into the destination VLAN. Return traffic uses the same L3VNI, and each leaf only needs the VLANs of its locally attached hosts.

If we instead offered Layer 2 VNIs only and had to move traffic between two different VLANs on separate VTEPs, the ingress VTEP would route the packet from the source VLAN into the destination VLAN and then bridge it across the overlay to the remote VTEP using the destination VLAN's VNI. The remote VTEP would simply switch it to the host. Return traffic from the remote VLAN (e.g. VLAN 60) would be routed into VLAN 50 on the remote VTEP and bridged back to the VTEP on Leaf-1/Leaf-2 using VLAN 50's VNI. Each direction therefore uses a different VNI, and every VTEP needs both VLANs configured. This concept is known as Asymmetric Integrated Routing & Bridging (IRB).
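
As a rough illustration of the two IRB models, the toy Python model below returns the VNI an ingress VTEP would put on the wire for traffic between VLAN 50 and VLAN 60. The L3VNI 101144 is the one from the Vxlan1 configuration above. The L2VNIs for VLAN 50 and VLAN 60 are not shown anywhere in this lab, so the values used here are purely hypothetical.

```python
# L3VNI taken from the Vxlan1 configuration above.
L3VNI = 101144
# HYPOTHETICAL L2VNIs: they only imitate the 1100xx pattern used for VLAN 40.
L2VNI = {50: 110050, 60: 110060}


def vni_on_wire(mode: str, src_vlan: int, dst_vlan: int) -> int:
    """Return the VNI the ingress VTEP places in the VXLAN header."""
    if src_vlan == dst_vlan:
        return L2VNI[src_vlan]   # same subnet: plain bridging in that VLAN's VNI
    if mode == "asymmetric":
        return L2VNI[dst_vlan]   # ingress routes, then bridges in the destination VLAN
    if mode == "symmetric":
        return L3VNI             # ingress and egress both route via the VRF's L3VNI
    raise ValueError(f"unknown IRB mode: {mode}")


for mode in ("asymmetric", "symmetric"):
    forward = vni_on_wire(mode, 50, 60)   # Host-1 (VLAN 50) to Host-3 (VLAN 60)
    reverse = vni_on_wire(mode, 60, 50)   # the reply
    same = "same VNI" if forward == reverse else "different VNIs"
    print(f"{mode}: forward={forward} reverse={reverse} ({same})")
```

In the asymmetric case the two directions carry different VNIs, which is where the name comes from. In the symmetric case both directions carry the L3VNI.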

Configuration on Leaf-1
router bgp 65001
   vrf Management_VRF
      rd 10.0.250.11:1
      route-target import evpn 1:101144
      route-target export evpn 1:101144
      redistribute connected
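
The same VRF stanza is needed on every leaf, with only the AS number and the RD changing. Below is a small Python sketch that renders it. It assumes the conventions visible in this lab (RD = router ID followed by ":1", route target = "1:" followed by the L3VNI); these are just the values used here, not a requirement, and the function name is my own.

```python
def vrf_bgp_config(asn: int, router_id: str, vrf: str, l3vni: int) -> str:
    """Render the BGP VRF stanza shown above for one leaf."""
    lines = [
        f"router bgp {asn}",
        f"   vrf {vrf}",
        f"      rd {router_id}:1",                      # lab convention: <router-id>:1
        f"      route-target import evpn 1:{l3vni}",    # lab convention: 1:<L3VNI>
        f"      route-target export evpn 1:{l3vni}",
        "      redistribute connected",
    ]
    return "\n".join(lines)


# Leaf-1 (AS 65001, router ID 10.0.250.11) reproduces the stanza above.
leaf1 = vrf_bgp_config(65001, "10.0.250.11", "Management_VRF", 101144)
print(leaf1)

# Leaf-5 (AS 65003, router ID 10.0.250.15), taken from the BGP summary output.
leaf5 = vrf_bgp_config(65003, "10.0.250.15", "Management_VRF", 101144)
print(leaf5)
```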

Now, for the prefix on Host-3 (10.60.60.33/24), let's look at the route information on Leaf-1, where I specify the route type ip-prefix (type 5).

Leaf-1#show bgp evpn route-type ip-prefix 10.60.60.33/24
BGP routing table information for VRF default
Router identifier 10.0.250.11, local AS number 65001
BGP routing table entry for ip-prefix 10.60.60.0/24, Route Distinguisher: 10.0.250.15:1
 Paths: 2 available
  65000 65003
    10.0.255.13 from 10.0.250.1 (10.0.250.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, external, ECMP head, ECMP, best, ECMP contributor
      Extended Community: Route-Target-AS:1:101144 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:0c:8b:f1:bd:db:19
      VNI: 101144
  65000 65003
    10.0.255.13 from 10.0.250.2 (10.0.250.2)
      Origin IGP, metric -, localpref 100, weight 0, valid, external, ECMP, ECMP contributor
      Extended Community: Route-Target-AS:1:101144 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:0c:8b:f1:bd:db:19
      VNI: 101144
BGP routing table entry for ip-prefix 10.60.60.0/24, Route Distinguisher: 10.0.250.16:1
 Paths: 2 available
  65000 65003
    10.0.255.13 from 10.0.250.2 (10.0.250.2)
      Origin IGP, metric -, localpref 100, weight 0, valid, external, ECMP head, ECMP, best, ECMP contributor
      Extended Community: Route-Target-AS:1:101144 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:0c:8b:f1:06:74:16
      VNI: 101144
  65000 65003
    10.0.255.13 from 10.0.250.1 (10.0.250.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, external, ECMP, ECMP contributor
      Extended Community: Route-Target-AS:1:101144 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:0c:8b:f1:06:74:16
      VNI: 101144

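Here we can see that Leaf-1 learns this route from both Leaf-5 (RD 10.0.250.15:1) and Leaf-6 (RD 10.0.250.16:1), and that each of them is received via both Spine-1 (10.0.250.1) and Spine-2 (10.0.250.2). Leaf-5 and Leaf-6 advertise the prefix to both spines, so Leaf-1 ends up with two equal-cost paths (ECMP) per route, all pointing to the same next-hop VTEP 10.0.255.13.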
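
One detail worth pointing out in this output is the EvpnRouterMac community. The Python sketch below pulls the router MAC per RD from the lines above and compares it with the VLAN 1006 entries of the earlier show vxlan address-table output. Only the relevant lines are copied, and the helper names are my own.

```python
import re

# Relevant lines copied from the type-5 route output above.
TYPE5_OUTPUT = """\
BGP routing table entry for ip-prefix 10.60.60.0/24, Route Distinguisher: 10.0.250.15:1
      Extended Community: Route-Target-AS:1:101144 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:0c:8b:f1:bd:db:19
BGP routing table entry for ip-prefix 10.60.60.0/24, Route Distinguisher: 10.0.250.16:1
      Extended Community: Route-Target-AS:1:101144 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:0c:8b:f1:06:74:16
"""

# VLAN 1006 MAC addresses from "show vxlan address-table" on Leaf-1.
VLAN_1006_MACS = ["0c8b.f106.7416", "0c8b.f1bd.db19"]


def normalize(mac: str) -> str:
    """Strip separators so colon and dotted MAC notations compare equal."""
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def router_mac_by_rd(text: str) -> dict:
    """Map each Route Distinguisher to the router MAC advertised with it."""
    result = {}
    current_rd = None
    for line in text.splitlines():
        rd_match = re.search(r"Route Distinguisher: (\S+)", line)
        if rd_match:
            current_rd = rd_match.group(1)
            continue
        mac_match = re.search(r"EvpnRouterMac:(\S+)", line)
        if mac_match and current_rd:
            result[current_rd] = normalize(mac_match.group(1))
    return result


router_macs = router_mac_by_rd(TYPE5_OUTPUT)
table_macs = {normalize(mac) for mac in VLAN_1006_MACS}
for rd, mac in router_macs.items():
    print(f"RD {rd}: router MAC {mac}, in VLAN 1006 table: {mac in table_macs}")
```

Both router MACs match the two VLAN 1006 entries, so those entries are the router MACs of Leaf-5 and Leaf-6 that Leaf-1 uses as the inner destination MAC for routed traffic over the L3VNI.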

I will conclude this topic here. I hope this wiki helps in understanding BGP EVPN with a more practical approach. If you have any suggestions or would like more details, please comment below.

If you are wondering why there is a BG-Router1 in this topology: it injects a default route into Management_VRF in case traffic needs to exit this POD.

Output of Leaf-1 showing the default route learned in Management_VRF.

Leaf-1#show ip bgp vrf Management_VRF 
BGP routing table information for VRF Management_VRF
Router identifier 10.50.50.2, local AS number 65001
Route status codes: s - suppressed, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >Ec    0.0.0.0/0              10.0.255.13           0       -          100     0       65000 65003 64999 ?
 *  ec    0.0.0.0/0              10.0.255.13           0       -          100     0       65000 65003 64999 ?
 *  ec    0.0.0.0/0              10.0.255.13           0       -          100     0       65000 65003 64999 ?
 *  ec    0.0.0.0/0              10.0.255.13           0       -          100     0       65000 65003 64999 ?
 * >      10.50.50.0/24          -                     -       -          -       0       i
 * >Ec    10.60.60.0/24          10.0.255.13           0       -          100     0       65000 65003 i
 *  ec    10.60.60.0/24          10.0.255.13           0       -          100     0       65000 65003 i
 *  ec    10.60.60.0/24          10.0.255.13           0       -          100     0       65000 65003 i
 *  ec    10.60.60.0/24          10.0.255.13           0       -          100     0       65000 65003 i
 * >Ec    10.90.90.0/29          10.0.255.13           0       -          100     0       65000 65003 i
 *  ec    10.90.90.0/29          10.0.255.13           0       -          100     0       65000 65003 i
 *  ec    10.90.90.0/29          10.0.255.13           0       -          100     0       65000 65003 i
 *  ec    10.90.90.0/29          10.0.255.13           0       -          100     0       65000 65003 i
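
To see when the default route is actually used, here is a minimal longest-prefix-match sketch in Python over the four prefixes in the table above. It only models the lookup on prefix length, not ECMP or next-hop resolution. The destination 192.0.2.10 is an arbitrary documentation address that I picked as an example of traffic leaving the POD.

```python
import ipaddress

# Prefixes from the Management_VRF BGP table on Leaf-1 above.
PREFIXES = ["0.0.0.0/0", "10.50.50.0/24", "10.60.60.0/24", "10.90.90.0/29"]


def best_match(destination: str) -> str:
    """Return the longest matching prefix for a destination address."""
    address = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(prefix) for prefix in PREFIXES]
    # 0.0.0.0/0 matches every IPv4 address, so this list is never empty.
    matches = [network for network in networks if address in network]
    return str(max(matches, key=lambda network: network.prefixlen))


print(best_match("10.60.60.33"))  # Host-3, reached via its /24
print(best_match("192.0.2.10"))   # example outside the POD, falls to the default route
```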

References:
Arista EVPN
RFC 7432
