Using Linux bridge to mask host asymmetrical interface name for NAD on OpenShift 4

Tags: multus, nmstate, openshift
Muhammad Aizuddin, August 11, 2023

A Network Attachment Definition (NAD) is a custom resource used by the Multus CNI plugin. It lets a pod attach to an additional network alongside the standard pod network. Typical use cases are SR-IOV/DPDK for Telco workloads or, as in this example, a pod that needs an additional network for segmentation or performance reasons.

It is recommended to have the same hardware for the OCP cluster for manageability and scalability. However, there will be some occasions where the host’s configuration is not symmetrical.

Let's look at the diagram below:

[Diagram: image-3]

In the diagram above, OCP Node 1 and OCP Node 2 are connected to the same network segment, but through differently named interfaces (enp8s0 on Node 1 and enp3s0 on Node 2).

Usually this is not a problem, because we can define two distinct NADs, one per interface name.

First, the enp8s0 NAD looks like this:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: enp8s0-nad
  namespace: my-project
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "enp8s0", "mode":
    "bridge", "ipam": { "type": "whereabouts", "range": "10.18.0.0/24", "exclude":
    [ "10.18.0.1/32", "10.18.0.255/32" ] } }'

Second, the enp3s0 NAD looks like this:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: enp3s0-nad
  namespace: my-project
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "enp3s0", "mode":
    "bridge", "ipam": { "type": "whereabouts", "range": "10.18.0.0/24", "exclude":
    [ "10.18.0.1/32", "10.18.0.255/32" ] } }'

Once these NADs are applied, a pod spec or deployment spec can attach the additional networks through the annotations field:

truncated....
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: enp3s0-nad,enp8s0-nad 
....truncated

However, in our case both interfaces belong to the same network segment, so we want a single NAD built on a common interface name instead of a separate NAD for each interface name.

Our target NAD, and the annotation that references it, look like this:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: segment-1-nad
  namespace: my-project
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br1", "mode":
    "bridge", "ipam": { "type": "whereabouts", "range": "10.18.0.0/24", "exclude":
    [ "10.18.0.1/32", "10.18.0.255/32" ] } }'

truncated....
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: segment-1-nad
....truncated

Notice that in the example above the NAD is named segment-1-nad and its master is now “br1”.
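
As a readability aid, the same definition can also be written with a YAML block scalar, since spec.config is just a JSON string. The sketch below is intended to be equivalent to the segment-1-nad definition above; it is only a formatting variant and has not been tested against this cluster.

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: segment-1-nad
  namespace: my-project
spec:
  # spec.config is a JSON string; "|-" keeps it readable across lines.
  config: |-
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "br1",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "10.18.0.0/24",
        "exclude": [ "10.18.0.1/32", "10.18.0.255/32" ]
      }
    }
```

Written this way, the one field that changed, "master", is easy to spot when comparing NADs.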

[Diagram: image-4]

As the diagram above shows, we use the nmstate operator to configure a new bridge on each node.

For OCP Node 1, the NodeNetworkConfigurationPolicy (NNCP) looks like this:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-node1-enp8s0
spec:
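  nodeSelector:
    # Assumes the node's hostname label is ocp-node-1; the
    # node-role.kubernetes.io/worker label normally has an empty value.
    kubernetes.io/hostname: ocp-node-1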
  desiredState:
    interfaces:
    - name: br1
      description: Linux bridge with enp8s0 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false     # Layer 2 operation
        enabled: false  # Layer 2 operation
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: enp8s0

For OCP Node 2, the NNCP looks like this:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-node2-enp3s0
spec:
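  nodeSelector:
    # Assumes the node's hostname label is ocp-node-2; the
    # node-role.kubernetes.io/worker label normally has an empty value.
    kubernetes.io/hostname: ocp-node-2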
  desiredState:
    interfaces:
    - name: br1
      description: Linux bridge with enp3s0 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false     # Layer 2 operation
        enabled: false  # Layer 2 operation
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: enp3s0

Note that ipv4 dhcp and enabled are both set to false because we want the br1 interface to operate only at Layer 2. The node selector accommodates the different interface names: each NNCP is applied only to the node that has the matching interface.

Verify that both NNCPs have been applied successfully:

# oc get nncp
NAME                 STATUS      REASON
br1-node2-enp3s0     Available   SuccessfullyConfigured
br1-node1-enp8s0     Available   SuccessfullyConfigured

Now we can use the uniform interface name “br1” across all nodes. The NAD looks like this:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: segment-1-nad
  namespace: my-project
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br1", "mode":
    "bridge", "ipam": { "type": "whereabouts", "range": "10.18.0.0/24", "exclude":
    [ "10.18.0.1/32", "10.18.0.255/32" ] } }'

In the deployment spec, we reference this NAD in the annotation:

truncated....
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: segment-1-nad 
....truncated
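
To show where the annotation sits in a complete manifest, here is a minimal sketch of a Deployment. The name segment-1-demo, the app label and the image are placeholders chosen for illustration, not values from the original setup; only the namespace and the NAD name come from the examples above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: segment-1-demo            # hypothetical name
  namespace: my-project
spec:
  replicas: 2
  selector:
    matchLabels:
      app: segment-1-demo         # hypothetical label
  template:
    metadata:
      labels:
        app: segment-1-demo
      annotations:
        # Multus reads this annotation from the pod, so it belongs in the
        # pod template, not in the Deployment's own metadata.
        k8s.v1.cni.cncf.io/networks: segment-1-nad
    spec:
      containers:
      - name: demo
        # Placeholder image; substitute one that ships the tools you need.
        image: registry.example.com/tools/net-tools:latest
        command: ["sleep", "infinity"]
```

Because the NAD refers to “br1” rather than a physical interface name, the same pod template should work regardless of which of the two nodes a replica lands on.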

Create the pod from this spec, then inspect its network interfaces (the container image must include the ifconfig and ping binaries for this test):

sh-5.2# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1400
        inet 10.128.0.69  netmask 255.255.254.0  broadcast 10.128.1.255
        inet6 fe80::858:aff:fe80:45  prefixlen 64  scopeid 0x20<link>
        ether 0a:58:0a:80:00:45  txqueuelen 0  (Ethernet)
        RX packets 7  bytes 746 (746.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 15  bytes 936 (936.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

net1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.18.0.2  netmask 255.255.255.0  broadcast 10.18.0.255
        inet6 fe80::6c87:90ff:fe02:1620  prefixlen 64  scopeid 0x20<link>
        ether 6e:87:90:02:16:20  txqueuelen 0  (Ethernet)
        RX packets 3392  bytes 1159004 (1.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 21  bytes 1558 (1.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

sh-5.2# ping 10.18.0.1 -c 4
PING 10.18.0.1 (10.18.0.1) 56(84) bytes of data.
64 bytes from 10.18.0.1: icmp_seq=1 ttl=64 time=0.360 ms
64 bytes from 10.18.0.1: icmp_seq=2 ttl=64 time=0.294 ms
64 bytes from 10.18.0.1: icmp_seq=3 ttl=64 time=0.413 ms
64 bytes from 10.18.0.1: icmp_seq=4 ttl=64 time=0.397 ms

--- 10.18.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3073ms
rtt min/avg/max/m

How about with VLAN?

With the same configuration, we can also use a tagged VLAN by adding another layer of NNCP that creates a VLAN interface on top of the physical interface. Look at the diagram below:

[Diagram: image-5]

In the diagram, the connection to the underlying network must be a tagged VLAN with ID 1018. Therefore the “br1” Linux bridge should have a VLAN 1018 tagged interface attached to it.

First, we need to create a <nic>.1018 tagged interface on top of the correct physical NIC.

On OCP Node 1, the enp8s0.1018 NNCP looks like this:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: enp8s0-vlan1018
spec:
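  nodeSelector:
    # Assumes the node's hostname label is ocp-node-1; the
    # node-role.kubernetes.io/worker label normally has an empty value.
    kubernetes.io/hostname: ocp-node-1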
  desiredState:
    interfaces:
    - name: enp8s0.1018
      type: vlan
      state: up
      vlan:
        id: 1018
        base-iface: enp8s0
      ipv4:
        dhcp: false
        enabled: false

Once enp8s0.1018 has been created successfully, proceed with the NNCP for “br1”:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-node1-enp8s0-vlan1018
spec:
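  nodeSelector:
    # Assumes the node's hostname label is ocp-node-1; the
    # node-role.kubernetes.io/worker label normally has an empty value.
    kubernetes.io/hostname: ocp-node-1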
  desiredState:
    interfaces:
    - name: br1
      description: Linux bridge with enp8s0.1018 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false
        enabled: false
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: enp8s0.1018

Now for OCP Node 2, do the same for the enp3s0 interface:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: enp3s0-vlan1018
spec:
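  nodeSelector:
    # Assumes the node's hostname label is ocp-node-2; the
    # node-role.kubernetes.io/worker label normally has an empty value.
    kubernetes.io/hostname: ocp-node-2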
  desiredState:
    interfaces:
    - name: enp3s0.1018
      type: vlan
      state: up
      vlan:
        id: 1018
        base-iface: enp3s0
      ipv4:
        dhcp: false
        enabled: false

And the NNCP for “br1” on OCP Node 2:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-node2-enp3s0-vlan1018
spec:
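  nodeSelector:
    # Assumes the node's hostname label is ocp-node-2; the
    # node-role.kubernetes.io/worker label normally has an empty value.
    kubernetes.io/hostname: ocp-node-2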
  desiredState:
    interfaces:
    - name: br1
      description: Linux bridge with enp3s0.1018 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false
        enabled: false
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: enp3s0.1018

Once these policies have been applied successfully, we can use the same NAD to attach the additional network to the pod, this time with VLAN access configured on the underlying interface.
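
As a possible simplification, nmstate accepts several interfaces in a single desiredState, so the VLAN interface and the bridge for a node could probably be declared in one policy instead of two. The sketch below shows this for OCP Node 1. The policy name is hypothetical, the hostname-based node selector is an assumption about how the node is labelled, and the manifest has not been tested on this cluster. It is meant to be used instead of the two separate Node 1 policies, not in addition to them.

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-node1-vlan1018-combined   # hypothetical name
spec:
  nodeSelector:
    # Assumption: the node's hostname label is ocp-node-1.
    kubernetes.io/hostname: ocp-node-1
  desiredState:
    interfaces:
    # Tagged VLAN interface on top of the physical NIC.
    - name: enp8s0.1018
      type: vlan
      state: up
      vlan:
        id: 1018
        base-iface: enp8s0
      ipv4:
        dhcp: false
        enabled: false
    # Layer 2 bridge that uses the VLAN interface as its only port.
    - name: br1
      description: Linux bridge with enp8s0.1018 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false
        enabled: false
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: enp8s0.1018
```

The equivalent policy for OCP Node 2 would swap in enp3s0 and the Node 2 selector. Keeping the two steps in separate policies, as shown earlier, makes it easier to see which step failed if something goes wrong.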