September 11, 2021 12:00 pm GMT

Deep dive into Linux Networking and Docker - Bridge, vETH and IPTables

This was originally published here: https://aly.arriqaaq.com/linux-networking-bridge-iptables-and-docker/

This series of articles is my log of learning about various networking concepts related to container orchestration platforms (Docker, Kubernetes, etc.).

Linux networking is a very interesting topic. In this series, my aim is to dig deep into the various ways these container orchestration platforms implement their network internals.

Getting Started

A few questions before getting started.

1) What are namespaces?

TL;DR: a Linux namespace is an abstraction over resources in the operating system. Namespaces are like separate houses, each with its own set of isolated resources. There are currently seven types of namespaces: Cgroup, IPC, Network, Mount, PID, User, and UTS.

Network isolation is what we are interested in here, so we will discuss network namespaces in depth.
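
If you want to see these namespace types on a machine you have handy, the kernel exposes a handle for each one under /proc. A quick, hedged way to peek (lsns ships with util-linux; output will differ per machine):

# every process has one handle per namespace type under /proc/<pid>/ns
ls -l /proc/$$/ns

# lsns summarises the namespaces currently in use on the host
lsns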

2) How to follow along?

All the examples in this article were run on a fresh Vagrant Ubuntu Bionic virtual machine:

vagrant init ubuntu/bionic64
vagrant up
vagrant ssh

Exploring Network Namespaces

How do platforms virtualise network resources to isolate containers, giving each one a dedicated network stack and making sure these containers do not interfere with the host (or with neighbouring containers)? The answer is the network namespace. A network namespace isolates network-related resources: a process running in a distinct network namespace has its own networking devices, routing tables, firewall rules and so on. Let's create one quickly.

ip netns add ns1

And voila! You have your isolated network namespace (ns1) created just like that. Now you can go ahead and run any process inside this namespace.

ip netns exec ns1 python3 -m http.server 8000

This was pretty neat! ip netns exec $namespace $command executes $command in the named network namespace $namespace. This means the process runs within its own network stack, separate from the host, and can communicate only through the interfaces defined in that network namespace.
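
To convince yourself the server really is isolated, compare what the process can see inside ns1 with what the host sees. A rough check (assuming curl is installed on the VM; any HTTP client works):

# inside ns1 there is only a loopback interface, and it is still down
ip netns exec ns1 ip addr

# the server listening on port 8000 inside ns1 is not reachable from the host namespace
curl --max-time 2 http://127.0.0.1:8000 || echo "not reachable from the host"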

Host Namespace

Before you read ahead, I'd like to draw your attention to the default namespace for the host network. Let's list all the namespaces:

# all namespaces
ip netns
ns1
default

Notice the default namespace. This is the host namespace: whatever services you run directly on your VM or machine run under this namespace. This will be important to keep in mind moving forward.
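
A hedged way to see which namespace a given process belongs to is ip netns identify; a shell started normally reports no named namespace, because it lives in the host (default) one:

# prints nothing for a process running in the host namespace
ip netns identify $$

# processes started via 'ip netns exec ns1 ...' are listed here instead
ip netns pids ns1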

Creating a Network Namespace

So with that said, let's quickly move forward and create two isolated network namespaces (similar to two containers):

#!/usr/bin/env bash

NS1="ns1"
NS2="ns2"

# create namespaces
ip netns add $NS1
ip netns add $NS2

Connecting the cables

We need to go ahead and connect these namespaces to our host network. The veth (virtual Ethernet) device helps in making this connection. A veth is a local Ethernet tunnel, and devices are created in pairs. Packets transmitted on one device in the pair are immediately received on the other device. When either device is down, the link state of the pair is down.

#!/usr/bin/env bash

NS1="ns1"
VETH1="veth1"
VPEER1="vpeer1"

NS2="ns2"
VETH2="veth2"
VPEER2="vpeer2"

# create namespaces
ip netns add $NS1
ip netns add $NS2

# create veth links
ip link add ${VETH1} type veth peer name ${VPEER1}
ip link add ${VETH2} type veth peer name ${VPEER2}

Think of a veth pair like a network cable: one end is attached to the host network, and the other end to the network namespace we created. Let's go ahead and connect the cable, and bring these interfaces up.

# setup veth links
ip link set ${VETH1} up
ip link set ${VETH2} up

# add peers to ns
ip link set ${VPEER1} netns ${NS1}
ip link set ${VPEER2} netns ${NS2}

Localhost

Ever wondered how localhost works? Well, the loopback interface keeps traffic within the local system. So when you run something on localhost (127.0.0.1), you are essentially using the loopback interface to route the traffic. Let's bring the loopback interface up in case we want to run a service locally, and also bring up the peer interfaces inside our network namespaces so they start accepting traffic.

# setup loopback interfaces
ip netns exec ${NS1} ip link set lo up
ip netns exec ${NS2} ip link set lo up

# setup peer ns interfaces
ip netns exec ${NS1} ip link set ${VPEER1} up
ip netns exec ${NS2} ip link set ${VPEER2} up

In order to connect to a network, a computer must have at least one network interface, and each network interface must have its own unique IP address. The IP address that you give to a host is assigned to its network interface. But does every network interface require an IP address? Well, not really. We'll see that in the coming steps.

# assign ip addresses to ns interfaces
VPEER_ADDR1="10.10.0.10"
VPEER_ADDR2="10.10.0.20"

ip netns exec ${NS1} ip addr add ${VPEER_ADDR1}/16 dev ${VPEER1}
ip netns exec ${NS2} ip addr add ${VPEER_ADDR2}/16 dev ${VPEER2}

Remember, here we've only assigned addresses to the interfaces inside the network namespaces (vpeer1 in ns1, vpeer2 in ns2). The host-side interfaces (veth1, veth2) do not have an IP assigned. Why? Do we need one? Well, not really.
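
You can confirm that the host-side ends carry no address of their own, while the peers inside the namespaces do (a quick check; the -brief flag is available in reasonably recent iproute2):

# host ends: UP but no inet address
ip -brief addr show ${VETH1}
ip -brief addr show ${VETH2}

# namespace ends: carry the 10.10.0.x addresses
ip netns exec ${NS1} ip -brief addr show ${VPEER1}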

Build Bridges, Not Walls

Men build too many walls and not enough bridges

Remember that when you have multiple containers running and want to send traffic to them, you need a bridge to connect them. A network bridge creates a single, aggregate network from multiple communication networks or network segments. A bridge is a way to connect two Ethernet segments together in a protocol-independent way. Packets are forwarded based on Ethernet (MAC) address, rather than IP address (as a router does). Since forwarding is done at Layer 2, all protocols can go transparently through a bridge. Bridging is distinct from routing: routing allows multiple networks to communicate independently and yet remain separate, whereas bridging connects two separate networks as if they were a single network.

Docker has a docker0 bridge underneath to direct traffic. When the Docker service starts, a Linux bridge is created on the host machine. The various interfaces on the containers talk to the bridge, and the bridge proxies to the external world. Multiple containers on the same host can talk to each other through this Linux bridge.
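
If you have Docker installed on the VM, you can see this bridge for yourself; something along these lines (names and addresses will vary):

# the default 'bridge' network is backed by the docker0 device
docker network ls --filter driver=bridge

# docker0 is an ordinary Linux bridge on the host
ip link show docker0
ip addr show docker0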

So let's go ahead and create a bridge.

BR_ADDR="10.10.0.1"
BR_DEV="br0"

# setup bridge
ip link add ${BR_DEV} type bridge
ip link set ${BR_DEV} up

# assign veth pairs to bridge
ip link set ${VETH1} master ${BR_DEV}
ip link set ${VETH2} master ${BR_DEV}

# setup bridge ip
ip addr add ${BR_ADDR}/16 dev ${BR_DEV}

Now that we have our network interfaces connected to the bridge, how do these interfaces know how to direct traffic to the host? The route tables in both network namespaces only have route entries for their respective subnet IP range.
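
You can inspect those per-namespace route tables directly. Before the next step, each one should contain only the on-link entry for 10.10.0.0/16 (the output shown is indicative):

ip netns exec ${NS1} ip route
# 10.10.0.0/16 dev vpeer1 proto kernel scope link src 10.10.0.10

ip netns exec ${NS2} ip route
# 10.10.0.0/16 dev vpeer2 proto kernel scope link src 10.10.0.20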

Since we have the veth pairs connected to the bridge, the bridge network address is available to these network namespaces. Let's add a default route to direct traffic to the bridge.

# add default routes for ns
ip netns exec ${NS1} ip route add default via ${BR_ADDR}
ip netns exec ${NS2} ip route add default via ${BR_ADDR}

Done. Sweet! We have a proper setup to test whether our "containers" can talk to each other. Let's finally interact with the namespaces.

# ping ns2 from ns1
ip netns exec ${NS1} ping ${VPEER_ADDR2}
PING 10.10.0.20 (10.10.0.20) 56(84) bytes of data.
64 bytes from 10.10.0.20: icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from 10.10.0.20: icmp_seq=2 ttl=64 time=0.039 ms

# ping ns1 from ns2
ip netns exec ${NS2} ping ${VPEER_ADDR1}
PING 10.10.0.10 (10.10.0.10) 56(84) bytes of data.
64 bytes from 10.10.0.10: icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from 10.10.0.10: icmp_seq=2 ttl=64 time=0.039 ms

MASQUERADE

We are able to send traffic between the namespaces, but we haven't tested sending traffic outside the "container". For that, we'd need to use iptables to masquerade the outgoing traffic from our namespaces.

# enable ip forwarding
bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'

iptables -t nat -A POSTROUTING -s ${BR_ADDR}/16 ! -o ${BR_DEV} -j MASQUERADE

MASQUERADE modifies the source address of the packet, replacing it with the address of a specified network interface. It is similar to SNAT, except that it does not require the machine's IP address to be known in advance. Basically, we are adding an entry to the NAT table to masquerade outgoing traffic from the bridge subnet, except for traffic that leaves through the bridge device itself. With this, we are done with a basic setup of how Docker uses the Linux network stack to isolate containers. You can find the entire script here.
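
A quick sanity check of the rule and of outbound connectivity, assuming the VM itself has internet access:

# confirm the MASQUERADE rule landed in the nat table
iptables -t nat -L POSTROUTING -n -v

# traffic originating in ns1 should now reach the outside world
ip netns exec ns1 ping -c 2 8.8.8.8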

Now let's dive deep into how Docker works with various networking setups.

How does Docker work?

Each Docker container has its own network stack, where a new network namespace is created for each container, isolated from other containers. When a Docker container launches, the Docker engine assigns it a network interface with an IP address, a default gateway, and other components, such as a routing table and DNS services.
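
You can see the pieces the engine wires up with docker inspect. For example (the nginx image and the container name demo are just illustrative choices):

# start a throwaway container on the default bridge network
docker run -d --rm --name=demo nginx

# the IP address, gateway and MAC the engine assigned to it
docker inspect -f '{{.NetworkSettings.IPAddress}} {{.NetworkSettings.Gateway}} {{.NetworkSettings.MacAddress}}' demo

# clean up
docker stop demo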

Docker offers five network types, selected via the --net flag.

1. Host Networking (--net=host): The container shares the network namespace of the host.

You can verify this easily.

# check the network interfaces on the host
ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 02:98:b0:9b:6c:78 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 85994sec preferred_lft 85994sec
    inet6 fe80::98:b0ff:fe9b:6c78/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:e5:72:10:c0 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever

Run Docker in host mode, and you will see it lists the same set of interfaces.

# check the network interfaces in the container
docker run --net=host -it --rm alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 02:98:b0:9b:6c:78 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 85994sec preferred_lft 85994sec
    inet6 fe80::98:b0ff:fe9b:6c78/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:e5:72:10:c0 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever

2. Bridge Networking (--net=bridge, the default): In this mode, the default bridge is used for containers to connect to each other. The container runs in an isolated network namespace. Communication is open to other containers on the same network. Communication with services outside of the host goes through network address translation (NAT) before exiting the host. We've already seen the creation of a bridge network above.

# check the network interfaces in the container
docker run --net=bridge -it --rm alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
16: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

You can notice that an eth0 interface (one end of a veth pair) has been created for this container; the corresponding peer exists on the host machine:

# check the network interfaces on the host
ip addr
21: veth8a812a3@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
    link/ether d2:c4:4e:d4:08:ad brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::d0c4:4eff:fed4:8ad/64 scope link
       valid_lft forever preferred_lft forever

3. Custom bridge network (--network=xxx): This is the same as bridge networking but uses a custom bridge explicitly created for containers.

# create custom bridge
docker network create foo
2b25342b1d883dd134ed8a36e3371ef9c3ec77cdb9e24a0365165232e31b17b6

# check the bridge interface on the host
22: br-2b25342b1d88: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:49:79:07:30 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-2b25342b1d88
       valid_lft forever preferred_lft forever
    inet6 fe80::42:49ff:fe79:730/64 scope link
       valid_lft forever preferred_lft forever

You can see that when a custom bridge is created, a bridge interface is added to the host. Now, all containers on a custom bridge can communicate with the ports of other containers on that bridge. This provides better isolation and security.

Now let's run two containers in different terminals:

# terminal 1
docker run -it --rm --name=container1 --network=foo alpine sh

# terminal 2
docker run -it --rm --name=container2 --network=foo alpine sh

# check the network interfaces on the host
ip addr
22: br-2b25342b1d88: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:49:79:07:30 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-2b25342b1d88
       valid_lft forever preferred_lft forever
    inet6 fe80::42:49ff:fe79:730/64 scope link
       valid_lft forever preferred_lft forever
30: veth86ca323@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-2b25342b1d88 state UP group default
    link/ether 1e:5e:66:ea:47:1e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::1c5e:66ff:feea:471e/64 scope link
       valid_lft forever preferred_lft forever
32: vethdf5e755@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-2b25342b1d88 state UP group default
    link/ether ba:2b:25:23:a3:40 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::b82b:25ff:fe23:a340/64 scope link
       valid_lft forever preferred_lft forever

As expected, with bridge networking both containers (container1, container2) have their respective veth cables (veth86ca323, vethdf5e755) attached. You can verify this bridge simply by running:

# you can notice both the containers are connected via the same bridge
brctl show br-2b25342b1d88
bridge name         bridge id           STP enabled     interfaces
br-2b25342b1d88     8000.024249790730   no              veth86ca323
                                                        vethdf5e755
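
A practical difference from the default bridge: on a user-defined bridge, Docker's embedded DNS lets containers resolve each other by name. With container1 and container2 still running, something like this should work:

# from inside container2, reach container1 by its name
docker exec container2 ping -c 2 container1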

4. Container-defined Networking (--net=container:$container2): With this option, the newly created container shares its network namespace with the container called $container2.

# create a container in terminal 1
docker run -it --rm --name=container1 alpine sh
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
33: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

# create a container in terminal 2 that joins container1's network namespace
docker run -it --rm --name=container2 --network=container:container1 alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
33: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

You can see that both containers share the same network interface.
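
A small, hedged demonstration of what that sharing means in practice (using nginx and busybox wget purely as illustrative tools): a server started in the first container is reachable over localhost from the second, because they share one network stack.

# terminal 1: run a web server in its own container
docker run -d --rm --name=web nginx

# terminal 2: join web's network namespace and hit the server over localhost
docker run --rm --net=container:web alpine wget -qO- http://127.0.0.1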

5. No networking (--net=none): This option disables all networking for the container.

docker run --net=none alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever

You can notice that only the lo (loopback) interface is enabled; nothing else is configured in this container.

In the next article, we will dive deeper (inshaAllah) into how Docker manipulates iptables rules to provide network isolation.


Original Link: https://dev.to/arriqaaq/diving-into-linux-networking-and-docker-bridge-veth-and-iptables-419a
