Or how to (ab)use Kubernetes to get your way.
What I want to achieve
I want to forward a software's traffic so that its public IP is not the server's IP.
Preferably I'd like to use Wireguard, which is a modern, secure, very-low-overhead, kernel-based L3 VPN. In short, Wireguard has very few lines of code compared to other VPN solutions and uses simple, modern cryptography. It's integrated in the Linux kernel (as well as OpenBSD, and apparently soon FreeBSD). Plus the documentation is awesome :).
My requirements are:
- As setup-agnostic as possible
- Only route internet traffic through it (i.e. keep local network access working!)
- Simple. Probably the most difficult part :).
How my setup is done
More details in an upcoming blog post! Subscribe to RSS for more :D.
My setup is basically a weird beast made of K3s (a lightweight Kubernetes distribution) running on top of a NixOS host. I won't delve into why I have this setup as it deserves a blog post of its own.
Most of my applications are on Kubernetes, so that's what I'll use! Unless there are strong reasons to go with host-based services instead.
Easy testing setup
wg-quick is a shell wrapper that makes deploying Wireguard easier.
Let's take the Wireguard configuration provided by my VPN provider and enable it using wg-quick up.

What does it do? It creates a network interface named wg0 (see ip a) and adds some ip rules. More on this later!
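To make this concrete, here's a sketch of that test session, assuming the provider's configuration was saved as /etc/wireguard/wg0.conf (path hypothetical):

```shell
# Assumes a valid provider config at /etc/wireguard/wg0.conf (root required)
wg-quick up wg0

ip a show wg0    # the freshly created wireguard interface
ip rule          # the policy-routing rules wg-quick added
```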
That's nice, but I only want one process' traffic routed through the VPN, not everyone's. So how do I do that?
Well since I use Kubernetes, I'm using containers (also known as Docker or OCI containers). So let's try to work with that.
In-Docker setup
Let's try running Wireguard inside a container, shall we?
docker run -it ubuntu bash
We need to install iproute2 (the ip tool we saw above ;)), vim and a couple of useful tools. Note that on Ubuntu, wg-quick ships in the wireguard-tools package.

apt update && apt install -y vim wireguard-tools iproute2

Let's try to run wg-quick!
RTNETLINK answers: Operation not permitted
It errors out with a Netlink error. This is a permission issue: wg-quick is trying to access privileged kernel interfaces, but by default Docker containers run with reduced privileges.
We can use --privileged to run the container, which will solve our error, for now. (DO NOT USE THIS AT HOME, IT'S NOT THE END SOLUTION!)
Setup works using wg-quick! We can test this using curl https://ifconfig.me
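Putting the whole throwaway experiment together; a sketch, assuming your provider's config sits in the current directory as wg0.conf:

```shell
# Throwaway test only -- --privileged is NOT the end solution!
docker run -it --rm --privileged \
  -v "$PWD/wg0.conf:/etc/wireguard/wg0.conf:ro" \
  ubuntu bash

# Then, inside the container:
apt update && apt install -y wireguard-tools iproute2 curl
wg-quick up wg0
curl https://ifconfig.me   # should now print the VPN's exit IP
```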
PoLP (Principle of Least Privilege)
As a security-aware person I don't really like having a --privileged laying around. So how can we fix this? Let's discuss a bit what --privileged actually does (or rather, what its absence does).

Running without --privileged restricts two things which are important in our case: capabilities (see man capabilities or online) and syscalls.

Capabilities are Linux's answer to splitting root into separate privileges. So when you're root on Linux, you're not necessarily the big boss on the machine. The other restriction --privileged lifts is syscall filtering via seccomp, but we don't really care about that here!

The capability we want is network related, and CAP_NET_ADMIN gives us exactly that!
Quick googling shows the answer: https://stackoverflow.com/questions/27708376/why-am-i-getting-an-rtnetlink-operation-not-permitted-when-using-pipework-with-d
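As that answer suggests, we can drop --privileged and grant only the single capability we need; same sketch as before, minus the blanket privileges:

```shell
# Only CAP_NET_ADMIN instead of full --privileged
docker run -it --rm --cap-add NET_ADMIN \
  -v "$PWD/wg0.conf:/etc/wireguard/wg0.conf:ro" \
  ubuntu bash
```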
Okay, that's good and all, but I haven't achieved much yet: I've just made a way to tunnel a container's traffic. It's also a bit constraining, because if I install the software inside this same container, its version gets tied to wg-quick's, which is annoying. As a programmer I like good abstractions, and since this software shouldn't care what public IP it has, I want to abstract this "setup part" away from the original software.
The Kubernetes way
Remember I said "I use a Kubernetes distribution"? Let's get on the same page terminology-wise.
What are pods
K8s represents things as resources, and the most basic one, the Pod, is one (or more) containers sharing namespaces. One interesting characteristic of a pod is that it runs all its containers in one networking namespace. See where this is going? :D
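A quick way to convince yourself of that shared namespace (pod and container names here are made up):

```shell
# Every container in a pod sees the same interfaces and IPs
kubectl exec mypod -c app     -- ip a
kubectl exec mypod -c sidecar -- ip a   # identical output: one shared netns
```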
Let's put what we had before in the Docker container into a pod!
We still have the problem of running the software separately with this unusual networking setup.
Turns out, pods have something called initContainers, which are containers that run during the initialization part of a workload. Since the network namespace is shared, we can set up the interface there, and then run our workload transparently!
A couple of knobs to tune for this to work:
- DNS setup for K8s. We get an error while setting the DNS since it's K8s-provided and we're messing with the underlying network with (unfortunately) overlapping ranges, so let's hardcode a public one (1.1.1.1) and be done with it. I don't use K8s' service discovery there anyway, but that's something to be aware of.
Local networking access
Wait, not everything is working properly yet!
We still need to access the web interface of that application. The container listens on port 80, but the return traffic is forwarded via the default route to our upstream VPN provider, so trying to curl it just hangs forever.
So we need to add ip rules for local traffic. Since our upstream seems to use IPs in 10.0.0.0/8, and I'd prefer a generic solution there rather than hardcoding the upstream IP :), I added the following to the Wireguard configuration. It adds a route for 10.0.0.0/8 specifically, overriding the "default" route so that local traffic uses the local interface instead of the Wireguard one. This lets local traffic go back to its origin.
PostUp = ip route add 10.0/8 dev eth0 table 51820
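With that PostUp rule in place, we can sanity-check from inside the pod which way packets will go (assuming, as above, that 10.0.0.0/8 is the local range; addresses are examples):

```shell
# Local range should resolve via eth0, everything else via the tunnel
ip route get 10.0.0.42       # expect: dev eth0 (our PostUp override)
ip route get 93.184.216.34   # expect: dev wg0 (the VPN default route)
```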
Here is the final manifest!
apiVersion: apps/v1
kind: Deployment
metadata:
  name: something
spec:
  selector:
    matchLabels:
      app: something
  template:
    metadata:
      labels:
        app: something
    spec:
      # We change this from the normal configuration which points to CoreDNS
      # because we're using ~wg-quick~ which makes all traffic go to the ~wg0~
      # interface, thus making the default DNS inaccessible. So we need
      # to provide an alternative, public DNS server.
      #
      # Note that this works because this container doesn't contact any other
      # containers, so it's perfectly fine not to have internal DNS resolution.
      #
      # dnsPolicy None allows pods to ignore K8s' default resolv.conf.
      dnsPolicy: "None"
      dnsConfig:
        # We could use DNSCrypt / DoH / DoT here but well, we won't leak much
        # information anyway.
        nameservers:
          # Cloudflare
          - "1.1.1.1"
          # Google
          - "8.8.8.8"
        # edns0 for 512 bytes+ UDP DNS requests.
        options:
          - name: edns0
            value: ""
      # The way this works is the following:
      # Pods in K8s share the same network namespace (they are on the same
      # node). So we can set up the network namespace to have a wireguard
      # interface and add ip rules / iptables options to make everything go
      # through it by default.
      #
      # Once setup is finished, we continue to the "normal containers".
      #
      # Cleanup is automatic when the container exits via the Linux Garbage
      # Collector TM.
      #
      # The more intelligent way (forward) would be to have a way to patch
      # containers on the fly to make them "remote" by default. We can do that
      # by means of MutatingWebhook from K8s' dynamic admission controllers.
      initContainers:
        - name: wg-init
          image: wg-init:some-tag
          # We need the CAP_NET_ADMIN capability here:
          # - to create the ~wg0~ network interface;
          # - to create ip rules / iptables rules to redirect all traffic but
          #   wireguard's own through ~wg0~.
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
          args:
            - wg-quick
            - up
            - wg0
          volumeMounts:
            - name: wg-key
              mountPath: /etc/wireguard
      containers:
        - name: some-container
          image: docker.io/some-container:some-tag
          # Web UI
          ports:
            - name: http
              containerPort: 8080
          volumeMounts:
            - name: content
              mountPath: /content
      volumes:
        # NB: we need to remove the `DNS =` line from the configuration,
        # otherwise wg-quick tries calling resolvconf, which doesn't work in a
        # container. We patch the resolv.conf configuration from the PodSpec
        # instead; otherwise it would fail for all DNS requests.
        - name: wg-key
          secret:
            secretName: wg-key
            defaultMode: 0420
        - name: content
          persistentVolumeClaim:
            claimName: content
Going further
We can go further by creating a mutating webhook server in Kubernetes to provide this automatically to containers carrying an annotation, for instance. This would use MutatingWebhookConfiguration to auto-patch the pod(s) instead of having a static configuration like this. See this for an example of a patching server.
Probably something for yet another blog post!
On a pure Docker setup, we can use something called OCI hooks: executables launched at certain points in a container's lifecycle (like prestart) to do "things". I've used that in the past to move an interface from the host into a container.