NetBird client on MikroTik router

RouterOS is MikroTik's operating system that powers MikroTik's physical routers, switches and Cloud Hosted Routers (CHR).
Container is MikroTik's implementation of Linux containers, added in RouterOS v7.4 as an extra package, allowing users to run containerized environments within RouterOS.
In this guide we'll deploy NetBird client in a MikroTik container.

Use cases

Running NetBird on MikroTik router or CHR enables cost-effective remote access to RouterOS devices (and their networks) without the need for additional hardware. In some usecases this can greatly simplify the setup and eliminate the need for additional infrastructure.

Branch offices

Not all remote locations have a server room or a similar setup where they can just throw in an additional machine to run Netbird. Think of small shops or branch offices that barely have a network cabinet to fit a switch and a router.
Running Netbird directly on a router allows us to have remote access to perform basic network management and monitoring without having to maintain an additional machine for Netbird router, or even worse - using one of the business-critical Netbird clients as a router.
The idea is that all computers on the network would still run clients, and this container would only be used for infrastructure management, monitoring and maybe one or two small camera streams. Note that container routing in RouterOS is currently very CPU-bound and is likely not good enough for massive file transfers, database connectivity nor proper camera streaming.

Field routers

For companies with field teams operating in remote areas, the use of MikroTik routers is a game-changer. These routers ensure connectivity without the reliance on field infrastructure, a unique advantage that sets them apart. Think construction sites, pop-up events, field support teams for vehicles or industrial equipment, etc.
Team members would still run NetBird on computers and phones, but a separate IT or infra team needs to be able to remotely manage MikroTik devices to help with unpredicted issues in the field. For example, reconfigure the router to piggyback the entire network over the location's guest Wi-Fi or quickly switch between that and 4G or satellite backup, depending on the type of failure.
Traditionally, we would always have 4G in routers for minimal management connectivity and then run CHR in a cloud VM. Those routers would all start VPN tunnels to the cloud VM so the IT team can connect to the router if needed. On top of that, we would need an additional NetBird router in the cloud to enable remote access from NetBird to that cloud router and NAT to remote devices.
Running NetBird directly on field routers removes the need for a lot of complexity because there's no longer a need for CHR to serve as a VPN concentrator or a dedicated VM to route NetBird clients to MikroTiks.

Limitations

There are quite a few caveats to this approach because containers on RouterOS are still a relatively new feature, provide relatively slow throughput, and are CPU-bound. They are also very restrictive compared to standard Kubernetes or Docker platforms, so NetBird can't take advantage of kernel modules or netfilter rules.
Also, very few current MikroTik devices are optimized for running containers, so we should be careful when deploying this in production.

  • Routing through RouterOS containers is relatively slow, CPU intensive and may overload smaller devices.
  • NetBird in RouterOS containers can not use an exit node (because it uses legacy routing mode).
  • NetBird in RouterOS containers can't perform NAT, but it can do direct routing, and we can do NAT on RouterOS instead.

Tested on

  • Cloud Hosted Router (a.k.a CHR, x86) v7.15.3, v7.16b7
  • D53G-5HacD2HnD (a.k.a. Chateau, arm) v7.15.3, v7.16b7

Step-by-step guide

Prerequisites

  1. RouterOS v7.5 or newer on MikroTik router, physical machine, or CHR on a virtual machine
  2. Enabled container mode
  3. Installed RouterOS container package from extra packages
  4. Adequate storage, such as a good quality USB thumb drive or external SSD. We should not put a container filesystem or container pull caches in the router's built-in flash storage. Normal container use could wear out the built-in storage's write cycles or fill up the disk space, thus bricking or even destroying the router.
    If our device has plenty of RAM, we can use Tmpfs for the container filesystem and image cache, but that complicates the setup due to race conditions after reboot. Please check RouterOS documentation and the MikroTik forum if you want to go that route.

Prepare RouterOS for container networking

These actions can be performed on RouterOS either via SSH or in Terminal (via Winbox or Web interface), or using Winbox gui.
More information is available in MikroTik's management tools documentation.

Create a bridge interface for containers and VETH interface for NetBird container:

/interface/veth/add name=netbird address=172.17.0.2/24 gateway=172.17.0.1
/interface/bridge/add name=containers
/ip/address/add address=172.17.0.1/24 interface=containers
/interface/bridge/port add bridge=containers interface=netbird

Set up NAT for containers so they can access the internet and other networks:

/ip/firewall/nat/add chain=srcnat action=masquerade src-address=172.17.0.0/24

Because NetBird in RouterOS containers can't perform NAT, we'll want to add a route from MikroTik to our NetBird subnet via NetBird container. This assumes our NetBird subnet is 100.80.0.0/16.

/ip/route/add dst-address=100.80.0.0/16 gateway=172.17.0.2

We'll also want to add appropriate in, out, and forward rules, but those vary depending on the network setup, so we won't cover them in this guide.

We should also allow remote DNS queries from the container to the router's DNS server. Just also make sure that router's firewall rules are set to block external access to DNS ports and also allow access to DNS ports from containers. This is out of this guide's scope but it's important to mention because we'll be setting container's resolvers to router's IP addresses.

Enable container functionality logging in RouterOS and configure DockerHub registry cache on the external disk. This assumes that our USB drive is mounted as /usb1:

/system/logging add topics=container
/container/config/set registry-url=https://registry-1.docker.io tmpdir=/usb1/pull

Prepare the NetBird container

/container/mounts/add name=netbird_etc src=disk1/etc dst=/etc/netbird

Note that we placed /etc/netbird on router's built-in flash. This is because we don't want someone stealing the USB drive and getting access to router's private keys. This file doesn't really change all that often so it's ok to put it there.

/container envs
add key=NB_SETUP_KEY name=netbird value=YOUR_NETBIRD_SETUP_KEY
add key=NB_NAME name=netbird value=CONTAINER_HOSTNAME
add key=NB_HOSTNAME name=netbird value=CONTAINER_HOSTNAME
add key=NB_LOG_LEVEL name=netbird value=info
add key=NB_DISABLE_CUSTOM_ROUTING name=netbird value=true
add key=NB_USE_LEGACY_ROUTING name=netbird value=true

We had to set NB_DISABLE_CUSTOM_ROUTING and NB_USE_LEGACY_ROUTING because RouterOS containers don't allow access to netfilter kernel module. We also set NB_NAME and NB_HOSTNAME to match our router's identity as seen in /system/identity/print because RouterOS won't allow us to set container's hostname to the same value as router's hostname. If using a self-hosted NetBird server we'll also want to add the correct URLs to our server:

add key=NB_MANAGEMENT_URL name=netbird value=YOUR_NETBIRD_MANAGEMENT_URL
add key=NB_ADMIN_URL name=netbird value=YOUR_NETBIRD_ADMIN_URL

Create the container and trigger image pull from DockerHub:

/container/add remote-image=netbirdio/netbird interface=netbird root-dir=/usb1/netbird_filesystem mounts=netbird_etc envlist=netbird dns=10.71.71.1 hostname=netbird logging=yes                                                                 

Note that we had to set container's hostname to something other than router's identity because RouterOS doesn't allow hostname conflicts.
We have also set container's DNS resolver to router's DNS server. Feel free to tweak this if needed.

Our container is now ready and container image pull from DockerHub should have been triggered. We can check the RouterOS logs to see if the pull was successful and we should see that RouterOS created our image cache directory in /usb1/pull.

Start the container

We can verify that the container is created by running

/container print

And finally we can start it using the appropriate number from the print command:

/container start number=0

At this point we should see the container in our NetBird dashboard and we should be able to create routes through it in NetBird. So, hop on to NetBird dashboard and create a route through the container to the router's bridge IP address. Address will be 172.17.0.1/32 and routing peer will be our container. Don't forget to disable NAT on this route.

Troubleshooting

  1. Increase NetBird's verbosity by setting NB_LOG_LEVEL env var to trace.\
  2. Check the logs to see what's going on:
    /log/print without-paging  where topics~"container"
    
  3. In firewall rules, enable logging for any dropp/reject rules to see if packets are being dropped.\

Get a shell in the container

Assuming that our container keeps stopping because NetBird is crashing, we can override the container entrypoint to get a shell in the container and investigate. Setting the entrypoint to 600 gets us 10 minutes to investigate before the container stops.

/container/set entrypoint="sleep 600" numbers=0
/container/shell number=0

When done revert the entrypoint back to NetBird:

/container/set entrypoint="" numbers=0

NetBird starts and logs into management server but it doesn’t show up as online

Log shows something like this:

DEBG client/internal/login.go:93: connecting to the Management service https://api.netbird.io:443
DEBG client/internal/login.go:63: connected to the Management service https://api.netbird.io:443
DEBG client/internal/login.go:93: connecting to the Management service https://api.netbird.io:443
DEBG client/internal/login.go:63: connected to the Management service https://api.netbird.io:443
INFO client/internal/connect.go:119: starting NetBird client version 0.28.6 on linux/amd64
DEBG client/internal/connect.go:180: connecting to the Management service api.netbird.io:443
DEBG client/internal/connect.go:188: connected to the Management service api.netbird.io:443
DEBG signal/client/grpc.go:81: connected to Signal Service: signal.netbird.io:443
INFO iface/tun_usp_unix.go:33: using userspace bind mode
DEBG client/internal/routemanager/sysctl/sysctl_linux.go:86: Set sysctl net.ipv4.conf.all.src_valid_mark from 0 to 1
ERROR client/internal/routemanager/systemops/systemops_linux.go:100: Error setting up sysctl: 1 errors occurred:
t* read sysctl net.ipv4.conf.eth0.rp_filter: open /proc/sys/net/ipv4/conf/eth0/rp_filter: no such file or directory
INFO client/internal/routemanager/manager.go:135: Routing setup complete
INFO iface/tun_usp_unix.go:48: create tun interface
DEBG iface/tun_link_linux.go:113: adding address 100.80.100.176/16 to interface: wt0
DEBG iface/wg_configurer_usp.go:39: adding Wireguard private key
INFO client/firewall/create_linux.go:58: no firewall manager found, trying to use userspace packet filtering firewall
DEBG iface/tun_usp_unix.go:95: device is ready to use: wt0
INFO client/internal/dns/host_unix.go:68: System DNS manager discovered: file
DEBG signal/client/grpc.go:126: signal connection state READY
WARN signal/client/grpc.go:141: disconnected from the Signal Exchange due to an error: didn't receive a registration header from the Signal server while connecting to the streams
DEBG signal/client/grpc.go:126: signal connection state IDLE
ERROR util/grpc/dialer.go:38: Failed to dial: dial: dial tcp: lookup signal.netbird.io on 172.17.0.1:53: read udp 172.17.0.2:34638->172.17.0.1:53: i/o timeout

Solution: double-check environment variables:

NB_DISABLE_CUSTOM_ROUTING=true
NB_USE_LEGACY_ROUTING=true

[Environment]::SetEnvironmentVariable("NB_DISABLE_CUSTOM_ROUTING", "true", "Machine") [Environment]::SetEnvironmentVariable("NB_USE_LEGACY_ROUTING", "true", "Machine")