Setting up Site-to-Site/VPN access over NetBird

This page explains the characteristics of Site-to-Site, Site-to-VPN, and VPN-to-Site setups and how to configure them with NetBird. We'll start by defining key terminology and reviewing the available setup options before walking through concrete implementation examples.

Overview

In this guide, a Site is any single network or subnet that is typically neither exposed to the Internet nor directly accessible from other Sites.

Examples include:

  • Home or office networks
  • Internal networks at cloud providers or datacenters
  • Restricted VLANs
  • Internal container or VM networking ranges
  • Other VPN networking ranges
  • Another NetBird organization's resource ranges

A device in this guide refers to any physical computing device (PC, laptop, phone, datacenter server, etc.) or virtual computing device (VM, container, load balancer, etc.). A device can be either clientless or a Peer.

Clientless devices are devices that don't run a NetBird client themselves (they are not Peers).

A Peer is a device running the NetBird client directly on it:

  • A laptop running the NetBird client directly on the system is a Peer
  • A laptop running the NetBird client in a container using default (internal) networking mode is not a Peer - the container itself is the Peer in this case
  • A laptop running the NetBird client in a container using host-networking mode could be considered a Peer

Other capitalized terms refer to NetBird-specific features or configuration options: Network Route, Network, Resource, Access Control Policy, ACL Group.

Lowercase terms refer to context-specific concepts:

  • route: a generic term for an operating system network route
  • resource: a generic term for software or a machine listening on a specific IP address and port

Site-to-Site

A Site-to-Site setup enables clientless devices from two or more Sites to reach each other. Each Site requires at least one Peer to route traffic over the VPN, but other on-site devices don't need to run (or even be aware of) the VPN software.

The clientless devices must be configured to route the remote Site's IP address range through the local Peer. You can configure this manually using commands and persist it with your operating system's native tools, or automate it using DHCP route advertisements or device management software.
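As a minimal sketch of the manual approach, the helpers below print (rather than execute) a route command and a matching netplan fragment for persistence. The addresses and the enp7s0 interface name are borrowed from the walkthrough later in this guide. The function names are hypothetical, and netplan is assumed only because the example machines run Ubuntu.

```shell
# Print (not run) the command that routes a remote Site's range via the local Peer.
# "replace" behaves like "add" but does not fail if the route already exists.
route_cmd() {
    remote_subnet=$1   # e.g. 192.168.100.0/24
    local_peer_ip=$2   # LAN address of the local routing Peer
    printf 'ip route replace %s via %s\n' "$remote_subnet" "$local_peer_ip"
}

# Render a netplan fragment that persists the same route across reboots.
# Assumption: the device is configured through netplan.
netplan_route_yaml() {
    iface=$1
    remote_subnet=$2
    local_peer_ip=$3
    cat <<EOF
network:
  version: 2
  ethernets:
    $iface:
      routes:
        - to: $remote_subnet
          via: $local_peer_ip
EOF
}

route_cmd 192.168.100.0/24 192.168.122.144
netplan_route_yaml enp7s0 192.168.100.0/24 192.168.122.144
```

Run the printed command with sudo on the clientless device to apply it immediately. To persist it, the YAML could be saved under /etc/netplan/ and activated with sudo netplan apply; other distributions use different tools.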

The remote Peer must also know how to route responses back to the local Site. Typically, you'll need to set up a pair of routes to enable site-to-site access:

  1. A route from the local Peer to the remote Site for outbound traffic
  2. A reverse route from the remote Site to the local Peer for return traffic

Site-to-VPN

A Site-to-VPN setup enables a clientless device to reach Peers in the VPN network.

You can think of this as the 'local half' of a Site-to-Site setup. The clientless devices need to be configured to reach the VPN network, but typically no additional setup is required to route responses back from the VPN.

VPN-to-Site

A VPN-to-Site setup enables a Peer to reach clientless devices on a network external to the VPN itself. This is the default mode of operation for most VPNs, but we're including it here for completeness. In NetBird, this scenario is achieved using Networks or the older Network Routes feature.

NetBird implementations overview

While NetBird doesn't yet have explicit support for Site-to-VPN or Site-to-Site scenarios, you can achieve them using one of the following approaches, depending on your requirements:

  1. Using a Network Route for each Site with Masquerade (with or without ACL Groups)
  2. Using a Network Route for each Site without Masquerade (without ACL Groups)
  3. Using a Network Resource with Masquerade

All of these options share the following requirements and limitations:

  • You must manually configure clientless devices to route traffic appropriately
    • The easiest method is using device management software or DHCP route advertisements from your local router
  • You can only have one Routing Peer per Site
    • Routing traffic correctly and reliably through multiple routing devices is extremely complex (if not impossible), so multi-peer routing is out of scope for this guide

You'll need to choose between two primary tradeoffs:

  1. Forfeit source IP information to keep basic access control - enable Masquerade
    • Masqueraded traffic can only be controlled by Access Control Policies attached to the Routing Peer, with no way to restrict access for specific clientless devices
    • You can still create coarse-grained access controls by setting up multiple Routing Peers for different purposes
  2. Forfeit access control to keep source IP information - disable Masquerade to preserve the original source IP addresses (this allows any traffic through)
    • This approach may be required for specific networking setups

Prerequisites and initial assumptions

For this guide, we'll use four libvirt Ubuntu virtual machines, split into two separate Sites (networks) as follows:

root@vms ~# virsh net-dhcp-leases local-site
... IP address           Hostname    ...
... ---------------------------------...
... 192.168.122.144/24   local-nb-01 ...
... 192.168.122.65/24    local-01    ...

root@vms ~# virsh net-dhcp-leases remote-site
... IP address           Hostname      ...
...------------------------------------...
... 192.168.100.189/24   remote-nb-01  ...
... 192.168.100.215/24   remote-01     ...

All VMs can be reached from the host vms:

kdn@pc ~> ssh vms.lan ping -c1 192.168.100.189
PING 192.168.100.189 (192.168.100.189) 56(84) bytes of data.
64 bytes from 192.168.100.189: icmp_seq=1 ttl=64 time=0.154 ms

--- 192.168.100.189 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.154/0.154/0.154/0.000 ms
kdn@pc ~> ssh vms.lan ping -c1 192.168.122.144
PING 192.168.122.144 (192.168.122.144) 56(84) bytes of data.
64 bytes from 192.168.122.144: icmp_seq=1 ttl=64 time=0.162 ms

--- 192.168.122.144 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.162/0.162/0.162/0.000 ms

Both Sites can reach the Internet, and devices within each Site can communicate with each other, but they cannot directly reach devices on the other Site:

kdn@pc ~> ssh 192.168.100.189 -J vms.lan ping -c1 192.168.122.144
PING 192.168.122.144 (192.168.122.144) 56(84) bytes of data.
From 192.168.122.1 icmp_seq=1 Destination Port Unreachable

--- 192.168.122.144 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

kdn@pc ~ [1]>

The local-site network VMs are also directly attached to my LAN. The remote-site VMs are only attached to their network, so we'll reference them by IP address and use vms as an SSH jump host.

The devices local-01 and remote-01 are clientless for the purposes of this guide. Additionally, remote-01 runs CoreDNS, which responds with OK to http://192.168.100.10/health.

The Peers are configured as follows:

dns_label      netbird_ip       groups
local-nb-01    100.83.73.97     s2s: local peers
remote-nb-01   100.83.136.209   s2s: remote peers

We'll grant access between:

  • local-01 running on the local-site through Routing Peer local-nb-01 using Group s2s: local peers
  • remote-01 running on the remote-site through Routing Peer remote-nb-01 using Group s2s: remote peers

Site-to-Site with Masquerade

The Masquerade option means that packets forwarded by a Routing Peer will:

  • Have their source IP address replaced with the Routing Peer's NetBird IP address when leaving the Site
  • Be translated back from the Routing Peer's IP address to the local Site's IP address when returning
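To make the translation more concrete, the sketch below prints a generic Linux source-NAT rule that has a comparable effect. It is an analogy only: NetBird applies its own translation when Masquerade is enabled, so you would not add this rule by hand. The wt0 interface name and the masquerade_rule helper are assumptions for illustration.

```shell
# Build (but do not run) a generic source-NAT rule for traffic leaving through
# the VPN interface. With such a rule, forwarded packets take the address of
# the outgoing interface, and replies are translated back by connection tracking.
masquerade_rule() {
    vpn_iface=$1   # assumed name of the NetBird interface, e.g. wt0
    printf 'iptables -t nat -A POSTROUTING -o %s -j MASQUERADE\n' "$vpn_iface"
}

masquerade_rule wt0
```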

This is currently the easiest way to configure routing because it uses existing Peer forwarding and policy enforcement facilities.

The main downsides of this approach are:

  • Loss of source IP addressing information, which may be required for auditing purposes
  • Very coarse-grained access control limited to the Routing Peer's permissions

Site-to-Site using Network Routes with Masquerade and without Access Control

In this section, we'll set up Site-to-Site Network Routes with Masquerade but without access control. We'll start by configuring the required Network Routes and Access Control Policies, then manually configure a clientless device to route traffic through the local Routing Peer. Finally, we'll verify that everything works as expected.

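Setting up simple VPN-to-Site access

We'll start with a Network Route that makes the remote-site reachable from the VPN:

[Image: routes-noacl-vpn-to-site]

We'll also need an Access Control Policy that establishes connectivity between the (future) Routing Peers:

[Image: acl-ping-to-local-only]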

We can verify that the local Peer can reach the remote-site using both ping and curl:

kdn@pc ~> ssh local-nb-01.lan "netbird networks ls"
Available Networks:

  - ID: network-route-srvs-site
    Network: 192.168.122.0/24
    Status: Selected
    
kdn@pc ~> ssh local-nb-01.lan "ping -c1 192.168.100.10"
PING 192.168.100.10 (192.168.100.10) 56(84) bytes of data.
64 bytes from 192.168.100.10: icmp_seq=1 ttl=63 time=0.475 ms

--- 192.168.100.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.475/0.475/0.475/0.000 ms

kdn@pc ~> ssh local-nb-01.lan "curl 192.168.100.10/health; echo"
OK
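If you want to script this check, the sketch below filters the netbird networks ls output for selected routes. It assumes the output layout shown above, which may differ between client versions; selected_networks is a hypothetical helper name.

```shell
# Print "ID NETWORK" for every entry whose status is "Selected" (reads stdin).
selected_networks() {
    awk '
        $1 == "-" && $2 == "ID:"            { id = $3 }
        $1 == "Network:"                    { net = $2 }
        $1 == "Status:" && $2 == "Selected" { print id, net }
    '
}

# Sample input copied from the transcript above.
selected_networks <<'EOF'
Available Networks:

  - ID: network-route-srvs-site
    Network: 192.168.122.0/24
    Status: Selected
EOF
```

On a Peer, the same filter could be fed directly with netbird networks ls | selected_networks.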

Setting up the clientless device

Now we'll manually configure the clientless local-01 device to route traffic to the remote-site through local-nb-01's local IP address 192.168.122.144:

kdn@pc ~> ssh local-01.lan "ip route | grep 192.168.100"
kdn@pc ~ [1]> ssh local-01.lan "sudo ip route add 192.168.100.0/24 via 192.168.122.144"
kdn@pc ~> ssh local-01.lan "ip route | grep 192.168.100"
192.168.100.0/24 via 192.168.122.144 dev enp7s0 

This won't work yet from a clientless device because we're missing the other half of the connection needed to route responses back:

kdn@pc ~> ssh local-01.lan "ping -c1 192.168.100.10"
PING 192.168.100.10 (192.168.100.10) 56(84) bytes of data.

--- 192.168.100.10 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

kdn@pc ~ [1]>
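For repeated checks, the packet-loss figure can be extracted from ping's summary line instead of reading it by eye. This is a small sketch that assumes the iputils summary format shown in the transcripts; ping_loss_pct is a hypothetical helper name.

```shell
# Extract the packet-loss percentage from ping's summary line (reads stdin).
ping_loss_pct() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Summary line copied from the failed ping above.
printf '%s\n' '1 packets transmitted, 0 received, 100% packet loss, time 0ms' | ping_loss_pct
```

A live check could pipe the real command through it, for example ping -c1 192.168.100.10 | ping_loss_pct, and treat any value other than 0 as a failure.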

Closing the loop with a reverse Network Route

Now we can complete the setup by enabling the reverse Network Route (from remote-site to local-site):

[Image: routes-noacl-site-to-site]

Let's verify it's working for both ICMP and HTTP:

kdn@pc ~> ssh local-01.lan "ping -c1 192.168.100.10"
PING 192.168.100.10 (192.168.100.10) 56(84) bytes of data.
64 bytes from 192.168.100.10: icmp_seq=1 ttl=62 time=0.867 ms

--- 192.168.100.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.867/0.867/0.867/0.000 ms
kdn@pc ~> ssh local-01.lan "curl 192.168.100.10/health; echo"
OK
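The ttl value in the reply also hints at the path taken: the direct ping in the prerequisites showed ttl=64, the ping from the Routing Peer showed ttl=63, and the clientless device now sees ttl=62, which is consistent with one decrement per Routing Peer. The sketch below derives the hop count, assuming the responder starts at a TTL of 64 (the Linux default); hops_from_ttl is a hypothetical helper name.

```shell
# Count forwarding hops from a ping reply line.
# Assumption: the responder sent its reply with an initial TTL of 64.
hops_from_ttl() {
    ttl=$(printf '%s\n' "$1" | sed -n 's/.*ttl=\([0-9][0-9]*\).*/\1/p')
    [ -n "$ttl" ] && echo $((64 - ttl))
}

# Reply line copied from the transcript above.
hops_from_ttl '64 bytes from 192.168.100.10: icmp_seq=1 ttl=62 time=0.867 ms'
```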

Confirming remote Site access to the local Site

Let's fetch the local-01 IP address, perform the reverse setup on remote-01, and test access back from the remote-site:

kdn@pc ~> ssh local-01.lan "ip a | grep 192.168.122"
    inet 192.168.122.65/24 metric 100 brd 192.168.122.255 scope global dynamic enp7s0
kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ip route | grep 192.168.122"
kdn@pc ~ [1]>  ssh 192.168.100.215 -J vms.lan "sudo ip route add 192.168.122.0/24 via 192.168.100.189"
kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ip route | grep 192.168.122"
192.168.122.0/24 via 192.168.100.189 dev enp7s0
kdn@pc ~> ssh 192.168.100.189 -J vms.lan "ping -c1 192.168.122.65"
PING 192.168.122.65 (192.168.122.65) 56(84) bytes of data.
64 bytes from 192.168.122.65: icmp_seq=1 ttl=63 time=0.523 ms

--- 192.168.122.65 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.523/0.523/0.523/0.000 ms
kdn@pc ~>

Site-to-Site using Network Routes with Masquerade and Access Control

We'll start by picking up where we left off in the previous example: Site-to-Site using Network Routes with Masquerade and without Access Control. Now we can restrict access to the remote Site's resources to ICMP only and verify the restrictions are enforced. We'll set up and verify unidirectional access first, then enable bidirectional access.

First, let's add dedicated "* resources" Groups as Access Control Groups on the Network Routes:

[Image: routes-with-acl-site-to-site]

Note that we're using a different Group to grant access to the Network Route than the one used for Routing Peers. Using the Routing Peer's Group in ACL Groups would also work and be slightly simpler to manage.

Next, let's set up Access Control Policies for one-way access from local-site to remote-site:

[Image: acl-unidirectional-site-to-site]

Now we can verify that ping (ICMP) is allowed while curl (HTTP) is blocked in the local-to-remote direction:

kdn@pc ~> ssh local-01.lan "ip route | grep 192.168.100"
192.168.100.0/24 via 192.168.122.144 dev enp7s0
kdn@pc ~> ssh local-01.lan "ping -c1 192.168.100.10"
PING 192.168.100.10 (192.168.100.10) 56(84) bytes of data.
64 bytes from 192.168.100.10: icmp_seq=1 ttl=62 time=0.738 ms

--- 192.168.100.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.738/0.738/0.738/0.000 ms
kdn@pc ~> ssh local-01.lan "curl -sv -m 2 192.168.100.10/health; echo"
*   Trying 192.168.100.10:80...

* Connection timed out after 2002 milliseconds
* closing connection #0

Let's also verify that reverse access (from remote-site to local-site) isn't possible yet:

kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ip route | grep 192.168.122"
192.168.122.0/24 via 192.168.100.189 dev enp7s0
kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ping -c1 192.168.122.65"
PING 192.168.122.65 (192.168.122.65) 56(84) bytes of data.

--- 192.168.122.65 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

kdn@pc ~ [1]>

Finally, let's enable the s2s: ping to local resources Access Control Policy:

[Image: acl-bidirectional-site-to-site-minus-routing-peer]

Now let's verify that remote-to-local access is working:

kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ip route | grep 192.168.122"
192.168.122.0/24 via 192.168.100.189 dev enp7s0
kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ping -c1 192.168.122.65"
PING 192.168.122.65 (192.168.122.65) 56(84) bytes of data.
64 bytes from 192.168.122.65: icmp_seq=1 ttl=62 time=0.755 ms

--- 192.168.122.65 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.755/0.755/0.755/0.000 ms
kdn@pc ~>

Site-to-Site using Network Resources with Masquerade

In this section, we'll replicate the previous Site-to-Site using Network Routes with Masquerade and Access Control configuration using Network Resources and verify that it works. We'll start by setting up a Network for each Site, enable the minimum set of Access Control Policies required (which already exist), and finally verify that access control is working as expected.

Let's start by creating two new Networks, one for each Site:

[Image: network-local-noacl]

[Image: network-remote-noacl]

Then enable the two required Access Control Policies:

[Image: acl-networks-bidirectional]

Let's verify it's working:

kdn@pc ~> ssh local-01.lan "ip route | grep 192.168.100"
192.168.100.0/24 via 192.168.122.144 dev enp7s0
kdn@pc ~> ssh local-01.lan "ping -c1 192.168.100.10"
PING 192.168.100.10 (192.168.100.10) 56(84) bytes of data.
64 bytes from 192.168.100.10: icmp_seq=1 ttl=62 time=0.783 ms

--- 192.168.100.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.783/0.783/0.783/0.000 ms
kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ip route | grep 192.168.122"
192.168.122.0/24 via 192.168.100.189 dev enp7s0
kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ping -c1 192.168.122.65"
PING 192.168.122.65 (192.168.122.65) 56(84) bytes of data.
64 bytes from 192.168.122.65: icmp_seq=1 ttl=62 time=0.925 ms

--- 192.168.122.65 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.925/0.925/0.925/0.000 ms

Let's also verify that no additional traffic is allowed:

kdn@pc ~> ssh local-01.lan "curl -m 2 192.168.100.10/health; echo"
curl: (28) Connection timed out after 2002 milliseconds
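When scripting this kind of negative test, curl's exit status is more reliable than its message text. The sketch below maps the statuses most relevant here (0 for success, 7 for a failed connection, 28 for a timeout) to a rough diagnosis. The wording of each diagnosis is our own interpretation, and a timeout can have causes other than a policy drop; classify_curl_exit is a hypothetical helper name.

```shell
# Map a curl exit status to a rough diagnosis of the path to the resource.
classify_curl_exit() {
    case "$1" in
        0)  echo "reachable" ;;
        7)  echo "connection failed (refused or unreachable)" ;;
        28) echo "timed out (likely dropped by policy)" ;;
        *)  echo "other failure ($1)" ;;
    esac
}

# Exit status 28 corresponds to the timeout shown above.
classify_curl_exit 28
```

In practice, you could run curl -s -m 2 -o /dev/null 192.168.100.10/health on the clientless device and pass $? to the function.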

Asymmetric Network Resource policies

The reverse Access Control Policy doesn't need to match the protocol and access level of the forward policy. Established connections will be routed back correctly as long as the reverse (operating system) route is registered on the remote end.

[Image: acl-networks-bidirectional]

Now we can verify that local-site can reach remote-site only over ICMP:

kdn@pc ~> ssh local-01.lan "ping -c1 192.168.100.10"
PING 192.168.100.10 (192.168.100.10) 56(84) bytes of data.
64 bytes from 192.168.100.10: icmp_seq=1 ttl=62 time=0.836 ms

--- 192.168.100.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.836/0.836/0.836/0.000 ms
kdn@pc ~ [1]> ssh local-01.lan "nc -v -w 2 192.168.100.10 22"
nc: connect to 192.168.100.10 port 22 (tcp) failed: Connection timed out
kdn@pc ~ [1]>

while remote-site can only reach local-site over SSH:

kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ip route | grep 192.168.122"
192.168.122.0/24 via 192.168.100.189 dev enp7s0
kdn@pc ~> ssh 192.168.100.215 -J vms.lan "ping -c1 192.168.122.65"
PING 192.168.122.65 (192.168.122.65) 56(84) bytes of data.

--- 192.168.122.65 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

kdn@pc ~ [1]> ssh 192.168.100.215 -J vms.lan "nc -w 2 192.168.122.65 22"
SSH-2.0-OpenSSH_9.7p1 Ubuntu-7ubuntu4.3
kdn@pc ~>
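The SSH check above relies on the server sending its identification string as soon as the TCP connection opens. A scripted version could test for that banner rather than inspecting it manually. The sketch below is one way to do it; is_ssh_banner is a hypothetical helper name.

```shell
# Succeed if the first line read from stdin looks like an SSH-2.0 identification string.
is_ssh_banner() {
    head -n 1 | grep -q '^SSH-2\.0-'
}

# Banner copied from the transcript above.
printf '%s\n' 'SSH-2.0-OpenSSH_9.7p1 Ubuntu-7ubuntu4.3' | is_ssh_banner && echo "ssh reachable"
```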

Site-to-Site without Masquerade

This approach preserves source IP addressing information, but the traffic will be immediately rejected by the remote Routing Peer if you try to enable any form of access control (such as Network Resources or ACL Groups on Network Routes).

This happens because all access control in NetBird is currently based on Peer IP addresses. Packets arriving from different address spaces (without Masquerade) are unknown to the NetBird policy engine and are therefore immediately rejected by the receiving Peer/Routing Peer.
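The following toy model illustrates the effect. It is deliberately simplified and is not how NetBird's policy engine is implemented: it only captures the idea that a source address must belong to a known Peer before any policy can apply. The Peer addresses come from the prerequisites; policy_verdict is a hypothetical helper name.

```shell
# Toy model: only source addresses of known Peers are eligible for policy evaluation.
known_peer_ips="100.83.73.97 100.83.136.209"   # NetBird IPs of the two Routing Peers

policy_verdict() {
    case " $known_peer_ips " in
        *" $1 "*) echo "evaluate policies" ;;
        *)        echo "reject" ;;
    esac
}

policy_verdict 100.83.73.97     # masqueraded source: a known Peer
policy_verdict 192.168.122.65   # unmasqueraded source: clientless local-01
```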

Site-to-Site using Network Routes without Masquerade

Simply disable Masquerade on each Network Route from the first example.

To summarize, you'll need:

  • A pair of local and remote Network Routes
  • An Access Control Policy to establish connectivity between Routing Peers
  • Manual route configuration on clientless devices pointing to the respective Routing Peers

The Network Routes list will look just like above:

[Image: routes-noacl-site-to-site]

but you'll need to turn off Masquerade in each Network Route's update dialog:

[Image: route-without-masquerading]

Only one Access Control Policy is required, just like above:

[Image: acl-ping-to-local-only]

With these two pieces of configuration in place, we can verify that ping works:

kdn@pc ~> ssh local-01.lan "ip route | grep 192.168.100"
192.168.100.0/24 via 192.168.122.144 dev enp7s0 
kdn@pc ~> ssh local-01.lan "ping -c1 192.168.100.10"
PING 192.168.100.10 (192.168.100.10) 56(84) bytes of data.
64 bytes from 192.168.100.10: icmp_seq=1 ttl=62 time=0.897 ms

--- 192.168.100.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.897/0.897/0.897/0.000 ms

and that packets arrive unmodified on the remote end:

kdn@pc ~> ssh 192.168.100.10 -J vms.lan "sudo tcpdump -nvv -i any --immediate-mode -l icmp"
tcpdump: WARNING: any: That device doesn't support promiscuous mode
(Promiscuous mode not supported on the "any" device)
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
17:32:17.845428 enp7s0 In  IP (tos 0x0, ttl 62, id 56506, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.122.65 > 192.168.100.10: ICMP echo request, id 4480, seq 1, length 64
17:32:17.845468 enp7s0 Out IP (tos 0x0, ttl 64, id 51781, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.100.10 > 192.168.122.65: ICMP echo reply, id 4480, seq 1, length 64
^C⏎
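To automate this last check, the capture can be filtered for echo requests and the source address compared against the local Site's range. The sketch below assumes the verbose (-v) tcpdump layout shown above, where the addresses appear on their own continuation line; echo_request_sources is a hypothetical helper name.

```shell
# Print the source address of every ICMP echo request in "tcpdump -n -v" output (reads stdin).
echo_request_sources() {
    awk '/ICMP echo request/ { print $1 }'
}

# Continuation line copied from the capture above.
src=$(printf '%s\n' '    192.168.122.65 > 192.168.100.10: ICMP echo request, id 4480, seq 1, length 64' | echo_request_sources)

# A source inside 192.168.122.0/24 means the packet was not masqueraded.
case "$src" in
    192.168.122.*) echo "source preserved: $src" ;;
    *)             echo "source rewritten or missing" ;;
esac
```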