ADR 0015: WireGuard Peer Enrollment for Hub/Spoke Bootstrap
Status
Proposed -- 2026-06-09.
Related issues: #377, #387.
Context
WireGuardInterface.spec.peersFrom can already derive WireGuard peers from a
shared SAMNodeSet. This removes most static peer duplication when every
router already has a trusted copy of the node registry.
That does not fully solve hub/spoke bootstrap. In a route-reflector or spine
deployment, leaf routers usually initiate WireGuard traffic toward fixed
RR/spine endpoints. The RR/spine side still needs each leaf's public key,
allowed IPs, and optional endpoint before the kernel will accept the peer.
Adding more pve-rt style leaves therefore still creates operational churn in
the RR/spine source of truth.
The first contact path cannot be the target WireGuard tunnel. WireGuard drops unknown peers before any application protocol can run, so enrollment must use a separate bootstrap transport such as a management address, an underlay listener, or another pre-established control channel.
Lab Evaluation -- 2026-06-09
The svnet1 hub/spoke lab was rewritten toward the current SAMNodeSet and
peersFrom config style for k8s-rt-01 and k8s-rt-02 as RR/spine routers.
The candidate rendered and passed config.Load / config.Validate against
current main.
The compact form removed the per-RR static WireGuardPeer resources and the
duplicated SAMTransportProfile.spec.peers / topologyNodeRefs lists:
- generated RR resources changed from 22 to 16;
- explicit
WireGuardPeerresources changed from 8 to 0; SAMTransportProfilechanged from 8 static peers plus 10 topology node refs to onepeersFrom: SAMNodeSet/svnet1-nodessource.
Line count did not drop in that lab snapshot because the same change also
introduced MobilityPool membership data in the RR config. The meaningful
complexity reduction was removal of duplicated peer declarations, not raw YAML
lines.
The deployed live image in the lab (routerd live v20260608.0642) rejected the
candidate during routerctl validate --replace with:
parse config candidate: unsupported resource kind SAMNodeSet in mobility.routerd.net/v1alpha1/SAMNodeSet/svnet1-nodes
Therefore the compact config style is valid on current main, but cannot be
rolled out to that deployed image until the runtime artifact includes
SAMNodeSet and WireGuardInterface.spec.peersFrom support.
This also confirms the remaining operational gap: with only SAMNodeSet, adding
or deleting a leaf still requires changing the RR/spine registry entry that
contains the leaf public key, transport endpoint, and allowed IPs. Enrollment is
still required before RR/spine configs can remain untouched during leaf
addition, removal, or key rotation.
Decision
Add an optional WireGuard peer enrollment flow for hub/spoke deployments.
An RR/spine router may expose an enrollment endpoint on an explicitly configured non-WireGuard listen address and port. A leaf submits its node identity and WireGuard peer material, and the RR/spine validates that request against local policy and the expected topology before the peer becomes active.
The enrollment record should include:
nodeRefand target WireGuard interface;- WireGuard
publicKey; - endpoint or listen port, when the leaf has a stable endpoint;
- requested
allowedIPsand/orsamEndpoint; - a nonce or generation value so retries are idempotent and stale writes are detectable.
Approved registrations are stored as dynamic config, not as ad-hoc runtime
state hidden from the config graph. The effective config path then turns those
records into ordinary WireGuard peer inputs, either as generated
WireGuardPeer resources or as entries consumable by the existing
WireGuardInterface.spec.peersFrom machinery. Static WireGuardPeer resources
continue to override generated peers by name so operators retain an emergency
override.
The leaf's static bootstrap config stays small: it needs its own private key, the RR/spine public key and fixed endpoint, and the enrollment credentials. The RR/spine owns approval and activation.
Validation and Security
Enrollment fails closed. A request is accepted only when all configured checks pass.
- The enrollment endpoint is disabled by default and bound only to configured addresses.
- The request is authenticated with an explicit mechanism such as a bearer token, mTLS client identity, or a signed registration payload.
- The requested
nodeRefis allowed by policy and, when configured, present in the expectedSAMNodeSet. - Requested
allowedIPsandsamEndpointmatch the node identity and do not collide with existing nodes. - Public keys are unique unless the same node is retrying the same generation.
- Re-registration, key rotation, rejection, revocation, and expiration are visible in audit/status output.
- Rate limiting protects the bootstrap endpoint from repeated invalid registrations.
routerctl should expose enrollment state as Pending, Approved,
Rejected, Revoked, or Expired, with the validation reason when a request
is not active.
Non-Goals
- Do not replace WireGuard cryptokey routing. The RR/spine still installs one kernel peer per approved leaf.
- Do not accept arbitrary public keys without an explicit policy decision.
- Do not run first-contact enrollment through the target WireGuard interface.
- Do not make
SAMNodeSetdistribution depend on a tunnel that itself requires the newly enrolled peer.
Implementation Plan
- Define the enrollment API resource shape, status model, and CLI/status output. Keep it separate from WireGuard runtime reconciliation.
- Add RR/spine enrollment storage as a dynamic config source with durable audit information and stale-entry cleanup.
- Add validation against policy and optional
SAMNodeSetmembership. - Feed approved registrations into the existing effective WireGuard peer generation path while preserving static peer override behavior.
- Add leaf-side submit/retry logic that is idempotent and safe to run at boot.
- Add revocation and key rotation flows.
Consequences
RR/spine configs stop growing by one hand-authored WireGuard peer block per leaf when the deployment has an approved enrollment policy. The kernel peer count and the need for identity validation remain, but the operator workflow shifts from editing peer material on every RR/spine to approving or pre-authorizing leaf registrations.
The feature also creates a clear bootstrap boundary: topology distribution can
continue to use SAMNodeSet and peersFrom, while first-contact trust is
handled by an explicit non-WireGuard enrollment surface.