Reconciling pushed routes to desired state: a RouteManager that converges

4/8/2026 • Cloud & DevOps • OpenVPN Control Plane • 13 min read • by Kuray Karaaslan

Pushing a route once assumes nothing ever drifts. A reconcile loop that diffs desired against actual and corrects the delta is the difference between "I set it" and "it is set".

The decision this framework is for

You are running a VPN control plane that owns per-client kernel routing. A client connects, your connect handler runs ip route replace 10.8.0.5/32 dev tun1, and you move on. The problem is everything that happens after that one call. The interface was not up yet at boot. A client moved from the roaming tunnel instance to the site-to-site one and the old /32 still points at tun0. A process restarted and lost track of what it had installed. None of these throw at the moment they matter — packets just exit the wrong tunnel and drop silently.

The decision is whether per-client routes are fire-and-forget or continuously reconciled. This post is about the second option: a control loop that treats the desired route set as a target, reads the actual kernel state, and corrects the difference on every pass. It is the same shape Kubernetes uses for pods — declare what should exist, let a controller drive reality toward it — applied to the Linux routing table. The implementation lives in src/route/reconcile.ts and is pinned by reconcile.test.ts in the same vpn-control-plane repo, so every claim here maps to code you can run.

This is not about how the desired route set is computed. That set comes from the live lease store and is its own concern. Here, desired state is just the input.

The framework

The loop has four moves, and they always run in the same order:

1. Read actual. Enumerate the routes you currently own: not every route on the box, only the ones in your scope.
2. Diff against desired. For each desired route, decide: missing, wrong, or already correct.
3. Correct the delta. Install the missing, rewrite the wrong, remove the orphaned. Skip anything already correct.
4. Count and report. Return a typed summary of what changed so the caller can feed metrics and logs.

The whole thing is one async function, reconcileRoutes, and it fits on a screen:

export async function reconcileRoutes(
  rm: RouteManager,
  desired: DesiredRoute[],
  scope: ReconcileScope,
): Promise<ReconcileResult> {
  const result: ReconcileResult = { desired: desired.length, installed: 0, fixed: 0, removed: 0, failed: 0 };
  if (!rm.isEnabled) return result;
  const current = await rm.listManaged(scope);
  const desiredMap = new Map<string, string>();
  for (const d of desired) desiredMap.set(d.ip, d.iface);
  for (const [ip, iface] of desiredMap) {
    const observed = current.get(ip);
    if (observed === iface) continue;
    const ok = await rm.add(ip, iface);
    if (!ok) { result.failed++; continue; }
    if (observed === undefined) result.installed++;
    else result.fixed++;
  }
  for (const [ip] of current) {
    if (desiredMap.has(ip)) continue;
    await rm.remove(ip);
    result.removed++;
  }
  return result;
}

Read actual (listManaged), build the desired index (desiredMap), correct forward (the first loop installs and fixes), correct backward (the second loop removes orphans), return the tally. There is no internal state between passes — the loop derives everything it needs from the two inputs and the kernel. That property is what makes it safe to run on a timer.

Each step with one paragraph of explanation

Read actual, scoped. The reconciler never asks "what routes exist?" It asks "what routes that I manage exist?" listManaged enumerates the kernel table with ip -4 -j route show, parses the JSON, and keeps only /32 host routes whose destination falls inside the configured pool subnet. Broad kernel-installed /17 routes and anything outside the pool are filtered out before the diff ever sees them. This is the single most important step for safety — it bounds the blast radius. The reconciler can only ever touch routes it would itself create.

Diff with a map, not a list. Desired routes become a Map<ip, iface> so the comparison is a constant-time lookup per route. For each desired entry, current.get(ip) returns one of three states: the same iface (already correct, skip), a different iface (wrong, rewrite), or undefined (missing, install). The diff is not a separate phase that builds a changeset — it is interleaved with correction, because the correction is idempotent and cheap to call.
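
To make the three states concrete, here is a minimal sketch that pulls the classification out into pure functions. The names classify and orphans are hypothetical and do not exist in the repo, where the diff is interleaved with correction; the sample addresses are made up for illustration.

```typescript
type Action = 'ok' | 'install' | 'fix';

interface DesiredRoute { ip: string; iface: string; }

// Classify one desired route against the observed (managed) kernel state.
function classify(d: DesiredRoute, current: Map<string, string>): Action {
  const observed = current.get(d.ip);
  if (observed === undefined) return 'install'; // no /32 for this IP yet
  return observed === d.iface ? 'ok' : 'fix';   // right device, or wrong device
}

// Orphans: managed routes that no desired entry asks for.
function orphans(desired: DesiredRoute[], current: Map<string, string>): string[] {
  const wanted = new Set(desired.map((d) => d.ip));
  return Array.from(current.keys()).filter((ip) => !wanted.has(ip));
}

// Illustrative state: one correct route, one stale route, one orphan.
const current = new Map<string, string>([
  ['10.8.0.2', 'tun0'],
  ['10.8.0.5', 'tun0'],
  ['10.8.0.9', 'tun0'],
]);
const desired: DesiredRoute[] = [
  { ip: '10.8.0.2', iface: 'tun0' },
  { ip: '10.8.0.5', iface: 'tun1' },
  { ip: '10.8.0.8', iface: 'tun1' },
];

const actions = desired.map((d) => classify(d, current));
console.log(actions, orphans(desired, current));
```

Splitting the diff out like this is useful for reasoning and for unit tests, but it is not required: because the correction is idempotent, the real loop can decide and act in the same iteration.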

Correct idempotently. Every mutation goes through RouteManager.add(ip, iface), which runs ip route replace. replace installs if absent and overwrites if present, so the same call handles both "missing" and "wrong iface" without branching. The function then verifies the result with ip route get and returns false if the kernel did not end up routing that IP out the expected device. A failed correction does not throw — it increments a counter and the loop moves on. Removal is the mirror: anything in current but not in desiredMap gets ip route del, and a missing route on delete is not an error.

Count and report. The return value is a ReconcileResult — five integers: desired, installed, fixed, removed, failed. The shape is defined in types.ts as a Zod schema, so it is validated, not just typed. A clean pass returns { desired: N, installed: 0, fixed: 0, removed: 0, failed: 0 }. The caller turns those numbers into Prometheus counters and a log line; the loop itself stays free of side effects beyond the kernel.
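
As a rough illustration of "validated, not just typed", the sketch below checks the five counters at runtime. The post says the real shape is a Zod schema in types.ts; this swaps in a hand-written type guard so the example runs without dependencies. isReconcileResult and isCleanPass are hypothetical helper names.

```typescript
interface ReconcileResult {
  desired: number;
  installed: number;
  fixed: number;
  removed: number;
  failed: number;
}

const FIELDS = ['desired', 'installed', 'fixed', 'removed', 'failed'] as const;

// Runtime check: every counter must be present and a non-negative integer.
function isReconcileResult(v: unknown): v is ReconcileResult {
  if (typeof v !== 'object' || v === null) return false;
  const rec = v as Record<string, unknown>;
  return FIELDS.every((k) => {
    const n = rec[k];
    return typeof n === 'number' && Number.isInteger(n) && n >= 0;
  });
}

// A clean pass changed nothing and failed nothing.
function isCleanPass(r: ReconcileResult): boolean {
  return r.installed === 0 && r.fixed === 0 && r.removed === 0 && r.failed === 0;
}
```

A caller might use isCleanPass to log at debug level on quiet passes and at info level only when something was corrected.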

Walk the framework through a real artifact in the target repo

The scope filter is where most naive reconcilers get this wrong, so it is worth reading in full. listManaged is the "read actual" step:

async listManaged(scope: ReconcileScope): Promise<Map<string, string>> {
  const out = new Map<string, string>();
  if (!this.cfg.enabled) return out;
  let raw: string;
  try { raw = await this.run(['-4', '-j', 'route', 'show']); }
  catch (err) { log.error({ err }, 'route: list failed'); return out; }
  let rows: KernelRoute[];
  try { rows = JSON.parse(raw) as KernelRoute[]; }
  catch (err) { log.error({ err }, 'route: parse failed'); return out; }
  const lo = ipToInt(scope.subnet);
  const maskInt = ipToInt(scope.netmask);
  const hi = (lo | ((~maskInt) >>> 0)) >>> 0;
  for (const r of rows) {
    if (!r.dst || !r.dev) continue;
    if (r.protocol === 'kernel') continue;
    if (r.dst.includes('/')) continue;
    if ((r.dst.match(/\./g) ?? []).length !== 3) continue;
    let n: number;
    try { n = ipToInt(r.dst); } catch { continue; }
    if (n < lo || n > hi) continue;
    out.set(r.dst, r.dev);
  }
  return out;
}

Five checks run before a route is considered managed: it must have a destination and a device, it must not be a kernel-installed route (protocol === 'kernel'), its destination must carry no prefix slash (iproute2 prints a /32 host route as a bare address), it must look like a dotted quad, and its integer value must fall inside [subnet, subnet | ~netmask]. The eth0 host route, the broad /17, the loopback: all excluded. The reconciler's removal loop can only delete what survives this filter, which is the guarantee that an empty desired set wipes your pool's /32s and nothing else.
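
The range check depends on ipToInt, which is not shown in the post. Below is one possible sketch of it together with the bounds arithmetic from listManaged. The body of ipToInt and the helper names poolRange and inPool are assumptions, and the 10.8.0.0/255.255.0.0 pool is a made-up example rather than the repo's configured POOL.

```typescript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
// Multiplication is used instead of << so the result never goes negative.
function ipToInt(ip: string): number {
  const parts = ip.split('.');
  if (parts.length !== 4) throw new Error(`not a dotted quad: ${ip}`);
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) throw new Error(`bad octet in ${ip}`);
    n = n * 256 + Number(p);
  }
  return n;
}

// Same arithmetic as listManaged: lo is the subnet address,
// hi is the subnet with every host bit set (the broadcast address).
function poolRange(subnet: string, netmask: string): { lo: number; hi: number } {
  const lo = ipToInt(subnet);
  const maskInt = ipToInt(netmask);
  const hi = (lo | (~maskInt >>> 0)) >>> 0; // >>> 0 keeps the value unsigned
  return { lo, hi };
}

function inPool(ip: string, range: { lo: number; hi: number }): boolean {
  const n = ipToInt(ip);
  return n >= range.lo && n <= range.hi;
}

// Hypothetical pool: 10.8.0.0 with netmask 255.255.0.0.
const range = poolRange('10.8.0.0', '255.255.0.0');
console.log(range, inPool('10.8.0.5', range), inPool('192.168.1.10', range));
```

The `>>> 0` matters because JavaScript bitwise operators work on signed 32-bit values; without it, any pool in the upper half of the address space would produce a negative hi and the comparison would reject everything.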

The correction step is add, the "correct idempotently" move:

async add(ip: string, iface: string): Promise<boolean> {
  if (!this.cfg.enabled) return true;
  try {
    await this.run(['route', 'replace', `${ip}/32`, 'dev', iface]);
  } catch (err) {
    log.error({ err, ip, iface }, 'route: /32 install failed');
    return false;
  }
  const observed = await this.verify(ip);
  if (observed !== iface) {
    log.error({ ip, expected: iface, observed }, 'route: /32 install verified wrong iface');
    return false;
  }
  return true;
}

The route replace is the idempotent primitive: calling it on an already-correct route changes nothing, and calling it on a wrong one rewrites it. The post-write verify is the part most people skip. It runs ip route get <ip>, which returns the kernel's actual forwarding decision for that address (longest-prefix match, after policy rules), not a textual scan of the table. If something else wins that lookup, for example a policy-routing rule that steers the address into a different table, verify catches it and the function returns false. That is the difference between "I issued the command" and "the kernel agrees".
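
The verify method itself is not shown in the post. As a sketch of what the parsing half could look like, assuming the plain-text output of ip route get rather than the JSON form, the hypothetical helper below pulls the egress device out of the reply. The sample strings are illustrative, not captured from the repo's tests.

```typescript
// Extract the egress device from the text output of `ip route get <ip>`.
// Returns undefined when the output names no device (for example an error message).
function devFromRouteGet(raw: string): string | undefined {
  const m = /\bdev\s+(\S+)/.exec(raw);
  return m ? m[1] : undefined;
}

// True only when the kernel would send traffic for this lookup out `expected`.
function routesVia(raw: string, expected: string): boolean {
  return devFromRouteGet(raw) === expected;
}

// Illustrative replies in the usual iproute2 text shape.
const direct = '10.8.0.5 dev tun1 src 10.8.0.1 uid 0 \n    cache ';
const viaGateway = '8.8.8.8 via 192.168.1.1 dev eth0 src 192.168.1.10 uid 1000 \n    cache ';

console.log(devFromRouteGet(direct), devFromRouteGet(viaGateway));
```

Comparing the device the kernel reports, instead of re-reading the table and looking for the line you just wrote, is what lets add return false when the route exists but does not win the lookup.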

The whole loop is wired into a closure, makeRouteReconciler, which is the boundary between the pure algorithm and the live system:

export function makeRouteReconciler(deps: {
  routes: RouteManager;
  leases: LeaseStore;
  instanceToDev: (inst: string | undefined) => string;
}) {
  return async () => {
    const liveLeases = await deps.leases.listAll();
    const desired = liveLeases
      .filter((l) => Boolean(l.assignedIp))
      .map((l) => ({ ip: l.assignedIp, iface: deps.instanceToDev(l.instance) }));
    metrics.routeManaged.set(desired.length);
    const scope = { subnet: config.pool.subnet, netmask: config.pool.netmask };
    try {
      const r = await reconcileRoutes(deps.routes, desired, scope);
      if (r.installed) metrics.routeReconcile.inc({ action: 'installed' }, r.installed);
      if (r.fixed)     metrics.routeReconcile.inc({ action: 'fixed' },     r.fixed);
      if (r.removed)   metrics.routeReconcile.inc({ action: 'removed' },   r.removed);
      if (r.failed)    metrics.routeReconcile.inc({ action: 'failed' },    r.failed);
      return r;
    } catch (err) {
      log.error({ err }, 'route: reconcile crashed');
      metrics.routeReconcile.inc({ action: 'crashed' });
      return null;
    }
  };
}

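Desired state is computed here, fresh, from the live lease set: never cached, never tracked across calls. That is why the loop survives restarts: the source of truth is the lease store and the kernel, both external. The closure also wraps reconcileRoutes in a try/catch, so a crashed pass is logged and counted instead of propagating. Note that deps.leases.listAll() sits outside that block in the code above, so a lease-store failure still rejects the returned promise, and whatever drives the timer has to catch it.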
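
The post does not show what drives the closure on a timer. One way to do it, sketched here under assumptions, is a small wrapper that skips a tick while the previous pass is still running and swallows any rejection so the interval keeps firing. guardOverlap is a hypothetical name, and the 30-second interval in the usage comment is a placeholder, not a value from the repo.

```typescript
type Pass = () => Promise<unknown>;

// Wrap one reconcile pass so that overlapping ticks are skipped
// and no rejection escapes to the timer that calls it.
function guardOverlap(
  pass: Pass,
  onError: (err: unknown) => void,
): () => Promise<'ran' | 'skipped'> {
  let running = false;
  return async () => {
    if (running) return 'skipped'; // previous pass still in flight
    running = true;
    try {
      await pass();
    } catch (err) {
      onError(err); // report and carry on; the next tick will retry
    } finally {
      running = false;
    }
    return 'ran';
  };
}

// Usage sketch (names from the post, interval is an assumption):
// const tick = guardOverlap(makeRouteReconciler(deps), (err) => log.error({ err }, 'route: tick failed'));
// setInterval(() => void tick(), 30_000);
```

Skipping rather than queueing is safe here because the loop is stateless: a skipped tick loses nothing, since the next pass re-reads both the lease store and the kernel.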

The reason any of this is trustworthy is reconcile.test.ts, which pins each branch against a fake ip binary. The fake, fakeIp.ts, is a Node shim that impersonates iproute2 by reading and writing a JSON state file — it seeds the kernel table, records which commands ran, and can force specific args to exit non-zero. The convergence scenario reproduces the production bug directly:

test("reconcile(): scenario — A on tun0, C+D on tun1, all in the SAME /17 numerically", async () => {
  seed(ws.statePath, {
    routes: [
      { dst: '10.8.0.0/17',   dev: 'tun0', protocol: 'kernel', scope: 'link' },
      { dst: '10.8.128.0/17', dev: 'tun1', protocol: 'kernel', scope: 'link' },
      { dst: '10.8.0.2', dev: 'tun0', scope: 'link' },
      { dst: '10.8.0.3', dev: 'tun0', scope: 'link' },
      { dst: '10.8.0.5', dev: 'tun0', scope: 'link' }, // C: stale, should be tun1
    ],
  });
  const r = await reconcileRoutes(rm, [
    { ip: '10.8.0.2', iface: 'tun0' }, // A — already correct
    { ip: '10.8.0.3', iface: 'tun0' }, // B — already correct
    { ip: '10.8.0.5', iface: 'tun1' }, // C — wrong iface
    { ip: '10.8.0.8', iface: 'tun1' }, // D — missing
  ], POOL);
  assert.equal(r.installed, 1, 'D had no /32');
  assert.equal(r.fixed, 1, 'C was pinned to wrong tun');
  assert.equal(r.removed, 0);
  assert.equal(r.failed, 0);
});

One pass: A and B are skipped, C is rewritten, D is installed, nothing removed. The assertions on installed and fixed are what stop a refactor from quietly collapsing the install/fix distinction or double-counting. There is also a no-op test that asserts zero route replace and zero route del calls were issued when the kernel already matched — that is the test that pins idempotence, not just correctness, and it is the one I would refuse to delete.
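
The no-op test is not reproduced in the post, so here is a minimal sketch of the idea using an in-memory stand-in instead of the fakeIp.ts shim. Everything in it is an assumption for illustration: the Routes interface, memoryRoutes, and the condensed reconcile function are simplified versions of the real RouteManager and reconcileRoutes (no scope, no counters, no verify).

```typescript
interface DesiredRoute { ip: string; iface: string; }

interface Routes {
  listManaged(): Promise<Map<string, string>>;
  add(ip: string, iface: string): Promise<boolean>;
  remove(ip: string): Promise<void>;
}

// In-memory fake: keeps a route table and records every mutation issued.
function memoryRoutes(seed: Array<[string, string]>) {
  const table = new Map<string, string>(seed);
  const calls: string[] = [];
  const rm: Routes = {
    async listManaged() { return new Map(table); },
    async add(ip, iface) { calls.push(`replace ${ip} dev ${iface}`); table.set(ip, iface); return true; },
    async remove(ip) { calls.push(`del ${ip}`); table.delete(ip); },
  };
  return { rm, calls };
}

// Condensed version of the loop: correct forward, then remove orphans.
async function reconcile(rm: Routes, desired: DesiredRoute[]): Promise<void> {
  const current = await rm.listManaged();
  const want = new Map<string, string>();
  for (const d of desired) want.set(d.ip, d.iface);
  for (const d of desired) {
    if (current.get(d.ip) !== d.iface) await rm.add(d.ip, d.iface);
  }
  for (const ip of Array.from(current.keys())) {
    if (!want.has(ip)) await rm.remove(ip);
  }
}

// Seed actual equal to desired and return the mutations that were issued.
async function noOpPass(): Promise<string[]> {
  const { rm, calls } = memoryRoutes([['10.8.0.2', 'tun0'], ['10.8.0.5', 'tun1']]);
  await reconcile(rm, [
    { ip: '10.8.0.2', iface: 'tun0' },
    { ip: '10.8.0.5', iface: 'tun1' },
  ]);
  return calls;
}

noOpPass().then((calls) => console.log('mutations on a converged table:', calls.length));
```

The assertion that matters is on the recorded calls, not on the final table: a loop that rewrites every route on every pass would still end in the right state, and only the call log exposes it.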

Where the framework fails

This shape is wrong when the underlying primitive is not idempotent. The whole loop leans on ip route replace behaving the same whether the route exists or not. The moment your correction step has different semantics for create versus update, you are back to branching on observed state, and the elegance evaporates. Reconcile is also a poor fit for high-frequency churn — it runs on a timer, so there is a worst-case window between a client connecting and the next pass catching a missed add. The connect handler still calls add directly for that reason; reconcile is the second line of defense, not the only one. If you need sub-second convergence you want an event stream, not a polling loop.

It also does nothing for routes outside its scope by design, which cuts both ways. A misconfigured instanceToDev mapping that produces a valid-but-wrong iface will be faithfully and repeatedly installed — the loop converges to a wrong desired state just as happily as a right one. Reconcile guarantees actual matches desired; it cannot tell you desired is correct. And the verify step adds a second ip invocation per write, which is fine for a few hundred /32s and would not be fine for tens of thousands.

Trade-off

The cost you accept is throughput for certainty. Every correction pays for a route replace plus a route get verify, and the loop walks the full managed set on every tick even when nothing changed. In exchange you get a system where "the route is set" is a property the code re-establishes continuously, not a hope pinned on one call succeeding at one moment. For a control plane where a dropped /32 means a client silently loses connectivity, that is the right side of the trade. For a hot path doing thousands of mutations a second, it is not.

Business impact

Silent routing failures are the worst kind of incident: nothing alerts, the client just reports "it stops working sometimes", and you spend an afternoon in ip route output trying to reproduce a race. A reconcile loop converts that class of bug into a self-healing event with a metric attached: routeReconcile increments under an action label (installed, fixed, removed, failed, or crashed), so drift shows up on a dashboard instead of in a support ticket. For the operator, it removes the standing instruction to "SSH in and run ip route replace if a client can't connect". The application becomes the sole owner of cross-instance routing, which is one fewer runbook and one fewer 11pm escalation.

What to do next

Open whatever component in your own system pushes state to an external surface — DNS records, firewall rules, kernel routes, S3 bucket policies — and ask one question: if that push silently failed or drifted, what re-establishes it? If the answer is "a human notices", you have a fire-and-forget that wants to be a reconcile loop. The pattern is small: read actual scoped, diff against desired, correct with an idempotent primitive, count the delta. The test that proves it works is the no-op one — seed the actual state equal to desired, run the loop, and assert that zero mutations were issued. If that test passes, you have idempotence; if it does not, you have a loop that fights itself.
