04 · Every packet picks its path using on-device reinforcement learning

A 40 KB PPO (Proximal Policy Optimization) agent runs on each device's NPU and jointly weighs 8 features: signal, battery, trust, bandwidth, latency, mobility, congestion, and Sybil risk.
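As a rough sanity check on the 40 KB figure, the sketch below counts the parameters of a small actor-critic pair. The layer layout is an assumption, not something this page specifies: it takes the 8-dim state and 64-unit MLP mentioned in the routing pseudocode later in this section, and guesses two hidden layers, 8 actor outputs (one per weight), a separate critic with 1 output, and float32 storage.

```python
def dense_params(layer_sizes):
    """Weights plus biases for a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Assumed layout (hypothetical): actor 8 -> 64 -> 64 -> 8, critic 8 -> 64 -> 64 -> 1.
actor = dense_params([8, 64, 64, 8])
critic = dense_params([8, 64, 64, 1])

BYTES_PER_FLOAT32 = 4
total_bytes = (actor + critic) * BYTES_PER_FLOAT32

print(actor + critic, "parameters,", total_bytes, "bytes")
# -> 10057 parameters, 40228 bytes
```

Under those assumptions the model lands at roughly 40 KB. A single hidden layer, or quantized weights, would come out considerably smaller.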

vs. OLSR / AODV / BATMAN

Classical MANET protocols typically route on a single metric, such as hop count or link quality. AETHER scores every neighbor on 8 features at once and re-weights them as conditions change.

vs. SDN

SDN needs a central brain. AETHER's policy is learned globally and executed locally: no controller, no single point of failure.

Federated learning

Nodes share gradient updates (not data) over a privacy-preserving aggregation channel. The mesh gets smarter every hour.
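To make "share gradient updates, not data" concrete, here is a minimal sketch of one aggregation round. It assumes a FedAvg-style weighted mean, which this page does not confirm as AETHER's actual rule. The names `federated_average`, `apply_update`, and `sample_counts` are hypothetical, and the privacy-preserving channel itself (for example, secure aggregation) is not modeled.

```python
from typing import Dict, List

Update = List[float]  # flattened gradient, e.g. one entry per routing weight


def federated_average(updates: Dict[str, Update],
                      sample_counts: Dict[str, int]) -> Update:
    """Average per-node gradients, weighted by how many samples each node saw."""
    if not updates:
        raise ValueError("no updates to aggregate")
    total = sum(sample_counts[node] for node in updates)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    dim = len(next(iter(updates.values())))
    avg = [0.0] * dim
    for node, grad in updates.items():
        if len(grad) != dim:
            raise ValueError(f"update from {node} has the wrong length")
        share = sample_counts[node] / total
        for i, g in enumerate(grad):
            avg[i] += share * g
    return avg


def apply_update(weights: List[float], avg_grad: Update, lr: float = 0.01) -> List[float]:
    """One plain gradient-descent step using the aggregated gradient."""
    return [w - lr * g for w, g in zip(weights, avg_grad)]
```

Only the gradient vectors and sample counts cross the network in this sketch. The packets and link measurements that produced them stay on the device.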

function selectNextHop(packet, neighbors):
  candidates = []
  for n in neighbors:
    // Rewards: link and node quality. Penalties: congestion and Sybil risk.
    score = w1*signalQuality(n)
          + w2*batteryHeadroom(n)
          + w3*trustScore(n)
          + w4*bandwidthAvailable(n)
          + w5*(1/max(predictedLatency(n), EPS))   // EPS > 0 avoids division by zero
          + w6*mobilityAlignment(n, packet.dst)
          - w7*congestion(n)
          - w8*sybilRisk(n)
    candidates.push({n, score})
  // Multi-path: pick K next hops, sampled with probability ∝ softmax(score)
  return weightedSample(candidates, K=2)
// Weights w1..w8 are adapted online by the on-device PPO agent (8-dim state, 64-unit MLP).
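
The pseudocode above can be turned into a small runnable sketch. This is one possible reading, not AETHER's implementation: the `Neighbor` fields, the fixed weights, and the choice to sample K hops without replacement are all assumptions. `mobility_alignment` is taken as already computed for the packet's destination, and the PPO agent that would adapt the weights is left out.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    """Hypothetical per-neighbor measurements (latency in ms, the rest in [0, 1])."""
    node_id: str
    signal: float
    battery: float
    trust: float
    bandwidth: float
    latency_ms: float
    mobility_alignment: float
    congestion: float
    sybil_risk: float


def score(n, w):
    """Weighted sum from the pseudocode; w holds w1..w8 in order."""
    return (w[0] * n.signal + w[1] * n.battery + w[2] * n.trust
            + w[3] * n.bandwidth + w[4] / max(n.latency_ms, 1e-3)
            + w[5] * n.mobility_alignment
            - w[6] * n.congestion - w[7] * n.sybil_risk)


def softmax(xs):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_next_hops(neighbors, weights, k=2, rng=None):
    """Sample up to k distinct next hops, each draw proportional to softmax(score)."""
    rng = rng or random.Random()
    pool = list(neighbors)
    chosen = []
    while pool and len(chosen) < k:
        probs = softmax([score(n, weights) for n in pool])
        pick = rng.choices(range(len(pool)), weights=probs, k=1)[0]
        chosen.append(pool.pop(pick))  # remove so the same hop is not picked twice
    return chosen


# Example with made-up measurements and equal weights.
neighbors = [
    Neighbor("a", 0.9, 0.8, 0.9, 0.7, 20.0, 0.6, 0.1, 0.0),
    Neighbor("b", 0.4, 0.5, 0.6, 0.3, 80.0, 0.2, 0.6, 0.3),
    Neighbor("c", 0.7, 0.9, 0.8, 0.6, 35.0, 0.5, 0.2, 0.1),
]
hops = select_next_hops(neighbors, [1.0] * 8, k=2, rng=random.Random(7))
print([h.node_id for h in hops])
```

Sampling instead of always taking the two best scores keeps some traffic on lower-ranked links, so the agent continues to observe how they perform.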