# SPEC-018: L2 Session Manager

**Status:** DRAFT
**Version:** 0.1.0
**Date:** 2026-02-02
**Profile:** :service (with :core crypto primitives)
**Supersedes:** None (New Feature)

---

## 1. Overview

The L2 Session Manager provides sovereign, cryptographically verified peer-to-peer session management for the Libertaria Stack. It establishes trust relationships, maintains them through network disruptions, and protects them with a post-quantum hybrid key exchange and automatic key rotation.

### 1.1 Design Principles

1. **Explicit State**: Every session state is explicit, logged, and auditable
2. **Graceful Degradation**: Sessions survive network partitions without data loss
3. **No WebSockets**: Uses QUIC/μTCP only (see ADR-001)
4. **Post-Quantum Security**: X25519Kyber768 hybrid key exchange

### 1.2 Transport Architecture

| Transport | Role | Protocol Details |
|-----------|------|------------------|
| QUIC | Primary | UDP-based, 0-RTT, TLS 1.3 built-in |
| μTCP | Fallback | Micro-optimized TCP, minimal overhead |
| Raw UDP | Discovery | Stateless probing, STUN-like |

**Rationale**: WebSockets (RFC 6455) are excluded. They add HTTP handshake overhead, require proxy support, and don't support UDP hole punching natively.

---

## 2. Behavioral Specification (BDD)

### 2.1 Session Establishment

```gherkin
Feature: Session Establishment

  Scenario: Successful establishment with new peer
    Given a discovered peer with valid DID
    When session establishment is initiated
    Then state transitions to "handshake_initiated"
    And PQxdh handshake request is sent
    When valid handshake response received
    Then state transitions to "established"
    And shared session keys are derived
    And TTL is set to 24 hours

  Scenario: Session resumption
    Given previous session exists with unchanged prekeys
    When resumption is initiated
    Then existing key material is reused
    And state becomes "established" within 100ms

  Scenario: Establishment timeout
    When no response within 5 seconds
    Then state transitions to "failed"
    And failure reason is "timeout"
    And retry is scheduled with exponential backoff

  Scenario: Authentication failure
    When invalid signature received
    Then state transitions to "failed"
    And failure reason is "authentication_failed"
    And peer is quarantined for 60 seconds
```

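
The "Establishment timeout" scenario calls for retries with exponential backoff but does not fix the schedule. The following Python sketch illustrates one possible schedule. The base delay, growth factor, cap, and the name `backoff_delays` are assumptions made for illustration and are not defined by this specification.

```python
def backoff_delays(base_s=1.0, factor=2.0, cap_s=60.0, attempts=6):
    """Return the delay in seconds before each retry attempt.

    All default values are illustrative assumptions, not values from SPEC-018.
    """
    # Delay grows geometrically and is clamped to cap_s.
    return [min(cap_s, base_s * factor ** n) for n in range(attempts)]


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

A production schedule would probably add random jitter so that many peers failing at once do not retry in lockstep.
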
### 2.2 Session Maintenance

```gherkin
Feature: Session Maintenance

  Scenario: Heartbeat success
    When 30 seconds pass without activity
    Then heartbeat is sent
    And peer responds within 2 seconds
    And TTL is extended

  Scenario: Single missed heartbeat
    Given peer misses 1 heartbeat
    When next heartbeat succeeds
    Then session remains "established"
    And warning is logged

  Scenario: Session suspension
    Given peer misses 3 heartbeats
    When third timeout occurs
    Then state becomes "suspended"
    And queued messages are held
    And recovery is attempted after 60s

  Scenario: Automatic key rotation
    Given session age reaches 24 hours
    When rotation window triggers
    Then new ephemeral keys are generated
    And re-handshake is initiated
    And no messages are lost
```

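
The heartbeat scenarios reduce to counting consecutive misses and resetting the count on any successful response. A minimal Python sketch of that bookkeeping follows. The class `HeartbeatTracker` and its method names are hypothetical, and the sketch deliberately leaves the resulting state transition to the caller.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatTracker:
    """Counts consecutive missed heartbeats (hypothetical helper)."""

    tolerance: int = 3  # three misses, as in the "Session suspension" scenario
    missed: int = 0

    def on_response(self) -> None:
        # Any successful heartbeat clears the miss counter.
        self.missed = 0

    def on_timeout(self) -> bool:
        """Record a miss; return True once the tolerance is reached."""
        self.missed += 1
        return self.missed >= self.tolerance


tracker = HeartbeatTracker()
print(tracker.on_timeout())                      # False
tracker.on_response()                            # next heartbeat succeeds
print([tracker.on_timeout() for _ in range(3)])  # [False, False, True]
```
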
### 2.3 Degradation and Recovery

```gherkin
Feature: Degradation and Recovery

  Scenario: Network partition detection
    When connectivity lost for >30s
    Then state becomes "degraded"
    And messages are queued
    And session is preserved

  Scenario: Partition recovery
    Given session is "degraded"
    When connectivity restored
    Then re-establishment is attempted
    And queued messages are flushed

  Scenario: Transport fallback
    Given session over QUIC
    When QUIC fails
    Then re-establishment over μTCP is attempted
    And this is transparent to upper layers
```

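
The partition scenarios imply an outbound queue that holds messages while the session is degraded and flushes them in order on recovery. The Python sketch below shows that behavior under simplifying assumptions: the class `OutboundQueue` is hypothetical, a plain list stands in for the transport, and re-establishment is assumed to have already succeeded when `on_recovery` is called.

```python
from collections import deque


class OutboundQueue:
    """Holds messages during a partition and flushes them on recovery."""

    def __init__(self):
        self.connected = True
        self.pending = deque()
        self.sent = []  # stand-in for the real transport (assumption)

    def send(self, msg):
        # Deliver immediately when connected, otherwise queue in FIFO order.
        if self.connected:
            self.sent.append(msg)
        else:
            self.pending.append(msg)

    def on_partition(self):
        self.connected = False

    def on_recovery(self):
        # Flush queued messages in their original order.
        self.connected = True
        while self.pending:
            self.sent.append(self.pending.popleft())


q = OutboundQueue()
q.send("a")
q.on_partition()
q.send("b")
q.send("c")
print(q.sent)  # ['a']
q.on_recovery()
print(q.sent)  # ['a', 'b', 'c']
```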

---

## 3. State Machine

### 3.1 State Definitions

| State | Description | Valid Transitions |
|-------|-------------|-------------------|
| `idle` | Initial state | `handshake_initiated`, `handshake_received` |
| `handshake_initiated` | Awaiting response | `established`, `failed` |
| `handshake_received` | Received request, preparing response | `established`, `failed` |
| `established` | Active session | `degraded`, `rotating` |
| `degraded` | Connectivity issues | `established`, `suspended` |
| `rotating` | Key rotation in progress | `established`, `failed` |
| `suspended` | Extended failure | `[cleanup]`, `handshake_initiated` |
| `failed` | Terminal failure | `[cleanup]`, `handshake_initiated` (retry) |

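
The table above can be encoded directly as a lookup that rejects any transition it does not list. The Python sketch below is an illustration only. The names `VALID_TRANSITIONS` and `can_transition` are hypothetical, and the `[cleanup]` pseudo-state is represented as a plain string.

```python
CLEANUP = "[cleanup]"  # pseudo-state from the table, modeled as a string

# One entry per row of the State Definitions table.
VALID_TRANSITIONS = {
    "idle": {"handshake_initiated", "handshake_received"},
    "handshake_initiated": {"established", "failed"},
    "handshake_received": {"established", "failed"},
    "established": {"degraded", "rotating"},
    "degraded": {"established", "suspended"},
    "rotating": {"established", "failed"},
    "suspended": {CLEANUP, "handshake_initiated"},
    "failed": {CLEANUP, "handshake_initiated"},
}


def can_transition(current, target):
    """Return True if the table lists target as reachable from current."""
    return target in VALID_TRANSITIONS.get(current, set())


print(can_transition("established", "degraded"))   # True
print(can_transition("established", "suspended"))  # False
```

Under this table a session reaches `suspended` only by way of `degraded`, which is worth keeping in mind when reading the "Session suspension" scenario in section 2.2.
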
### 3.2 State Diagram

```mermaid
stateDiagram-v2
    [*] --> idle

    idle --> handshake_initiated: initiate_handshake()
    idle --> handshake_received: receive_handshake()

    handshake_initiated --> established: receive_valid_response()
    handshake_initiated --> failed: timeout / invalid_sig

    handshake_received --> established: send_response + ack
    handshake_received --> failed: timeout

    established --> degraded: missed_heartbeats(3)
    established --> rotating: time_to_rotate()

    degraded --> established: connectivity_restored
    degraded --> suspended: timeout(60s)

    suspended --> [*]: cleanup()
    suspended --> handshake_initiated: retry()

    rotating --> established: rotation_complete
    rotating --> failed: rotation_timeout

    failed --> [*]: cleanup()
    failed --> handshake_initiated: retry_with_backoff()
```

---

## 4. Architecture Decision Records

### ADR-001: No WebSockets

**Context:** P2P systems need reliable, low-latency, firewall-traversing transport.

**Decision:** Exclude WebSockets. Use QUIC as primary, μTCP as fallback.

**Consequences:**
- ✅ Zero HTTP overhead
- ✅ Native UDP hole punching
- ✅ 0-RTT connection establishment
- ✅ Built-in TLS 1.3 (QUIC)
- ❌ No browser compatibility (acceptable: native-first design)
- ❌ Corporate proxy issues (mitigation: relay mode)

### ADR-002: State Machine Over Connection Object

**Context:** Traditional "connections" are ephemeral and error-prone.

**Decision:** Model sessions as explicit state machines with cryptographic verification.

**Consequences:**
- ✅ Every transition is auditable
- ✅ Supports offline-to-online continuity
- ✅ Enables split-world scenarios
- ❌ Higher cognitive load (mitigation: tooling)

### ADR-003: Post-Quantum Hybrid

**Context:** PQ crypto is slow; classical may be broken by 2035.

**Decision:** X25519Kyber768 hybrid key exchange.

**Consequences:**
- ✅ Resistant to classical and quantum attacks
- ✅ Hardware acceleration for X25519
- ❌ Larger handshake packets

---

## 5. Interface Specification

### 5.1 Core Types

```janus
/// Session configuration
const SessionConfig = struct {
    /// Time-to-live before requiring re-handshake
    ttl: Duration = 24h,

    /// Heartbeat interval
    heartbeat_interval: Duration = 30s,

    /// Missed heartbeats before degradation
    heartbeat_tolerance: u8 = 3,

    /// Handshake timeout
    handshake_timeout: Duration = 5s,

    /// Key rotation window (before TTL expires)
    rotation_window: Duration = 1h,
};

/// Session state enumeration
const State = enum {
    idle,
    handshake_initiated,
    handshake_received,
    established,
    degraded,
    rotating,
    suspended,
    failed,
};

/// Session error types
const SessionError = !union {
    Timeout,
    AuthenticationFailed,
    TransportFailed,
    KeyRotationFailed,
    InvalidState,
};
```

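
The `ttl` and `rotation_window` defaults suggest that rotation starts one window before the TTL expires. The Python sketch below works through that arithmetic. It is one reading of the `rotation_window` doc comment rather than a confirmed rule, and the function name `rotation_due_at` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)             # SessionConfig.ttl default
ROTATION_WINDOW = timedelta(hours=1)  # SessionConfig.rotation_window default


def rotation_due_at(established_at, ttl=TTL, window=ROTATION_WINDOW):
    """Earliest time rotation should begin, assuming it starts one
    rotation window before the TTL expires (an interpretation)."""
    return established_at + ttl - window


start = datetime(2026, 2, 2, 12, 0, tzinfo=timezone.utc)
print(rotation_due_at(start).isoformat())  # 2026-02-03T11:00:00+00:00
```

With the defaults, rotation would begin 23 hours after establishment, which leaves one hour to complete the re-handshake before the session expires.
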
### 5.2 Public API

```janus
/// Establish new session
func establish(
    peer_did: []const u8,
    config: SessionConfig,
    ctx: Context
) !Session
with ctx where ctx.has(
    .net_connect,
    .crypto_pqxdh,
    .did_resolve,
    .time
);

/// Resume existing session
func resume(
    peer_did: []const u8,
    stored: StoredSession,
    ctx: Context
) !Session;

/// Accept incoming session
func accept(
    request: HandshakeRequest,
    config: SessionConfig,
    ctx: Context
) !Session;

/// Process all sessions (call in event loop)
func tick(sessions: []Session, ctx: Context) void;
```


---

## 6. Testing Requirements

### 6.1 Unit Tests

All Gherkin scenarios must have corresponding tests:

```janus
test "Scenario-001.1: Session establishes successfully" do
    // Validates: SPEC-018 2.1 SCENARIO-1
    let session = try Session.establish(test_peer, test_config, ctx);
    assert(session.state == .handshake_initiated);
    // ... simulate response
    assert(session.state == .established);
end
```

### 6.2 Integration Tests

- Two-node handshake with real crypto
- Network partition simulation
- Transport fallback verification
- Chaos testing (random packet loss)

### 6.3 Mock Interfaces

| Dependency | Mock Interface |
|------------|----------------|
| L0 Transport | `MockTransport` with latency/packet loss controls |
| PQxdh | Deterministic test vectors |
| Clock | Injectable `TimeSource` |
| DID Resolver | `MockResolver` with test documents |

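
An injectable clock lets timeout scenarios run without real waiting. The Python sketch below illustrates the idea behind the `TimeSource` row. The classes `SystemTimeSource` and `FakeTimeSource`, the `now()`/`advance()` methods, and the helper `handshake_timed_out` are assumptions made for illustration, not interfaces defined by this specification.

```python
import time


class SystemTimeSource:
    """Production clock backed by a monotonic timer."""

    def now(self) -> float:
        return time.monotonic()


class FakeTimeSource:
    """Test clock that only moves when advance() is called."""

    def __init__(self, start: float = 0.0):
        self._now = start

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds


def handshake_timed_out(started_at, clock, timeout_s=5.0):
    # 5 s mirrors the handshake_timeout default in SessionConfig.
    return clock.now() - started_at >= timeout_s


clock = FakeTimeSource()
started = clock.now()
clock.advance(4.9)
print(handshake_timed_out(started, clock))  # False
clock.advance(0.2)
print(handshake_timed_out(started, clock))  # True
```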

---

## 7. Security Considerations

### 7.1 Threat Model

| Threat | Mitigation |
|--------|------------|
| Man-in-the-middle | PQxdh with DID-based identity |
| Replay attacks | Monotonic counters in heartbeats |
| Key compromise | Automatic rotation every 24h |
| Timing attacks | Constant-time crypto operations |
| Denial of service | Quarantine + exponential backoff |

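
The replay mitigation depends on the receiver rejecting any heartbeat whose counter is not strictly greater than the last one accepted. A minimal Python sketch of that check follows. The class `ReplayGuard` is hypothetical, and the sketch assumes the counter has already been authenticated as part of the heartbeat and that heartbeats arrive in order.

```python
class ReplayGuard:
    """Accepts only strictly increasing heartbeat counters."""

    def __init__(self):
        self.last_seen = -1  # no heartbeat accepted yet

    def accept(self, counter: int) -> bool:
        # A counter at or below the last accepted value is a replay
        # (or a stale, reordered packet) and is rejected.
        if counter <= self.last_seen:
            return False
        self.last_seen = counter
        return True


guard = ReplayGuard()
print([guard.accept(c) for c in (0, 1, 1, 5, 3)])
# [True, True, False, True, False]
```

If the transport can reorder heartbeats, a sliding acceptance window would be needed instead of a single high-water mark.
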
### 7.2 Cryptographic Requirements

- Key exchange: X25519Kyber768 (hybrid)
- Signatures: Ed25519
- Symmetric encryption: ChaCha20-Poly1305
- Hashing: BLAKE3

---

## 8. Related Specifications

- **SPEC-017**: Janus Language Syntax
- **RSP-1**: Registry Sovereignty Protocol
- **RFC-0000**: Libertaria Wire Frame Protocol (L0)
- **RFC-NCP-001**: Nexus Context Protocol

---

## 9. Rejection Criteria

This specification is NOT READY until:
- [ ] All Gherkin scenarios have TDD tests
- [ ] Mermaid diagrams are validated
- [ ] ADR-001 is acknowledged by both Architects
- [ ] Mock interfaces are defined
- [ ] Security review complete

---

**Sovereign Index**: `l2_session.zig`
**Feature Folder**: `l2_session/`
**Status**: AWAITING ACKNOWLEDGMENT