diff --git a/docs/rfcs/RFC-0130_L4_Feed.md b/docs/rfcs/RFC-0130_L4_Feed.md new file mode 100644 index 0000000..2af6528 --- /dev/null +++ b/docs/rfcs/RFC-0130_L4_Feed.md @@ -0,0 +1,202 @@ +# RFC-0130: L4 Feed — Temporal Event Store + +**Status:** Draft +**Author:** Frankie (Silicon Architect) +**Date:** 2026-02-03 +**Target:** Janus SDK v0.2.0 + +--- + +## Summary + +L4 Feed ist das temporale Event-Storage-Layer für Libertaria. Es speichert soziale Primitive (Posts, Reactions, Follows) mit hybridem Ansatz: + +- **DuckDB:** Strukturierte Queries (Zeitreihen, Aggregations) +- **LanceDB:** Vektor-Search für semantische Ähnlichkeit + +## Kenya Compliance + +| Constraint | Status | Implementation | +|------------|--------|----------------| +| RAM <10MB | ✅ Planned | DuckDB in-memory mode, LanceDB mmap | +| No cloud | ✅ | Embedded storage only | +| <1MB binary | ⚠️ TBD | Stripped DuckDB + custom LanceDB bindings | + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────┐ +│ L4 Feed Layer │ +├─────────────────────────────────────────────────────────────┤ +│ ┌──────────────┐ ┌──────────────┐ │ +│ │ DuckDB │ │ LanceDB │ │ +│ │ (events) │ │ (embeddings) │ │ +│ ├──────────────┤ ├──────────────┤ │ +│ │ - Timeline │ │ - ANN search │ │ +│ │ - Counts │ │ - Similarity │ │ +│ │ - Replies │ │ - Clustering │ │ +│ └──────────────┘ └──────────────┘ │ +│ │ │ │ +│ └───────────┬───────────┘ │ +│ │ │ +│ ┌───────▼───────┐ │ +│ │ FeedStore │ │ +│ └───────────────┘ │ +└─────────────────────────────────────────────────────────────┘ +``` + +## Data Model + +### Event Types + +```zig +pub const EventType = enum { + post, // Original content + reaction, // like, boost, bookmark + follow, // Social graph edge (directed) + mention, // @username reference + hashtag, // #topic tag + edit, // Content modification + delete, // Tombstone (soft delete) +}; +``` + +### FeedEvent Structure + +| Field | Type | Description | +|-------|------|-------------| +| id | u64 | Snowflake ID (time-sortable, 64-bit) | +| event_type | EventType | Enum discriminator | +| author | [32]u8 | DID (Decentralized Identifier) | +| timestamp | i64 | Unix nanoseconds | +| content_hash | [32]u8 | Blake3 hash of canonical content | +| parent_id | ?u64 | For replies/threading | +| embedding | ?[384]f32 | 384-dim vector (LanceDB) | +| tags | []string | Hashtags | +| mentions | [][32]u8 | Referenced DIDs | + +## DuckDB Schema + +```sql +-- Events table (structured data) +CREATE TABLE events ( + id UBIGINT PRIMARY KEY, + event_type TINYINT, + author BLOB(32), + timestamp BIGINT, + content_hash BLOB(32), + parent_id UBIGINT, + tags VARCHAR[], + embedding_ref INTEGER -- Index into LanceDB +); + +-- Indexes for common queries +CREATE INDEX idx_author_time ON events(author, timestamp DESC); +CREATE INDEX idx_parent ON events(parent_id); +CREATE INDEX idx_time ON events(timestamp DESC); + +-- FTS for content search (optional) +CREATE TABLE event_content ( + id UBIGINT PRIMARY KEY REFERENCES events(id), + text_content VARCHAR +); +``` + +## LanceDB Schema + +```python +# Python pseudocode for schema +import lancedb +from lancedb.pydantic import LanceModel, Vector + +class Embedding(LanceModel): + id: int # Matches events.id + vector: Vector(384) # 384-dim embedding + + # Metadata for filtering + event_type: int + author: bytes # 32 bytes DID + timestamp: int +``` + +## Query Patterns + +### 1. Timeline (Home Feed) +```sql +SELECT * FROM events +WHERE author IN (SELECT following FROM follows WHERE follower = ?) +ORDER BY timestamp DESC +LIMIT 50; +``` + +### 2. Thread (Conversation) +```sql +WITH RECURSIVE thread AS ( + SELECT * FROM events WHERE id = ? + UNION ALL + SELECT e.* FROM events e + JOIN thread t ON e.parent_id = t.id +) +SELECT * FROM thread ORDER BY timestamp; +``` + +### 3. Semantic Search (LanceDB) +```python +# Find similar posts +table.search(query_embedding) \ + .where("event_type = 0") \ # Only posts + .limit(20) \ + .to_pandas() +``` + +## Synchronization Strategy + +1. **Write Path:** + - Insert into DuckDB (ACID transaction) + - Generate embedding (local model, ONNX Runtime) + - Insert into LanceDB (async, eventual consistency) + +2. **Read Path:** + - DuckDB: Structured queries, counts, timelines + - LanceDB: Vector similarity, clustering + - Hybrid: Vector + time filter (LanceDB filter API) + +## Implementation Phases + +### Phase 1: DuckDB Core (Sprint 4) +- [ ] DuckDB Zig bindings (C API wrapper) +- [ ] Event storage/retrieval +- [ ] Timeline queries +- [ ] Thread reconstruction + +### Phase 2: LanceDB Integration (Sprint 5) +- [ ] LanceDB Rust bindings (via C FFI) +- [ ] Embedding storage +- [ ] ANN search +- [ ] Hybrid queries + +### Phase 3: Optimization (Sprint 6) +- [ ] WAL for durability +- [ ] Compression (zstd for content) +- [ ] Incremental backups +- [ ] RAM usage optimization + +## Dependencies + +| Library | Version | Purpose | Size | +|---------|---------|---------|------| +| DuckDB | 0.9.2 | Structured storage | ~15MB → 5MB stripped | +| LanceDB | 0.9.x | Vector storage | ~20MB → 8MB stripped | +| ONNX Runtime | 1.16 | Embeddings | Optional, ~50MB | + +**Total binary impact:** ~13MB (DuckDB + LanceDB stripped, ohne ONNX) + +## Open Questions + +1. **Embedding Model:** All-MiniLM-L6-v2 (22MB) oder kleiner? +2. **Sync Strategy:** LanceDB als optionaler Index (graceful degradation)? +3. **Replication:** Event sourcing für Node-to-Node sync? + +--- + +*Sovereign; Kinetic; Anti-Fragile.* ⚡️