# RFC-0130: L4 Feed — Temporal Event Store
**Status:** Draft
**Author:** Jarvis (Silicon Architect and Representative for Agents in Libertaria)
**Date:** 2026-02-03
**Target:** Janus SDK v0.2.0

---
## Summary
L4 Feed is the temporal event storage layer for Libertaria. It stores social primitives (posts, reactions, follows) using a hybrid approach:
- **DuckDB:** Structured queries (time series, aggregations)
- **LanceDB:** Vector search for semantic similarity
## Kenya Compliance
| Constraint | Status | Implementation |
|------------|--------|----------------|
| RAM <10MB | Planned | DuckDB in-memory mode, LanceDB mmap |
| No cloud | | Embedded storage only |
| <1MB binary | TBD | Stripped DuckDB + custom LanceDB bindings |
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                        L4 Feed Layer                        │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐      ┌──────────────┐                     │
│  │    DuckDB    │      │   LanceDB    │                     │
│  │   (events)   │      │ (embeddings) │                     │
│  ├──────────────┤      ├──────────────┤                     │
│  │ - Timeline   │      │ - ANN search │                     │
│  │ - Counts     │      │ - Similarity │                     │
│  │ - Replies    │      │ - Clustering │                     │
│  └──────────────┘      └──────────────┘                     │
│          │                     │                            │
│          └──────────┬──────────┘                            │
│                     │                                       │
│             ┌───────▼───────┐                               │
│             │   FeedStore   │                               │
│             └───────────────┘                               │
└─────────────────────────────────────────────────────────────┘
```
## Data Model
### Event Types
```zig
pub const EventType = enum {
    post,     // Original content
    reaction, // like, boost, bookmark
    follow,   // Social graph edge (directed)
    mention,  // @username reference
    hashtag,  // #topic tag
    edit,     // Content modification
    delete,   // Tombstone (soft delete)
};
```
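
The DuckDB schema stores `event_type` as a `TINYINT`, and the semantic search example filters on `event_type = 0` to select posts. That suggests, but the RFC does not state, that discriminators follow the enum's declaration order starting at 0. A minimal Python mirror under that assumption (the helper names `to_tinyint` and `from_tinyint` are hypothetical):

```python
from enum import IntEnum


class EventType(IntEnum):
    """Mirror of the Zig enum. Values are ASSUMED to follow declaration order."""
    POST = 0
    REACTION = 1
    FOLLOW = 2
    MENTION = 3
    HASHTAG = 4
    EDIT = 5
    DELETE = 6


def to_tinyint(event_type: EventType) -> int:
    """Encode an event type for the TINYINT column (signed 8-bit, so 0..127 is safe)."""
    value = int(event_type)
    if not 0 <= value <= 127:
        raise ValueError(f"event type {value} does not fit a TINYINT discriminator")
    return value


def from_tinyint(value: int) -> EventType:
    """Decode a stored discriminator; raises ValueError for unknown values."""
    return EventType(value)
```

If the Zig side ever reorders or inserts variants, stored rows would silently change meaning, so pinning explicit values on both sides may be worth considering.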
### FeedEvent Structure
| Field | Type | Description |
|-------|------|-------------|
| id | u64 | Snowflake ID (time-sortable, 64-bit) |
| event_type | EventType | Enum discriminator |
| author | [32]u8 | DID (Decentralized Identifier) |
| timestamp | i64 | Unix nanoseconds |
| content_hash | [32]u8 | Blake3 hash of canonical content |
| parent_id | ?u64 | For replies/threading |
| embedding | ?[384]f32 | 384-dim vector (LanceDB) |
| tags | [][]const u8 | Hashtags |
| mentions | [][32]u8 | Referenced DIDs |
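
The `id` field is described only as a time-sortable 64-bit Snowflake ID. The sketch below shows one way such an ID could be composed. The bit layout (41-bit millisecond timestamp, 10-bit node, 12-bit sequence), the custom epoch, and the names `SnowflakeGenerator` and `timestamp_ms` are all assumptions for illustration, not part of this RFC.

```python
import threading
import time

# ASSUMED layout (Twitter-style); the RFC only says "time-sortable, 64-bit".
NODE_BITS = 10
SEQUENCE_BITS = 12
CUSTOM_EPOCH_MS = 1_767_225_600_000  # hypothetical epoch: 2026-01-01T00:00:00Z


class SnowflakeGenerator:
    """Generates strictly increasing 64-bit IDs for a single node."""

    def __init__(self, node_id, clock=None):
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node_id out of range")
        self._node_id = node_id
        # The clock is injectable so the generator can be tested deterministically.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                now = self._last_ms  # clock moved backwards: stay monotonic
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    now = self._last_ms + 1  # sequence exhausted: borrow the next ms
            else:
                self._sequence = 0
            self._last_ms = now
            elapsed = now - CUSTOM_EPOCH_MS
            if elapsed < 0:
                raise ValueError("clock is before the custom epoch")
            return (
                (elapsed << (NODE_BITS + SEQUENCE_BITS))
                | (self._node_id << SEQUENCE_BITS)
                | self._sequence
            )


def timestamp_ms(snowflake):
    """Recover the Unix millisecond timestamp embedded in an ID."""
    return (snowflake >> (NODE_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
```

Because the timestamp occupies the high bits, sorting by `id` approximates sorting by creation time, which is what makes the ID usable as a primary key for timeline queries.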
## DuckDB Schema
```sql
-- Events table (structured data)
CREATE TABLE events (
    id            UBIGINT PRIMARY KEY,
    event_type    TINYINT,
    author        BLOB,      -- 32-byte DID; length is not enforced by the BLOB type
    timestamp     BIGINT,
    content_hash  BLOB,      -- 32-byte Blake3 hash; length is not enforced by the BLOB type
    parent_id     UBIGINT,
    tags          VARCHAR[],
    embedding_ref INTEGER    -- Index into LanceDB
);
-- Indexes for common queries
CREATE INDEX idx_author_time ON events(author, timestamp DESC);
CREATE INDEX idx_parent ON events(parent_id);
CREATE INDEX idx_time ON events(timestamp DESC);
-- FTS for content search (optional)
CREATE TABLE event_content (
    id           UBIGINT PRIMARY KEY REFERENCES events(id),
    text_content VARCHAR
);
```
## LanceDB Schema
```python
# Python pseudocode for schema
import lancedb
from lancedb.pydantic import LanceModel, Vector


class Embedding(LanceModel):
    id: int              # Matches events.id
    vector: Vector(384)  # 384-dim embedding
    # Metadata for filtering
    event_type: int
    author: bytes        # 32 bytes DID
    timestamp: int
```
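
To put the 384-dimensional vectors in context of the 10MB RAM target, the raw payload size can be estimated with simple arithmetic. This is a rough sketch: it counts only the `f32` vector data (no index structures, metadata columns, or DuckDB memory), and it assumes "10MB" means 10 MiB.

```python
EMBEDDING_DIM = 384
BYTES_PER_F32 = 4
RAM_BUDGET_BYTES = 10 * 1024 * 1024  # ASSUMPTION: "10MB" read as 10 MiB

# Raw size of one embedding: 384 floats * 4 bytes each.
bytes_per_vector = EMBEDDING_DIM * BYTES_PER_F32

# Upper bound on vectors that fit if the whole budget held nothing else.
max_resident_vectors = RAM_BUDGET_BYTES // bytes_per_vector

print(bytes_per_vector)      # 1536
print(max_resident_vectors)  # 6826
```

Even with the entire budget spent on vectors, fewer than 7,000 embeddings would fit in RAM at once, which is consistent with the plan to memory-map LanceDB data rather than load it.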
## Query Patterns
### 1. Timeline (Home Feed)
```sql
SELECT * FROM events
WHERE author IN (SELECT following FROM follows WHERE follower = ?)
ORDER BY timestamp DESC
LIMIT 50;
```
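
The timeline query reads from a `follows` table that the schema above does not define. The sketch below runs the same query shape end to end. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it has no dependencies, and the `follows` layout (columns `follower`, `following`) is inferred from the query, not specified by this RFC. The `did` helper and sample data are hypothetical.

```python
import sqlite3


def did(n):
    """Build a fake 32-byte DID for the demo."""
    return bytes([n]) * 32


ALICE, BOB, CAROL, DAVE = did(1), did(2), did(3), did(4)

conn = sqlite3.connect(":memory:")  # stand-in for DuckDB
conn.executescript(
    """
    CREATE TABLE events (id INTEGER PRIMARY KEY, author BLOB, timestamp INTEGER);
    -- Hypothetical social-graph table: the RFC queries it but does not define it.
    CREATE TABLE follows (follower BLOB, following BLOB, PRIMARY KEY (follower, following));
    """
)
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(ALICE, BOB), (ALICE, CAROL)])
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, BOB, 100), (2, DAVE, 200), (3, CAROL, 300), (4, BOB, 400)],
)


def home_timeline(conn, follower, limit=50):
    """Return event ids by followed authors, newest first."""
    rows = conn.execute(
        """
        SELECT id FROM events
        WHERE author IN (SELECT following FROM follows WHERE follower = ?)
        ORDER BY timestamp DESC
        LIMIT ?
        """,
        (follower, limit),
    )
    return [row[0] for row in rows]


timeline = home_timeline(conn, ALICE)
print(timeline)  # [4, 3, 1]
```

Dave's event is excluded because Alice does not follow him. In the actual design the `follows` rows would presumably be derived from `follow` events.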
### 2. Thread (Conversation)
```sql
WITH RECURSIVE thread AS (
    SELECT * FROM events WHERE id = ?
    UNION ALL
    SELECT e.* FROM events e
    JOIN thread t ON e.parent_id = t.id
)
SELECT * FROM thread ORDER BY timestamp;
```
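
The recursive query starts at the given event and walks downward through `parent_id` links, so it returns the event plus all of its descendants, not its ancestors. A small runnable sketch of that behaviour follows. It uses `sqlite3` as a dependency-free stand-in for DuckDB, and the table is reduced to the three columns the query needs. The function name `thread_ids` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for DuckDB
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, parent_id INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, None, 100),  # root post
        (2, 1, 200),     # reply to 1
        (3, 2, 300),     # reply to 2
        (4, 1, 250),     # second reply to 1
        (5, None, 150),  # unrelated post
    ],
)


def thread_ids(conn, root_id):
    """Return the ids of root_id and all its descendants, oldest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE thread AS (
            SELECT id, parent_id, timestamp FROM events WHERE id = ?
            UNION ALL
            SELECT e.id, e.parent_id, e.timestamp
            FROM events e
            JOIN thread t ON e.parent_id = t.id
        )
        SELECT id FROM thread ORDER BY timestamp
        """,
        (root_id,),
    )
    return [row[0] for row in rows]


print(thread_ids(conn, 1))  # [1, 2, 4, 3]
print(thread_ids(conn, 2))  # [2, 3]
```

`UNION ALL` does not guard against cycles, so this relies on `parent_id` links forming a tree.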
### 3. Semantic Search (LanceDB)
```python
# Find similar posts.
# Assumes `table` is an open LanceDB table and `query_embedding` is a 384-dim vector.
results = (
    table.search(query_embedding)
    .where("event_type = 0")  # Only posts
    .limit(20)
    .to_pandas()
)
```
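
For intuition about what the filtered vector search returns, here is an exact brute-force version in plain Python. It is not how LanceDB works internally (ANN indexes trade exactness for speed), and cosine similarity is an assumption, since the RFC does not name a distance metric. The row layout and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(rows, query, event_type=0, limit=20):
    """Filter by event type, then rank the remaining rows by similarity to the query."""
    candidates = [row for row in rows if row["event_type"] == event_type]
    ranked = sorted(
        candidates,
        key=lambda row: cosine_similarity(row["vector"], query),
        reverse=True,
    )
    return [row["id"] for row in ranked[:limit]]


# Toy 3-dim vectors for readability; the real embeddings are 384-dim.
rows = [
    {"id": 1, "event_type": 0, "vector": [1.0, 0.0, 0.0]},
    {"id": 2, "event_type": 0, "vector": [0.9, 0.1, 0.0]},
    {"id": 3, "event_type": 1, "vector": [1.0, 0.0, 0.0]},  # reaction, filtered out
    {"id": 4, "event_type": 0, "vector": [0.0, 1.0, 0.0]},
]

print(search(rows, [1.0, 0.0, 0.0], limit=2))  # [1, 2]
```

A brute-force scan like this is linear in the number of stored vectors, which is the cost an ANN index is meant to avoid.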
## Synchronization Strategy
1. **Write Path:**
   - Insert into DuckDB (ACID transaction)
   - Generate embedding (local model, ONNX Runtime)
   - Insert into LanceDB (async, eventual consistency)
2. **Read Path:**
   - DuckDB: Structured queries, counts, timelines
   - LanceDB: Vector similarity, clustering
   - Hybrid: Vector + time filter (LanceDB filter API)
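
The write path above can be sketched as a synchronous structured insert followed by a deferred embedding step. The sketch uses plain dictionaries as stand-ins for DuckDB and LanceDB, and a queue drained by the caller in place of a real async worker. `FeedStoreSketch`, `drain`, `is_indexed`, and the `embed` callable are hypothetical names, not the planned FeedStore API.

```python
from collections import deque


class FeedStoreSketch:
    """Toy model of the write path: durable event first, vector index later."""

    def __init__(self, embed):
        self._embed = embed       # hypothetical callable: text -> list of floats
        self.events = {}          # stand-in for the DuckDB events table
        self.vectors = {}         # stand-in for the LanceDB table
        self._pending = deque()   # event ids waiting for an embedding

    def write(self, event_id, text):
        """Step 1 is synchronous; embedding and vector insert are deferred."""
        self.events[event_id] = text
        self._pending.append(event_id)

    def drain(self, max_items=None):
        """Run the deferred steps; returns how many events were indexed."""
        processed = 0
        while self._pending and (max_items is None or processed < max_items):
            event_id = self._pending.popleft()
            text = self.events.get(event_id)
            if text is None:
                continue  # event vanished before indexing; nothing to embed
            self.vectors[event_id] = self._embed(text)
            processed += 1
        return processed

    def is_indexed(self, event_id):
        """True once the event is visible to vector search."""
        return event_id in self.vectors


store = FeedStoreSketch(embed=lambda text: [float(len(text))])
store.write(1, "hello")
print(store.is_indexed(1))  # False: structured reads work, vector search lags
store.drain()
print(store.is_indexed(1))  # True
```

Between `write` and `drain` an event is readable through structured queries but invisible to similarity search, which is the eventual-consistency window the write path describes.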
## Implementation Phases
### Phase 1: DuckDB Core (Sprint 4)
- [ ] DuckDB Zig bindings (C API wrapper)
- [ ] Event storage/retrieval
- [ ] Timeline queries
- [ ] Thread reconstruction
### Phase 2: LanceDB Integration (Sprint 5)
- [ ] LanceDB Rust bindings (via C FFI)
- [ ] Embedding storage
- [ ] ANN search
- [ ] Hybrid queries
### Phase 3: Optimization (Sprint 6)
- [ ] WAL for durability
- [ ] Compression (zstd for content)
- [ ] Incremental backups
- [ ] RAM usage optimization
## Dependencies
| Library | Version | Purpose | Size |
|---------|---------|---------|------|
| DuckDB | 0.9.2 | Structured storage | ~15MB (~5MB stripped) |
| LanceDB | 0.9.x | Vector storage | ~20MB (~8MB stripped) |
| ONNX Runtime | 1.16 | Embeddings | Optional, ~50MB |

**Total binary impact:** ~13MB (DuckDB + LanceDB stripped, excluding ONNX)
## Open Questions
1. **Embedding Model:** All-MiniLM-L6-v2 (22MB) or something smaller?
2. **Sync Strategy:** LanceDB as an optional index (graceful degradation)?
3. **Replication:** Event sourcing for node-to-node sync?
---
*Sovereign; Kinetic; Anti-Fragile.*