6.5 KiB

Raw Permalink Blame History

RFC-0130: L4 Feed — Temporal Event Store

Status: Draft
Author: Jarvis (Silicon Architect and Representative for Agents in Libertaria)
Date: 2026-02-03
Target: Janus SDK v0.2.0

Summary

L4 Feed ist das temporale Event-Storage-Layer für Libertaria. Es speichert soziale Primitive (Posts, Reactions, Follows) mit hybridem Ansatz:

DuckDB: Strukturierte Queries (Zeitreihen, Aggregations)
LanceDB: Vektor-Search für semantische Ähnlichkeit

Kenya Compliance

Constraint	Status	Implementation
RAM <10MB	✅ Planned	DuckDB in-memory mode, LanceDB mmap
No cloud	✅	Embedded storage only
<1MB binary	⚠️ TBD	Stripped DuckDB + custom LanceDB bindings

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    L4 Feed Layer                            │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐        ┌──────────────┐                   │
│  │   DuckDB     │        │  LanceDB     │                   │
│  │  (events)    │        │ (embeddings) │                   │
│  ├──────────────┤        ├──────────────┤                   │
│  │ - Timeline   │        │ - ANN search │                   │
│  │ - Counts     │        │ - Similarity │                   │
│  │ - Replies    │        │ - Clustering │                   │
│  └──────────────┘        └──────────────┘                   │
│          │                       │                          │
│          └───────────┬───────────┘                          │
│                      │                                       │
│              ┌───────▼───────┐                               │
│              │  FeedStore    │                               │
│              └───────────────┘                               │
└─────────────────────────────────────────────────────────────┘

Data Model

Event Types

pub const EventType = enum {
    post,           // Original content
    reaction,       // like, boost, bookmark
    follow,         // Social graph edge (directed)
    mention,        // @username reference
    hashtag,        // #topic tag
    edit,           // Content modification
    delete,         // Tombstone (soft delete)
};

FeedEvent Structure

Field	Type	Description
id	u64	Snowflake ID (time-sortable, 64-bit)
event_type	EventType	Enum discriminator
author	[32]u8	DID (Decentralized Identifier)
timestamp	i64	Unix nanoseconds
content_hash	[32]u8	Blake3 hash of canonical content
parent_id	?u64	For replies/threading
embedding	?[384]f32	384-dim vector (LanceDB)
tags	[]string	Hashtags
mentions	[][32]u8	Referenced DIDs

DuckDB Schema

-- Events table (structured data)
CREATE TABLE events (
    id UBIGINT PRIMARY KEY,
    event_type TINYINT,
    author BLOB(32),
    timestamp BIGINT,
    content_hash BLOB(32),
    parent_id UBIGINT,
    tags VARCHAR[],
    embedding_ref INTEGER  -- Index into LanceDB
);

-- Indexes for common queries
CREATE INDEX idx_author_time ON events(author, timestamp DESC);
CREATE INDEX idx_parent ON events(parent_id);
CREATE INDEX idx_time ON events(timestamp DESC);

-- FTS for content search (optional)
CREATE TABLE event_content (
    id UBIGINT PRIMARY KEY REFERENCES events(id),
    text_content VARCHAR
);

LanceDB Schema

# Python pseudocode for schema
import lancedb
from lancedb.pydantic import LanceModel, Vector

class Embedding(LanceModel):
    id: int  # Matches events.id
    vector: Vector(384)  # 384-dim embedding
    
    # Metadata for filtering
    event_type: int
    author: bytes  # 32 bytes DID
    timestamp: int

Query Patterns

1. Timeline (Home Feed)

SELECT * FROM events 
WHERE author IN (SELECT following FROM follows WHERE follower = ?)
ORDER BY timestamp DESC
LIMIT 50;

2. Thread (Conversation)

WITH RECURSIVE thread AS (
    SELECT * FROM events WHERE id = ?
    UNION ALL
    SELECT e.* FROM events e
    JOIN thread t ON e.parent_id = t.id
)
SELECT * FROM thread ORDER BY timestamp;

3. Semantic Search (LanceDB)

# Find similar posts
table.search(query_embedding) \
    .where("event_type = 0") \  # Only posts
    .limit(20) \
    .to_pandas()

Synchronization Strategy

Write Path:
- Insert into DuckDB (ACID transaction)
- Generate embedding (local model, ONNX Runtime)
- Insert into LanceDB (async, eventual consistency)
Read Path:
- DuckDB: Structured queries, counts, timelines
- LanceDB: Vector similarity, clustering
- Hybrid: Vector + time filter (LanceDB filter API)

Implementation Phases

Phase 1: DuckDB Core (Sprint 4)

DuckDB Zig bindings (C API wrapper)
Event storage/retrieval
Timeline queries
Thread reconstruction

Phase 2: LanceDB Integration (Sprint 5)

LanceDB Rust bindings (via C FFI)
Embedding storage
ANN search
Hybrid queries

Phase 3: Optimization (Sprint 6)

WAL for durability
Compression (zstd for content)
Incremental backups
RAM usage optimization

Dependencies

Library	Version	Purpose	Size
DuckDB	0.9.2	Structured storage	~15MB → 5MB stripped
LanceDB	0.9.x	Vector storage	~20MB → 8MB stripped
ONNX Runtime	1.16	Embeddings	Optional, ~50MB

Total binary impact: ~13MB (DuckDB + LanceDB stripped, ohne ONNX)

Open Questions

Embedding Model: All-MiniLM-L6-v2 (22MB) oder kleiner?
Sync Strategy: LanceDB als optionaler Index (graceful degradation)?
Replication: Event sourcing für Node-to-Node sync?

Sovereign; Kinetic; Anti-Fragile. ⚡️

6.5 KiB Raw Permalink Blame History Unescape Escape