Storage Layer

The Storage service handles all data persistence and retrieval. It uses a three-tier, group-aware architecture built on the V6 columnar format: Group → Device Group → Partition.

Key Components

| Component | Description |
| --- | --- |
| Subscriber | Receives write messages from the queue (NATS/Redis/Kafka) |
| WriteWorkerPool | One worker per partition key (`db:collection:date`); parallel writes |
| PartitionedWAL | Durability; partitioned by db/collection/date, protobuf-encoded with CRC32 checksums |
| MemoryStore | 64-shard, FNV-hash-partitioned in-memory store for hot data (configurable `max_age`) |
| FlushWorkerPool | Event-driven flush, triggered on WAL segment boundaries rather than per record |
| TieredStorage | V6 columnar engine with per-group Storage instances (lazily created) |
| CompactionWorker | Background merge of small part files (runs every 30 s) |
| AggregationPipeline | Cascading four-level aggregation: 1h → 1d → 1M → 1y |
| gRPC Server | Serves `QueryShard` requests from the Router; queries MemoryStore and TieredStorage concurrently |
| SyncManager | Startup sync plus anti-entropy for group-scoped replication |
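The MemoryStore's 64-shard FNV-hash partitioning can be sketched as follows. This is a minimal illustration, assuming the standard 64-bit FNV-1a parameters and a device-ID key; the actual key shape and function names are not specified in this document.

```python
# Illustrative 64-shard selection using FNV-1a (64-bit).
# Constants are the standard FNV-1a offset basis and prime.
FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3
NUM_SHARDS = 64

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep to 64 bits
    return h

def shard_for(device_id: str) -> int:
    # Same key always maps to the same shard, spreading hot data
    # across 64 independently locked slices.
    return fnv1a_64(device_id.encode()) % NUM_SHARDS
```

Hashing rather than range-partitioning keeps shard load roughly even regardless of device-ID naming patterns.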

Architecture

```
┌──────────────────────────────────────────────────────────┐
│                      TieredStorage                       │
│                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │  group_0000  │  │  group_0001  │  │  group_0042  │    │
│  │  V6 Engine   │  │  V6 Engine   │  │  V6 Engine   │    │
│  └──────────────┘  └──────────────┘  └──────────────┘    │
└──────────────────────────────────────────────────────────┘
```

Each group gets its own independent Storage instance with its own metadata cache and directory locks.
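A minimal sketch of the lazy per-group creation described above, assuming a lock-guarded map of group ID to engine. Class and attribute names here are illustrative, not the actual API.

```python
import threading

class GroupStorage:
    """Illustrative per-group V6 engine with its own metadata cache and lock."""
    def __init__(self, group_id: int):
        self.group_id = group_id
        self.metadata_cache = {}              # per-group metadata cache
        self.dir_lock = threading.Lock()      # per-group directory lock

class TieredStorage:
    def __init__(self):
        self._groups: dict[int, GroupStorage] = {}
        self._mu = threading.Lock()

    def group(self, group_id: int) -> GroupStorage:
        # Double-checked lookup: the engine is created only on first access.
        gs = self._groups.get(group_id)
        if gs is None:
            with self._mu:
                gs = self._groups.get(group_id)
                if gs is None:
                    gs = GroupStorage(group_id)
                    self._groups[group_id] = gs
        return gs
```

Lazy creation means a node that owns only a few groups never pays the cost of engines for groups it does not serve.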

V6 File Layout

V6 eliminated the Column Group (cg_XXXX/) directories — all columns are stored in a single part file per device group.

```
data/
├── group_0000/
│   └── mydb/
│       └── sensors/
│           └── 2026/
│               └── 01/
│                   └── 15/
│                       ├── _metadata.idx    # Global metadata (fields, DG manifests, device→DG map)
│                       ├── dg_0000/
│                       │   ├── _metadata.idx    # DG metadata (parts, device→part map)
│                       │   ├── part_0000.bin    # V6 columnar file
│                       │   └── part_0001.bin
│                       └── dg_0001/
│                           ├── _metadata.idx
│                           └── part_0000.bin
```
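A hypothetical helper showing how a partition directory maps onto the layout above (group → db → collection → `YYYY/MM/DD` → device group). The function name and the `data/` root are assumptions for illustration.

```python
from datetime import date

def partition_dir(group: int, db: str, collection: str, day: date, dg: int) -> str:
    # Zero-padded group and DG IDs match the group_0000 / dg_0000 naming
    # in the layout above; the date expands to YYYY/MM/DD subdirectories.
    return (f"data/group_{group:04d}/{db}/{collection}/"
            f"{day.year:04d}/{day.month:02d}/{day.day:02d}/dg_{dg:04d}")

# partition_dir(0, "mydb", "sensors", date(2026, 1, 15), 0)
# → "data/group_0000/mydb/sensors/2026/01/15/dg_0000"
```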

Storage Limits

| Setting | Default | Description |
| --- | --- | --- |
| `max_rows_per_part` | 100,000 | Split the part file when row count exceeds this |
| `max_part_size` | 64 MB | Safety size limit per part file |
| `min_rows_per_part` | 1,000 | Don't split below this row count |
| `max_devices_per_group` | 50 | Maximum devices per device group |
| `max_parts_per_dg` | 4 | Compaction is triggered above this part count |
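The thresholds above can be read as simple decision rules. This sketch assumes the split check considers both row count and byte size, and that `min_rows_per_part` suppresses splits that would create tiny parts; the function names are illustrative.

```python
# Defaults taken from the table above.
MAX_ROWS_PER_PART = 100_000
MAX_PART_SIZE = 64 * 1024 * 1024   # 64 MB
MIN_ROWS_PER_PART = 1_000
MAX_PARTS_PER_DG = 4

def should_split(rows: int, size_bytes: int) -> bool:
    if rows < MIN_ROWS_PER_PART:
        return False  # never split parts this small (assumed semantics)
    return rows > MAX_ROWS_PER_PART or size_bytes > MAX_PART_SIZE

def needs_compaction(part_count: int) -> bool:
    # CompactionWorker merges a DG's parts once the count exceeds the limit.
    return part_count > MAX_PARTS_PER_DG
```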

Write Strategy

Writes are append-only — new data creates new part files without reading back existing data. This ensures:

  • No read amplification during writes
  • Crash safety via atomic batch rename (.tmp → final)
  • Background compaction handles merge later

Query Strategy

Queries scan both hot and cold data:

  1. MemoryStore — binary search in sorted slices (recent data)
  2. TieredStorage — footer-based seeks in V6 columnar files (historical data)
  3. Merge + deduplicate — keep latest by InsertedAt
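Step 3 can be sketched as follows, assuming rows are dicts keyed by device and timestamp with an `inserted_at` field; the record shape and function name are assumptions for illustration.

```python
def merge_dedup(hot: list[dict], cold: list[dict]) -> list[dict]:
    # Index rows by (device, timestamp); when both tiers hold a copy,
    # keep the one with the latest inserted_at.
    latest: dict[tuple, dict] = {}
    for row in cold + hot:
        key = (row["device"], row["ts"])
        cur = latest.get(key)
        if cur is None or row["inserted_at"] >= cur["inserted_at"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r["ts"])
```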

Optimizations: bloom filters, metadata time-range pruning, column projection, device→DG routing.
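Of these optimizations, time-range pruning is the simplest to illustrate: a part file whose recorded time range cannot overlap the query window is skipped without being opened. The manifest field names below are assumptions.

```python
def prune_parts(parts: list[dict], q_start: int, q_end: int) -> list[dict]:
    # Keep a part only if its [min_ts, max_ts] range intersects the
    # query window [q_start, q_end]; everything else is skipped unread.
    return [p for p in parts if p["min_ts"] <= q_end and p["max_ts"] >= q_start]
```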