
V6 Storage File Format

Soltix uses a custom binary columnar format (V6) optimized for time-series data. V6 stores all columns in a single part file with a footer-based index for direct seeks.

V6 Part File Structure

┌──────────────────────────────────────┐
│ Header (64 bytes)                    │
│   Magic number, version, flags       │
├──────────────────────────────────────┤
│ Column Chunk: device0._time          │ ← Delta encoded + Snappy
│ Column Chunk: device0.field1         │ ← Gorilla/Delta/Dict/Bool + Snappy
│ Column Chunk: device0.field2         │
│ Column Chunk: device1._time          │
│ Column Chunk: device1.field1         │
│ ...                                  │
├──────────────────────────────────────┤
│ Footer                               │
│   ├── ColumnIndex[]                  │ ← Array of V6ColumnEntry
│   ├── FieldDictionary                │ ← [_time, field0, field1, ...]
│   ├── FieldTypes                     │ ← ColumnType per field
│   ├── DeviceIndex                    │ ← Device name list
│   └── RowCountPerDevice              │
├──────────────────────────────────────┤
│ FooterSize (4 bytes)                 │
│ FooterOffset (8 bytes)               │ ← Last 8 bytes of file
└──────────────────────────────────────┘
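
The footer is located by reading the fixed-size trailer at the end of the file. The following is a minimal sketch of that step, not the actual Soltix reader. It assumes little-endian byte order (the layout above does not state one), and `readTrailer` and `sampleFile` are hypothetical names introduced for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

const (
	headerSize  = 64 // fixed header length from the layout above
	trailerSize = 12 // FooterSize (4 bytes) + FooterOffset (8 bytes)
)

// readTrailer decodes the fixed-size trailer that ends a part file and
// returns the footer's size and offset.
// Little-endian byte order is an assumption, not a documented fact.
func readTrailer(r io.ReaderAt, fileSize int64) (uint32, int64, error) {
	if fileSize < headerSize+trailerSize {
		return 0, 0, errors.New("file too small to be a V6 part")
	}
	var buf [trailerSize]byte
	// A short read always carries a non-nil error per the io.ReaderAt contract.
	if n, err := r.ReadAt(buf[:], fileSize-trailerSize); n != trailerSize {
		return 0, 0, err
	}
	footerSize := binary.LittleEndian.Uint32(buf[0:4])
	footerOffset := int64(binary.LittleEndian.Uint64(buf[4:12]))
	// The footer must sit between the header and the trailer.
	footerEnd := fileSize - trailerSize
	if footerOffset < headerSize || footerOffset > footerEnd-int64(footerSize) {
		return 0, 0, errors.New("footer bounds fall outside the file")
	}
	return footerSize, footerOffset, nil
}

// sampleFile builds an in-memory stand-in for a part file: a zeroed
// header, 10 bytes of chunk data, a 5-byte footer, and the trailer.
func sampleFile() (*bytes.Reader, int64) {
	data := make([]byte, headerSize+10+5+trailerSize)
	binary.LittleEndian.PutUint32(data[79:83], 5)  // FooterSize
	binary.LittleEndian.PutUint64(data[83:91], 74) // FooterOffset
	return bytes.NewReader(data), int64(len(data))
}

func main() {
	r, size := sampleFile()
	footerSize, footerOffset, err := readTrailer(r, size)
	if err != nil {
		panic(err)
	}
	fmt.Println(footerSize, footerOffset) // prints "5 74"
}
```

With the offset and size in hand, a reader can load the whole footer in one positioned read and never scan the column chunks.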

Column Index Entry

Each column chunk is indexed with a V6ColumnEntry:

type V6ColumnEntry struct {
    DeviceIdx  uint32 // Index into device list
    FieldIdx   uint32 // 0 = _time, 1..N = fields
    Offset     int64  // Byte offset in file
    Size       uint32 // Compressed size in bytes
    RowCount   uint32 // Number of values
    ColumnType uint8  // Float64, Int64, String, Bool
}
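
To illustrate how the index enables direct seeks, here is a hedged sketch that looks up one (device, field) pair and reads only that chunk's bytes. `findColumn`, `readChunk`, and `sampleFile` are hypothetical helpers, the linear scan is an assumption about lookup strategy, and the chunk contents are placeholder bytes rather than real encoded data.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// V6ColumnEntry mirrors the struct shown above.
type V6ColumnEntry struct {
	DeviceIdx  uint32
	FieldIdx   uint32
	Offset     int64
	Size       uint32
	RowCount   uint32
	ColumnType uint8
}

// findColumn returns the index entry for one (device, field) pair.
// A linear scan keeps the sketch short; a real reader might sort or hash.
func findColumn(index []V6ColumnEntry, deviceIdx, fieldIdx uint32) (V6ColumnEntry, bool) {
	for _, e := range index {
		if e.DeviceIdx == deviceIdx && e.FieldIdx == fieldIdx {
			return e, true
		}
	}
	return V6ColumnEntry{}, false
}

// readChunk reads exactly the compressed bytes of one column chunk,
// using the entry's Offset and Size.
func readChunk(r io.ReaderAt, e V6ColumnEntry) ([]byte, error) {
	buf := make([]byte, e.Size)
	section := io.NewSectionReader(r, e.Offset, int64(e.Size))
	if _, err := io.ReadFull(section, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// sampleFile builds a fake part file: a zeroed 64-byte header followed
// by two placeholder chunks, plus a matching index.
func sampleFile() (*bytes.Reader, []V6ColumnEntry) {
	data := make([]byte, 64)
	data = append(data, "TIMECHUNK"...) // device 0, _time (placeholder bytes)
	data = append(data, "F1"...)        // device 0, field 1 (placeholder bytes)
	index := []V6ColumnEntry{
		{DeviceIdx: 0, FieldIdx: 0, Offset: 64, Size: 9, RowCount: 3},
		{DeviceIdx: 0, FieldIdx: 1, Offset: 73, Size: 2, RowCount: 3},
	}
	return bytes.NewReader(data), index
}

func main() {
	r, index := sampleFile()
	entry, ok := findColumn(index, 0, 1)
	if !ok {
		panic("column not found")
	}
	chunk, err := readChunk(r, entry)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", chunk)
}
```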

Two-Tier Metadata

Outside of part files, metadata is stored at two levels:

Global Metadata (_metadata.idx)

Per date directory. Contains:

  • Field list with types
  • Device Group (DG) manifests
  • Device → DG mapping
  • Min/max timestamps for time-range pruning

DG Metadata (dg_XXXX/_metadata.idx)

Per device group. Contains:

  • Part file names
  • Part manifests (min/max timestamps per part)
  • Device → Part mapping
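
The min/max timestamps let a query skip parts that cannot contain matching rows. A minimal sketch of that pruning check follows. `PartManifest` and `pruneParts` are hypothetical names, and the inclusive bounds and integer timestamps are assumptions rather than documented details of the on-disk metadata.

```go
package main

import "fmt"

// PartManifest is a hypothetical in-memory view of one part manifest.
// Timestamps are plain integers here; the unit is an assumption.
type PartManifest struct {
	Name    string
	MinTime int64 // smallest timestamp in the part (inclusive)
	MaxTime int64 // largest timestamp in the part (inclusive)
}

// pruneParts returns the names of parts whose [MinTime, MaxTime] range
// overlaps the query range [from, to]. Other parts are never opened.
func pruneParts(parts []PartManifest, from, to int64) []string {
	var keep []string
	for _, p := range parts {
		if p.MinTime <= to && p.MaxTime >= from {
			keep = append(keep, p.Name)
		}
	}
	return keep
}

func main() {
	parts := []PartManifest{
		{Name: "part_0000.bin", MinTime: 0, MaxTime: 99},
		{Name: "part_0001.bin", MinTime: 100, MaxTime: 199},
		{Name: "part_0002.bin", MinTime: 200, MaxTime: 299},
	}
	fmt.Println(pruneParts(parts, 150, 250))
}
```

The same overlap test applies one level up, where the global metadata's timestamps can rule out a whole date directory before any DG metadata is read.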

Directory Structure

data/
├── group_{gid}/
│   └── {database}/
│       └── {collection}/
│           └── {year}/{month}/{date}/
│               ├── _metadata.idx
│               ├── dg_0000/
│               │   ├── _metadata.idx
│               │   ├── part_0000.bin
│               │   └── part_0001.bin
│               └── dg_0001/
│                   ├── _metadata.idx
│                   └── part_0000.bin
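
As an illustration of the layout above, this sketch assembles the path to one part file. `partPath` is a hypothetical helper. The decimal group ID and the zero-padded `{year}/{month}/{date}` components (with `{date}` read as the day of month) are assumptions; only the four-digit `dg_` and `part_` numbering is taken from the tree.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// partPath builds the on-disk location of a part file.
// The number formatting of gid, year, month, and day is assumed.
func partPath(root string, gid int, database, collection string, year, month, day, dg, part int) string {
	return filepath.Join(
		root,
		fmt.Sprintf("group_%d", gid),
		database,
		collection,
		fmt.Sprintf("%04d", year),
		fmt.Sprintf("%02d", month),
		fmt.Sprintf("%02d", day),
		fmt.Sprintf("dg_%04d", dg),
		fmt.Sprintf("part_%04d.bin", part),
	)
}

func main() {
	fmt.Println(partPath("data", 3, "metrics", "sensors", 2024, 1, 15, 1, 0))
}
```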

Compression Pipeline

Each column chunk passes through two compression layers, a type-specific column encoder followed by Snappy:

Raw values → Column Encoder → Snappy Compress → Disk

| Column Type | Encoder | Algorithm |
|---|---|---|
| float64 | GorillaEncoder | XOR bit-packing (~1.37 bytes/value) |
| int64 | DeltaEncoder | Delta + ZigZag + Varint |
| string | DictionaryEncoder | Unique-string dictionary + varint indices |
| bool | BoolEncoder | Bitmap (1 bit per value) |
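
To make the int64 row concrete, here is a hedged sketch of delta + ZigZag + varint encoding built on Go's `encoding/binary`, whose signed varint functions apply the ZigZag mapping internally. It shows the general technique only and is not Soltix's `DeltaEncoder`: the byte layout (first value stored as a delta from zero, no header) is an assumption, and the Snappy stage is left out to stay within the standard library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeDelta writes each value as the difference from its predecessor.
// binary.AppendVarint ZigZag-maps the signed delta, then emits a
// base-128 varint, so small deltas of either sign take 1-2 bytes.
func encodeDelta(values []int64) []byte {
	out := make([]byte, 0, len(values))
	prev := int64(0)
	for _, v := range values {
		out = binary.AppendVarint(out, v-prev)
		prev = v
	}
	return out
}

// decodeDelta reverses encodeDelta. The value count comes from outside
// the stream, much as RowCount is stored in the column index.
func decodeDelta(data []byte, count int) ([]int64, error) {
	values := make([]int64, 0, count)
	prev := int64(0)
	for i := 0; i < count; i++ {
		delta, n := binary.Varint(data)
		if n <= 0 {
			return nil, errors.New("truncated or malformed varint")
		}
		data = data[n:]
		prev += delta
		values = append(values, prev)
	}
	return values, nil
}

func main() {
	// Millisecond timestamps one second apart: large values, small deltas.
	ts := []int64{1700000000000, 1700000001000, 1700000002000, 1700000003000}
	enc := encodeDelta(ts)
	dec, err := decodeDelta(enc, len(ts))
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d bytes, encoded=%d bytes, roundtrip=%v\n", len(ts)*8, len(enc), dec)
}
```

Regularly spaced timestamps are the best case for this scheme, because every delta after the first is the same small number.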

Data Types

| Type | Raw Size | Description |
|---|---|---|
| Float64 | 8 bytes | Sensor values, metrics |
| Int64 | 8 bytes | Timestamps, counters |
| String | variable | Device IDs, labels, status codes |
| Bool | 1 byte | Status flags |

V6 vs Previous Versions

| Feature | V5 (old) | V6 (current) |
|---|---|---|
| Column Groups | cg_XXXX/ directories (max 50 fields each) | Eliminated; all columns in one file |
| _time column | Duplicated in every CG file | Single _time per device per part |
| File I/O per query | O(fields/50) file opens | Single file open per part |
| Metadata | 3-tier (global → DG → CG) | 2-tier (global → DG) |
| Footer | None | Footer-based index for direct seeks |
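
The file I/O row can be read as simple arithmetic. The sketch below estimates file opens under two assumptions not stated above: V5 packs fields densely into column groups of 50, and each column group is one file. `v5FileOpens` and `v6FileOpens` are hypothetical names.

```go
package main

import "fmt"

// maxFieldsPerCG is the V5 column-group limit from the table above.
const maxFieldsPerCG = 50

// v5FileOpens estimates the column-group files opened when a query
// reads `fields` fields: ceil(fields / 50), assuming dense packing.
func v5FileOpens(fields int) int {
	if fields <= 0 {
		return 0
	}
	return (fields + maxFieldsPerCG - 1) / maxFieldsPerCG
}

// v6FileOpens is constant: every column lives in the same part file.
func v6FileOpens(fields int) int {
	if fields <= 0 {
		return 0
	}
	return 1
}

func main() {
	for _, n := range []int{10, 50, 120, 500} {
		fmt.Printf("fields=%d v5=%d v6=%d\n", n, v5FileOpens(n), v6FileOpens(n))
	}
}
```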