diff --git a/docs/format/wal.md b/docs/format/wal.md
new file mode 100644
index 000000000..f0daba24d
--- /dev/null
+++ b/docs/format/wal.md
@@ -0,0 +1,72 @@
+# WAL Disk Format
+
+The write ahead log operates in segments that are numbered and sequential,
+e.g. `000000`, `000001`, `000002`, etc., and are limited to 128MB by default.
+A segment is written to in pages of 32KB. Only the last page of the most recent segment
+may be partial. A WAL record is an opaque byte slice that gets split up into sub-records
+should it exceed the remaining space of the current page. Records are never split across
+segment boundaries.
+The encoding of pages is largely borrowed from [LevelDB's/RocksDB's write ahead log.][1]
+
+Notable deviations are that the record fragment is encoded as:
+
+┌───────────┬──────────┬────────────┬──────────────┐
+│ type <1b> │ len <2b> │ CRC32 <4b> │ data         │
+└───────────┴──────────┴────────────┴──────────────┘
+
+## Record encoding
+
+The records written to the write ahead log are encoded as follows:
+
+### Series records
+
+Series records encode the labels that identify a series and its unique ID.
+
+┌────────────────────────────────────────────┐
+│ type = 1 <1b>                              │
+├────────────────────────────────────────────┤
+│ ┌─────────┬──────────────────────────────┐ │
+│ │ id <8b> │ n = len(labels)              │ │
+│ ├─────────┴────────────┬─────────────────┤ │
+│ │ len(str_1)           │ str_1           │ │
+│ ├──────────────────────┴─────────────────┤ │
+│ │ ...                                    │ │
+│ ├───────────────────────┬────────────────┤ │
+│ │ len(str_2n)           │ str_2n         │ │
+│ └───────────────────────┴────────────────┘ │
+│                  . . .                     │
+└────────────────────────────────────────────┘
+
+### Sample records
+
+Sample records encode samples as a list of triples `(series_id, timestamp, value)`.
+Series reference and timestamp are encoded as deltas w.r.t. the first sample.
+
+┌──────────────────────────────────────────────────────────────────┐
+│ type = 2 <1b>                                                    │
+├──────────────────────────────────────────────────────────────────┤
+│ ┌────────────────────┬───────────────────────────┬─────────────┐ │
+│ │ id <8b>            │ timestamp <8b>            │ value <8b>  │ │
+│ └────────────────────┴───────────────────────────┴─────────────┘ │
+│ ┌────────────────────┬───────────────────────────┬─────────────┐ │
+│ │ id_delta           │ timestamp_delta           │ value <8b>  │ │
+│ └────────────────────┴───────────────────────────┴─────────────┘ │
+│                              . . .                               │
+└──────────────────────────────────────────────────────────────────┘
+
+### Tombstone records
+
+Tombstone records encode tombstones as a list of triples `(series_id, min_time, max_time)`
+and specify an interval for which samples of a series were deleted.
+
+
+┌─────────────────────────────────────────────────────┐
+│ type = 3 <1b>                                       │
+├─────────────────────────────────────────────────────┤
+│ ┌─────────┬───────────────────┬───────────────────┐ │
+│ │ id <8b> │ min_time          │ max_time          │ │
+│ └─────────┴───────────────────┴───────────────────┘ │
+│                        . . .                        │
+└─────────────────────────────────────────────────────┘
+
+[1]: https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log-File-Format
\ No newline at end of file
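
Note on the page and fragment layout from the introduction of `wal.md`: the splitting rule (a record is cut into sub-records whenever it exceeds the space left in the current 32KB page) may be easier to follow as code. The following is a minimal Python sketch, not the implementation this patch documents. Several details are assumptions the document does not state: the fragment type codes (borrowed here from LevelDB's FULL/FIRST/MIDDLE/LAST scheme), big-endian header fields, the CRC32 variant (`zlib.crc32`) and its coverage (data only), and skipping to a fresh page when no data byte fits. The names `encode_fragment` and `split_record` are hypothetical, and segment boundaries are not modelled.

```python
import struct
import zlib
from typing import List

PAGE_SIZE = 32 * 1024      # page size stated in the document
HEADER_SIZE = 1 + 2 + 4    # type <1b> + len <2b> + CRC32 <4b>

# Assumed type codes, borrowed from LevelDB's log format; the document
# does not list the values it uses.
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4


def encode_fragment(ftype: int, data: bytes) -> bytes:
    """Prefix data with the 7-byte header (big-endian is an assumption)."""
    crc = zlib.crc32(data) & 0xFFFFFFFF  # assumed: checksum over data only
    return struct.pack(">BHI", ftype, len(data), crc) + data


def split_record(record: bytes, page_used: int = 0) -> List[bytes]:
    """Split one record into fragments that never cross a page boundary.

    page_used is the number of bytes already written to the current page.
    """
    fragments: List[bytes] = []
    offset = 0
    while True:
        room = PAGE_SIZE - page_used - HEADER_SIZE
        if room <= 0:
            # Assumption: if not even one data byte fits, start a new page.
            page_used = 0
            continue
        chunk = record[offset:offset + room]
        offset += len(chunk)
        is_first = not fragments
        is_last = offset >= len(record)
        if is_first and is_last:
            ftype = FULL
        elif is_first:
            ftype = FIRST
        elif is_last:
            ftype = LAST
        else:
            ftype = MIDDLE
        fragments.append(encode_fragment(ftype, chunk))
        page_used = (page_used + HEADER_SIZE + len(chunk)) % PAGE_SIZE
        if is_last:
            return fragments


if __name__ == "__main__":
    # A record larger than two pages is cut into three fragments.
    for frag in split_record(bytes(70_000)):
        ftype, length, _ = struct.unpack(">BHI", frag[:HEADER_SIZE])
        print(ftype, length)
```

Because each fragment carries its own length and checksum, a reader can validate a page independently and reassemble the record by concatenating fragment payloads until it sees a FULL or LAST fragment.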