# WAL Disk Format

The write ahead log operates in segments that are numbered and sequential,
e.g. `000000`, `000001`, `000002`, etc., and are limited to 128MB by default.
A segment is written to in pages of 32KB. Only the last page of the most recent segment
may be partial. A WAL record is an opaque byte slice that gets split up into sub-records
(fragments) should it exceed the remaining space of the current page. Records are never
split across segment boundaries. If a single record exceeds the default segment size, a
segment with a larger size will be created.
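As an illustration of the page-splitting rule, the following Python sketch computes how many data bytes each fragment of a record carries. It is not taken from any implementation: `split_into_fragments` is a hypothetical helper, the 7-byte header size is derived from the fragment layout in this document, and leaving the rest of a page empty when not even a header plus one data byte fits is an assumption.

```python
PAGE_SIZE = 32 * 1024  # 32KB pages, as described above
HEADER_SIZE = 7        # assumed fragment header: type (1b) + len (2b) + CRC32 (4b)


def split_into_fragments(record_len: int, page_offset: int) -> list[int]:
    """Return the data length of each fragment needed to write a record of
    record_len bytes, starting at byte page_offset of the current page.

    Hypothetical helper for illustration only.
    """
    if record_len < 0 or not 0 <= page_offset <= PAGE_SIZE:
        raise ValueError("invalid record length or page offset")
    sizes = []
    remaining = record_len
    offset = page_offset
    while True:
        space = PAGE_SIZE - offset - HEADER_SIZE
        if space <= 0:
            # No room for a header plus at least one data byte: assume the
            # rest of the page stays empty and continue on a fresh page.
            offset = 0
            continue
        n = min(space, remaining)
        sizes.append(n)
        remaining -= n
        offset += HEADER_SIZE + n
        if remaining == 0:
            return sizes
```

For example, a 40000-byte record written at the start of a page needs two fragments, `[32761, 7239]`: the first page holds 32768 - 7 data bytes and the remainder continues on the next page.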
The encoding of pages is largely borrowed from [LevelDB's/RocksDB's write ahead log](https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log-File-Format).

Notable deviations are that the record fragment is encoded as:

```
┌───────────┬──────────┬────────────┬──────────────┐
│ type <1b> │ len <2b> │ CRC32 <4b> │ data <bytes> │
└───────────┴──────────┴────────────┴──────────────┘
```
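A minimal Python sketch of encoding and decoding a single fragment follows. The layout above fixes only the field sizes, so three details are assumptions here: big-endian byte order, the Castagnoli (CRC-32C) polynomial, and a checksum computed over the data bytes only. `encode_fragment`, `decode_fragment` and `crc32c` are hypothetical names.

```python
import struct

# Fragment header: type <1b>, len <2b>, CRC32 <4b>.
# Big-endian byte order is an assumption; the format does not state it.
HEADER = struct.Struct(">BHI")


def _make_crc32c_table() -> list[int]:
    # Table for the reflected CRC-32C (Castagnoli) polynomial (assumed polynomial).
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def encode_fragment(ftype: int, data: bytes) -> bytes:
    if len(data) > 0xFFFF:
        raise ValueError("fragment data does not fit the 2-byte len field")
    # Assumption: the checksum covers the data bytes only.
    return HEADER.pack(ftype, len(data), crc32c(data)) + data


def decode_fragment(buf: bytes, offset: int = 0) -> tuple[int, bytes, int]:
    """Return (type, data, offset of the next fragment)."""
    ftype, length, crc = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    data = bytes(buf[start:start + length])
    if len(data) != length:
        raise ValueError("truncated fragment")
    if crc32c(data) != crc:
        raise ValueError("checksum mismatch")
    return ftype, data, start + length
```

Decoding returns the offset of the next fragment, so a page can be scanned by calling `decode_fragment` repeatedly until the page is exhausted.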

The type flag has the following states:

* `0`: rest of page will be empty
* `1`: a full record encoded in a single fragment
* `2`: first fragment of a record
* `3`: middle fragment of a record
* `4`: final fragment of a record
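The type flags can be illustrated with a short Python sketch that assigns flags to the fragments of one record and stitches a stream of `(type, data)` fragments back into records. The function names are hypothetical, and treating a missing first or final fragment as an error is a simplification; a reader recovering after a crash might instead need to tolerate a torn record at the end of the log.

```python
PAGE_TERM, FULL, FIRST, MIDDLE, LAST = 0, 1, 2, 3, 4  # the type flags listed above


def fragment_types(n: int) -> list[int]:
    """Type flag for each of the n fragments of one record."""
    if n < 1:
        raise ValueError("a record has at least one fragment")
    if n == 1:
        return [FULL]
    return [FIRST] + [MIDDLE] * (n - 2) + [LAST]


def reassemble(fragments):
    """Yield complete records from an iterable of (type, data) pairs."""
    pending = None  # bytes of a record whose final fragment has not been seen
    for ftype, data in fragments:
        if ftype == PAGE_TERM:
            continue  # rest of the page is empty; nothing to collect
        if ftype in (FULL, FIRST):
            if pending is not None:
                raise ValueError("new record started before the previous one ended")
            if ftype == FULL:
                yield bytes(data)
            else:
                pending = bytearray(data)
        elif ftype in (MIDDLE, LAST):
            if pending is None:
                raise ValueError("continuation fragment without a first fragment")
            pending += data
            if ftype == LAST:
                yield bytes(pending)
                pending = None
        else:
            raise ValueError(f"unknown fragment type {ftype}")
    if pending is not None:
        raise ValueError("record ended without a final fragment")
```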

## Record encoding

The records written to the write ahead log are encoded as follows:

### Series records

Series records encode the labels that identify a series and its unique ID.

```
┌────────────────────────────────────────────┐
│ type = 1 <1b>                              │
├────────────────────────────────────────────┤
│ ┌─────────┬──────────────────────────────┐ │
│ │ id <8b> │ n = len(labels) <uvarint>    │ │
│ ├─────────┴────────────┬─────────────────┤ │
│ │ len(str_1) <uvarint> │ str_1 <bytes>   │ │
│ ├──────────────────────┴─────────────────┤ │
│ │  ...                                   │ │
│ ├───────────────────────┬────────────────┤ │
│ │ len(str_2n) <uvarint> │ str_2n <bytes> │ │
│ └───────────────────────┴────────────────┘ │
│                  . . .                     │
└────────────────────────────────────────────┘
```
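A Python sketch of encoding and decoding a series record, under stated assumptions: the 8-byte `id` is big-endian, the `2n` strings alternate label name and label value, strings are UTF-8, and `<uvarint>` is the base-128 unsigned varint used by Go's `encoding/binary` package. The `. . .` in the diagram is read as further series entries following the first. All function names are hypothetical.

```python
import struct

SERIES_TYPE = 1


def put_uvarint(out: bytearray, x: int) -> None:
    """Append x as an unsigned varint (7 bits per byte, low bits first)."""
    if x < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    while x >= 0x80:
        out.append((x & 0x7F) | 0x80)
        x >>= 7
    out.append(x)


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint at pos; return (value, next position)."""
    x, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        x |= (b & 0x7F) << shift
        if b < 0x80:
            return x, pos
        shift += 7


def encode_series(series: list[tuple[int, list[tuple[str, str]]]]) -> bytes:
    out = bytearray([SERIES_TYPE])
    for sid, labels in series:
        out += struct.pack(">Q", sid)  # id <8b>, big-endian (assumed)
        put_uvarint(out, len(labels))  # n = len(labels)
        for name, value in labels:     # 2n strings: name_1, value_1, ...
            for s in (name, value):
                raw = s.encode("utf-8")
                put_uvarint(out, len(raw))
                out += raw
    return bytes(out)


def decode_series(rec: bytes) -> list[tuple[int, list[tuple[str, str]]]]:
    if rec[0] != SERIES_TYPE:
        raise ValueError("not a series record")
    pos, result = 1, []
    while pos < len(rec):
        (sid,) = struct.unpack_from(">Q", rec, pos)
        pos += 8
        n, pos = read_uvarint(rec, pos)
        labels = []
        for _ in range(n):
            pair = []
            for _ in range(2):
                length, pos = read_uvarint(rec, pos)
                pair.append(rec[pos:pos + length].decode("utf-8"))
                pos += length
            labels.append((pair[0], pair[1]))
        result.append((sid, labels))
    return result
```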

### Sample records

Sample records encode samples as a list of triples `(series_id, timestamp, value)`.
The series ID and timestamp are encoded as deltas with respect to the first sample.

```
┌──────────────────────────────────────────────────────────────────┐
│ type = 2 <1b>                                                    │
├──────────────────────────────────────────────────────────────────┤
│ ┌────────────────────┬───────────────────────────┬─────────────┐ │
│ │ id <8b>            │ timestamp <8b>            │ value <8b>  │ │
│ └────────────────────┴───────────────────────────┴─────────────┘ │
│ ┌────────────────────┬───────────────────────────┬─────────────┐ │
│ │ id_delta <uvarint> │ timestamp_delta <uvarint> │ value <8b>  │ │
│ └────────────────────┴───────────────────────────┴─────────────┘ │
│                              . . .                               │
└──────────────────────────────────────────────────────────────────┘
```
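The following Python sketch follows the diagram literally: the first sample is stored in full, and each later sample stores its ID and timestamp as unsigned-varint deltas from the first sample, followed by its 8-byte value. Big-endian byte order and IEEE-754 float64 values are assumptions, as are the function names. Because the deltas are unsigned, the sketch rejects samples whose ID or timestamp is smaller than the first sample's; supporting arbitrary order would require a signed (zig-zag) varint.

```python
import struct

SAMPLE_TYPE = 2
FIRST_SAMPLE = struct.Struct(">Qqd")  # id <8b>, timestamp <8b>, value <8b>
VALUE = struct.Struct(">d")           # value <8b> as an IEEE-754 float64 (assumed)


def put_uvarint(out: bytearray, x: int) -> None:
    """Append x as an unsigned varint (7 bits per byte, low bits first)."""
    while x >= 0x80:
        out.append((x & 0x7F) | 0x80)
        x >>= 7
    out.append(x)


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint at pos; return (value, next position)."""
    x, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        x |= (b & 0x7F) << shift
        if b < 0x80:
            return x, pos
        shift += 7


def encode_samples(samples: list[tuple[int, int, float]]) -> bytes:
    out = bytearray([SAMPLE_TYPE])
    if not samples:
        return bytes(out)
    base_id, base_ts, base_value = samples[0]
    out += FIRST_SAMPLE.pack(base_id, base_ts, base_value)
    for sid, ts, value in samples[1:]:
        id_delta, ts_delta = sid - base_id, ts - base_ts
        if id_delta < 0 or ts_delta < 0:
            # Unsigned varints cannot represent negative deltas.
            raise ValueError("sample precedes the first sample")
        put_uvarint(out, id_delta)
        put_uvarint(out, ts_delta)
        out += VALUE.pack(value)
    return bytes(out)


def decode_samples(rec: bytes) -> list[tuple[int, int, float]]:
    if rec[0] != SAMPLE_TYPE:
        raise ValueError("not a sample record")
    if len(rec) == 1:
        return []
    base_id, base_ts, base_value = FIRST_SAMPLE.unpack_from(rec, 1)
    samples = [(base_id, base_ts, base_value)]
    pos = 1 + FIRST_SAMPLE.size
    while pos < len(rec):
        id_delta, pos = read_uvarint(rec, pos)
        ts_delta, pos = read_uvarint(rec, pos)
        (value,) = VALUE.unpack_from(rec, pos)
        pos += VALUE.size
        samples.append((base_id + id_delta, base_ts + ts_delta, value))
    return samples
```

Since samples in a record typically share nearby IDs and timestamps, the deltas are small and usually fit in one or two varint bytes, which is the point of encoding them relative to the first sample.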

### Tombstone records

Tombstone records encode tombstones as a list of triples `(series_id, min_time, max_time)`.
Each triple specifies a time interval for which samples of that series were deleted.

```
┌─────────────────────────────────────────────────────┐
│ type = 3 <1b>                                       │
├─────────────────────────────────────────────────────┤
│ ┌─────────┬───────────────────┬───────────────────┐ │
│ │ id <8b> │ min_time <varint> │ max_time <varint> │ │
│ └─────────┴───────────────────┴───────────────────┘ │
│                        . . .                        │
└─────────────────────────────────────────────────────┘
```
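A Python sketch of the tombstone record. It assumes `<varint>` is a zig-zag encoded signed varint, as produced by Go's `encoding/binary.PutVarint`, and that the 8-byte `id` is big-endian; the function names are hypothetical.

```python
import struct

TOMBSTONE_TYPE = 3


def put_uvarint(out: bytearray, x: int) -> None:
    """Append x as an unsigned varint (7 bits per byte, low bits first)."""
    while x >= 0x80:
        out.append((x & 0x7F) | 0x80)
        x >>= 7
    out.append(x)


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    x, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        x |= (b & 0x7F) << shift
        if b < 0x80:
            return x, pos
        shift += 7


def put_varint(out: bytearray, x: int) -> None:
    """Zig-zag encode a signed int (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)."""
    put_uvarint(out, x << 1 if x >= 0 else ~(x << 1))


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    ux, pos = read_uvarint(buf, pos)
    x = ux >> 1
    return (~x if ux & 1 else x), pos


def encode_tombstones(stones: list[tuple[int, int, int]]) -> bytes:
    out = bytearray([TOMBSTONE_TYPE])
    for sid, min_time, max_time in stones:
        out += struct.pack(">Q", sid)  # id <8b>, big-endian (assumed)
        put_varint(out, min_time)
        put_varint(out, max_time)
    return bytes(out)


def decode_tombstones(rec: bytes) -> list[tuple[int, int, int]]:
    if rec[0] != TOMBSTONE_TYPE:
        raise ValueError("not a tombstone record")
    pos, stones = 1, []
    while pos < len(rec):
        (sid,) = struct.unpack_from(">Q", rec, pos)
        pos += 8
        min_time, pos = read_varint(rec, pos)
        max_time, pos = read_varint(rec, pos)
        stones.append((sid, min_time, max_time))
    return stones
```

Using a signed varint here means `min_time` and `max_time` can also hold negative timestamps without special handling.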