The routing protocol for Meshtastic is really quite simple (and suboptimal). It is heavily influenced by the mesh routing algorithm used in [RadioHead](https://www.airspayce.com/mikem/arduino/RadioHead/) (which was used in very early versions of this project). It has four conceptual layers.
Because we want our devices to work across various vendors and implementations, we use [Protocol Buffers](https://github.com/meshtastic/Meshtastic-protobufs) pervasively. For information on how the protocol buffers are used with respect to API clients see [sw-design](/software/other/sw-design.md), for purposes of this document you mostly only
- A 256-bit LoRa preamble (to allow receiving radios to synchronize clocks and start framing). We use a longer than minimum (8 bit) preamble to maximize the amount of time the LoRa receivers can stay asleep, which dramatically lowers power consumption.
After the preamble and the syncWord of 0x2b, the 16 byte packet header is transmitted. This header is described directly by the PacketHeader class in the C++ source code. But indirectly it matches the first portion of the "MeshPacket" protobuf definition. But notably: this portion of the packet is sent directly as the following 16 bytes (rather than using the protobuf encoding). We do this to both save airtime and to allow receiving radio hardware the option of filtering packets before even waking the main CPU.
- id (4 bytes): the unique (with respect to the sending node only) packet ID number for this packet. We use a large (32 bit) packet ID to ensure there is enough unique state to protect any encrypted payload from attack.
- flags (4 bytes): Only a few bits are currently used - 3 bits for the "HopLimit" (see below) and 1 bit for "WantAck"
After the packet header, the actual packet is placed onto the the wire. These bytes are merely the encrypted packed protobuf encoding of the SubPacket protobuf. A full description of our encryption is available in [crypto](/docs/developers/device/encryption). It is worth noting that only this SubPacket is encrypted, headers are not. Which leaves open the option of eventually allowing nodes to route packets without knowing the keys used to encrypt.
NodeIds are constructed from the bottom four bytes of the macaddr of the Bluetooth address. Because the OUI is assigned by the IEEE, and we currently only support a few CPU manufacturers, the upper byte is defacto guaranteed unique for each vendor. The bottom 3 bytes are guaranteed unique by that vendor.
To prevent collisions all transmitters will listen before attempting to send. If they hear some other node transmitting, they will reattempt transmission in x milliseconds. This retransmission delay is random between 20 and 22 seconds. (these two numbers are currently hardwired, but really should be scaled based on expected packet transmission time at current channel settings).
This layer adds reliable messaging between the node and its immediate neighbors (only).
The default messaging provided by layer-1 is extended by setting the "want-ack" flag in the MeshPacket protobuf. If want-ack is set the following documentation from mesh.proto applies:
"""This packet is being sent as a reliable message, we would prefer it to arrive
If a transmitting node does not receive an ACK (or a NAK) packet within FIXME milliseconds, it will use layer-1 to attempt a retransmission of the sent packet. A reliable packet (at this 'zero hop' level) will be resent a maximum of three times. If no ACK or NAK has been received by then the local node will internally generate a NAK (either for local consumption or use by higher layers of the protocol).
Given our use-case for the initial release, most of our protocol is built around [flooding](<https://en.wikipedia.org/wiki/Flooding_(computer_networking)>). The implementation is currently 'naive' - i.e. it doesn't try to optimize flooding other than abandoning retransmission once we've seen a nearby receiver has ACKed the packet. Therefore, for each source packet up to N retransmissions might occur (if there are N nodes in the mesh).
Each node in the mesh, if it sees a packet on the ether with HopLimit set to a value other than zero, it will decrement that HopLimit and attempt retransmission on behalf of the original sending node.
### Layer 4: DSR for multi-hop unicast messaging
This layer is not yet fully implemented (and not yet used). But eventually (if we stay with our own transport rather than switching to QMesh or Reticulum)
we will use conventional DSR for unicast messaging. Currently, (even when not requiring 'broadcasts') we send any multi-hop unicasts as 'broadcasts' so that we can
leverage our (functional) flooding implementation. This is suboptimal but it is a very rare use-case, because the odds are high that most nodes (given our small networks and 'hiking' use case) are within a very small number of hops. When any node witnesses an ACK for a packet, it will realize that it can abandon its own
This section is currently poorly formatted, it is mostly a mere set of TODO lists and notes for @geeksville during his initial development. After release 1.0 ideas for future optimization include:
- DONE in the router receive path?, send an ACK packet if want_ack was set and we are the final destination. FIXME, for now don't handle multi-hop or merging of data replies with these ACKs.
- when sending, if destnodeinfo.next_hop is zero (and no message is already waiting for an arp for that node), startRouteDiscovery() for that node. Queue the message in the 'waiting for arp queue' so we can send it later when then the arp completes.
- otherwise, use next_hop and start sending a message (with ack request) towards that node (starting with next_hop).
when we receive any packet
- sniff and update tables (especially useful to find adjacent nodes). Update user, network and position info.
- if we need to route() that packet, resend it to the next_hop based on our nodedb.
- if it is broadcast or destined for our node, deliver locally
- handle routereply/routeerror/routediscovery messages as described below
- then free it
routeDiscovery
- if we've already passed through us (or is from us), then it ignore it
- use the nodes already mentioned in the request to update our routing table
- if they were looking for us, send back a routereply
- if we receive a discovery packet, and we don't have next_hop set in our nodedb, we use it to populate next_hop (if needed) towards the requester (after decrementing max_hops)
- if we receive a discovery packet, and we have a next_hop in our nodedb for that destination we send a (reliable) we send a route reply towards the requester
- if timeout doing retries, send a routeError (NAK) message back towards the original requester. all nodes eavesdrop on that packet and update their route caches.
- update next_hop on the node, if the new reply needs fewer hops than the existing one (we prefer shorter paths). FIXME, someday use a better heuristic
- REJECTED - seems dying - possibly dash7? <https://www.slideshare.net/MaartenWeyn1/dash7-alliance-protocol-technical-presentation><https://github.com/MOSAIC-LoPoW/dash7-ap-open-source-stack> - does the open source stack implement multi-hop routing? flooding? their discussion mailing list looks dead-dead
- update duty cycle spreadsheet for our typical use case
What about never flooding GPS broadcasts? Instead, only have them go one hop in the common case. But if any node X is looking at the position of Y on their GUI, then send a unicast to Y asking for position update. Y replies.
- to do a study, first send a broadcast (maybe our current initial user announcement?) with TTL set to one (so therefore no one will rebroadcast our request)
- if a node can see someone I missed (and they are the best person to see that node), they reply (unidirectionally) with the missing nodes and their RSSIs (other nodes might sniff (and update their db) based on this reply, but they don't have to)
- given that the max number of nodes in this mesh will be like 20 (for normal cases), I bet globally updating this db of "nodenums and who has the best RSSI for packets from that node" would be useful
- once the global DB is shared, when a node wants to broadcast, it just sends out its broadcast . the first level receivers then make a decision "am I the best to rebroadcast to someone who likely missed this packet?" if so, rebroadcast
## approach 3
- when a node X wants to know other nodes positions, it broadcasts its position with want_replies=true. Then each of the nodes that received that request broadcast their replies (possibly by using special timeslots?)
- all nodes constantly update their local db based on replies they witnessed.
- after 10s (or whatever) if node Y notices that it didn't hear a reply from node Z (that Y has heard from recently) to that initial request, that means Z never heard the request from X. Node Y will reply to X on Z's behalf.
- could this work for more than one hop? Is more than one hop needed? Could it work for sending messages (i.e. for a message sent to Z with want-reply set).
- handle group messages the same way, there would be a table of messages and time of creation.
- when a node has a new position or message to send out, it does a broadcast. All the adjacent nodes update their db instantly (this handles 90% of messages I'll bet).
- essentially everything in this variant becomes broadcasts of "request db updates for >time X - for _all_ or for a particular nodenum" and nodes sending (either due to request or because they changed state) "here's a set of db updates". Every node is constantly trying to
build the most recent version of reality, and if some nodes are too far, then nodes closer in will eventually forward their changes to the distributed db.
- construct non-ambiguous rules for who broadcasts to request db updates. Ideally the algorithm should nicely realize node X can see most other nodes, so they should just listen to all those nodes and minimize the # of broadcasts. The distributed picture of nodes RSSI could be useful here?