klabb3 3 days ago

> Assuming zero network latency and a bandwidth of 1 Gbps, this would still take around 16 seconds

I’m not seeing how the new design affects the throughput needs, but I’ll say this:

Except for highly controlled environments (OS, NICs etc), you will run into perf issues with UDP-based protocols much sooner than with TCP, even if you’re just pushing zeroes. Packet switching is much more difficult to optimize.

If you only use sporadic messages without backpressure, and you’re willing and able to handle out-of-order messages and retransmission logic, by all means, use UDP. Like for realtime multiplayer games, it makes sense.

For high throughput on diverse platforms and hardware, the story is very different. Yes, even with QUIC. I learnt this the hard way.

All that said, I’m very curious what the results are. Is this design fully deployed, and if so, in what kind of environment and with what traffic patterns? Even better: benchmarks/stress tests would be fantastic.

  • tklenze 3 days ago

    Thanks for your insights! Yes, real-life behavior is indeed interesting to look at, and for this purpose, two testnets are running right now (https://www.gmonads.com).

    RaptorCast uses erasure coding to break a block proposal into smaller pieces with plenty of redundancy to allow for omissions. This means that if you receive sufficiently many chunks, you can decode the block proposal (no matter which of the chunks you received). The redundancy factor can be tweaked, but it’ll likely be >2x, to allow for networking issues and faulty/malicious nodes. Furthermore, the blockchain can make progress as long as >2/3 of the validators receive the block proposal and are honest. This means that at least in theory, you should be able to tolerate a lot of packet losses.
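
    A toy sketch of the "any sufficient subset of chunks decodes" property described above, using plain XOR parity and a peeling decoder. This is only for intuition: real Raptor codes use a carefully tuned degree distribution and a Gaussian-elimination fallback, and the chunk values and pairwise parity layout here are made up for illustration.

```python
def make_packets(chunks):
    """Toy erasure coding: systematic copies of each chunk plus XOR
    'repair' packets. Each packet carries the set of chunk indices it
    covers and the XOR of those chunks. (Not the real Raptor scheme.)"""
    k = len(chunks)
    packets = [(frozenset([i]), chunks[i]) for i in range(k)]
    # pairwise XOR repair packets give 2x redundancy for k = 4
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3)]:
        packets.append((frozenset([i, j]), chunks[i] ^ chunks[j]))
    return packets

def peel(received, k):
    """Peeling decoder: substitute already-recovered chunks into the
    remaining equations until no degree-1 equation is left."""
    recovered = {}
    eqs = [[set(s), d] for s, d in received]
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for eq in eqs:
            s, d = eq
            # strip recovered chunks out of this equation
            for i in [i for i in s if i in recovered]:
                s.discard(i)
                d ^= recovered[i]
            eq[1] = d
            if len(s) == 1:            # degree-1: chunk solved directly
                (i,) = s
                if i not in recovered:
                    recovered[i] = d
                    progress = True
    return recovered

chunks = [0x11, 0x22, 0x33, 0x44]
packets = make_packets(chunks)
# drop two of the eight packets (the systematic copies of chunks 0 and 3)
survivors = packets[1:3] + packets[4:]
assert peel(survivors, 4) == {0: 0x11, 1: 0x22, 2: 0x33, 3: 0x44}
```

    Here two of the eight packets are lost, but peeling still recovers all four chunks from the survivors, which is the "no matter which of the chunks you received" property in miniature.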

    Re throughput: Monad has 2 blocks / s, each 2MB in size. So even with a redundancy factor of 3x, each validator only has to send 12MB per second.
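
    The bandwidth arithmetic above, spelled out (numbers taken from the comment; this counts per-validator upload only):

```python
blocks_per_second = 2
block_size_mb = 2        # MB per block proposal
redundancy = 3           # 3x erasure-coding overhead
upload_mb_per_s = blocks_per_second * block_size_mb * redundancy
assert upload_mb_per_s == 12  # MB/s per validator, as stated
```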

    Re backpressure: Not really an option for blockchains. If you have 100 peers and one of them is too slow, what are you going to do? If you back pressure to slow down consensus, you slow down the entire blockchain even though most peers are fast. There’s a recent paper about this problem: https://arxiv.org/abs/2410.22080.

    What’s important is that the amount of bandwidth required per validator remains constant in RaptorCast, no matter how many validators are part of the network. And you always just need one round-trip to broadcast a block proposal, as opposed to Gossip protocols that may involve more steps and have higher latency.

    • lxgr 3 days ago

      > The redundancy factor can be tweaked, but it’ll likely be >2x

      If your packet loss is due to your traffic overwhelming a queue at an intermediate hop, sending more redundant packets aggravates the problem instead of solving it.

      Are you running this on top of something providing congestion control?

  • PhilipRoman 3 days ago

    It's a shame because the datagram paradigm is so much more elegant. In real world cases you end up having to emulate it by putting length prefixed data in TCP streams, reducing TCP timeouts, constantly reconnecting sockets (with the latency penalty), etc.
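
    For reference, the length-prefix emulation mentioned above usually comes down to a few lines. A minimal sketch; the 4-byte big-endian prefix is an arbitrary choice, and `send_msg`/`recv_msg` are made-up names:

```python
import socket
import struct

def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Emulate a datagram over TCP: 4-byte big-endian length, then payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """recv() may return fewer bytes than asked; loop until we have n."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock: socket.socket) -> bytes:
    """Read one length-prefixed message and restore the datagram boundary."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

    The framing restores message boundaries, but the stream underneath still forces in-order delivery and head-of-line blocking, which is the complaint above.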

    Really, the only thing that's missing from UDP is (optional) backpressure.

    A lot of software could handle out-of-order datagrams with no performance penalty (file uploads, etc.). The forced in-order delivery is especially annoying when you're operating in an environment with link aggregation, where the interface insists on limiting your bandwidth to a single link.

    • simcop2387 3 days ago

      This is one reason I'm still upset about the failure that SCTP ended up being. It really did try to create a new protocol for dealing with exactly these issues, but lack of support and ossification basically made it a non-starter. I'd have loved for it to be a mandatory part of IPv6 so that it'd eventually get useful support, but I'm pretty sure that would have made IPv6 adoption even worse.

      • Veserv 3 days ago

        Well we have QUIC now which layers over UDP and is functionally strictly superior to SCTP as SCTP still suffered from head-of-line blocking due to bad acknowledgement design.

      • lxgr 3 days ago

        As long as you're fine with UDP encapsulation, you can definitely use SCTP today! WebRTC data channels do, for example.

    • klabb3 2 days ago

      Agreed but I think it’s better to use streams and deliver messages reliably and in order today, despite the extra effort.

      For instance, TLS is stateful. If you want to deliver unannounced (stateless) datagrams to a server, you’d technically be able to use the server’s pubkey, but you’d have to sacrifice forward secrecy. Building business logic on top of dropped packets and out-of-order delivery is also very difficult, depending on the domain.

      Personally I would like to see faster bootstrap and especially resumes of streams to eliminate handshake cost. For instance:

      - resume with the last shared secret for a 0-RTT handshake (I believe QUIC supports this)

      - allow the dialer to queue/send data before the ack (this is an API-breaking change on sockets, but worthwhile imo - it makes the socket FSM on the app layer easier to use because it never needs to be nil)

    • lxgr 3 days ago

      > the only thing that's missing from UDP is (optional) backpressure.

      The lack of congestion control seems significant too. Most message-oriented protocols layered on top of UDP end up adding that back at the application layer as a consequence.

toolslive 5 hours ago

> If peeling fails, R10 falls back to solving the system of equations using Gaussian elimination.

I've built a decoder for online codes (very similar) in the past and learned that it's best to just do a Gaussian elimination step regardless of the degree of the equation/packet/message you get. Also, since the system is very sparse, it's best not to use a matrix representation, even with bitvectors, as you'll end up scanning rows that are mostly zeros. Just use a representation for sparse sets.
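
A sketch of the sparse-set idea (my own illustration, not toolslive's code, and assuming a full-rank system): store each row as a set of variable indices plus its XORed payload, so eliminating one row against another is a symmetric difference of two small sets instead of a scan across a mostly-zero bitvector:

```python
def gf2_solve(equations):
    """Gaussian elimination over GF(2) with sparse rows.
    Each equation is (index_set, payload): the XOR of the unknowns at
    those indices equals payload. Redundant rows reduce to the empty
    set and are dropped; assumes the system has full rank."""
    solved = {}  # pivot index -> (index set, payload)
    for idxs, data in equations:
        idxs = set(idxs)
        # forward elimination: reduce against existing pivot rows
        while idxs:
            pivot = min(idxs)
            if pivot not in solved:
                break
            p_idxs, p_data = solved[pivot]
            idxs ^= p_idxs        # symmetric difference = sparse row XOR
            data ^= p_data
        if idxs:
            solved[min(idxs)] = (idxs, data)
    # back-substitution, highest pivot first
    values = {}
    for pivot in sorted(solved, reverse=True):
        idxs, data = solved[pivot]
        for i in idxs:
            if i != pivot:
                data ^= values[i]
        values[pivot] = data
    return values
```

For example, with unknowns x0=5, x1=9, x2=12, the equations x0^x1, x1^x2, and x0^x1^x2 recover all three values, and the per-row work is proportional to the row's degree rather than the matrix width.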

Anyway, just my 2cts

yangl1996 3 days ago

Looks like the blog post doesn't mention the paper (poster) [1] in which the two-level broadcast idea was proposed.

[1] https://dl.acm.org/doi/pdf/10.1145/3548606.3563494

  • ethan_smith 3 days ago

    The cited paper is indeed foundational, introducing not just two-level broadcast but also optimizations for validator selection and network topology that RaptorCast appears to build upon.

wwolffrec 8 days ago

Monad uses RaptorCast to send out block proposals quickly and reliably to a global network of validators. At Category Labs, designing an effective messaging protocol to meet Monad’s high performance requirements was challenging and educational. Read more about the design in the link.