Similarly, there's an immense amount of formal literature and textbooks out of the game development space that can be very useful to newcomers looking for structural approaches to high performance compute and IO loops. Games care a lot about local and network latency, the problem spaces aren't that far apart (and writing games is a very fun way to learn).
I don't have specific recommendations for holistic introductions to the field. I learn new techniques primarily through building things, watching conference talks, reading source code of other low latency projects, and discussion with coworkers.
[1]: HFT is quite special on the hardware side, which is discussed in the paper. The NICs, network stacks, and extensive integration of FPGAs do heavily differentiate the industry and I don't want to insinuate otherwise.
You will not find a lot of SystemVerilog programmers at a typical video game studio.
Video games try to cram as much work as possible within about 16 milliseconds whereas for most trading algorithms 16 milliseconds is too slow to do anything, you want to process and produce a response to input within the span of microseconds, which is 3 orders of magnitude faster than a single frame in a video game.
The problem spaces really are quite distinct in this respect and a lot of techniques from one space don't really carry over to the other.
It's not hard real-time like you're going to crash your car, but if you miss your deadline it causes an unacceptable and audible glitch.
I've always been a bit surprised that Jane Street uses OCaml. I know they've put a lot of attention into the GC, but it still seems fundamentally indeterminate in a way that would make most audio devs nervous.
[1]: http://www.rossbencina.com/code/real-time-audio-programming-...
As far as I know, these millisecond latencies are orders of magnitude higher than what would be tolerable in HFT. There the units are microseconds and nanoseconds.
I'm talking about real-time audio, though. There's also just audio playback (Pro Tools, e.g.) which tolerates much higher latency, but enables higher audio fidelity due to larger buffer sizes (e.g. in compressors to enable look-ahead) as well as more clever algorithms (which are often prone to latency spikes, such as a "standard" FFT, if you want a very simple example).
Both, finance and real-time audio, differ in their time scales, but more in the breadth of their time scales: real-time audio is often between 1–50 ms (yeah, we try to keep it below 10 ms, but that's tough to do in a complex setup. I've seen artists often compromise and increase buffer sizes (== latency) to accommodate for their complex virtual instruments and filters), finance in ns–ms (it's not HFT anymore at a certain point, sure, but those HFT architectures also have a layered response, with FPGAs being quick but simple and more complex strategies having higher latency; also, if you're e.g. asked for a quote on a bond, it's not a shame to take 500 ms to think about a price, because those trades are bigger but slower).
You also mentioned the consequences of missing a deadline. Well, indeed, I've seen this variance to be less of an issue in finance (we're only caring about xx µs, so I hesitate to call myself a HFT guy) than in real-time audio. A click in a live set is a no-go. A crash of your DAW is fatal, that's it with your live set. Nobody dies, but people will be very upset. I've seen real-time audio being obsessed with being reliable and keeping the variance in latency really small.
In finance, I've seen it more that people kind of accepted that variance was huge because there's some hidden allocations in a system. Hoping that those stop once the system is warmed up or that they rarely occur on a critical trade. As a side note, in finance I've even heard this well-known HFT firm: "Well, it's not the end of the world if your process crashes. If you can restart quick enough."
It's also that in finance, your data is more heterogenous: Real-time audio has audio (isochronous, homogenous buffer of floats), MIDI (asynchronous, formerly small events), parameter changes (asynchronous, often just a float) and config changes (changes the topology of your audio processing graph; enjoy to transition from A to B without causing an audible glitch). Finance has processing graphs that only process asynchronous data (heterogenous market data, telling you what's on the order book of an exchange; trade data for which trades you performed with your orders and quotes; symbol data when new securities appear or change (historically only with the start of the trading day); pre-computed steering data for algorithms and/or data from third-party data sources).
Thus, the processing graphs of both differ in the data that's transferred as well as in the structure and domain. In finance, there is no regular tick-based processing. You have to accommodate a very varying amount of data per time period, whereas in audio, it's a very regular pace and the amount of data is often the same (± activations of voices in instruments).
Real-time audio has plugins, though. And those, from experience, do whatever you can imagine. Plugins are great, from a creative perspective, but the craftsmanship is of very varying degree. I bet I'll even find a plugin that draws the UI from the real-time audio thread.
Any other experiences from the real-time realm?
You're entirely correct that quant work is dealing with pure latency in situations where game engines might amortize throughput over multiple frames. But a question about "how do I get started in performance/latency work?" isn't usefully answered by "get a Hudson River Trading internship".
You're right on the money. I worked in HFT for half a decade.
Back in the late 2000s you were on the cutting edge if you were writing really good C++ and had overclocked some CPUs to hell and back and then shoved them in a rack in a datacenter in new jersey. "To hell and back" means "they only crash every hour or two" (because while they're up, they're faster than your competition).
Around the mid 2010's it had moved entirely to FPGAs. Do something smart in software, then give the FPGA something dumb to wait for (e.g. use software to produce a model and wait for some kind of alpha to develop, then tell the FPGA "ok wait for X > Y and fire the order" was basically the name of the game. And it was often better to be faster than it was to be smarter, so really you just kept the FPGA as simple and fast as possible.
At that point, your hard realtime constraints for the language doing the smart stuff really dissolve. Compared to an FPGA getting an order out the door in some fraction of a mic, a CPU doing anything is slow. So you might as well use python, yknow?
People also play all kinds of different speed games. There's latency races around NJ and chicago where microseconds matter around price movements and C++ isn't really part of the picture in a competitive way anymore, but that's not to say someone isn't ekeing out a niche where they're doing something vaguely smarter faster than opponents. But these things, at scale, tend to be questions of - either your alpha is very short-lived (microseconds, and if it isn't organically microseconds, it will be competed until it is), or it is fundamentally longer-lived (seconds to minutes?) and you might as well use python and develop 100x faster.
The silly thing is, the really good trades that focus on short-term alphas (ms's and less) are generally obviously good trades and so you don't have to be super smart to realize it's a really fucking good trade, you had better be fast and go get it. So there's also this kind of built-in bias for being fast because if a trade is so marginally good that you needed a crazy smart and slow model to tell it apart from a bad trade, it's probably still a relatively crummy trade and all your smarts let you do is pick up a few extra pennies for your trouble.
I'll close by saying don't take my words as representative of the industry - the trades my firm specialized in was literally a dying breed and my understanding is that most of the big crazy-crazy-fast-microwave-network people have moved to doing order execution anyway, because most of the money in the crazy-crazy-fast game dried up as more firms switched to either DIYing the order execution or paying a former-HFT to do it for them.
To my knowledge Python is not suitable, though I know some players embed scripting languages that are.
When it's really important, it's implemented in FPGA or with an ASIC.
I think the main split is whether you care more about average latency or worst-case latency (or some # of nines percentile).
That's arguably a bit problematic as events you don't react to can cause you to fall behind, so it's useful to also track how far behind you are from the point where you start processing a packet (which requires hardware timestamping, unfortunately not present on many general-purpose NICs like what's in the cloud).
For a given market, you may have less than 1M samples a day (I've even seen some so-called HFT strategies or sub-strategies that only had 300 samples per day).
So overall, you'd have fewer events than 44.1kHz, and there are typically expected outliers at the beginning of the data before the system gets warmed up or recovers from bad initial parameters (which I suppose you could just ignore from your distribution, or you could try not to require a warmup).
But you're right, you probably want to look at quantiles closer to 1 as well. You're also looking at the max regardless.
The basic principles never change. Avoid copies, never allocate, keep the hot path local, and really, good codebases should be doing these things anyway. I don't code any differently when I'm writing a command line argument parser vs low-latency RPC servers. It's just a matter of how long I spend tweaking and trying to improve a specific section of code, how willing I am to throw out an entire abstraction I've been working on for perf.
In the domain of web stuff, effectively all the major load balancers are good to study. HAProxy, nginx, Envoy. Also anything antirez has ever touched.
Application servers are also interesting to study because there's many tricks to learn from a piece of software that needs to interface with something very slow but otherwise wants to be completely transparent in the call graph. FastWSGI is a good example.