This is an open-source (GPLv3) project that uses Wi-Fi signal analysis to detect motion using CSI data, and it has already garnered almost 2,000 stars in two weeks.
Key technical details:
- The system does NOT use Machine Learning, it relies purely on Math.
- Runs in real-time on a super affordable chip like the ESP32.
- It integrates seamlessly with Home Assistant via MQTT.
The idea of “playing” by simply moving around a room sounds a bit ridiculous… but also kind of fun.
The key is the Moving Variance of the spatial turbulence: this value is continuous and stable, making it perfect for mapping directly to pitch/frequency, just like the original Theremin. Other features can be mapped to volume and timbre.
It’s pure signal processing, running entirely on the ESP32. Has anyone here experimented with audio synthesis or sonification using real-time signal processing?
The ESP32-S3 extracts a moving variance signal from spatial turbulence (updates at 20-50 Hz), and I want to map this directly to audio frequency using a passive buzzer + PWM (square wave, 200-2000 Hz range).
Two quick questions:
1. Do you see any pitfalls with updating PWM frequency at 20-50 Hz for responsive theremin-like behavior?
2. Any recommendations on mapping strategies - linear, logarithmic (musical scale), or quantized to specific notes?
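To make the three mapping options concrete, here is a minimal Python sketch (not ESPectre code). It assumes the moving variance has already been normalized to 0..1; the function names and the A4 = 440 Hz reference are illustrative choices, and only the 200-2000 Hz range comes from the post above.

```python
import math

F_MIN, F_MAX = 200.0, 2000.0  # buzzer range quoted in the post


def normalize(value, lo, hi):
    """Clamp value into [lo, hi] and scale it to 0..1 (lo/hi are assumed calibration bounds)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)


def map_linear(x):
    """Equal steps in x give equal steps in Hz."""
    return F_MIN + x * (F_MAX - F_MIN)


def map_log(x):
    """Equal steps in x give equal musical intervals (constant frequency ratio)."""
    return F_MIN * (F_MAX / F_MIN) ** x


def map_quantized(x, a4=440.0):
    """Logarithmic mapping snapped to the nearest equal-tempered semitone."""
    f = map_log(x)
    semitones = round(12 * math.log2(f / a4))
    return a4 * 2 ** (semitones / 12)
```

Linear mapping crowds most of the perceived pitch change into the low end of the input range, while the logarithmic version spreads it evenly. The quantized version trades smooth glides for notes that are always in tune, and a snapped note can land a few Hz outside the 200-2000 Hz range.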
I don't know if it's useful but one technique I have used in sonification during the experimentation phase is to skip the real time aspect, capture all the available "channels" and generate all the possible permutations of what is mapped where.
Then you can listen to the outputs, see what sounds good, and then test it in real time to check if the musicality is actually a result of the physical interaction and not an artifact or a product of noise.
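A rough sketch of the "generate every mapping" step in Python, assuming the captured channels are just named streams. The channel and parameter names below are placeholders for illustration, not names taken from any real capture.

```python
from itertools import permutations


def enumerate_mappings(channels, params):
    """Return every assignment of distinct channels to the given synth parameters."""
    return [dict(zip(params, combo)) for combo in permutations(channels, len(params))]


# Placeholder names, for illustration only.
channels = ["moving_variance", "entropy", "turbulence"]
params = ["pitch", "volume", "timbre"]
mappings = enumerate_mappings(channels, params)
```

Each dictionary in `mappings` is one candidate to render offline and listen to. The count grows factorially with the number of channels, so this only stays practical for a handful of them.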
My first step is to 'listen' to the raw channels and features to quickly find which mapping produces the most musically coherent (i.e., clean and physically predictable) output.
If it sounds like white noise, the mapping is bad or the signal is an artifact.
If it sounds like a sine wave moving predictably, the physics are sound.
Having two kids myself, I've thought of turning it into a game: blindfolded hide-and-seek where the pitch of the Wi-Fi Theremin tells the seeker how close they are to the 'signal disruption' of the other person. It's essentially a real-time sonar game!
I use a single ESP32 in STA/AP mode which sniffs ACK packets with a specific destination MAC, which come from any server on my WiFi network (uses a special sniffing mode IIRC). This way I can receive regular CSI packets originating from a fixed location and don't need another device running.
I'll have to look at this code, maybe I just overlooked the obvious or my requirements were too high!
1. Instead of STA/AP mode on a single ESP32, ESPectre uses the natural traffic between your existing router and an ESP32-S3 in station mode. To ensure a stable, continuous CSI packet rate, I implemented a traffic generator that sends ICMP pings to the gateway at a configurable rate (default: 20 pps). This provides bidirectional traffic (request + reply) that reliably triggers CSI generation, giving you predictable packet timing without relying on ambient network traffic or special sniffing modes.
2. Rather than applying filters directly to raw CSI, ESPectre uses Moving Variance Segmentation (MVS) on unfiltered spatial turbulence (std dev of subcarrier amplitudes).
3. The filters are applied to features, not to the segmentation signal itself. This preserves motion sensitivity while cleaning up the feature data.
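For readers who want to see the idea in point 2 spelled out, here is a minimal Python sketch of moving-variance segmentation. It is not the ESPectre firmware: the window length, the fixed threshold, and the use of population (rather than sample) statistics are all assumptions made for illustration.

```python
from collections import deque
from statistics import pstdev, pvariance


def spatial_turbulence(amplitudes):
    """Standard deviation of the subcarrier amplitudes in one CSI packet."""
    return pstdev(amplitudes)


class MovingVariance:
    """Variance of the last `window` turbulence values (window size is an assumption)."""

    def __init__(self, window=20):
        self.buf = deque(maxlen=window)

    def update(self, x):
        self.buf.append(x)
        return pvariance(self.buf)


def segment(packets, window=20, threshold=1.0):
    """Return one motion flag per packet: True where the moving variance exceeds threshold."""
    mv = MovingVariance(window)
    return [mv.update(spatial_turbulence(p)) > threshold for p in packets]
```

With a still room the turbulence barely changes from packet to packet, so its moving variance stays near zero. Movement makes the turbulence fluctuate, which pushes the variance over the threshold.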
I found that having a stable transmitter (the router) combined with controlled traffic generation provides more consistent multipath patterns and predictable CSI timing, which makes the segmentation more reliable.
Sounds like your MVS approach is a sliding window variance of the cross channel variance, with some adaptive thresholding. My pre-processing has generally been an EWMA de-meaning filter followed by some type of dimensionality reduction and feature extraction (kernel or hand-crafted, like raw moments), which I think fits into your overall architecture.
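For reference, the EWMA de-meaning step mentioned here can be sketched in a few lines of Python. The smoothing factor `alpha` and seeding the mean with the first sample are illustrative assumptions, not the commenter's actual settings.

```python
def ewma_demean(samples, alpha=0.1):
    """Subtract an exponentially weighted running mean from each sample."""
    mean = None
    out = []
    for x in samples:
        # Seed the mean with the first sample, then update it recursively.
        mean = x if mean is None else alpha * x + (1 - alpha) * mean
        out.append(x - mean)
    return out
```

A constant input comes out as zeros, while a sudden step shows up as a spike that decays as the running mean catches up. That is why this works as a cheap high-pass stage ahead of feature extraction.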
I'll have to look more closely at your work, thanks for sharing!
You may be surprised to find out how machine learning works!
When I say 'No ML,' I mean there is no training phase, no labeled data needed, and no neural network model used to infer the rules.
The distinction here is that all the logic is based purely on signal processing algorithms.
Thanks for raising the point!
Am I right in understanding that only a single ESP32 device is needed (plus a router)?
If the author reads this, how does the system cope with multiple rooms in the same house, maybe a two- or three-storey house?
You need one sensor for each area you want to monitor independently. With devices more capable than the ESP32‑S3, the coverage would likely be greater.
The ESP32‑C6, in particular, offers significantly better performance. Check out this comparison video from Espressif: https://www.youtube.com/watch?v=JjdpzM6zVJ8
But no source and "lifetime license if you join our discord" is kinda not my jam.
Regarding the lifetime license for Discord members, that's primarily to ensure that beta testers aren't being "used" for testing and then asked to pay. A lot of my users had stories about that with previous companies, and I wanted to give a promise that wasn't going to be the case here. And building a community where people help each other with device placement, hardware suggestions, etc. is a nice addition.
Anyway, I think this project is really cool, francescopace. Many have asked for TOMMY to be open-sourced, so that's definitely something you're going to have success with. I wish you all the best!
- Mike
One of our goals (abandoned) was to also extend to wifi routers, so I am excited to see continued interest in this space!
https://www.sensorsportal.com/HTML/ST_JOURNAL/PDF_Files/P_32...
Also, I use a Ruckus router, bought on eBay, that is designed for commercial settings. Will the stronger signal and beamforming from the router provide better or worse performance, or is that mainly down to the ESP32?
It cannot ignore cats or prioritize size over speed directly on the device, but ESPectre's architecture is designed to enable this kind of advanced classification externally.
It collects a rich set of pre-processed features (spatial turbulence, entropy, etc.) and transmits them via MQTT.
Any external server (like a Home Assistant add-on or a dedicated Python script) can use these features as the input for a trained ML model to perform classification (e.g., Cat vs. Human vs. Fall detection vs. Gesture detection).
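As a rough idea of what that external step could look like, here is a Python sketch that turns one JSON message into a feature vector and labels it with a nearest-centroid rule. Everything specific in it is an assumption: the payload keys, the labels, and the choice of classifier are hypothetical, and ESPectre's actual MQTT payload format may differ. The MQTT subscription itself is left out.

```python
import json
import math

# Hypothetical payload keys; check the real payload format before using this.
FEATURES = ("turbulence", "entropy")


def to_vector(payload):
    """Parse one JSON message into a fixed-order feature vector."""
    msg = json.loads(payload)
    return [float(msg[k]) for k in FEATURES]


def train_centroids(labelled):
    """Average the vectors for each label; `labelled` yields (label, vector) pairs."""
    sums, counts = {}, {}
    for label, vec in labelled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(vec, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))
```

A real deployment would need labelled recordings from the actual room and probably a stronger model, but the shape of the pipeline stays the same: features in, label out.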
Regarding the Ruckus router and beamforming: for CSI sensing, stability is generally more important than raw power. I recommend starting by disabling beamforming or reducing the power output if you experience poor motion sensitivity, as the stability of the ESP32 receiver is often the bottleneck.
- It monitors CSI from that specific node (the one it's associated with)
- If the ESP32 roams to a different mesh node, it will start monitoring CSI from the new node
The system doesn't care about the router's internal mesh topology; it just needs a stable connection to receive CSI data from the associated access point.
So you might have an ESP32 placed across the room from one mesh node to monitor that particular room. But if that ESP32 roams to, say, the mesh node on the floor above it, it's going to be monitoring a much less useful space - just the vertical space between itself and the mesh node on the floor above.
Am I envisioning this correctly? I'm thinking it's a problem for systems like eero, where you can't lock a device to a particular mesh node.
But you are absolutely right that, in theory, misuse of this technology could reveal certain behavioral patterns that might lead to identification.
However, it can also be extremely useful for safety purposes, for example, detecting people during a house fire or an earthquake.