My stress-free QEMU support only ever covered my even smaller AVR targets, the 1281. But AVR is crazy compared to the Cortex-M4. We switched to ARM completely, no AVRs anymore. Anyway, there is no need for QEMU or other emulators when you can easily write a simulator. You just throw in some mmaps, a simulated libc, the UART, and networking. Much better than an emulator.
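One way to read "throw in some mmaps" is sketched below: on the host, an anonymous mapping stands in for a peripheral register block, and the firmware reaches registers through a base pointer instead of a fixed hardware address. The register offsets, the TXE bit position, and every function name here are made up for illustration and do not match any particular chip.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical register layout, not taken from a real datasheet. */
#define SIM_PERIPH_SIZE 4096u
#define UART_DR_OFFSET  0x00u      /* data register */
#define UART_SR_OFFSET  0x04u      /* status register */
#define UART_SR_TXE     (1u << 7)  /* "transmit register empty" flag */

/* On the device this would be a fixed address; on the host it is mapped. */
static volatile uint8_t *periph_base;

static volatile uint32_t *reg32(uint32_t offset)
{
    return (volatile uint32_t *)(periph_base + offset);
}

/* Map zero-filled memory to play the role of the peripheral block. */
int sim_periph_init(void)
{
    void *p = mmap(NULL, SIM_PERIPH_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    periph_base = p;
    /* Plain memory has no behaviour, so the simulator must model it:
       report the transmitter as always ready. */
    *reg32(UART_SR_OFFSET) = UART_SR_TXE;
    return 0;
}

/* "Firmware" code: identical on host and device, only the base differs. */
void uart_putc(char c)
{
    while ((*reg32(UART_SR_OFFSET) & UART_SR_TXE) == 0u) {
        /* busy-wait until the transmitter is ready */
    }
    *reg32(UART_DR_OFFSET) = (uint8_t)c;
}

/* Simulator-side helper: inspect what the firmware last wrote. */
uint8_t sim_uart_last_byte(void)
{
    return (uint8_t)(*reg32(UART_DR_OFFSET) & 0xFFu);
}

/* Small host-side check of the whole path. */
void sim_demo(void)
{
    int rc = sim_periph_init();
    assert(rc == 0);
    (void)rc;
    uart_putc('A');
    assert(sim_uart_last_byte() == 'A');
}
```

A real simulator would go further, for example forwarding written bytes to stdout or a socket, but the firmware-side code would not need to change for that.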
The key thing is to make the majority of the code portable enough to run on a PC. I find the best way to do that is to keep things data-oriented, using Plain Old Data structures and pure functions as much as possible. Put another way: isolate any device-specific pieces, like I/O, into parts that are as small and simple as possible. Once you take this mindset, you realize that even things considered "device specific", like device drivers, usually contain a lot of logic that can be separated from the I/O. And with a swappable I/O backend (say, for I2C) you can test the vast majority of this logic on a computer. You should also have a "host" implementation of the Hardware Abstraction Layer (HAL) that runs on a PC, potentially with virtual inputs and outputs. This lets you run essentially all of the firmware on a PC. If you use an embedded-friendly test framework like Unity, you can also run the same tests on the device that you run on the PC, to make sure there is no difference between running on host and on device.
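A minimal sketch of the swappable I2C backend idea, under made-up assumptions: the `i2c_bus_t` interface, the sensor's register format (12-bit signed, left-justified, 0.0625 °C per count), and all function names are hypothetical. The decoding is a pure function, the driver only talks to the bus through a function pointer, and a fake bus lets the whole thing run on a PC. Plain `assert` is used to keep the block self-contained; with Unity the checks would be written with its assertion macros instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical I2C backend interface: the driver only ever sees this. */
typedef struct {
    /* Read len bytes starting at register reg; returns 0 on success. */
    int (*read_reg)(void *ctx, uint8_t addr, uint8_t reg,
                    uint8_t *buf, size_t len);
    void *ctx;
} i2c_bus_t;

/* Pure function: two big-endian raw bytes to hundredths of a degree C.
   Assumed format: 12-bit signed, left-justified, 0.0625 C per count. */
int32_t temp_decode_centi(const uint8_t raw[2])
{
    int32_t v = ((int32_t)raw[0] << 8) | raw[1];
    if (v & 0x8000) {
        v -= 0x10000;            /* sign-extend the 16-bit value */
    }
    int32_t counts = v / 16;     /* drop the four unused low bits */
    return counts * 625 / 100;   /* 0.0625 C per count, times 100 */
}

/* Driver logic: no hardware access except through the bus pointer. */
int temp_read_centi(const i2c_bus_t *bus, uint8_t addr, int32_t *out)
{
    uint8_t raw[2];
    int rc = bus->read_reg(bus->ctx, addr, 0x00, raw, sizeof raw);
    if (rc != 0) {
        return rc;
    }
    *out = temp_decode_centi(raw);
    return 0;
}

/* Host-side fake backend: serves bytes from memory, can inject errors. */
typedef struct {
    uint8_t regs[2];
    int fail;
} fake_i2c_t;

int fake_read_reg(void *ctx, uint8_t addr, uint8_t reg,
                  uint8_t *buf, size_t len)
{
    fake_i2c_t *f = ctx;
    (void)addr;
    if (f->fail || reg + len > sizeof f->regs) {
        return -1;
    }
    memcpy(buf, f->regs + reg, len);
    return 0;
}

/* The same test body could be compiled for the device as well. */
void test_temp_driver(void)
{
    fake_i2c_t fake = { { 0x19, 0x00 }, 0 };
    i2c_bus_t bus = { fake_read_reg, &fake };
    int32_t centi = 0;

    int rc = temp_read_centi(&bus, 0x48, &centi);
    assert(rc == 0);
    assert(centi == 2500);       /* 0x1900 -> 400 counts -> 25.00 C */

    fake.fail = 1;               /* simulated bus error is propagated */
    rc = temp_read_centi(&bus, 0x48, &centi);
    assert(rc != 0);
    (void)rc;
}
```

On the device, the only piece to swap is the `read_reg` implementation, which would wrap whatever I2C transfer routine the target provides; the decode and driver logic stay byte-for-byte the same.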
Unfortunately, the code from embedded device vendors is rarely amenable to this, so that code can end up "untestable" under this scheme. For them, "portability" only means portability between their own hardware devices, not to a computer. Then you have to trust that they have done their own QA, which is usually not that great. Looking at you, ST HAL...