I wonder what would need to happen to convince people that:
1. Even if you do something extremely low level, you can draw a distinction between your hardware and the interface that 99% of your software runs at.
2. You can develop complex behaviors iteratively with automated testing just like you can develop complex programs iteratively (tests are just programs).
https://technology.riotgames.com/news/automated-testing-leag...
3. It's worth it.
I work at (relatively) low levels, and I would absolutely love to have extensive tests (plus more, e.g. TLA+ models to prove critical properties of the systems I work on).
The pushback comes from stakeholders. They don't want to invest time and money into automated testing.
And when no automated testing has been done yet, you can guess that the system hasn't been architected to be easily testable. Figuring out how to add useful tests without massive (time-consuming and expensive, potentially error-prone) re-architecting is also something that requires quite a bit of investment.
Of course, part of it is just lack of experience. If someone who knows how it's done could lead by example and show the ropes, that'd probably help. Getting the framework off the ground could be the key to sneaking in some tests in the future, even when nobody asks for them.
So we have spent 2 months writing black box tests against the RabbitMQ version, swapped it out with Kafka and fixed all issues within a couple of weeks.
Since then, I believe that integration tests are much more valuable than unit tests.
I would normally agree with you, but a TCP Stack is one of those things where I vehemently disagree.
Communication stacks, in general, are giant piles of implicit state unless you go out of your way to manage the state explicitly. As such, they have obscure bugs which are difficult to find when some of the cases are hit rarely.
A communication stack really needs to be written such that the inputs (including TIME), the outputs, the current state, and the next state are all quite explicit. This enables you to test the stack because it is now deterministic with respect to inputs and outputs.
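A minimal sketch of that idea, with illustrative names and a toy retransmission timer (not a real protocol): every transition is a pure function of (state, input), time is just another input, and so tests need no sleeps, sockets, or mocks.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class StackState:
    unacked: bytes = b""   # data sent but not yet acknowledged
    sent_at_ms: int = 0    # when it was last (re)transmitted
    rto_ms: int = 200      # retransmission timeout

def step(state, event):
    """Pure transition: (state, event) -> (next_state, outputs).
    An event is (kind, at_ms, payload); outputs are frames for the wire."""
    kind, at_ms, payload = event
    if kind == "send":                 # application wants data out
        return replace(state, unacked=payload, sent_at_ms=at_ms), [payload]
    if kind == "ack":                  # peer acknowledged everything
        return replace(state, unacked=b""), []
    if kind == "tick" and state.unacked and at_ms - state.sent_at_ms >= state.rto_ms:
        return replace(state, sent_at_ms=at_ms), [state.unacked]  # retransmit
    return state, []

# Because time is an explicit input, the retransmit path is fully deterministic:
s = StackState()
s, out = step(s, ("send", 0, b"hello"))
assert out == [b"hello"]
s, out = step(s, ("tick", 100, b""))
assert out == []                  # RTO has not expired yet
s, out = step(s, ("tick", 250, b""))
assert out == [b"hello"]          # deterministic retransmit at t=250ms
```

The frozen dataclass makes every state transition produce a new value, so a test can hold onto any intermediate state and branch from it.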
Yes, it's not easy. And it requires that you really mean it and architect it that way. You may not be able to evolve your current stack and may have to throw it away--that's never going to be popular.
However, every single time I have done this for a communication stack (USB, CANopen, BLE, etc.), the result was that the new stack quickly overtook the old stack on basically every metric worth monitoring (throughput, latency, reliability, bug rate, etc.).
Now, to be fair, I was obviously replacing a communication stack that was some level of "pile of crap" or I wouldn't have done it. However, I'm just one person, and those stacks generally came from a company who had a vested interest in it not sucking. I'm not some amazing programmer, and I certainly didn't spend more time on it than the original stack, so it really comes down to the fact that the "underlying architecture" was simply a better idea.
There's been a lot of research, and internal studies, done at many companies that show pretty impressive benefits.
When really questioned, most engineers just say "I know my code works" or "I test my code, I don't need automated tests". That's the mentality I just don't understand.
> Testing takes effort and makes it harder to change things.
If it "makes things hard to change" just delete the test? You'll still get the benefit of knowing XYZ are broken/altered. You can also automate end-to-end and black box tests which should absolutely not require any modification if you're just refactoring.
> If I am writing code that controls a spaceship then it makes sense to spend a huge amount of effort on testing. On the other hand, if I am adding a feature to a web application then in my personal experience, most of the time adding automated testing is a waste of effort.
If you are working on something that is allowed to fail, then sure, you don't really need to care about what practices you use. But "it's ok for my things to break" is a catch-all argument. It applies just the same to all of these:
"Why do I need a version control system? It's fine if I manually merge my code incorrectly"
"Why do I need a build system? It's fine if I forget to recompile one of my files"
etc.
In addition: the "argument" for automated testing isn't that it will just prevent you from breaking something. It's that it lets you know when things change and makes it easy to update your code without manually checking if things are broken. Recently, when adding features to our frontend, I just run our tests and update a png file in our repo. I then play around until my styling is how I like it. It's completely automated and saves me a lot of time. It also lets others know immediately when their CSS change will affect, or will not affect, my components.
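The png workflow described above is essentially golden-file (snapshot) testing. A hedged sketch of the core helper, with made-up names; real projects usually get this from their test framework:

```python
from pathlib import Path

def assert_matches_snapshot(rendered: bytes, snapshot: Path, update: bool = False):
    """Golden-file check: fail when rendered output drifts from the committed
    baseline; rerun with update=True to bless an intentional change."""
    if update or not snapshot.exists():
        snapshot.write_bytes(rendered)   # record or refresh the baseline
        return
    assert rendered == snapshot.read_bytes(), f"{snapshot.name} drifted from baseline"

# Demo against a temporary directory instead of a real repo:
import tempfile
baseline = Path(tempfile.mkdtemp()) / "button.png"
assert_matches_snapshot(b"pixels-v1", baseline)   # first run records the baseline
assert_matches_snapshot(b"pixels-v1", baseline)   # unchanged render passes
```

In CI you would leave `update=False`, so any visual change shows up as a failing test plus a diffable file in the pull request.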
Also, take a look at gvisor's network stack. It's definitely unit tested.
https://github.com/google/gvisor/tree/master/pkg/tcpip/link/... (an example)
Also, some networking tests use separate frameworks (which look more like the setup the original post is describing, since those are needed also), e.g.: https://github.com/google/gvisor/tree/master/test/packetimpa...
The idea of putting the TCP stack in user space is interesting. If one could actually map the device's memory into user space, one could get away with fewer system calls and therefore better performance.
Also, what I find somewhat irritating about using a Linux system is how often one needs to run commands as root (sudo) for common administrative tasks like mounting a disk. Having a user-space TCP stack could also decrease the need for that as far as setting up the network is concerned. If the Linux machine is single-user, as most of them are nowadays, it makes more sense that way, I think.
I would think that if you don't do this, an attacker who can execute code but is not yet root could easily elevate privileges by shadowing legitimate paths and tricking root into executing untrusted code.
I'm not a security engineer and just find it interesting, so if my thinking is off, please correct me.
https://fd.io/docs/vpp/master/whatisvpp/hoststack.html
And there is a sister project using this tech to get noticeable speed-ups:
Disclaimer: I am involved with the VPP project.
I think it’s important to distinguish between the protocol (TCP) and the hardware device. You would still absolutely need to talk to the device, it’s just that moving a lot of the logic to user space means much less context switching for system calls for the application.
I can imagine on Linux you can talk directly to /dev/eth0 if you would want to (in the same way that you can talk to /dev/sda), and then you would be back at square one regarding root privileges.
It's an AF_PACKET, SOCK_RAW socket rather than a device file, but yes.
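For the curious, a sketch of opening such a socket from Python on Linux; "eth0" is a placeholder interface name, and the call needs CAP_NET_RAW (typically root), which is exactly the privilege question being discussed:

```python
import socket

ETH_P_ALL = 0x0003  # from <linux/if_ether.h>: match every EtherType

def open_link_socket(ifname: str) -> socket.socket:
    """Open an AF_PACKET/SOCK_RAW socket bound to one interface.
    Delivers whole Ethernet frames, headers included."""
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    s.bind((ifname, 0))  # (interface, protocol); 0 keeps the socket's protocol
    return s

try:
    with open_link_socket("eth0") as s:   # placeholder interface name
        frame = s.recv(65535)             # one raw frame off the wire
except (PermissionError, OSError) as exc:
    print(f"raw sockets are privileged: {exc}")
```

Without CAP_NET_RAW the `socket()` call fails with EPERM, which is why tools like tcpdump either run as root or are granted that capability.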
Indeed! Julia Evans wrote a really nice post explaining the usecases and benefits - https://jvns.ca/blog/2016/06/30/why-do-we-use-the-linux-kern...
Most machines, at least outside embedded devices, are not like this. They are multi-user systems even when there's only ever one breathing thing at the desk, because it offers a degree of separation between the privileges of your daemons, your pid 1, your web browser, etc.
You basically need the fuzzer to have a model of TCP state so that it can effectively explore the state space, which is quite complicated and not something you can do with off-the-shelf tools.
But once you have a bunch of unit tests designed to put the TCP stack into a specific state + a way of saving and restoring that state, it's really easy to just have snapshot of interesting situations where you can run a fuzzer on the next packet to be transmitted and see what happens.
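A toy sketch of that snapshot-then-fuzz loop, with invented names and a deliberately planted bug standing in for a real stack; the point is only the shape of the harness (restore state, mutate one packet, step, catch violations):

```python
import copy
import random

def fuzz_from_snapshot(snapshot, step, mutate, iterations=5000, seed=0):
    """Restore a saved protocol state and feed it one mutated packet at a
    time; return the first (packet, exception) that breaks the stack."""
    rng = random.Random(seed)            # seeded for reproducible failures
    for _ in range(iterations):
        state = copy.deepcopy(snapshot)  # restore the interesting state
        packet = mutate(rng)             # the fuzzed "next packet"
        try:
            step(state, ("rx", 0, packet))
        except Exception as exc:
            return packet, exc           # repro = snapshot + this one packet
    return None

def toy_step(state, event):
    """Stand-in transition function with a planted bug."""
    _, _, packet = event
    if packet and packet[0] == 0xFF:
        raise ValueError("unhandled header byte")  # the bug to be found
    return state, []

def random_packet(rng):
    return bytes(rng.randrange(256) for _ in range(4))

snapshot = {"state": "ESTABLISHED", "cwnd": 10}   # illustrative saved state
found = fuzz_from_snapshot(snapshot, toy_step, random_packet)
assert found is not None and found[0][0] == 0xFF  # harness located the bug
```

Because each iteration restores the same snapshot, any crash reproduces from just two artifacts: the saved state and the single offending packet.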
- Input received raw data
- Output received application data
- Input application data to send
- Output raw data to send
Obviously, since TCP connection state is time sensitive, the “raw data” wouldn’t just be the IP packet and headers, but also a time stamp telling the state machine when that packet was received/sent. If you want the state machine to keep track of time even when no packets are being received or sent, there could be an additional operation just to input a timestamp without additional packets. In effect, time is just another input that the user is responsible for feeding to the state machine at sufficiently fine intervals.
In practice, you could emulate this pattern with a callback-oriented protocol stack by populating an in-memory send/receive queue in your callback function, but that design can be somewhat inflexible because it forces potentially undesirable constraints, e.g. an extra memory copy that could otherwise be elided.
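The four operations above could be sketched as a queue-oriented facade like this (purely illustrative names, no real TCP parsing or segmentation), with time as an explicit input rather than something read from a clock:

```python
from collections import deque

class StackIO:
    """Queue-based interface mirroring the four operations: raw in, app out,
    app in, raw out, plus an explicit time input."""
    def __init__(self):
        self._app_rx = deque()   # received application data, ready to read
        self._raw_tx = deque()   # raw frames waiting to go on the wire
        self._now_ms = 0

    def input_raw(self, at_ms: int, frame: bytes):
        self._advance(at_ms)
        self._app_rx.append(frame)   # real stack: parse headers, reassemble

    def output_app(self):
        return self._app_rx.popleft() if self._app_rx else None

    def input_app(self, at_ms: int, data: bytes):
        self._advance(at_ms)
        self._raw_tx.append(data)    # real stack: segment, prepend headers

    def output_raw(self):
        return self._raw_tx.popleft() if self._raw_tx else None

    def input_time(self, at_ms: int):
        self._advance(at_ms)         # real stack: fire RTO/keepalive timers here

    def _advance(self, at_ms: int):
        self._now_ms = max(self._now_ms, at_ms)

io = StackIO()
io.input_app(5, b"hello")
assert io.output_raw() == b"hello"   # app data emerges as raw output
io.input_raw(10, b"world")
assert io.output_app() == b"world"   # raw input emerges as app data
```

Because the caller owns both queues and the clock, a test can drive the stack packet by packet and millisecond by millisecond, which is what makes the state machine deterministic to test.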