Userland PCI drivers

(wiki.netbsd.org)

105 points by wean_irdeh 8y ago | 54 comments

djsumdog 8y ago

I remember seeing the PCI userspace option in the Linux kernel menuconfig and wondering why anyone would do that; then, a few years ago at Kiwicon, I saw my first use case. A presenter was trying to hack a Cisco router.

Older Cisco routers ran IOS directly on proprietary hardware. At some point Cisco decided to switch to Intel hardware but didn't port their kernel. Instead they used a Linux kernel and ran IOS as a huge 50MB+ binary. The guy giving the talk got shell access and found only one Ethernet device when running ifconfig. The actual switching hardware was being handled in userspace by that large binary.

I'm guessing they probably just wrote some shim layers to connect their PCI drivers up to the userspace PCI Linux API.
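For a sense of what such a shim would sit on top of: Linux exposes each PCI memory BAR as a file under sysfs (for example `/sys/bus/pci/devices/0000:03:00.0/resource0`, where the bus address here is a hypothetical example) that a privileged process can `mmap()`, plus a `config` file holding the device's configuration space. The sketch below is a minimal Python illustration of register access through such a mapping, not Cisco's code; `MappedBar` and `read_pci_ids` are hypothetical names, and a plain temporary file can stand in for the sysfs files when trying it out. Linux's UIO and VFIO frameworks add interrupts and IOMMU-protected DMA on top of this; the sketch only covers registers. A real driver would use C with volatile accesses, since Python's `struct` calls are not guaranteed to issue a single aligned 32-bit load or store.

```python
import mmap
import os
import struct


def read_pci_ids(config_path: str) -> tuple[int, int]:
    """Return (vendor_id, device_id) from the first 4 bytes of config space."""
    with open(config_path, "rb") as f:
        header = f.read(4)
    # Config space is little-endian: vendor ID at offset 0, device ID at 2.
    vendor_id, device_id = struct.unpack("<HH", header)
    return vendor_id, device_id


class MappedBar:
    """Register window over a memory BAR exposed as a sysfs resourceN file."""

    def __init__(self, path: str, size: int) -> None:
        self._fd = os.open(path, os.O_RDWR)
        # MAP_SHARED so writes reach the device (or the backing test file).
        self._mm = mmap.mmap(self._fd, size, mmap.MAP_SHARED,
                             mmap.PROT_READ | mmap.PROT_WRITE)

    def read32(self, offset: int) -> int:
        # PCI registers are little-endian 32-bit words.
        return struct.unpack_from("<I", self._mm, offset)[0]

    def write32(self, offset: int, value: int) -> None:
        struct.pack_into("<I", self._mm, offset, value)

    def close(self) -> None:
        self._mm.close()
        os.close(self._fd)
```

On real hardware the same code would point at the device's `config` and `resource0` files and use offsets taken from the chip's datasheet.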

q3k 8y ago

Well, most of the magic of hardware routers also comes in the form of hardware acceleration of the actual data plane (i.e. L3 switching) on the silicon itself. That's what makes them fast and so expensive. That sort of mechanism doesn't map nicely into the Linux interface paradigm (unless you do things like exporting kernel routes into the hardware, but that's borderline absurd).

I think even if the driver were to be implemented in kernelspace, it would still probably not expose any of its physical interfaces to userspace as plain Ethernet devices, maybe apart from virtual/mgmt ones to run SSH on, and perhaps one so that the kernel can handle packets that the router doesn't have flows programmed for (like in OpenFlow).
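To make the "hand the kernel whatever the hardware has no entry for" idea concrete, here is a toy model of a forwarding table: longest-prefix match over the programmed routes, with anything unmatched punted to the CPU (i.e. the kernel). `ToyFib` and the `"CPU"` sentinel are hypothetical, and a real ASIC does this lookup in dedicated hardware rather than with a linear scan.

```python
import ipaddress

CPU = "CPU"  # hypothetical sentinel meaning "punt this packet to the kernel"


class ToyFib:
    """Longest-prefix-match table with a miss-to-CPU default."""

    def __init__(self) -> None:
        self._routes = {}  # ip_network -> egress port name

    def program(self, prefix: str, port: str) -> None:
        self._routes[ipaddress.ip_network(prefix)] = port

    def lookup(self, dst: str) -> str:
        addr = ipaddress.ip_address(dst)
        best_len, best_port = -1, CPU
        for net, port in self._routes.items():
            # Keep the most specific (longest) prefix containing the address.
            if addr in net and net.prefixlen > best_len:
                best_len, best_port = net.prefixlen, port
        return best_port
```

For example, with `10.0.0.0/8` on `swp1` and `10.1.0.0/16` on `swp2`, a packet to `10.1.2.3` leaves on `swp2`, while one to `192.168.1.1` has no match and goes to the kernel.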

kijiki 8y ago

> That sort of mechanism doesn't map nicely into the Linux interface paradigm (unless you do things like exporting kernel routes into the hardware, but that's borderline absurd).

Not absurd at all. Cumulus (which I cofounded) does exactly that. There are >1000 customers, including several of the largest cloud operators in the world.

It works really well in practice, since you can just fall back to the kernel for non-fast-path stuff like ARP. IOS/NXOS implement ARP (and everything else) themselves. We can just use the kernel's implementation.

The idea is essentially to use the lightning fast forwarding ASIC as a hardware accelerator for the networking functionality the kernel already has.
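A rough sketch of the "export kernel routes into the hardware" approach, assuming both tables are simple prefix-to-next-hop maps: diff the kernel's view against what has already been programmed and emit add/replace/delete operations. This is a hypothetical illustration, not Cumulus's implementation; a real daemon would presumably learn route changes incrementally (for example from netlink notifications) and program the ASIC through the vendor SDK, which `apply_ops` merely stands in for.

```python
from typing import Optional

Op = tuple[str, str, Optional[str]]  # (action, prefix, next_hop)


def diff_routes(kernel: dict[str, str], hw: dict[str, str]) -> list[Op]:
    """Operations that make the hardware table mirror the kernel table."""
    ops: list[Op] = []
    # Entries the kernel no longer has are deleted first.
    for prefix in sorted(hw.keys() - kernel.keys()):
        ops.append(("del", prefix, None))
    for prefix in sorted(kernel):
        next_hop = kernel[prefix]
        if prefix not in hw:
            ops.append(("add", prefix, next_hop))
        elif hw[prefix] != next_hop:
            ops.append(("replace", prefix, next_hop))
    return ops


def apply_ops(hw: dict[str, str], ops: list[Op]) -> None:
    """Hypothetical stand-in for the calls that program the forwarding ASIC."""
    for action, prefix, next_hop in ops:
        if action == "del":
            del hw[prefix]
        else:
            hw[prefix] = next_hop
```

Anything not expressed as a route (ARP, control protocols, and so on) simply stays with the kernel, which matches the fallback described above.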

snuxoll 8y ago

> I think even if the driver were to be implemented in kernelspace, it would still probably not expose any of its physical interfaces to userspace as plain ethernet devices, maybe apart from virtual/mgmt ones to run SSH on, and perhaps one so that the kernel can handle packets that the router doesn't have flows programmed for (like in OpenFlow).

That's basically how switch development works in a nutshell; look at Broadcom's OpenNSL.

Demeisen 8y ago

Isn't switchdev supposed to provide an interface to in-silicon forwarding engines?

ajross 8y ago

For years and years, the X server was effectively a userspace device driver. It would map the configuration registers and the framebuffer and do everything outside the kernel. And it worked fine, for the most part.

Once GPUs arrived, the ability to do latency-critical management of the device state became important and the register management moved into the kernel. But for traditional framebuffers the device setup was for the most part done once, and there's no particular need for that to be managed outside userspace.
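Once the mode is set, drawing on such a dumb framebuffer is just arithmetic on a memory mapping: pixel (x, y) lives at byte offset y × stride + x × bytes-per-pixel. A minimal sketch, assuming a linear 32-bit XRGB8888 layout; the `bytearray` stands in for what the X server would have obtained by mapping the framebuffer aperture (an `mmap` object works the same way), and `put_pixel`/`fill_rect` are hypothetical names.

```python
def put_pixel(fb, stride: int, x: int, y: int, rgb: tuple[int, int, int]) -> None:
    """Write one pixel into a linear 32-bit XRGB8888 framebuffer.

    stride is the number of bytes per scanline, which can exceed width * 4
    because hardware often pads rows.
    """
    r, g, b = rgb
    offset = y * stride + x * 4
    # XRGB8888 stored little-endian: the bytes in memory are B, G, R, X.
    fb[offset:offset + 4] = bytes((b, g, r, 0))


def fill_rect(fb, stride: int, x0: int, y0: int,
              width: int, height: int, rgb: tuple[int, int, int]) -> None:
    """Fill a rectangle one pixel at a time (no acceleration involved)."""
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            put_pixel(fb, stride, x, y, rgb)
```

No kernel involvement is needed per pixel, which is why a one-time setup followed by plain memory writes suited a userspace server.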

dfox 8y ago

Also, for a long time there was no need for any kind of fine-grained synchronisation with the graphics hardware apart from the usual I/O wait states, and the whole thing could be accessed as a few memory mappings without any interrupt handling (even to the extent of Sun's proprietary UPA slot not supporting interrupts at all in its low-cost, graphics-only incarnation).

FooBarWidget 8y ago

From what I know, the reason they moved device setup to the kernel was to avoid flickering when the system switches from the boot screen to the login manager.

_kp6z 8y ago

It may be equal parts GPL avoidance. Broadcom switch ASIC PDKs are a kind of hybrid kernel and userland application for no legacy reason, so I assume it is just arbitrarily about working around a restrictive license.

raverbashing 8y ago

For those who are confused, this is not Apple's IOS, but Cisco's OS that was once (not sure about now) called IOS.

q3k 8y ago

Still called IOS (well, there's also IOS-XE, IOS-XR, NX-OS... but that's a different story). Why would they change the name?

gsich 8y ago

True, because Apple's OS is called iOS.

kuwze 8y ago

I really like NetBSD. It supports Xen[0], it has the awesome multithreaded NPF[1], the CHFS filesystem tailored for SSDs[2], and support for rump kernels[3].

[0]: https://wiki.netbsd.org/ports/xen/

[1]: https://en.wikipedia.org/wiki/NPF_(firewall)

[2]: https://en.wikipedia.org/wiki/CHFS

[3]: http://rumpkernel.org/

ttflee 8y ago

Reminds me of this 34C3 talk:

https://media.ccc.de/v/34c3-9159-demystifying_network_cards

ehntoo 8y ago

I heartily recommend this talk. It and the corresponding proof-of-concept Intel userland packet processing driver[1] went a long way for me in removing a lot of the magic from low-level network packet handling and device management in Linux.

[1] https://github.com/emmericp/ixy

baruch 8y ago

NVMe and networking are now used from user space to reduce latency in storage systems. I'm doing that in my current job. There are libraries to help with it, such as DPDK and SPDK, which are good starting points.
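Much of what an SPDK-style driver does instead of waiting for interrupts is poll the device's completion queue. In NVMe, each completion entry carries a phase tag that the controller inverts on every pass through the ring, so the host can tell new entries from stale ones without an interrupt. The toy model below simulates both sides in Python; it leaves out the head doorbell write and queue-full flow control, and `CompletionQueue` and its methods are hypothetical names rather than SPDK's API.

```python
class CompletionQueue:
    """Toy NVMe-style completion queue, simulating both host and device.

    Not modelled: the host's head doorbell write and queue-full checks.
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        # Host software zeroes the phase tags before enabling the queue.
        self.entries: list[tuple[int, object]] = [(0, None)] * depth
        self.head = 0            # next slot the host will inspect
        self.expected_phase = 1  # the first pass through the ring uses phase 1
        self._tail = 0           # device's next slot (simulated)
        self._device_phase = 1

    def device_post(self, payload: object) -> None:
        """Simulated controller writing one completion entry."""
        self.entries[self._tail] = (self._device_phase, payload)
        self._tail += 1
        if self._tail == self.depth:
            # The controller inverts the phase tag on every wrap.
            self._tail = 0
            self._device_phase ^= 1

    def poll(self) -> list[object]:
        """Consume every new entry without blocking or taking an interrupt."""
        done = []
        while True:
            phase, payload = self.entries[self.head]
            if phase != self.expected_phase:
                break  # stale entry from the previous pass: nothing new yet
            done.append(payload)
            self.head += 1
            if self.head == self.depth:
                self.head = 0
                self.expected_phase ^= 1
        # A real driver would now write self.head to the CQ head doorbell.
        return done
```

A storage thread would call `poll()` in a tight loop alongside submitting new commands, trading a busy CPU core for the latency an interrupt path would add.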

snvzz 8y ago

Liedtke was right.

If it can be done outside the kernel, it shouldn't be in the kernel.

tinus_hn 8y ago

That’s a great idea if you don’t care about performance.

snvzz 8y ago

Common misconception. I suggest this article.

https://blog.darknedgy.net/technology/2016/01/01/0/

wean_irdeh (OP) 8y ago

Being in userspace doesn't have to mean compromised performance, as QNX has shown for decades.

mrpippy 8y ago

IRIX had a userland interface for PCI access: pciba(7)

I don’t know of anything that used it, but I’m sure there were custom PCI cards for data acquisition, hardware control, etc etc. that used it.

http://nixdoc.net/man-pages/irix/man7/pciba.7.html

mey 8y ago

Isn't the advantage of running hardware drivers in userspace to limit the attack surface of a driver being exploited?

nickpsecurity 8y ago

See Tanenbaum et al.'s paper on reliability/security mechanisms for a nice intro:

http://cs.furman.edu/~chealy/cs75/important%20papers/secure%...

The main benefit is reliability. Driver code is usually lower-quality than other code that runs in kernels, and the hardware itself can act weird in ways that mess the drivers up. The infamous Blue Screen of Death on Windows was usually caused by driver errors. Isolating drivers in their own address space prevents their errors from taking the whole system down. One might also use safe coding, static analysis, model-checking, etc. when developing the drivers themselves; Microsoft eliminated most of their blue screens with the SLAM toolkit for model-checking drivers. Of the two, isolation with restarts is the easiest, given that you can use it on unmodified or lightly modified drivers in many cases.
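"Isolation with restarts" can be sketched as a supervisor that runs the driver as a separate process and starts it again whenever it dies, loosely in the spirit of MINIX 3's reincarnation server. This is a toy under stated assumptions: the driver is any command, a non-zero exit counts as a crash, and `supervise` is a hypothetical name. A real system would also need to reset the device and recover driver state, which is not shown.

```python
import subprocess


def supervise(cmd: list[str], max_restarts: int) -> int:
    """Run cmd in its own process, restarting it after each abnormal exit.

    Returns the number of restarts needed before a clean exit (code 0).
    Raises RuntimeError once max_restarts is exhausted.
    """
    restarts = 0
    while True:
        # A crash only takes down the child's address space, not the supervisor.
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"driver still failing after {restarts} restarts "
                f"(last exit code {returncode})"
            )
        restarts += 1
```

The design choice is that the supervisor never shares memory with the driver, so a wild pointer in the driver cannot corrupt the recovery logic.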

As far as security goes, it really depends on the design of the system and hardware. Basic isolation mechanisms like MMUs might restrict the rogue driver enough if the attack just lets them go for other memory addresses. If the driver uses DMA, then they might control the DMA to indirectly hit other memory or even go for peripheral firmware. If the DMA is restricted, then maybe not. As I said, it all depends on what the hardware offers you plus how the system uses it.

All these possibilities are why high-assurance security pushed in the 1980s-1990s for formal specifications of every component, hardware and software, that map every interaction of state or flow of information. That didn't happen for most mainstream stuff. Without precise models, there are probably more attacks to come involving drivers interacting with complex underlying hardware. It's why I recommend simple RISC CPUs with verified drivers for high-security applications. Quite a few folks from the old guard even use 8-16-bit microcontrollers with no DMA specifically to reduce these risks.

As far as verifying drivers goes, here's a sample of approaches I've seen that weren't as heavy as something like seL4:

http://spinroot.com/spin/whatispin.html

https://www.cs.dartmouth.edu/~trdata/reports/TR2004-526.pdf

https://www.microsoft.com/en-us/research/blog/p-programming-...

http://etd.library.vanderbilt.edu/available/etd-11172015-221...

https://lirias.kuleuven.be/bitstream/123456789/591994/1/phd-...

Note: Including that last one specifically for the I/O verification part.

tinus_hn 8y ago

It’s nice if drivers are not running in the kernel but even if your graphics drivers are running in userspace, if they crash you can’t use your pc anymore.

The main advantage is that you don’t have to deal with all the limitations of kernel mode programming.

my123 8y ago

Note that Windows Vista onwards has UMDF too (user-mode driver framework). NT6.0 was a very big step.

snvzz 8y ago

Somewhat related, Minix3 finally fixed the release blocker for 3.4.0.

Expect a release soon, for the first time in years. And it's a major one.

paulie_a 8y ago

Unless you are a Cisco dev and do everything possible to screw it up

Edit: Cisco hardware was mentioned in this thread

sfuller808 8y ago

That's one advantage. Another is portability.

convolvatron 8y ago

There can be serious performance benefits for I/O-heavy workloads:

   - removal of copies mandated by the user/kernel boundary

   - lower control-transfer overhead, up to and including running in a completely polled mode. Taking an interrupt, getting from the interrupt into the kernel service thread, through the kernel stack, into epoll, and into a user thread all takes time

   - use of device-specific features without having to plumb them through all the various kernel interfaces

   - native async, removing the overheads associated with I/O thread pools

   - exploitation of workload-specific optimizations that would be defeated by the kernel scheduler, memory management, buffer cache, and other machinery

Of course, you lose all device independence from your interface, and any inter-process resource sharing provided by the kernel mechanisms. You have to deal with all the error recovery and safety issues yourself. But on some occasions it's really worth it.
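The first point, removing copies, is mostly about where packet data lives. On a kernel path, `recv()` copies each frame from a kernel buffer into the caller's buffer; a userspace driver can instead let the NIC DMA straight into memory the application owns and hand out references to it. A minimal Python sketch of that buffer-pool idea, assuming a `bytearray` stands in for the DMA-able (typically hugepage-backed) region; `PacketPool` and its methods are hypothetical names.

```python
class PacketPool:
    """Fixed-size packet buffers carved out of one contiguous region."""

    def __init__(self, n_bufs: int, buf_size: int) -> None:
        self._mem = bytearray(n_bufs * buf_size)  # stand-in for DMA memory
        self._view = memoryview(self._mem)
        self.buf_size = buf_size
        self._free = list(range(n_bufs))

    def alloc(self) -> int:
        return self._free.pop()

    def free(self, idx: int) -> None:
        self._free.append(idx)

    def dma_write(self, idx: int, data: bytes) -> int:
        """Simulates the NIC writing a received frame into buffer idx."""
        if len(data) > self.buf_size:
            raise ValueError("frame larger than buffer")
        start = idx * self.buf_size
        self._view[start:start + len(data)] = data
        return len(data)

    def packet(self, idx: int, length: int) -> memoryview:
        # Slicing a memoryview references the same memory: no copy is made.
        start = idx * self.buf_size
        return self._view[start:start + length]
```

Because `packet()` returns a view rather than a copy, later writes into the same buffer are visible through it, which is also why such designs must track buffer ownership carefully before recycling a slot.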

digi_owl 8y ago

Reminds me that Intel didn't include PCI on their Atom variants aimed at mobile devices, supposedly because it was too power-hungry.

This in turn led to Microsoft balking at supporting that hardware, as Windows is deeply reliant on PCI (even the ARM SoCs powering the Windows RT products support PCI).

In turn, Intel developed Moblin, which later merged with Nokia's Maemo to become MeeGo. MeeGo was later still foisted onto the Linux Foundation.

Retr0spectrum 8y ago

I guess monolithic kernels have gone full-circle now.

digi_owl 8y ago

Most real-life implementations of "microkernels" end up being hybrids. NT started out as a micro, but Microsoft have been moving things (the graphics subsystem in particular) in and out of kernel space in the hunt for the optimal tradeoff between stability and performance.

Similarly, I think the Mach kernel powering Apple's OSes is a "fat micro", where various things that should be in userspace, if one followed microkernel orthodoxy, reside in kernel space.

Perhaps the only orthodox microkernel OS out there is QNX, these days languishing in the bowels of BlackBerry's holdings.

dfox 8y ago

Mac OS/XNU is in fact a derivative of DEC's OSF/1 (later called Tru64 Unix). It has a very weird hybrid design where essentially everything that would be in a monolithic kernel runs as one big Mach process.

Edit: it is somewhat ironic that Alpha's memory protection model is designed such that the natural way to implement any OS would be to write your own microkernel as OS-specific PALcode (something between firmware and microcode, written in an extended Alpha ISA and the only thing that the CPU hardware sees as privileged code), but none of the Alpha OSes is implemented this way. In OSF/1 you thus get a limited microkernel-ish thing that runs two process-ish things: one is the Mach kernel, and the other is the currently running Mach task, which in turn is either the essentially monolithic Unix kernel or a Unix userspace process.

pjmlp 8y ago

There is Minix running in most Intel CPUs.

L4 running on most GSM radio chips.

Many embedded RTOSes targeted at critical systems are microkernels as well, for example the offerings from Green Hills.

zaarn 8y ago

Personally I feel like hybrids are the best implementation variant. Simply sticking to either monolithic or micro just ends up with kernels that are impractical or consist of a thousand moving parts that can crash independently when one goes down.

There are also modular kernels, which are also neat when implemented right (Linux is basically a modular kernel at this point)

nbsd4life 8y ago

If you are interested in this or other projects, GSoC is happening now, and the deadline for student applications is 27 March (tomorrow).

zombieprocesses 8y ago

Good to see the redheaded step child of the BSD world finally getting the limelight. FreeBSD gets all the attention, while OpenBSD gets all the praise.

feelin_googley 8y ago

The "redheaded step child" quote reminded me of this:

"BSD is Dying"

https://www.youtube.com/watch?v=6l1HghEDJf4

As I recall, in that presentation, the redheaded stepchild is OSX.
