Here are my random thoughts on interrupt latency... First of all, the point you bring up about priority is absolutely valid; in fact it's the crux of my argument: some devices need to be serviced in a more time-critical manner than others, so prioritized servicing makes sense to me. I don't much care whether the printer is ready for the next page if there's a Gigabit Ethernet card craving attention (for example). As for the overhead of making nested interrupts possible, I guess the only way to be sure is to try both approaches and do some benchmarking.
I can imagine a situation as follows (here we go again...). Let's say you have N devices, numbered 0 through N-1, whose ISRs run at increasingly higher priorities (device 0 is the lowest priority, device 1 the next higher, and device N-1 the highest).
Now imagine the nested-interrupt case in the following situation: device 0 interrupts the CPU, its ISR runs most of the way through, and it is about to queue up some work to send a message to its corresponding driver thread. Before it gets the chance, device 1 interrupts, goes through the same steps, and is in turn interrupted by device 2, and so on. The end result is that device N-1's ISR runs to completion after a fairly short wait, and queues up a message for its driver thread.
Now everything unwinds, and assuming the driver thread priorities mirror their corresponding ISR priorities, the kernel switches to device N-1's driver thread and delivers its message. All the other messages stay queued.
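Just to make that sequence concrete, here's a tiny user-space sketch of the nested case. It only models the control flow; nothing here is a real ISR or any particular kernel's API, and all the names (isr, queue_driver_message, NDEV) are invented for illustration. A recursive call stands in for the higher-priority device preempting the current ISR right before it queues its message, and after the unwind the highest-priority pending driver thread is picked.

#include <stdio.h>

#define NDEV 4                    /* N devices, 0 = lowest priority      */

static int queued[NDEV];          /* 1 once a message is queued for the
                                     device's driver thread              */

static void queue_driver_message(int dev)
{
    queued[dev] = 1;
    printf("ISR %d: message queued for driver thread %d\n", dev, dev);
}

/* The recursive call stands in for device dev+1 preempting this ISR
 * just before it gets to queue its message.                            */
static void isr(int dev)
{
    printf("ISR %d: device-specific work done\n", dev);
    if (dev + 1 < NDEV)
        isr(dev + 1);             /* higher-priority interrupt nests     */
    queue_driver_message(dev);    /* resumes only after nested ISRs end  */
}

int main(void)
{
    isr(0);                       /* device 0 interrupts first           */

    /* Once everything has unwound, pick the highest-priority driver
     * thread with a pending message (device NDEV-1 here).               */
    for (int dev = NDEV - 1; dev >= 0; dev--) {
        if (queued[dev]) {
            printf("kernel: switch to driver thread %d first\n", dev);
            break;
        }
    }
    return 0;
}

Running it prints the device-specific work for ISRs 0 through NDEV-1 in order, then the message-queueing in reverse order as the nesting unwinds, and finally the switch to driver thread NDEV-1, which is exactly the ordering described above.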
What if non-nested interrupts were being used? In this case, assuming the interrupts were triggered at roughly the same times as before, device 0's ISR would run to completion, resulting in a context switch to device 0's driver thread, and potentially message delivery (depending on how preemptible the implementation is), before device N-1's ISR even gets a chance to run. If the other N-2 devices' interrupts also get through first, multiply that context switch (and possible message-pass) by N-2 before device N-1's ISR finally runs.
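To put rough numbers on that comparison, the sketch below does the back-of-envelope arithmetic for how much work can sit between device N-1's interrupt being raised and (a) its ISR running, (b) its driver thread getting the message, under each scheme. All the per-operation costs are invented placeholders; only the way the totals scale with N is the point.

#include <stdio.h>

int main(void)
{
    const int    N      = 8;      /* number of devices                   */
    const double t_isr  = 2.0;    /* assumed full ISR body time (us)     */
    const double t_tail = 0.2;    /* assumed leftover tail of an ISR     */
                                  /* preempted near its end              */
    const double t_cs   = 5.0;    /* assumed context-switch time (us)    */
    const double t_msg  = 3.0;    /* assumed message-delivery time (us)  */
    const double t_nest = 0.5;    /* assumed nested-interrupt entry/exit */

    /* Nested: device N-1 preempts immediately, so its ISR latency is
     * basically the interrupt entry cost.  Its driver thread then waits
     * for its own ISR, the unwind of the N-1 preempted ISR tails, and
     * one context switch plus message delivery.                         */
    double nested_isr = t_nest;
    double nested_drv = nested_isr + t_isr + (N - 1) * (t_tail + t_nest)
                      + t_cs + t_msg;

    /* Non-nested (worst case): each of the N-1 lower-priority ISRs runs
     * to completion, each followed by a context switch and possibly a
     * message-pass, before device N-1's ISR even starts.                */
    double flat_isr = (N - 1) * (t_isr + t_cs + t_msg);
    double flat_drv = flat_isr + t_isr + t_cs + t_msg;

    printf("nested    : ISR after ~%4.1f us, driver thread after ~%4.1f us\n",
           nested_isr, nested_drv);
    printf("non-nested: ISR after ~%4.1f us, driver thread after ~%4.1f us\n",
           flat_isr, flat_drv);
    return 0;
}

With those made-up numbers the non-nested case is dominated by the N-1 context switches and message-passes standing in front of the high-priority ISR, while the nested case mostly pays the per-level nesting bookkeeping and the unwind, which is the trade-off I'm trying to describe.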
I guess my point is that there's overhead in either case. It probably comes down to the frequency and distribution of interrupts during "typical usage" (whatever that means) and under system stress. I'll have to experiment, once I have enough code to experiment with. ;D