Crash Recovery and Seamless Return

Post by pcmattman »

OK, I've been toying with this idea for a couple of days, and it sounds good to me.

Windows, when it dies, does so rather ungracefully and screams at users with the dreaded BSOD. IMHO, this is really frustrating and quite rude. Instead, the operating system should attempt to handle the exception and recover.

Recovery can be as simple as skipping over the faulty instruction (though this has a 'domino' effect) or as complex as finding the source process and killing it, then somehow switching to a new task (gracefully?).

The first can be quite complex in implementation, requiring tables of opcodes and their lengths and a matching algorithm to ensure that the faulting instruction is skipped.
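
A minimal sketch of the table-plus-matching idea, written in Python only to keep it short. It assumes 32-bit protected mode, no prefixes, and covers only a handful of one-byte opcodes that take no ModRM byte; the names `FIXED_LENGTHS`, `insn_length` and `skip_faulting_instruction` are made up for illustration. Anything it cannot decode is reported as unknown rather than guessed at.

```python
from typing import Optional

# Lengths in bytes for a few one-byte opcodes with no ModRM byte.
# Assumption: 32-bit protected mode, no prefixes. A real table is far larger.
FIXED_LENGTHS = {
    0x90: 1,  # NOP
    0xC3: 1,  # RET
    0xCC: 1,  # INT3
    0xF4: 1,  # HLT
    0xFA: 1,  # CLI
    0xFB: 1,  # STI
    0xCD: 2,  # INT imm8
    0xEB: 2,  # JMP rel8
    0xE8: 5,  # CALL rel32
    0xE9: 5,  # JMP rel32
}
for reg in range(8):
    FIXED_LENGTHS[0x50 + reg] = 1  # PUSH r32
    FIXED_LENGTHS[0x58 + reg] = 1  # POP r32
    FIXED_LENGTHS[0xB8 + reg] = 5  # MOV r32, imm32


def insn_length(code: bytes, offset: int) -> Optional[int]:
    """Return the length of the instruction at offset, or None if unknown."""
    if offset < 0 or offset >= len(code):
        return None
    length = FIXED_LENGTHS.get(code[offset])
    # Refuse to decode unknown opcodes or instructions cut off by the buffer end.
    if length is None or offset + length > len(code):
        return None
    return length


def skip_faulting_instruction(code: bytes, fault_offset: int) -> Optional[int]:
    """Return the offset to resume at, or None if skipping is not possible."""
    length = insn_length(code, fault_offset)
    return None if length is None else fault_offset + length
```

For example, a HLT executed in user mode faults, and the handler would resume one byte later. This does nothing about the domino effect: whatever the skipped instruction was meant to do simply never happens.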

Any input?

Post by Tyler »

Probably best to just do what Windows (and Linux, FreeBSD, OpenBSD, Solaris and whatever other operating systems run on x86) does and kill any application that faults.

For one thing, Windows (NT) never blue screens on an application failure, only on a kernel problem. If you begin skipping opcodes in kernel mode you have a serious chance of messing up people's computers.

Post by jnc100 »

Tyler wrote: For one thing, Windows (NT) never blue screens on an application failure, only on a kernel problem.


Yes, that's how I understand it. Unfortunately, NT is rather monolithic in design and faults in kernel-mode device drivers can equally cause a BSOD.

In a microkernel environment, however, it should be easier to restart a driver. For example, consider that the display driver crashes (quite a common occurrence on my XP machine, unfortunately). The kernel needs to detect this somehow (as the error might not cause an exception), close down the process, inform all clients that the driver is not currently responding to messages and that all recently sent messages might not have been processed, and then restart it, informing clients that it has done so.
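
The sequence above could look roughly like this toy model in Python. The heartbeat timeout is an assumption standing in for "detect this somehow", and `DriverSupervisor`, the `start`/`stop` callbacks and the event names are all hypothetical, not taken from any real kernel.

```python
from typing import Callable, List


class DriverSupervisor:
    """Toy model of a kernel restarting a user-space driver."""

    def __init__(self, name: str, start: Callable[[], None],
                 stop: Callable[[], None], timeout: float) -> None:
        self.name = name
        self.start = start          # hypothetical hook: launch the driver process
        self.stop = stop            # hypothetical hook: tear the driver process down
        self.timeout = timeout      # seconds of silence tolerated (assumed policy)
        self.last_heartbeat = 0.0
        self.clients: List[Callable[[str], None]] = []

    def subscribe(self, notify: Callable[[str], None]) -> None:
        """Register a client callback that receives driver status events."""
        self.clients.append(notify)

    def heartbeat(self, now: float) -> None:
        """Called whenever the driver proves it is still responding."""
        self.last_heartbeat = now

    def _broadcast(self, event: str) -> None:
        for notify in self.clients:
            notify(event)

    def check(self, now: float) -> bool:
        """Restart the driver if it has been silent too long.

        Returns True if a restart was performed.
        """
        if now - self.last_heartbeat <= self.timeout:
            return False
        self.stop()
        # Clients must assume recently sent messages were not processed.
        self._broadcast("driver-down")
        self.start()
        self.last_heartbeat = now
        self._broadcast("driver-restarted")
        return True
```

A silent hang is caught by the timeout even though no exception was ever raised. Clients still have to decide for themselves whether to resend anything after "driver-restarted".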

To take it one step further, what if the microkernel itself does something silly and needs to be shut down? We could have control transferred to a watcher program that simply reloads the kernel and attempts to restart it whilst maintaining process information, but I don't know how feasible this is. Besides, what watches the watcher?

Regards,
John.

Post by AJ »

jnc100 wrote:
Tyler wrote: For one thing, Windows (NT) never blue screens on an application failure, only on a kernel problem.


Yes, that's how I understand it. Unfortunately, NT is rather monolithic in design and faults in kernel-mode device drivers can equally cause a BSOD.

In a microkernel environment, however, it should be easier to restart a driver....


The way I understand it, Vista has farmed as much of the driver code as possible out to userland, making it less likely that drivers will crash the kernel...

Post by jnc100 »

AJ wrote: The way I understand it, Vista has farmed as much of the driver code as possible out to userland, making it less likely that drivers will crash the kernel...


If that's true then it can only be a good thing. I have no experience with Vista and no current desire to upgrade. Something about having 20 different versions, with only the most expensive being a home OS with the ability to support that most modern of concepts: domain logons.

Regards,
John.

Post by Tyler »

Yeah... well even Kernel Mode Drivers can be restarted... but you still have to restart them; you cannot simply skip over an opcode when an unrecoverable exception (Abort) occurs.
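
To make the fault/abort distinction concrete, here is a rough Python sketch. The vector numbers and classes for the listed x86 exceptions follow the architecture's classification; the policy in `recovery_action` and its return strings are assumptions for illustration only. It applies to exceptions nothing else has handled (so not, say, an ordinary demand-paging page fault), and it conservatively treats unlisted vectors as aborts.

```python
FAULT, TRAP, ABORT = "fault", "trap", "abort"

# A subset of x86 exception vectors and their architectural class.
EXCEPTION_CLASS = {
    0: FAULT,   # #DE divide error
    3: TRAP,    # #BP breakpoint
    4: TRAP,    # #OF overflow
    6: FAULT,   # #UD invalid opcode
    8: ABORT,   # #DF double fault
    13: FAULT,  # #GP general protection
    14: FAULT,  # #PF page fault
    18: ABORT,  # #MC machine check
}


def recovery_action(vector: int, from_user_mode: bool) -> str:
    """Pick a response to an otherwise unhandled exception (assumed policy)."""
    # Assumption: vectors missing from the table are treated like aborts.
    kind = EXCEPTION_CLASS.get(vector, ABORT)
    if kind == ABORT:
        return "panic"           # no reliable state to resume from
    if from_user_mode:
        return "kill-process"    # what mainstream kernels do with a faulting app
    return "restart-driver"      # kernel-mode fault: restart the component, never skip
```

Note that "skip the opcode" is not one of the outcomes anywhere in the table.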