OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Fault tolerant OS
Posted: Sun Jul 29, 2018 1:16 pm

Joined: Sun Jul 01, 2018 7:23 am
Posts: 4
Hello, I am wondering whether there are any operating systems (or OS theory) that are fault tolerant. By fault tolerant I mean: if a kernel process is faulty and is hogging a CPU, or has hit some error, is there a way to isolate the problem to that single kernel thread (or to some pool of threads related to it) and let the rest of the system work properly?

I know that getting an error in one part of the kernel can compromise the correctness of the kernel, but sometimes it might not affect most of the kernel. For example, when a kernel thread is stuck in a while loop, making no progress and hogging the CPU without holding any locks, we might be able to terminate it safely if some magic oracle says this kernel thread (possibly a driver) is no good. But then, having such an oracle is a major problem in itself.
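
One common stand-in for such an oracle is a heartbeat watchdog: every thread reports progress periodically, and a thread that stays silent for too long is treated as a suspect. Below is a minimal user-space sketch of that idea, not kernel code. The `Watchdog` class, the tick counter, and the driver names are all hypothetical, and a missed heartbeat is only a hint that something is wrong, not proof.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Progress record for one kernel thread (hypothetical structure)."""
    name: str
    last_beat: int = 0  # tick at which the thread last reported progress


class Watchdog:
    """Flags threads that have not reported progress within `timeout` ticks."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout
        self.threads: dict[str, Heartbeat] = {}

    def register(self, name: str, now: int) -> None:
        self.threads[name] = Heartbeat(name, now)

    def beat(self, name: str, now: int) -> None:
        # A healthy thread calls this from its main loop.
        self.threads[name].last_beat = now

    def suspects(self, now: int) -> list[str]:
        # Silence longer than the timeout is a hint, not proof of a fault.
        return sorted(
            hb.name for hb in self.threads.values()
            if now - hb.last_beat > self.timeout
        )


wd = Watchdog(timeout=3)
wd.register("disk_driver", now=0)
wd.register("net_driver", now=0)
for tick in range(1, 6):
    wd.beat("disk_driver", tick)  # the disk driver keeps making progress
    # net_driver is stuck in a loop and never calls beat()

stuck = wd.suspects(now=5)
print(stuck)
```

The hard part the sketch leaves out is the one raised above: deciding whether a suspect can be terminated safely, which depends on what locks and shared state it holds.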

So are there any OS designs that have fault tolerance (of the kind I described) as a design goal?

Thank you


 Post subject: Re: Fault tolerant OS
Posted: Sun Jul 29, 2018 3:02 pm
Member

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

bharathm1 wrote:
I know that getting an error in one part of the kernel can compromise the correctness of the kernel, but sometimes it might not affect most of the kernel.


Sometimes it might not affect most of the kernel, but you never know whether it did or not, so that doesn't help - you have to assume that almost all of the kernel might have been ruined regardless.

To fix that, you want to isolate the pieces so you can know that if one piece has a problem it can't ruin other pieces. In other words; a micro-kernel ends up being necessary.

Of course a micro-kernel isn't enough on its own. You'd also need code to monitor, terminate and restart drivers; and (in some cases) ways to recover lost state.

This has been done before (e.g. Minix 3).
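
The monitor, terminate and restart loop, plus recovery of lost state, can be sketched in user space as follows. This is a toy simulation under stated assumptions, not how Minix 3 implements it: `Supervisor`, `CounterDriver`, `DriverCrashed` and the checkpointing scheme are hypothetical names, and a Python exception stands in for a fault that a real micro-kernel would detect in a separate, isolated driver process.

```python
class DriverCrashed(Exception):
    """Raised by a simulated driver to stand in for a fault."""


class CounterDriver:
    """Toy driver: counts requests and crashes on request number `crash_on`."""

    def __init__(self, state=None, crash_on=None):
        self.state = dict(state) if state else {"handled": 0}
        self.crash_on = crash_on

    def handle(self, request):
        if self.state["handled"] + 1 == self.crash_on:
            raise DriverCrashed(f"fault while handling {request!r}")
        self.state["handled"] += 1
        return self.state["handled"]


class Supervisor:
    """Restarts a failed driver from the last known-good checkpoint."""

    def __init__(self, factory, max_restarts=3):
        self.factory = factory        # builds a fresh driver from saved state
        self.max_restarts = max_restarts
        self.restarts = 0
        self.checkpoint = None        # state saved after the last success
        self.driver = factory(None)

    def call(self, request):
        while True:
            try:
                result = self.driver.handle(request)
            except DriverCrashed:
                if self.restarts >= self.max_restarts:
                    raise             # give up: the driver keeps failing
                self.restarts += 1
                # Discard the faulty instance and start a replacement.
                self.driver = self.factory(self.checkpoint)
            else:
                self.checkpoint = dict(self.driver.state)
                return result


def make_factory():
    built = 0

    def factory(state):
        nonlocal built
        built += 1
        # Only the first instance is faulty; replacements behave.
        return CounterDriver(state, crash_on=3 if built == 1 else None)

    return factory


sup = Supervisor(make_factory())
results = [sup.call(f"req{i}") for i in range(4)]
print(results, sup.restarts)
```

The restart is only safe here because the driver's state lives in its own object and the supervisor keeps a copy; that separation is the isolation the micro-kernel provides.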


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 Post subject: Re: Fault tolerant OS
Posted: Mon Jul 30, 2018 3:20 pm

Joined: Sun Jul 01, 2018 7:23 am
Posts: 4
Do you think there is any hope for fault tolerant monolithic kernels?

I think ideas such as Nooks, where the address space of kernel drivers is isolated, are a good line of research, though they push some burden onto driver programmers.

What are the main principles that present monolithic kernels (e.g. Linux) violate that make them terrible at fault tolerance?


 Post subject: Re: Fault tolerant OS
Posted: Mon Jul 30, 2018 3:36 pm
Member

Joined: Mon Jul 05, 2010 4:15 pm
Posts: 595
The Department of Computer Science at the University of Illinois published a paper, "Building a Self-Healing Operating System":

http://choices.cs.illinois.edu/selfhealing.pdf

It goes through a few techniques for "healing" an error.


 Post subject: Re: Fault tolerant OS
Posted: Mon Jul 30, 2018 9:38 pm
Member

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

bharathm1 wrote:
Do you think there is any hope for fault tolerant monolithic kernels?


That really depends on what kinds of faults you're trying to tolerate. A driver fails to initialise because it can't allocate enough memory? Easy. A single bit flip (if there's no memory encryption)? Maybe. A CPU failing while holding kernel locks? No.

bharathm1 wrote:
I think ideas such as Nooks, where the address space of kernel drivers is isolated, are a good line of research, though they push some burden onto driver programmers.


If drivers are isolated it's either a micro-kernel or a hybrid (and is no longer monolithic); regardless of whether that isolation is implemented with the hardware's virtual memory management or done in software only, and regardless of whether the driver is still in an area that would've been considered "kernel space".

bharathm1 wrote:
What are the main principles that present monolithic kernels (e.g. Linux) violate that make them terrible at fault tolerance?


The main principle that is missing is isolation (which, if it were present, would prevent the kernel from being called a true monolithic kernel).

Linux is a special case - it maps all physical memory into kernel space (so any dodgy pointer anywhere in many millions of lines of code can corrupt anything that's in memory anywhere); so you can get all your hopes for fault tolerance and nail them to all your hopes for security, and glue on a few extra hopes (e.g. for decent NUMA optimisations), and then throw that huge ball of hopes in the trash.
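
The difference isolation makes to a single dodgy pointer can be shown with a toy model. The sketch below assumes a 16-byte "memory" split between a scheduler and a driver; `Region`, `IsolationFault`, the layout constants and the bad offset are all hypothetical, and the bounds check in software merely stands in for whatever mechanism (page tables or software checks) enforces the boundary.

```python
class IsolationFault(Exception):
    """Raised when a component writes outside its own region."""


class Region:
    """A component's window into memory; writes outside it are refused."""

    def __init__(self, memory, base, length):
        self.memory = memory
        self.base = base
        self.length = length

    def write(self, offset, value):
        if not 0 <= offset < self.length:
            raise IsolationFault(f"offset {offset} is outside the region")
        self.memory[self.base + offset] = value


SCHED_BASE, DRIVER_BASE, SIZE = 0, 8, 8
GOOD = b"\x01" * SIZE
bad_offset = -6  # the driver's buggy pointer arithmetic


def fresh_memory():
    mem = bytearray(2 * SIZE)
    mem[SCHED_BASE:SCHED_BASE + SIZE] = GOOD  # scheduler's data
    return mem


# Shared address space: the stray write lands in the scheduler's data
# and nothing notices.
shared = fresh_memory()
shared[DRIVER_BASE + bad_offset] = 0xFF
scheduler_corrupted = shared[SCHED_BASE:SCHED_BASE + SIZE] != GOOD

# Isolated driver: the same stray write is caught at the boundary.
isolated = fresh_memory()
driver = Region(isolated, DRIVER_BASE, SIZE)
try:
    driver.write(bad_offset, 0xFF)
    fault_caught = False
except IsolationFault:
    fault_caught = True
scheduler_intact = isolated[SCHED_BASE:SCHED_BASE + SIZE] == GOOD

print(scheduler_corrupted, fault_caught, scheduler_intact)
```

In the shared case the damage is silent, which is why the rest of the kernel has to be assumed ruined; in the isolated case the fault is both contained and reported, so only the driver needs to be dealt with.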


Cheers,

Brendan


