OSDev.org

The Place to Start for Operating System Developers
It is currently Thu Apr 18, 2024 5:05 pm

All times are UTC - 6 hours




Post new topic Reply to topic  [ 7 posts ] 
Author Message
 Post subject: Verification of machine code
PostPosted: Sat Aug 11, 2007 3:41 pm 
Offline
User avatar

Joined: Wed Jul 11, 2007 7:04 am
Posts: 5
Location: Austria
Hi there!

Well an idea popped up in my head last week. Would it be possible to scan the machine code of all and every binary loaded by the kernel (not assembly code and not some intermediate code, real machine code), to verify that it doesn't use any priviledged instructions, doesn't modify certain registers, doesn't jump to foreign program code, doesn't dereference pointers to other programs, etc...

So in the end it would be possible to guarantee the safety of a given binary. Then it would be possible to run all programs in one address space, in ring0 and it would certainly allow for very fast message passing.

Singularity uses this too, but with an intermediate language...I would like to implement this same behaviour with normal machine code, to allow safety-checked legacy programs to run (surely there will have to be certain runtime checks too, so the binaries need to be modified a bit...).

Hm....well, what do you think about this?

_________________
My blog


Top
 Profile  
 
 Post subject:
PostPosted: Sat Aug 11, 2007 4:24 pm 
Offline
Member
Member

Joined: Mon Apr 09, 2007 12:10 pm
Posts: 775
Location: London, UK
Interesting, but incredibly difficult, which is probably one of the main reasons Singularity attempts to guarantee type safety from an intermediate language rather than machine code. Some of the things you suggest (preventing use of privileged instructions, doesn't modify certain registers - not entirely sure what you mean here but I guess you mean things like cr0 and cr3) are provided for by the protection mechanism of the i386 processor. The rest can basically be summed up as: data accesses by a process can only be made to preset parts of the image labeled data and can also not overflow any particular section within that region that they are writing to, and instruction flow is limited to those parts defined as methods/functions, with strictly defined rules as to how to transition between different parts of code.

This is where the difficulty comes in. Certainly, you can limit data accesses to a .data segment, a .bss segment, a heap and a stack, but what if you want to allow self-modifying code? Also, how do you prevent overflowing an array, as it is difficult to determine, from machine code, what is an array and how long it is. Code is a little easier: don't allow jumps to anything outside a process' .text section. This is fine for jumps of the type 'jmp rel8/16/32' or 'jmp ptr...' but what happens when you do something like 'jmp [edx]'? This is quite common in C++ code using vtables. How do you know beforehand what the value of [edx] is going to be?

One of the main reasons for CIL (what Singularity uses as its intermediate code) was that it provided a code that could be verified and yet converted to machine code quickly, as the difficulties of doing it with machine code were all too apparent.

The other issue you have to consider with the "one address space for all processes" method is that it is unrealistic on 32bit machines (the designers of Singularity admit this) as you could conceivably end up sharing 4gb between all your processes and your kernel. Long mode provides much more space, so it is feasible, but you are immediately ruling out a large proportion of the computer systems out there.

Regards,
John.


Top
 Profile  
 
 Post subject:
PostPosted: Sat Aug 11, 2007 4:55 pm 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 6:06 pm
Posts: 1437
Location: Vancouver, BC, Canada
Being able to verify code depends a lot more on metadata accompanying the code than on the format of the code itself. Bartok (the CIL-to-x86 compiler used by Singularity) verifies CIL, but they plan to have it emit "typed assembly language" in the future that can be verified by an even smaller and simpler verifier, thus removing Bartok from the trusted part of the system. Typed assembly is just machine code along with a proof that the verifier can cross-check against the code itself.

You won't be able to verify a legacy binary unless it comes with some metadata that constitutes a proof of its type safety. This sort of verification really has to be built into the entire system from the beginning. You'll need to use hardware isolation (CPU privilege levels & the MMU) for legacy code (unless your idea of legacy is something like .NET assemblies or Java .class files).

_________________
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!


Top
 Profile  
 
 Post subject:
PostPosted: Sat Aug 11, 2007 5:41 pm 
Offline
User avatar

Joined: Wed Jul 11, 2007 7:04 am
Posts: 5
Location: Austria
Yeah, I know, it would be very difficult...
I've read a few papers about machine code verification and then this idea simply popped up ;)

Jep, I meant registers like cr0 and cr3...and instructions like lidt, ltr, etc.

Quote:
The rest can basically be summed up as: data accesses by a process can only be made to preset parts of the image labeled data and can also not overflow any particular section within that region that they are writing to, and instruction flow is limited to those parts defined as methods/functions, with strictly defined rules as to how to transition between different parts of code.


Yes, that was the basic idea...

Quote:
The other issue you have to consider with the "one address space for all processes" method is that it is unrealistic on 32bit machines (the designers of Singularity admit this) as you could conceivably end up sharing 4gb between all your processes and your kernel. Long mode provides much more space, so it is feasible, but you are immediately ruling out a large proportion of the computer systems out there.


Yes, older systems would not be supported... (in my opinion, if I would allow hardware protection domains, there would be not much point in verifying machine code...)

Quote:
Typed assembly is just machine code along with a proof that the verifier can cross-check against the code itself.


This is proof carrying code, right?

Oh well, thank you very much for you sharing your opinions with me! I'd really love to discuss a bit more, but I'm very tired at the moment and my head's a real mess now ;)

cheers

_________________
My blog


Top
 Profile  
 
 Post subject:
PostPosted: Mon Aug 13, 2007 8:13 pm 
Offline
Member
Member

Joined: Thu Oct 21, 2004 11:00 pm
Posts: 248
There was an operating system called Go! that did that.


Top
 Profile  
 
 Post subject:
PostPosted: Tue Aug 14, 2007 2:00 am 
Offline
Member
Member
User avatar

Joined: Sun Oct 22, 2006 7:01 am
Posts: 2646
Location: Devon, UK
Hi,

A quick look on sourceforge revealed that Go! Was superceded by http://sourceforge.net/projects/odin-os/. Looks like that has been taken down now, though.

Cheers,
Adam


Top
 Profile  
 
 Post subject:
PostPosted: Wed Aug 15, 2007 6:03 am 
Offline
User avatar

Joined: Wed Jul 11, 2007 7:04 am
Posts: 5
Location: Austria
Yeah...I know, but the source is still in cvs...maybe, I'll look into this to get some clue, if this thingy is feasible...

_________________
My blog


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 7 posts ] 

All times are UTC - 6 hours


Who is online

Users browsing this forum: No registered users and 52 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group