Hi,
simeonz wrote:
Brendan wrote:
If this was possible; physical address space randomisation would mean it'd take a huge amount time (e.g. billions of tests) for the attacker to become "relatively confident" that a virtual page it can access is in the same row as a sensitive piece of data; and re-randomising physical pages periodically while the OS is running (if done often enough) would prevent the attacker from becoming "relatively confident".
All right. Let's say some comparatively innocuous API "A" accesses statically allocated kernel data "X". And a system event "B", kept undisclosed for security reasons, accesses statically allocated kernel structure "Y". From reverse-engineering, we establish that the compiler has placed "X" and "Y" in a single page. The software finds a row that it owns that triggers a collision with "X", by calling API "A" and then probing repetitively for latency. By accessing that same location in the future, depending on other factors, such as the rest of the content in that kernel page and the frequency of access to it, it may acquire a reasonably informative test for "B" occurring. Similar reasoning could be applied to the file cache test, albeit with different assumptions.
I think you're confusing "statically allocated virtual space" with "statically allocated physical pages". If an OS uses statically allocated physical pages and maps them at random virtual addresses (e.g. KASR without PASR) then this would be a potential problem. If an OS uses randomly allocated physical pages and maps them at statically allocated virtual addresses (PASR without KASR) then this isn't a potential problem.
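As a very rough sketch of what I mean by "PASR without KASR" (every function name here is invented purely for the example, not anything real):

Code:
#include <stdint.h>

/* All of these are assumed/hypothetical kernel primitives, just for the example. */
extern uint64_t alloc_phys_frame_at_random(void);   /* returns a randomly chosen free frame */
extern uint64_t get_phys(uint64_t virt);            /* current frame behind a virtual page */
extern void     copy_page(uint64_t dst_phys, uint64_t src_phys);
extern void     map_page(uint64_t virt, uint64_t phys);
extern void     free_phys_frame(uint64_t phys);

/* Re-randomise the frame behind one kernel page. The virtual address that code
   (e.g. "API A") uses never changes; only the physical frame does, so a row
   collision the attacker found earlier stops being meaningful.
   (Locking, TLB invalidation, etc. are omitted.) */
void rerandomise_kernel_page(uint64_t kernel_virt)
{
    uint64_t old_phys = get_phys(kernel_virt);
    uint64_t new_phys = alloc_phys_frame_at_random();

    copy_page(new_phys, old_phys);
    map_page(kernel_virt, new_phys);
    free_phys_frame(old_phys);
}

Nothing about the kernel's virtual layout changes (so reverse-engineering where "X" and "Y" live in virtual space doesn't help); it's the physical frame behind them that keeps moving.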
simeonz wrote:
Brendan wrote:
These things don't work. For example, if you split everything into isolated single-threaded processes that are only able to communicate via. messaging (a pure actor model where no memory is shared at all) and then use managed languages for all user-space code; then one process can just send messages to another as fast as it can and the receiver can measure how much time passed by counting the number of messages it received.
Mostly; there's a benefit in disallowing access to extremely precise time measurements (e.g. the TSC's "better than 1 nanosecond precision" timing) because it's easy for the kernel to make it harder for an attacker (without making it hard for legitimate code); but making things harder shouldn't be confused with prevention - making sure there are no timing differences an attacker can measure should be the primary goal.
I have not seriously pondered on the details involved, but let me draw a picture. Let's say that the program is a graph of events embedded within a larger graph, that of the system execution. Our aim is to order the events in the program graph in linear time, such that they are statistically independent of the order of the rest of the graph, except where the control flow dependencies through APIs, i.e. "directed edges", are explicitly required.
What this means is that any API call made, which may carry an unintended control flow side-effect, requires us to randomly reorder some of the program's own execution events.
For some background...
A long time ago I decided (to avoid the cost of "CPL=3 -> CPL=0 -> CPL=3" switches and improve performance) that it'd be a good idea to support "batch system calls", where a thread creates a list of kernel functions (and their parameters) and asks the kernel to do the list of things. That way, if a thread wanted the kernel to do 10 things, it could pay the cost of "CPL=3 -> CPL=0 -> CPL=3" switching once and not pay that cost 10 times.
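As a rough sketch of the idea (the structures, function names and function numbers here are invented for the example, not my actual kernel's API):

Code:
#include <stdint.h>

#define MAX_BATCH_ENTRIES 16

typedef struct {
    uint32_t function_number;   /* which kernel function to call */
    uint64_t param[4];          /* its parameters */
    int64_t  result;            /* filled in by the kernel */
} batch_entry_t;

typedef struct {
    uint32_t      count;
    batch_entry_t entry[MAX_BATCH_ENTRIES];
} batch_list_t;

/* Assumed single kernel entry point that runs the whole list. */
extern int64_t do_batch_system_calls(batch_list_t *list);

int64_t example(void)
{
    batch_list_t list;

    list.count = 2;
    list.entry[0].function_number = 12;                      /* e.g. "open file" (made-up number) */
    list.entry[0].param[0] = (uint64_t)(uintptr_t)"foo.txt";
    list.entry[1].function_number = 13;                      /* e.g. "read from file" (made-up number) */
    list.entry[1].param[0] = 0;                              /* a real design would use entry[0]'s result */

    /* One "CPL=3 -> CPL=0 -> CPL=3" switch for both operations instead of two. */
    return do_batch_system_calls(&list);
}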
Note: this is also why I'm seriously considering "ultra-nuke" (which would add a huge amount of overhead to the "CPL=0 -> CPL=3" costs) - it'd be too expensive without "batch system calls".

About 5 years ago I decided to try "kernel stack per CPU" (my older kernels were all "kernel stack per thread"). The consequence of "kernel stack per CPU" is that expensive kernel operations (e.g. process creation) can't be interrupted because there's no other stack for that CPU to use. For example, if an IRQ occurs that unblocks a high priority task, then the kernel can't switch to that high priority task and return to user-space until after it has completed the expensive operation. To fix that you end up wanting to break expensive operations into smaller pieces so that you can have "safe interruption points" between the pieces. For example, instead of having a "create process" operation, you might split it into a "create empty virtual address space" piece, then several "map next part of executable into virtual address space" pieces, then a "create initial thread" piece, etc; so that you can put those "safe interruption points" in between all the pieces (and so that you can allow a switch to a higher priority task to happen between any of those pieces and avoid waiting for ages for the whole lengthy operation to complete).

Of course I've always liked "do important stuff first" so naturally I wanted to give all these pieces priorities, so I could postpone unimportant work when there's more important work to do (e.g. postpone "create virtual address space intended for a new low priority process" so that I could do "give high priority user-space thread CPU time" sooner). Then it occurred to me that this can also have scalability advantages for multi-CPU.
Essentially, a thread would tell the kernel "do this list of things" and the kernel would split more expensive things into smaller pieces and give all the smaller pieces a priority (that is derived directly from the priority of the thread that asked the kernel to do the work); and all of the CPUs would do "highest priority piece first" in parallel (whenever running a user-space thread isn't more important); so that something like creating a process might use 5 CPUs at the same time (and finish faster). For dependencies; there's no problem - if "small piece B" can't be started until "small piece A" has finished, then initially only "small piece A" would be put onto the kernel's work queue, and when "small piece A" finished the kernel would put "small piece B" onto the kernel's work queue. Of course with many user-space threads running on multiple CPUs (plus work the kernel does on its own) the kernel's work queues end up being a mixed bag of many small unrelated pieces.
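As a rough sketch of how the pieces and the work queue fit together (again, hypothetical names and structures, not my actual kernel):

Code:
#include <stddef.h>

typedef struct piece {
    void (*do_piece)(struct piece *self);   /* one small unit of kernel work */
    int            priority;                /* derived from the requesting thread's priority */
    struct piece  *next_piece;              /* only queued after this piece has finished */
    struct piece  *queue_next;              /* work queue linkage */
} piece_t;

/* Assumed work queue operations (shared by all CPUs). */
extern void     queue_insert_by_priority(piece_t *p);
extern piece_t *queue_pop_highest_priority(void);

/* Run by any CPU whenever running a user-space thread isn't more important. */
void do_some_kernel_work(void)
{
    piece_t *p = queue_pop_highest_priority();

    if (p == NULL) {
        return;                                    /* nothing to do */
    }
    p->do_piece(p);                                /* e.g. "map next part of executable" */
    if (p->next_piece != NULL) {
        queue_insert_by_priority(p->next_piece);   /* dependency satisfied, "piece B" becomes runnable */
    }
    /* Returning here is the "safe interruption point" - the CPU can switch to a
       higher priority user-space thread before taking the next piece. */
}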
With all of this in mind; what if I added a random "boost or nerf" adjustment to the priority of the smaller pieces (instead of the priority of the smaller pieces being derived directly from the priority of the thread that asked the kernel to do the work)? With a very minor change, the kernel would execute the "mixed bag of many small unrelated pieces" in a semi-randomised order.
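Continuing that sketch, the "very minor change" could look something like this (rand_small() is a made-up small-range PRNG, and piece_t is the same "small piece of kernel work" idea as above, trimmed down):

Code:
typedef struct { int priority; /* ... */ } piece_t;

extern int  rand_small(void);                        /* assumed small-range PRNG */
extern void queue_insert_by_priority(piece_t *p);    /* assumed work queue from the previous sketch */

void queue_piece_for_thread(piece_t *p, int thread_priority)
{
    int jitter = (rand_small() % 7) - 3;    /* random boost or nerf of up to 3 priority levels */

    p->priority = thread_priority + jitter;
    queue_insert_by_priority(p);            /* pieces end up being executed in a semi-random order */
}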
Note that I'm currently only really concerned with protecting the kernel from user-space, and currently only want to make the kernel unpredictable.
simeonz wrote:
The reason I think a JIT compiler could do that more efficiently is because it can rearrange the very code and introduce dynamic branches where necessary, or can package multiple randomization events together at compile-time into one. A JIT compiler could, in principle, use optimized user-space micro-scheduling of sorts. It could even reorder loops, in some situations. In any case, it will be cheaper than a typical system scheduler call. Similarly, communication through primitives can be done through shared memory, controlled by JIT. Synchronization intercepted at compile-time will be reordered dynamically in the code when needed, but I imagine could offer smaller throughput reduction than a similar binary interface. So, purely hypothetically, a JIT infrastructure could introduce a very "weak" execution model, and then run it on hardware with stronger execution guarantees, to enforce information barrier on the program. The program could then never reliably extract side-channel data.
Last year I was experimenting with graphics; and wanted to convert a buffer of pixel data that was in a standard format (96-bits per pixel in the CIE XYZ colour space, with no gamma) into whatever the video mode and monitor happens to want (e.g. maybe 24-bit per pixel in one of the RGB colour spaces, with gamma), including doing dithering. Normal optimised assembly wasn't really fast enough (there were too many branches and too many memory accesses for local variables) so my solution was dynamic code generation. Specifically, a collection of pre-optimised snippets that would be stitched together and then patched, so that half the branches disappeared and half of the local variables effectively became constants.
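To give a very rough idea of what "stitched together and then patched" means (this isn't the actual blitter code, just a simplified example with made-up structures):

Code:
#include <stdint.h>
#include <string.h>

/* A pre-optimised snippet of machine code, produced by an assembler ahead of
   time, with a known offset where a 32-bit placeholder constant lives. */
typedef struct {
    const uint8_t *bytes;
    uint32_t       length;
    uint32_t       patch_offset;
} snippet_t;

extern uint8_t *exec_buffer;   /* assumed: memory that is writable now and executable later */

/* Copy a snippet into the buffer and patch its placeholder, so that what would
   have been a local variable (or a branch depending on one) becomes a constant.
   Returns the next free position in the buffer. */
static uint32_t stitch(uint32_t pos, const snippet_t *s, uint32_t constant)
{
    memcpy(&exec_buffer[pos], s->bytes, s->length);
    memcpy(&exec_buffer[pos + s->patch_offset], &constant, sizeof(constant));

    return pos + s->length;
}

The selection of which snippets to stitch (and which constants to patch in) is done once per video mode change, not per frame, so the code generation cost is amortised.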
Of course this was faster than normal assembly, which was faster than "ahead of time compiled" code, which is faster than "JIT compiled" code (where you can't afford to do expensive optimisations at run time). For a crude estimate; I'd say JIT would've been at least 500% slower.
Of course this is a special case and JIT is fine for a lot of things; but there's lots of different special cases and "everything must be JIT" is completely unacceptable because of those special cases.
simeonz wrote:
The problem with AI is that it cannot do DEP conceptually. The entire data is "intelligent" and mutable, because changes are made to it continuously. Therefore, it will be impossible to freeze the logic into an immutable state and sign it, because it has a floating nature. If it gets exploited, we cannot discriminate a good AI transition from a malicious AI transition. One of the main counter-exploitation mitigations, authentication and immutability of the controlling logic, is lost. Again, this is not a problem for the system integrity, but for the affected user.
Let's assume that you have a human who creates plain text files (containing C source code), and you also have an AI that creates plain text files (containing C source code). In this case (after compiling, signing, packaging, ...) both the human's code and the AI's code can use DEP; so what is the difference?
Now let's assume that a human and an AI both create plain text files (containing C source code); where (after compiling, signing, packaging, ...) the resulting code is self modifying (e.g. like the "graphics blitter from stitched together snippets" I mentioned above). In this case, neither the human's code nor the AI's code can use DEP; so what is the difference?
You're mostly saying that self modifying code is less secure (which is arguably correct); and then assuming that AI always means "intelligently self modifying". From my perspective; "intelligently self modifying" is the least plausible scenario. For example; the idea of using an extremely expensive and extremely unintelligent "try all the things until something works" approach every single time anyone decides to (e.g.) open a text editor (instead of saving the generated code once that expensive/unintelligent code generation has been done, and ending up with "AI generated code that is not self modifying at all") is far-fetched at best.
simeonz wrote:
Brendan wrote:
If the OS failed to prevent differences in timing, and if the publisher of "malware.exe" hasn't been blacklisted world-wide already, and if the admin was tricked into explicitly allowing "malware.exe" to access the network; then the attacker probably deserves a reward for bypassing 3 entirely different protections.
If the goal is to protect the core services and to maintain the process, user and session isolation, there is indeed no difference between malware and exploitable software. This is one layer of security, and the system can also assist the applications in protecting themselves, assist the users in protecting themselves, or even protect the applications from the users, or the users from the applications, and even protect itself from "itself". I agree that improperly authorized malware cannot be stopped and the system has no responsibility to stop it. But my argument was that with malicious software, the user experiences losses by default. In contrast, if he interacts with exploitable software and that software is attacked, it is only by virtue of special security services available to the application (assuming the developer has consciously used them) that consequent damage will be prevented (most likely with a crash). This exploitable software's identity will not be impersonated by malicious code, even if it suffers from downtime and unavailability. So, from this point of view, software exploitation is a separate concern for security, which aims to protect the user, not the system.
The goal is prevention.
Most of the reason vulnerabilities exist is that there's very little incentive to prevent them from existing. A vulnerability will be discovered, most users won't know anything about the vulnerability and will just groan about being coerced into updating their software without knowing why, and the relatively small number of users who do find out about it hear about vulnerabilities so regularly that they've become numb (it's just "Yawn, another vulnerability this week, everything is still as bad as it's always been"). For larger companies the incentive to prevent vulnerabilities adds up to a small loss of sales that lasts for a few weeks or until their competitor (if there is one) makes a similar blunder; which works out to a temporary loss of profit that's significantly less than the typical weekly advertising budget.
So, given that saying "don't worry, you'll do better next time" while patting them on the back in sympathy has failed to make any difference for decades; how can we encourage people to try harder to avoid releasing exploitable software? Threatening to destroy their business (by blacklisting all their products and guaranteeing zero sales forever) is one way to encourage prevention (but perhaps that's a little extreme?).
Cheers,
Brendan