linguofreak wrote:
As for system messages coming up while you're working in a terminal, Linux will print all sorts of errors to the system console regardless of what you may be doing on that virtual terminal. My box is currently spewing errors about a failed CD drive that I haven't had the time to open up the machine to disconnect. An OOM situation is next thing to a kernel-panic / bluescreen, both of which will happily intrude while you're doing other things, so I don't see any big problem with the OOM killer doing the same thing.
I'm not sure if it's changed, but there used to be quite a few things that affected what gets printed where: whether it's a physical console or an SSH shell, whether you're root or a normal user, etc.
Being informed of issues is one thing, but spewing errors on the console is really awkward. I've actually had to use such systems, and it's very difficult to fix anything when you're getting 100+ messages per second on a physical console, since you can barely do anything to get a look at what's happening. There are commands that stop the kernel from spewing to the console, but I've rarely needed them and IIRC they vary from system to system (Linux vs FreeBSD vs others), so I've probably never actually remembered how to do it when it mattered.
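For what it's worth, on Linux the thing I can never remember is that it's the kernel's console log level; here's a minimal sketch of turning it down from a program (Linux-specific, equivalent to "dmesg -n 1", and it needs root):

    /* Minimal sketch: quiet the physical console on Linux by lowering the
     * kernel's console log level to 1 (emergencies only). Equivalent to
     * "dmesg -n 1"; requires root / CAP_SYSLOG. */
    #include <stdio.h>
    #include <sys/klog.h>

    int main(void)
    {
        /* type 8 == SYSLOG_ACTION_CONSOLE_LEVEL; the third argument is the new level */
        if (klogctl(8, NULL, 1) < 0) {
            perror("klogctl");
            return 1;
        }
        puts("console log level set to 1 (emergencies only)");
        return 0;
    }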
Finally, and this applies to the Windows GUI as well, I really hate it when a pop-up appears asking "Do you want the world to explode?" while I'm writing a document, an email, or just an address in Firefox's address bar, and I happen to press space/enter/"y" at exactly the wrong moment. There's just no good way to throw a popup asking something like that out of the blue.
linguofreak wrote:
The OS doesn't need to totally grind to a halt. It will have to stall on outstanding allocations until the user makes a decision or a process ends on its own or otherwise returns memory to the OS, but processes that aren't actively allocating (or are allocating from free chunks in their own heaps rather than going to the OS for memory) can continue running. Now, if a mission-critical process ends up blocking on an allocation, yes, you have a problem, and a user OOM killer might not be appropriate for situations where this is likely to cause more trouble than an OOM situation generally does in the first place.
The OS doesn't, but every app that requests memory does, right? And sooner or later that's going to be every one of them. Obviously there are things such as SNMP that ease monitoring servers, but the point is that the server has gotten itself into a mess, and stopping to wait for a user in a data center with 10k physical servers and 100k+ virtual servers isn't really feasible. Of course a configurable OOM killer could easily be the solution here (automatic on servers, interactive on desktops), but if the manual version isn't absolutely needed, I might prefer consistency.
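Just to illustrate what I mean by configurable (every name here is invented; nothing like this exists as such): the policy would simply pick between the automatic killer and the interactive one.

    /* Hypothetical sketch of a configurable OOM policy: unattended servers
     * get the automatic killer, desktops get an interactive prompt.
     * All names are invented for illustration. */
    #include <stdio.h>

    typedef enum { OOM_POLICY_AUTO_KILL, OOM_POLICY_ASK_USER } oom_policy_t;

    /* in a real system this would come from kernel configuration */
    static oom_policy_t oom_policy = OOM_POLICY_AUTO_KILL;

    static void kill_best_victim(void)    { puts("auto: kill the best-scoring victim"); }
    static void ask_user_for_victim(void) { puts("interactive: stall allocators, ask the user"); }

    static void on_out_of_memory(void)
    {
        if (oom_policy == OOM_POLICY_AUTO_KILL)
            kill_best_victim();       /* data center: no human in the loop */
        else
            ask_user_for_victim();    /* desktop: let the user choose */
    }

    int main(void) { on_out_of_memory(); return 0; }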
linguofreak wrote:
For a server, your OOM killer could actually make use of a reverse-SSH protocol where the OOM killer makes an *outbound* ssh-like connection, using pre-allocated memory, to a machine running server management software, which could then send alerts to admin's cell phones, take an inbound SSH connection from an administrator's workstation (or phone), and pass input from the administrator to the OOMed server and output from the server back to the administrator.
True, but couldn't sshd then just pre-allocate memory and keep accepting inbound connections? If a connection isn't from one of the "root" users, kick it back out, and maybe enforce stricter time-outs, etc. My point was more that while you can do this, you may need to do it for every tool, and that quickly becomes impractical.
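Roughly, the pre-allocation I mean would look something like this at daemon startup (a sketch using plain POSIX calls; the pool size is an arbitrary example): grab the memory, touch every page so it's really backed, and lock it so it can't be reclaimed.

    /* Sketch: reserve an emergency memory pool at daemon startup so it's
     * still there when the system hits OOM. The 4 MiB figure is arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define EMERGENCY_POOL_SIZE (4 * 1024 * 1024)

    static void *emergency_pool;

    static int reserve_emergency_memory(void)
    {
        emergency_pool = malloc(EMERGENCY_POOL_SIZE);
        if (!emergency_pool)
            return -1;
        memset(emergency_pool, 0, EMERGENCY_POOL_SIZE);  /* fault every page in */
        if (mlock(emergency_pool, EMERGENCY_POOL_SIZE) < 0) {
            perror("mlock");  /* may need CAP_IPC_LOCK or a raised RLIMIT_MEMLOCK */
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        if (reserve_emergency_memory() < 0)
            return 1;
        /* ... the daemon's normal startup and main loop would follow ... */
        return 0;
    }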
Figuring out that SSH needs pre-allocation is one thing, but what about all the other tools you'll need to resolve the situation? I've seen situations multiple times where I couldn't even use "ls" or similarly simple commands; IIRC "ls" wasn't working because there weren't enough free file descriptors left to open and execute the binary itself. Some commands did still work, and I never investigated why, but my guess is that they either needed one or two fewer descriptors, or the shell handled them slightly differently with regard to FDs. Trying to pre-allocate for each and every tool you might need just isn't feasible.
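The same trick is possible per tool for descriptors, which is exactly why it doesn't scale: every single program would need something like this sketch (reserve a few spares at startup and give one back when open() runs out).

    /* Sketch: reserve spare file descriptors at startup and release one when
     * a later open() fails for lack of descriptors. Doable for one program,
     * not for every tool you might need during an incident. */
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define SPARE_FDS 4
    static int spare_fd[SPARE_FDS];

    void reserve_fds(void)
    {
        for (int i = 0; i < SPARE_FDS; i++)
            spare_fd[i] = open("/dev/null", O_RDONLY);
    }

    int open_with_reserve(const char *path, int flags)
    {
        int fd = open(path, flags);
        if (fd < 0 && (errno == EMFILE || errno == ENFILE)) {
            for (int i = 0; i < SPARE_FDS; i++) {
                if (spare_fd[i] >= 0) {
                    close(spare_fd[i]);   /* give one spare back */
                    spare_fd[i] = -1;
                    break;
                }
            }
            fd = open(path, flags);       /* retry with the freed descriptor */
        }
        return fd;
    }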
linguofreak wrote:
Not every process has bounded memory requirements. Not every process has bounded runtime. Daemons are generally supposed to keep running indefinitely.
Are you sure? I haven't really thought through every possible scenario, but is there some reason this has to be the case? There might be some really special case, like computing "The Answer to the Ultimate Question of Life, the Universe, and Everything", but for everything in the normal world, are there cases that aren't bounded by memory and runtime?
A process might have a base memory requirement and then a dynamic requirement based on the file (or files) it operates on. With files there might be no maximum, or it might max out at some predefined limit. As for time, I think in most cases it should be roughly knowable, and my suggestion doesn't even need that: I mentioned the running process flagging its progress in case it has dynamically adjusting memory requirements, so the OS knows where on the memory curve we are now and what to expect going forward. Just in case someone is considering it, there's no need to link the Halting Problem here.
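To make the flagging idea concrete (the interfaces below are invented for illustration only; in a real system they would be syscalls): the process declares its expected curve up front and then reports where on it it currently is.

    /* Hypothetical sketch of a process flagging its memory "curve" to the OS.
     * declare_mem_plan() / report_mem_phase() are invented names; nothing
     * like them exists. The stubs just make the sketch compile. */
    #include <stddef.h>

    static int declare_mem_plan(size_t base_bytes, size_t expected_peak_bytes)
    { (void)base_bytes; (void)expected_peak_bytes; return 0; }

    static int report_mem_phase(size_t expected_additional_bytes)
    { (void)expected_additional_bytes; return 0; }

    int process_file(size_t file_size)
    {
        /* base requirement plus a dynamic part that depends on the input file */
        declare_mem_plan(8 * 1024 * 1024, 8 * 1024 * 1024 + file_size);

        report_mem_phase(file_size);   /* parsing: roughly the whole file in memory */
        /* ... parse ... */

        report_mem_phase(0);           /* output: nothing significant left to allocate */
        /* ... write results ... */
        return 0;
    }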
As for daemons, if you consider each request as an independent request, then they should fit in just fine: have the daemon process "reset" itself after each request, instead of the OS "resetting" it by spawning a new one once the old one has exit()'d.
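A minimal sketch of what I mean by the daemon resetting itself (inetd-style fork-per-request; error handling and the actual protocol are left out): the child handles one request and exits, so everything it allocated goes back to the OS, while the long-lived parent stays at its base footprint.

    /* Sketch: per-request "reset" by forking a short-lived child per request. */
    #include <signal.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void handle_request(int conn)
    {
        /* read the request, allocate whatever it needs, write the reply */
        close(conn);
    }

    void serve_forever(int listen_fd)
    {
        signal(SIGCHLD, SIG_IGN);          /* let the kernel reap the children */
        for (;;) {
            int conn = accept(listen_fd, NULL, NULL);
            if (conn < 0)
                continue;
            if (fork() == 0) {             /* child: lives for exactly one request */
                handle_request(conn);
                _exit(0);                  /* all of its memory returns to the OS */
            }
            close(conn);                   /* parent keeps only its base footprint */
        }
    }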