(sigh) I really dislike this sort of user interface. Apologies if I botch the attributions... it is simply too tedious to interact this way!
Quote:
Quote:
Have you considered what is involved to handle each of the "options" I described?
Yes. For example, the "FIRST_FIT, LAST_FIT, BEST_FIT, WORST_FIT, EXACT_FIT, NEVER_FIT" menagerie implies that (internally) the memory manager is going to have to use an extremely stupid and slow "list of entries" approach to handle all the unnecessary options; and this will involve searching the list and can't/won't be "O(1)" for any/many of the important cases, and will therefore suck (and will probably suck so badly that it's completely unusable for "real-time" where you're meant to guarantee a worst case max. time for all operations and can't accept "searching an unknowable number of entries").
Actually, it is MORE efficient (in time and space) exactly because the (efficient) developer KNOWS how the heap will be used and will take measures to ensure it remains in a state conducive to his future requests.
Consider how a buffer pool works: essentially, all you have is a list (regardless of whether it is implemented as such, as a bitmap, etc.) that allows you to locate AN available buffer -- in almost constant time (depending on how you represent "free").
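For concreteness, such a pool amounts to nothing more than this (a minimal, single-threaded C sketch; the names are my own, for illustration only):
Code:
/* Minimal fixed-size buffer pool: a singly linked free list threaded
 * through the (currently unused) buffers themselves. */
#include <stddef.h>

#define BUFFER_SIZE 512                   /* illustrative */
#define NUM_BUFFERS 100

union buf {
    union buf     *next;                  /* link while the buffer is free */
    unsigned char  data[BUFFER_SIZE];     /* payload while it is in use    */
};

static union buf  storage[NUM_BUFFERS];
static union buf *free_head;

void pool_init(void)
{
    free_head = NULL;
    for (size_t i = 0; i < NUM_BUFFERS; i++) {
        storage[i].next = free_head;      /* push every buffer onto the list */
        free_head = &storage[i];
    }
}

void *pool_get(void)                      /* O(1): pop the head */
{
    union buf *b = free_head;
    if (b != NULL)
        free_head = b->next;
    return b;
}

void pool_put(void *p)                    /* O(1): push back onto the head */
{
    union buf *b = p;
    b->next = free_head;
    free_head = b;
}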
How is that any different from:
Code:
(pointer, size) := allocate(HEAP, BUFFER_SIZE, EXACT_FIT, FIRST_FIT, TRIM, TAKE_ANY, AS_REQUESTED);
?
The creation of the pool happens "for free" -- each new buffer requested chops off a BUFFER_SIZE'd piece of the heap and delivers it to the caller. When the caller is done, he can:
Code:
result := release(HEAP, pointer, AT_HEAD, AS_ALLOCATED)
to return the buffer to the heap WHILE KEEPING IT AS A DISCRETE FRAGMENT within the heap. The next such request would (coincidentally) find that the first fragment happens to be EXACTLY what the caller is looking for!
If requests come in faster than releases, then each request incurs some incremental FIXED cost to slice the remaining large fragment in the free list (that represents all of the memory that has NEVER been allocated) to form a new BUFFER_SIZE fragment for use by the user. Eventually, it, too, will be released -- and retain its size in the heap.
You can front-load the cost of dicing up the heap by using:
Code:
result := release(HEAP, pointer, AT_TAIL, AS_ALLOCATED)
to ensure the next request encounters the "big fragment of unused memory" before it encounters any previously released BUFFER_SIZE pieces, thereby causing them to be sliced and diced early on.
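(In free-list terms, the way I'd characterize AT_HEAD vs. AT_TAIL is simply push-at-the-front vs. append-at-the-back -- a rough C sketch of that distinction, not the manager's actual internals:)
Code:
/* AT_HEAD vs. AT_TAIL read as free-list insertion policy; "frag" and
 * friends are my own names, for illustration only. */
#include <stddef.h>

struct frag { struct frag *next; size_t size; };

static struct frag *free_list;      /* first fragment seen by FIRST_FIT */

/* AT_HEAD: the released fragment is the very next thing a request sees. */
static void release_at_head(struct frag *f)
{
    f->next = free_list;
    free_list = f;
}

/* AT_TAIL: the released fragment goes behind everything else, so requests
 * keep hitting the big "never allocated" fragment (at the front) first.
 * A real implementation keeps a tail pointer to make this O(1); the walk
 * here is only to keep the sketch short. */
static void release_at_tail(struct frag *f)
{
    struct frag **pp = &free_list;
    while (*pp != NULL)
        pp = &(*pp)->next;
    f->next = NULL;
    *pp = f;
}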
If you want to remove ALL of the incremental costs of chopping up the heap into these BUFFERS, then just iterate through the process before you elect to actually use any of them.
Quote:
Don wrote:
Those that are more complicated (blocking in the allocator, opting for a lock on the heap itself -- or not) are required even if the simpler options are omitted/hardwired. They can't be efficiently implemented outside the memory manager.
The more "complicated" options are not required at all for most applications. If they can't be implemented efficiently inside an "application provided memory manager" (e.g. because applications can't use locks/mutexes/semaphores, or because your OS doesn't provide an efficient communication mechanism) then the OS will suck for a large number of things (that don't necessarily have anything to do with memory management).
What do you see as the role of an operating system? Shall we have the application developer write his own memory management routines, his own scheduler, his own file system, etc.?
In MY applications, the complicated options ARE required. Perhaps you missed:
Quote:
I'm working in an embedded/appliance market on custom hardware. So, no constraints as to API's, porting legacy code, etc. The environment is real-time, multithreaded and object-based. The pieces of memory being managed are large-ish; the amount of memory available is significant.
Would you advocate each application sorting out how to handle its own deadline violations? Perhaps dedicate a hardware timer to each thread so IT could program the timer to notify it when its deadline has passed? Or, would you embody that as an OS service?
Would you advocate each thread and task/process communicate with its peers to come up with a mutually agreeable scheduling decision, NOW? Or, would you embody that as an OS service?
Would you insist that all memory consumers use a memory manager that imposes the overhead of a mutex on every access -- even if the memory manager is just being used by a single thread -- AT THIS TIME? Or, let each thread craft its own memory services??
Quote:
Don wrote:
Consider how you would allow a thread to wait for a particular allocation request to be handled -- in the absence of sufficient resources in the free list -- spin on the failing malloc()? And, hope no other thread sneaks in to grab any memory that becomes available while the other thread is busy spinning?
If, for some completely unimaginable reason I actually wanted to wait for memory (on an embedded system where my application is likely the only application, on hardware that was supposed to have a "significant" amount of memory available, on an OS that is supposed to be real-time and therefore supposed to guarantee maximum worst case times), I would:
- implement a memory manager that has a list of waiters (including how much memory they're waiting for and their thread ID)
- when freeing memory, examine the list of waiters and, if one or more of the waiters can be satisfied, allocate the memory for them and send a "here's the memory you were waiting for" message to the waiter
- have waiters blocked waiting to receive a message (including "wait for message with timeout" and a way to remove a thread from the list of waiters, if and only if the specific application actually does need the extra complexity this involves)
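(For reference, as I read it, the waiter-list scheme described above boils down to something like the following -- a rough C sketch assuming a blocking message-receive primitive; the allocation and messaging names are illustrative, not anyone's actual API, and locking of the waiter list itself is omitted:)
Code:
#include <stddef.h>

extern void *try_allocate(size_t want);              /* hypothetical */
extern void  msg_send(int tid, void *payload);       /* hypothetical */
extern void *msg_receive_timeout(int timeout_ms);    /* hypothetical */
extern int   current_tid(void);                      /* hypothetical */

struct waiter {
    struct waiter *next;
    int            tid;      /* thread waiting for memory   */
    size_t         want;     /* how much it is waiting for  */
};

static struct waiter *waiters;

/* Called when memory is freed: see whether any waiter can now be satisfied. */
void check_waiters(void)
{
    struct waiter **pp = &waiters;

    while (*pp != NULL) {
        struct waiter *w = *pp;
        void *mem = try_allocate(w->want);
        if (mem != NULL) {
            *pp = w->next;                 /* unlink the satisfied waiter    */
            msg_send(w->tid, mem);         /* "here's the memory you wanted" */
        } else {
            pp = &w->next;
        }
    }
}

/* Caller side: register as a waiter, then block (with timeout) until the
 * memory arrives; on timeout, remove ourselves from the waiter list. */
void *wait_for_memory(size_t want, int timeout_ms)
{
    struct waiter self = { .next = waiters, .tid = current_tid(), .want = want };
    waiters = &self;                       /* record lives on our stack while
                                              we are blocked                  */
    void *mem = msg_receive_timeout(timeout_ms);   /* NULL on timeout */

    if (mem == NULL) {                     /* timed out: unlink ourselves */
        for (struct waiter **pp = &waiters; *pp != NULL; pp = &(*pp)->next) {
            if (*pp == &self) { *pp = self.next; break; }
        }
    }
    return mem;
}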
Your comments suggest you are thinking too small. I'm not designing a microwave oven. Or, a process control system that monitors a few dozen sensors and controls as many actuators. Possibly on a multimillion dollar piece of equipment.
Think scores of processors distributed over large areas (three-dimensional football fields). Tackling things like VoIP/video/audio delivery, location awareness, speech synthesis/recognition, security/access control, physical plant, etc. Think hundreds of applications and as many concurrent users.
And, unlike a desktop where the user can reboot, kill a job, hit reset, etc., I have to run, reliably, 24/7/365. If Joe User in cubicle #743 wants to "play pong", I have to ensure that:
- I still have the resources (time, memory, I/O's, available power) to satisfy the GUARANTEED services that The System must provide to its users
- Joe User can't munge anything that he's not allowed to munge
- I can harvest Joe User's resources if other, "more pressing" demands materialize.
Quote:
Don wrote:
I can, instead, wrap the "swiss army knife" implementation with "simple interfaces" for those cases that always need a specific set of options. A particular combination of options that are never used can simply not have a suitable wrapper provided.
For a silly/simple example, let's say that (as part of my application) I want to log all packets sent and received over TCP/IP. I have a structure containing the details I want (dest IP address, etc.) which includes a "next" field (because I'm planning to use a singly linked list of these structures) and this structure is exactly 123 bytes. For cache/TLB/data locality (performance) I want to ensure that the memory used for these structures is within the same range, so I allocate a memory pool specifically for these structures. I know (from CPU manufacturer info) that it's more efficient to align the structures on a 64-byte boundary (as this happens to be cache line size) so I'm looking at managing blocks of 128 bytes. When there are more than 5000 of these entries I want to append the oldest 2500 entries to a file on disk and free the memory the blocks used. I allocate a pool that's only large enough to allow 10000 entries to be allocated, and if that pool runs out of memory then I know that something must have gone wrong with disk IO so I terminate the application.
To get "O(1)" for everything that matters for this application (without any stupid "if(FIRST_FIT) { ....} else if(LAST_FIT) { ....}" branching/bloat); I create a pool (that is exactly "10000 * 128" bytes because that's all I need) and turn it into a singly linked list of free blocks (using the structure's "next" field that I already have). My application's "lockless" allocator (assuming 32-bit 80x86) looks like this:
Code:
count := 0
while (count < 10000) {
    (pointer, size) := allocate(HEAP, 128, EXACT_FIT, FIRST_FIT, TRIM, TAKE_TAIL, AS_REQUESTED);
    if (pointer != nil) {
        result := release(HEAP, pointer, BY_LOCATION, AS_ALLOCATED)
        count := count + 1
    }
}
This creates your 10000 128-byte buffers and ensures they are "ordered" BY_LOCATION.
The NEXT allocate request will yield the buffer having the lowest memory address -- and will provide it in constant time. The allocation after that will allocate the buffer having the next higher address -- in the exact same, constant time.
Quote:
I'd only free 2500 blocks at a time (and this is only ever done by the thread responsible for appending the data to the file on disk). This code would unlink the oldest 2500 entries, do the disk IO stuff, then merge the list of 2500 "now free" blocks with the allocator's list of free blocks in a single lockless "lock cmpxchg [nextBlock], ..." loop, then do a "lock sub [allocatedBlocksMinus5000], 2500 | (1 << 31)" to atomically adjust the number of allocated blocks and clear the "sending to disk in progress" flag at the same time.
Now see if you can tell me exactly how your "swiss army knife" would be used to implement this "application specific lockless allocator" better than the application's own custom designed allocator that I've described. Note that this is only one example (with relatively simple requirements), and an application may have/want many different allocators each specifically designed to be "as ideal as possible" for extremely different requirements/usage.
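(In C-with-atomics terms, I read that merge-and-adjust step as roughly the following -- a sketch of my interpretation only, using C11 atomics in place of the inline asm; the names are mine:)
Code:
#include <stdatomic.h>
#include <stdint.h>

struct block { struct block *next; /* ... 123-byte payload ... */ };

static _Atomic(struct block *) free_head;
static _Atomic uint32_t        allocated_count;   /* bit 31 = flush-in-progress flag */

/* Splice a privately built chain (chain_head..chain_tail) onto the shared
 * free list with a compare-and-swap loop (the "lock cmpxchg" loop). */
static void merge_freed(struct block *chain_head, struct block *chain_tail)
{
    struct block *old = atomic_load(&free_head);
    do {
        chain_tail->next = old;
    } while (!atomic_compare_exchange_weak(&free_head, &old, chain_head));
}

/* One atomic subtract drops the allocated count by 2500 AND clears the
 * "sending to disk in progress" flag (bit 31), like the "lock sub";
 * assumes the flag is currently set and the low bits don't borrow. */
static void finish_flush(void)
{
    atomic_fetch_sub(&allocated_count, 2500u | (1u << 31));
}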
I counter (REAL example): you have live video coming off a camera on a node, "somewhere". You have to examine the video, in real time (i.e., it's an endless stream; even if you opt to buffer it somewhere, you still have to be able to process it at the same effective rate at which it is being generated). You have to monitor for signs of motion in a certain portion of the image and signal an event when that is detected or persists for some period of time.
If you sense motion, you have to determine if a person is present (i.e., not just trees blowing in the breeze) and, if so, recognize his/her face. Based on who you do/don't recognize, you may have to unlock a door, connect a bidirectional audio stream between the guest and the receptionist, notify the boss that his wife has arrived, etc.
At the same time, you have to record all of this to persistent store.
And, present it for live review by an "attendant" located "in the guard shack".
Of course, Joe User shouldn't be able to eavesdrop on any of this stuff happening -- even if the resources to accomplish these tasks happen to be coming from "his" node!
And, all these "applications" are running on different device nodes scattered around the building. How many memory managers are you going to write to handle this "simple" set of applications? How much time are you going to devote to quantifying, documenting and validating each of their operations?
MIPS are dirt cheap. So is memory. Developer time is where the money gets wasted -- and where user disappointment comes from. "Gee, should I run each node at 300MHz? Or 600? 64MB of DRAM? Or 256MB? One core or two? 100BaseTX or GBe?"
If I spend a few instruction fetches in a critical region... feh.
Quote:
"OS designed specifically for the application" is an oxymoron. Either it's designed for one specific application and therefore it's part of one specific application and not an OS at all; or it's not designed specifically for one specific application.
Seems like an arrogant assumption. Do you think embedded devices are just "spaghetti code" simply because YOU can't load an application of your own choosing onto them? Is a math library no longer a math library because a single application is using it? Amusing, when you consider there are probably two orders of magnitude MORE embedded systems in the world than all the desktop machines combined!
Is Android only an OS *if* a user opts to load some non-default application onto it? I.e., if he uses the stock phone, as is, how does that differ from using an OS in a "single" application?