jnc100 wrote:
Out of interest, do you allow processes to exchange pointers with each other?
No, I do copy-on-share for objects (except for memory buffers, which will be a special case of shared memory not covered by the GC). Sharing pointers (especially function pointers) between processes is really cool, but would be a pain to clean up unless I had a single global garbage collector.
I want to avoid a single global garbage collector because I don't want to stop the world, just a single process (e.g. if a badly written program keeps invoking the GC, I only want that process to stop). Also, per-process garbage collection is a very easy way to spatially divide the garbage collector, making a single collection faster since it only collects garbage for a single process.
embryo wrote:
A bit better solution is to start incremental GC at these thresholds. It also will lead to overhead, but the overhead in this case is less significant because of non invasive nature of the incremental GC (better user experience).
I like the sound of incremental sweeping. Right now, my GC is a simple mark-and-sweep. It walks up the local stack, marking each item it encounters. Objects, arrays and function pointers (closures) are slightly more complicated because I then walk their properties and mark each object they reference. I then free every unmarked object.
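As a rough illustration of the mark-and-sweep described above, here is a minimal Python sketch. It is not the VM's actual code: the Obj class, the props list and the mark/sweep/collect helpers are hypothetical names, and the "stack" is just a list of root references.

```python
class Obj:
    """A heap object; `props` holds references to other heap objects."""
    def __init__(self, name):
        self.name = name
        self.props = []
        self.marked = False

def mark(roots):
    """Mark every object reachable from the roots (the stack slots)."""
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj.marked:
            continue              # already visited; this also stops on cycles
        obj.marked = True
        pending.extend(obj.props) # walk the object's properties

def sweep(heap):
    """Keep marked objects, drop the rest, and clear marks for the next cycle."""
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False
    return live

def collect(stack, heap):
    mark(stack)
    return sweep(heap)

a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.props.append(b)
b.props.append(a)   # a cycle between a and b
c.props.append(d)   # c and d are not reachable from the stack
heap = collect([a], [a, b, c, d])
print([o.name for o in heap])  # ['a', 'b']
```

The explicit pending list avoids deep recursion when walking long property chains.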
A very simple way to make this incremental is to turn it into a coroutine:
Code:
gc_incremental() {
start:
    scan_objects();              // first pass: mark every reachable object
    yield;
    start_time = now;
    while(objects_to_free) {
        free_object();
        if(now > start_time + 100ms) {
            yield;               // time slice used up, resume here on the next call
            start_time = now;
        }
    }
    yield;
    goto start; // restart
}
On the first pass, mark each object. On subsequent passes, free for 100 ms. When there are no more objects to free and gc_incremental is called again, restart from the first pass. The only problem with this method is if marking takes a long time, since the marking pass itself is not incremental.
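The coroutine above maps fairly directly onto a Python generator. The following is only a sketch under assumptions: Obj, the heap set, the roots list and the injectable clock parameter are all hypothetical, and the demo uses a fake integer clock (with budget=1 tick) so that the slices are reproducible. A real collector would use the default monotonic clock and a budget of 0.1 seconds.

```python
import itertools
import time

class Obj:
    def __init__(self):
        self.props = []

def gc_incremental(heap, roots, budget=0.1, clock=time.monotonic):
    """Each next() call runs one slice: a full mark, or a bounded sweep."""
    while True:
        # First pass: mark everything reachable (not incremental).
        reachable = set()
        pending = list(roots)
        while pending:
            obj = pending.pop()
            if obj not in reachable:
                reachable.add(obj)
                pending.extend(obj.props)
        # Objects allocated after this point are not in the list,
        # so they cannot be freed by mistake during this cycle.
        garbage = [o for o in heap if o not in reachable]
        yield "marked"

        # Later passes: free objects until the time budget runs out.
        start = clock()
        while garbage:
            heap.discard(garbage.pop())
            if garbage and clock() - start > budget:
                yield "sweeping"
                start = clock()
        yield "done"             # the next call restarts with a new mark pass

keep, child = Obj(), Obj()
keep.props.append(child)
heap = {keep, child} | {Obj() for _ in range(5)}   # five unreachable objects

ticks = itertools.count()                          # fake clock: 0, 1, 2, ...
gc = gc_incremental(heap, [keep], budget=1, clock=lambda: next(ticks))
steps = [next(gc) for _ in range(5)]
print(steps)
```

Because the garbage list is fixed at the end of the mark pass, the process can keep running and allocating between sweep slices. Anything unreachable at mark time stays unreachable, so freeing it later is safe.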
Re incremental marking: it would require a write barrier that marks objects on assignment. I don't see how I could avoid rewalking the call stack on each marking pass, because the call stack will differ between incremental markings. However, walking the call stack would be fairly quick. I expect that in a real-world process, the majority of the time would be spent walking through objects, not the call stack.
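To make the write barrier idea concrete, here is a small sketch of one common form (marking the stored value during an active marking phase). Everything in it is hypothetical: Obj, Marker, shade, step and write are illustrative names, and the sketch does not rescan the stack between steps, which, as noted above, a real implementation would still need to do.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.props = {}
        self.marked = False

class Marker:
    def __init__(self):
        self.marking = False
        self.gray = []            # marked objects whose properties are unscanned

    def shade(self, obj):
        if not obj.marked:
            obj.marked = True
            self.gray.append(obj)

    def start(self, roots):
        self.marking = True
        for root in roots:
            self.shade(root)

    def step(self, limit):
        """Scan up to `limit` gray objects; return True once marking is done."""
        for _ in range(limit):
            if not self.gray:
                break
            obj = self.gray.pop()
            for child in obj.props.values():
                self.shade(child)
        if not self.gray:
            self.marking = False
        return not self.marking

    def write(self, owner, key, value):
        """Write barrier: every property store goes through here."""
        if self.marking:
            self.shade(value)     # mark on assignment so the store is not missed
        owner.props[key] = value

a, b = Obj("a"), Obj("b")
marker = Marker()
marker.start([b, a])
marker.step(1)                    # scans a; a will not be visited again
c = Obj("c")
marker.write(a, "x", c)           # stored into an already-scanned object
done = marker.step(10)
print(done, c.marked)
```

Without the barrier, c would stay unmarked because a had already been scanned when the assignment happened, and the sweep would free a live object.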
Brendan wrote:
Finally; for a lot of cases data is cached by processes
My VM caches some objects, like strings, in a string table. This saves memory and makes string-based property look-ups and comparisons very fast (any two equal strings will have the same pointer, so comparing them is a pointer comparison).
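A string table of this kind can be sketched in a few lines. This is an illustration only, with a hypothetical StringTable class. Python's `is` operator stands in for the pointer comparison.

```python
class StringTable:
    """Maps each distinct string value to one canonical object."""
    def __init__(self):
        self._table = {}

    def intern(self, text):
        # Return the stored copy if an equal string exists, else store this one.
        return self._table.setdefault(text, text)

table = StringTable()
# Build two equal strings at run time so they start out as separate objects.
a = table.intern("".join(["len", "gth"]))
b = table.intern("".join(["le", "ngth"]))
print(a is b)   # identity check, the analogue of comparing pointers
```

Once every string goes through the table, an equality test never has to look at the characters.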
Brendan wrote:
The most sensible way to do that is to have some sort of special/explicit "this won't be used again" notification that software uses.
I have a special-case object (the buffer) that does allow this. Buffers can be garbage collected, but the user can also explicitly dispose of them, freeing them immediately. This will be useful for storing large items like media (images, audio), binary data, etc.
linguofreak wrote:
What I'd do is have each process do its own GC, and have its garbage collector return memory to its own free pool rather than to the system. When a process makes an allocation, it first tries to make that allocation from its own free pool. If that does not succeed, it runs its garbage collector to try to expand the free pool. If the free pool is still not large enough, the process requests memory from the system.
That is a nice idea.
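The allocation order linguofreak describes (free pool first, then GC, then the system) could look roughly like this. It is a sketch using plain byte counts; System, Process, the garbage counter and the returned labels are hypothetical, and fragmentation is ignored.

```python
class System:
    def __init__(self, free_bytes):
        self.free_bytes = free_bytes

    def request(self, size):
        if size > self.free_bytes:
            raise MemoryError("system is out of memory")
        self.free_bytes -= size
        return size

class Process:
    def __init__(self, system):
        self.system = system
        self.free_pool = 0   # bytes this process owns but is not using
        self.garbage = 0     # bytes held by dead objects (stand-in for a real heap)

    def collect(self):
        """The GC returns memory to the process's own pool, not to the system."""
        self.free_pool += self.garbage
        self.garbage = 0

    def allocate(self, size):
        source = "pool"
        if self.free_pool < size:
            self.collect()                       # step 2: try to grow the pool
            source = "gc"
        if self.free_pool < size:
            shortfall = size - self.free_pool
            self.free_pool += self.system.request(shortfall)  # step 3
            source = "system"
        self.free_pool -= size
        return source

system = System(1000)
proc = Process(system)
first = proc.allocate(100)   # pool is empty and GC finds nothing
proc.garbage = 100           # pretend that first object has died
second = proc.allocate(80)   # GC refills the pool, no system call needed
third = proc.allocate(10)    # served straight from the pool
print(first, second, third, system.free_bytes)
```

With this order the system is only asked for memory once a collection has failed to free enough.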
Anyway, when should the GC be called (incremental or full)?
I'm thinking:
- Continuous - trigger an incremental collection once every, say, 5 seconds as long as the process isn't sleeping. This will ensure that garbage never sits around for long.
- Out of memory - trigger a full garbage collection if the system runs out of memory.
- On demand - the process explicitly calls the garbage collector (e.g. the user closed a file, or finished playing a level.)
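The three triggers above could be combined into one decision function, roughly as follows. This is a sketch: choose_collection and its parameters are hypothetical, and since the list doesn't say whether an on-demand call is full or incremental, the caller passes the kind it wants.

```python
def choose_collection(now, last_gc, sleeping=False, out_of_memory=False,
                      requested=None, interval=5.0):
    """Return "full", "incremental", or None (no collection needed)."""
    if out_of_memory:
        return "full"                  # out of memory: full collection
    if requested is not None:
        return requested               # on demand: whatever the process asked for
    if not sleeping and now - last_gc >= interval:
        return "incremental"           # continuous: every `interval` seconds
    return None

print(choose_collection(now=10.0, last_gc=4.0))                  # incremental
print(choose_collection(now=10.0, last_gc=8.0))                  # None
print(choose_collection(now=10.0, last_gc=4.0, sleeping=True))   # None
```

Checking out-of-memory first means a full collection always wins over the timer.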