How not to implement sleep()!

pcmattman · Post by **pcmattman** » Thu Oct 30, 2008 6:27 pm

Hi guys,

Was just going through my sleep code updating it to use the new sleep queue I've implemented into my OS, and I stumbled across this:

Code: Select all

// sleeps for a certain number of seconds
uint32_t sleep( uint32_t seconds )
{
	// delay() delays for a certain number of milliseconds
	return delay(seconds * 0x1000);
}

No wonder sleep() never seemed to sleep for the correct amount of time

.

Troy Martin · Post by **Troy Martin** » Thu Oct 30, 2008 6:36 pm

Lol!

You would've wanted 0x3E8

karloathian · Post by **karloathian** » Tue Nov 04, 2008 3:22 pm

I was wondering though how to implement a nice sleep function in my kernel.

I was thinking of maybe adding two DWORDs to my task structs, so when sleep() is involved one is stamped with the current time and the other with the "wakeup" time, and then invoking a task switch. Basicly when the task manager would switch task it would check those two stamps.

How did you guys implement sleep?

CodeCat · Post by **CodeCat** » Tue Nov 04, 2008 3:53 pm

Most of the time you'll need two kinds of sleep. The short one just delay-loops, and is best for short delays not longer than a few microseconds. The longer one instead acts as a blocking call that puts the thread to sleep, and wakes it back up when the specified time has passed. The former one is easy to do but wastes clock cycles if it's too long, while the latter needs a timer interrupt and thread scheduling to work and also has more overhead if the delay is short.

pcmattman · Post by **pcmattman** » Tue Nov 04, 2008 7:50 pm

Here's how I do it:

Code: Select all

// sleeps for a certain number of milliseconds
uint32_t delay( uint32_t t )
{
	// get the cpu flags
	uint32_t int_state = 0;
	asm volatile( "pushf; popl %0" : "=r" (int_state) );

	// clear the interrupt flag - we're messing around with process states here
	asm volatile( "cli" );

	// setup the sleeper
	currprocess->sleep_start_time = kGetPITCounter();
	currprocess->sleep_end_time = kGetPITCounter() + (t);
	currprocess->sleep_curr_time = 0;

#if DEBUG
	dprintf( "[sleep] process delay: %d, %d, %d\n", currprocess->sleep_start_time, currprocess->sleep_end_time, currprocess->sleep_curr_time );
#endif

	// take it off the ready state
	currprocess->status = PSTATE_SLEEPING;

	// get a pointer to this process
	struct pentry* ent = currprocess->ent;

	// unlink from the current queue
	if(ent->prev)
		ent->prev->next = ent->next;
	else
	{
		// if the previous is null then this is actually the first item
		// on the already queue, so we need to fix that
		if(ent == p_alreadyq)
			p_alreadyq = ent->next;
		else if(ent == p_readyq)
			p_readyq = ent->next;
	}
	if(ent->next)
		ent->next->prev = ent->prev;

	// link onto the sleep queue
	if(p_sleepq == 0)
	{
		p_sleepq = ent;
		ent->prev = ent->next = 0;
	}
	else
	{
		p_sleepq->prev = ent;
		ent->prev = 0;
		ent->next = p_sleepq;
		p_sleepq = ent;
	}

	// go to a ready context
	kSchedule(0);

	// restore interrupts, if needed
	if( int_state & 0x200 )
		asm volatile( "sti" );

	// return the time
	return t;
}

Brendan · Post by **Brendan** » Tue Nov 04, 2008 10:15 pm

Hi,

karloathian wrote:How did you guys implement sleep?

First, you may want 4 or more different sleep functions.

The first one runs other threads until the time delay expires (efficient, but less accurate). The second one runs other threads until just before the time delay expires (if possible) and then spins (e.g. "while(wakeTime > currentTime) {}") to improve accuracy.

However, sometimes you want to wake up at a specific time (instead of sleeping for a certain duration). For example, imagine you want to call "foo()" once per second. This won't work:

Code: Select all

next:
    sleep(ONE_SECOND);
    foo();
    jmp next

The problem here is drift - you might actually sleep for 1.05 seconds and "foo()" might take 0.2 seconds, and on average you might end up calling "foo()" once every 1.25 seconds. It'd be a lot better if you could do this:

Code: Select all

    nextTime = getCurrentTime();
next:
    nextTime += ONE_SECOND;
    sleep_until(nextTime);
    foo();
    jmp next

In this case you can guarantee that you call "foo()" once per second on average. The length of time between individual calls to "foo()" might vary, but that might not matter. With code like this, even if you don't get any CPU time for several seconds it'll catch up!

You might also want a "sleep_until_calendar()" function. For example, imagine if a thread wants to sleep until midnight on new year's eve (or maybe the thread just wants to wake up at 6:00 AM each morning). The normal "sleep_until()" function might not work for this because the current time might be adjusted (e.g. by the user/administrator, or automatically by NTP). You'd need to recalculate the "wake_time" if the real time clock is changed to make sure you wake up at the right time.

That gives you up to 6 different functions for sleeping: sleep(), sleep_precise(), sleep_until(), sleep_until_precise(), sleep_until_calendar() and sleep_until_calendar_precise().

For "sleep()" and "sleep_precise()", you can just do this:

Code: Select all

void sleep(delay_time) {
     sleep_until(current_time + delay_time);
}

void sleep_precise(delay_time) {
     sleep_until_precise(current_time + delay_time);
}

For "sleep_until_precise()" you'd do this:

Code: Select all

void sleep_until_precise(wake_time) {
     sleep_until(wake_time - timer_quantum);
     while(wake_time > current_time);
}

Note: the "timer_quantum" variable here is how accurate the timer is. For example, if you're using the PIT at 100 Hz, then "timer_quantum = 10 ms".

For "sleep_until()", you could use many lists where the timer IRQ wakes up any threads that are on the "next" list. For example (with 256 lists):

Code: Select all

void sleep_until(wake_time) {
    current_thread->wake_time = wake_time;
    list = (wake_time / time_between_IRQs) & 0xFF;
    current_thread->next = first_sleep_list_entry[list];
    first_sleep_list_entry[list] = current_thread;
    tell_scheduler_to_remove_this_thread_from_ready_to_run_queue();
}

timerIRQ:
    current_time += time_between_IRQs;
    list = (current_time / time_between_IRQs) & 0xFF;
    thread = first_sleep_list_entry[list];
    first_sleep_list_entry[list] = NULL;
    while(thread != NULL) {
        if(entry.wake_time <= current_time) {
            tell_scheduler_to_put_thread_back_on_ready_to_run_queue(thread);
        } else {
            thread->next = first_sleep_list_entry[list];
            first_sleep_list_entry[list] = thread;
        }
    }

Note: The example code above lacks any re-entrancy locking (but, single-linked lists can be done with lockless code - use a "lock cmpxchg" when you update "first_sleep_list_entry

")...

For this code, when a timer IRQ occurs some tasks would be put back on the list. For example, with a 10 ms timer IRQ and 256 lists, a thread that sleeps for less than 2560 ms will be woken up the first time the list is checked, but a thread that sleeps for 25600 ms will be checked and put back on the list 9 times (before being woken up on the 10th time it's list is checked). If you use a more accurate timer IRQ (e.g. 1 ms), or if lots of threads sleep for long lengths of time, you may want more lists (e.g. with a 1 ms timer IRQ I'd be tempted to use 2048 or 4096 lists).

There is another way though. If you use a single list that's sorted (e.g. when you add something to the list you insert it at the right spot; so that the first thread on the list is always the thread that should wake up first) , then you could program the timer to generate an IRQ when the first thread on the list is due to wake up (for e.g. for the PIT, use mode zero "interrupt on terminal count"). This has more overhead - searching through the list to find the right spot to insert a thread, and PIT reprogramming overhead. The advantage is that you'd be able to ask for an IRQ with 838 ns precision - for example, if a thread wants to sleep for 12345 ns, then you'd set the PIT count to 15 and get an IRQ after 12570 ns. For something like "sleep_until_precise" this means you spend a lot less time doing "while(wake_time > current_time);", which could save you lots of CPU time. You'd have to be careful here though - there's a minimum amount of time that you can set that depends on the chipset (e.g. you probably can't keep asking for "2 * 838 ns" and expect the PIT and PIC to be fast enough to respond in time).

You could also use the local APIC timer in the same way (e.g. in "one shot" mode) to get extremely precise timing. In this case (for "multi-CPU" systems) you could have a different sorted list for each local APIC timer, which would probably be a good thing for scalability.

There is one interesting thing about the RTC I should mention. It has 2 different timer IRQs! There's an "update" IRQ that occurs once per second and a periodic IRQ that can be set to generate an IRQ between 2 Hz and 8192 Hz. Even better, these timer IRQs are synchronized, so that there's always exactly the same number of periodic IRQs between update IRQs. This means (for e.g.) you could set the RTC periodic IRQ to 1024 Hz and have 1024 lists; and then use the RTC update IRQ with another 256 lists. When the RTC update IRQ occurs you'd take everything in the next "update list " and sort them according to their wake time, putting some back on the "update list" and putting the rest on the appropriate "periodic lists". When the periodic IRQ occurs you'd tell the scheduler to start every thread in the next "periodic list" without checking anything. In this case, if a thread sleeps for less than 1 second it'd be put on a "periodic list" straight away; if a thread sleeps for less than 256 seconds it'd be put on an "update list" and shifted to a "periodic list" at the right time; and if a thread sleeps for longer than 256 seconds it'd be put on an "update list" and then this list would be checked several times before it's shifted from the "update list" to a "periodic list".

Using the RTC like this also leaves the PIT free; so you could use the PIT's mode zero "interrupt on terminal count" feature to get even more precision after the RTC is finished with it.

You could also use HPET instead of the RTC and PIT, if it's present.

The "sleep_until_calendar()" and "sleep_until_calendar_precise()" are mostly the same as "sleep_until()", except you need be able to recalculate the wake up time if real time is changed. Note: I assume the OS keeps track of "time since boot", where "current calendar time = time since boot + calendar offset", and where "thread wake time = calendar wake time - calendar offset"; so that if "calendar offset" is modified then "thread wake time" needs to be recalculated.

To do this I'd use a separate set of lists (e.g. 256 lists for "sleep_until()" and another 256 lists for "sleep_until_calendar()", where the timer IRQ checks the next "sleep_until()" list and the next "sleep_until_calendar()" list), so that when real time changes you can recalculate the wake up times for anything in the second set of lists (without touching the first set of lists). I'd also maintain a version number for the "calendar offset", so you can do:
Code: Select all
```
void sleep_until_calendar_precise(wake_calendar_time) {
     sleep_until_calendar(wake_calendar_time - timer_quantum);

     wake_time = wake_calendar_time - calendar_offset;
     version = calendar_offset_version;
      while(wake_calendar_time > current_time) {
         if(version != calendar_offset_version) {
             wake_time = wake_calendar_time - calendar_offset;
             version = calendar_offset_version;
         }
     }
}
```
You'd also need to keep track of the current time somehow. The best way to do this is with something like "current_time += (RDTSC - lastTSC) * K", where "K" is a scaling factor (that may change when the CPU's speed changes). Unfortunately, for some CPUs RDTSC isn't reliable or isn't even supported. In that case you want to use the most precise timer you can. For the PIT and local APIC, it's possible to get the "current count" value to improve precision - for example, if the PIT is set to 10 ms then every timer IRQ you'd do "base_time = base_time + 10ms", and when you get the current time you can do "current_time = base_time + read_PIT_count() * K", so that "current_time" is accurate to within 838 ns (instead of just being accurate to within 10 ms).

You'd also need to use something for the scheduler, to work out when the currently running thread should be preempted (the "end of time slice" time). If you use the PIT's mode zero "interrupt on terminal count" feature (or the local APIC in "one shot mode"), you can still use the same timer for the scheduler, where the time for the next IRQ is whichever happens soonest (waking up a sleeping thread or ending the current threads time slice). Alternatively you could use any fixed frequency timer for this too.

Finally, I should probably mention that in previous versions of my OS I've had "light sleep" and "deep sleep". The difference was what happened if a message arrived - if a thread was using light sleep and received a message then it'd wake up immediately without waiting for the delay to expire (but for deep sleep nothing woke a thread except delay expiry). This was useful for time-outs (e.g. if you're expecting a message, then you could light sleep until it arrives but wake up after a while in case the message never arrives) and for doing something regularly (e.g. calling "foo()" once per second) while also handling any messages that arrive. Light sleep complicates things a little more, because you need to be able to find and remove a thread from any sleep list. However, you probably need to be able to do that anyway (if you let threads sleep for longer than a second) - for e.g. a thread might decide to sleep for 3 weeks, and the user might spend those three weeks pressing "control + C" and wondering why the thread is still there (and then after three weeks the thread wakes up to find thousands of "terminate" messages).

An alternative to "light sleep" is to support alarms, where you have some sort of service that sends you a message at a requested time (e.g. a thread would request an alarm, and then block until a message arrives and see if it's the alarm message or not).

Cheers,

Brendan

pcmattman · Post by **pcmattman** » Wed Nov 05, 2008 12:18 am

Thanks Brendan for some extremely good advice. I may not ever actually implement this into my operating system but the design ideas and specifics are very helpful.

EddyDeegan · Post by **EddyDeegan** » Wed Nov 05, 2008 5:34 pm

pcmattman wrote:Thanks Brendan for some extremely good advice. I may not ever actually implement this into my operating system but the design ideas and specifics are very helpful.

I'm pretty sure I'll be digging into quite a lot of this in detail in the not too distant future...

Thanks Brendan, some really good food for thought there. Now I just have to figure out the exact mechanics of:

sleep_until(I-need-to-wake-up);

where I-need-to-wake-up is determined while I'm asleep. Hey, I do it every Fri/Sat night...

Eddy

OSDev.org

How not to implement sleep()!

How not to implement sleep()!

Re: How not to implement sleep()!

Re: How not to implement sleep()!

Re: How not to implement sleep()!

Re: How not to implement sleep()!

Re: How not to implement sleep()!

Re: How not to implement sleep()!

Re: How not to implement sleep()!