Hi,
rdos wrote:
Brendan wrote:
TSC
Highly unreliable on older CPUs (when it is present), and per-core, which makes it useless for keeping elapsed time. Has no IRQ.
Insanely awesome (extremely low overhead and extremely high precision) on recent CPUs though. You'd want to detect TSC capabilities and use TSC if it's suitable.
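As a minimal sketch of "detect TSC capabilities", assuming the raw register values have already been obtained by executing CPUID (that part is left out), the relevant bits are CPUID.01H:EDX[4] (TSC present) and CPUID.80000007H:EDX[8] (invariant TSC). The names `classify_tsc` and `tsc_quality` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum tsc_quality {
    TSC_ABSENT,     /* no TSC at all */
    TSC_VARIABLE,   /* present, but its rate may change with power management */
    TSC_INVARIANT   /* constant rate in all P-, C- and T-states */
};

/* Hypothetical helper: classify the TSC from raw CPUID register values.
 * cpuid1_edx:        EDX returned by CPUID leaf 0x00000001
 * cpuid80000007_edx: EDX returned by CPUID leaf 0x80000007
 *                    (pass 0 if the CPU does not report that leaf) */
enum tsc_quality classify_tsc(uint32_t cpuid1_edx, uint32_t cpuid80000007_edx)
{
    const bool present   = (cpuid1_edx >> 4) & 1u;          /* CPUID.01H:EDX[4] */
    const bool invariant = (cpuid80000007_edx >> 8) & 1u;   /* CPUID.80000007H:EDX[8] */

    if (!present)
        return TSC_ABSENT;
    return invariant ? TSC_INVARIANT : TSC_VARIABLE;
}
```

"Invariant" only says the rate is constant; on its own it does not prove that the TSCs on different cores are synchronised with each other.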
rdos wrote:
Brendan wrote:
Local APIC timer (possibly including "TSC deadline mode")
Affected by some types of power-management, but pretty useful for preemption. Not good for high precision timing due to possible effects of power-management.
Very nice if it's not affected by power management (but still very useful even when it is affected by power management). You'd want to detect local APIC timer capabilities and use it if it's suitable.
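A minimal sketch of "detect local APIC timer capabilities", again assuming the CPUID values were read elsewhere: CPUID.01H:ECX[24] reports "TSC deadline mode", and CPUID.06H:EAX[2] (ARAT) reports that the APIC timer keeps running in deep C-states. The struct and function names, and the backup policy, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct lapic_timer_caps {
    bool tsc_deadline;   /* CPUID.01H:ECX[24]: "TSC deadline mode" supported */
    bool always_running; /* CPUID.06H:EAX[2] (ARAT): keeps counting in deep C-states */
};

struct lapic_timer_caps decode_lapic_timer_caps(uint32_t cpuid1_ecx, uint32_t cpuid6_eax)
{
    struct lapic_timer_caps caps;
    caps.tsc_deadline   = (cpuid1_ecx >> 24) & 1u;
    caps.always_running = (cpuid6_eax >> 2) & 1u;
    return caps;
}

/* Hypothetical policy: if the timer may stop in deep sleep states, the
 * caller should arrange a backup timer (e.g. HPET) before entering them. */
bool lapic_needs_backup_in_sleep(struct lapic_timer_caps caps)
{
    return !caps.always_running;
}
```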
rdos wrote:
Brendan wrote:
Performance monitoring counters and/or IRQs
Better being free for performance measuring
It's not like the performance monitoring stuff is actually used for performance monitoring most of the time anyway. If there's no other choice, I'd rather use performance monitoring counters for timing than be unable to boot. You'd want to detect performance monitoring capabilities and use it if it's more suitable than something else.
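For Intel's architectural performance monitoring, CPUID leaf 0x0A describes what is available (AMD reports this differently, which this sketch ignores). Assuming EAX of that leaf has already been read, a minimal decode might look like this; the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct perfmon_caps {
    unsigned version;      /* EAX[7:0]   architectural perfmon version (0 = none) */
    unsigned gp_counters;  /* EAX[15:8]  general-purpose counters per logical CPU */
    unsigned width;        /* EAX[23:16] bit width of those counters */
};

struct perfmon_caps decode_perfmon_caps(uint32_t cpuid0a_eax)
{
    struct perfmon_caps caps;
    caps.version     = cpuid0a_eax & 0xFFu;
    caps.gp_counters = (cpuid0a_eax >> 8) & 0xFFu;
    caps.width       = (cpuid0a_eax >> 16) & 0xFFu;
    return caps;
}

/* Usable as a timing source only if at least one counter exists */
bool perfmon_usable(struct perfmon_caps caps)
{
    return caps.version > 0 && caps.gp_counters > 0 && caps.width > 0;
}
```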
rdos wrote:
Brendan wrote:
HPET main counter
The best alternative for measuring elapsed time.
Second best (TSC on recent CPUs is much better).
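To use the HPET main counter for elapsed time, the counter period has to be read from the General Capabilities and ID register (offset 0x000). Per the HPET specification, bits 63:32 hold the tick period in femtoseconds (non-zero and at most 10^8 fs, i.e. at least 10 MHz), bit 13 says whether the counter is 64-bit, and bits 12:8 hold the index of the last comparator. A sketch, assuming the register has already been read from MMIO; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define HPET_MAX_PERIOD_FS 100000000u  /* spec limit: 10^8 fs = 100 ns */

struct hpet_caps {
    uint32_t period_fs;   /* bits 63:32, COUNTER_CLK_PERIOD, femtoseconds per tick */
    bool     counter_64;  /* bit 13, COUNT_SIZE_CAP */
    unsigned num_timers;  /* bits 12:8 (NUM_TIM_CAP) are the last index, so +1 */
};

struct hpet_caps decode_hpet_caps(uint64_t gcap_id)
{
    struct hpet_caps c;
    c.period_fs  = (uint32_t)(gcap_id >> 32);
    c.counter_64 = (gcap_id >> 13) & 1u;
    c.num_timers = (unsigned)((gcap_id >> 8) & 0x1Fu) + 1u;
    return c;
}

bool hpet_period_valid(uint32_t period_fs)
{
    return period_fs != 0 && period_fs <= HPET_MAX_PERIOD_FS;
}

/* Convert main counter ticks to nanoseconds (1 ns = 10^6 fs). The split
 * keeps the intermediate products inside 64 bits. */
uint64_t hpet_ticks_to_ns(uint64_t ticks, uint32_t period_fs)
{
    const uint64_t fs_per_ns = 1000000u;
    return (ticks / fs_per_ns) * period_fs
         + ((ticks % fs_per_ns) * period_fs) / fs_per_ns;
}
```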
rdos wrote:
Brendan wrote:
HPET comparators
The best alternative for high-precision timers, when they work. When the HPET doesn't support MSI delivery it often seems to malfunction. The configuration information returned on some motherboards is incorrect regarding IRQ routing, and the ACPI tables are not always correct either: some report IRQs that don't work, while others report no IRQ for comparators that still work.
Probably second best (local APIC is better for high-precision timers). In cases where the local APIC timer gets messed up due to sleep states, you'd still want to use local APIC timers, but when the CPU enters a sleep state, migrate the work to HPET until the CPU comes back out of the sleep state. You'd want to detect HPET capabilities and use it if it's more suitable than something else (whether it's "use it on its own" or "use it as a backup when CPUs are in sleep states").
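Detecting HPET comparator capabilities means reading each comparator's Configuration and Capability register. Per the HPET specification, bit 4 reports periodic mode, bit 5 a 64-bit comparator, bit 15 FSB (MSI) delivery, and bits 63:32 a mask of permitted I/O APIC inputs (which, as noted above, some boards report incorrectly). The delivery policy below, preferring MSI given rdos's experience, is a hypothetical sketch.

```c
#include <stdbool.h>
#include <stdint.h>

struct hpet_comparator_caps {
    bool     periodic;    /* bit 4,  Tn_PER_INT_CAP */
    bool     wide;        /* bit 5,  Tn_SIZE_CAP (64-bit comparator) */
    bool     msi;         /* bit 15, Tn_FSB_INT_DEL_CAP (FSB/MSI delivery) */
    uint32_t route_mask;  /* bits 63:32, Tn_INT_ROUTE_CAP (I/O APIC inputs) */
};

struct hpet_comparator_caps decode_comparator(uint64_t tn_config)
{
    struct hpet_comparator_caps c;
    c.periodic   = (tn_config >> 4) & 1u;
    c.wide       = (tn_config >> 5) & 1u;
    c.msi        = (tn_config >> 15) & 1u;
    c.route_mask = (uint32_t)(tn_config >> 32);
    return c;
}

/* Hypothetical policy: -1 = use MSI, >= 0 = lowest permitted I/O APIC
 * input, -2 = the comparator cannot generate an interrupt at all. */
int choose_delivery(struct hpet_comparator_caps c)
{
    if (c.msi)
        return -1;
    for (int irq = 0; irq < 32; irq++)
        if (c.route_mask & (1u << irq))
            return irq;
    return -2;
}
```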
rdos wrote:
Brendan wrote:
ACPI's "power management timer" (32-bit or 24-bit) counter
HPET?
AFAIK, there is no guarantee in ACPI that this is not the HPET, or some channel on HPET.
ACPI's "power management timer" is *not* HPET. It's a counter that is incremented at a rate of about 3.5795 MHz. HPET typically runs at 10 MHz or more, so it's clearly a separate counter. Of course it's possible (likely even) that some chipsets have a central 14.31818 MHz clock that is used to drive HPET directly, used via a "divide by 4" to drive ACPI's counter, and used via a "divide by 12" to drive the PIT (but that does not mean "HPET = ACPI's counter = PIT" - they're still all separate devices with separate control logic and capabilities).
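The "divide by 4" arithmetic gives exactly 14318180 / 4 = 3579545 Hz. Because ACPI's counter may be only 24 bits wide (the FADT's TMR_VAL_EXT flag says whether it is 32-bit), elapsed ticks have to be masked to the counter width. A sketch, assuming the counter values were already read from the port given in the FADT; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACPI_PM_TIMER_HZ 3579545u  /* 14.31818 MHz / 4 */

/* Ticks between two reads, assuming less than one full wrap between them
 * (about 4.7 s for a 24-bit counter, about 20 minutes for a 32-bit one).
 * timer_32bit corresponds to the FADT's TMR_VAL_EXT flag. */
uint32_t acpi_pm_delta(uint32_t earlier, uint32_t later, bool timer_32bit)
{
    const uint32_t mask = timer_32bit ? 0xFFFFFFFFu : 0x00FFFFFFu;
    return (later - earlier) & mask;
}

/* Deltas are at most 2^32 ticks, so ticks * 10^9 stays within 64 bits */
uint64_t acpi_pm_ticks_to_ns(uint64_t ticks)
{
    return ticks * 1000000000u / ACPI_PM_TIMER_HZ;
}
```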
rdos wrote:
Brendan wrote:
PIT channel 0
PIT channel 2
Both can be used for elapsed time or high-precision (us) timers.
Yes, but they're slow and ugly (e.g. "legacy IO port" accesses to read the current count); and for channel 2 the thing can roll over several times without you knowing.
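The PIT arithmetic is simple; the problem is what it can't tell you. A sketch, assuming the commonly used rounded input frequency of 1193182 Hz (14.31818 MHz / 12): converting a delay to a 16-bit reload value, and computing elapsed ticks from two latched reads of a down-counter. The second function shows why extra rollovers are invisible. Function names are hypothetical.

```c
#include <stdint.h>

#define PIT_HZ 1193182u   /* 14.31818 MHz / 12, rounded */

/* Reload value for a delay in microseconds, clamped to 1..65536;
 * a reload of 65536 is written to the 16-bit counter as 0. */
uint16_t pit_reload_for_us(uint32_t us)
{
    uint64_t count = ((uint64_t)us * PIT_HZ + 500000u) / 1000000u;
    if (count < 1)
        count = 1;
    if (count > 65536)
        count = 65536;
    return (uint16_t)(count & 0xFFFFu);   /* 65536 becomes 0 */
}

/* Ticks elapsed between two latched reads of a counter that counts down
 * and reloads with `reload` (pass 65536 for a reload value of 0). Only
 * correct if less than one full period passed between the reads: any
 * additional rollovers leave no trace in the counter. */
uint32_t pit_elapsed(uint16_t earlier, uint16_t later, uint32_t reload)
{
    if (later <= earlier)
        return (uint32_t)earlier - later;
    return (uint32_t)earlier + reload - later;
}
```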
rdos wrote:
Brendan wrote:
RTC periodic and/or update IRQ
Can be used to synchronize elapsed time with real time. Not useful for anything else.
For old systems (where you've only got PIT and RTC and nothing else), you'd want to use PIT for the scheduler's timer (in "one shot" mode) and RTC for everything else.
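For the RTC side, the periodic interrupt rate is selected by the low 4 bits of status register A. Assuming the usual 32.768 kHz time base of MC146818-compatible RTCs, the frequency is 32768 >> (rate - 1) Hz for rate selectors 3 to 15 (8192 Hz down to 2 Hz). A small sketch with hypothetical names:

```c
/* Frequency (Hz) for an RTC rate selector, or 0 for selectors this
 * sketch does not use (0 disables the interrupt; 1 and 2 are avoided). */
unsigned rtc_rate_to_hz(unsigned rate)
{
    return (rate >= 3 && rate <= 15) ? 32768u >> (rate - 1) : 0;
}

/* Rate selector giving the highest frequency that does not exceed `hz`,
 * or 0 if hz is below 2 Hz. */
unsigned rtc_rate_for_hz(unsigned hz)
{
    for (unsigned rate = 3; rate <= 15; rate++)
        if (rtc_rate_to_hz(rate) <= hz)
            return rate;
    return 0;
}
```

For example, the common default selector 6 gives 1024 Hz, and asking for 1000 Hz yields selector 7 (512 Hz).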
rdos wrote:
Brendan wrote:
Watchdog timer/s (e.g. the "WDAT" and "WDRT" ACPI tables)
These are better left out of this.
Why? Is your OS a general purpose desktop thing that doesn't have to care if it locks up completely due to a hardware fault (rather than some sort of embedded system that might be used for banking)?
rdos wrote:
Brendan wrote:
Some sort of counter to measure real time (need accuracy, precision would be nice, per-CPU would be nice, don't need an IRQ)
No, it should absolutely not be per-CPU, but per-system. That means one of the PIT channels or HPET. TSC doesn't work, as it is both affected by power management and per-CPU. I have tried to use TSC for elapsed time, and it doesn't work. There is no reliable way to synchronize time between cores, especially not when TSCs start ticking at different frequencies when power management "kicks in".
Don't be silly - on recent systems (where TSC is guaranteed to run at a fixed frequency - e.g. the "TSC invariant" CPUID flag) TSC would be perfect for this (but synchronised to the RTC occasionally). For situations where TSC ticks at different frequencies on different CPUs, just synchronise more often to ensure that the TSC is always within an acceptable amount of error.
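One way to read "synchronised to the RTC occasionally" is a clock that extrapolates from the TSC and is re-anchored whenever a trusted reference arrives (for example the RTC's once-per-second update IRQ). The drift seen at each re-anchor tells you whether to synchronise more often. Everything here (struct, functions, interval policy) is a hypothetical sketch.

```c
#include <stdint.h>

/* Hypothetical per-CPU clock extrapolated from the TSC */
struct tsc_clock {
    uint64_t base_tsc;   /* TSC value at the last re-anchor */
    uint64_t base_ns;    /* wall time (ns) at the last re-anchor */
    uint64_t tsc_hz;     /* current estimate of the TSC frequency */
};

uint64_t tsc_clock_now(const struct tsc_clock *c, uint64_t tsc)
{
    uint64_t d = tsc - c->base_tsc;
    /* split so (d % hz) * 10^9 stays within 64 bits for hz up to ~18 GHz */
    return c->base_ns + d / c->tsc_hz * 1000000000u
                      + (d % c->tsc_hz) * 1000000000u / c->tsc_hz;
}

/* Re-anchor to a trusted reference time and return the drift (ns) built up
 * since the previous anchor (positive = the TSC clock was ahead). */
int64_t tsc_clock_resync(struct tsc_clock *c, uint64_t tsc, uint64_t ref_ns)
{
    int64_t drift = (int64_t)(tsc_clock_now(c, tsc) - ref_ns);
    c->base_tsc = tsc;
    c->base_ns  = ref_ns;
    return drift;
}

/* Hypothetical policy: halve the resync interval while drift exceeds the
 * error budget, otherwise relax it again, up to max_interval. */
uint64_t next_resync_interval(uint64_t interval, int64_t drift,
                              int64_t budget, uint64_t max_interval)
{
    int64_t mag = drift < 0 ? -drift : drift;
    if (mag > budget)
        return interval > 1 ? interval / 2 : 1;
    return interval * 2 <= max_interval ? interval * 2 : max_interval;
}
```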
rdos wrote:
Brendan wrote:
Some sort of timer to wake sleeping tasks (need accuracy, precision would be nice, per-CPU would be nice, need an IRQ)
This is what I refer to as "timers". This can be the APIC timer, PIT channel 0 or an HPET comparator. The APIC timer is per-CPU, so if it is used, timers need to be per-CPU. When using PIT channel 0 or HPET comparators, timers would be per-system. It might be possible to use combinations if both the APIC timer and PIT channel 0 / HPET comparators are available.
Sadly, Linux does something like this too - take a high precision timer like the local APIC timer, and use it as a general purpose timing thing so that you can bury it under millions of networking timeouts (that have no need for high precision). It's stupid because there's always a minimum amount of time between delays, and when there's too many things using the same timer you have to group things together to avoid the "minimum time between delays" problem. For example, if "foo" should happen in 1000 ns and "bar" should happen in 1234 ns, then you can't set up a 234 ns delay and have to bunch them together, and "foo" ends up happening 234 ns too late. Things that don't need such high precision should use a completely different timer to avoid screwing up the precision for things that do need high precision.
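To make the "foo"/"bar" example concrete, here is a toy model (hypothetical, and not how Linux actually implements its timers) of a single hardware timer that can't be re-armed sooner than a minimum gap after it fires. Consecutive deadlines closer than the gap are chained into one group that fires at the group's last deadline, so nothing fires early and the earlier members pay in lateness.

```c
#include <stddef.h>
#include <stdint.h>

/* deadline[] must be sorted ascending; fire_at[i] receives the time at
 * which deadline[i] is actually serviced. Successive groups fire at least
 * min_gap_ns apart, which is the hardware constraint being modelled. */
void coalesce(const uint64_t *deadline, uint64_t *fire_at, size_t n, uint64_t min_gap_ns)
{
    size_t start = 0;
    while (start < n) {
        size_t end = start + 1;
        /* chain the next deadline in if it is too close to the previous one */
        while (end < n && deadline[end] - deadline[end - 1] < min_gap_ns)
            end++;
        for (size_t i = start; i < end; i++)
            fire_at[i] = deadline[end - 1];
        start = end;
    }
}
```

With deadlines of 1000 ns and 1234 ns and a 500 ns minimum gap, both fire at 1234 ns and "foo" is 234 ns late. Every extra low-precision timeout sharing the timer adds more chances for this to happen to the high-precision ones.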
rdos wrote:
Brendan wrote:
Some sort of counter to measure how much time each task has used (precision would be nice, per-CPU would be really nice, low overhead would be nice, don't need an IRQ, don't really need accuracy)
I use elapsed time for this. When the task is started, the elapsed counter is saved, and then when a new task is scheduled, elapsed time is read again, and then subtracted from the saved value. There is no need for a separate hardware resource for this.
What is "elapsed time"? I'd use TSC for this if I could (and fall back to HPET if TSC can't be used, and fall back to ACPI's counter if both HPET and TSC can't be used).
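Whatever counter is used (TSC where possible, otherwise HPET or ACPI's counter), the accounting itself is the same save-and-subtract scheme rdos describes. A minimal sketch in raw ticks with hypothetical names; converting ticks to time is left to the counter-specific code.

```c
#include <stdint.h>

/* Hypothetical per-task accounting in raw counter ticks */
struct task_time {
    uint64_t used_ticks;     /* total ticks charged to this task */
    uint64_t switched_in_at; /* counter value when it last started running */
};

void task_switch_in(struct task_time *t, uint64_t now)
{
    t->switched_in_at = now;
}

void task_switch_out(struct task_time *t, uint64_t now)
{
    /* unsigned subtraction survives one wrap of a full 64-bit counter;
     * narrower counters (e.g. 24-bit) would need masking first */
    t->used_ticks += now - t->switched_in_at;
}
```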
rdos wrote:
Brendan wrote:
Some sort of timer that the scheduler can use to know when a task has used all of the time it was given (some precision would be nice, accuracy doesn't matter much, don't need to be able to read the current count, do need an IRQ, "one shot" IRQ would be nice)
The APIC timer, if available, is most suitable for this. If timers also use the APIC timer, the timeouts need to be combined, but this works. If the APIC timer is not available, HPET or PIT channel 0 can be used (most often combined with the timer function). The most efficient allocation is to use the APIC timer for preemption and HPET comparators for timers.
The most efficient way would be using performance monitoring counters for the scheduler, local APIC timer for high precision "sleep()", and HPET or PIT or TSC for low precision timing (e.g. network packet timeouts).
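Using a performance monitoring counter as the scheduler's "one shot" timer relies on the counter counting upwards and raising an interrupt when it overflows. To get an interrupt after N more events (for example unhalted core cycles, which is an assumption here), preload it with 2^width - N. Only the arithmetic is sketched; how the value is written is CPU-specific and not shown, and the function name is hypothetical.

```c
#include <stdint.h>

/* Preload value so that a `width`-bit up-counter overflows after
 * `events` more events (events is clamped to 1..2^width - 1). */
uint64_t pmc_preload(uint64_t events, unsigned width)
{
    const uint64_t mask = (width >= 64) ? UINT64_MAX : ((1ull << width) - 1);
    if (events == 0)
        events = 1;
    if (events > mask)
        events = mask;
    return (mask - events + 1) & mask;   /* 2^width - events */
}
```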
rdos wrote:
Brendan wrote:
Some sort of timer to keep track of power management (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ, "one shot" IRQ would be nice)
I'd use a normal timer (as of above) for this. It doesn't need its own hardware resource.
Same "jack of all trades" problem (screwing up high precision timing by using it for low precision timing).
rdos wrote:
Brendan wrote:
(optional) Some sort of timer to use for "poor man's profiling" (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ and something capable of generating an NMI would be nice, "one shot" IRQ would be nice for pseudo-random delays)
This is more or less also the normal timers I have.
Your normal "generic timer" stuff uses NMI? Sounds seriously painful to me.
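The "pseudo-random delays" for poor man's profiling are there so samples don't fall into lock step with periodic work in the code being profiled. A sketch, assuming a xorshift32 PRNG (any cheap generator would do) picking each one-shot delay uniformly in [mean - jitter, mean + jitter]; all names are hypothetical.

```c
#include <stdint.h>

struct sampler { uint32_t state; };  /* state must be non-zero */

static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Next one-shot delay; requires jitter <= mean (slight modulo bias is
 * acceptable for this purpose). */
uint32_t next_sample_delay_us(struct sampler *sm, uint32_t mean, uint32_t jitter)
{
    uint32_t span = 2 * jitter + 1;
    return mean - jitter + xorshift32(&sm->state) % span;
}
```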
rdos wrote:
Brendan wrote:
(optional) Some sort of timer to use for a watchdog (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ and something capable of generating an NMI would be nice, fixed frequency is fine)
Same as above. This is a normal timer.
Yes - same as above (seriously flawed).
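A dedicated watchdog is typically driven by a separate low-rate timer (ideally NMI-capable) that checks whether each CPU is still making progress. A minimal sketch of that check, assuming every CPU bumps its own heartbeat from normal code such as the scheduler; the names are hypothetical, and a real kernel would need atomic reads of the heartbeat.

```c
#include <stdbool.h>
#include <stdint.h>

struct watchdog_cpu {
    uint64_t heartbeat;  /* incremented by the CPU itself */
    uint64_t last_seen;  /* heartbeat value at the previous check */
    unsigned stalled;    /* consecutive checks without progress */
};

/* Called from the watchdog timer's (NMI) handler for each CPU. Returns
 * true once the heartbeat has not moved for `limit` consecutive checks. */
bool watchdog_check(struct watchdog_cpu *w, unsigned limit)
{
    if (w->heartbeat != w->last_seen) {
        w->last_seen = w->heartbeat;
        w->stalled = 0;
        return false;
    }
    return ++w->stalled >= limit;
}
```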
rdos wrote:
Given the complex selection rules as of above, I doubt it is possible to write something generic that selects the best resources.
Given the complex selection rules above, I doubt it's possible to avoid writing something that selects the best resources.
The only other alternative that I can think of is telling unsuspecting end users "I'm not smart enough to figure it out, and even though you probably know less than me, I'm making you solve my design failure via compile-time idiocy".
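Selecting resources at boot doesn't have to be complicated. One possible approach (a hypothetical sketch, not a definitive design) is a per-role preference list, filled in two passes: the first pass gives each role a timer nobody else uses (avoiding the "jack of all trades" problem), and the second lets a role share a timer only if it would otherwise have none. The preference orders follow the suggestions in this thread, but the exact lists are assumptions.

```c
enum source { SRC_NONE = -1, SRC_TSC, SRC_LAPIC, SRC_PMC, SRC_HPET,
              SRC_ACPI_PM, SRC_PIT, SRC_RTC, SRC_COUNT };
enum role { ROLE_SCHED, ROLE_SLEEP, ROLE_TIMEOUT, ROLE_COUNT };

/* Hypothetical preference lists, best first, terminated by SRC_NONE */
static const enum source prefs[ROLE_COUNT][5] = {
    [ROLE_SCHED]   = { SRC_PMC, SRC_LAPIC, SRC_HPET, SRC_PIT, SRC_NONE },
    [ROLE_SLEEP]   = { SRC_LAPIC, SRC_HPET, SRC_PIT, SRC_NONE },
    [ROLE_TIMEOUT] = { SRC_HPET, SRC_PIT, SRC_RTC, SRC_NONE },
};

/* usable[s] != 0 if source s was detected and passed sanity tests */
void assign_roles(const int usable[SRC_COUNT], enum source out[ROLE_COUNT])
{
    int taken[SRC_COUNT] = {0};
    /* pass 1: only sources no other role has taken */
    for (int r = 0; r < ROLE_COUNT; r++) {
        out[r] = SRC_NONE;
        for (int i = 0; prefs[r][i] != SRC_NONE; i++) {
            enum source s = prefs[r][i];
            if (usable[s] && !taken[s]) { out[r] = s; taken[s] = 1; break; }
        }
    }
    /* pass 2: share an already-used source rather than go without */
    for (int r = 0; r < ROLE_COUNT; r++) {
        if (out[r] != SRC_NONE)
            continue;
        for (int i = 0; prefs[r][i] != SRC_NONE; i++)
            if (usable[prefs[r][i]]) { out[r] = prefs[r][i]; break; }
    }
}
```

On a machine without usable performance counters this ends up with the local APIC for the scheduler and HPET for sleeps; on an old PIT+RTC machine it uses the PIT for the scheduler (shared with sleeps) and the RTC for timeouts.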
rdos wrote:
Additionally, for such an algorithm to work there is a need to know several variables:
1. Does the hardware resource work?
2. Does the hardware resource trigger the IRQs it is supposed to trigger?
3. Does power-management affect frequencies?
These can at best be tested, but I'm not sure how to test whether power management will affect frequencies. If a resource is per-system and legacy, it probably won't be affected by power management, but these are probabilities, not parameters that are easily fed into an algorithm.
The conservative way would be to assume the answer to all those questions is "no" unless you know otherwise. For S4 (hibernate) and S3 (suspend) you can assume all timers lose their state (as you're effectively turning everything off, except RAM for S3) and reinitialise your timing (e.g. starting from getting the new time and date from the RTC) when you come out of S3/S4. For S2 only the CPUs are turned off, so you should never need to worry about PIT, RTC, HPET or ACPI's counter (and only have to worry about TSC and local APIC). For TSC and local APIC behaviour, you can use CPUID (either the "TSC invariant" flag, or the "vendor:family:model"). That probably solves 99% of the problems, and the remaining problems can easily be handled with some special-case work-arounds if/when they occur.
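Those rules can be written down as a small decision function. A sketch with hypothetical names, assuming the invariant TSC and ARAT flags have been read from CPUID; anything not known to survive (including states this sketch doesn't model) is treated as lost.

```c
#include <stdbool.h>

struct timer_reinit {
    bool tsc;          /* re-sync / recalibrate the TSC */
    bool lapic_timer;  /* re-arm / recalibrate the local APIC timer */
    bool chipset;      /* PIT, HPET, ACPI counter, RTC IRQ setup */
    bool reread_rtc;   /* fetch time and date from the RTC again */
};

/* state: 0 = returning from a CPU idle (C-)state, 2/3/4 = ACPI S2/S3/S4 */
struct timer_reinit after_wake(int state, bool tsc_invariant, bool arat)
{
    struct timer_reinit r = { true, true, true, true };  /* conservative default */
    if (state == 0) {
        r.chipset = r.reread_rtc = false;
        r.tsc = !tsc_invariant;   /* invariant TSC keeps counting in C-states */
        r.lapic_timer = !arat;    /* ARAT: APIC timer keeps running in C-states */
    } else if (state == 2) {
        r.chipset = r.reread_rtc = false;  /* only the CPUs were turned off */
    }
    return r;  /* S3, S4 and anything unknown: assume all timer state lost */
}
```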
Cheers,
Brendan