Thanks guys! Great ideas about reading a time counter in the system rather than only keeping count of interrupts!
It might be a long time before I can put the TSC ratio idea into (regular) practice though, as QEMU doesn't seem to pass through a valid leaf 15h. Leaf 0h reports a maximum leaf that covers 15h, but all registers of 15h read out as zeros.
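For anyone following along, here is roughly the check I have in mind, as a minimal sketch. The function name `tsc_hz_from_leaf15` is made up, and it takes the register values as plain arguments so it doesn't depend on any particular CPUID wrapper. It relies on the documented layout of leaf 15h: EAX is the ratio denominator, EBX the numerator, ECX the crystal frequency in Hz.

```c
#include <stdint.h>

/*
 * Hypothetical helper (name and signature are my own, not from any library).
 * max_leaf       : EAX returned by CPUID leaf 0h (highest basic leaf)
 * eax, ebx, ecx  : registers returned by CPUID leaf 15h
 * Returns the TSC frequency in Hz, or 0 if leaf 15h gives nothing usable.
 */
uint64_t tsc_hz_from_leaf15(uint32_t max_leaf,
                            uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    if (max_leaf < 0x15)
        return 0;               /* leaf 15h not enumerated at all */
    if (eax == 0 || ebx == 0)
        return 0;               /* TSC/crystal ratio not reported */
    if (ecx == 0)
        return 0;               /* crystal frequency not reported */

    /* TSC Hz = crystal Hz * numerator / denominator, done in 64 bits */
    return (uint64_t)ecx * ebx / eax;
}
```

With the all-zero registers I see under QEMU this returns 0, so the caller has to fall back to calibrating against some other timer.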
Also noticed that the issue is more in my tracking than in the initial calibration. If a user space program that mostly just calls syscalls repeatedly (including some heavyweight ones) runs for the whole hour, then the clock ends up 5s slow. But if a user space program with a more 'normal' mix of syscalls and user space operations (initializing and comparing large buffers as part of the file sys test) runs for the whole hour, then the clock actually tracks quite well. It usually ends up 1 second-ish faster than the host, probably because the 1 lost interrupt during calibration makes it think that the timer is slower than it really is.
I guess the trick is to read some counter often enough in the normal timer handler that it doesn't roll over more than once between 2 reads.
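Something like the following is what I'm picturing for the handler side. It is only a sketch: the `wide_counter` struct and function names are hypothetical, and the raw reading is passed in so it works for whatever counter ends up being used. It assumes the counter is free-running, counts up, and that fewer than one full counter period passes between two updates.

```c
#include <stdint.h>

/* Hypothetical state for widening an N-bit wrapping counter to 64 bits. */
struct wide_counter {
    uint64_t total;  /* ticks accumulated since init */
    uint64_t last;   /* previous raw reading, already masked */
    uint64_t mask;   /* (1 << bits) - 1 for the hardware counter width */
};

/* bits: width of the hardware counter; raw: its current reading. */
void wide_counter_init(struct wide_counter *c, unsigned bits, uint64_t raw)
{
    c->mask  = (bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    c->last  = raw & c->mask;
    c->total = 0;
}

/* Call from the timer handler with a fresh raw reading. */
uint64_t wide_counter_update(struct wide_counter *c, uint64_t raw)
{
    raw &= c->mask;
    /* Modular subtraction gives the right delta across a single wrap. */
    c->total += (raw - c->last) & c->mask;
    c->last = raw;
    return c->total;
}
```

For example, with a 24-bit counter, going from a reading of 0xFFFFF0 to 0x000010 adds 0x20 ticks rather than a huge negative jump. Missed interrupts then only delay an update instead of losing time, as long as the gap stays under one counter period.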
Now onto the new quest of figuring out the rate of some counter without CPUID 15h.