How to store data that survives reboots?

Posted: Thu Mar 20, 2025 5:40 pm
by SpuriousCrow
Hello!

An operating system project of mine (that importantly relies on another coexisting operating system to run) can sometimes lose access to logging when critical system errors occur. For example, the coexisting operating system may crash while interacting with mine, and in those cases I can't get logs out before the processor resets.

In most cases, I can consume logs from the machine using serial ports or a circular buffer (accessed via an external DMA device), but I've encountered a unique situation where the coexisting operating system now commandeers the serial port and runs a hypervisor which prevents me from getting either serial port logs or DMA access before the device resets.

Would anybody have any suggestions for how I could store the diagnostic data I collect about crashes at a location that would survive reboots caused by something like a triple fault?

Ideally, I would also like this data to persist across both warm and cold resets, including cases where the machine briefly loses power before coming back online.

So far I've explored:
  • placing these diagnostic structures high in memory (our example system has 32GB of RAM; after inspecting the memory map that UEFI provides to the OS, I put the structure in available system memory at around the 31GB mark), but this memory was cleared after rebooting into a UEFI shell (no memory test was performed, per the boot settings)
  • saving the UEFI runtime functions for Get/SetVariable, but each of our diagnostic structures is ~2KB in length and we have one for each of the 32 processors on the test machine, which is too much memory to store in those NVRAM variables; we also don't have those regions mapped into our OS page tables anyway
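
For the high-memory approach, one way to tell "the memory was wiped" apart from "the record survived" after a reboot is to make each record self-validating. The following is only a minimal sketch under assumptions: the struct layout, the magic value, the choice of an FNV-1a hash, and all names are hypothetical, and only the 2 KB record size comes from the description above. It does nothing to stop firmware from clearing RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIAG_MAGIC        0x44494147u  /* arbitrary illustrative marker value */
#define DIAG_PAYLOAD_SIZE 2032u        /* 2048 bytes total minus the 16-byte header */

/* Hypothetical per-processor crash record. */
struct diag_record {
    uint32_t magic;     /* DIAG_MAGIC once the record is completely written */
    uint32_t cpu;       /* index of the processor that wrote the record */
    uint32_t length;    /* number of valid bytes in payload */
    uint32_t checksum;  /* FNV-1a hash of the valid payload bytes */
    uint8_t  payload[DIAG_PAYLOAD_SIZE];
};

static_assert(sizeof(struct diag_record) == 2048, "record should be exactly 2 KB");

/* 32-bit FNV-1a hash, used here only as a cheap integrity check. */
static uint32_t diag_hash(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;        /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;              /* FNV-1a prime */
    }
    return h;
}

/* Fill in a record. Returns 0 on success, -1 if the data does not fit. */
int diag_record_write(struct diag_record *rec, uint32_t cpu,
                      const void *data, size_t len)
{
    if (len > DIAG_PAYLOAD_SIZE)
        return -1;
    memset(rec, 0, sizeof *rec);
    rec->cpu = cpu;
    rec->length = (uint32_t)len;
    memcpy(rec->payload, data, len);
    rec->checksum = diag_hash(rec->payload, len);
    /* The magic is assigned last in program order. A real kernel would also
     * need barriers and a cache flush so the data actually reaches DRAM. */
    rec->magic = DIAG_MAGIC;
    return 0;
}

/* Returns 1 if the record looks intact, 0 if it was wiped or corrupted. */
int diag_record_is_valid(const struct diag_record *rec)
{
    if (rec->magic != DIAG_MAGIC)
        return 0;
    if (rec->length > DIAG_PAYLOAD_SIZE)
        return 0;
    return rec->checksum == diag_hash(rec->payload, rec->length);
}
```

After a reset, the early boot path would call `diag_record_is_valid()` on each of the 32 records before anything else reuses that memory. A zeroed region fails the magic check, and partially overwritten data fails the checksum.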
Some things I have not yet explored:
  • using the disk, which I believe could become complicated depending upon which OS coexists with mine, along with what hardware is used
  • using the TPM, but it doesn't seem to have enough space available for use, and may not be present on most hardware
  • using SPI flash memory
  • using ACPI NVS memory
  • using on-board controller memory, which I heard was possible but seems like a bad idea
Perhaps there's some way to utilize the UEFI EDK2 facilities to pre-allocate some block on the file system or use a file on an external USB drive where you can predict the hardware/file system format?

If anybody has ideas I would love to workshop some of them.

Thank you all for your time :)

Re: How to store data that survives reboots?

Posted: Thu Mar 20, 2025 9:42 pm
by sounds
If the machine is fairly recent, Intel (FSP) and AMD (AGESA) intentionally wipe all system memory. You could try some of the links in https://superuser.com/questions/838797

Re: How to store data that survives reboots?

Posted: Fri Mar 21, 2025 1:26 am
by araxestroy
This is one of those problems that seems easy to solve on paper but once you start writing down notes, you come back to the issue of "something else owns the machine". If a coexisting kernel can fully deadlock the machine in a way that even DMA fails, you've basically run out of options.

There is an oft-repeated quote from the Hagakure that tells us that even if your head is cut off, you should be able to do one last action with certainty. Unfortunately you are asking how to perform that last action while someone else has possessed your body.

To which I ask: Exactly how important is this coexisting operating system? Because it seems to be the source of your real problem: that you don't own the machine you're running on.

Re: How to store data that survives reboots?

Posted: Fri Mar 21, 2025 11:15 am
by Octocontrabass
SpuriousCrow wrote: Thu Mar 20, 2025 5:40 pm
"An operating system project of mine (that importantly relies on another coexisting operating system to run)"
Why is the coexisting OS not running as the guest inside your OS's hypervisor where you can easily prevent it from interfering with your error logging?
SpuriousCrow wrote: Thu Mar 20, 2025 5:40 pm
"but each of our diagnostic structures are ~2KB in length and we have one for each of the 32 processors on test machine; which is too much memory to store in those NVRAM variables"
Have you tried compressing these data structures? Would ACPI ERST be a better fit than UEFI variables?
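
As a rough illustration of the compression idea, here is a minimal run-length encoding sketch in C. It assumes the diagnostic structures contain long runs of identical bytes (for example zero padding), which the thread does not confirm. The function names and the (count, value) pair format are hypothetical, and a real implementation might pick a different scheme entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode src as (run length, byte value) pairs, with runs capped at 255.
 * Returns the number of bytes written to dst, or 0 if dst is too small
 * (an empty input also yields 0). */
size_t rle_encode(const uint8_t *src, size_t src_len,
                  uint8_t *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    size_t i = 0;
    while (i < src_len) {
        uint8_t value = src[i];
        size_t run = 1;
        while (i + run < src_len && src[i + run] == value && run < 255)
            run++;
        if (out + 2 > dst_cap)
            return 0;
        dst[out++] = (uint8_t)run;
        dst[out++] = value;
        i += run;
    }
    return out;
}

/* Reverse of rle_encode. Returns the number of bytes written to dst,
 * or 0 if the input is malformed or dst is too small. */
size_t rle_decode(const uint8_t *src, size_t src_len,
                  uint8_t *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    if (src_len % 2 != 0)
        return 0;
    for (size_t i = 0; i < src_len; i += 2) {
        size_t run = src[i];
        if (run == 0 || out + run > dst_cap)
            return 0;
        for (size_t j = 0; j < run; j++)
            dst[out++] = src[i + 1];
    }
    return out;
}
```

The trade-off is that data with few repeated bytes can grow to twice its size under this format, so it only helps if the structures really are sparse. Measuring the encoded size of a few real crash records would show whether the result fits in the available non-volatile storage.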
SpuriousCrow wrote: Thu Mar 20, 2025 5:40 pm
"using the disk, which I believe could become complicated depending upon which OS coexists with mine, along with what hardware is used"
As with the serial port, you can't access the disk when there's another OS running its own drivers to access the disk. If you can come up with a solution to that problem, though, you can create a reserved partition on the disk to store your error logs.
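
If the disk-ownership problem can be solved, the layout of a reserved partition can be kept very simple. The sketch below assumes 512-byte logical sectors and one fixed 2 KB slot per processor. The function name, parameters, and slot scheme are hypothetical, and it deliberately says nothing about how the sectors actually get written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE      512u    /* assumed logical sector size */
#define RECORD_SIZE      2048u   /* ~2 KB per processor, as described in the thread */
#define SECTORS_PER_SLOT (RECORD_SIZE / SECTOR_SIZE)

/* Compute the first LBA of the slot belonging to a given processor.
 * part_start_lba and part_sectors describe the reserved partition.
 * Returns 0 on success, -1 if the slot would not fit in the partition. */
int diag_slot_lba(uint64_t part_start_lba, uint64_t part_sectors,
                  uint32_t cpu, uint64_t *lba_out)
{
    assert(lba_out != NULL);
    uint64_t offset = (uint64_t)cpu * SECTORS_PER_SLOT;
    if (offset + SECTORS_PER_SLOT > part_sectors)
        return -1;
    *lba_out = part_start_lba + offset;
    return 0;
}
```

With these assumptions, 32 processors need 128 sectors (64 KB) in total, and each processor writes only to its own slot, so no coordination between processors is needed at crash time.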
SpuriousCrow wrote: Thu Mar 20, 2025 5:40 pm
"using the TPM, but it doesn't seem to have enough space available for use, and may not be present on most hardware"
Even if the TPM isn't present, there might be a convenient LPC bus connector intended for the TPM where you could connect a custom piece of hardware to store your error logs.