onlyonemac wrote:
LtG wrote:
1) If the system uses swapping, this can happen at any later time anyway, so it's never guaranteed that a part of the app is in RAM
True, although I believe that some operating systems make it possible to "lock" time-critical parts of the application in RAM.
Would this not be the same? If there's a special time-critical section that must be locked in RAM, then it can make the same requirement with lazy loading too. I would think this is also needed for security stuff, like crypto keys: you don't want them ending up on the HDD. Here it might also be useful to be able to hint at the reason why RAM-locked memory is needed, whether it's for performance or for security/criticality, the latter being a "promise" that causes the process to prefer dying over being swapped to the HDD.
onlyonemac wrote:
LtG wrote:
2) Who is doing the verification? The app or the system? If the app, like some games do, then it needs to read all of its own memory, which will trigger #PF's, and it can generate the hash, no issue. If the system, then clearly this is a system that cares about integrity; most don't seem to. If the system cares about integrity then I don't think it's necessarily a good idea to only check the hash of apps, but of all files. If the hash is checked for all files then you might want to do it in block sizes (like file system clusters, maybe 4kB, 16kB, etc), and thus the #PF always loads the mmap'd stuff in said block/cluster sizes and calculates the hash.
It didn't occur to me that verification of this form is not common, but if it is done then memory-mapping the application file and then computing the checksum seems a bit of a "sideways" way of doing it; it would be better to just read the application file into memory and then compute the checksum. But I guess that's ultimately a matter of personal choice, and not much of an issue anyway considering that this isn't common. (I wasn't really thinking of verifying all files, but if that was done then yes, it should be done at the filesystem level, where the filesystem can return an error if the checksum fails. That wouldn't catch malicious altering of the data on disk, though, as the filesystem checksum could be updated accordingly, whereas I was thinking of verification against another trusted source.)
I'm actually not 100% sure how common it is, but I think it's mainly/only done at the FS level, as I think it should be. Apps are more or less useless if their data is corrupt. Also, if the FS does it, then the lazy loading and #PF handling don't need to care about it, so I don't think it's sideways. Assuming you do it for all files, then, referring to my previous point about large files, would you really want to have checksums/hashes at a file level or at a block level (whatever size that may be)?
I might not be understanding what you mean by checking the checksum of the app. You now mentioned "another source", which leads me to believe you might mean some type of signature? There are at least two separate reasons for doing integrity verification: checking that the file hasn't become corrupt on the HDD, and checking that it hasn't been tampered with, where arguably restricting the latter to apps only might make sense.
For the "another source" you mentioned, what source? If the OS can protect the checksum in some special "installed apps" file, why can't the OS protect it in the FS itself? And if it can, then I think it makes most sense to check all files for corruption (caused by anything), and to check at some block size, due to the overhead of checking large files.
onlyonemac wrote:
LtG wrote:
3) I think you need to prepare for this in any case; if the app is installed on the removable device, chances are that some of its data is there as well. The app might at any moment be in the middle of reading said data, and if the user removes the device then the app has the same issue.
It's the application's responsibility to make sure that it loads all critical data at the time that it is started, or is able to work without that data if it is unavailable at a later time. An application can't be expected to account for removal of the storage device at any time during its execution; otherwise we're pretty much back to the "memory-map the file and then force the whole thing to load via page faults so that it's guaranteed to be available" situation, where reading the file into memory the proper way would be better.
If the file will be needed fully sequentially, for example, it would be useful to hint that to the OS, so it can go and load all of it in preparation; still, I would want another core to do it, if available. If not available, and the hint is reliable, it might make sense to do it in one large load to avoid switching between app data (or code) usage and loading of said data/code. For normal apps, though, the app usually shows some dialog and won't need all of its own code until the user does something. So I'd much prefer the app be shown instantly; it might take a few seconds to a few minutes of my typing and clicking before the app actually needs the rest of its own code, and the loading happens in the background. Overall, you should get equal or better responsiveness with little to no extra overhead. Best case, the app starts instantly and another core starts loading data/code to RAM while you use the app normally.
Regarding the checksums and lazy-loading:
- FS is responsible for doing checksums, these happen every 4kB
- Paging just does paging, doesn't care about mmap, FS or checksums
- Mmap reserves virtual memory for the entire file, loads the first 4kB page from the FS (checksum done by the FS) and marks the rest of the pages as NP (or anything else that causes a #PF when accessed)
- App loader mmap's app in memory and jumps to start/main()
The first page is present and already checked for integrity, so no issues. From then on, whenever the app accesses a page that causes a #PF we need to do something, and the something depends on the reason: if it's due to mmap being lazy, we just ask the FS for the block we need, place it in RAM, adjust the page table entry and return to the app. Note that it doesn't matter whether the mmap'd memory is something the app loader did (the app binary) or an mmap the app itself requested for data files it needs; all of these are dealt with identically, and all will have their contents verified.
The only issue is what to do if the verification fails, if the data can't be read, etc., which is exactly what this topic is about. The easy way out would be to not support mmap, but then all apps will need to do the checking themselves, and need to (or should) do it a page at a time (or at least an HDD sector at a time); otherwise you end up with multiple pieces of the system/app doing their own inefficient buffering.
All in all, I'm pretty convinced that signals/exceptions/callbacks are the only sane way to handle issues: report the issue to the app so it can decide how to proceed, as it would have to if it used a plain read() instead of mmap(). The only difficulty (for app devs) is that at that point the app might be more "committed" to whatever it is doing and, since resolving the issue is next to impossible, now it has to "uncommit" itself.
NOTE: Even with read() the app can easily become "committed" and face the same issues. For example, the app starts to process a large video file and halfway through there's an unreadable sector on the HDD; now it needs to decide what to do even though it's halfway through.
PS. I think Windows still doesn't use checksums/hashes for FS integrity in general; not sure about ext2/3/4. ZFS, on the other hand, should have pretty comprehensive hash usage for pretty much everything.