mallard wrote:
Brendan wrote:
unexpected hot-unplug under load
The only "proper" way to handle that is to alert the user that their data is most likely lost and the filesystem corrupted. The "dirty" bit on most reasonable filesystems provides an indication that recovery is necessary. If it occurs on your "system drive" (the device that contains vital OS files or swap space) you have no choice but to "panic" (or maybe terminate every process that has data swapped-out to the now-unavailable device, assuming that's nothing critical). You might be able to get away with a "please re-connect that device immediately" type message in some very limited cases.
Why only in some very limited cases? Assuming the device itself is left in a sane state and the "core OS" is in RAM, what issues are there? You can halt any process that tries to access memory swapped out to that device.
Especially for non-system drives this shouldn't even be that hard. And no data should be lost: all the queued write operations can be completed once the device is plugged back in.
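A minimal sketch of that idea, in Python for brevity and assuming a toy in-memory device model (`RemovableDevice`, `DeferredWriteQueue` and their methods are hypothetical names, not any real kernel API): writes issued while the device is absent are queued instead of failed, reads that can't be served make the caller wait, and the queue is replayed in submission order on reconnect.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RemovableDevice:
    """Toy hot-pluggable block device (hypothetical model)."""
    blocks: dict = field(default_factory=dict)  # block number -> data
    present: bool = True


class DeferredWriteQueue:
    """Queue writes while the device is unplugged; replay them on replug."""

    def __init__(self, device: RemovableDevice):
        self.device = device
        self.pending = deque()  # (block, data) pairs in submission order

    def write(self, block: int, data: bytes) -> None:
        if self.device.present:
            self.device.blocks[block] = data
        else:
            # Device gone: keep the request instead of failing it.
            self.pending.append((block, data))

    def read(self, block: int):
        # The newest queued write for this block wins.
        for queued_block, data in reversed(self.pending):
            if queued_block == block:
                return data
        if not self.device.present:
            # A real OS would put the calling process to sleep here
            # until the device comes back.
            raise BlockingIOError("device absent; wait for replug")
        return self.device.blocks.get(block)

    def unplug(self) -> None:
        self.device.present = False

    def replug(self) -> None:
        self.device.present = True
        # Replay queued writes in the order they were issued.
        while self.pending:
            block, data = self.pending.popleft()
            self.device.blocks[block] = data
```

A real implementation would also have to bound the memory held by the queue and check that the re-inserted device is actually the same one before replaying anything.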
mallard wrote:
Brendan wrote:
TRIM
Except on very early SSD models, TRIM ranges from useless to dangerous. Competent SSDs' internal garbage-collection routines are far more capable, and TRIM often carries a severe and unpredictable performance penalty (especially on devices that only support the non-queued version). On the best devices it's simply a no-op; on the worst it's buggy and corrupts your filesystem. Unless you have the resources to test every SSD (family) on the market to find the small minority of devices where it's correctly implemented and actually improves performance, it's best avoided.
Any references for that? I wasn't aware of TRIM being useless, and a quick Google search turned up nothing. The Wikipedia article on TRIM also says that it's useful.
I know some devices have buggy firmware, but that applies to pretty much all hardware. You don't need to test every device out there; it's reasonable to expect devices to work according to spec and to blame the manufacturer for hardware defects. You can also just look at the Linux source to find out which devices have issues, and the Wikipedia article lists them too. It's really only a handful of models/families.
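A sketch of how an OS might consult such a per-model quirk list before issuing TRIM, assuming the drive's reported model string is matched against glob patterns. The table entries, `TrimQuirk`, `trim_quirks` and `trim_mode` are all hypothetical; Linux keeps a comparable per-model table in its libata driver, but its actual entries and flag names are not reproduced here.

```python
from enum import Flag, auto
from fnmatch import fnmatchcase


class TrimQuirk(Flag):
    NONE = 0
    NO_QUEUED_TRIM = auto()  # only use the non-queued TRIM command
    NO_TRIM = auto()         # never issue TRIM at all


# Hypothetical entries; a real table would list known-bad models/families.
QUIRK_TABLE = [
    ("ExampleSSD A1*", TrimQuirk.NO_QUEUED_TRIM),
    ("BrokenFlash v1.*", TrimQuirk.NO_TRIM),
]


def trim_quirks(model: str) -> TrimQuirk:
    """Combine the quirk flags of every pattern matching the model string."""
    flags = TrimQuirk.NONE
    for pattern, quirk in QUIRK_TABLE:
        if fnmatchcase(model, pattern):
            flags |= quirk
    return flags


def trim_mode(model: str, supports_queued_trim: bool) -> str:
    """Decide how (or whether) to issue TRIM to this drive."""
    quirks = trim_quirks(model)
    if TrimQuirk.NO_TRIM in quirks:
        return "none"
    if supports_queued_trim and TrimQuirk.NO_QUEUED_TRIM not in quirks:
        return "queued"
    return "non-queued"
```

Drives that aren't in the table are trusted to follow the spec, which is the "blame the manufacturer" default described above.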
mallard wrote:
If you also have an encrypted filesystem (something any serious OS should support in 2017), it's even worse: even in the unlikely case that TRIM is implemented correctly by the hardware and necessary for that device, it reveals metadata about which blocks are/are not used by your filesystem over an insecure channel. This metadata is then stored unencrypted in the "private space" on the SSD, where it's easily available to any sufficiently determined attacker.
I agree encryption is important. Short of completely crippling performance, I'm not sure there's much you can do.
AFAIK, even if you don't use TRIM you still have the same issue. "Better" SSDs maintain performance by keeping an internal pool of unused blocks (say, a 512GiB drive actually has 512+x GiB of storage but only exposes 512GiB to the user). They write into that pool and can then deallocate (TRIM) the old, previously used blocks. The main issue is that erasing a block is rather slow, so you have to keep a pool of already-erased blocks to get decent performance. Does it matter whether the FS/OS does this or the SSD does it internally?
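A toy model of that over-provisioning scheme, assuming a simplified flash translation layer in which one erase block equals one logical block (real NAND erases much larger units than it writes). `ToyFTL` and its methods are hypothetical. In this model an overwrite and a TRIM both end up putting a physical block on the stale list; TRIM just lets the drive learn earlier that a block is dead, including blocks the filesystem has freed but never overwritten.

```python
from collections import deque


class ToyFTL:
    """Toy flash translation layer with over-provisioning (hypothetical).

    Simplification: one erase block == one logical block.
    """

    def __init__(self, logical_blocks: int, spare_blocks: int):
        if spare_blocks < 1:
            raise ValueError("need at least one spare block")
        physical = logical_blocks + spare_blocks
        self.logical_blocks = logical_blocks  # capacity exposed to the host
        self.map = {}                          # logical block -> physical block
        self.data = [None] * physical
        self.erased = deque(range(physical))   # pre-erased, ready to write
        self.stale = set()                     # dead data, still needs an erase

    def write(self, lba: int, value) -> None:
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("LBA out of range")
        if not self.erased:
            # Slow path: no pre-erased block left, so erase on the write path.
            self.collect_garbage()
        phys = self.erased.popleft()
        self.data[phys] = value
        old = self.map.get(lba)
        if old is not None:
            self.stale.add(old)  # old copy is dead but not yet erased
        self.map[lba] = phys

    def trim(self, lba: int) -> None:
        # TRIM just tells the drive early that this block's contents are dead.
        old = self.map.pop(lba, None)
        if old is not None:
            self.stale.add(old)

    def read(self, lba: int):
        phys = self.map.get(lba)
        return None if phys is None else self.data[phys]

    def collect_garbage(self) -> None:
        # Normally a background job: erase stale blocks, refill the pool.
        for phys in sorted(self.stale):
            self.data[phys] = None
            self.erased.append(phys)
        self.stale.clear()
```

Note that a stale block still holds its old contents until `collect_garbage` erases it, whichever path made it stale.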
I haven't really done much research into the security implications of TRIM with respect to encryption, so I'd like to know whether the SSD's internal "TRIM pool" already has the same effect on encryption, and, if it does, whether that means using TRIM does no additional harm.
As for SMART, I've never liked that it's not as good as it should be. Realistically there are only a handful of failure modes for HDDs (and probably just a handful for SSDs as well), and there are only a few HDD manufacturers, each with decades of experience. They should be able to detect an imminent failure in many, if not almost all, cases, but it seems they really don't. One problem, of course, is the conflict of interest with consumer drives: notifying the user of an imminent failure (which might still be months away) might cause the user to replace the drive under warranty, whereas not notifying the user means the eventual data loss may happen after the warranty has expired. So the manufacturer has no real incentive to provide good SMART...