I had an idea ages ago, which I posted on my blog, about a distributed file hosting system. The post is still up there, but the idea is incredibly unpolished. I found myself thinking about it in the shower this morning, though, and wanted to get you guys' opinions.
So, the idea is this: a file hosting system, a la Dropbox et al., but instead of a central hosting company storing the files, the people who use the service host them. For example, if you buy a 2TB drive and sign up to the service, you can dedicate that drive to it and get 2TB of hosting space in return, somewhere else in the world. Now, this sounds pretty much like a waste of time, but hear me out. You could access your files anywhere the service is running (which, if it becomes popular enough, could be anywhere...) by typing in your ID#/username (I haven't decided which to go for yet). Furthermore, redundant backups would be made, with CRCs generated. Chances are I'd go for something similar to a RAID array, so that it doesn't impact disk space too much.
Obviously it would have to be encrypted, for security reasons, and I would likely do it with a key file, similar to SSH. CRC hashes would be taken of both the encrypted and unencrypted files. So, when the user stores a new file, the system would generate a CRC of the file, encrypt it using the key file, download a list of (for example) 20 hosts with enough space to store the file, and send it to two of them along with the CRC. Each host then generates the same CRC on the encrypted file and, if the two match, stores it. If they don't match, the host requests that the file be resent.

The user can then view all of his/her files stored on the network via a mountpoint (likely I will write a driver that can be mounted from the fstab; Windows might come later, but I'm a Linux developer when not OSDevving). When a file is requested, one of the machines storing it is selected at random and asked to send the encrypted file back. That machine checks the user's credentials, to make sure the file really belongs to them (a custom file system is probably in order, likely a modification of ext4 or Btrfs), regenerates the CRC hash, and checks that it matches the one on disk. If it does, it sends the hash and the encrypted file to the user's machine, which then generates another hash. If the on-disk check fails, the user is told, and the other machine is used instead. If the user-generated hash doesn't match the one sent down the wire, a retransmit is requested. The user's machine then decrypts the file with the key file and checks the unencrypted hash as well.
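To make the store-side handshake concrete, here's a minimal sketch in Python. Everything here is hypothetical: `toy_encrypt` is a placeholder XOR cipher standing in for whatever real cipher the key file would drive, `host_store` models what a remote host does on receipt, and the "disks" are just dicts.

```python
import zlib

def crc(data: bytes) -> int:
    """32-bit CRC, standing in for whichever checksum is chosen."""
    return zlib.crc32(data) & 0xFFFFFFFF

def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Placeholder repeating-key XOR; a real system would use a proper
    cipher (e.g. AES) keyed from the key file. XOR is its own inverse,
    so the same function also decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

def host_store(host_disk: dict, name: str, ciphertext: bytes, claimed_crc: int) -> bool:
    """Remote host's side: recompute the CRC on the encrypted file and
    store it only if it matches; otherwise signal for a retransmit."""
    if crc(ciphertext) != claimed_crc:
        return False
    host_disk[name] = (ciphertext, claimed_crc)
    return True

def store_file(disks: list, name: str, plaintext: bytes, key: bytes) -> dict:
    plain_crc = crc(plaintext)               # CRC of the unencrypted file
    ciphertext = toy_encrypt(plaintext, key)
    cipher_crc = crc(ciphertext)             # CRC of the encrypted file
    for disk in disks[:2]:                   # two hosts, for redundancy
        assert host_store(disk, name, ciphertext, cipher_crc)
    return {"plain": plain_crc, "cipher": cipher_crc}
```

On retrieval the same checks run in reverse: the host verifies its on-disk CRC before sending, and the client verifies both the encrypted and (after decryption) unencrypted CRCs.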
I know it sounds complicated (at least, to me it does), but I think this could be quite useful for storage management and backups: by donating a drive/partition to the system, you get the equivalent storage space in return, automatically backed up (as I said, I don't know if it will be a full backup or a parity-file/RAID-esque system), with better security and better disaster-proofing.
Obviously, a list of hosts will have to be maintained, and I intend to tackle that problem like this: when the user downloads the source/binaries (I intend to open source the whole thing, as closed source defeats the whole idea), a list of servers is also downloaded. These servers maintain machine-ID-to-IP-address mappings, which clients consult. When a user stores a file, an entry is created on the server tracking which machines store it, which then propagates to all other servers. When a machine changes IP address, it contacts the server, which updates the mapping. The user also gets the choice of becoming a host server themselves, which would grant them some extra storage. Obviously, as the network grows, these servers would become more distributed: each server would store a subsection of hosts and a subsection of files, and the system itself would ensure that each entry is stored at least 2-3 times for redundancy.
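A rough sketch of what one of those tracker servers holds, under my assumptions (the class and method names are made up for illustration, and "propagation" here is just a naive copy of the file index to peer servers, not a real replication protocol):

```python
class HostServer:
    """Hypothetical tracker: maps machine IDs to IPs and files to the
    machines storing them."""

    def __init__(self):
        self.machines = {}   # machine ID -> current IP address
        self.files = {}      # file ID -> list of machine IDs storing it

    def register(self, machine_id: str, ip: str) -> None:
        """Called on first contact, and again whenever a machine's
        IP address changes."""
        self.machines[machine_id] = ip

    def record_file(self, file_id: str, machine_ids: list, peers=()) -> None:
        """Record where a file lives, then naively propagate the entry
        to the given peer servers."""
        self.files[file_id] = list(machine_ids)
        for peer in peers:
            peer.files[file_id] = list(machine_ids)

    def lookup(self, file_id: str) -> list:
        """Resolve a file to the current IPs of the machines storing it."""
        return [self.machines[m] for m in self.files.get(file_id, [])]
```

Sharding ("each server stores a subsection of hosts and files") would sit on top of this, e.g. by partitioning the ID space across servers.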
There would also be precautions against machines being too close to each other, both physically and virtually, by disallowing a file from being stored on any machine less than ten /8 blocks away (i.e. the first octets must differ by at least 10). For example, if my IP is 55.55.55.55 (hint: it isn't), I could store on 45.55.55.55 and below, and 65.55.55.55 and above, but not in between. This is a naive way of doing it, but it largely protects against two hosts being on the same ISP, which means they are separated virtually, and probably (though not definitely) physically as well.