OSDev.org

The Place to Start for Operating System Developers
It is currently Thu Mar 28, 2024 6:48 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 13 posts ] 
Author Message
 Post subject: A very simple file system design
PostPosted: Mon Jun 09, 2014 2:51 pm 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
Hi all,

I'm back to the OS dev direction again. I'm reworking my entire OS code, loader code and bootup code and ended up with loading files from a filesystem, where I decided to make a simple filesystem design that basically does the same thing as FAT - ie, simple file storage with no complicated structures - but seen from a 2014 perspective.

My current design:

Code:
struct extent {
  uint64_t startblock;
  uint64_t length;
};

struct inode {
  uint256_t filehash;
  extent storage;
  uint64_t filesize;
  time_t creation;
  time_t modification;
  time_t access;
  uint32_t flags;
  uint32_t pad[12];
};

enum {
  INO_SYSTEM =0x0001,
  INO_MODIFIED = 0x0002,
  INO_READONLY = 0x0004,
  INO_HASHSTALE = 0x0008,
  INO_EXTENT_INDIRECTION = 0x0030,
  INO_DIRECTORY = 0x0040
};

struct direntry {
  uint64_t inodeno;
  uint32_t filenamelength;
  char[] name; // null-terminated, length includes terminator
};

struct directory {
  uint32_t formatid = 1;
  uint32_t entrycount;
  direntry first;
};


Basic idea is, a partition is split into 4k blocks starting from the exact start. Everything occupies blocks. Every thing on the disk is stored as files too; including the inode table itself. The disk contains (currently thought out) 6 files that are pretty much always present; the first three in fixed locations:

0. Boot block, 8 blocks (32KB) from the exact start (block 0).
1. Partition header (undefined currently), describing the partition where necessary. May be dropped if not useful. 4KB at block 8.
2. Inode table. Contains description for itself, so it is as long as it needs to be. Always contiguous though.
3. Root directory, lists file names.
4. Free block list, contains all extents that are not a part of an existing file.
5. Bad block list, contains all blocks that logically exist, but should not be used for another reason.

To read a file,

1. Read inode table first block.
2. Read root directory fully, find file you want.
3. Read inode for subdir, find file you want (repeat as necessary)
4. Read inode for file, read file.

To boot,

1. Load boot block fully from boot sector (if your platform requires)
2. Load files as desired, load any other things as desired.


Any ideas? Just wanted to reflect on this idea for a bit to see if it could be simplified or if there are major issues for writing.


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Mon Jun 09, 2014 9:17 pm 
Offline
Member
Member
User avatar

Joined: Mon Jun 05, 2006 11:00 pm
Posts: 2293
Location: USA (and Australia)
It's unclear how you handle chaining multiple blocks together for large files (maybe I'm overlooking something implied?)

_________________
My OS is Perception.


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Mon Jun 09, 2014 9:49 pm 
Offline
Member
Member
User avatar

Joined: Wed Dec 01, 2010 3:41 am
Posts: 1761
Location: Hong Kong
You may want to take a look into File_Systems, especially SFS.

Anything aim for simplicity but more complex than a "file list" and "file content" defeat the purpose of being simple, and not worth inventing.


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Tue Jun 10, 2014 12:04 am 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
MessiahAndrw wrote:
It's unclear how you handle chaining multiple blocks together for large files (maybe I'm overlooking something implied?)


No, I just did a terrible job of explaining. There is a bit mask in the flags, INO_EXTENT_INDIRECTION, which is either 0, 1, 2 or 3. If it's zero then the extent maps the storage area for the file directly. If it is 1, then the extent contains extends, which contain the file. If it is 2, then the extent contains extents that contain extents, that contain the file area. Similarly for 3.

The idea is, if you have a sequentially stored file you just store "starts at X, length Y". If you have a fragmented file, you have a list of such fragments, to which a single extent points. Two or three should only be necessary for very sparse files or for very fragmented file systems. You are always guaranteed to be able to store files up to (4k / 16b = 256) ^ 3 * 4k = 64 GB, assuming enough free space and a lot of fragmentation. The idea is similar to ext2, except with extents.

bluemoon wrote:
You may want to take a look into File_Systems, especially SFS.


I helped design it and I know that it is not suitable as boot partition, as the wiki itself also claims. It does not support directories or fragmentation and both of those are IMO required for a serious file system that's not just used for transport - which SFS was intended for.

Quote:
Anything aim for simplicity but more complex than a "file list" and "file content" defeat the purpose of being simple, and not worth inventing.


Make it as simple as it can be, but no simpler. I want it to support fragmentation (so you don't need to defragment to use it at all) and I want it to support directories, so you do not end up with everything in a single folder.


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Tue Jun 10, 2014 12:36 am 
Offline
Member
Member
User avatar

Joined: Wed Dec 01, 2010 3:41 am
Posts: 1761
Location: Hong Kong
Candy wrote:
Make it as simple as it can be, but no simpler. I want it to support fragmentation (so you don't need to defragment to use it at all) and I want it to support directories, so you do not end up with everything in a single folder.


I just misunderstood what you meant simple, I though it's simple for developer - which tends to lack features; and what you wanted is simple but with some certain non-trivial features.
IMO This put it onto same level as FAT or EXT, it is still simple to some extend, and is fine.

Back to the topic, you may want to add "meta block" (maybe many different type of meta block) to describe directory and files, including inode list, tags and other things.


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Tue Jun 10, 2014 12:50 am 
Offline
Member
Member
User avatar

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

Candy wrote:
Any ideas? Just wanted to reflect on this idea for a bit to see if it could be simplified or if there are major issues for writing.


Just a random idea....

One of the things I'd be very tempted to do is to split it into 2 halves. The lower half would be responsible for managing storage space, and would provide an interface for storing, resizing and retrieving "arrays of bytes" where each array of bytes has a unique (integer) identifier. The upper half would provide the normal file system interface (for creating, reading and deleting files and directories; including handling all meta-data, etc); and would rely on the lower half to take care of managing the storage space.

The idea here is that you could have several different lower halves (e.g. one optimised for RAM disks, one for SSD, one for USB flash, one for rotating disks, some that do encryption and/or compression, some that are experimental and some that are stable, etc) and several different upper halves (some that use journaling and some that don't, some with different meta-data, some that support for checkpointing and rollbacks, some with "file content search" indexing, some that are experimental and some that are stable, etc).

The end-user would be able to mix & match - choose any lower half and use it with any upper half, to get a combination that's customised for their hardware and whatever they're using that file system for.


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Tue Jun 10, 2014 5:22 am 
Candy wrote:
I want it to support fragmentation (so you don't need to defragment to use it at all) and I want it to support directories, so you do not end up with everything in a single folder.

So, you want to create "a bit better FAT"? Does it worth the time spent? It can be interesting for personal OS, like a puzzle is interesting to solve, but not for many people except the author. However, I am not in a position to discourage your search for better world.


Top
  
 
 Post subject: Re: A very simple file system design
PostPosted: Tue Jun 10, 2014 6:15 am 
Offline
Member
Member

Joined: Sat Mar 15, 2014 3:49 pm
Posts: 96
What is the filehash in the inode and what's it used for?


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Tue Jun 10, 2014 10:10 am 
Offline
Member
Member
User avatar

Joined: Wed Oct 18, 2006 3:45 am
Posts: 9301
Location: On the balcony, where I can actually keep 1½m distance
Well, indirection count is perhaps one way of solving it, but it puts a hard limit on the number of extents (and if I got the math right, you can run out with a >128MB file on a 512-byte block medium), and doesn't prevent you from doing a inorder traversal of the entire tree to be able to seek to a point near the end. Implementing a linked list by having the first/last extent of an indirect block function as an indirection saves you from an algorithmic problem.



That leaves two half-related queries from my end. Why not just Ext2 (with the 64-bit extensions since you appear to be wanting that) since that saves you a fair share of tool-related issues, and why not/what happened to *fs?

_________________
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Tue Jun 10, 2014 11:46 am 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
willedwards wrote:
What is the filehash in the inode and what's it used for?


That was to allow files to be stored once, with a copy-on-write flag on it to save space. It is in the wrong place though, that should be a file system service built on top of this, so I removed it (and the timestamps) in my local update. Will post it tonight when I have a bit more free hand position.

Combuster wrote:
Well, indirection count is perhaps one way of solving it, but it puts a hard limit on the number of extents (and if I got the math right, you can run out with a >128MB file on a 512-byte block medium), and doesn't prevent you from doing a inorder traversal of the entire tree to be able to seek to a point near the end. Implementing a linked list by having the first/last extent of an indirect block function as an indirection saves you from an algorithmic problem.


Blocks are always 4K, making the minimum supported file size 64GB. You have to do an in-order traversal of the full tree. Note that this is a mechanism mostly there for making it work in the worst case, not for use in common or optimal case. Optimally, files are always in one chunk so there is no indirection, typically files will be in one or (up to) 4-5 chunks, making a single indirection enough and the caching or inorder traversal moot. On the worst-case fragmented filesystem it's not going to be fast anyway - I've implemented a worst-case test for an employer once and it crashed both our proprietary system (causing a 30 second clip that should've been written in real time to take 3 hours to write) and the Windows machine used for reading it on later (which just flat out refused & crashed).

I want the setup to be simple to understand and it to be fast in typical and optimal cases, as well as it not breaking in worst case. Optimal case is identical to Brendan's SimpleFS and Microsoft ExFAT, typical case is possible (compared to SFS) and faster (compared to ExFAT).

Quote:
That leaves two half-related queries from my end. Why not just Ext2 (with the 64-bit extensions since you appear to be wanting that) since that saves you a fair share of tool-related issues


Lot of legacy. I'm not so much afraid of having to make tools, but I am afraid of needless complexity of the tools I have to make because of legacy. One of the main reasons for this is to have a really simple file system format, that does not hide any filesystem bits behind weird access, that allows for a simple implementation above all and that is not curled up in legacy like Ext4/3/2/1, or ExFAT/FAT32/FAT16/FAT12.

Quote:
, and why not/what happened to *fs?


Same thing, unnamed as of yet as I'm not sure I want to take it in the same direction as *FS. Thing is, that name was used a while ago and has a certain meaning that this will not have, so I didn't want to tag it with the name, but you may as well consider it a further evolution of what *FS was.


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Sun Jun 15, 2014 6:28 am 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
So... I decided to start work on the FS driver, the boot code and an MKFS tool. I found out that you can as a regular user mount & unmount FUSE file systems, which means that if you have a working FUSE driver and an MKFS that makes a blank FS, you're set. So I made them.

Because the design is so simple (intentionally) the mkfs tool is 110 lines of code total. It creates a filesystem of desired size (either a file or a device can be targeted) and will fill those with the FS-required files. The root directory contains them under their own names.

A hexdump after MKFS shows:

Code:
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00009000  24 00 00 00 01 00 00 00  00 80 00 00 00 00 00 00  |$...............|
00009010  00 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
00009020  24 00 00 00 01 00 00 00  00 10 00 00 00 00 00 00  |$...............|
00009030  08 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00009040  24 00 00 00 01 00 00 00  00 80 00 00 00 00 00 00  |$...............|
00009050  09 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
00009060  44 00 00 00 06 00 00 00  00 10 00 00 00 00 00 00  |D...............|
00009070  11 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00009080  24 00 00 00 01 00 00 00  00 e0 fe 00 00 00 00 00  |$...............|
00009090  12 00 00 00 00 00 00 00  ee 0f 00 00 00 00 00 00  |................|
000090a0  24 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |$...............|
000090b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00011000  01 00 00 00 06 00 00 00  00 00 00 00 0b 00 24 62  |..............$b|
00011010  6f 6f 74 62 6c 6f 63 6b  00 01 00 00 00 08 00 24  |ootblock.......$|
00011020  68 65 61 64 65 72 00 02  00 00 00 08 00 24 69 6e  |header.......$in|
00011030  6f 64 65 73 00 03 00 00  00 0f 00 24 72 6f 6f 74  |odes.......$root|
00011040  64 69 72 65 63 74 6f 72  79 00 04 00 00 00 06 00  |directory.......|
00011050  24 66 72 65 65 00 05 00  00 00 0b 00 24 62 61 64  |$free.......$bad|
00011060  62 6c 6f 63 6b 73 00 00  00 00 00 00 00 00 00 00  |blocks..........|
00011070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01000000


And listing from Linux shows

Code:
-rwxr-xr-x 1 pebi pebi 16777216 Jun 15 13:35 testfs.starfs
pebi@box:~/projects/atlantisos/mnt$ ls -l
total 24
-rw-r--r-- 1 root root        0 Jan  1  1970 $badblocks
-rw-r--r-- 1 root root    32768 Jan  1  1970 $bootblock
-rw-r--r-- 1 root root 16703488 Jan  1  1970 $free
-rw-r--r-- 1 root root     4096 Jan  1  1970 $header
-rw-r--r-- 1 root root    32768 Jan  1  1970 $inodes
drw-r--r-- 6 root root     4096 Jan  1  1970 $rootdirectory
pebi@box:~/projects/atlantisos/mnt$


Next thing is to add write support so I can write to it from the FUSE driver too. You can already recurse into $rootdirectory and get to the same dir again, so that looks to be working. Not the intended use of course, but it is a logical consequence.

For others following along, the updated structures are as follows:

Code:
#define INVALID_INODE 0xFFFFFFFFFFFFFFFFULL
#define INO_EXTENT_INDIRECTION 0x00000003
#define INO_SYSTEM 0x00000004
#define INO_COPY_ON_WRITE 0x00000008
#define INO_MODIFIED 0x00000010
#define INO_READONLY 0x00000020
#define INO_DIRECTORY 0x00000040

struct extent {
  uint64_t startblock;
  uint64_t length;
};

struct inode {
  uint32_t flags;
  uint32_t linkcount;
  uint64_t filesize;
  struct extent ext;
} *inodetab;

struct dirformat {
  uint32_t formatid;
  uint32_t entrycount;
};

struct dirent {
  uint32_t inodeno;
  uint16_t filenamelength;
  char *filename;
} __attribute__((__packed__));


Directly grabbed from the source code.

Any ideas? I'll see if I will open source the tooling; the design is open in any case.


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Tue Jun 17, 2014 10:51 am 
Offline
Member
Member

Joined: Mon Apr 09, 2007 12:10 pm
Posts: 775
Location: London, UK
User-defined properties as name/value pairs? e.g. to support metadata on files or ownership/ACL information. You wouldn't have to add support yet, but maybe just reserve another entry in the dirent structure which points to a separate inode containing properties for the file/directory in question (if it has any).

Regards,
John.

_________________
Tysos | rpi-boot


Top
 Profile  
 
 Post subject: Re: A very simple file system design
PostPosted: Thu Jun 19, 2014 12:23 am 
Offline
Member
Member
User avatar

Joined: Tue Oct 17, 2006 11:33 pm
Posts: 3882
Location: Eindhoven
Sounds good. Will add that.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 13 posts ] 

All times are UTC - 6 hours


Who is online

Users browsing this forum: No registered users and 30 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group