OSDev.org

The Place to Start for Operating System Developers
It is currently Tue Mar 19, 2024 1:56 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 11 posts ] 
Author Message
 Post subject: Filesystem design: B-trees
PostPosted: Wed Dec 26, 2018 2:21 pm 
Offline
Member
Member
User avatar

Joined: Sun Jan 17, 2016 7:57 am
Posts: 51
I've been reading about ext4 and NTFS. They both seem to use variants of B-trees to index their directories to improve file lookup time.
I was wondering whether it would make sense to use a single B-tree for the entire volume which contains every file and folder, using a hash of their full path and filename as keys. Therefore when a file is opened, the filesystem driver does not need to traverse every leading directory, but only a single B-tree.

So...
  • Why do these filesystems index at directory level with filename keys, rather than at the volume level with path+filename keys?
  • What are the advantages and disadvantages of doing this?
  • I think the answer is no, but would a (large) hash table be more efficient?

Thanks.


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Thu Dec 27, 2018 4:14 am 
Offline
Member
Member
User avatar

Joined: Thu Mar 10, 2016 7:35 am
Posts: 167
Location: Lancaster, England, Disunited Kingdom
A hash system would hinder a search on a partial name - a useful facility.
You could, of course, provide a hash search just as one option.


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Thu Dec 27, 2018 4:17 am 
Offline
Member
Member
User avatar

Joined: Tue Dec 25, 2007 6:03 am
Posts: 734
Location: Perth, Western Australia
Smaller maps are more time efficient to search (both due to being faster to load, and having less items to check), and since filesystems are for storage mediums where space isn't at a premium, it's more important to optimise for faster accesses (and notably avoiding needing to load lots of data from the disk)

Having each directory store its contents also improves locality. If you've just opened one file from a directory, then it's likely you're going to need another one sooon. If the first file's record is within the same block as the second, then it will already be loaded into the disk cache.

There's also the desire to be able to (cheaply) list a directory's contents, which necessitates grouping each directory's entries.

_________________
Kernel Development, It's the brain surgery of programming.
Acess2 OS (c) | Tifflin OS (rust) | mrustc - Rust compiler
Currently Working on: mrustc


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Thu Dec 27, 2018 5:37 am 
Offline
Member
Member
User avatar

Joined: Sun Jan 17, 2016 7:57 am
Posts: 51
thepowersgang wrote:
There's also the desire to be able to (cheaply) list a directory's contents, which necessitates grouping each directory's entries.


Apologies that my question wasn't very clear.
I am intending to have the B-tree for path to file/directory lookup, not to replace the directory entries. Each directory still has list of directory entries containing filenames and inodes* (allowing fast directory enumeration), and the nodes of B-tree point to these directory entries.

*I actually am thinking about storing inode data inline in the directory. For a file with multiple hard links, there will be one "master" entry that contains the inode information, and that the other hard links point to. If the master link is deleted then another hard link is chosen as the master, and the inode information is moved.


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Fri Jan 11, 2019 1:46 am 
Offline
Member
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 999
thepowersgang wrote:
There's also the desire to be able to (cheaply) list a directory's contents, which necessitates grouping each directory's entries.

That's an operation that can also be supported with a global b-tree: If the tree is sorted by path depth (number of slashes) first, and then by full file system path, the contents of a directory are placed next to each other.

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Sun Jan 13, 2019 6:31 am 
Offline
Member
Member
User avatar

Joined: Sun Jan 17, 2016 7:57 am
Posts: 51
After implementing a single global file B-tree in my new filesystem, I have found the following problems:
  • You can't implement symbolic links, since you'd have to lookup each leading directory in the tree to check if was a link (which defaults the purpose of having the single tree, as the whole point was to only need 1 tree search operation to open a file)
  • You can't implement directory permissions, since you'd have to lookup each leading directory
  • You can't move directories without iterating over every child, removing them from the index tree, recalculating the hash of their path, and inserting them into the tree again.
With these problems, I've decided to switch to a more standard per-directory index tree. I'd be interested to hear if anyone has solutions for these problems though!


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Sun Jan 13, 2019 9:44 am 
Offline
Member
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 999
I think you have to accept that you need to look at each component of the path. IMHO that is not the goal that a single global B-tree tries to solve. Rather, I'd see them as a solution to fragmentation problem: if you store one B-tree per directory, you end of with lots of wasted blocks (because all directory entries fit into the root of the B-tree). The directory-move problem is solved by the "order by depth, then path" strategy that I described above.

Another issue though: Why are you inserting hashes into the tree instead of dealing directly with filenames?

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Sun Jan 13, 2019 11:37 am 
Offline
Member
Member
User avatar

Joined: Sun Jan 17, 2016 7:57 am
Posts: 51
Korona wrote:
I think you have to accept that you need to look at each component of the path. IMHO that is not the goal that a single global B-tree tries to solve. Rather, I'd see them as a solution to fragmentation problem: if you store one B-tree per directory, you end of with lots of wasted blocks (because all directory entries fit into the root of the B-tree).

Does the root of the B-tree have to be in a separate block to the rest of the directory metadata? With 1KB directory entries, I have roughly 900 bytes of leftover space for inline data, in which I could fit about 28 keys (with an 8 byte hash). Looking at my Linux partition, 90% of folders have less than 28 files. So maybe a B-tree per directory isn't that much of a problem?

Korona wrote:
The directory-move problem is solved by the "order by depth, then path" strategy that I described above.

I don't understand how this works, sorry. Surely you still have to move all the keys to a different place in the tree, and update their path?

Korona wrote:
Another issue though: Why are you inserting hashes into the tree instead of dealing directly with filenames?

I'm assuming using fixed size keys is simpler than variable size paths/filenames.


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Sun Jan 13, 2019 11:59 am 
Offline
Member
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 999
nakst wrote:
Korona wrote:
I think you have to accept that you need to look at each component of the path. IMHO that is not the goal that a single global B-tree tries to solve. Rather, I'd see them as a solution to fragmentation problem: if you store one B-tree per directory, you end of with lots of wasted blocks (because all directory entries fit into the root of the B-tree).

Does the root of the B-tree have to be in a separate block to the rest of the directory metadata? With 1KB directory entries, I have roughly 900 bytes of leftover space for inline data, in which I could fit about 28 keys (with an 8 byte hash). Looking at my Linux partition, 90% of folders have less than 28 files. So maybe a B-tree per directory isn't that much of a problem?

Yeah, but the idea would be to not require a full block per directory, exactly because most folders do not contain many files.

nakst wrote:
Korona wrote:
The directory-move problem is solved by the "order by depth, then path" strategy that I described above.

I don't understand how this works, sorry. Surely you still have to move all the keys to a different place in the tree, and update their path?

If you tree is ordered by path depth first (i.e. number of slashes in a path) and the path itself second, all entries from a given directory are next to each other. The other would be:
foo/bar1
foo/bar2
foo/baz/bazbar1
foo/baz/bazbar2
and so on. Because the entries are adjacent in the tree, to move the full directory (including subdirectories), you only have to: (1) find the B-tree node that contains the first file in the directory, (2) find the B-tree node that contains the last file in the directory, (3) move the entries out of the first and last block to their new location and (4) rewrite the blocks in between the first and the last block, without actually copying data. You could store path names relative to some (suitably chosen) parent B-tree node to compress the paths and avoid making changes to the paths during a move.

I did not look into this in detail though, so those are only some rough suggestions. It might make sense to look into btrfs that does store the whole FS in a single B-tree.

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Mon Jan 13, 2020 2:43 pm 
Offline
User avatar

Joined: Sun Dec 29, 2019 10:59 pm
Posts: 17
Apart from the global B trees, it might make sense to use thresholds to switch between algorithms at different directory sizes. Hashing has some limits with searches on partial keys, though I think POSIX-like system call API's don't really do much with that. There are variations on tries, hash tries, which use open-addressed hash tables in lieu of arrays for mostly-empty nodes that are good for sparse key spaces I've heard of being used in external algorithms for databases, though one can use one level open-addressed hash tables in a pinch. I have rarely seen filesystems use hash tables or hash tries and am likely not experienced enough with filesystems to say why. I've only seen them used as external i.e. on-disk indexing structures in databases.


Top
 Profile  
 
 Post subject: Re: Filesystem design: B-trees
PostPosted: Mon Jan 13, 2020 4:12 pm 
Offline
Member
Member
User avatar

Joined: Thu Oct 13, 2016 4:55 pm
Posts: 1584
nakst wrote:
I was wondering whether it would make sense to use a single B-tree for the entire volume which contains every file and folder
The HFS+ did exactly that. It had 3 B-trees, one for the filenames, one for the extents (allocation table-like info), and one for the extended attributes. It was good for many years, but now Apple has obsoleted that in favor of APFS (not sure if that has a single B-tree catalog too).

So I guess you could do some performance tests on a HFS+ volume to see if a single B-tree worth it. Also, the source code of hfsplus is publicly available, so you can peek how to implement this efficiently.

Cheers,
bzt


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 11 posts ] 

All times are UTC - 6 hours


Who is online

Users browsing this forum: songziming and 8 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group