yr wrote:
In my current implementation, the kernel accepts only absolute, canonical paths (i.e., “.” and “..” folded out, no extraneous separators, etc.), and handling of relative paths is done entirely in the libc.
This concept fails the moment symlinks containing relative path names enter the mix, unless you have libc run a userspace version of realpath(3) on every single path.
yr wrote:
The kernel usually tracks the current working directory (in addition to the root) for each process, not as a path, but as a vnode.
That is correct. Note that this also allows the CWD to be inherited without user space cooperation, which is quite an important behavior for many shell utilities. "rm -rf ." means something very different depending on whether it is executed in some deep subdirectory or in the root.
yr wrote:
This means that handling “..” requires the kernel to track each directory’s parent node, which in turn means that there must be a unique parent node for each directory. Presumably, this is one of the reasons why multiple hard links to directories are problematic for most unix implementations.
Well, one reason. Symlinks are enough of a headache as it is, since they turn the file tree into a directed graph. Hard links on directories would only add more edges to that graph, and ones that are not specially marked.
yr wrote:
This is why most shells implement their own path handling layer to provide more interactive user-friendly behavior.
Oh boy, yes. The problem is that after following a symlink, "cd .." takes you to the parent of the symlink's target, not back to the directory you came from, and most people are terribly confused by that. So the shell acts as if the symlink was a directory, which again might confuse people, so now there are also options to "cd" (-L and -P in POSIX shells) that tell the shell which behavior to use.
yr wrote:
On the balance, so far I prefer my approach, since it seems to be simpler and more flexible (though perhaps slower), but am interested in hearing people’s thoughts. Quite possibly, there are issues I have not considered or encountered yet.
Speed you already mentioned. If I am deep in a directory structure, and I want to open .., then your kernel will look up all the directories leading to my current one except the lowest one, whereas most other UNIX implementations will just look up the parent of the CWD, which is a single lookup.
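To put numbers on that, here is a toy model. The Vnode class and both helper functions are hypothetical, made up for this post and not taken from any real kernel; it only counts directory lookups and ignores caching entirely.

```python
class Vnode:
    """Hypothetical in-memory directory node; not any real kernel's structure."""

    def __init__(self, name, parent=None):
        self.name = name
        # By convention the root is its own parent.
        self.parent = parent if parent is not None else self
        self.children = {}

    def mkdir(self, name):
        child = Vnode(name, self)
        self.children[name] = child
        return child


def open_parent_by_path(root, cwd_path):
    """Absolute-paths-only kernel: libc sends the CWD path minus its last component."""
    node, lookups = root, 0
    for component in cwd_path.strip("/").split("/")[:-1]:
        node = node.children[component]  # one directory lookup per component
        lookups += 1
    return node, lookups


def open_parent_by_vnode(cwd):
    """Vnode-tracking kernel: follow the parent pointer of the CWD."""
    return cwd.parent, 1


root = Vnode("/")
cwd = root
for name in ["a", "b", "c", "d", "e"]:
    cwd = cwd.mkdir(name)

by_path, path_lookups = open_parent_by_path(root, "/a/b/c/d/e")
by_vnode, vnode_lookups = open_parent_by_vnode(cwd)
print(by_path.name, path_lookups, by_vnode.name, vnode_lookups)
```

Both variants end up at the same node; the path-based one needs a lookup per remaining component, so its cost grows with the depth of the CWD.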
Flexible? Not sure what you mean there. The structures we are talking about are quite rigid, and nobody wants to have flexibility in how path names are interpreted. BTW, if /symlink is a symlink to /a/b, and somebody opens /symlink/.., do they get /a or /? Because most would expect /a, and indeed most Unices deliver /a.
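The /symlink/.. question can be checked from user space. This is a small sketch that uses a temporary directory as a stand-in for the root, assuming a Unix-like system where unprivileged symlink creation works; the directory names mirror the example above.

```python
import os
import shutil
import tempfile

base = os.path.realpath(tempfile.mkdtemp())  # stands in for "/"
os.makedirs(os.path.join(base, "a", "b"))
os.symlink(os.path.join(base, "a", "b"), os.path.join(base, "symlink"))

path = os.path.join(base, "symlink", "..")

# Textual folding, as a canonicalising libc would do: never touches the disk.
lexical = os.path.normpath(path)
# Follows the symlink first, then applies "..".
physical = os.path.realpath(path)
# What the kernel itself resolves the path to.
kernel_view = os.path.samefile(path, os.path.join(base, "a"))

shutil.rmtree(base)
```

Textual folding yields the stand-in root, while the kernel and realpath both land in a. A libc that folds ".." before the kernel ever sees the symlink would therefore deliver the answer most people do not expect.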
Another issue is long path names. Linux (and probably most other Unices, but I haven't read their source code) places a hard limit of PATH_MAX (4096 bytes) on the length of a path given to a system call, but a file's absolute name can be longer than that. You just can't refer to such a file by its absolute name in one go; you have to chdir() to some midway point first and continue with a relative path from there. But granted, this is highly esoteric. Most people get bored of typing a path name after 100 characters or so.
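For the curious, the effect is easy to reproduce. The sketch below assumes Linux-like limits (a PATH_MAX of 4096 and a NAME_MAX of at least 200) and uses directory descriptors instead of chdir() so that the process's own CWD is left alone; the principle is the same, every single lookup is relative and short.

```python
import errno
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
name = "d" * 200   # assumed to be under NAME_MAX (255 on common Linux filesystems)
depth = 25         # 25 components of 200 characters exceed a PATH_MAX of 4096

# Build the chain one level at a time, always relative to an open directory.
fd = os.open(base, os.O_RDONLY | os.O_DIRECTORY)
for _ in range(depth):
    os.mkdir(name, dir_fd=fd)
    next_fd = os.open(name, os.O_RDONLY | os.O_DIRECTORY, dir_fd=fd)
    os.close(fd)
    fd = next_fd

long_path = os.path.join(base, *([name] * depth))

# Referring to the deepest directory by its absolute name in one go fails ...
try:
    os.stat(long_path)
    one_go = "ok"
except OSError as e:
    one_go = errno.errorcode.get(e.errno, str(e.errno))

# ... although the directory exists and is reachable through its descriptor.
reached = os.fstat(fd)
os.close(fd)
shutil.rmtree(base, ignore_errors=True)
```

A kernel that only accepts absolute paths has no way to name that deepest directory at all, unless it drops the length limit.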
Symlinks with relative paths I already mentioned. Their semantics are such that it is not acceptable to transform the relative path into an absolute one, since they must retain their relative target even if the directory is moved or mounted elsewhere.
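A quick user-space demonstration of why the relative target has to be stored as-is. The directory and file names are made up for the example; the "abs" link shows what you would get if some layer had rewritten the target into an absolute path when the link was created.

```python
import os
import shutil
import tempfile

base = os.path.realpath(tempfile.mkdtemp())
os.makedirs(os.path.join(base, "proj", "data"))
with open(os.path.join(base, "proj", "data", "file.txt"), "w") as f:
    f.write("hello")

# Relative link: proj/rel -> data/file.txt
os.symlink(os.path.join("data", "file.txt"), os.path.join(base, "proj", "rel"))
# The same link with its target converted to an absolute path.
os.symlink(os.path.join(base, "proj", "data", "file.txt"),
           os.path.join(base, "proj", "abs"))

# Move the whole directory somewhere else.
os.rename(os.path.join(base, "proj"), os.path.join(base, "moved"))

rel_ok = os.path.exists(os.path.join(base, "moved", "rel"))  # follows the link
abs_ok = os.path.exists(os.path.join(base, "moved", "abs"))  # follows the link
stored = os.readlink(os.path.join(base, "moved", "rel"))

shutil.rmtree(base)
```

After the move the relative link still resolves, because its target is interpreted relative to wherever the link now lives, while the absolute one dangles.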
yr wrote:
Any thoughts/comments on the above are welcome. If you have tried either of these approaches, or any others, am interested in hearing about any lessons learned.
I am going to handle relative paths in the kernel; there's just no way around it. An approach I am still undecided on is to turn the CWD into a normal FD, and then just have four default FDs that are inherited from process to process. The reason I am hesitant is that I fear shell scripts using FDs directly might overwrite FD 3, and then the working directory is gone. Maybe having a vnode that cannot be closed is a good thing. On the other hand, that approach plus all the *at() system calls would immediately rid me of all the special handling for the CWD in path name lookup. But I will still need special handling for the root, because that is not supposed to be easily changed. So if I have special handling anyway, what is one more case?
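Roughly what the CWD-as-FD idea looks like from user space on an existing system, using Python's dir_fd parameter, which maps onto openat(2). The open_relative helper and the file name are made up for illustration; this is a sketch of the idea, not of my kernel interface.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "notes.txt"), "w") as f:
    f.write("relative lookup")

# The "CWD" is nothing more than a descriptor for a directory.
cwd_fd = os.open(workdir, os.O_RDONLY | os.O_DIRECTORY)


def open_relative(dir_fd, path, flags=os.O_RDONLY):
    """Resolve path against dir_fd, as openat(2) does; no process-wide CWD involved."""
    return os.open(path, flags, dir_fd=dir_fd)


fd = open_relative(cwd_fd, "notes.txt")
content = os.read(fd, 100).decode()
os.close(fd)

# The hazard: once something closes the descriptor, the directory is gone.
os.close(cwd_fd)
try:
    os.close(open_relative(cwd_fd, "notes.txt"))
    lost = False
except OSError:
    lost = True

shutil.rmtree(workdir)
```

The second half is exactly the FD 3 worry: with an ordinary descriptor, one careless close or redirection is enough to lose the working directory.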
Good thing I won't be at that point for a while longer.