Creating a hard link to a directory

Newsgroups: comp.os.linux.misc
>I'm trying to make a hard link from a directory in my home directory to
>/mnt/robbo.

Others have already identified the problem and given the fix (use ln -s), so I will merely comment on a more technical aspect of it.

There are actually two representations of a file or directory. It is also worth noting that a directory is itself a file: a special type of file that can only be manipulated by the kernel, specifically by the file system responsible for the volume on which the directory resides. In this respect, Linux ext2fs isn't that much different from DOS FAT/FAT32 (apart from the fixed-size root directory of FAT12/FAT16, which FAT32 dropped), or probably from NTFS (NT's default file system), although I don't know the details there.
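A minimal Python sketch of the "only the kernel manipulates it" point, assuming Linux: a directory can be opened read-only, but the kernel refuses a plain read() on it with EISDIR, and its contents are only available through the directory-reading interface (here os.scandir()). The helper names are illustrative, not from any library.

```python
import errno
import os
import tempfile

def read_directory_as_file(path):
    """Try to read a directory with plain read(2); return the errno, or 0.

    os.open() on a directory with O_RDONLY is allowed, but on Linux the
    kernel rejects read() on the resulting descriptor with EISDIR.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.read(fd, 4096)
    except OSError as e:
        return e.errno
    finally:
        os.close(fd)
    return 0

def list_directory(path):
    """The supported route: ask the kernel for the directory's entries."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "hello.txt"), "w").close()
        print(read_directory_as_file(d) == errno.EISDIR)  # True on Linux
        print(list_directory(d))                          # ['hello.txt']
```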

The first representation is the familiar path representation, which starts at the root or at the current directory. One can call this the "naming tree" (and it actually was in at least one OS), even though this is not strictly accurate; see below. This is the one that all users and most programmers see.

Note that there's no drive specifier in Unix; all absolute paths start from the root, which is the root of the first mounted volume. Subsequent volumes are mounted lower down in the naming tree. For example, an installation may associate / with /dev/hda1, /usr with /dev/hda6, and /var with /dev/hda7, with swap on /dev/hda5 (swap is not mounted anywhere in the tree). This association is managed either manually with the mount(8) command, or by editing the file /etc/fstab, which is read at bootup and by 'mount -a'. The name resolver knows what to do if fed a pathname such as /usr/local/bin/netscape: it will get the file from /dev/hda6's file system, using the path '/local/bin/netscape' within that volume. This is more flexible than the "Map Network Drive" command in Windows, or even the '\\server\sharename' path form, because where a volume appears in the tree is controlled by the node requesting the mount, not by the node sharing the information.
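To see which mounted volume actually serves a given path, one can walk up the path until the device number (st_dev) changes, which marks a mount boundary. This is a minimal Python sketch under the assumption that every mount has its own st_dev; bind mounts of the same device will not be detected. The function name is hypothetical.

```python
import os

def mount_point_of(path):
    """Return the mount point of the file system holding `path`.

    Resolves symlinks first, then climbs toward / until the parent
    directory lives on a different device, i.e. a mount boundary.
    """
    path = os.path.realpath(path)
    dev = os.stat(path).st_dev
    while path != "/":
        parent = os.path.dirname(path)
        if os.stat(parent).st_dev != dev:
            break                      # parent belongs to another volume
        path = parent
    return path

if __name__ == "__main__":
    print(mount_point_of(os.getcwd()))
```

On a system laid out like the example above, mount_point_of("/usr/local/bin") would be expected to return "/usr". The standard library's os.path.ismount() answers the related yes/no question for a single directory.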

In what follows, I will call the shortest possible pathname that reaches a file or directory without going through symbolic links its "canonical pathname".
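Python's os.path.realpath() comes close to this notion: it returns an absolute path with every symlink, "." and ".." resolved by consulting the file system. This sketch contrasts it with os.path.abspath(), which only tidies the text. It is an approximation: if a file has several hard links, realpath() keeps whichever name it was given. The names "real" and "alias" are illustrative.

```python
import os
import tempfile

def canonical_path(path):
    """Absolute pathname with symlinks, '.' and '..' resolved (approximation)."""
    return os.path.realpath(path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "real"))
        open(os.path.join(d, "real", "file"), "w").close()
        os.symlink("real", os.path.join(d, "alias"))        # alias -> real
        p = os.path.join(d, "alias", "..", "alias", "file")
        print(canonical_path(p))    # .../real/file: the symlink is gone
        print(os.path.abspath(p))   # .../alias/file: purely textual cleanup
```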

The second representation uses an internal number, called an inode number. This representation is only of interest to hardcore developer types who like to muck around in the file system, or who need to do something unusual to repair a damaged one. However, understanding this concept helps to explain what a "hard link" is. (By convention, the root inode of an ext2-formatted volume is always 2; this predates Linux. Inode 0 means "no inode", and inode 1 was traditionally reserved; in ext2 it holds the list of bad blocks. This isn't a given for other volume types; a FAT volume has a root inode of 1 in Linux, for example, which is somewhat arbitrary anyway, since older FAT volumes have that weird contiguous root directory area.)
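A short Python sketch for looking at inode numbers directly, the same number ls -i prints. Since inode numbers are only unique within one volume, the pair (st_dev, st_ino) is what really identifies a file. The helper name is hypothetical, and the value printed for / depends on the file system (2 on ext2/ext3/ext4).

```python
import os

def file_identity(path):
    """Return (st_dev, st_ino); inode numbers are only unique per volume."""
    st = os.stat(path)
    return st.st_dev, st.st_ino

if __name__ == "__main__":
    dev, ino = file_identity("/")
    print("inode of /:", ino)       # 2 if / is an ext2/ext3/ext4 volume
    # At the root, '..' refers back to the root directory itself.
    print(file_identity("/..") == file_identity("/"))   # True
```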

A soft link is just a piece of text, a pathname; if it is relative, it is interpreted relative to the directory containing the link. If /a/b/c/d is a soft link containing the text "../../e/f/g", attempting to open /a/b/c/d results in resolving /a/b/c/../../e/f/g, which (usually!) leads to /a/e/f/g, which may or may not exist. I am not sure of all the details, but if you're really interested, you can peruse the kernel source code, starting at /usr/src/linux/fs/namei.c.
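The /a/b/c/d example can be reproduced with a small Python sketch (assuming a POSIX system). It reads the link text with os.readlink(), joins a relative target onto the link's directory, and collapses ".." with os.path.normpath(). That collapse is purely textual, which is exactly why the result is only *usually* right: if a directory along the way is itself a symlink, the kernel's ".." leads somewhere else. The function name is hypothetical.

```python
import os
import tempfile

def resolve_one_step(link_path):
    """Follow one symbolic link lexically, as described for /a/b/c/d."""
    target = os.readlink(link_path)          # the stored text, e.g. '../../e/f/g'
    if os.path.isabs(target):
        return target                        # absolute targets ignore the link's location
    return os.path.normpath(os.path.join(os.path.dirname(link_path), target))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b", "c"))
        os.makedirs(os.path.join(root, "a", "e", "f"))
        open(os.path.join(root, "a", "e", "f", "g"), "w").close()
        link = os.path.join(root, "a", "b", "c", "d")
        os.symlink("../../e/f/g", link)
        print(resolve_one_step(link))        # <root>/a/e/f/g
```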

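This can get rather involved if multiple soft links are encountered along the way. Most systems, Linux included, give up after a fixed number of links (the exact limit varies by system and kernel version) and return the error ELOOP ("Too many levels of symbolic links"). EMLINK ("Too many links") is a different error: it means a file's hard link count has reached its maximum.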
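The simplest way to trip the limit is two soft links pointing at each other. This Python sketch, assuming a POSIX system, builds such a loop in a scratch directory and reports the errno that open() fails with. The link names "ping" and "pong" are made up for the example.

```python
import errno
import os
import tempfile

def make_symlink_loop(directory):
    """Create ping -> pong and pong -> ping; return the path of ping."""
    ping = os.path.join(directory, "ping")
    pong = os.path.join(directory, "pong")
    os.symlink(pong, ping)    # target need not exist when the link is made
    os.symlink(ping, pong)
    return ping

def open_errno(path):
    """Try to open path read-only; return 0 on success or the errno."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return e.errno
    os.close(fd)
    return 0

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        code = open_errno(make_symlink_loop(d))
        print(code == errno.ELOOP, os.strerror(code))
```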

A hard link, by contrast, is actually *another* directory entry. As you may have already noticed, nothing requires the mapping between canonical pathnames and inodes to be one-to-one; one can easily envision /a/b/c/d and /a/e/f/g pointing to the same inode. This is precisely what 'ln' (without the -s option) does:

ln /a/e/f/g /a/b/c/d

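will create a new directory entry /a/b/c/d that takes its inode number from /a/e/f/g, whatever that is. (One can list inode numbers with ls -i.) Note that Linux only allows this for non-directories: link(2) on a directory fails with EPERM, even for root, which is presumably what the original poster ran into. A hard link also cannot cross from one mounted volume to another (that fails with EXDEV).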
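The same operation from Python, as a minimal sketch assuming Linux: os.link() is the call behind `ln` without -s, and comparing st_ino confirms both names share one inode. The second helper shows the kernel refusing to hard-link a directory. Helper names and file names are illustrative.

```python
import errno
import os
import tempfile

def hard_link(existing, new_name):
    """Like `ln existing new_name`: add another entry for the same inode."""
    os.link(existing, new_name)
    return os.stat(existing).st_ino == os.stat(new_name).st_ino

def link_errno(existing, new_name):
    """Attempt os.link(); return 0 on success or the errno on failure."""
    try:
        os.link(existing, new_name)
    except OSError as e:
        return e.errno
    return 0

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        g = os.path.join(d, "g")
        open(g, "w").close()
        print(hard_link(g, os.path.join(d, "d")))               # True
        os.mkdir(os.path.join(d, "subdir"))
        code = link_errno(os.path.join(d, "subdir"), os.path.join(d, "dirlink"))
        print(code == errno.EPERM)                               # True on Linux
```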

That's all a hard link is.

Once created, a hard link is indistinguishable from the file it links to; neither name is the "original" one. The only trace may be the modification time of the directory that contains it.

A couple more things.

First, there's the concept of links (stat.st_nlink; see /usr/include/bits/stat.h), just to confuse things even further. Every directory entry that refers to a file or directory counts toward the st_nlink of the object referred to. This means that files normally have a link count of 1, and directories have a link count of 2 plus the number of subdirectories: one for the entry in the parent, one for the directory's own "." entry, and one for the ".." entry inside each subdirectory. find(1) uses this information to optimize its directory scanning; the -noleaf option disables this optimization when necessary.
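The "2 plus subdirectories" rule can be checked with a short Python sketch. It compares the reported st_nlink with the count predicted from the directory's immediate subdirectories. Some file systems do not follow the convention (btrfs, for instance, always reports 1 for directories), which is the kind of case find's -noleaf exists for. The helper name is hypothetical.

```python
import os
import tempfile

def link_counts(directory):
    """Return (reported st_nlink, 2 + number of immediate subdirectories)."""
    with os.scandir(directory) as entries:
        subdirs = sum(1 for e in entries if e.is_dir(follow_symlinks=False))
    return os.stat(directory).st_nlink, 2 + subdirs

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("x", "y", "z"):
            os.mkdir(os.path.join(d, name))          # each adds a '..' entry
        open(os.path.join(d, "file"), "w").close()   # a plain file adds nothing
        print(link_counts(d))   # (5, 5) on ext2/ext4 or tmpfs
```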

In other words, stat.st_nlink is a link *count*.

(The link count is the first number in an ls -l output, just after the permissions and before the owner.)

If an object has been hard-linked, its link count goes up accordingly; if a file has 3 names, its link count will be 3. This means that the "naming tree" is in fact a directed acyclic graph, if not worse.

If one deletes a name, the link count decreases; the space on the volume used by that file or directory is actually reclaimed only when the count drops to 0 and no process still has the file open.
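A Python sketch of that life cycle, assuming Linux: one file gets three names, the names are removed one by one, and the count is observed through a descriptor held open throughout. The data stays readable after the last name is gone, because the space is only freed once the count is 0 *and* the file is no longer open. File names are illustrative.

```python
import os
import tempfile

def link_count_lifecycle(directory):
    """Give one file three names, unlink them all, and watch st_nlink.

    Returns the link counts seen and the data read back through a
    descriptor that stayed open after the last name was removed.
    """
    names = [os.path.join(directory, n) for n in ("one", "two", "three")]
    with open(names[0], "w") as f:
        f.write("still here")
    for extra in names[1:]:
        os.link(names[0], extra)                 # same inode, new name

    counts = [os.stat(names[0]).st_nlink]        # 3 names -> count 3
    fd = os.open(names[0], os.O_RDONLY)          # keeps the inode in use
    for name in names:
        os.unlink(name)
        counts.append(os.fstat(fd).st_nlink)     # drops by one each time
    data = os.read(fd, 100).decode()
    os.close(fd)                                 # only now can the space be reclaimed
    return counts, data

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(link_count_lifecycle(d))   # ([3, 2, 1, 0], 'still here')
```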

If one renames a hard-linked file, the inode doesn't change, just the name. All other names referring to that file still refer to the same file.
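A quick Python check of the rename behavior, assuming a POSIX system: after os.rename(), the renamed entry and the untouched second name still carry the original inode number. The file names are hypothetical.

```python
import os
import tempfile

def rename_keeps_inode(directory):
    """Hard-link a file, rename one name, and report the inode numbers seen."""
    first = os.path.join(directory, "first")
    second = os.path.join(directory, "second")
    with open(first, "w") as f:
        f.write("same data")
    os.link(first, second)
    before = os.stat(first).st_ino

    renamed = os.path.join(directory, "renamed")
    os.rename(first, renamed)                 # only the directory entry changes
    return before, os.stat(renamed).st_ino, os.stat(second).st_ino

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(rename_keeps_inode(d))          # three identical inode numbers
```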

Note that soft links don't increment the link count and can become "broken" if the object they refer to (or one of its ancestor directories) is removed or moved away. One can also create a file through a dangling soft link (provided all the directories leading to the target exist), whereas the target of a hard link must already exist when the hard link is created. Also, a soft link can give an object a second, more generic name that can later be pointed at a different object; check out /lib, for example, for a straightforward application of this technique:
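The contrast between the two kinds of link can be demonstrated in a few lines of Python, assuming Linux in an ordinary (non-sticky) scratch directory: a soft link to a missing file is "broken" but opening it for writing creates the target, while a hard link to a missing file simply fails. The names used are illustrative.

```python
import os
import tempfile

def dangling_link_demo(directory):
    """Return (was_broken, created_through_link, hard_link_succeeded)."""
    target = os.path.join(directory, "not-yet")
    soft = os.path.join(directory, "soft")
    os.symlink(target, soft)                    # target doesn't exist yet
    was_broken = os.path.lexists(soft) and not os.path.exists(soft)

    with open(soft, "w") as f:                  # follows the link, creates the target
        f.write("created through a soft link")
    created = os.path.isfile(target)

    try:
        os.link(os.path.join(directory, "missing"), os.path.join(directory, "hard"))
        hard_ok = True
    except FileNotFoundError:                   # a hard link needs an existing file
        hard_ok = False
    return was_broken, created, hard_ok

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(dangling_link_demo(d))            # (True, True, False)
```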

$ ls -l /lib/libc*
-rwxr-xr-x   1 root     root      4101324 Feb 29  2000 /lib/libc-2.1.3.so
lrwxrwxrwx   1 root     root           13 May 15 18:59 /lib/libc.so.6 -> libc-2.1.3.so
...
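The /lib arrangement (a generic name such as libc.so.6 pointing at a versioned file) can be sketched in Python. On Linux, ldconfig normally maintains these links; this sketch only illustrates the idea. It repoints the generic name safely by creating the new link under a temporary name and renaming it over the old one, since rename(2) replaces the destination in a single step. The library names and the ".tmp" suffix are hypothetical, and the temporary name is assumed to be unused.

```python
import os
import tempfile

def point_generic_name(directory, generic, versioned):
    """Make directory/generic a symlink to `versioned`, replacing any old link."""
    final = os.path.join(directory, generic)
    tmp = final + ".tmp"                 # hypothetical temp name; assumed unused
    os.symlink(versioned, tmp)           # relative target, like libc.so.6 -> libc-2.1.3.so
    os.replace(tmp, final)               # atomic: `generic` never goes missing

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("libfoo-1.0.so", "libfoo-1.1.so"):    # hypothetical libraries
            open(os.path.join(d, name), "w").close()
        point_generic_name(d, "libfoo.so.1", "libfoo-1.0.so")
        print(os.readlink(os.path.join(d, "libfoo.so.1")))  # libfoo-1.0.so
        point_generic_name(d, "libfoo.so.1", "libfoo-1.1.so")
        print(os.readlink(os.path.join(d, "libfoo.so.1")))  # libfoo-1.1.so
```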

Second, as another poster has pointed out, one can do very bad things with hard links to directories if one is careless (on systems that allow them at all; Linux does not); the typical problem is a directory reference loop. Also, it's not clear to me that find will function quite right, as the link count for the directory will be slightly off. (It's probably not a big problem: find uses the count to tell when it has seen all of a directory's subdirectories, so that it can skip stat()ing the remaining entries. A count that is too high merely costs that optimization; it would only be a problem if the count were too *low*.)
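Any tool that follows extra paths to directories needs some defence against reference loops. One common approach, sketched here in Python, is to remember each directory's (st_dev, st_ino) and never descend into the same one twice. Because Linux will not create directory hard links, the demo builds the loop with a soft link instead. The function name is hypothetical, and this is not how any particular find implementation is written.

```python
import os
import tempfile

def walk_unique_dirs(top):
    """Yield each directory reachable from top once, following symlinks.

    Directories are identified by (st_dev, st_ino), so a second path to an
    already-visited directory, including one that loops back to an
    ancestor, is skipped instead of being descended into forever.
    """
    seen = set()
    stack = [top]
    while stack:
        path = stack.pop()
        st = os.stat(path)                   # follows symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue                         # reference loop or duplicate name
        seen.add(key)
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir():           # also True for symlinks to directories
                    stack.append(entry.path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        sub = os.path.join(top, "sub")
        os.mkdir(sub)
        os.symlink(top, os.path.join(sub, "back"))   # sub/back -> top: a loop
        print(len(list(walk_unique_dirs(top))))      # 2
```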

Hope this helps. :-)

ewill @ aimnet.com

documented on: 2000.10.25 Wed 00:05:57