Lecture 12: File System Implementation

CS111 - Winter 2008
Lecture 12 (February 25, 2008)
by Jimmy Nguyen and Stephanie Potter

Indirect blocks again (review from previous lecture)
Note: Shaded cells represent allocated storage that is wasted.
Assume:

Internal fragmentation
For each file size used, the amount of bytes wasted is calculated below:
1 byte file: 8191 bytes + 44 bytes = 8235 (wasted) bytes
80 KiB + 1B file: 8191 bytes (from data) + 4 bytes (from inode) + 8188 (from indirect block) = 16383 (wasted) bytes

Therefore, as file size grows, the amount wasted due to internal fragmentation (reserved space not being used) decreases. Similarly, for small files, a lot of disk space is wasted!

Berkeley Fast File System
This file system breaks blocks into fragments (e.g., 1K and 2K) for small files in order to avoid internal fragmentation.

Directories
Directories are particular kinds of files whose contents point to other files.

The following is an example of a directory's structure:

The inode table:

inode table

owner permissions

file type

Scaling issues
In Unix (1978), a directory with 50,000 entries would need 50,000*16 or 800,000 bytes. As a result, sequential search would be slow.

extfs sequential search
To fix this problem, use a tree structure or hash table within the directory, and search through the new tree or hash table instead. This will search through less bytes and therefore run faster.

Inodes
Inodes (index node, fixed-size file descriptor) are contained with the inode table. They contain a file's size, permissions, owner, the number of links (hard links) to this file, and time stamps. Inodes do not hold the file's name or the directory containing the file. To find the file's name, you would need to look it up using the inode's number in the directory table.

Proposal
Get rid of the inodes. Put each inode's information into its respective directory entry.

Pros
+ There are fewer seeks from the directory to the inode.
+ There are fewer data structures, making it simpler.
+ Deletion is simpler: one less inode entry to delete.

Cons
- This implementation does not support hard links.
- Directories get bigger and are therefore harder to cache.
- Renaming gets messier (if you assume the Unix (1978) file system).

link("a", "b"); writes 1 block.
unlink("a"); writes 1 block.

Links supports multiple names for the same file. However, they require inodes.

File name lookup
To look up file "a",

1.	Start with the current working directory (inode #). Each process has its own window directory. In Windows: (C:A\B and D:G\H)
2.	Read that directory's data (using inode ops).
3.	Look for “a” in its table.
4.	Get an inode number.

To change the directory to "d":
chdir("d")

A)	Look up "d" using (1) – (4) above.
B)	Store the inode number into the process table entry.

Rename and make directory:
rename("./d");
mkdir("./d");
e/f/d new directory

How to look up file name "a/b/c":

I.	Look up "a" assuming II's inode is a working directory.
II.	Check that “a” is a directory. If not, fail with ENOTDIR (you are trying to do something with a file where a directory would be expected).
III.	Recurse with "b/c".

Directories cannot be hard-linked (in order to avoid memory management issues). This is such a disaster that we prohibit it.

Example of hard-linking directories (circles represent directories; squares represent files):

ln –s a b	ls -1
ln –s b a	a-->b
cat a/f	b-->a

If you try to do this, the kernel will detect a symbolic link loop and return an error code ("ELOOP", done via an arbitrary limit (20)).

Link Counts in Inodes
Why are they needed? (Why can't the system deduce this?)
Link counts store the number of hard links to this file.
Use ls –li to get the inode's information.

57		7
59		7
10		2
10		2

Proposal: Get rid of the inode's link counts in order to save space in the inode.
Problem: If you do this, unlink can’t reclaim the storage if there is another hard link.

Problems: The link count value might be wrong due to a power failure.
We might have race conditions with fine-grained locking.
It might overflow (error "EMLINK"-- to many links).
This is why hard link cycles are not allowed.

mkdir("d")
ls –ld d
link count = 2. Why?
2 d "." Self
".." parent directory

Example:
Links in a directory with 5 subdirectories:
2 + 5 ".." in subdirs
= 7

Symbolic links and link counts:
ln –s a b
ln –s a c
ln b d
ls –l b c a

uddin 2+ b-->a
1 c-->a
1 a
2d-->a
Symbolic links can also have hard links to them.

Implementing hard links will add a new system call:
link (" ")
unlink ("…") <-- change if add link

Owner group permissions:
1 eggert faculty --> symbolic link counts to the quota
quota!

Using the inode structure to format the links, ignoring some of the contents.

Hard links - Hard links are references, or pointers, to physical data on a storage volume.
Symbolic links - Symbolic links are special types of files that contain a reference to another file or directory in the form of an absolute or relative path.
Mounts - Mounting is the process of making a file system ready for the operating system to use, typically by reading certain data structures from storage into memory ahead of time. The mount command attaches the file system found on some device to the big file tree, thus instructing the operating system that the file system is ready for usage. The umount command will detach it from the file hierarchy again.

# mount /dev/whatav/a/b

Problems?
1. Renaming the filename of the mount location.
2. An open file in an "invisible" area stays open.
3. Loops can be a problem. One way to fix this is to allow at most one mount per file system.
#unmount /dev/whatever
4. Open files (working directory) in are the mounted file system. Do not allow unmounts.
5. Mounts allow the bypassing of permission restrictions (junk in the file system data structures) via mounting strange file systems. Don’t allow ordinary users to mount.

Advantages of mounts
+ Mounts can support more than one device (say, two disks with the same format).
+ Mounts can support multiple types of file systems.

Linux: we want to approximate this in C

Struct {
*--------------------> link()
*--------------------> unlink()
}

Process Structure
The following diagram represents the structure of the files, filestruct, and task.

Virtual File System (vfs)
OO by hand in C
SunOS2 (1985)
Stackable file system
(Heidemann, 1995, UCLA)

To rename "a" to "b" [rename("a", "b")], assuming one CPU and no other access to the disk.:
1) Read the block in the current directory that contains "a".
2) Zap that “a” in main memory so it turns into "a b". Note that if the power fails, rename() will not work. This is tricky because we need to test whether or not the name change actually took effect before we do anything else. We also need to check for an already-existing b so we don't accidentally overwrite it.
3) Write the block out to disk.