Richard, Dave, etc.,
In doing some filesystem work, I discovered that
Documentation/filesystems/vfs.txt was very helpful, but that it could
have been more helpful. I have tried to flesh out the documentation
a bit. Most of it is probably old hat for those who have dealt with
filesystems before, but for a neophyte like me, much of this information
would have been helpful.
Please apply this patch to vfs.txt.
If anyone notices errors, let me know. This is my understanding of how
the fs code works/is supposed to work, so I'd like to know if I'm
confused. :)
TIA,
Eli
-- 
--------------------.     "To the systems programmer, users and applications
Eli Carter          |      serve only to provide a test load."
eli.carter@inet.com `---------------------------------- (random fortune)
--------------EC51DDE1381FCFB61356D9B0
Content-Type: text/plain; charset=us-ascii; name="vfs.txt.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="vfs.txt.patch"

--- /home/ejc/vfs.txt	Wed Jun 28 13:23:55 2000
+++ vfs.txt	Wed Jun 28 13:47:15 2000
@@ -150,7 +150,9 @@
 
   struct super_block *sb: the superblock structure. This is partially
 	initialised by the VFS and the rest must be initialised by the
-	read_super() method
+	read_super() method. The VFS initializes the following elements:
+	s_dev, s_flags, s_dirt, then calls the filesystem specific
+	read_super, then initializes s_dev, s_rd_only, s_type
 
   void *data: arbitrary mount options, usually comes as an ASCII
 	string
@@ -211,6 +213,7 @@
 
   put_super: called when the VFS wishes to free the superblock (i.e.
 	unmount). This is called with the superblock lock held
+	This cleans up and frees anything held by the filesystem.
 
   write_super: called when the VFS superblock needs to be written to
 	disc. This method is optional
@@ -219,7 +222,8 @@
 	is called with the kernel lock held
 
   remount_fs: called when the filesystem is remounted. This is called
-	with the kernel lock held
+	with the kernel lock held. It is used to change the mount options
+	on the filesystem. A non-zero return value indicates failure.
 
   clear_inode: called then the VFS clears the inode. Optional
 
@@ -269,7 +273,8 @@
 	required if you want to support regular files. The dentry you
 	get should not have an inode (i.e. it should be a negative
 	dentry). Here you will probably call d_instantiate() with the
-	dentry and the newly created inode
+	dentry and the newly created inode. Returns <0 on error, 0
+	on success.
 
   lookup: called when the VFS needs to lookup an inode in a parent
 	directory. The name to look for is found in the dentry. This
@@ -316,6 +321,28 @@
 	inode it points to. Only required if you want to support
 	symbolic links
 
+  readpage: there is a generic_readpage
+
+  writepage:
+
+  bmap: given an inode and a file block number, return the
+	corresponding device block number.
+
+  truncate: truncates a file to length 0, doing any required housecleaning.
+	By the time this function is called, i_size has already been set to 0.
+	i_blocks, however, has not been altered.
+
+  permission: returns 0 if permissions are allowed, -EACCES if denied,
+	-EROFS if it is a read-only filesystem. Decision is based upon the
+	low 16 bits of the inode->i_mode field; the bitmap for this is defined
+	in <linux/stat.h>. If NULL, Linux supplies the default behavior.
+
+  smap:
+
+  updatepage:
+
+  revalidate:
+
 
 struct file_operations					<section>
 ======================
@@ -350,6 +377,9 @@
   write: called by write(2) and related system calls
 
   readdir: called when the VFS needs to read the directory contents
+	This starts at f_pos and moves f_pos to the end of the last
+	directory entry read. On error returns <0, otherwise returns
+	0.
 
   poll: called by the VFS when a process wants to check if there is
 	activity on this file and (optionally) go to sleep until there
@@ -467,3 +497,130 @@
 	pointer is NULL, the dentry is called a "negative dentry". This
 	function is commonly called when an inode is created for an
 	existing negative dentry
+
+
+struct inode						<section>
+============
+
+i_blocks is the number of blocks used by the file, including overhead such as
+indirect blocks. For du and similar user-land utilities to not be confused,
+i_blocks must be in 512 byte blocks, so scale accordingly in your code.
+
+
+system include files					<section>
+====================
+
+A few things need to be added to <linux/fs.h> to support a new filesystem.
+The struct inode and struct super_block each have a union that holds the
+filesystem specific data. A new filesystem will need to have its inode
+and superblock structures added to these unions. Alternatively, there is
+a generic pointer in those unions which can be used instead to alleviate
+the need to modify these two structures.
+
+There are typically three include files created per supported filesystem:
+	somefs_fs.h	Has the on-disk representation of the filesystem
+	somefs_fs_i.h	Defines the somefs_inode_info structure
+	somefs_fs_sb.h	Defines the somefs_sb_info structure
+
+
+for the novice						<section>
+==============
+
+bread()s (block read) may only be done with a size argument of s_blocksize.
+You must brelse() any buffers that you bread(). You may safely brelse(NULL).
+
+To create a new inode, you must call get_empty_inode(). To delete the inode,
+i_nlink must be 0, then call iput().
+
+locking issues						<section>
+==============
+
+super_block
+	from <linux/locks.h>
+	lock_super(), wait_on_super() -- may sleep in __wait_on_super() in
+		fs/super.c
+	unlock_super() -- may be called from an interrupt
+
+inode
+	wait_on_inode(), sync_one() may sleep in __wait_on_inode() in fs/inode.c
+	->i_flock
+	->i_sem - a semaphore
+	->i_atomic_write - atomic write semaphore
+
+buffer
+	from <linux/locks.h>
+	lock_buffer(), wait_on_buffer() -- may sleep in __wait_on_buffer() in
+		fs/buffer.c
+	unlock_buffer() -- may be called from an interrupt
+	Other sleeping functions in fs/buffer.c:
+		create_buffers()
+		refill_freelist()
+fs/locks.c
+	Functions that may sleep
+		locks_wake_up_blocks()
+
+
+
+useful helper functions					<section>
+=======================
+
+iget()			Returns a struct inode* given an inode number.
+iput()			Releases a struct inode* gotten from iget().
+clear_inode()		Calls filesystem specific clear_inode.
+make_bad_inode()	Returns a struct inode* with all its functions set to
+			return -EIO.
+get_empty_inode()	Returns an empty struct inode*.
+
+generic_file_mmap()	A generic implementation of mmap().
+generic_file_read()	A generic implementation of read(); requires an
+			implementation of bmap()
+generic_readpage()
+
+EIO_ERROR		A function that returns -EIO. Used in bad_file_ops and
+			bad_inode_ops.
+
+
+useful constant values					<section>
+======================
+
+variables						<subsection>
+---------
+bad_file_ops
+bad_inode_ops
+
+error return values					<subsection>
+-------------------
+-EACCES		User does not have required privileges.
+-EBADF		Bad file handle.
+-EEXIST		An operation finds the file already exists.
+-EFAULT		A copy_[to|from]_user() failed.
+-EFBIG		File is too big for the filesystem to handle.
+-EINVAL		Arguments are not valid, out of bounds, etc.
+-EIO		A bread() failed.
+-EISDIR		An operation was attempted on a directory that a directory does
+		not support.
+-ENAMETOOLONG	The filename in question is too long for this filesystem.
+-ENOENT		The filename was not found.
+-ENOMEM		Not enough memory to complete the requested operation.
+-ENOSPC		Out of space on the underlying device.
+-ENOTDIR	A directory-only operation was attempted on something other
+		than a directory.
+-ENOTEMPTY	Attempt to remove a non-empty directory.
+-EPERM		User does not have required permissions to complete operation.
+
+
+glossary						<section>
+========
+
+negative dentry - A dentry with the d_inode (a struct inode*) member
+	set to NULL. This indicates to the filesystem that the associated
+	filename does not exist in the filesystem.
+
+
+credits							<section>
+=======
+
+Original and maintained by Richard Gooch <rgooch@atnf.csiro.au>
+2000-06-28	Additional information on various functions and pointers
+		for newbies. eli.carter@inet.com
+
--------------EC51DDE1381FCFB61356D9B0--