COMP 3000
Operating Systems
File Systems and Storage Management
Lianying Zhao
Memory vs. Storage
• RAM has (relatively) low capacity, despite higher speed
• In-RAM content disappears when powered off
Persistence!
Source: www.flashmemorysummit.com
• Why can’t a program run directly on your hard drive?
• Storage device abstractions are not part of a computer architecture
• It’s I/O!
Why is a Driver Needed?
• Storage devices as an example (simplified):
The Block Device “Layer” (specific)
• The actual media
• Has its own drivers, e.g., USB Mass Storage
• May even involve multiple layers
• Block size can differ from the file system block size
• Technically, file systems can “reside” on any block devices
• Performance also matters
• The write() system call usually doesn’t immediately cause a write to the block device
• The sync command
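The gap between write() and the block device can be sketched in C. This is a minimal example, not a definitive implementation: the helper name write_durably is hypothetical, and it uses fsync(), the per-file counterpart of the sync command, to wait until the cached data has been handed to the device.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to `path` and force it out to the
 * storage device. Returns 0 on success, -1 on failure. */
int write_durably(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    size_t done = 0;
    while (done < len) {
        /* write() normally only copies the bytes into the kernel's cache. */
        ssize_t n = write(fd, text + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* fsync() returns once this file's cached data has reached the device. */
    int rc = fsync(fd);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Without the fsync() call the function would usually return sooner, but a power failure shortly afterwards could lose the data.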
The File System “Layer”
• Abstraction: from “blocks” to “files”
• For the time being, no ideal FS abstraction that enables full portability
• Tight coupling with the OS kernel, cf. the file-based access control
From now on, we will be examining file system concepts in the context of
UNIX-like OSes — VFS
Types of “Files”
• Regular file
• Directory
• Symbolic link
• FIFO (named pipe)
• Socket
• Device file (block, character)
What is a File Descriptor (fd)?
• A value (non-negative integer), pointing to a data structure in the
kernel
• Like indices to arrays of structs
• We are not talking about resource handles in computing in general
• HANDLE, in Windows
• So stdin, stdout and stderr are just special ones among them
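To illustrate that file descriptors behave like small array indices, the sketch below relies on the POSIX rule that open() returns the lowest-numbered descriptor not currently in use. The function name fd_number_is_reused is hypothetical, and the check assumes a single-threaded process.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens `path`, closes it, and opens it again. Returns 1 if the same
 * descriptor number was handed out both times, 0 if not, -1 on error.
 * Assumes no other thread opens or closes descriptors in between. */
int fd_number_is_reused(const char *path)
{
    assert(path != NULL);

    int first = open(path, O_RDONLY);
    if (first < 0)
        return -1;
    close(first);               /* the slot in the fd table becomes free */

    int second = open(path, O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;     /* the lowest free slot is used again */
}
```

The constants STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO from unistd.h are simply 0, 1 and 2, the first three slots of the same per-process table.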
Tracing down the File Access
• The file descriptor table
• per process
• The open file table
• system-wide
• The i-node table
• In-memory copy
Credit: Michael Kerrisk
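The split between the per-process descriptor table and the system-wide open file table can be observed through file offsets. In this sketch (names such as compare_offsets are illustrative only), a descriptor made with dup() shares one open file table entry with the original, while a second open() of the same path gets its own entry; all three lead to the same inode.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* File offsets seen through each descriptor after 5 bytes were written
 * through the first one. */
struct offsets {
    off_t original;   /* the descriptor the bytes were written through */
    off_t duplicate;  /* dup() of it: same open file table entry */
    off_t reopened;   /* second open(): separate open file table entry */
};

/* Creates or truncates `path`. Returns 0 on success, -1 on failure. */
int compare_offsets(const char *path, struct offsets *out)
{
    assert(path != NULL && out != NULL);

    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd1 < 0)
        return -1;
    int fd2 = dup(fd1);
    int fd3 = open(path, O_RDONLY);

    int rc = -1;
    if (fd2 >= 0 && fd3 >= 0 && write(fd1, "hello", 5) == 5) {
        /* lseek(fd, 0, SEEK_CUR) reports the current offset without moving it. */
        out->original  = lseek(fd1, 0, SEEK_CUR);
        out->duplicate = lseek(fd2, 0, SEEK_CUR);
        out->reopened  = lseek(fd3, 0, SEEK_CUR);
        rc = 0;
    }

    if (fd3 >= 0) close(fd3);
    if (fd2 >= 0) close(fd2);
    close(fd1);
    return rc;
}
```

After a successful call, original and duplicate are both 5 while reopened is still 0, because the offset lives in the open file table entry rather than in the descriptor or the inode.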
So What really is an inode?
• A POSIX (VFS) concept
• In some sense, the inode is the file
• Identified by an inode number (unique within a file system)
• inode types:
• directory
• regular file
• char device
• block device
• (named) pipe
• symbolic link
• socket
What is Stored in an inode?
• inodes are data structures, so they are real, even for special files
• They take space
• They are in the file system storage (although there’s an in-memory copy)
The stat Command
• Display detailed information about files/directories
• More than ls does
• Mainly corresponding to the inode
• System calls stat(), fstat(), lstat()
• How do you find out if a file/directory exists?
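One common answer to the existence question is to call lstat() and inspect errno. The sketch below assumes a POSIX system; path_exists and print_inode_info are hypothetical helper names, and the printed fields are only a few of those kept in the inode.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if `path` names an existing file of any type, 0 if it does
 * not exist, and -1 on any other error (e.g., permission denied). */
int path_exists(const char *path)
{
    struct stat st;

    assert(path != NULL);
    /* lstat() does not follow a final symbolic link, so a dangling
     * symlink still counts as existing. */
    if (lstat(path, &st) == 0)
        return 1;
    if (errno == ENOENT || errno == ENOTDIR)
        return 0;
    return -1;
}

/* Prints a few inode fields, similar to part of the stat command's output.
 * Returns 0 on success, -1 if stat() fails. */
int print_inode_info(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    printf("inode=%ju links=%ju size=%jd bytes\n",
           (uintmax_t)st.st_ino, (uintmax_t)st.st_nlink, (intmax_t)st.st_size);
    return 0;
}
```

Note that the answer can be stale by the time it is used: another process may create or remove the file right after the check.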
Directory entry (dentry)
• Represents a directory entry (not necessarily a directory)
• System calls – getdents(), not read()
• Library calls – readdir()
• A file is mapped to its inode by its parent directory
• The root (/) directory’s inode number is 2 on ext2/3/4 (other file systems may use a different number) **
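The name-to-inode mapping held by a directory can be listed with the readdir() library call. A minimal sketch, with list_directory as an illustrative name:

```c
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Prints "<inode number> <name>" for every entry in `dir_path` and returns
 * how many entries were seen, not counting "." and "..".
 * Returns -1 if the directory cannot be opened. */
int list_directory(const char *dir_path)
{
    assert(dir_path != NULL);

    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Each entry pairs a name with the inode number it maps to. */
        printf("%ju %s\n", (uintmax_t)entry->d_ino, entry->d_name);
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0)
            count++;
    }

    closedir(dir);
    return count;
}
```

Calling list_directory("/") prints one line per entry of the root directory; the order is whatever the file system returns, not alphabetical.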
Hard Links and Symbolic Links
• Symbolic link
• Only linking to the target file name (more accurately: pathname)
• What if the target file is deleted?
• Hard link
• Linking to the inode number
• Everything identical, except different names
• Not to a directory (why?)
• Link count
• Comparing with MS Windows again…
• Shortcuts
• Reparse points
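The difference between the two kinds of UNIX links shows up once the original name is removed. The following sketch assumes `dir` is an absolute path to a writable directory on a file system that supports both link types; link_demo and struct link_report are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

struct link_report {
    int same_inode;        /* hard link and target share one inode number */
    int links_before;      /* link count while both names exist */
    int links_after;       /* link count after the original name is removed */
    int symlink_dangling;  /* following the symlink fails after removal */
};

/* Creates a file, a hard link and a symbolic link to it inside `dir`,
 * removes the original name, and records what happened. Returns 0 or -1. */
int link_demo(const char *dir, struct link_report *out)
{
    char target[512], hard[512], soft[512];
    struct stat a, b;

    assert(dir != NULL && out != NULL);
    snprintf(target, sizeof target, "%s/gr_target_%ld", dir, (long)getpid());
    snprintf(hard, sizeof hard, "%s/gr_hard_%ld", dir, (long)getpid());
    snprintf(soft, sizeof soft, "%s/gr_soft_%ld", dir, (long)getpid());

    int fd = open(target, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    int rc = -1;
    if (link(target, hard) == 0 && symlink(target, soft) == 0 &&
        stat(target, &a) == 0 && stat(hard, &b) == 0) {
        out->same_inode = (a.st_ino == b.st_ino && a.st_dev == b.st_dev);
        out->links_before = (int)a.st_nlink;
        /* Remove the original name; the inode survives via the hard link. */
        if (unlink(target) == 0 && stat(hard, &b) == 0) {
            out->links_after = (int)b.st_nlink;
            /* The symlink only stores the pathname, which is now gone. */
            out->symlink_dangling = (stat(soft, &a) != 0);
            rc = 0;
        }
    }

    unlink(target);  /* harmless if already removed */
    unlink(hard);
    unlink(soft);
    return rc;
}
```

On a typical Linux file system the report shows the same inode for both names, a link count going from 2 to 1, and a dangling symbolic link.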
. and ..
File operations?
Copy/move/remove
• Copy creates a new inode
• For move, it depends
• Across different file systems, new inodes are created
• Within the same file system, just relinked to the new pathname
• Remove
• Removes that directory entry and decreases the link count
• Removes the inode as well once the link count reaches 0 (and no process still has the file open)
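The "just relinked" case of a move can be checked with rename(): within one file system the inode number should stay the same. A small sketch, where rename_keeps_inode is an illustrative name and `dir` is assumed to be a writable directory:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a file inside `dir` and renames it within the same directory.
 * Returns 1 if the inode number is unchanged, 0 if it changed, -1 on error. */
int rename_keeps_inode(const char *dir)
{
    char old_path[512], new_path[512];
    struct stat before, after;

    assert(dir != NULL);
    snprintf(old_path, sizeof old_path, "%s/gr_mv_old_%ld", dir, (long)getpid());
    snprintf(new_path, sizeof new_path, "%s/gr_mv_new_%ld", dir, (long)getpid());

    int fd = open(old_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    int result = -1;
    if (stat(old_path, &before) == 0 &&
        rename(old_path, new_path) == 0 &&  /* only the directory entry changes */
        stat(new_path, &after) == 0)
        result = (before.st_ino == after.st_ino);

    unlink(old_path);  /* no effect if the rename succeeded */
    unlink(new_path);
    return result;
}
```

rename() itself fails with EXDEV across file systems, which is why tools such as mv fall back to copying (new inode) and then removing the source.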
How do We Access Devices?
• Special files!
• Mostly /dev/*
• Kernel-mode code behind each special file
• Special files ≢ files on a special file system
• E.g., \\.\PHYSICALDRIVE0 (Windows)
• E.g., /dev/sda (Linux)
• Evolution of node generation in /dev
• Manually generated hardcoded nodes
• devfs
• udev
Device Files/Nodes (a.k.a. Special Files)
• They represent physical or virtual hardware devices
• A file system interface between device drivers and user-space applications
• Identified by a major number and a minor number
• Character devices
• Accessed at the granularity of characters (bytes)
• Not addressable (hence a stream)
• Block devices
• Accessed at the granularity of blocks
• Addressable
Size = 0?
Superblocks
• Metadata about the whole file system
• Primary and backup superblocks
• The dumpe2fs command
• View superblock information
• Must be a block device (where a file system resides)
Blocks on a File System
• inode
• Contains all metadata except the (file) name
• directory
• Contains the mapping between file names
and inodes (dentries)
• Also a special inode (with an inode number)
• data blocks
• Superblocks
Physical and Logical Sizes
• Logical size:
• The actual size of the file
• MS Windows: “Size”
• Physical size:
• The amount of allocated space on disk
• MS Windows: “Size on disk”
• “Holes” in a file
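A file with a hole can be made by seeking past the end before writing. This sketch assumes Linux, where st_blocks counts 512-byte units; make_sparse_file is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates `path` with a single byte written at offset `hole_bytes`, leaving
 * a hole in front of it. On success stores the logical and physical sizes
 * (in bytes) and returns 0; returns -1 on failure. */
int make_sparse_file(const char *path, off_t hole_bytes,
                     off_t *logical, off_t *physical)
{
    struct stat st;

    assert(path != NULL && logical != NULL && physical != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Seeking beyond the end and then writing leaves the skipped range
     * unallocated on file systems that support holes. */
    int ok = lseek(fd, hole_bytes, SEEK_SET) == hole_bytes &&
             write(fd, "x", 1) == 1 &&
             fstat(fd, &st) == 0;
    close(fd);
    if (!ok)
        return -1;

    *logical = st.st_size;                   /* what ls -l reports */
    *physical = (off_t)st.st_blocks * 512;   /* allocated space, like du */
    return 0;
}
```

With a 1 MiB hole the logical size is 1048577 bytes; on a file system that supports holes the physical size is typically only one or a few blocks.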
Using the dd Command
• Experimenting with real devices can be risky
• So let’s use a “virtual” version
• dd is a command-line utility to copy/convert data
• Always involves an input file (if) and an output file (of)
• Not necessarily regular files
• Block devices (e.g., /dev/sda)
• Other special files (e.g., /dev/null, /dev/random, /dev/zero)
dd vs. cp
• Well, they both copy files, so…
• They serve different purposes
• cp: works at the granularity of files
• Can handle multiple files/directories
• dd: file I/O, more control over how data is handled
• Position control: seek (of), skip (if)
• Conversion: e.g., encoding
• Analogy: dd is like a file-based pipeline
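The position control that dd offers can be approximated with pread() and pwrite(). This is a rough sketch, not how dd is actually implemented: copy_blocks is a hypothetical function that behaves roughly like dd with bs=, skip=, seek=, count= and conv=notrunc.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Copies up to `count` blocks of `bs` bytes from `in_path` to `out_path`,
 * skipping `skip` blocks of input and starting `seek` blocks into the
 * output. The output is not truncated. Returns bytes copied, or -1. */
long copy_blocks(const char *in_path, const char *out_path,
                 size_t bs, off_t skip, off_t seek, size_t count)
{
    char buf[4096];

    assert(in_path != NULL && out_path != NULL);
    if (bs == 0 || bs > sizeof buf)
        return -1;

    int in = open(in_path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(out_path, O_WRONLY | O_CREAT, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    long total = 0;
    for (size_t i = 0; i < count; i++) {
        /* Block i of the copy comes from input block skip + i ... */
        ssize_t n = pread(in, buf, bs, (skip + (off_t)i) * (off_t)bs);
        if (n < 0) { total = -1; break; }
        if (n == 0) break;                        /* end of input */
        /* ... and lands in output block seek + i. */
        if (pwrite(out, buf, (size_t)n, (seek + (off_t)i) * (off_t)bs) != n) {
            total = -1;
            break;
        }
        total += n;
    }

    close(in);
    close(out);
    return total;
}
```

For instance, copy_blocks(in, out, 4, 1, 2, 1) copies the second 4-byte block of the input to byte offset 8 of the output.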
File Systems Can be Corrupted
• All types of persistent storage share the same risk
• On-disk data: lifespan is long and damages also persist
• In-RAM data: lifespan is short and can also be recreated
• What can happen:
• Failures during updates: power failure or system crash → inconsistency
• Media/data damage
• Atomicity?
crash consistency
A Few What-ifs
• Interrupted right in the middle of updating on-disk structures
• The crash consistency problem
• Inodes are good, with missing/inconsistent data blocks
• Good data blocks, with inodes missing or corrupted
• You lose directory entries
• The superblock is corrupted
The Lazy Approach: Let it Happen and Fix it
• The fsck tool checks:
• Superblocks
• Allocation
• Link count **
• Bad blocks *
• Etc.
• The lost+found directory
• Caution! File system integrity vs. data integrity
Journaling File Systems
• Avoid the need to scan the whole file system
• At the cost of some performance+storage overhead
• All changes must first be written to a log in persistent storage, before
being applied to the actual data storage
• Common file systems with journaling
• Windows NTFS
• ext3 and ext4
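The log-first rule can be illustrated with a toy user-space journal. This is only a sketch of the idea and not how NTFS or ext3/ext4 work internally: the record layout, the single-record log, and the names journal_log, journal_replay and journaled_write are all assumptions made for the example, and real journals add checksums and commit records.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define JOURNAL_MAX 256

/* One record: "write `len` bytes of `data` at `offset` in the data file". */
struct journal_record {
    off_t  offset;
    size_t len;
    char   data[JOURNAL_MAX];
};

/* Step 1: make the intended change durable in the log. */
int journal_log(const char *log_path, off_t offset, const void *data, size_t len)
{
    struct journal_record rec;

    assert(log_path != NULL && data != NULL);
    if (len > JOURNAL_MAX)
        return -1;
    memset(&rec, 0, sizeof rec);
    rec.offset = offset;
    rec.len = len;
    memcpy(rec.data, data, len);

    int fd = open(log_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int ok = write(fd, &rec, sizeof rec) == (ssize_t)sizeof rec && fsync(fd) == 0;
    close(fd);
    return ok ? 0 : -1;
}

/* Step 2, also used for crash recovery: apply the logged change, then clear
 * the log. Replaying twice is harmless because it writes the same bytes.
 * Returns 1 if a record was applied, 0 if there was none, -1 on error. */
int journal_replay(const char *log_path, const char *data_path)
{
    struct journal_record rec;
    int log_fd = open(log_path, O_RDWR);
    if (log_fd < 0)
        return 0;                              /* no log: nothing to recover */

    ssize_t n = read(log_fd, &rec, sizeof rec);
    if (n <= 0) {
        close(log_fd);
        return n < 0 ? -1 : 0;
    }
    if (n != (ssize_t)sizeof rec || rec.len > JOURNAL_MAX) {
        /* Torn record: the update was never fully logged, so discard it. */
        int rc = ftruncate(log_fd, 0) == 0 ? 0 : -1;
        close(log_fd);
        return rc;
    }

    int data_fd = open(data_path, O_WRONLY | O_CREAT, 0644);
    int ok = data_fd >= 0 &&
             pwrite(data_fd, rec.data, rec.len, rec.offset) == (ssize_t)rec.len &&
             fsync(data_fd) == 0;
    if (data_fd >= 0)
        close(data_fd);
    /* The record is dropped only after the data itself is durable. */
    if (ok)
        ok = ftruncate(log_fd, 0) == 0 && fsync(log_fd) == 0;
    close(log_fd);
    return ok ? 1 : -1;
}

/* A complete update: log first, then apply. */
int journaled_write(const char *log_path, const char *data_path,
                    off_t offset, const void *data, size_t len)
{
    if (journal_log(log_path, offset, data, len) != 0)
        return -1;
    return journal_replay(log_path, data_path) == 1 ? 0 : -1;
}
```

If a crash happens after journal_log() but before the data file is updated, calling journal_replay() at the next start finishes the update by reading only the log, with no scan of the data file.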
Data Recovery?
• Again, not to be confused with file system repair
• Nor is it storage device (disk) repair…
• Precondition: the data must still exist in some form…
• Back up your data properly
Special File System: procfs (/proc)
• Originally proposed in an academic paper in 1984
• As its name implies: “each member of which, /proc/nnnnn, corresponds to the
address space of the running process whose pid is nnnnn.”
• Gradually extended to a wide range of information about the system, e.g.,
• /proc/cpuinfo
• /proc/filesystems
• /proc/version
• But /proc/sys belongs to sysctl, to configure the kernel at run-time
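Because procfs entries look like ordinary files, they can be read with normal file I/O. The sketch below is Linux-specific and assumes /proc is mounted; it reads the first field of /proc/self/stat, which is the pid of the calling process. The function name pid_from_procfs is illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Reads the pid recorded in /proc/self/stat (Linux-specific).
 * Returns the pid, or -1 if procfs is unavailable or unparsable. */
long pid_from_procfs(void)
{
    /* /proc/self refers to the /proc/<pid> directory of the caller. */
    FILE *f = fopen("/proc/self/stat", "r");
    if (f == NULL)
        return -1;

    long pid = -1;
    if (fscanf(f, "%ld", &pid) != 1)
        pid = -1;
    fclose(f);
    return pid;
}
```

The value normally equals getpid(). No storage device is involved: the kernel generates the file contents when they are read.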
Special File System: sysfs (/sys)
• A way to interact with: kernel subsystems, hardware devices, and
associated device drivers
• Exposing the kobject structures internally to kernel code and files
externally to user space
• sysfs_create_file() to create entries
User-space File Systems
• Why?
• Portability
• Convenience: e.g., the many programming languages
• Security & stability
• FUSE = Filesystem in USErspace
• Another layer of abstraction
• /dev/fuse
• Can convert virtually anything into a file system
• GmailFS?
static struct fuse_operations operations = {
    .getattr = do_getattr,
    .readdir = do_readdir,
    .read    = do_read,
};
Network File Systems
• SSHFS (a FUSE file system)
• Many things already run over SSH
• SCP, SFTP
• Network tunneling
• Why: Showing remote files as local files
• NFS (not a FUSE file system)
• You need a dedicated server listening on dedicated ports
• Reasons for choosing it… performance, reliability, etc.
More on Differences of File Systems
• The permission bits
• How come I can mount all/many kinds of file system on Linux and still
see the same view?
• Possibility: the driver fakes it
• fstype (-t) → file system driver
• Mount options
Accessing Files/Directories Programmatically
• File operations are based on file descriptors (FDs)
• Manipulating FDs
• Redirection of stdin/stdout/stderr
• struct dirent (directory entry)
• struct stat (file/inode, Tutorial 5)
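Redirection of stdout comes down to making descriptor 1 refer to a different open file. A minimal sketch using dup() and dup2(); write_via_redirected_stdout is a hypothetical helper name, and the original stdout is restored before returning.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Temporarily points descriptor 1 (stdout) at `path`, prints `msg` through
 * the ordinary stdout stream, then restores the original stdout.
 * Returns 0 on success, -1 on failure. */
int write_via_redirected_stdout(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    fflush(stdout);                      /* do not carry old buffered output */
    int saved = dup(STDOUT_FILENO);      /* remember where stdout pointed */
    if (saved < 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }

    int rc = -1;
    if (dup2(fd, STDOUT_FILENO) >= 0) {
        /* Code that prints to stdout needs no change; only the kernel-side
         * target of descriptor 1 is different. */
        rc = (fputs(msg, stdout) >= 0 && fflush(stdout) == 0) ? 0 : -1;
        dup2(saved, STDOUT_FILENO);      /* put the original stdout back */
    }

    close(fd);
    close(saved);
    return rc;
}
```

This is essentially what a shell does for `command > file`, except that the shell performs the dup2() in the child process before executing the command.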
Misc.
RE: Tutorial 3
• What if the signal handler is triggered during a system call?
• System call aborts and returns an error
• Signal handler waits until system call is finished
• System call is paused, signal handler runs, and then system call is resumed
• SA_RESTART
COMP 3000 (Fall 2020)