File Systems and Storage Management

COMP 3000
Operating Systems
Lianying Zhao
Memory vs. Storage
RAM has (relatively) low capacity, despite higher speed
In-RAM content disappears when powered off
Persistence!
Source: www.flashmemorysummit.com
Why can’t a program run directly on your hard drive?
Storage device abstractions are not part of a computer architecture
It’s I/O!
Why is a Driver Needed?
Storage devices as an example (simplified):
The Block Device “Layer” (specific)
The actual media
Has its own drivers, e.g., USB Mass Storage
May even involve multiple layers
Block size can differ from the file system block size
Technically, file systems can “reside” on any block device
Performance also matters
The write() system call usually doesn’t immediately cause a write to the block device
The sync command
The File System “Layer”
Abstraction: from “blocks” to “files”
For now, there is no ideal FS abstraction that enables full portability
Tightly coupled with the OS kernel, cf. file-based access control
From now on, we will examine file system concepts in the context of UNIX-like OSes and their VFS (Virtual File System) layer

Types of “Files”
Regular file
Directory
Symbolic link
FIFO (named pipe)
Socket
Device file (block, character)
What is a File Descriptor (fd)?
A value (a non-negative integer) that refers to a data structure in the kernel
Like an index into an array of structs
Here we do not mean resource handles in computing in general
Cf. HANDLE in Windows
So stdin, stdout and stderr are just special ones among them (fds 0, 1 and 2)
Tracing down the File Access
The file descriptor table
per process
The open file table
system-wide
The i-node table
In-memory copy
Credit: Michael Kerrisk
So what really is an inode?
A POSIX (VFS) concept
In some sense, the inode is the file
Identified by an inode number (unique within a file system)
inode types:
directory
regular file
char device
block device
(named) pipe
symbolic link
socket
What is Stored in an inode?
inodes are data structures, so they are real, even for special files
They take space
They are in the file system storage (although there’s an in-memory copy)
The stat Command
Display detailed information about files/directories
More than ls does
Mainly corresponding to the inode
System calls stat(), fstat(), lstat()
How do you find out if a file/directory exists?
Directory entry (dentry)
Represents a directory entry (not necessarily a directory)
System calls – getdents(), not read()
Library calls – readdir()
A file is mapped to its inode by its parent directory
The root (/) directory’s inode number is 2 on ext2/3/4 (not on every file system) **
Hard Links and Symbolic Links
Symbolic link
Only linking to the target file name (more accurately: pathname)
What if the target file is deleted?
Hard link
Linking to the inode number
Everything identical, except different names
Not to a directory (why?)
Link count
Comparing with MS Windows again…
Shortcuts
Reparse points
. and ..
File operations?

Copy/move/remove
Copy creates a new inode
For move, it depends
Across different file systems, new inodes are created
Within the same file system, just relinked to the new pathname
Remove
Removes that directory entry and decreases the link count by 1
Frees the inode (and its data blocks) once the link count reaches 0 and no process still has the file open
How do We Access Devices?
Special files!
Mostly /dev/*
Kernel-mode code behind each special file
Special files vs. files on a special file system
E.g., \\.\PHYSICALDRIVE0 (Windows)
E.g., /dev/sda (Linux)
Evolution of node generation in /dev
Manually generated hardcoded nodes
devfs
udev
Device Files/Nodes (a.k.a. Special Files)
They represent physical or virtual hardware devices
A file system interface between device drivers and user-space applications
Identified by a major number and a minor number
Character devices
Accessed at the granularity of characters (bytes)
Not addressable (hence a stream)
Block devices
Accessed at the granularity of blocks
Addressable
Size = 0?
Superblocks
Metadata about the whole file system
Primary and backup superblocks
The dumpe2fs command
View superblock information
Operates on the block device (or image file) where the file system resides
Blocks on a File System
inode
Contains all metadata except the (file) name
directory
Contains the mapping between file names and inodes (dentries)
Itself a file with its own inode (and inode number)
data blocks
Superblocks
Physical and Logical Sizes
Logical size:
The actual size of the file
MS Windows: “Size”
Physical size:
The amount of allocated space on disk
MS Windows: “Size on disk”
“Holes” in a file
Using the dd Command
Experimenting with real devices can be risky
So let’s use a “virtual” version
dd is a command-line utility to copy/convert data
Always involves an input file (if) and an output file (of)
Not necessarily regular files
Block devices (e.g., /dev/sda)
Other special files (e.g., /dev/null, /dev/random, /dev/zero)
dd vs. cp
Well, they both copy files, so…
They serve different purposes
cp: works at the granularity of files
Can handle multiple files/directories
dd: works at the level of file I/O, with more control over how data is handled
Position control: seek (of), skip (if)
Conversion: e.g., encoding
Analogy: dd is like a file-based pipeline
File Systems Can be Corrupted
All types of persistent storage share the same risk
On-disk data: lifespan is long, so damage also persists
In-RAM data: lifespan is short, and it can be recreated
What can happen:
Failures during updates: a power failure or system crash leaves on-disk structures inconsistent
Media/data damage
Atomicity?
crash consistency
A Few What-ifs
Interrupted right in the middle of updating on-disk structures
The crash consistency problem
Inodes are good, with missing/inconsistent data blocks
Good data blocks, with inodes missing or corrupted
You lose directory entries
The superblock is corrupted
The Lazy Approach: Let it Happen and Fix it
The fsck tool checks:
Superblocks
Allocation
Link count **
Bad blocks *
Etc.
The lost+found directory
Caution! File system integrity vs. data integrity
Journaling File Systems
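Avoid the need to scan the whole file system after a crash
At the cost of some performance and storage overhead
Changes must first be written to a log (the journal) in persistent storage before being applied to the actual data storage
By default, NTFS and ext3/ext4 journal only metadata changes, not file contents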
Common file systems with journaling
Windows NTFS
ext3 and ext4
Data Recovery?
Again, not to be confused with file system repair
Nor is it storage device (disk) repair…
Precondition: the data must still exist in some form…
Back up your data properly
Special File System: procfs (/proc)
Originally proposed in an academic paper in 1984
As its name implies: “each member of which, /proc/nnnnn, corresponds to the address space of the running process whose pid is nnnnn.”
Gradually extended to a wide range of information about the system, e.g.,
/proc/cpuinfo
/proc/filesystems
/proc/version
But /proc/sys belongs to sysctl, which configures the kernel at run time
Special File System: sysfs (/sys)
A way to interact with kernel subsystems, hardware devices, and associated device drivers
Exposes kobject structures, which are internal to kernel code, as files to user space
sysfs_create_file() to create entries
User-space File Systems
Why?
Portability
Convenience: e.g., a file system can be written in any of many programming languages
Security & stability
FUSE = Filesystem in USErspace
Another layer of abstraction
/dev/fuse
Can convert virtually anything into a file system
GmailFS?
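static struct fuse_operations operations = {
    .getattr = do_getattr,   /* report attributes (a struct stat) for a path */
    .readdir = do_readdir,   /* list the entries of a directory */
    .read    = do_read,      /* return (part of) a file's contents */
};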
Network File Systems
SSHFS (a FUSE file system)
So far, we have done many things through SSH
SCP, SFTP
Network tunneling
Why: to show remote files as local files
NFS (not a FUSE file system)
You need a dedicated server listening on dedicated ports
Reasons for choosing it… performance, reliability, etc.
More on Differences of File Systems
The permission bits
How come I can mount all/many kinds of file systems on Linux and still see the same view?
Possibility: the driver fakes it
fstype (-t) selects the file system driver
Mount options
Accessing Files/Directories Programmatically
File operations are based on file descriptors (FDs)
Manipulating FDs
Redirection of stdin/stdout/stderr
struct dirent (directory entry)
struct stat (file/inode, Tutorial 5)
Misc.
RE: Tutorial 3
What if the signal handler is triggered during a system call?
System call aborts and returns an error
Signal handler waits until system call is finished
System call is paused, signal handler runs, and then system call is resumed
SA_RESTART
COMP 3000 (Fall 2020)